[Impala-ASF-CR] [PREVIEW] IMPALA-5058: Improve concurrency of DDL/DML operations

2017-11-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8545


Change subject: [PREVIEW] IMPALA-5058: Improve concurrency of DDL/DML operations
..

[PREVIEW] IMPALA-5058: Improve concurrency of DDL/DML operations

Problem: A long running table metadata operation (e.g. refresh) could
prevent any other metadata operation from making progress if it
coincided with the gather catalog updates operation. The problem was due
to the conservative locking scheme used when catalog updates were
collected. In particular, in order to collect a consistent snapshot of
metadata changes, the global catalog lock was held for the entire
duration of that operation.

Solution: To improve the concurrency of catalog operations the following
changes are performed:
* A range of catalog versions determines the catalog changes to be
  included in a catalog update. Any catalog changes that do not fall in
  the specified range are ignored (to be processed in subsequent catalog
  updates).
* The catalog allows metadata operations to make progress while collecting
  catalog updates.
* To prevent starvation of catalog updates (i.e. frequently updated
  catalog objects skipping catalog updates indefinitely), we keep track
  of the number of times a catalog object has skipped an update and if
  that number exceeds a threshold it is included in the next catalog
  update even if its version is not in the specified catalog update
  version range. Hence, the same catalog object may be sent in two
  consecutive catalog updates.

Change-Id: I3032437f83d39bcc8cff14d897c7c106a4ab62d3
---
M be/src/exec/catalog-op-executor.cc
M be/src/exec/catalog-op-executor.h
M be/src/service/client-request-state.cc
M be/src/service/frontend.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M common/thrift/CatalogService.thrift
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java
A fe/src/main/java/org/apache/impala/catalog/CatalogObjectImpl.java
A fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionManager.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSource.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Function.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCachePool.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Role.java
M fe/src/main/java/org/apache/impala/catalog/RolePrivilege.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/SentryProxy.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
30 files changed, 1,153 insertions(+), 458 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/8545/1
--
To view, visit http://gerrit.cloudera.org:8080/8545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3032437f83d39bcc8cff14d897c7c106a4ab62d3
Gerrit-Change-Number: 8545
Gerrit-PatchSet: 1
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI.

2017-11-13 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8529 )

Change subject: [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web 
UI.
..


Patch Set 1:

Screenshots of the catalog web UI to illustrate the changes:
https://cloudera.box.com/s/tgvt0yc3mfhvc25i2tm30m3h49769jfp
https://cloudera.box.com/s/hc0ru8q2ajfne6gzy5x4s5ri5p6kgawz


--
To view, visit http://gerrit.cloudera.org:8080/8529
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37d407979e6d3b1a444b6b6265900b148facde9e
Gerrit-Change-Number: 8529
Gerrit-PatchSet: 1
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Comment-Date: Mon, 13 Nov 2017 19:40:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI.

2017-11-13 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8529


Change subject: [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web 
UI.
..

[PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI.

The following changes are included in this commit:
* Adds a lightweight framework for registering metrics in the JVM.
* Adds table-level metrics and enables these metrics to be exposed
through the catalog web UI.
* Adds a CatalogUsageMonitor class that monitors and reports the catalog
usage in terms of the tables with the highest memory requirements and
the tables with the highest number of metadata operations. The catalog
usage information is exposed in the /catalog page of the catalog web UI.

Change-Id: I37d407979e6d3b1a444b6b6265900b148facde9e
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M common/thrift/Frontend.thrift
M common/thrift/JniCatalog.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/common/Metrics.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
A fe/src/main/java/org/apache/impala/util/TopNCache.java
M www/catalog.tmpl
A www/table_metrics.tmpl
21 files changed, 953 insertions(+), 73 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8529/1
--
To view, visit http://gerrit.cloudera.org:8080/8529
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I37d407979e6d3b1a444b6b6265900b148facde9e
Gerrit-Change-Number: 8529
Gerrit-PatchSet: 1
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.

2017-11-09 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Turns on client connections when local catalog 
initialized.
..


Patch Set 22: Code-Review+2

(7 comments)

Minor formatting nits. Tests seem reasonable to me.

http://gerrit.cloudera.org:8080/#/c/8202/22/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8202/22/be/src/service/impala-server.cc@1930
PS22, Line 1930: int32_t hs2_port) {
fix indentation


http://gerrit.cloudera.org:8080/#/c/8202/22/be/src/service/impala-server.cc@1948
PS22, Line 1948: ;
remove


http://gerrit.cloudera.org:8080/#/c/8202/22/be/src/service/impala-server.cc@1980
PS22, Line 1980:   boost::shared_ptr beeswax_processor(new 
ImpalaServiceProcessor(handler));
nit: long line


http://gerrit.cloudera.org:8080/#/c/8202/22/be/src/service/impalad-main.cc
File be/src/service/impalad-main.cc:

http://gerrit.cloudera.org:8080/#/c/8202/22/be/src/service/impalad-main.cc@88
PS22, Line 88: FLAGS_hs2_port
nit: indentation


http://gerrit.cloudera.org:8080/#/c/8202/22/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/8202/22/bin/start-impala-cluster.py@316
PS22, Line 316:   """Waits for a catalog copy to be received by the impalad. 
When its received, additionally
nit: long line


http://gerrit.cloudera.org:8080/#/c/8202/22/tests/common/custom_cluster_test_suite.py
File tests/common/custom_cluster_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/8202/22/tests/common/custom_cluster_test_suite.py@125
PS22, Line 125: cluster_size=CLUSTER_SIZE, num_coordinators=NUM_COORDINATORS,
  : use_exclusive_coordinators=False, 
log_level=1,
  : 
expected_num_executors=CLUSTER_SIZE):
indentation


http://gerrit.cloudera.org:8080/#/c/8202/22/tests/custom_cluster/test_catalog_wait.py
File tests/custom_cluster/test_catalog_wait.py:

http://gerrit.cloudera.org:8080/#/c/8202/22/tests/custom_cluster/test_catalog_wait.py@81
PS22, Line 81: # TODO: add more tests
 :   # - impalads that are just coordinators
 :   # - impalads that are just executors
 :   # - process reincarnation, e.g., healthy impalad crashes and 
is restarted
remove



--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 22
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 10 Nov 2017 07:00:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-06 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: std::pair
> It will until doSubscriberUpdate is called which will remove the entry. You
I was chatting with Michael and he mentioned that in general we don't recommend 
the use of weak_ptrs. So, if keeping the entries for some time is a concern by 
using shared_ptrs, you may want to ignore my recommendation.



--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 06 Nov 2017 21:44:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-06 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: std::pair
> Wouldn't that keep a bunch of unregistered 'Subscriber' objects around due
It will until doSubscriberUpdate is called which will remove the entry. You can 
even use a weak_ptr here if keeping these entries for some time is a concern.



--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 06 Nov 2017 21:28:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances

2017-11-06 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8449 )

Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber 
instances
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385
PS3, Line 385: std::pair
I think the code would be much simpler if you stored a pointer (probably a 
shared_ptr is needed) to the Subscriber here and simply compared it to the 
registered Subscriber.



--
To view, visit http://gerrit.cloudera.org:8080/8449
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65
Gerrit-Change-Number: 8449
Gerrit-PatchSet: 3
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Mon, 06 Nov 2017 20:08:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-25 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 10: Code-Review+2

Nice work!


--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 10
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 25 Oct 2017 22:22:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-25 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 9:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/8235/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8235/9//COMMIT_MSG@42
PS9, Line 42: datastructure
nit: data structure


http://gerrit.cloudera.org:8080/#/c/8235/9//COMMIT_MSG@50
PS9, Line 50: speed up
speedup


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@122
PS9, Line 122: Limits the
"Maximum"


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@223
PS9, Line 223: public FileMetadataLoadStats(Path path) { hdfsPath = path; }
nit: move the ctor below the class variable members.


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@228
PS9, Line 228:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@232
PS9, Line 232: the
remove


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@248
PS9, Line 248: runnable
callable


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@266
PS9, Line 266: // Computes the file metadata from scratch (by calling 
resetAndLoadFileMetadata())
 : // when reuseFileMd_ is false, else refreshes the existing 
file metadata for
 : // modified files using refreshFileMetadata().
nit: you can actually remove the comment as it simply describes the code below.


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@465
PS9, Line 465: hasFileChanged
I think you need to put back the caching check. Alex made a good point that it 
is used for updating the cached bytes size.


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@773
PS9, Line 773: loadParitionFileMetadataFromScratch
"load from scratch" and "reset and load" essentially mean the same thing. Maybe 
rename this to resetAndLoadPartitionFileMetadata()?


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@792
PS9, Line 792: S3
non-HDFS (S3/ADLS)


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@792
PS9, Line 792: S3
"the latter"


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@844
PS9, Line 844: if (LOG.isTraceEnabled()) {
 : LOG.trace(loadStats.debugString());
 :   }
nit: single line


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/service/BackendConfig.java
File fe/src/main/java/org/apache/impala/service/BackendConfig.java:

http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/service/BackendConfig.java@71
PS9, Line 71: maxS3PartsParallelLoad
Why "maxS3..."? Everywhere else we use "maxNonHdfs..".


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/util/ListMap.java
File fe/src/main/java/org/apache/impala/util/ListMap.java:

http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/util/ListMap.java@43
PS9, Line 43: getList() { return list_; }
Hm, why is list_ exposed to the world? You may want to check who wants this and 
what do they do with it (i.e modify or simply iterate). If it's the latter you 
can pass an ImmutableList(). If it is the former, see if we need to expand the 
public api of this class instead.


http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/util/ListMap.java@76
PS9, Line 76: synchronized (this) {
this is equivalent to public synchronized void populate...



-- 
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 9
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 25 Oct 2017 19:24:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.

2017-10-25 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8322 )

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
..


Patch Set 5:

(8 comments)

Did you add any planner tests?

http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@228
PS5, Line 228:if (newConjunct instanceof BoolLiteral) {
 :smap.put(conjunct, newConjunct);
 :continue;
 :}
nit: remove tabs. Also, I am not sure I understand what is happening here.


http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@308
PS5, Line 308: correlated aggregates
what is a correlated aggregate?


http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@323
PS5, Line 323: c
use capital C just to be consistent with L321


http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@329
PS5, Line 329: 10 IS NULL OR
Since we know what the constant literal is, can't we avoid the is null check 
and the disjunction in some cases?


http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@355
PS5, Line 355: or 10 IS NOT NULL
Same comment as above.


http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@360
PS5, Line 360: .
nit: remove '.'


http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@366
PS5, Line 366:
nit: remove empty lines (L366, L369 and 371)


http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@428
PS5, Line 428: newSubquery.reset();
 :   newSubquery.analyze(rhsQuery.getAnalyzer());
Why do you need to go through reset/analyze? Same question for L459-460.



--
To view, visit http://gerrit.cloudera.org:8080/8322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
Gerrit-Change-Number: 8322
Gerrit-PatchSet: 5
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 25 Oct 2017 18:42:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.

2017-10-23 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8322 )

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
..


Patch Set 4:

(20 comments)

First pass. I think you may want to add a few planner tests to show the 
resulting plans.

http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@96
PS4, Line 96: Preconditions.checkState(rawPath_ != null);
Just by looking at some of the constructors, this precondition doesn't seem 
particularly correct.


http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@225
PS4, Line 225: if (conjunct instanceof InPredicate) {
I think it's better if you extract the check from L347 and merge it here. Then 
you can also remove the comment.


http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@288
PS4, Line 288: the resulting expression
 :* is not analyzed
Out of curiosity, why not analyzing the rewritten expression here?


http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@296
PS4, Line 296: 1)
For each case, you may want to mention if it works for correlated/uncorrelated 
subqueries.


http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@302
PS4, Line 302: Example:
Case with "limit 1"?


http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@313
PS4, Line 313: NOT EXISTS (RHS AND C = e OR e IS NULL)
Not sure I understand what this means. Is C = e OR e is NULL injected into the 
WHERE clause of RHS or something else?


http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@332
PS4, Line 332: NOT EXISTS (S ? F (RHS) t WHERE C = e or e IS NOT NULL)
:) I am not following this notation. Let's talk, maybe I am missing something 
obvious. Alternatively, simple examples might just work for all these cases.


http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@340
PS4, Line 340:* @param inPred
 :* @return rewritten expression or null
 :* @throws AnalysisException
Remove, we don't use javadoc :)


http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@349
PS4, Line 349: Expr rhs = inPred.getChild(1);
 : Preconditions.checkArgument(rhs.contains(Subquery.class));
I believe this check is redundant. Subqueries can only be in the rhs of the in 
predicate.


http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@353
PS4, Line 353:
nit: empty lines (remove)


http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@381
PS4, Line 381: new ArrayList()
nit: we use guava's static functions. i.e. Lists.newArrayList(); even if you 
use the standard java classes, you don't need to specify the type after the 
class name, it can be inferred by path's type (e.g. ArrayList path = 
new ArrayList<>();).


http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@408
PS4, Line 408: Subquery newSubquery = new Subquery(rewriteQuery);
 :   newSubquery.analyze(rhsQuery.getAnalyzer());
Hm, rewriteQuery is already analyzed, so I am wondering if the analyze() call 
will have any affect at all.


http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@414
PS4, Line 414: aggregation
"or analytic function".


http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@430
PS4, Line 430: new ArrayList<>();
use guava's static functions


http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@679
PS4, Line 679:
remove empty lines


http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java:

http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@341
PS4, Line 341: TODO for 2.3
I know it's not your fault but can you remove the reference to 2.3 from all the 
TODOs in this file?



[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.

2017-10-23 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Turns on client connections when local catalog 
initialized.
..


Patch Set 11: Code-Review+1

(4 comments)

You may want to ask someone from BE and QE to sing off.

http://gerrit.cloudera.org:8080/#/c/8202/11/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8202/11/be/src/service/impala-server.cc@1979
PS11, Line 1979:   // Wait for the frontend catalog to be ready prior to 
opening client ports.
nit: Maybe expand this comment to mention that this call will block 
indefinitely in the case when the first catalog update never arrives for some 
reason.


http://gerrit.cloudera.org:8080/#/c/8202/11/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/8202/11/bin/start-impala-cluster.py@82
PS11, Line 82: action="store_true",
 :   help=SUPPRESS_HELP)
nit: merge these two and save a line :)


http://gerrit.cloudera.org:8080/#/c/8202/11/bin/start-impala-cluster.py@304
PS11, Line 304: wait_for_client
I think this should be named wait_for_catalog since that is the high level 
operation we're interested in.


http://gerrit.cloudera.org:8080/#/c/8202/11/tests/common/custom_cluster_test_suite.py
File tests/common/custom_cluster_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/8202/11/tests/common/custom_cluster_test_suite.py@127
PS11, Line 127: if disable_catalog:
  :   cmd.append("--disable_catalog")
nit: single line



--
To view, visit http://gerrit.cloudera.org:8080/8202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6
Gerrit-Change-Number: 8202
Gerrit-PatchSet: 11
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 23 Oct 2017 18:43:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.

2017-10-20 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Turns on client connections when local catalog 
initialized.
..


Patch Set 9:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8202/9/be/src/testutil/in-process-servers.cc
File be/src/testutil/in-process-servers.cc:

http://gerrit.cloudera.org:8080/#/c/8202/9/be/src/testutil/in-process-servers.cc@75
PS9, Line 75: if (started.ok()) return impala;
: LOG(WARNING) << started.GetDetail();
:
: delete impala;
The changes in this file seem to alter the behavior a bit. In particular, 
previously even if SetCatalogInitialized would return an error, impala_server_ 
would still be initialized (see L107-108). With your change, if 
SetCatalogInitialized() throws an error, StartWithClientServers () will return 
with an error causing impala to be deleted. I don't recall how we use this 
in-process-server thing and your changes are most likely doing the right thing, 
but just wanted to point out the change in behavior.


http://gerrit.cloudera.org:8080/#/c/8202/9/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/8202/9/bin/start-impala-cluster.py@80
PS9, Line 80: parser.add_option("--disable_catalog", dest="disable_catalog", 
default=False,
:   action="store_true",
:   help="Starts all processes except catalogd.")
This is used only for testing and using this doesn't result in a valid cluster 
configuration. So, I think it's best if we hide/remove this option. One option 
is to hide it using something like SUPPRESS_HELP/USAGE. Other option is to 
control this behavior using an env variable and not a startup option. Maybe the 
first one is not too bad. Thoughts?


http://gerrit.cloudera.org:8080/#/c/8202/9/bin/start-impala-cluster.py@289
PS9, Line 289: impalad's catalog
"catalog cache"


http://gerrit.cloudera.org:8080/#/c/8202/9/bin/start-impala-cluster.py@304
PS9, Line 304: def wait_for_catalog(impalad, timeout_in_seconds):
 :   """Waits for the impalad catalog to become ready"""
 :   start_time = time()
 :   catalog_ready = False
 :   attempt = 0
 :   while (time() - start_time < timeout_in_seconds and not 
catalog_ready):
 : try:
 :   num_dbs = 
impalad.service.get_metric_value('catalog.num-databases')
 :   num_tbls = 
impalad.service.get_metric_value('catalog.num-tables')
 :   catalog_ready = 
impalad.service.get_metric_value('catalog.ready')
 :   if catalog_ready or attempt % 4 == 0:
 :   print 'Waiting for Catalog... Status: %s DBs / %s 
tables (ready=%s)' %\
 :   (num_dbs, num_tbls, catalog_ready)
 :   attempt += 1
 : except Exception, e:
 :   print e
 : sleep(0.5)
 :   if not catalog_ready:
 : raise RuntimeError('Catalog was not initialized in expected 
time period.')
 :
 : def wait_for_client(impalad, 
timeout_in_seconds=CLUSTER_WAIT_TIMEOUT_IN_SECONDS):
 :   """Waits for the client ports to become ready."""
 :   start_time = time()
 :   client_beeswax = None
 :   client_hs2 = None
 :   while (time() - start_time < timeout_in_seconds):
 : try:
 :   client_beeswax = impalad.service.create_beeswax_client()
 :   client_hs2 = impalad.service.create_hs2_client()
 :   break;
 : except Exception as e:
 :   print 'Client services not ready: %s. Trying again ...'
 : finally:
 :   if client_beeswax is not None: client_beeswax.close()
 : sleep(0.5)
 :
 :   if client_beeswax is None or client_hs2 is None:
 : raise RuntimeError('Client port not openned in expected time 
period.')
Does it make sense to merge these two functions into a single wait_for_catalog 
function? wait_for_client() is only used for checking if the catalog is ready 
because we can't rely on the 'catalog.ready' metric, no? So, if this thing is 
not useful why not remove it and check for the client connections instead? And 
then we can retrieve the num_dbs and num_tbls from the metrics.


http://gerrit.cloudera.org:8080/#/c/8202/9/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/8202/9/fe/src/main/java/org/apache/impala/service/Frontend.java@860
PS9, Line 860: state
update


http://gerrit.cloudera.org:8080/#/c/8202/9/fe/src/main/java/org/apache/impala/service/Frontend.java@904

[Impala-ASF-CR] IMPALA-5058: Improve concurrency of DDL/DML during catalog updates

2017-10-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has abandoned this change. ( 
http://gerrit.cloudera.org:8080/8266 )

Change subject: IMPALA-5058: Improve concurrency of DDL/DML during catalog 
updates
..


Abandoned

There are a number of serious issues in this patch. Abandoning for now.
--
To view, visit http://gerrit.cloudera.org:8080/8266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I16ae27be79eb0c5a9648a15d368ca60a7d04507b
Gerrit-Change-Number: 8266
Gerrit-PatchSet: 1
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-16 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 4: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974
PS3, Line 1974: // HMS because some of them may already exist there. In that 
case, we load in the
  :   // catalog the partitions that already exist in HMS but 
aren't in the catalog yet.
  :   if (allHmsPartitionsToAdd.size() != 
addedHmsPartitions.size()) {
  : List difference = 
computeDifference(allHmsPartitionsToAdd,
  : addedHmsPartitions);
  : addedHmsPartitions.addAll(
  : getPartitionsFromHms(msTbl, msClient, tableName, 
difference));
  :   }
  :
  :   for (Partition partition: addedHmsPartitions) {
  : // Create and add the HdfsPartition to catalog. Return 
the table object with an
  :
> Sure. I tightened the partitioned loop to only include the msClient call.
Well, you're only calling it for the diff, right? I would expect in most cases 
the diff to be small, so I am not really worried about this call. That said, we 
can reevaluate that decision if we have any reason to believe this is going to 
cause any problems.


http://gerrit.cloudera.org:8080/#/c/8238/4/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/8238/4/tests/metadata/test_ddl.py@522
PS4, Line 522: alter_stmt = "alter table t add " + " 
".join("partition(p=%d)" % (i,) for i in xrange(MAX_PARTITION_UPDATES_PER_RPC + 
2))
nit: long line



--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 16 Oct 2017 23:15:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4704: Disallow client connections to imapalad until catalog is received.

2017-10-13 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8202 )

Change subject: IMPALA-4704: Disallow client connections to imapalad until 
catalog is received.
..


Patch Set 7:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/8202/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8202/7//COMMIT_MSG@7
PS7, Line 7: imapalad
nit: typo


http://gerrit.cloudera.org:8080/#/c/8202/7//COMMIT_MSG@7
PS7, Line 7: catalog
catalog update


http://gerrit.cloudera.org:8080/#/c/8202/7/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8202/7/be/src/service/impala-server.cc@a2045
PS7, Line 2045:
:) nice


http://gerrit.cloudera.org:8080/#/c/8202/7/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

http://gerrit.cloudera.org:8080/#/c/8202/7/bin/start-impala-cluster.py@81
PS7, Line 81:   action="store_true", help="Starts all cluster 
processes except catalogd.")
nit: long line


http://gerrit.cloudera.org:8080/#/c/8202/7/bin/start-impala-cluster.py@254
PS7, Line 254: catalog_needs_wait
I think that function tests if the catalog is ready given that we haven't 
disabled it. So maybe "is_catalog_ready" is a better name. Thoughts?


http://gerrit.cloudera.org:8080/#/c/8202/7/bin/start-impala-cluster.py@323
PS7, Line 323: def wait_for_client(impalad, 
timeout_in_seconds=CLUSTER_WAIT_TIMEOUT_IN_SECONDS):
Add a function comment.


http://gerrit.cloudera.org:8080/#/c/8202/7/bin/start-impala-cluster.py@332
PS7, Line 332: ready = True
You can just break and use client_hs2 or client_beeswax in the check in L339.


http://gerrit.cloudera.org:8080/#/c/8202/7/bin/start-impala-cluster.py@336
PS7, Line 336: client_beeswax
hs2_client doesn't have a close() function?


http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/main/java/org/apache/impala/service/Frontend.java@859
PS7, Line 859: ready
Maybe also mention what "ready" means, i.e. receive the first update from the 
catalog.


http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/main/java/org/apache/impala/service/Frontend.java@867
PS7, Line 867:   if (getCatalog().isReady()) return;
It may make sense to add a log message here to indicate that the catalog is now 
ready after waiting for MAX_CATALOG_UPDATE_WAIT_TIME_MS * num_triews msec.


http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/main/java/org/apache/impala/service/Frontend.java@868
PS7, Line 868: catalog to be ready
That phrase kind of suggests that the catalog server is not ready. Maybe best 
to say that we're waiting for the initial catalog update from the statestore or 
for the local catalog cache to be initialized.


http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/main/java/org/apache/impala/service/Frontend.java@899
PS7, Line 899: support when the catalog is not ready.
Maybe "is not supported if the local catalog cache is not initialized."


http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/main/java/org/apache/impala/service/JniFrontend.java@587
PS7, Line 587: public void waitForCatalog() {
 : frontend_.waitForCatalog();
 :   }
single line


http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/test/java/org/apache/impala/service/FrontendTest.java
File fe/src/test/java/org/apache/impala/service/FrontendTest.java:

http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/test/java/org/apache/impala/service/FrontendTest.java@117
PS7, Line 117: private void testCatalogIsNotReady(String stmt, Frontend fe) {
 : TQueryCtx queryCtx = TestUtils.createQueryContext(
 : Catalog.DEFAULT_DB, AuthorizationTest.USER.getName());
 : queryCtx.client_request.setStmt(stmt);
 : try {
 :   fe.createExecRequest(queryCtx, new StringBuilder());
 :   fail("Expected failure to due uninitialized catalog.");
 : } catch (IllegalStateException e) {
 :   assertEquals("Analyzing a query is not support when the 
catalog is not ready.",
 :   e.getMessage());
 : } catch (Exception e) {
 :   fail("Failed to create exec request due to: " + 
ExceptionUtils.getStackTrace(e));
 : }
 :   }
Does this even make sense to keep given the introduced behavior in this patch?


http://gerrit.cloudera.org:8080/#/c/8202/7/tests/common/custom_cluster_test_suite.py
File tests/common/custom_cluster_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/8202/7/tests/common/custom_cluster_test_suite.py@138
PS7, Line 138:  

[Impala-ASF-CR] IMPALA-5058: Improve concurrency of DDL/DML during catalog updates

2017-10-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8266


Change subject: IMPALA-5058: Improve concurrency of DDL/DML during catalog 
updates
..

IMPALA-5058: Improve concurrency of DDL/DML during catalog updates

Problem: A long running table metadata operation (e.g. refresh) could
prevent any other metadata operation from making progress if it
coincided with the gather catalog updates operation. The problem was due
to the conservative locking scheme used when catalog updates were
collected. In particular, in order to collect a consistent snapshot of
metadata changes, the global catalog lock was held for the entire
duration of that operation.

Solution: To improve the concurrency of catalog operations the following
changes are performed:
* A range of catalog versions determines the catalog changes to be
  included in a catalog update. Any catalog changes that do not fall in
  the specified range are ignored (to be processed in subsequent catalog
  updates).
* The catalog allows metadata operations to make progress while collecting
  catalog updates.
* To prevent starvation of catalog updates (i.e. frequently updated
  catalog objects skipping catalog updates indefinitely), we keep track
  of the number of times a catalog object has skipped an update and if
  that number exceeds a threshold it is included in the next catalog
  update even if its version is not in the specified catalog update
  version range. Hence, the same catalog object may be sent in two
  consecutive catalog updates.

Change-Id: I16ae27be79eb0c5a9648a15d368ca60a7d04507b
---
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java
A fe/src/main/java/org/apache/impala/catalog/CatalogObjectImpl.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSource.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Function.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCachePool.java
M fe/src/main/java/org/apache/impala/catalog/Role.java
M fe/src/main/java/org/apache/impala/catalog/RolePrivilege.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
11 files changed, 265 insertions(+), 191 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/8266/1
--
To view, visit http://gerrit.cloudera.org:8080/8266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I16ae27be79eb0c5a9648a15d368ca60a7d04507b
Gerrit-Change-Number: 8266
Gerrit-PatchSet: 1
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 3:

(2 comments)

Re testing, a simple python test that adds at least 
MAX_PARTITION_UPDATES_PER_SEC + 1 partitions and then validates that they were 
actually added in the HMS (e.g. by doing show partitions) is sufficient. If you 
want you can convert the analysis test in AnalyzeDDLTest.java to a positive 
test but it is kind of redundant if you add the python test I just mentioned.

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@a1954
PS3, Line 1954:
Why did you remove this?


http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974
PS3, Line 1974: if (hmsSublist.size() != addedHmsPartitions.size()) {
  : List difference = 
computeDifference(hmsSublist,
  : addedHmsPartitions);
  : addedHmsPartitions.addAll(
  : getPartitionsFromHms(msTbl, msClient, tableName, 
difference));
  :   }
  :
  :   for (Partition partition: addedHmsPartitions) {
  : // Create and add the HdfsPartition to catalog. Return 
the table object with an
  : // updated catalog version.
  : addHdfsPartition(tbl, partition);
  :   }
I think you can refactor this so that you make only one call to 
getPartitionsFromHms() outside the for loop.



--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Comment-Date: Thu, 12 Oct 2017 16:59:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading

2017-10-11 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
..


Patch Set 5:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@215
PS5, Line 215:   // Block metadata loading stats for a single HDFS path.
nit: File/block (since we're also loading/refreshing files). Also, you may want 
to change the name of the private class to reflect that, e.g. 
PathMetadataLoadingStats or something like that.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@217
PS5, Line 217: loadedFiles_
You may want to add a comment here. What is loaded vs refreshed? Is the one 
superset of the other. Good to clarify to avoid confusion.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@218
PS5, Line 218: _
I believe the convention is that we don't use '_' for public members.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@231
PS5, Line 231: Runnable
I don't know how this is used later on, but alternatively you can make 
PathBlockMetadataLoadRequest implement Callable, hence 
returning the stats when calling call(). Now you seem to access the stats only 
through the debugString() function.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@247
PS5, Line 247: Blocks on the loadBlockMetadata() call.
Not following this comment. run() either calls refreshBlockMetadata() or 
loadBlockMetadata(), so I can't really interpret what this comment is saying.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@333
PS5, Line 333: loadBlockMetadata
I know I am guilty for some of these names but maybe rename this to 
"resetAndLoadBlockMetadata"? Then it is more clear what the differences are 
between this function and rerfreshBlockMetadata().


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@363
PS5, Line 363: numUnknownDiskIds
Are you overriding or incrementing the value of numUnknownDiskIds in the 
create() function?


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@368
PS5, Line 368: newFileDescs
Hm, that doesn't seem particularly safe (i.e. using the same list for every 
partition). Are we certain that any other partition modification operation 
(e.g. alter partition set location) will not try to override the file 
descriptors, thereby affecting all the other partitions?


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@380
PS5, Line 380: changed
 :* mtime
It's not just the changed mtime that we're looking for in order to determiner 
modification, so you may want to either remove this or mention all the criteria 
we use. I prefer the former.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@383
PS5, Line 383: The initial table load still uses the listFiles()
 :* on the data directory that fetches both the FileStatus as 
well as BlockLocations in
 :* a single call.
remove


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@398
PS5, Line 398: get(0)
Comment why we take the file descriptors of one partition and that it doesn't 
really matter which one we choose.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@399
PS5, Line 399: public String apply(FileDescriptor desc) {
 :   return desc.getFileName();
 : }
Add @Override and make it a single line (if it fits)


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@448
PS5, Line 448: (fd == null) || (fd.getFileLength() != status.getLen()) ||
 :   (fd.getModificationTime() != status.getModificationTime());
I think we were also checking if the partition was cached. Isn't this check 
needed anymore?


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@455
PS5, Line 455: refreshFileMetadata
Maybe call it refreshPartitionStorageMetadata? Overall, it may make sense to 
replace "File/Block" with "Storage" whenever it makes sense. Thoughts?


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@694
PS5, Line 694: Exception
Why this change?



[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 9: Code-Review+1

(5 comments)

Nice. So much better. I'll let Alex take a look as well and do the final +2.

http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/main/java/org/apache/impala/service/FeSupport.java
File fe/src/main/java/org/apache/impala/service/FeSupport.java:

http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/main/java/org/apache/impala/service/FeSupport.java@292
PS9, Line 292: paresResult
I believe you meant parseResult?


http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java:

http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@125
PS9, Line 125: public int getStartingLineNum() {
 :   return startLineNum;
 : }
single line


http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@129
PS9, Line 129: public TQueryOptions getOptions() {
 :   return this.options;
 : }
nit: single line


http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@133
PS9, Line 133: public void setOptions(TQueryOptions options) {
 :   this.options = options;
 : }
single line


http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@367
PS9, Line 367:
Add a Preconditions.checkNotNull(result) here.



--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 9
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 05 Oct 2017 18:55:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4682 Fix IllegalStateException issue

2017-10-04 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8143 )

Change subject: IMPALA-4682 Fix IllegalStateException issue
..


Patch Set 1:

Zoram, let's try to move this forward. We don't want too many open CRs.


--
To view, visit http://gerrit.cloudera.org:8080/8143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I57c20aeed401275d45913fedfd61c206c38641b7
Gerrit-Change-Number: 8143
Gerrit-PatchSet: 1
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 04 Oct 2017 18:12:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-10-04 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 8:

(3 comments)

Thanks! That looks much better.

http://gerrit.cloudera.org:8080/#/c/7564/8/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

http://gerrit.cloudera.org:8080/#/c/7564/8/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java@734
PS8, Line 734: if (options == null) {
 :   options = defaultQueryOptions();
 : } else {
 :   options = mergeQueryOptions(defaultQueryOptions(), 
options);
 : }
Move above L732 and make options an input parameter to TestFileParser 
constructor.


http://gerrit.cloudera.org:8080/#/c/7564/8/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java
File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java:

http://gerrit.cloudera.org:8080/#/c/7564/8/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@26
PS8, Line 26: import java.lang.reflect.Method;
Is this needed?


http://gerrit.cloudera.org:8080/#/c/7564/8/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@355
PS8, Line 355: private void parseOneQueryOption(String line, TestCase testCase) 
{
Why parsing one query option at a time? ParseQueryOptions() simply expects a 
comma-separated list of options and parses all of them. Wouldn't be simpler to 
make a JNI call to ParseQueryOptions() instead?



-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 04 Oct 2017 18:09:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6009: Upgrade Guava to 14.0.1

2017-10-03 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8198 )

Change subject: IMPALA-6009: Upgrade Guava to 14.0.1
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8198
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddc5da8849d5aa7317d3dc572884d05dee859bdd
Gerrit-Change-Number: 8198
Gerrit-PatchSet: 4
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-Comment-Date: Tue, 03 Oct 2017 21:01:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege

2017-09-29 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8168 )

Change subject: IMPALA-4951: Fix database visibility for user with only column 
privilege
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a
Gerrit-Change-Number: 8168
Gerrit-PatchSet: 3
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Comment-Date: Fri, 29 Sep 2017 20:40:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5951: Remove flaky test catalogd timeout

2017-09-29 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8154 )

Change subject: IMPALA-5951: Remove flaky test_catalogd_timeout
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05
Gerrit-Change-Number: 8154
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Fri, 29 Sep 2017 20:40:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege

2017-09-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8168 )

Change subject: IMPALA-4951: Fix database visibility for user with only column 
privilege
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8168/2/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test:

http://gerrit.cloudera.org:8080/#/c/8168/2/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@744
PS2, Line 744: revoke role grant_revoke_test_ROOT from group root
Is this needed for your test?


http://gerrit.cloudera.org:8080/#/c/8168/2/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@750
PS2, Line 750:  QUERY
 : show current roles
 :  RESULTS: VERIFY_IS_SUBSET
 : 'grant_revoke_test_ALL_SERVER'
 :  TYPES
 : STRING
 : 
No need for that. Let's remove to make the test a bit smaller.


http://gerrit.cloudera.org:8080/#/c/8168/2/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@808
PS2, Line 808: # IMPALA-4951: make sure database is visible to a user having 
only column level access
 : # to a table in the database
I think it may be best to move this comment at the beginning of your changes so 
that it's more obvious what these added statements test.



--
To view, visit http://gerrit.cloudera.org:8080/8168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a
Gerrit-Change-Number: 8168
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Comment-Date: Fri, 29 Sep 2017 04:56:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege

2017-09-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8168 )

Change subject: IMPALA-4951: Fix database visibility for user with only column 
privilege
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8168/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test:

http://gerrit.cloudera.org:8080/#/c/8168/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@761
PS1, Line 761: root
> I wanted to use a different user for this test, otherwise I would have to g
I think it's fine to use $GROUP_NAME throughout.



--
To view, visit http://gerrit.cloudera.org:8080/8168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a
Gerrit-Change-Number: 8168
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Comment-Date: Thu, 28 Sep 2017 21:00:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.

2017-09-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8136 )

Change subject: IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
File fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java:

http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java@74
PS1, Line 74: import com.google.common.base.Joiner;
Is it needed?


http://gerrit.cloudera.org:8080/#/c/8136/1/testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test:

http://gerrit.cloudera.org:8080/#/c/8136/1/testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test@37
PS1, Line 37: 0
Is this correct?


http://gerrit.cloudera.org:8080/#/c/8136/1/testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test:

http://gerrit.cloudera.org:8080/#/c/8136/1/testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test@38
PS1, Line 38: 71
There are a bunch of second digit changes in most of the tests. Are these ok?


http://gerrit.cloudera.org:8080/#/c/8136/1/tests/metadata/test_explain.py
File tests/metadata/test_explain.py:

http://gerrit.cloudera.org:8080/#/c/8136/1/tests/metadata/test_explain.py@131
PS1, Line 131:
nit: extra space



--
To view, visit http://gerrit.cloudera.org:8080/8136
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f3e72471ac563adada4a4156033a85852b7c8b7
Gerrit-Change-Number: 8136
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Balazs Jeszenszky 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 28 Sep 2017 19:48:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5951: test catalogd timeout fails to cause expected exception

2017-09-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8154 )

Change subject: IMPALA-5951: test_catalogd_timeout fails to cause expected 
exception
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8154/1/testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test
File 
testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test:

http://gerrit.cloudera.org:8080/#/c/8154/1/testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test@24
PS1, Line 24: describe functional_kudu.alltypes
Isn't this test also susceptible to the same timing issue? I don't see much 
value in these tests and it seems to me they will always be kind of flaky. I 
suggest we remove kudu-timeouts-catalog.test altogether.



--
To view, visit http://gerrit.cloudera.org:8080/8154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05
Gerrit-Change-Number: 8154
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Comment-Date: Thu, 28 Sep 2017 18:52:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege

2017-09-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8168 )

Change subject: IMPALA-4951: Fix database visibility for user with only column 
privilege
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8168/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test:

http://gerrit.cloudera.org:8080/#/c/8168/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@761
PS1, Line 761: root
Why are you using root and not $GROUP_NAME for this test?


http://gerrit.cloudera.org:8080/#/c/8168/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@786
PS1, Line 786: # IMPALA-4951: make sure database is visible to a user having 
only column level access
 : # to a table in the database
 : show databases
 :  RESULTS
 : 'default','Default Hive database'
 : 'grant_rev_db',''
 :  TYPES
 : STRING,STRING
 : 
I think the test for this jira should work as follows:
1. You have/create a db/table for which user has no privileges.
2. You do show databases and verify the db is not listed
3. You add a column privilege on the table.
4. You do show databases and see that db is now listed.



--
To view, visit http://gerrit.cloudera.org:8080/8168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a
Gerrit-Change-Number: 8168
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Comment-Date: Thu, 28 Sep 2017 18:44:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-09-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7564 )

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 6:

Thanks for adding the support for parsing the query options. Did you try to 
reuse the c++ implementation through a JNI call? It would be nice if we could 
avoid implementing the same functionality in both c++ and java. See 
FeSupport.java class for other cases where we call into the backend from the 
java side. Thanks.


--
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-Change-Number: 7564
Gerrit-PatchSet: 6
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 28 Sep 2017 17:51:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.

2017-09-26 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8136 )

Change subject: IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.
..


Patch Set 1:

(26 comments)

First pass. High level approach seems reasonable to me. Will take a look at the 
tests next.

http://gerrit.cloudera.org:8080/#/c/8136/1/be/src/exec/catalog-op-executor.cc
File be/src/exec/catalog-op-executor.cc:

http://gerrit.cloudera.org:8080/#/c/8136/1/be/src/exec/catalog-op-executor.cc@218
PS1, Line 218: // The first column is the COUNT(*) expr of the original query.
Can you put a similar comment above L206. I had a hard time figuring out what 
rows[0].colVals[0] represented in L208.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@58
PS1, Line 58: Existing partition-level row counts are not modified.
Mention what happens to the partitions that don't have the row count set.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@60
PS1, Line 60: unmodified
remove


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@61
PS1, Line 61: extrapolated
Maybe add some details about how extrapolation is computed. I haven't looked at 
the entire patch so you may mention it elsewhere. If so, you can just point to 
the file that provides that description.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@62
PS1, Line 62:  so as not to confuse other engines like Hive/SparkSQL which may 
rely on
:  *   the shared HMS fields being accurate.
That part may be confusing to someone with no background knowledge. For 
instance, the fact that other engines require "accurate" statistics which we 
update using sampling and extrapolation :), kind of oxymoron. I think it is ok 
to just say that we store the extrapolated stats and remove that part. Thoughts?


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@64
PS1, Line 64: Independently, row-count extrapolation is also done during 
planning based on the
:  *   numRows / totalSize ratio because the table contents may 
have changed since the
:  *   last time COMPUTE STATS was run.
Remove and put it near the relevant code (planning).


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@77
PS1, Line 77: Not supported for now to keep the logic/code simple.
Ha, this is the opposite in the COMPUTE STATS case.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@106
PS1, Line 106: totalFileBytes_
Can't you use the HdfsTable.totalFileBytes_ instead?


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@108
PS1, Line 108: Only set when
 :   // TABLESAMPLE was specified. Set to -1 for non-HDFS tables
Maybe "Set to -1 for non-HDFS tables or when TABLESAMPLE is not specified"?


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@305
PS1, Line 305: expectAllPartitions_ = !updateTableStatsOnly()
So if I do not update table stats only I should expect to compute stats on all 
partitions? Essentially, is isIncremental check blended in this function? I 
think it may make sense to extract that check.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@344
PS1, Line 344: expectAllPartitions_ = false;
Why is this needed? Isn't L305 sufficient?


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@440
PS1, Line 440: // Only add group by clause for HdfsTables.
Comment doesn't seem relevant to the code that follows.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@442
PS1, Line 442: !updateTableStatsOnly()
expectAllPartitions_?


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@452
PS1, Line 452: !updateTableStatsOnly()
expectAllPartitions_?


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@506
PS1, Line 506: sets 'expectAllPartitions_' to false
Hm, I don't see that in the code.


http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@511
PS1, Line 511: base
remove?



[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-26 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 8: Code-Review+2

Rebase and keep Dan's +2


--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 8
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 26 Sep 2017 16:20:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-25 Thread Dimitris Tsirogiannis (Code Review)
Hello Bharath Vissapragada, Alex Behm, Vuk Ercegovac, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7731

to look at the new patch set (#7).

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M tests/statestore/test_statestore.py
21 files changed, 480 insertions(+), 441 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/7
--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 7
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-25 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h@131
PS6, Line 131: Also determines any
 :   /// deletions of catalog objects by looking at the
 :   /// difference of the last set of topic entry keys that were 
sent and the current set
 :   /// of topic entry keys
> Remove?
Done


http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.cc@308
PS6, Line 308: bool topi
> should that be DCHECK_GT?  The old conditional to continue on was <= ...
Correct. Done


http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog.h
File be/src/catalog/catalog.h:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog.h@59
PS6, Line 59: Gets all Catalog objects and the metadata that is applicable for 
the given request.
:   /// Always returns all object names that exist in the Catalog, 
but allows for extended
:   /// metadata for objects that were modified after the specified 
version.
> is that accurate?  from our discussion, my understanding was that only obje
Yeah, wording is not clear. Changed the comment.


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc@177
PS5, Line 177: entry_it->second.SetDeleted(true);
 : entry_it->second.SetVersion(last_version_);
> What I mean is that topics that are transient do not get values in deleted
Hm, it's still not clear to me why a transient topic cannot get values in 
deleted topics, in the sense that I see the topic being transient and how to 
process deletions two orthogonal issues and I don't understand why one should 
influence the other.


http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/statestore/statestore.cc@170
PS6, Line 170: topic_update_log_.erase(version);
> also, does that still make sense (now that we no longer do the SetValue(NUL
Done


http://gerrit.cloudera.org:8080/#/c/7731/6/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/7731/6/common/thrift/CatalogInternalService.thrift@32
PS6, Line 32:
> does it include objects with changes in version > or >= the from_version?
Updated the comment (>). Done


http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@310
PS6, Line 310:  Views, Databases, and
 :* Functions)
> datasources, cachepools, sentry stuff ... I think the list is too big, may
Done



--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 5
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 25 Sep 2017 17:55:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5955: Use totalSize tblproperty instead of rawDataSize.

2017-09-21 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8110 )

Change subject: IMPALA-5955: Use totalSize tblproperty instead of rawDataSize.
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8110/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8110/1//COMMIT_MSG@9
PS1, Line 9: Today, Impala populates the 'rawDataSize' property
   : during COMPUTE STATS for the purpose of extrapolating
   : row counts based on file sizes.
   :
   : Intended meaning/use of tblproperties:
   : - rawDataSize' is the estimated in-memory size of a table
   :   (without encoding and compression)
   : - 'totalSize' represents the on-disk size
   :
   : Using the fields correctly is important for compatibility
   : with other users of the HMS such as Hive and SparkSQL.
   : For example, SparkSQL relies on the 'totalSize' for
   : join ordering.
Although this is very informative, I don't think I understand what this commit 
changes. Will we be populating both rawDataSize and totalSize, replace one with 
the other, or something else?


http://gerrit.cloudera.org:8080/#/c/8110/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8110/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1178
PS1, Line 1178: droppedRawDataSize
rename to droppedTotalSize to be consistent with the tblproperty being updated?



--
To view, visit http://gerrit.cloudera.org:8080/8110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7c2c4e1e99b297c849f9f0d18b2bef34ad811c6
Gerrit-Change-Number: 8110
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Comment-Date: Thu, 21 Sep 2017 18:21:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5954: Set DO NOT UPDATE STATS in alterTable() RPCs to HMS.

2017-09-21 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8112 )

Change subject: IMPALA-5954: Set DO_NOT_UPDATE_STATS in alterTable() RPCs to 
HMS.
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8112
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05c63315579d66efeaa4c65105372ae68820ab2f
Gerrit-Change-Number: 8112
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Comment-Date: Thu, 21 Sep 2017 18:00:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-19 Thread Dimitris Tsirogiannis (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7731

to look at the new patch set (#6).

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M tests/statestore/test_statestore.py
21 files changed, 473 insertions(+), 425 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-19 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 5:

(27 comments)

http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

PS5, Line 231:   // If this is not a delta update, request an update from 
version 0 from the local
 :   // catalog. There is an optimization that checks if 
pending_topic_updates_ was just
 :   // reloaded from version 0. If so, then skip this step and use 
that data.
> I found it impossible to understand this comment without reading through mo
Done


PS5, Line 234: delta.from_version == 0 && delta.to_version == 0
> from the comment above, presumably this condition means "this is not a delt
Yeah, you're right. This condition is really confusing. from_version = 0 
implies delta = false and can be used here instead. The part that is not clear 
to me is the to_version check but I couldn't find any case where it would make 
sense to check the to_version in order to decide whether to send the full 
catalog update or not. I will remove it and see if it breaks something.


PS5, Line 310: if (catalog_object.catalog_version <= 
last_sent_catalog_version_) continue;
> given that last_sent_catalog_version_ was passed to GetAllCatalogObjects(),
Right, it's not needed. Replaced it with a DCHECK.


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

PS5, Line 150: added
> nit: May be 'added/updated/removed' to be consistent with other places?
Done


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS5, Line 1392: 
> can we delete this function now?
Yes. Done


PS5, Line 1366: ;
> Could you add 'len' too, given this is trace logging anyway. (might be help
Done


PS5, Line 1399: // Nothing to do here.
> Remove? Kinda seems obvious
Done


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

PS5, Line 177: entry_it->second.SetDeleted(true);
 : entry_it->second.SetVersion(last_version_);
> now that some subscribes require deleted entries to have a value, how does 
The behavior is topic specific and currently only subscribers of catalog-update 
topic rely on the value for deleted entries. Not sure how to make it less 
fragile in the sense that "value_" is opaque to the statestore and is 
interpreted only by subscribers.


PS5, Line 490:  
> would be easier to read if you line break it there to keep the last arg all
Done


PS5, Line 538: if (topic_entry.value() != Statestore::TopicEntry::NULL_VALUE)
> why do we need that guard? why can't we set topic_item.value even in that c
Yeap, removed it.


PS5, Line 546: int64_t topic_size = topic.total_key_size_bytes() - 
deleted_key_size_bytes
 : + topic.total_value_size_bytes();
> that math doesn't look correct anymore (deleted entries can have non-zero v
Done


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

PS5, Line 180: may
> may or will?
"will". Changed it.


http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

PS5, Line 25: GetAllCatalogObjects
> see below. I think we should rename this.
Done


PS5, Line 25: Contains all known Catalog objects
: // (databases, tables/views, and functions) from the 
CatalogService's cache.
> I think that should be updated to explain that this is relative to a partic
Done


PS5, Line 38: empty list if no objects detected in the Catalog
> Thanks. As we talked about in person, I think we should rename GetAllCatalo
Done


http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

PS5, Line 87: If true, this item was deleted
> How about:
Done


PS5, Line 87: If true, this item was deleted
> Actually, the parenthesis I suggested doesn't clarify.  Maybe that part sho
Done


PS5, Line 97: List of changes to topic entries, including deletions.
> That's only true when is_delta==true.  So, how about we say:
Done


PS5, Line 100: applied to in-memory state,
> that's kinda dependent on the subscriber's implementation, right? would it 
Done


http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java:

PS5, Line 53: identiy
> nit: typo
Done


PS5, Line 123: if (first.getType() != second.getType()) return false;
> This seems to already be handled by the next line (by generating a differen
Good point. Removed



[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 5:

(2 comments)

Addressing some questions by Dan.

http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

PS5, Line 38: empty list if no objects detected in the Catalog
> s/through/trouble/g
Yes, there is some context that is missing here, hence it's not easy to 
understand what these comments mean. Let me try to explain first here and I 
will expand the comments to clarify some things. 

1. What I think that comment should say is "no deleted objects were detected 
since the last GetAllCatalogObjects request". 

2. Similarly to the previous comment. Deleted and updated/new objects are with 
respect to the previous GetAllCatalogObjects call. The base from which these 
deltas are computed is the "fromVersion" param of the TGetAllCatalogObjects 
struct. 

I'll update the comments.


http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 84:   // is not included in the topic key (e.g. catalog version of deleted 
catalog objects).
> Sorry, opaque was the wrong word. I meant to say agnostic.
Let me try to explain. Statestore only uses that flag to avoid sending deleted 
topic entries when the topic update is not a delta one (this is an 
optimization). The statestore subscribers (e.g. impalad) are the ones that 
really need what's in the 'value' field in order to determine how to process a 
deleted topic entry. For the case of catalog update, the value contains among 
other things the catalog version when the deletion occurred and the impalad 
uses that information to determine if the deletion should be applied at the 
local catalog or not. 

We can definitely add the deleted flag inside whatever object we're putting in 
'value' (e.g. TCatalogObject) and I am happy to try this out if you think it's 
better.


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

PS5, Line 38: empty list if no objects detected in the Catalog
> that seems kinda misleading. won't the list also be empty if nothing was de
Sorry not following. This says detected not deleted. Maybe I misunderstood you 
comment.


http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 84:   // is not included in the topic key (e.g. catalog version of deleted 
catalog objects).
> hmm, without reading through how the catalog topic works, this is still pre
Hm, I don't see why the value should be opaque to the subscriber. On the 
contrary, it should be opaque to the statestore but each topic subscriber could 
have different logic implemented around deletions. If we want to be able to 
handle deletions with only the information from the key then we need to expand 
the key and for the case of catalog updates, it should also include the catalog 
version. Unfortunately, that won't work well with the current implementation of 
the statestore.


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

PS3, Line 86: deleted
> What I meant is that we should explain in the comment here what 'value' mea
Fair enough. Done


PS3, Line 105: The from_version will always be 0 for non-delta updates
> Yeah. what I meant was we should document it in this comment, right?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dimitris Tsirogiannis (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7731

to look at the new patch set (#5).

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/statestore/test_statestore.py
16 files changed, 427 insertions(+), 338 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dimitris Tsirogiannis (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7731

to look at the new patch set (#4).

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/statestore/test_statestore.py
16 files changed, 422 insertions(+), 336 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

PS3, Line 34: updated
> does "updated" also include new objects? or is it really just objects that 
It includes both added and changed objects. I updated the comment but let me 
know if you have a better name option.


http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

PS3, Line 86: deleted
> in this case, what is in 'value'? Is it defined to be set or is only the ke
What I tried to explain in the commit msg (let me know if it still unclear) was 
that we should not use the contents of "value" to indicate if a topic item is 
deleted or not. It doesn't mean that a topic item with deleted set to true 
should have a specific value, i.e. each topic can have a separate behavior.


PS3, Line 103: all changes
> your commit message seems to contradict that.  Oh, maybe i'm confusing cata
Hm, sorry for the confusion. What the commit is describing is related to 
catalog versions and the follow up changes for fixing IMPALA-5058.


PS3, Line 105: The from_version will always be 0 for non-delta updates
> what about to_version in this case?
The to_version is 0 only when there are no changes in the topic. It's value 
doesn't depend on whether the update is delta or not.


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5856: Fix outer join predicate assignment.

2017-09-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5856: Fix outer join predicate assignment.
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/8039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93db34d988cb66e00aa05d7dc161e0ca47042acb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.

2017-09-13 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, 
unknown.
..


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8014/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS7, Line 333: Bool tests
BoolTestPredicates?


http://gerrit.cloudera.org:8080/#/c/8014/7/fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java:

PS7, Line 28:  IS [NOT] 
Replace with the one in L31.


PS7, Line 30:  is one of (TRUE | FALSE | UNKNOWN).
Remove, it's kind of redundant.


http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

PS6, Line 591: 
> added negative tests and coverage to rewrites and end-to-end. the analysis 
Sorry, I don't think I understand this comment. What do you mean by analysis 
for the expr does not do much? Aren't the negative test cases captured by 
AnalysisError()? I think rewrite tests should handle the rewrite logic but 
everything else (analysis related) should be tested here. Maybe I am missing 
something.


http://gerrit.cloudera.org:8080/#/c/8014/7/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

PS7, Line 217: // Incorrect type for LHS.
 : AnalyzedRewritesError("'foo' is true", rule, "'foo' = TRUE 
AND 'foo' IS NOT NULL",
 : "operands of type STRING and BOOLEAN are not comparable: 
'foo' = TRUE");
 : // No subqueries allowed.
 : RewritesError(
 : "(select max(tinyint_col) from functional.alltypessmall) 
is true", rule,
 : "Subqueries are not supported in the select list.");
Hm, it's not clear to me why this should be here and not as an Analysis test.


-- 
To view, visit http://gerrit.cloudera.org:8080/8014
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5856: Fix outer join predicate assignment.

2017-09-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5856: Fix outer join predicate assignment.
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8039/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS1, Line 1381: // Right-most full-outer join table ref that is equal to or to 
the left of the
  : // table ref materializing 'tids'.
Trying to visualize this hurts my brain :) Not sure if it helps to understand 
the logic in this function. The comment in L1385 should suffice.


PS1, Line 1385: this
What is "this" refer to? After reading this more carefully, I think I 
understand what it means. I think it is better to move the comment below L1388.


PS1, Line 1398: globalState_.ojClauseByConjunct.get(e.getId());
getOjRef(e)?


http://gerrit.cloudera.org:8080/#/c/8039/1/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
File testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test:

PS1, Line 1020: from the On-clause of a left outer join
Out of curiosity, did you check if we have the same issue if there is a left 
outer join followed by a full outer join? If we don't have a test like that, 
maybe add it for completion.


Line 1028:  and t1.bigint_col > 10 and t2.bigint_col > 30
To make it more interesting, add a where clause with one predicate that 
references t1 and t2 and one that references t3.


-- 
To view, visit http://gerrit.cloudera.org:8080/8039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93db34d988cb66e00aa05d7dc161e0ca47042acb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#3).

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/statestore/test_statestore.py
16 files changed, 421 insertions(+), 336 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 2:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

PS2, Line 233:  0, if they have then skip this step and use that data
> nit: perhaps easier to read? "...version 0. If so, then skip ..."
Done


Line 305:   BuildTopicUpdatesHelper(catalog_objects.updated_objects, false);
> I like the previous version more where this is inlined into L292. A quick c
Done


Line 317: if (catalog_object.catalog_version <= last_sent_catalog_version_) 
continue;
> Move this above L312?
Done


PS2, Line 336: if (topic_deletions) {
 :   VLOG(1) << "Publishing deletion: " << entry_key << "@"
 :   << catalog_object.catalog_version;
 : } else {
 :   VLOG(1) << "Publishing update: " << entry_key << "@"
 :   << catalog_object.catalog_version;
 : }
> perhaps remove some redundancy this way:
Done


http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1340: LOG(INFO) << "Received large catalog update(>100mb): "
> Received a large catalog item?
Done


PS2, Line 1366: if (item.deleted) {
  : VLOG(3) << "Deleted item: " << item.key << " version: "
  : << catalog_object.catalog_version;
  :   } else {
  : VLOG(3) << "Added item: " << item.key << " version: "
  : << catalog_object.catalog_version;
  :   }
> same suggestion as in catalog_server to reduce redundancy.
Done


http://gerrit.cloudera.org:8080/#/c/7731/2/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 86:   3: required bool deleted = false;
> Isn't it better to not have a default value to make sure this is always set
Done


http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java:

Line 56:  *   determine which catalog objects were deleted since the last 
catalog update topic.
> catalog topic update
Done


Line 57:  *   Once the catalog update topic is constructed, the old deleted 
catalog objects are
> catalog topic update
Done


Line 74:   // Retrieve all the removed catalog objects with version > 
'fromVersion'.
> /** style comment
Done


http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

PS2, Line 152: .
> the word "delta" implies that we're dealing with inserts, deletes, and poss
Done


Line 347:   // Identify the catalog objects that were updated (added/dropped) 
in the catalog with
> /** style comment
Done


Line 351: Set addedCatalogObjectKeys = Sets.newHashSet();
> updatedCatalogObjects?
Done


Line 363: TCatalogObject catalogTbl = new 
TCatalogObject(TCatalogObjectType.TABLE,
> Why not just iterate over db.getTables()?
Done


Line 441: 
addedCatalogObjectKeys.add(CatalogDeltaLog.toCatalogObjectKey(privilege));
> Instead of having a duplicate add() in all places, how about creating the a
Done


Line 448: for (TCatalogObject removedObject: 
deltaLog_.retrieveObjects(fromVersion)) {
> Nice! Much clearer.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 18: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7955/18/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

PS18, Line 117: DCHECK_LE(len, numeric_limits::max());
Is this actually needed? Isn't the cast in the line above going to fail if the 
condition in the DCHECK is false?


http://gerrit.cloudera.org:8080/#/c/7955/18/fe/src/main/java/org/apache/impala/service/JniCatalog.java
File fe/src/main/java/org/apache/impala/service/JniCatalog.java:

PS18, Line 155: Catalog
nit: catalog


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 18:

Taking a look now.

-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.

2017-09-11 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, 
unknown.
..


Patch Set 6:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/8014/1/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS1, Line 2876: 
don't think you need that


PS1, Line 2878: 
same here


Line 2891:   | UNMATCHED_STRING_LITERAL:l expr:e
nit: extra spaces (see marked as red).


http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS2, Line 273:   KW_UNBOUNDED, KW_UNCACHED, KW_UNION, KW_UPDATE, KW_UPDATE_FN, 
KW_UPSERT, KW_USE,
nit: long line (see vertical separator, that's ok for .test files but for 
everything else, we try to stay < 90).


http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS6, Line 337: rules.add(BoolTestToCompoundRule.INSTANCE);
Maybe add a brief comment about this rule as well?


PS6, Line 1093: it
nit: them


PS6, Line 1098: For all conjuncts other than
  : // the ones listed below, the method should be a no-op.
remove. Similar comment above (L1082).


http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java:

PS2, Line 26: predicate that tests a boolean value using IS.
"a boolean test predicate."? I think you can also add the grammar spec here 
besides the example.


PS2, Line 33:  it to be executable, i.e., it is illegal to call 
For literals we usually make them static as well; also, remove "_" from the 
name.


PS2, Line 38: ivate
No need for "this". We use "_" in the name to indicate private fields.


PS2, Line 41: super();
: this.isNegated_ =
You can use the addChild() function here, same thing.


PS2, Line 88: 
Will that work with something like "select 1 is true"? For instance, "select 1 
= true" returns true. Type is not boolean but it can be casted into one.


PS2, Line 94:  new AnalysisException("Ex
Similar comment as above. e.g. select 1 is 1;


http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java
File fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java:

PS2, Line 29: expressions
nit re terminology: "compound predicate". Technically, it is an expression 
(subclass of Expr) but try to use the more specific term.


http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java
File fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java:

PS6, Line 29: to compound expressions
"a compound predicate."


Line 64: }
Preconditions.checkNotNull(result);


http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

PS6, Line 578:   
nit: extra space


PS6, Line 580: TestBoolTestPredicate
A few more test cases to consider:
1. bool test predicate in the select list
2. nested bool test predicates, e.g. ((bool_col is true) is true)
3. other bool exprs than bool_col, e.g. function returning bool, case expr 
4. wrap the bool test in an istrue/isfalse function


PS6, Line 586: null
Also use "unknown"


PS6, Line 591: where 1 is true"
This doesn't seem to be consistent with how we handle for instance expr like 1 
= true (we allow that).


PS6, Line 591: AnalysisError
Add a few more negative cases. Example:
1. subquery
2. literals that can't be cast to true/false, e.g. 10 is true


http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

PS6, Line 168: // TODO: figure out how/if parens are printed to reflect 
expression tree.
Did you figure that out? If so, remove the TODO.


-- 
To view, visit http://gerrit.cloudera.org:8080/8014
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-08 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7955/13/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java
File 
fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java:

PS13, Line 43: return 0;
> It can throw an OutOfMemory*Error* which is fatal. Are you saying we should
It is but we're catching it and trying to release resources. We need to make 
sure that this path works as expected. Right?


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-08 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7955/13/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java
File 
fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java:

PS13, Line 43: return 0;
> Not sure I understand, NBAOS.tryAllocate() reads this 0 and throws IllegalS
My point is that the allocator doesn't only return 0, is it? It could always 
throw an OutOfMemoryException which the tryAllocate will catch, try to free any 
allocated memory and then re-throw. Shouldn't we be testing that path as well?


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-08 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7955/13/fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java
File fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java:

PS13, Line 100: private void tryAllocate(long len) {
  : Preconditions.checkState(bufferPtr_ >= 0);
  : try {
  :   if (len <= 0) {
  : throw new 
IllegalArgumentException(INVALID_BUFFER_LEN_MSG + ": " + len);
  :   }
  :   if (len >= BUFFER_MAX_SIZE_LIMIT) {
  : throw new 
IllegalArgumentException(BUFFER_LIMIT_EXCEEDED_MSG + ": " + len);
  :   }
  :   long newBufferPtr = nativeAllocator_.allocate(bufferPtr_, 
len);
  :   if (newBufferPtr == 0) {
  : throw new IllegalStateException(ALLOCATION_FAILED_MSG + 
": " + len
  : + " Is realloc: " + (bufferPtr_ > 0));
  :   }
  :   bufferPtr_ = newBufferPtr;
  :   bufferLen_ = len;
  : } catch (Throwable e) {
  :   nativeAllocator_.free(bufferPtr_);
  :   throw e;
  : }
  :   }
I think that belongs to the Allocator class.


http://gerrit.cloudera.org:8080/#/c/7955/13/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java
File 
fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java:

PS13, Line 43: return 0;
Shouldn't the allocate throw an OutOfMemoryException as well?


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/statestore/test_statestore.py
16 files changed, 438 insertions(+), 331 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [PREVIEW] IMPALA-5538: Use explicit catalog versions for 
deleted objects
..


Patch Set 1:

(28 comments)

http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

Line 288: BuildTopicUpdates(catalog_objects.updated_objects, false);
> Is there a requirement/assumption that the deletions must come after the up
No, it doesn't matter.


Line 300: void CatalogServer::BuildTopicUpdates(const vector& 
catalog_objects,
> IMO, it is more readable if the signature of BuildTopicUpdates takes TGetAl
These two "do something" blocks are almost identical. I refactored the code a 
bit to make it a bit less flaky.


Line 333: << catalog_object.catalog_version;
> indent
Done


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

Line 154:   /// determine items that have been deleted, it saves the set of 
topic entry keys sent
> Update comment
Done


Line 164:   bool topic_deletions);
> What does this param do?
Updated the comment.


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/scheduling/scheduler-test-util.cc
File be/src/scheduling/scheduler-test-util.cc:

Line 481:   item.key = host.ip;
> Use __set fn like above to be consistent
Done


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 141:   if (delta.is_delta && delta.topic_entries.empty()) {
> single line
Done


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1323:   TCatalogObject catalog_object;
> Please move this right before DeserializeThriftMsg() so we can see where it
Done


Line 1343:  << item.key << " is "
> indentation
Done


Line 1367: LOG(INFO) << "Deleted item: " << item.key << " version: " << 
catalog_object.catalog_version;
> That's a lot of logging, sure we need this at INFO level?
Sorry, that was for debugging.


Line 1398: if (existing_object.catalog_version <= 
catalog_object.catalog_version) {
> Shouldn't this be < and not <=?
Done


PS1, Line 1406: batch_size_bytes
> Looks like we didn't account this earlier for deletions. This logic makes m
Done


Line 1527:   if (item.deleted) {
> Is it possible that within a single list of topic_entries there is both an 
No, it's not possible to have both an addition and a deletion in the same list 
of topic_entries.


http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

Line 485: item.value.empty() ? Statestore::TopicEntry::NULL_VALUE : 
item.value,
> Why not just 'item.value'? The value gets copied.
Good point. Done


Line 527:   deleted_key_size_bytes -= itr->first.size();
> += ?
Done


http://gerrit.cloudera.org:8080/#/c/7731/1/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 92
> - I kind of like this separation of updates and deletions in different list
By switching to TTopicItem for topic deletions there is now significant overlap 
between operations performed on both added and deleted items. We could try to 
extract most of the common logic in separate functions and keep the existing 
logic in place, but for now it seems simple enough to have one loop over one 
list of items and separate the logic with an if statement when needed. If we 
had to add many of these if statements in multiple places, I'd agree that 
splitting the logic would have been better.


Line 97:   5: required bool is_delta;
> fix member numbering
Done


http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java:

Line 56:   private Map removedCatalogObjectVersionsByKey_ = 
Maps.newHashMap();
> I might be wrong, but I think we can end up having the same object deleted 
Good catch, that was a bug indeed. Removed the second map and simplified the 
logic around handling the deleted objects.


Line 70:* key 'key'.
> update comment
No longer applicable. Function is removed.


Line 90:   public synchronized void garbageCollect(long currentCatalogVersion) {
> Although these methods are synchronized, the underlying maps aren't thread 
That's not correct. The essence of having synchronized functions is exactly 
that, to ensure proper synchronization even though the underlying collections 
are not thread safe. Two threads calling garbageCollect and addRemoved cannot 
both proceed, one will grab the lock on the CatalogDeltaLog object and make 
progress while the other will simply block. In any case, this class is 
simplified and it only contains one collection.


Line 154:   private String toCatalogObjectKey(TCatalogObject 

[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7955/9/fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java
File fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java:

PS9, Line 159: public void writeTo(OutputStream out) {
 : throw new IllegalStateException("Not implemented.");
 :   }
 : 
 :   public byte toByteArray()[] {
 : throw new IllegalStateException("Not implemented.");
 :   }
 : 
 :   public int size() {
 : throw new IllegalStateException("Not implemented.");
 :   }
> These are from ByteArrayOutputStream. Given we mimic it, someone might expe
I get that, but you're not extending the ByteArrayOutputStream class, you're 
extending OutputStream. So, either change that or remove anything that is not 
defined in the parent class.


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 9:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/7955/9/fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java
File fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java:

PS9, Line 67: .
nit: remove


PS9, Line 94: reAllocateMemory
Unsafe.reallocateMemory()


PS9, Line 159: public void writeTo(OutputStream out) {
 : throw new IllegalStateException("Not implemented.");
 :   }
 : 
 :   public byte toByteArray()[] {
 : throw new IllegalStateException("Not implemented.");
 :   }
 : 
 :   public int size() {
 : throw new IllegalStateException("Not implemented.");
 :   }
These are not defined in OutputStream, so what is the point of defining them 
here? Am I missing something?


http://gerrit.cloudera.org:8080/#/c/7955/9/fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java
File fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java:

PS9, Line 38: @SuppressWarnings("restriction")
explain? Also add a brief description of what this class tests.


PS9, Line 78: huge
"huge" doesn't really convey much information here. Maybe a bit more explicit 
what we're testing (>1GB, < 4GB)? Also, you may want to test multiple sizes to 
exercise all paths in the reallocation logic (e.g. 500MB, 1GB, 1.5GB, 2GB). You 
can decide which ones make more sense.


PS9, Line 85: char[] chars = new char[128 * 1024 * 1024];
: Arrays.fill(chars, 'a');
: String hugeString = new String(chars);
nit: see if you can use Guava's Strings.repeat() here, may be cheaper.


PS9, Line 99:   
nit: extra space


PS9, Line 99: Create a huge string of size 512MB. The combined size of the  
test object
: // crosses 4GB.
Maybe say something like "create an object with a serialization size which is 
larger than 4GB"


PS9, Line 105: never
ha, famous last words :)


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates

2017-09-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5881: Use native allocation while building catalog 
updates
..


Patch Set 8:

(2 comments)

Flushing minor comments from patch #8 and switching to patch #9.

http://gerrit.cloudera.org:8080/#/c/7955/8/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

PS8, Line 603: // Struct containing eight strings, used for testing.
Mention what are we testing with this weird looking struct. Also, leave a TODO 
to move this out of here.


http://gerrit.cloudera.org:8080/#/c/7955/8/fe/src/main/java/org/apache/impala/service/JniCatalog.java
File fe/src/main/java/org/apache/impala/service/JniCatalog.java:

PS8, Line 51: import org.apache.impala.thrift.TNativeByteBuffer;
dup (L49)


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-09-06 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 4:

> For this options serialization question, we have a C++
 > implementation of "set x=y" -> Thrift at impala::SetQueryOption()
 > (https://github.com/apache/incubator-impala/blob/545eab6d6202ca3968279d14fad28bd2a6d566f6/be/src/service/query-options.cc#L113
 > ). I don't know how important it is to keep one implementation of
 > that, but that one can be made available to us here.

Good point. It's not clear how are we going to expose that to the PlannerTests 
through. Thoughts?

-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-09-06 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 4:

> > > > (1 comment)
 > > >
 > > > The tests in this change use only 3 query options. Adding a new
 > > > section to the .test files to support only these 3 options
 > > wouldn't
 > > > be too much work.
 > > >
 > > > On the other hand, adding support for all the query options
 > that
 > > > Impala supports would be a lot harder. Probably we would have
 > to
 > > > implement that using some Java reflection trickery.
 > >
 > > I don't think you need to use Java reflection. The generated Java
 > > class for TQueryOptions has a number of helper functions to
 > search
 > > and set a field by name. So, for instance the QUERY_OPTIONS
 > section
 > > could have key=value pairs that correspond to query options. Then
 > > we could write a small function that parses the key value pairs
 > and
 > > uses the helper functions to check for valid query options and
 > set
 > > the values. Do you want to give it a try? Thanks
 > 
 > Thanks. I'm still confused though how to implement setting enum
 > query options without reflection. e.g.:
 >  QUERYOPTIONS
 > COMPRESSION_CODEC=GZIP
 > 
 > Here we have to know that the type of TQueryOptions.compression_codec
 > is org.apache.impala.thrift.THdfsCompression, otherwise we cannot
 > parse "GZIP". Am I missing something?

If you detect that this query option refers to a compression_codec, the only 
thing you need to do is map the string literal "GZIP" to an instance of 
HdfsCompression (the latter is just an enum). From that you can get the 
THdfsCompression object that you pass in the setFieldValue() method. You don't 
need to handle all the query options in one go. You can add the infrastructure 
needed, i.e. parsing the key=value pairs, validating the key part and then 
simply add the logic to handle the options we're interesting in for this patch. 
Once the infrastructure is in place, then adding support for other options 
should be quite easy, but you don't need to do everything.

-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] [PREVIEW] Use native allocation while building catalog updates

2017-09-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [PREVIEW] Use native allocation while building catalog updates
..


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7955/3//COMMIT_MSG
Commit Message:

PS3, Line 20: 4-5GB
You need to comment about the limit (4GB) that currently thrift is imposing.


http://gerrit.cloudera.org:8080/#/c/7955/3/fe/src/main/java/org/apache/impala/service/JniCatalog.java
File fe/src/main/java/org/apache/impala/service/JniCatalog.java:

PS3, Line 142: This method serializes
 :* the output of CatalogService#getCatallogObjects() into 
native memory allocated
 :* using a custom TNativeSerializer and returns the 
corresponding TNativeByteBuffer
 :* serialized into bytes.
Alternatively: "Uses a custom serializer (TNativeSerializer) to serialize the 
catalog objects into a byte buffer that is backed by native memory."


http://gerrit.cloudera.org:8080/#/c/7955/3/fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java
File fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java:

PS3, Line 36: 512MB
update value


Line 54:   // Limit on the size to which the underlying buffer can grow
Comment why that limit exists


PS3, Line 103: off
offset


PS3, Line 115: if (bufferLen_ >= BUFFER_DOUBLING_RESIZE_LIMIT) {
 : while (newBufferSize < bytesWritten_ + len) {
 :   newBufferSize += BUFFER_RESIZE_INCREMENTS;
 : }
 :   } else {
 : while (newBufferSize < bytesWritten_ + len) {
 :   newBufferSize <<= 1;
 : }
 :   }
I think if you put the if/else block inside the while block it may be a bit 
easier to follow plus it will give you a slightly better behavior because 
you'll be doubling as long as newBufferSize < 1GB and then you'll switch to 
incremental increases if you cross the 1GB boundary.


PS3, Line 135: public synchronized void reset() {
 :   }
single line (here and for some of the other functions as well).


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-09-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 4:

> > (1 comment)
 > 
 > The tests in this change use only 3 query options. Adding a new
 > section to the .test files to support only these 3 options wouldn't
 > be too much work.
 > 
 > On the other hand, adding support for all the query options that
 > Impala supports would be a lot harder. Probably we would have to
 > implement that using some Java reflection trickery.

I don't think you need to use Java reflection. The generated Java class for 
TQueryOptions has a number of helper functions to search and set a field by 
name. So, for instance the QUERY_OPTIONS section could have key=value pairs 
that correspond to query options. Then we could write a small function that 
parses the key value pairs and uses the helper functions to check for valid 
query options and set the values. Do you want to give it a try? Thanks

-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] [PREVIEW] Use native allocation while building catalog updates

2017-09-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [PREVIEW] Use native allocation while building catalog updates
..


Patch Set 1:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/7955/1/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

PS1, Line 106: RETURN_IF_ERROR(DeserializeThriftMsg(jni_env, result_bytes, 
));
What happens if DeserializeThriftMsg returns a non-ok status? Who will release 
the allocated memory?


PS1, Line 108: uint32_t len = static_cast(nbuffer.buffer_len);
Isn't buffer_len i64? What will happen if the cast fails?


PS1, Line 110: buf
Interestingly, DeserializeThriftMsg() casts the const away of the first param 
(buf). The comment in this function still says that it is treated as a const 
even though the const is casted away but let's make sure we check the 
implementation to ensure nothing bad happens to buf after that call.


PS1, Line 111: desrialize
nit: deserialize (typo)


PS1, Line 113: free(buf);
> I used free after checking that the unsafe.freeMemory() also calls the same
Yes, I think we need to place it safe here. If the implementation changes in 
some later Java version, we may run into issues.


http://gerrit.cloudera.org:8080/#/c/7955/1/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

PS1, Line 596: native byte buffer
In the context of reading this thrift file, I don't think it is clear what 
"native byte buffer" is. Maybe say at the beginning that it represents a buffer 
allocated by the JVM using native memory (unsafe).


http://gerrit.cloudera.org:8080/#/c/7955/1/fe/src/main/java/org/apache/impala/service/JniCatalog.java
File fe/src/main/java/org/apache/impala/service/JniCatalog.java:

Line 137:* Gets all catalog objects
Please expand the comment. We're adding some non-trivial behavior here.


http://gerrit.cloudera.org:8080/#/c/7955/1/fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java
File fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java:

Line 1: package org.apache.impala.thrift;
Apache header?


PS1, Line 26: @SuppressWarnings("restriction")
explain?


PS1, Line 27: class
nit: make it final (do you plan to extend it?)


PS1, Line 33:   private static final long BUFFER_DOUBLING_RESIZE_LIMIT = 1 * 
1024 * 1024 * 1024; /* 1GB */
nit: long line


PS1, Line 38: length
length (in bytes)


PS1, Line 39: protected
why protected?


PS1, Line 48: public NativeByteArrayOutputStream() {
: this(BUFFER_INITIAL_SIZE_DEFAULT);
:   }
single line?


PS1, Line 57: // Unsafe#allocateMemory() can handle negative inputs.
Why do I need to know about this here? Maybe remove.


PS1, Line 94: if (bufferLen_ >= BUFFER_DOUBLING_RESIZE_LIMIT) {
: newBufferSize = bufferLen_ + BUFFER_RESIZE_INCREMENTS;
:   } else {
: newBufferSize = bufferLen_ << 1;
:   }
How do you guarantee that newBufferSize > bytesWritten_ + len?
E.g. bufferLen_ = 128MB, len = 1GB. Isn't bufferLen_ going to be 256MB although 
you need 1128MB? Am I missing something?


http://gerrit.cloudera.org:8080/#/c/7955/1/fe/src/main/java/org/apache/impala/thrift/TNativeSerializer.java
File fe/src/main/java/org/apache/impala/thrift/TNativeSerializer.java:

PS1, Line 31: native by array
nit: "native by array"?


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] [PREVIEW] Use native allocation while building catalog updates

2017-09-04 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [PREVIEW] Use native allocation while building catalog updates
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7955/1/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

PS1, Line 113: free(buf);
This freaks me out a bit. Any documentation I've read says that you should be 
using Unsafe.freeMemory() to release memory you allocated using Unsafe. How do 
we know this is safe?


-- 
To view, visit http://gerrit.cloudera.org:8080/7955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-08-31 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7564/2/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

PS2, Line 87: def test_distrib_plan_filter_pruning(self, vector):
: vector.get_value('exec_option')['num_nodes'] = 0
: 
self.run_test_case('QueryTest/runtime_filters_distrib_pruning', vector)
: 
:   def test_singlenode_plan_filter_pruning(self, vector):
: vector.get_value('exec_option')['num_nodes'] = 1
: 
self.run_test_case('QueryTest/runtime_filters_singlenode_pruning', vector)
> Let me give you some context:
Can't we expand our testing infrastructure to enable query options to be set on 
a per query basis in addition to those set before the call to 
runPlannerTestFile()? I am not suggesting supporting SET statements inside the 
.test files but we could add another section e.g. QUERY_OPTIONS, that we parse 
and use to set query options accordingly. You may want to take a look at 
PlannerTestBase.java (testPlan() function) and see how much work it would 
require to do something like that.


-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5850: Cast sender partition exprs under unions.

2017-08-30 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5850: Cast sender partition exprs under unions.
..


Patch Set 1: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7884/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

PS1, Line 184: In particular, partitioned hash joins under a union are
 :* treated as different series of joins (could have different 
data partitions).
Can you expand the comment to explain what 'node' represents? Also, that part 
of the comment doesn't really help understanding this function and is similar 
to the comment in L178. Maybe remove?


http://gerrit.cloudera.org:8080/#/c/7884/1/testdata/workloads/functional-query/queries/QueryTest/joins.test
File testdata/workloads/functional-query/queries/QueryTest/joins.test:

PS1, Line 761:  
nit: extra space


-- 
To view, visit http://gerrit.cloudera.org:8080/7884
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0aa801bcad8c2324d848349c7967d949224404e0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE

2017-08-29 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3548: Prune runtime filters based on query options in 
the FE
..


Patch Set 2:

(4 comments)

Just some answers on the comments. Haven't looked at the new patch yet.

http://gerrit.cloudera.org:8080/#/c/7564/2/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS2, Line 119: singleNodePlan
> The single node plan is created on L104 and assigned to 'singleNodePlan'. D
Let's expand to the comment to mention that 'singleNodePlan' now points to the 
root of the distributed plan.


PS2, Line 117: // create runtime filters
 : if (ctx_.getQueryOptions().getRuntime_filter_mode() != 
TRuntimeFilterMode.OFF) {
 :   RuntimeFilterGenerator.generateRuntimeFilters(ctx_, 
singleNodePlan);
 :   ctx_.getAnalysisResult().getTimeline().markEvent("Runtime 
filters computed");
 : }
> The block was moved here because the distributed plan is created on L114 an
Thanks for explaining.


http://gerrit.cloudera.org:8080/#/c/7564/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

PS2, Line 165: 
tFilterTarget.setIs_bound_by_partition_columns(isBoundByPartitionColumns);
 : tFilterTarget.setIs_local_target(isLocalTarget);
> The BE uses these flags to create instances of Coordinator::FilterTarget th
I see. Thanks


http://gerrit.cloudera.org:8080/#/c/7564/2/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

PS2, Line 87: def test_distrib_plan_filter_pruning(self, vector):
: vector.get_value('exec_option')['num_nodes'] = 0
: 
self.run_test_case('QueryTest/runtime_filters_distrib_pruning', vector)
: 
:   def test_singlenode_plan_filter_pruning(self, vector):
: vector.get_value('exec_option')['num_nodes'] = 1
: 
self.run_test_case('QueryTest/runtime_filters_singlenode_pruning', vector)
> These are written as query tests and not planner tests because the planner 
I don't follow. Did you check how the query option is set in 
testRuntimeFilterPropagation() in PlannerTest.java?


-- 
To view, visit http://gerrit.cloudera.org:8080/7564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5836: Improvements to Eclipse frontend configuration.

2017-08-24 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5836: Improvements to Eclipse frontend configuration.
..


Patch Set 1: Code-Review+2

Thank you!

-- 
To view, visit http://gerrit.cloudera.org:8080/7803
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia723fbf706cf409a8fb6b5ff0297c2b1ff7c9590
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries

2017-08-24 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate 
subqueries
..


Patch Set 5: Code-Review+2

Rebase, keep Alex's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/7706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries

2017-08-22 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#4).

Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate 
subqueries
..

IMPALA-5531: Fix correctness issue in correlated aggregate subqueries

This commit fixes an issue where a query will return wrong results if it
has an aggregate subquery with a correlated inequality predicate.
Since the rewrite for this case is not currently supported, an
exception is now thrown during the analysis.

Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
3 files changed, 115 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/7706/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries

2017-08-22 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate 
subqueries
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7706/3/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 1417:   public static String toSqlHelper(List exprs) {
> listToSql()?
Done


http://gerrit.cloudera.org:8080/#/c/7706/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

Line 399: // Ensure that all the extracted correlated predicates can be 
added to the on-clause
> nit: we usually use caps for clauses and such, i.e. ON-clause
Done


Line 648:   private static void canRewriteCorrelatedSubquery(Expr expr) throws 
AnalysisException {
> validateCorrelatedSubqueryStmt()?
Done


Line 676:* added to the on-clause of the join that results from the 
subquert rewrite; It throws
> ON-clause and typo: subquery. Also use "." instead of ";"
Done


Line 683: Preconditions.checkNotNull(correlatedPredicates);
> Preconditions.checkState(inlineView.isAnalyzed());
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries

2017-08-22 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#3).

Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate 
subqueries
..

IMPALA-5531: Fix correctness issue in correlated aggregate subqueries

This commit fixes an issue where a query will return wrong results if it
has an aggregate subquery with a correlated inequality predicate.
Since the rewrite for this case is not currently supported, an
exception is now thrown during the analysis.

Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
3 files changed, 115 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/7706/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries

2017-08-22 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate 
subqueries
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7706/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

Line 341:   canRewriteCorrelatedSubquery(expr, onClauseConjuncts);
> Why not fold the new logic into this function and move everything down? Our
Done


Line 721:   com.google.common.base.Function toSql =
> Add a static helper that does toSql() on a list. Might live in Expr or ToSq
Done


http://gerrit.cloudera.org:8080/#/c/7706/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java:

Line 733: "id %s (select %s from functional.alltypestiny t where 
t.bool_col = false " +
> Add a correlated predicate in the usbquery that does not reference 't' from
Done


Line 737: AnalyzesOk(String.format("select id from 
functional.allcomplextypes t where id " +
> Just to be sure: Your original fix broke this test right?
Yes


Line 777:   "where t.int_struct_col.f1 = v.id)", cmpOp, aggFn),
> Why this change?
Forgot to revert. Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries

2017-08-21 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate 
subqueries
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7706/1/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

Line 681: List slotRefs = Lists.newArrayList();
> Might be easier/shorter:
Done


Line 690: // Inequality correlated predicates are not currently supported 
in aggregate
> This bug is not specific to inequality predicates. I think it's really that
Done


Line 691: // subqueries (see IMPALA-5531).
> We should be able to run queries with arbitrary correlated predicates if th
Done


Line 692: if (expr instanceof BinaryPredicate && 
!correlatedPredicates.isEmpty() &&
> Let's report which predicate makes the subquery unsupported.
I am not sure the BinaryPredicate check is redundant. The first is applied on 
'expr' which is the subquery expr while the latter is applied on the correlated 
predicates. Is this what you mean?


-- 
To view, visit http://gerrit.cloudera.org:8080/7706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries

2017-08-21 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate 
subqueries
..

IMPALA-5531: Fix correctness issue in correlated aggregate subqueries

This commit fixes an issue where a query will return wrong results if it
has an aggregate subquery with a correlated inequality predicate.
Since the rewrite for this case is not currently supported, an
exception is now thrown during the analysis.

Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
3 files changed, 82 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/7706/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-08-18 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7731

Change subject: [PREVIEW] IMPALA-5538: Use explicit catalog versions for 
deleted objects
..

[PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/statestore/test_statestore.py
16 files changed, 401 insertions(+), 289 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries

2017-08-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7706

Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate 
subqueries
..

IMPALA-5531: Fix correctness issue in correlated aggregate subqueries

This commit fixes an issue where a query will return wrong results if it
has an aggregate subquery with a correlated inequality predicate.
Since the rewrite for this case is not currently supported, an
exception is now thrown during the analysis.

Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
3 files changed, 64 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/7706/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-4847: Simplify HdfsTable block metadata loading code

2017-08-11 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4847: Simplify HdfsTable block metadata loading code
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7652
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I963d647bd2ba11e3843c6ef2ac6be113c74280bf
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4847: Simplify HdfsTable block metadata loading code

2017-08-11 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4847: Simplify HdfsTable block metadata loading code
..


Patch Set 2:

(3 comments)

Yay, more lined deleted :)

http://gerrit.cloudera.org:8080/#/c/7652/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS2, Line 283: fsBlockLocationSupport
nit: maybe rename to synthesizeBlocks? synthesizeBlocks = 
!FileSystemUtil.supporsStorageIds(fs);


PS2, Line 616: fsBlockLocationSupport
same comment as above


PS2, Line 645: Preconditions.checkNotNull(fd);
move this above L647


-- 
To view, visit http://gerrit.cloudera.org:8080/7652
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I963d647bd2ba11e3843c6ef2ac6be113c74280bf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4847: Simplify HdfsTable block metadata loading code

2017-08-10 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4847: Simplify HdfsTable block metadata loading code
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7652/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS1, Line 252: for all partitions in 'partitions'
nit: "of all the 'partitions' that ..."


Line 258:*   and disk IDs.
Expand this to mention that we synthesize block metadata for FS that don't 
support disk ids. Also, mention (maybe in the beginning) that 'pathDir' could 
be in HDFS, S3 and ADLS.


PS1, Line 331: if (partitions == null || partitions.isEmpty()) return;
 : RemoteIterator fileStatusIter =
 : FileSystemUtil.listFiles(fs, partDir, false);
 : if (fileStatusIter == null) return;
 : while (fileStatusIter.hasNext()) {
 :   LocatedFileStatus fileStatus = fileStatusIter.next();
 :   if (!FileSystemUtil.isValidDataFile(fileStatus)) continue;
 :   // For the purpose of synthesizing block metadata, we 
assume that all partitions
 :   // with the same location have the same file format.
 :   FileDescriptor fd = 
FileDescriptor.createWithSynthesizedBlockMd(fileStatus,
 :   partitions.get(0).getFileFormat(), hostIndex_);
 :   // Update the partitions' metadata that this file belongs 
to.
 :   for (HdfsPartition partition: partitions) {
 : partition.getFileDescriptors().add(fd);
 :   }
 : }
With the exception of L340, this function resembles loadBlockMetadata. Maybe 
see if there is a nice way to merge these two.


PS1, Line 581: Path tblLocation = 
FileSystemUtil.createFullyQualifiedPath(getHdfsBaseDirPath());
It looks like this is only used in L590. Maybe move it closer to that.


PS1, Line 695: .keys()
remove


-- 
To view, visit http://gerrit.cloudera.org:8080/7652
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I963d647bd2ba11e3843c6ef2ac6be113c74280bf
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.

2017-06-30 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5547: Rework FK/PK join detection.
..


Patch Set 5: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.

2017-06-30 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5547: Rework FK/PK join detection.
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7257/4/testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test:

PS4, Line 261:  
nit: trailing space


PS4, Line 327: 
Also, consider adding one or two cases with outer joins. You may expand some 
existing query if you want. Also, you may want to add a few cases that we know 
are kind of problematic, similar to q64, that has regression due to the group 
by or analytic function.


-- 
To view, visit http://gerrit.cloudera.org:8080/7257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.

2017-06-30 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5547: Rework FK/PK join detection.
..


Patch Set 2:

(2 comments)

Just minor responses; thanks for the clarifications. I'll take a look at the 
tests when submitted.

http://gerrit.cloudera.org:8080/#/c/7257/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

PS2, Line 280: fkPkCandidate.get(0)
> Might have misunderstood your comment, but I'll give it a shot.
Ah yes, thanks for clarifying. I forgot the grouping effect :)


PS2, Line 299: Preconditions.checkState(lhsCard >= 0 && rhsCard >= 0);
> Maybe you misread the code: It's greater or equal to zero, so zero is valid
oops yes. sorry for the confusion


-- 
To view, visit http://gerrit.cloudera.org:8080/7257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.

2017-06-29 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5547: Rework FK/PK join detection.
..


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/7257/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

PS2, Line 27: import org.apache.impala.analysis.SlotDescriptor;
Is this used?


PS2, Line 40: import org.apache.kudu.shaded.com.google.common.base.Joiner;
replace


http://gerrit.cloudera.org:8080/#/c/7257/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

PS2, Line 79: -
nit: remove


PS2, Line 80: ordered based on the tuple ids
I am not sure I get what that ordering is. Do you mean grouped?


PS2, Line 202: based on the single most
 :* selective join predicate. We do not attempt to estimate the 
joint selectivity of
 :* multiple join predicates to avoid underestimation.
Much better. Thanks


PS2, Line 280: fkPkCandidate.get(0)
Can you plz explain this? I could be wrong about this but it seems that the 
decision of whether this is a pk/fk depends on which equi-join slots we process 
first. Maybe I am missing something. In either case, plz add a comment.


PS2, Line 299: Preconditions.checkState(lhsCard >= 0 && rhsCard >= 0);
Are we certain this is a valid precondition check? Why isn't a 0 rhsCard 
possible? Unless this is guaranteed to be the case...


PS2, Line 337: we adjustments
nit: grammar


-- 
To view, visit http://gerrit.cloudera.org:8080/7257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5489: Improve Sentry authorization for Kudu tables

2017-06-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5489: Improve Sentry authorization for Kudu tables
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7307/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test
File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test:

PS1, Line 712:  QUERY
 : insert into grant_rev_db.test_kudu_tbl values (1, "foo");
 : 
 :  QUERY
 : upsert into grant_rev_db.test_kudu_tbl values (1, "bar");
Maybe you also want to run these statements before granting insert to show that 
authz fails.


PS1, Line 737: SELECT
hm, why 'SELECT'?


PS1, Line 740: GRANT SELECT(a) ON TABLE grant_rev_db.test_kudu_tbl TO 
grant_revoke_test_KUDU
 :  RESULTS
 : 
 :  QUERY
 : update grant_rev_db.test_kudu_tbl set a = "zzz"
This is kind of weird to me. What if you had a where clause in the update 
statement on column i? Would that work or would it require select privs on i as 
well? My point is that insert + select (on something) = UPDATE is not 
intuitive. Maybe it would be simpler to just allow UPDATE only on ALL. What do 
you think?


Line 804: drop role grant_revoke_test_ROOT;
Don't you need to clean up the role here?


-- 
To view, visit http://gerrit.cloudera.org:8080/7307
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib12d2b32fa3e142e69bd8b0f24f53f9e5cbf7460
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.

2017-06-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5547: Rework FK/PK join detection.
..


Patch Set 1:

(17 comments)

I have one comment about testing. We have lots of test changes but it's not 
easy to verify if the end result makes sense or not. I think it may worth 
having a planner test with explain_level > 2 that covers a few interesting join 
cases so that we can see what kind of pk/fk conjuncts are detected and what the 
impact is on estimated join cardinalities.

http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

PS1, Line 170: else if (fkPkEqJoinConjuncts_.isEmpty()) {
 :   output.append("assumed fk/pk");
I haven't read the entire patch yet, so I am curious what this case represents.


PS1, Line 178: for (int j = 0; j < slots.size(); ++j) {
 :   output.append(slots.get(j).toString());
 :   if (j + 1 != slots.size()) output.append(", ");
 : }
Can't you use the Guava's Joiner class here? Joiner.on(",").join(slots)


http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

PS1, Line 37: import 
org.apache.kudu.client.shaded.com.google.common.collect.Maps;
?


PS1, Line 271: scanSlotsByTids
nit: Each pair represents two joined tables, no? So, maybe rename this to 
'scanSlotsByJoinedTids'? A bit more explicit on what this thing represents.


PS1, Line 281: numRows
rhsTableCardinality?


Line 290: 
nit: extra line


PS1, Line 294: conjuncts
conjunct slots


PS1, Line 300: Preconditions.checkState(joinOp_.isInnerJoin() || 
joinOp_.isOuterJoin());
This private function is only called in L253? I think you can remove this check.


PS1, Line 312: lhsNdv
Can this ever be 0?


PS1, Line 313: rhsNumRows
Same here, can this be 0?


PS1, Line 318: result = Math.min(result, joinCard);
That part I think is very important and I am not sure it is explained anywhere. 
Maybe expand the comment in L209?


PS1, Line 329: conjuncts
conjunct slots


PS1, Line 341: slots.lhs().getStats().getNumDistinctValues();
nit: these are used in couple of places. Since you already have the 
EqJoinConjunctScanSlots class, why don't your wrap them in some nice util 
functions there (e.g. getLhsNdvs() or something like that).


PS1, Line 343: slots.lhs().getParent().getTable().getNumRows();
same comment as above.


PS1, Line 373: Preconditions
I don't think these checks are useful. This private constructor is only called 
through the static function that has tight control on what gets passed in this 
ctor.


PS1, Line 382: .
nit: remove


PS1, Line 399: tupleDesc
nit: inline


-- 
To view, visit http://gerrit.cloudera.org:8080/7257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5500: Reduce catalog update topic size
..


Patch Set 10: Code-Review+2

Rebase, keep Alex's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5500: Reduce catalog update topic size
..


Patch Set 9: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-28 Thread Dimitris Tsirogiannis (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7268

to look at the new patch set (#9).

Change subject: IMPALA-5500: Reduce catalog update topic size
..

IMPALA-5500: Reduce catalog update topic size

Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.

Fix: LZ4 compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.

Results: ~4X reduction in catalog update topic size

Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/service/impala-server.cc
A tests/custom_cluster/test_compact_catalog_updates.py
5 files changed, 136 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/9
-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-28 Thread Dimitris Tsirogiannis (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7268

to look at the new patch set (#8).

Change subject: IMPALA-5500: Reduce catalog update topic size
..

IMPALA-5500: Reduce catalog update topic size

Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.

Fix: LZ4 compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.

Results: ~4X reduction in catalog update topic size

Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/service/impala-server.cc
M fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java
M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
A tests/custom_cluster/test_compact_catalog_updates.py
8 files changed, 139 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/8
-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5500: Reduce catalog update topic size
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7268/7/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

Line 335: pending_topic_updates_.pop_back();
> I think we could incorrectly end up popping two pending topic updates here.
Good catch. Done


http://gerrit.cloudera.org:8080/#/c/7268/7/be/src/catalog/catalog-util.cc
File be/src/catalog/catalog-util.cc:

Line 202:   ReadWriteUtil::PutInt(output_buffer_ptr, 
(uint32_t)catalog_object->size());
> static_cast instead of C-style cast
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5500: Reduce catalog update topic size
..


Patch Set 7:

> The code changes look good. We don't seem to have any test coverage
 > for this flag though - I grepped for it and don't see it referenced
 > in any tests. Can we add a custom cluster test that starts up
 > Impala with this flag and runs a query?
 > 
 > Although I'm actually wondering if this should just be the
 > default/only mode. It seems like compression and decompression are
 > probably less of a bottleneck than the network.

But I added a custom test for in this patch. Do you mean an additional test?

-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#7).

Change subject: IMPALA-5500: Reduce catalog update topic size
..

IMPALA-5500: Reduce catalog update topic size

Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.

Fix: LZ4 compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.

Results: ~4X reduction in catalog update topic size

Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/service/impala-server.cc
A tests/custom_cluster/test_compact_catalog_updates.py
5 files changed, 135 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#6).

Change subject: IMPALA-5500: Reduce catalog update topic size
..

IMPALA-5500: Reduce catalog update topic size

Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.

Fix: Snappy compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.

Results: ~4X reduction in catalog update topic size

Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/service/impala-server.cc
A tests/custom_cluster/test_compact_catalog_updates.py
5 files changed, 135 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-27 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5500: Reduce catalog update topic size
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7268/5/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

Line 358:   RETURN_IF_ERROR(Codec::CreateCompressor(nullptr, false, 
THdfsCompression::SNAPPY,
> Why not LZ4 to be consistent with what we do for exchanges? I don't have a 
LZ4 decompressor requires a pre-allocated output. The problem in this case is 
that we don't know the initial (pre-compressed) size of the catalog object.


-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-27 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#5).

Change subject: IMPALA-5500: Reduce catalog update topic size
..

IMPALA-5500: Reduce catalog update topic size

Problem: IMPALA-4029 introduced the use of the flatbuffers serialization
libary for storing file and block metadata. That change reduced the
effectiveness of the Thrift compaction protocol (when
--compact_catalog_topic is used), thereby causing a 2X increase in
catalog update topic size when the compact protocol is used.

Fix: Snappy compress the catalog topic updates before sent to the
statestore when --compact_catalog_topic is set to true.

Results: ~4X reduction in catalog update topic size

Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
A tests/custom_cluster/test_compact_catalog_updates.py
5 files changed, 138 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size

2017-06-27 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5500: Reduce catalog update topic size
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7268/4/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

Line 328: Status status = thrift_serializer_.Serialize(_object, 
_object);
> Let's try to avoid the extra copy for the !FLAGS_compact_catalog_topic case
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


  1   2   3   4   5   >