[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Reviewed-on: http://gerrit.cloudera.org:8080/7731
Reviewed-by: Dimitris Tsirogiannis 
Tested-by: Impala Public Jenkins
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M tests/statestore/test_statestore.py
21 files changed, 480 insertions(+), 441 deletions(-)

Approvals:
  Dimitris Tsirogiannis: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 9
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 8
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 26 Sep 2017 20:20:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1267/


--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 8
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 26 Sep 2017 16:20:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-26 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 8: Code-Review+2

Rebase and keep Dan's +2


--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 8
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 26 Sep 2017 16:20:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-25 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 7: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc@177
PS5, Line 177: Statestore::Subscriber::Subscriber(const SubscriberId& 
subscriber_id,
 : const TUniqueId& registration_id, const TNe
> Hm, it's still not clear to me why a transient topic cannot get values in d
okay. it wasn't clear to me if we can just take the value that was produced for 
an add/update and use it for delete, and expect that value to be the one that 
the subscriber expects for delete. But i guess assuming that is okay.



--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 7
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 25 Sep 2017 18:50:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-25 Thread Dimitris Tsirogiannis (Code Review)
Hello Bharath Vissapragada, Alex Behm, Vuk Ercegovac, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7731

to look at the new patch set (#7).

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M tests/statestore/test_statestore.py
21 files changed, 480 insertions(+), 441 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/7
--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 7
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-25 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h@131
PS6, Line 131: Also determines any
 :   /// deletions of catalog objects by looking at the
 :   /// difference of the last set of topic entry keys that were 
sent and the current set
 :   /// of topic entry keys
> Remove?
Done


http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.cc@308
PS6, Line 308: bool topi
> should that be DCHECK_GT?  The old conditional to continue on was <= ...
Correct. Done


http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog.h
File be/src/catalog/catalog.h:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog.h@59
PS6, Line 59: Gets all Catalog objects and the metadata that is applicable for 
the given request.
:   /// Always returns all object names that exist in the Catalog, 
but allows for extended
:   /// metadata for objects that were modified after the specified 
version.
> is that accurate?  from our discussion, my understanding was that only obje
Yeah, wording is not clear. Changed the comment.


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc@177
PS5, Line 177: entry_it->second.SetDeleted(true);
 : entry_it->second.SetVersion(last_version_);
> What I mean is that topics that are transient do not get values in deleted
Hm, it's still not clear to me why a transient topic cannot get values in 
deleted topics, in the sense that I see the topic being transient and how to 
process deletions two orthogonal issues and I don't understand why one should 
influence the other.


http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/statestore/statestore.cc@170
PS6, Line 170: topic_update_log_.erase(version);
> also, does that still make sense (now that we no longer do the SetValue(NUL
Done


http://gerrit.cloudera.org:8080/#/c/7731/6/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/7731/6/common/thrift/CatalogInternalService.thrift@32
PS6, Line 32:
> does it include objects with changes in version > or >= the from_version?
Updated the comment (>). Done


http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@310
PS6, Line 310:  Views, Databases, and
 :* Functions)
> datasources, cachepools, sentry stuff ... I think the list is too big, may
Done



--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 5
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 25 Sep 2017 17:55:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-22 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h@131
PS6, Line 131: Also determines any
 :   /// deletions of catalog objects by looking at the
 :   /// difference of the last set of topic entry keys that were 
sent and the current set
 :   /// of topic entry keys
Remove?


http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@369
PS5, Line 369:   catalogTbl.setTable(tbl.toThrift());
> It's racy if you do that. A concurrent table modification may have gotten a
Yea, got it, thanks.


http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@970
PS5, Line 970: return removedTable;
> I thought about it but there are two reasons not to do it: a) we need the T
I see.


http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@310
PS6, Line 310: Tables, Views, Databases, and
 :* Functions)
datasources, cachepools, sentry stuff ... I think the list is too big, may be 
remove this, rather than keeping it stale?



--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 6
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 22 Sep 2017 16:20:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/statestore/statestore.cc@170
PS6, Line 170: total_value_size_bytes_ -= entry_it->second.value().size()
also, does that still make sense (now that we no longer do the SetValue(NULL) 
at line 176 in the old code?



--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 6
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 21 Sep 2017 23:17:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-21 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.cc@308
PS6, Line 308: DCHECK_GE
should that be DCHECK_GT?  The old conditional to continue on was <= ...


http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog.h
File be/src/catalog/catalog.h:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog.h@59
PS6, Line 59: Gets all Catalog objects and the metadata that is applicable for 
the given request.
:   /// Always returns all object names that exist in the Catalog, 
but allows for extended
:   /// metadata for objects that were modified after the specified 
version.
is that accurate?  from our discussion, my understanding was that only objects 
with version > from_version would be included.


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc@177
PS5, Line 177:   }
 : }
> The behavior is topic specific and currently only subscribers of catalog-up
What I mean is that topics that are transient do not get values in deleted 
topics.  So, if a subscriber relies on value to process deletion, it better not 
register as transient, right?  Shouldn't that be made explicit?


http://gerrit.cloudera.org:8080/#/c/7731/6/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/7731/6/common/thrift/CatalogInternalService.thrift@32
PS6, Line 32:  relative to
does it include objects with changes in version > or >= the from_version?



--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 6
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 21 Sep 2017 23:11:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-19 Thread Dimitris Tsirogiannis (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7731

to look at the new patch set (#6).

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M tests/statestore/test_statestore.py
21 files changed, 473 insertions(+), 425 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-19 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 5:

(27 comments)

http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

PS5, Line 231:   // If this is not a delta update, request an update from 
version 0 from the local
 :   // catalog. There is an optimization that checks if 
pending_topic_updates_ was just
 :   // reloaded from version 0. If so, then skip this step and use 
that data.
> I found it impossible to understand this comment without reading through mo
Done


PS5, Line 234: delta.from_version == 0 && delta.to_version == 0
> from the comment above, presumably this condition means "this is not a delt
Yeah, you're right. This condition is really confusing. from_version = 0 
implies delta = false and can be used here instead. The part that is not clear 
to me is the to_version check but I couldn't find any case where it would make 
sense to check the to_version in order to decide whether to send the full 
catalog update or not. I will remove it and see if it breaks something.


PS5, Line 310: if (catalog_object.catalog_version <= 
last_sent_catalog_version_) continue;
> given that last_sent_catalog_version_ was passed to GetAllCatalogObjects(),
Right, it's not needed. Replaced it with a DCHECK.


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

PS5, Line 150: added
> nit: May be 'added/updated/removed' to be consistent with other places?
Done


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS5, Line 1392: 
> can we delete this function now?
Yes. Done


PS5, Line 1366: ;
> Could you add 'len' too, given this is trace logging anyway. (might be help
Done


PS5, Line 1399: // Nothing to do here.
> Remove? Kinda seems obvious
Done


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

PS5, Line 177: entry_it->second.SetDeleted(true);
 : entry_it->second.SetVersion(last_version_);
> now that some subscribes require deleted entries to have a value, how does 
The behavior is topic specific and currently only subscribers of catalog-update 
topic rely on the value for deleted entries. Not sure how to make it less 
fragile in the sense that "value_" is opaque to the statestore and is 
interpreted only by subscribers.


PS5, Line 490:  
> would be easier to read if you line break it there to keep the last arg all
Done


PS5, Line 538: if (topic_entry.value() != Statestore::TopicEntry::NULL_VALUE)
> why do we need that guard? why can't we set topic_item.value even in that c
Yeap, removed it.


PS5, Line 546: int64_t topic_size = topic.total_key_size_bytes() - 
deleted_key_size_bytes
 : + topic.total_value_size_bytes();
> that math doesn't look correct anymore (deleted entries can have non-zero v
Done


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

PS5, Line 180: may
> may or will?
"will". Changed it.


http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

PS5, Line 25: GetAllCatalogObjects
> see below. I think we should rename this.
Done


PS5, Line 25: Contains all known Catalog objects
: // (databases, tables/views, and functions) from the 
CatalogService's cache.
> I think that should be updated to explain that this is relative to a partic
Done


PS5, Line 38: empty list if no objects detected in the Catalog
> Thanks. As we talked about in person, I think we should rename GetAllCatalo
Done


http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

PS5, Line 87: If true, this item was deleted
> How about:
Done


PS5, Line 87: If true, this item was deleted
> Actually, the parenthesis I suggested doesn't clarify.  Maybe that part sho
Done


PS5, Line 97: List of changes to topic entries, including deletions.
> That's only true when is_delta==true.  So, how about we say:
Done


PS5, Line 100: applied to in-memory state,
> that's kinda dependent on the subscriber's implementation, right? would it 
Done


http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java:

PS5, Line 53: identiy
> nit: typo
Done


PS5, Line 123: if (first.getType() != second.getType()) return false;
> This seems to already be handled by the next line (by generating a differen
Good point. Removed



[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-18 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

PS5, Line 87: If true, this item was deleted
> How about:
Actually, the parenthesis I suggested doesn't clarify.  Maybe that part should 
say: (since the latest version of every still-present topic will be included)

or something like that.


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-15 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 5:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

PS5, Line 231:   // If this is not a delta update, request an update from 
version 0 from the local
 :   // catalog. There is an optimization that checks if 
pending_topic_updates_ was just
 :   // reloaded from version 0. If so, then skip this step and use 
that data.
I found it impossible to understand this comment without reading through most 
of the code in this file, which kinda defeats the point of having a comment :) 
i think it would be clearer if it said something like:

If not generating a delta update and 'pending_topic_updates_' doesn't already 
contain the full catalog (beginning with version 0), then force 
GatherCatalogUpdatesThread() to reload the full catalog.


PS5, Line 234: delta.from_version == 0 && delta.to_version == 0
from the comment above, presumably this condition means "this is not a delta 
update"?  Why is that the condition and not 'delta.is_delta'?


PS5, Line 310: if (catalog_object.catalog_version <= 
last_sent_catalog_version_) continue;
given that last_sent_catalog_version_ was passed to GetAllCatalogObjects(), why 
is this needed?


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS5, Line 1392: 
can we delete this function now?


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

PS5, Line 177: entry_it->second.SetDeleted(true);
 : entry_it->second.SetVersion(last_version_);
now that some subscribes require deleted entries to have a value, how does this 
all work? are we implicitly requiring that only non-transient subscribers 
depend on values for deletions?  If that's the case, that seems fragile (and at 
the least should be documented).


PS5, Line 490:  
would be easier to read if you line break it there to keep the last arg all on 
one line.


PS5, Line 538: if (topic_entry.value() != Statestore::TopicEntry::NULL_VALUE)
why do we need that guard? why can't we set topic_item.value even in that case?
assuming we don't need it, it seems like we can remove the NULL_VALUE constant 
altogether.


PS5, Line 546: int64_t topic_size = topic.total_key_size_bytes() - 
deleted_key_size_bytes
 : + topic.total_value_size_bytes();
that math doesn't look correct anymore (deleted entries can have non-zero value 
size).  it might be more straight forward to just add up the topic size as we 
go rather than try to subtract out the deleted portion.


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.h
File be/src/statestore/statestore.h:

PS5, Line 180: may
may or will?


http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

PS5, Line 25: GetAllCatalogObjects
see below. I think we should rename this.


PS5, Line 25: Contains all known Catalog objects
: // (databases, tables/views, and functions) from the 
CatalogService's cache.
I think that should be updated to explain that this is relative to a particular 
catalog version and that it now expresses it as an update and delete.


PS5, Line 38: empty list if no objects detected in the Catalog
> Yes, there is some context that is missing here, hence it's not easy to und
Thanks. As we talked about in person, I think we should rename 
GetAllCatalogObjects to GetCatalogDelta() or something similar to make it clear 
that this gets the delta. There's no hint at that until you read the header 
file comment.


http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 84:   // is not included in the topic key (e.g. catalog version of deleted 
catalog objects).
> Let me try to explain. Statestore only uses that flag to avoid sending dele
Please see my comments elsewhere in this file for suggestions on clarifying the 
interface.

It still seems confusing and weird that 'value' can matter for delete when 
delete, but I guess the comments I'm suggested elsewhere help clarify at least 
what 'delete' actually means.


PS5, Line 87: If true, this item was deleted
How about:

If true, this item was deleted.  When false, this TTopicItem need not be 
included in non-delta TTopicDelta's (since the complete set of TTopicItems will 
be included in that case).

or something like that.


PS5, Line 97: List of changes to topic entries, including deletions.
That's only true when is_delta==true.  So, how about we say:

When is_delta=true, a list of changes to topic entries, including deletions, 
within [from_version, to_version].
When 

[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 5:

(2 comments)

Addressing some questions by Dan.

http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

PS5, Line 38: empty list if no objects detected in the Catalog
> s/through/trouble/g
Yes, there is some context that is missing here, hence it's not easy to 
understand what these comments mean. Let me try to explain first here and I 
will expand the comments to clarify some things. 

1. What I think that comment should say is "no deleted objects were detected 
since the last GetAllCatalogObjects request". 

2. Similarly to the previous comment. Deleted and updated/new objects are with 
respect to the previous GetAllCatalogObjects call. The base from which these 
deltas are computed is the "fromVersion" param of the TGetAllCatalogObjects 
struct. 

I'll update the comments.


http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 84:   // is not included in the topic key (e.g. catalog version of deleted 
catalog objects).
> Sorry, opaque was the wrong word. I meant to say agnostic.
Let me try to explain. Statestore only uses that flag to avoid sending deleted 
topic entries when the topic update is not a delta one (this is an 
optimization). The statestore subscribers (e.g. impalad) are the ones that 
really need what's in the 'value' field in order to determine how to process a 
deleted topic entry. For the case of catalog update, the value contains among 
other things the catalog version when the deletion occurred and the impalad 
uses that information to determine if the deletion should be applied at the 
local catalog or not. 

We can definitely add the deleted flag inside whatever object we're putting in 
'value' (e.g. TCatalogObject) and I am happy to try this out if you think it's 
better.


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

PS5, Line 38: empty list if no objects detected in the Catalog
> I wasn't very clear. I was making two separate points:
s/through/trouble/g


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

PS5, Line 38: empty list if no objects detected in the Catalog
> Sorry not following. This says detected not deleted. Maybe I misunderstood 
I wasn't very clear. I was making two separate points:
1) The parenthesis statement makes it sound like the (only) condition where 
this list is empty is when no objects are detected in the catalog, but it seems 
that this list can also be empty if nothing is deleted.

2) Separately (and not related to the highlighted code so sorry for the 
confusion), I'm having through understanding what 'update_objects' and 
'deleted_objects' really mean since those are ways to express things as a delta 
from some list, but I don't know what that list is. In the old structure, 
'objects' was just an absolute thing (not relative to a previous list).


http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 84:   // is not included in the topic key (e.g. catalog version of deleted 
catalog objects).
> Hm, I don't see why the value should be opaque to the subscriber. On the co
Sorry, opaque was the wrong word. I meant to say agnostic.

I agree value should be opaque to statestore. What I don't understand is how 
'delete' can have meaning at the statestore level if a subscriber needs to look 
inside 'value' to know how to apply it.  So, is 'deleted' opaque to the 
statestore, and if so, why not just put that in the 'value'?  If statestore 
does use 'deleted' for something, how can that be given that 'value' contains 
information needed in order to apply the delete?


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

PS5, Line 38: empty list if no objects detected in the Catalog
> that seems kinda misleading. won't the list also be empty if nothing was de
Sorry not following. This says detected not deleted. Maybe I misunderstood you 
comment.


http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 84:   // is not included in the topic key (e.g. catalog version of deleted 
catalog objects).
> hmm, without reading through how the catalog topic works, this is still pre
Hm, I don't see why the value should be opaque to the subscriber. On the 
contrary, it should be opaque to the statestore but each topic subscriber could 
have different logic implemented around deletions. If we want to be able to 
handle deletions with only the information from the key then we need to expand 
the key and for the case of catalog updates, it should also include the catalog 
version. Unfortunately, that won't work well with the current implementation of 
the statestore.


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

PS5, Line 38: empty list if no objects detected in the Catalog
that seems kinda misleading. won't the list also be empty if nothing was 
deleted?  And what do we mean by deleted (and for that matter updated)?  
Deleted (and updated) since when?  Doesn't this need a "base version" to make 
sense?


http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 84:   // is not included in the topic key (e.g. catalog version of deleted 
catalog objects).
hmm, without reading through how the catalog topic works, this is still pretty 
confusing. it seems weird that a value is needed to perform a deletion of a key 
(and it means that semantics of delete are now not opaque to the particular 
subscriber, right?)
Maybe easiest if we talk in person.


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

PS3, Line 86: deleted
> What I meant is that we should explain in the comment here what 'value' mea
Fair enough. Done


PS3, Line 105: The from_version will always be 0 for non-delta updates
> Yeah. what I meant was we should document it in this comment, right?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dimitris Tsirogiannis (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7731

to look at the new patch set (#5).

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/statestore/test_statestore.py
16 files changed, 427 insertions(+), 338 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

PS3, Line 86: deleted
> What I tried to explain in the commit msg (let me know if it still unclear)
What I meant is that we should explain in the comment here what 'value' means 
when 'deleted'=true. it doesn't seem intuitive what 'value' would be in that 
case (in most systems, "delete" operations deal only with the 'key' not the 
'value').


PS3, Line 105: The from_version will always be 0 for non-delta updates
> The to_version is 0 only when there are no changes in the topic. It's value
Yeah. what I meant was we should document it in this comment, right?


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dimitris Tsirogiannis (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7731

to look at the new patch set (#4).

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/statestore/test_statestore.py
16 files changed, 422 insertions(+), 336 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

PS3, Line 34: updated
> does "updated" also include new objects? or is it really just objects that 
It includes both added and changed objects. I updated the comment but let me 
know if you have a better name option.


http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

PS3, Line 86: deleted
> in this case, what is in 'value'? Is it defined to be set or is only the ke
What I tried to explain in the commit msg (let me know if it still unclear) was 
that we should not use the contents of "value" to indicate if a topic item is 
deleted or not. It doesn't mean that a topic item with deleted set to true 
should have a specific value, i.e. each topic can have a separate behavior.


PS3, Line 103: all changes
> your commit message seems to contradict that.  Oh, maybe i'm confusing cata
Hm, sorry for the confusion. What the commit is describing is related to 
catalog versions and the follow up changes for fixing IMPALA-5058.


PS3, Line 105: The from_version will always be 0 for non-delta updates
> what about to_version in this case?
The to_version is 0 only when there are no changes in the topic. It's value 
doesn't depend on whether the update is delta or not.


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-14 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

PS3, Line 34: updated
does "updated" also include new objects? or is it really just objects that have 
been updated, i.e. changed?


http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

PS3, Line 86: deleted
in this case, what is in 'value'? Is it defined to be set or is only the key 
relevant?  (Your commit message talks about it but it's not clear from the 
interface).


PS3, Line 103: all changes
your commit message seems to contradict that.  Oh, maybe i'm confusing catalog 
versions and statestore topic versions?


PS3, Line 105: The from_version will always be 0 for non-delta updates
what about to_version in this case?


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 3: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#3).

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/statestore/test_statestore.py
16 files changed, 421 insertions(+), 336 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 2:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

PS2, Line 233:  0, if they have then skip this step and use that data
> nit: perhaps easier to read? "...version 0. If so, then skip ..."
Done


Line 305:   BuildTopicUpdatesHelper(catalog_objects.updated_objects, false);
> I like the previous version more where this is inlined into L292. A quick c
Done


Line 317: if (catalog_object.catalog_version <= last_sent_catalog_version_) 
continue;
> Move this above L312?
Done


PS2, Line 336: if (topic_deletions) {
 :   VLOG(1) << "Publishing deletion: " << entry_key << "@"
 :   << catalog_object.catalog_version;
 : } else {
 :   VLOG(1) << "Publishing update: " << entry_key << "@"
 :   << catalog_object.catalog_version;
 : }
> perhaps remove some redundancy this way:
Done


http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1340: LOG(INFO) << "Received large catalog update(>100mb): "
> Received a large catalog item?
Done


PS2, Line 1366: if (item.deleted) {
  : VLOG(3) << "Deleted item: " << item.key << " version: "
  : << catalog_object.catalog_version;
  :   } else {
  : VLOG(3) << "Added item: " << item.key << " version: "
  : << catalog_object.catalog_version;
  :   }
> same suggestion as in catalog_server to reduce redundancy.
Done


http://gerrit.cloudera.org:8080/#/c/7731/2/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 86:   3: required bool deleted = false;
> Isn't it better to not have a default value to make sure this is always set
Done


http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java:

Line 56:  *   determine which catalog objects were deleted since the last 
catalog update topic.
> catalog topic update
Done


Line 57:  *   Once the catalog update topic is constructed, the old deleted 
catalog objects are
> catalog topic update
Done


Line 74:   // Retrieve all the removed catalog objects with version > 
'fromVersion'.
> /** style comment
Done


http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

PS2, Line 152: .
> the word "delta" implies that we're dealing with inserts, deletes, and poss
Done


Line 347:   // Identify the catalog objects that were updated (added/dropped) 
in the catalog with
> /** style comment
Done


Line 351: Set addedCatalogObjectKeys = Sets.newHashSet();
> updatedCatalogObjects?
Done


Line 363: TCatalogObject catalogTbl = new 
TCatalogObject(TCatalogObjectType.TABLE,
> Why not just iterate over db.getTables()?
Done


Line 441: 
addedCatalogObjectKeys.add(CatalogDeltaLog.toCatalogObjectKey(privilege));
> Instead of having a duplicate add() in all places, how about creating the a
Done


Line 448: for (TCatalogObject removedObject: 
deltaLog_.retrieveObjects(fromVersion)) {
> Nice! Much clearer.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-11 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

PS2, Line 233:  0, if they have then skip this step and use that data
nit: perhaps easier to read? "...version 0. If so, then skip ..."


PS2, Line 336: if (topic_deletions) {
 :   VLOG(1) << "Publishing deletion: " << entry_key << "@"
 :   << catalog_object.catalog_version;
 : } else {
 :   VLOG(1) << "Publishing update: " << entry_key << "@"
 :   << catalog_object.catalog_version;
 : }
perhaps remove some redundancy this way:
VLOG(1) << "Publishing " 
<< (topic_deletions ? "deletion" : "update") 
<< ": " << entry_key << "@" 
<< catalog_object.catalog_version;


http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS2, Line 1366: if (item.deleted) {
  : VLOG(3) << "Deleted item: " << item.key << " version: "
  : << catalog_object.catalog_version;
  :   } else {
  : VLOG(3) << "Added item: " << item.key << " version: "
  : << catalog_object.catalog_version;
  :   }
same suggestion as in catalog_server to reduce redundancy.


http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

PS2, Line 152: .
the word "delta" implies that we're dealing with inserts, deletes, and possibly 
updates. any reason why this member is named more generally? why not call it 
deleteLog_?


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 2:

(12 comments)

Change is looking pretty good to me, only minor comments left.

http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

Line 305:   BuildTopicUpdatesHelper(catalog_objects.updated_objects, false);
I like the previous version more where this is inlined into L292. A quick 
comment about the requirements/structure of a topic update would be nice, e.g. 
stating that the set of deletions and updates are disjoint and therefore the 
order in which updates/deletions are added is irrelevant.


Line 317: if (catalog_object.catalog_version <= last_sent_catalog_version_) 
continue;
Move this above L312?


http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1340: LOG(INFO) << "Received large catalog update(>100mb): "
Received a large catalog item?


http://gerrit.cloudera.org:8080/#/c/7731/2/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 86:   3: required bool deleted = false;
Isn't it better to not have a default value to make sure this is always set?


http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java:

Line 56:  *   determine which catalog objects were deleted since the last 
catalog update topic.
catalog topic update


Line 57:  *   Once the catalog update topic is constructed, the old deleted 
catalog objects are
catalog topic update


Line 74:   // Retrieve all the removed catalog objects with version > 
'fromVersion'.
/** style comment


http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 347:   // Identify the catalog objects that were updated (added/dropped) 
in the catalog with
/** style comment


Line 351: Set addedCatalogObjectKeys = Sets.newHashSet();
updatedCatalogObjects?


Line 363: TCatalogObject catalogTbl = new 
TCatalogObject(TCatalogObjectType.TABLE,
Why not just iterate over db.getTables()?


Line 441: 
addedCatalogObjectKeys.add(CatalogDeltaLog.toCatalogObjectKey(privilege));
Instead of having a duplicate add() in all places, how about creating the 
addedCatalogObjets set right before L448 by looping over the 
addedCatalogObjects? Seems only marginally less efficient, but less prone to 
forgetting an add() somewhere.


Line 448: for (TCatalogObject removedObject: 
deltaLog_.retrieveObjects(fromVersion)) {
Nice! Much clearer.


-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/statestore/test_statestore.py
16 files changed, 438 insertions(+), 331 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis