Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8825 )

Change subject: IMPALA-5990: End-to-end compression of metadata
......................................................................


Patch Set 2:

(10 comments)

Hi,

I've not done a deep review of the protocol changes, but took a look through to 
understand new things for myself.

I think it would be useful to write the boring test for the C++ 
CompressCatalogObject() and DecompressCatalogObject() functions. We could check 
that things like the empty string go through this just fine. (I have no reason 
to believe they wouldn't.)

We should also decide on our approach to Thrift structs. My biases are to keep 
them compatible (i.e., don't re-use numbers, keep around deprecated fields 
forever), but we should decide explicitly, and perhaps update the style guide.

http://gerrit.cloudera.org:8080/#/c/8825/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8825/2//COMMIT_MSG@17
PS2, Line 17: - A single catalog object cannot be larger than ~2GB on openjdk 
8. It
This is probably just "jdk 8" (rather than Open). A lot of people run Impala 
using the Oracle JDK, so it'd be good to know that Oracle and OpenJDK agree 
here.


http://gerrit.cloudera.org:8080/#/c/8825/2//COMMIT_MSG@23
PS2, Line 23: Testing: Ran existing tests. Manually tested with a 1.95GB 
catalog object
Was this painful to produce? Could we check in a test that creates a huge 
object and tries to force this to happen, or is it prohibitively expensive?


http://gerrit.cloudera.org:8080/#/c/8825/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/8825/2/be/src/catalog/catalog-server.cc@384
PS2, Line 384:   VLOG(1) << "Publishing " << (deleted ? "deletion " : "update 
") << ": " << item.key;
Would logging size (perhaps both compressed and uncompressed) here be useful? 
The old log also had a catalog version; I think that's useful to keep.


http://gerrit.cloudera.org:8080/#/c/8825/2/be/src/catalog/catalog-util.h
File be/src/catalog/catalog-util.h:

http://gerrit.cloudera.org:8080/#/c/8825/2/be/src/catalog/catalog-util.h@41
PS2, Line 41: /// 'catalog_object'. Stores the size of the uncopressed catalog 
object in the first
spelling: uncompressed


http://gerrit.cloudera.org:8080/#/c/8825/2/be/src/catalog/catalog-util.cc
File be/src/catalog/catalog-util.cc:

http://gerrit.cloudera.org:8080/#/c/8825/2/be/src/catalog/catalog-util.cc@139
PS2, Line 139: Status CompressCatalogObject(const uint8_t* buf, uint32_t size, 
string* output) {
Does the asymmetric of using string* for output here, and vector<uint8_t>* 
output in decompression bother us?


http://gerrit.cloudera.org:8080/#/c/8825/2/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/8825/2/common/thrift/CatalogInternalService.thrift@28
PS2, Line 28:   1: required i64 callback_ctx
I think this should be "2", to preserve the ordering. We don't explicitly 
support multiple versions of these things working together, but it's good to 
honor Thrift's goals here.


http://gerrit.cloudera.org:8080/#/c/8825/2/common/thrift/CatalogInternalService.thrift@39
PS2, Line 39:   1: required i64 max_catalog_version
Technically we should leave 2 and 3 around as optional, renaming them to 
_deprecated or some such.


http://gerrit.cloudera.org:8080/#/c/8825/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/8825/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@24
PS2, Line 24: import java.util.*;
I think we should discourage wildcard imports. (You can configure your IDE to 
not do this.)

We don't formally adopt Google's Java style guide, but 
https://google.github.io/styleguide/javaguide.html#s3.3.1-wildcard-imports is 
the reference. It's especially bad in line 46, where nobody knows what might be 
in org.apache.impala.common.*.


http://gerrit.cloudera.org:8080/#/c/8825/2/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java:

http://gerrit.cloudera.org:8080/#/c/8825/2/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java@162
PS2, Line 162:       LOG.trace((update.first ? "Deleting " : "Adding ") + 
"item: " + key + " version: "
I don't know if this is performance sensitive, but slf4j has some mechanism to 
say:

if (LOG.isEnabled(TRACE)) { ... }

That would avoiding doing the string interpolation if it's not going to just 
throw it away.

(I didn't look up the syntax for slf4j here.)


http://gerrit.cloudera.org:8080/#/c/8825/2/fe/src/main/java/org/apache/impala/util/TByteBuffer.java
File fe/src/main/java/org/apache/impala/util/TByteBuffer.java:

http://gerrit.cloudera.org:8080/#/c/8825/2/fe/src/main/java/org/apache/impala/util/TByteBuffer.java@26
PS2, Line 26:  * ByteBuffer-backed implementation of TTransport.
https://github.com/apache/thrift/blame/master/lib/java/src/org/apache/thrift/transport/TByteBuffer.java
https://issues.apache.org/jira/browse/THRIFT-3432

Looks like Thrift has added this. We've got other work to upgrade to a more 
recent version.

The implementations are slightly different; let's make sure ours makes sense?



--
To view, visit http://gerrit.cloudera.org:8080/8825
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a8819cad734b3a416eef6c954e55b73cc6023ae
Gerrit-Change-Number: 8825
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Comment-Date: Thu, 14 Dec 2017 21:52:37 +0000
Gerrit-HasComments: Yes

Reply via email to