Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8825 )

Change subject: IMPALA-5990: End-to-end compression of metadata
......................................................................


Patch Set 4:

(34 comments)

General approach and flow looks good to me.

I'm still thinking about details and testing.

http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/catalog/CMakeLists.txt
File be/src/catalog/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/catalog/CMakeLists.txt@29
PS4, Line 29: ADD_BE_TEST(catalog-util-test)
I think you forgot to add the .cc file


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/catalog/catalog-server.h@78
PS4, Line 78:   void AddTopicUpdate(std::string key, const uint8_t* buf, 
uint32_t size, bool deleted);
* AddPendingTopicItem()
(and corresponding change to comment, see a comment elsewhere explaining our 
use of "topic" and "item")
* Pass the key as reference? Seems clearer what is happening. Also comment that 
the function takes ownership of the key memory
* 'buf' needs a better name


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/catalog/catalog-server.cc@245
PS4, Line 245:       update.topic_entries.emplace_back();
Why not this instead of the loop:

update.topic_entries = std::move(pending_topic_updates_);


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/catalog/catalog-util.h
File be/src/catalog/catalog-util.h:

http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/catalog/catalog-util.h@44
PS4, Line 44: Status CompressCatalogObject(const uint8_t* buf, uint32_t size,
Use src/dst as arg names. Same for decompress below.


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/catalog/catalog-util.cc
File be/src/catalog/catalog-util.cc:

http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/catalog/catalog-util.cc@145
PS4, Line 145:   output->resize(static_cast<size_t>(output_buffer_len));
why static_cast?


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/catalog/catalog-util.cc@162
PS4, Line 162:   output->resize(static_cast<size_t>(decompressed_len));
why static_cast?


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/catalog/catalog.h
File be/src/catalog/catalog.h:

http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/catalog/catalog.h@64
PS4, Line 64:   Status GetCatalogDelta(void* server_ptr, int64_t from_version,
"CatalogServer* caller" instead of "void* server_ptr", seems clearer, you can 
forward declare class CatalogServer to avoid the include


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/fe-support.cc@429
PS4, Line 429:     jclass caller_class, jlong callback_ctx, jbyteArray key,
Currently, we get the key bytes via String.getBytes() which I believe must 
create a new byte[].

If we pass the key as a jstring we can avoid a copy in most cases.


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/fe-support.cc@432
PS4, Line 432:   signed char* key_buf = env->GetByteArrayElements(key, nullptr);
I think we should consider using the "critical" version of this function. See 
JniScopedArrayCritical from:
https://gerrit.cloudera.org/#/c/8150/6/be/src/util/jni-util.h

That patch never made it in, you could just copy the code.


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/fe-support.cc@446
PS4, Line 446: 
Java_org_apache_impala_service_FeSupport_NativeGetCatalogUpdate(JNIEnv* env,
NativeGetNextCatalogTopicItem() or similar

the name should convey that this is a "get next" type of call and what exactly 
is being iterated over


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/fe-support.cc@454
PS4, Line 454:     jclass caller_class, jbyteArray hdfs_location) {
why not "jstring hdfs_location"


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/fe-support.cc@465
PS4, Line 465:     jclass caller_class, jbyteArray hdfs_lib_file) {
why not "jstring hdfs_lib_file"


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/fe-support.cc@551
PS4, Line 551:       (char*)"NativeAddCatalogUpdate", (char*)"(J[B[BZ)V",
make indentation consistent


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/frontend.h
File be/src/service/frontend.h:

http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/frontend.h@41
PS4, Line 41:   /// Request to update the Impalad catalog cache. The req 
argument contains a vector of
Update comment on the 'req' arg


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/frontend.h@214
PS4, Line 214: // A helper class used to pass catalog object updates to the 
frontend.
/// comment style here and below


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/frontend.h@217
PS4, Line 217:   // Return the next catalog object update. The return type is 
Pair<Boolean, ByteBuffer>.
Return the next catalog object from a catalog update.


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/frontend.h@220
PS4, Line 220:   // return value is invalided on the next call.
Comment on error handling


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/frontend.cc
File be/src/service/frontend.cc:

http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/frontend.cc@274
PS4, Line 274:   jclass pair_class = 
env->FindClass("Lorg/apache/impala/common/Pair;");
Better to get the class and method ids once during a static initialization 
phase. You can keep them as static members like is done in other JNI-based 
helpers, e.g., hbase-table.h


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/impala-server.cc@1319
PS4, Line 1319:   class TopicItemIterator : public 
JniCatalogCacheUpdateIterator {
Let's move these and its subclasses into catalog-util.h/cc or frontend.h/cc

I could not find any guidance on this in the Google style guide. Personally, I 
find such inline class declarations obscure the rest of the function code and 
we generally don't have this pattern in the codebase.

We should also explain the motivation behind using these iterators as opposed 
to a simpler approach where we populate two ArrayLists in one JNI call. The 
benefit of the iterators is that they reduce the intermediate memory 
consumption, right?


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/impala-server.cc@1337
PS4, Line 1337:           LOG(ERROR) << "Error decompressing catalog object: " 
<< s.GetDetail();
I think we could deal with errors in two ways:
1. We could propagate them as exceptions in env which are picked up on the Java 
side. That makes it clear that the update process failed and that the catalog 
is in an inconsistent state.
2. Instead of returning on the first error, we could attempt to process as many 
objects as we can and then LOG(ERROR) a report of failed updates.

I have a slight preference for (2), but would also accept (1).

In the current code, we don't have a good way to determine whether a null 
return value means we have successfully process all objects or not. It's also 
not obvious from the log message that the entire update process is aborted.


http://gerrit.cloudera.org:8080/#/c/8825/4/be/src/service/impala-server.cc@1483
PS4, Line 1483:           LOG(ERROR) << "Error serializing catalog object: " << 
s.GetDetail();
Same comments about error handling as above


http://gerrit.cloudera.org:8080/#/c/8825/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/8825/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@395
PS4, Line 395:     long callbackContext_;
Maybe call this nativeCatalogServerPtr or similar to make it clear that this is 
a raw pointer to a native object.


http://gerrit.cloudera.org:8080/#/c/8825/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@397
PS4, Line 397:     long fromVersion_;
We only use the "_" suffix for private and protected members. I think it make 
sense to omit them here.


http://gerrit.cloudera.org:8080/#/c/8825/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@409
PS4, Line 409:     void addTopic(TCatalogObject obj, boolean delete) throws 
TException {
We typically use the term "topic" in the statestore topic sense, i.e. there is 
a metadata topic or a cluster membership topic. A clearer function name would 
be addTopicItem() or addItem() or addCatalogObject() or something like that.


http://gerrit.cloudera.org:8080/#/c/8825/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@418
PS4, Line 418:       byte[] data = new TSerializer(new 
TBinaryProtocol.Factory()).serialize(obj);
Do we need to create a new protocol and serializer each time this function is 
called? I think we can probably have the TSerializer as a member of this class.


http://gerrit.cloudera.org:8080/#/c/8825/4/fe/src/main/java/org/apache/impala/service/FeSupport.java
File fe/src/main/java/org/apache/impala/service/FeSupport.java:

http://gerrit.cloudera.org:8080/#/c/8825/4/fe/src/main/java/org/apache/impala/service/FeSupport.java@25
PS4, Line 25: import io.netty.buffer.ByteBuf;
remove?


http://gerrit.cloudera.org:8080/#/c/8825/4/fe/src/main/java/org/apache/impala/service/FeSupport.java@26
PS4, Line 26: import javolution.io.Struct;
remove?


http://gerrit.cloudera.org:8080/#/c/8825/4/fe/src/main/java/org/apache/impala/service/FeSupport.java@86
PS4, Line 86:   // Add a topic update to the backend pending topic updates. 
'serializationBuffer' is a
Adds a topic item to the backend's pending metadata-topic update.


http://gerrit.cloudera.org:8080/#/c/8825/4/fe/src/main/java/org/apache/impala/service/FeSupport.java@91
PS4, Line 91:   // Get a topic update from the backend.
// Get a catalog topic item from the backend,


http://gerrit.cloudera.org:8080/#/c/8825/4/fe/src/main/java/org/apache/impala/service/FeSupport.java@93
PS4, Line 93:   public native static Pair<Boolean, ByteBuffer> 
NativeGetCatalogUpdate(
NativeGetCatalogTopicItem()


http://gerrit.cloudera.org:8080/#/c/8825/4/fe/src/main/java/org/apache/impala/service/FeSupport.java@94
PS4, Line 94:       long callbackContext);
callbackContext sounds mysterious, how about nativeIteratorPtr or another name 
that makes it clear that this is a native pointer to an object (or more 
specifically an iterator)


http://gerrit.cloudera.org:8080/#/c/8825/4/fe/src/main/java/org/apache/impala/service/FeSupport.java@96
PS4, Line 96:   public native static void NativeLibCacheSetNeedsRefresh(byte[] 
hdfs_location);
* camel case for args here and below
* why is the location not a String (jstring in JNI)? Properly typing the arg 
makes the function easier to read/understand


http://gerrit.cloudera.org:8080/#/c/8825/4/fe/src/main/java/org/apache/impala/service/FeSupport.java@97
PS4, Line 97:
remove blank line


http://gerrit.cloudera.org:8080/#/c/8825/4/fe/src/main/java/org/apache/impala/service/FeSupport.java@98
PS4, Line 98:   public native static void NativeLibCacheRemoveEntry(byte[] 
hdfs_lib_file);
* camel case for arg
* why is the location not a String (jstring in JNI)? Properly typing the arg 
makes the function easier to read/understand



--
To view, visit http://gerrit.cloudera.org:8080/8825
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a8819cad734b3a416eef6c954e55b73cc6023ae
Gerrit-Change-Number: 8825
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Comment-Date: Wed, 10 Jan 2018 05:33:25 +0000
Gerrit-HasComments: Yes

Reply via email to