Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18198 )

Change subject: WIP KUDU-3197 [tserver] optimal Schema's memory used, using 
std::shared_ptr
......................................................................


Patch Set 5:

(9 comments)

I started going through the non-test files here. My overarching thoughts are 
that this patch could be significantly less invasive if we just focus on the 
problem at hand: the Schemas in TabletMetadata::old_schemas_, rather than 
Schemas as a whole.

http://gerrit.cloudera.org:8080/#/c/18198/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18198/5//COMMIT_MSG@18
PS5, Line 18: wangningito has proposed a simple solution at 
https://gerrit.cloudera.org/c/16508/.
            : And I continue his work, and the detail at: 
https://gerrit.cloudera.org/c/18098/,
            : But the changes has been thought too much, and code reviewers 
advise changing from
            : scoped_refptr<Schema> to std::shared_ptr.
I also left a comment on that ticket mentioning a pretty important detail:

https://issues.apache.org/jira/browse/KUDU-3197?focusedCommentId=17203521&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17203521

The schemas that are affected by this bug are only the ones used and passed 
around by TabletMetadata::schema(). If that is the case, would it make sense to 
just start there, updating the callers of TabletMetadata::schema() to use 
shared_ptr<>, and leaving the rest of our usages of Schema untouched?

That is, the general usage of Schemas today is already safe, and it's only the 
schemas in TabletMetadata::old_schemas_ that are ever affected by this bug. I 
appreciate the attention across the codebase to fixing this, but something 
targeted seems more appropriate and less likely to introduce a performance 
regression (right now, we may be doing more heap allocation than before).


http://gerrit.cloudera.org:8080/#/c/18198/5/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

http://gerrit.cloudera.org:8080/#/c/18198/5/src/kudu/client/client-test.cc@4713
PS5, Line 4713: Schema& schema = *schema_ptr;
In general, the old code just made a copy of the Schema; if you want, you could 
do the same here and elsewhere, for example:

 Schema schema = *tablet_replica->tablet()->metadata()->schema();


http://gerrit.cloudera.org:8080/#/c/18198/5/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/18198/5/src/kudu/common/generic_iterators.cc@518
PS5, Line 518:   // Initialized during Init.
             :   SchemaPtr schema_;
The problem statement for this ticket only refers to the schemas owned by the 
TabletMetadata, if I understand correctly. Since this schema is wholly owned by 
this iterator, we should be fine leaving it as it was. Same below.


http://gerrit.cloudera.org:8080/#/c/18198/5/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:

http://gerrit.cloudera.org:8080/#/c/18198/5/src/kudu/consensus/log.cc@771
PS5, Line 771: SchemaPtr schema_clone(new Schema(*schema.get()));
Why not pass the smart pointer directly? Doesn't this mean that we now have two 
heap-allocated schemas?

If we're not going to share references to the same schema object, we may as 
well leave this to be copied on the stack as it was before (i.e. using Schema 
instead of SchemaPtr).

Same elsewhere in this file.


http://gerrit.cloudera.org:8080/#/c/18198/5/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18198/5/src/kudu/master/catalog_manager.cc@2469
PS5, Line 2469:   SchemaPtr cur_schema_ptr = std::make_shared<Schema>();
              :   Schema& cur_schema = *cur_schema_ptr.get();
Again, it seems these changes are unrelated to the schemas referred to in 
KUDU-3197. We should leave them as is.


http://gerrit.cloudera.org:8080/#/c/18198/5/src/kudu/master/sys_catalog.cc
File src/kudu/master/sys_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/18198/5/src/kudu/master/sys_catalog.cc@342
PS5, Line 342: schema_ptr
Seems like it would be less intrusive to have call-sites use Schema as they did 
before, but have CreateNew() convert it to a std::shared_ptr<Schema>, as done 
here. What do you think?


http://gerrit.cloudera.org:8080/#/c/18198/5/src/kudu/master/sys_catalog.cc@786
PS5, Line 786: schema_ptr
Seems unnecessary to use this, given the iterator already makes a copy and thus 
has full control over the life cycle of its schema. Here we're doing the same, 
but heap-allocating the schema.


http://gerrit.cloudera.org:8080/#/c/18198/5/src/kudu/tablet/cfile_set.h
File src/kudu/tablet/cfile_set.h:

http://gerrit.cloudera.org:8080/#/c/18198/5/src/kudu/tablet/cfile_set.h@87
PS5, Line 87: SchemaPtr
Following the trend of iterators owning their schemas, I think we can leave 
this as a raw pointer, since the Schema it points to isn't the one owned by the 
TabletMetadata


http://gerrit.cloudera.org:8080/#/c/18198/5/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/18198/5/src/kudu/tablet/delta_tracker.cc@224
PS5, Line 224:   opts.projection = projection;
Without this patch, it seems like this projection is already reference counted 
(via unique_ptr) and fully owned by the iterator. That seems to be safe as is.



--
To view, visit http://gerrit.cloudera.org:8080/18198
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I586e3a67791de4006c3479b39f3222cbf93fac28
Gerrit-Change-Number: 18198
Gerrit-PatchSet: 5
Gerrit-Owner: Yuqi Du <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 05 Feb 2022 00:39:55 +0000
Gerrit-HasComments: Yes

Reply via email to