Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/12119 )
Change subject: [blog] a blogpost about location awareness in Kudu ...................................................................... Patch Set 8: (28 comments) Thank you for the thorough review. http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md File _posts/2019-03-25-location-awareness.md: http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@1 PS8, Line 1: --- > Can you push this to the gh_pages branch in your github fork so a rendered Done: https://github.com/alexeyserbin/kudu/blob/gh-pages/_posts/2019-03-25-location-awareness.md http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@15 PS8, Line 15: <!--TODO(aserbin) rename the file to reflect the date when published --> > Should this be removed? I'll remove it once I'm getting +1 or +2 on the latest patchset. http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@19 PS8, Line 19: first cut > initial implementation? Done http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@19 PS8, Line 19: starting 1.9.0 > ...starting *with the* 1.9.0... Done http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@20 PS8, Line 20: is built for the following use case: > I am not sure this is a "use case" per se, but instead what the term "locat I rephrased it using term 'requirement'. Does it sound better now? http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@26 PS8, Line 26: A rack failure might happen because of a failure of a hardware component shared : among servers in the rack: network switch, power supply, etc. > A rack failure can occur when a hardware component shared Done http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@31 PS8, Line 31: network latency between datacenters is low. > This is a good opportunity to explicitly mention that this is why we call t Done http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@37 PS8, Line 37: are : supposed to > should Done http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@38 PS8, Line 38: physical or cloud-defined hierarchy of the : deployed cluster > I am not sure I understand what this means in relation to location awarenes I recall trying to introduce 'fault domain' term in this blog , I got feedback that we better not introduce it since 1) readers might be not aware of that concept, 2) we don't use that term in how design doc. That's why you don't see 'failure domain' or 'fault domain' here at all. If you feel strong about the necessity to speak in 'proper' terms here, let me know and I'll introduce those terms with corresponding references. As of now, I just added the examples you suggested. http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@41 PS8, Line 41: However, we want to keep the hierarchy : there to make it possible to exploit it later > However, we plan to leverage the hierarchical structure in future releases. Done http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@43 PS8, Line 43: compatibility with HDFS > Perhaps this should be moved up and describe a bit more in detail as a desi Done http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@52 PS8, Line 52: etc > What is the "etc"? What else does it use it for? The text below contains information about what those locations are used for, so 'etc' here is for: * client finding the closest tablet server to its own location * other usages of the location hierarchy which we haven't leveraged yet I removed the 'etc'. Does it look better to you now? http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@55 PS8, Line 55: location string for the specified IP address/hostname. > The script below specifically shows ip-address. How do I use hostname? The script below is an example. You are free to use hostname to perform any flavor of mapping 'hostname' --> 'location string' you want. I added corresponding note into the example's comment. http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@59 PS8, Line 59: tablet server restarts > Is this dependent on `--follower_unavailable_considered_failed_sec`? Or wil It's not dependent of '--follower_unavailable_considered_failed_sec'. That's about registering tablet servers, not tracking tablet replicas. Any restart of a tablet server will cause assignment of location to the tablet server. http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@59 PS8, Line 59: Kudu tablet servers are location : agnostic, at least for now, so the assigned location is not reported back : to the tablet server. > Maybe this paragraph would flow better if you moved this part to the bottom Done http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@64 PS8, Line 64: masters provide connected clients > How do they do this? I added a reference to the design doc. http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@62 PS8, Line 62: to try to place replicas evenly across : locations and to keep tablets available in case all tablet servers in a single : location fail. > This last part is somewhat duplicated from the Introduction section above. I want to keep it here because it's a bit different context, and I think this is some useful sort of 'duplication' that allows to grasp the presented information better. http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@75 PS8, Line 75: Essentially, that's about having > This results in... Done http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@81 PS8, Line 81: The error handling and the input validation are minimalistic. Also, the : # network topology choice, supportability and capacity planning aspects of : # this script might be sub-optimal if applied as-is for real-world use cases. > Is there anywhere else anyone can get a "good" production worthy example? I I don't think we should aim for a 'good' production worthy example in this blog because too many error handling and other details will make it harder to read and understand. This is an example, and the deficiencies of this script are outlined. Readers interested in extending this example and making it closer to 'good' (what's 'good', BTW?) production-grade script can do that to their taste. http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@104 PS8, Line 104: echo "ERROR: '$ip_address' is not a valid IPv4 address" > Should errors map to "/other"? How does Kudu handle this script exiting wit I think errors should map to errors/failures. That way it's simpler to track and fix them. If the location mapping script returns an error, the tablet server's registration fails and the tablet server is not added into the master's registry, making it inaccessible to clients. I added information on what happens if location assignment script fails into the text of the blog and into this example script. http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@142 PS8, Line 142: The reasoning is simple: with > I try to stay away from saying something is "simple". People have wide leve Done http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@151 PS8, Line 151: The location-aware placement policy for tablet replicas in Kudu > This seems more appropriate for earlier sections. When reading the blog pos Indeed. I re-shuffled the paragraphs and explicitly mentioned that the policy is the only available one and it doesn't have any parameters to configure. http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@162 PS8, Line 162: Automatic re-replication and placement policy > Per my earlier comment, this is also more about "How it works". Done http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@177 PS8, Line 177: Reinstating location-aware policy in Kudu cluster > I think this is "How to use it" and makes sense here. Ack http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@208 PS8, Line 208: Examples > Per my earlier comment, this is also more about "How it works". Done http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@337 PS8, Line 337: roadmap > What roadmap? Does Apache Kudu have a roadmap? I think the same roadmap that people refer when asking questions like 'are you going to implement feature A any soon?'. There is a request to implement some sort of 'tagging' as JIRA item at issues.apache.org. I think I'll just add this here. http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@342 PS8, Line 342: see [2] > Any reason not to link inline instead of using reference style? Nothing particular, but I was hoping to get it referenced as a link in [2] as well. Probably, I confused .md format with something else. http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@346 PS8, Line 346: [[1]] [Location awareness in Kudu, design document](https://s.apache.org/location-awareness-design) > Can we check this design doc into https://github.com/apache/kudu/tree/maste Yes, but it will take some time to convert the design doc into .md format and pass it through review cycles. It should be done anyway, so I'll try to do so shortly. -- To view, visit http://gerrit.cloudera.org:8080/12119 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I10b30a80d5661fb889a11285b8118cdea1a87cd2 Gerrit-Change-Number: 12119 Gerrit-PatchSet: 8 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Greg Solovyev <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Tue, 26 Mar 2019 18:06:02 +0000 Gerrit-HasComments: Yes
