Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12119 )

Change subject: [blog] a blogpost about location awareness in Kudu
......................................................................


Patch Set 8:

(28 comments)

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md
File _posts/2019-03-25-location-awareness.md:

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@1
PS8, Line 1: ---
Can you push this to the gh-pages branch in your GitHub fork so a rendered version can be proofed?

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@15
PS8, Line 15: <!--TODO(aserbin) rename the file to reflect the date when published -->
Should this be removed?

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@19
PS8, Line 19: first cut
initial implementation?

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@19
PS8, Line 19: starting 1.9.0
...starting *with the* 1.9.0...

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@20
PS8, Line 20: is built for the following use case:
I am not sure this is a "use case" per se, but rather what the term "location awareness" currently means in Kudu. Maybe say something like:

"In the initial implementation of location awareness in Kudu, when a Kudu cluster consists of multiple servers spread across several racks, Kudu will place the replicas of a tablet in such a way that the tablet stays available even if all the servers in a single rack become unavailable."

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@26
PS8, Line 26: A rack failure might happen because of a failure of a hardware component shared
            : among servers in the rack: network switch, power supply, etc.
A rack failure can occur when a hardware component shared among servers in the rack, such as a network switch or power supply, fails.

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@31
PS8, Line 31: network latency between datacenters is low.
This is a good opportunity to explicitly mention that this is why we call the feature location awareness and not rack awareness.

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@37
PS8, Line 37: are
            : supposed to
should

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@38
PS8, Line 38: physical or cloud-defined hierarchy of the
            : deployed cluster
I am not sure I understand what this means in relation to location awareness utility. I suspect it's saying that the components should map to the hierarchical levels of "failure domains". You could then give a private data center example:

  `/data-center-0/rack-09`

And a cloud example:

  `/region-0/availability-zone-01`

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@41
PS8, Line 41: However, we want to keep the hierarchy
            : there to make it possible to exploit it later
However, we plan to leverage the hierarchical structure in future releases.

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@43
PS8, Line 43: compatibility with HDFS
Perhaps this should be moved up and described in a bit more detail as a design choice? It's useful to know that you can use the same locations as your HDFS nodes, because it's common to deploy Kudu alongside HDFS.

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@52
PS8, Line 52: etc
What is the "etc"? What else does it use it for?

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@55
PS8, Line 55: location string for the specified IP address/hostname.
The script below specifically shows an IP address. How do I use a hostname?
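If the mapping command can receive either form of address, the script could match on both forms. Below is a minimal POSIX sh sketch. The inventory is entirely hypothetical: the 10.0.x.* subnets, the ts-*.example.com hostnames, and the /dc-0/rack-NN locations are all made up. Sending unrecognized hosts to "/other" is only one possible choice and is not something Kudu prescribes.

```shell
#!/bin/sh
# Hypothetical location mapping script (illustrative sketch only).
# Prints a location string for the host given as the first argument,
# which may be either an IPv4 address or a hostname.
# The subnets, hostname patterns, and locations below are made up;
# any host that matches no pattern is assigned the assumed
# catch-all location "/other".

map_location() {
  case "$1" in
    10.0.0.*|ts-a*.example.com) echo "/dc-0/rack-00" ;;
    10.0.1.*|ts-b*.example.com) echo "/dc-0/rack-01" ;;
    10.0.2.*|ts-c*.example.com) echo "/dc-0/rack-02" ;;
    *) echo "/other" ;;
  esac
}

# Print the location only when called with an argument.
if [ "$#" -ge 1 ]; then
  map_location "$1"
fi
```

For example, invoking it with `ts-b2.example.com` would print `/dc-0/rack-01`, the same location as any address in 10.0.1.*.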

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@59
PS8, Line 59: tablet server restarts
Is this dependent on `--follower_unavailable_considered_failed_sec`? Or will a "quick" restart cause the location to be reset?

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@59
PS8, Line 59: Kudu tablet servers are location
            : agnostic, at least for now, so the assigned location is not reported back
            : to the tablet server.
Maybe this paragraph would flow better if you moved this part to the bottom. That way you first describe how the master uses the location configuration, and then mention at the end that the tablet servers do not need/use it.

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@64
PS8, Line 64: masters provide connected clients
How do they do this?

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@62
PS8, Line 62: to try to place replicas evenly across
            : locations and to keep tablets available in case all tablet servers in a single
            : location fail.
This last part somewhat duplicates the Introduction section above. Perhaps it's not needed here.

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@75
PS8, Line 75: Essentially, that's about having
This results in...

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@81
PS8, Line 81: The error handling and the input validation are minimalistic. Also, the
            : # network topology choice, supportability and capacity planning aspects of
            : # this script might be sub-optimal if applied as-is for real-world use cases.
Is there anywhere else readers can get a "good," production-worthy example? If not from us, then from whom? This leaves the reader with a lot of concerning questions.

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@104
PS8, Line 104: echo "ERROR: '$ip_address' is not a valid IPv4 address"
Should errors map to "/other"? How does Kudu handle this script exiting with a non-zero exit code?

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@142
PS8, Line 142: The reasoning is simple: with
I try to stay away from saying something is "simple". People have widely varying levels of experience with distributed systems. Maybe something like:

"It's recommended to have at least three locations defined in a Kudu cluster so that no location contains a majority of replicas of a tablet."

Then below you can mention the replication factor of 3 in your example.

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@151
PS8, Line 151: The location-aware placement policy for tablet replicas in Kudu
This seems more appropriate for earlier sections. When reading the blog post, I got the idea that the structure was:

- What it is
- How it works
- How to use it
- Future work

We are now in the "How to use it" part, but this is more about how it works. Can users configure these policies? Is there more than one?

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@162
PS8, Line 162: Automatic re-replication and placement policy
Per my earlier comment, this is also more about "How it works".

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@177
PS8, Line 177: Reinstating location-aware policy in Kudu cluster
I think this is "How to use it" and makes sense here.

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@208
PS8, Line 208: Examples
Per my earlier comment, this is also more about "How it works".

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@337
PS8, Line 337: roadmap
What roadmap? Does Apache Kudu have a roadmap?
Maybe we should open JIRAs and link them for any future work/ideas.

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@342
PS8, Line 342: see [2]
Any reason not to link inline instead of using reference style?

http://gerrit.cloudera.org:8080/#/c/12119/8/_posts/2019-03-25-location-awareness.md@346
PS8, Line 346: [[1]] [Location awareness in Kudu, design document](https://s.apache.org/location-awareness-design)
Can we check this design doc into https://github.com/apache/kudu/tree/master/docs/design-docs and link there?



-- 
To view, visit http://gerrit.cloudera.org:8080/12119
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-MessageType: comment
Gerrit-Change-Id: I10b30a80d5661fb889a11285b8118cdea1a87cd2
Gerrit-Change-Number: 12119
Gerrit-PatchSet: 8
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Greg Solovyev <gsolov...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 04:05:31 +0000
Gerrit-HasComments: Yes