[GitHub] [helix] NealSun96 commented on a change in pull request #731: Add TrieRoutingData constructor

GitBox Fri, 07 Feb 2020 12:08:18 -0800

NealSun96 commented on a change in pull request #731: Add TrieRoutingData constructor
URL: https://github.com/apache/helix/pull/731#discussion_r376584035


 ##########
 File path: helix-rest/src/main/java/org/apache/helix/rest/metadatastore/TrieRoutingData.java
 ##########
 @@ -124,8 +128,91 @@ private TrieNode findTrieNode(String path, boolean findLeafAlongPath)
     return curNode;
   }
 
-  // TODO: THE CLASS WILL BE CHANGED TO PRIVATE ONCE THE CONSTRUCTOR IS CREATED.
-  static class TrieNode {
+  /**
+   * Checks for the edge case when the only sharding key in provided routing data is the delimiter
+   * or an empty string. When this is the case, the trie is valid and contains only one node, which
+   * is the root node, and the root node is a leaf node with a realm address associated with it.
+   * @param routingData - a mapping from "sharding keys" to "realm addresses" to be parsed into a
+   *          trie
+   * @return whether the edge case is true
+   */
+  private boolean isRootShardingKey(Map<String, List<String>> routingData) {
+    if (routingData.values().size() == 1) {
+      for (List<String> shardingKeys : routingData.values()) {
+        return shardingKeys.size() == 1
+            && (shardingKeys.get(0).equals(DELIMITER) || shardingKeys.get(0).equals(""));
+      }
+    }
+
+    return false;
+  }
+
+  /**
+   * Constructs a trie based on the provided routing data. It loops through all sharding keys and
+   * construct the trie in a top down manner.
+   * @param routingData- a mapping from "sharding keys" to "realm addresses" to be parsed into a
+   *          * trie
+   * @throws InvalidRoutingDataException - when there is an empty sharding key (edge case that
+   *           always renders the routing data invalid); when there is a sharding key which already
+   *           contains a sharding key (invalid); when there is a sharding key that is a part of
+   *           another sharding key (invalid)
+   */
+  private void constructTrie(Map<String, List<String>> routingData)
+      throws InvalidRoutingDataException {
+    for (Map.Entry<String, List<String>> entry : routingData.entrySet()) {
+      for (String shardingKey : entry.getValue()) {
+        // Add a leading delimiter if there isn't any
+        if (!shardingKey.substring(0, 1).equals(DELIMITER)) {
+          shardingKey = DELIMITER + shardingKey;
+        }
+
+        // Root can only be a sharding key if it's the only sharding key. Since this method is
+        // running, the special case has already been checked, therefore it's definitely invalid
+        if (shardingKey.equals(DELIMITER)) {
+          throw new InvalidRoutingDataException(
+              "There exists other sharding keys. Root cannot be a sharding key.");
+        }
+
+        // Locate the next delimiter
+        int nextDelimiterIndex = shardingKey.indexOf(DELIMITER, 1);
 
 Review comment:
   After an offline discussion with @pkuwm, I've realized that the "storing sharding key prefixes in nodes" approach is not as good as I used to think, so I'm opening up a discussion on it. Here is a quick comparison of the two approaches:
   | | Store sharding key prefixes in nodes ("/a", "/a/b", "/a/b/c") | Store sharding key sections in nodes ("/a", "/b", "/c") |
   |---|---|---|
   | Efficiency to return full sharding keys in `getAllMappingUnderPath` | O(1) accessing operations | O(N) joining operations; O(1) to add/remove key sections while traversing the trie; O(N) size map used to keep track whether a trie node has been visited |
   | Memory consumption to store the keys in nodes | Strictly larger than the combined sizes of all keys (usually several times larger) | Equal or lower than the combined sizes of all keys |
   | Efficiency to construct the trie | O(N) to get the prefix substring of sharding keys | O(1) since we already have the key sections |
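
To make the "key sections" column concrete, here is a minimal sketch of that approach. It is not the code in this PR: the class `SectionTrieSketch`, its `Node` type, and the simplified `insert` (no validation) are hypothetical names chosen for illustration. Each node keeps only its own section, and full sharding keys are rebuilt with a shared `StringBuilder` while traversing.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the PR's TrieRoutingData: each node stores only its
// own key section ("a", "b", "c"), never the full prefix.
class SectionTrieSketch {
  private static final String DELIMITER = "/";

  private static final class Node {
    final Map<String, Node> children = new TreeMap<>();
    String realmAddress; // non-null only when a sharding key ends at this node
  }

  private final Node root = new Node();

  // Inserts a sharding key such as "/a/b/c". Validation is omitted for brevity.
  void insert(String shardingKey, String realmAddress) {
    Node cur = root;
    for (String section : shardingKey.split(DELIMITER)) {
      if (section.isEmpty()) {
        continue; // skip the empty token produced by the leading delimiter
      }
      cur = cur.children.computeIfAbsent(section, s -> new Node());
    }
    cur.realmAddress = realmAddress;
  }

  // Returns every (full sharding key -> realm address) pair under the path.
  Map<String, String> getAllMappingUnderPath(String path) {
    Map<String, String> result = new TreeMap<>();
    Node cur = root;
    StringBuilder prefix = new StringBuilder();
    for (String section : path.split(DELIMITER)) {
      if (section.isEmpty()) {
        continue;
      }
      cur = cur.children.get(section);
      if (cur == null) {
        return result; // the path does not exist in the trie
      }
      prefix.append(DELIMITER).append(section);
    }
    collect(cur, prefix, result);
    return result;
  }

  // Depth-first traversal: append a section on the way down, truncate on the
  // way back up, and materialize the full key only where a key ends.
  private void collect(Node node, StringBuilder prefix, Map<String, String> result) {
    if (node.realmAddress != null) {
      result.put(prefix.toString(), node.realmAddress);
    }
    for (Map.Entry<String, Node> child : node.children.entrySet()) {
      int mark = prefix.length();
      prefix.append(DELIMITER).append(child.getKey());
      collect(child.getValue(), prefix, result);
      prefix.setLength(mark);
    }
  }
}
```

In this sketch the append and truncate steps correspond to the "add/remove key sections while traversing" cost in the table, and `prefix.toString()` is the joining cost paid once per returned key. A recursive traversal is used here, so no separate visited map is needed; an iterative version might need one.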
   
   Several additional points:
   
   1. The efficiency of `getAllMappingUnderPath` is more important than the efficiency of construction for two reasons: (a) `getAllMappingUnderPath` is likely called more frequently than the trie is constructed; (b) its efficiency matters more because it serves a REST endpoint.
   2. In the discussion with @pkuwm, a question came up: if we are storing sharding key prefixes, why not just use a hashmap instead, since a hashmap consumes less memory? That argument does not hold, because a hashmap provides slower string searching than a trie. Even if sharding key prefixes are stored, a trie is still better.
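
As a rough illustration of point 2, the sketch below shows what an "all mappings under a path" lookup could look like on a flat hashmap of full sharding keys. The class `FlatMapLookupSketch` and its method are hypothetical and are not part of this PR. The assumption being illustrated is that a hashmap has no notion of key prefixes, so the lookup has to compare the path against every stored key.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the hashmap alternative: full sharding keys are
// stored flat, so a lookup under a path must scan every entry.
class FlatMapLookupSketch {
  private static final String DELIMITER = "/";

  static Map<String, String> getAllMappingUnderPath(
      Map<String, String> keyToRealm, String path) {
    Map<String, String> result = new TreeMap<>();
    // Match on a delimiter boundary so that "/a/b" does not match "/a/bc".
    String withDelimiter = path.endsWith(DELIMITER) ? path : path + DELIMITER;
    for (Map.Entry<String, String> entry : keyToRealm.entrySet()) {
      String key = entry.getKey();
      if (key.equals(path) || key.startsWith(withDelimiter)) {
        result.put(key, entry.getValue());
      }
    }
    return result;
  }
}
```

Every call performs one string comparison per stored key, regardless of how many keys actually sit under the path, whereas a trie only walks the nodes along the path and beneath it.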
   
   Please provide your opinion and I'll make a decision based on the 
discussion. Thanks! @dasahcc @narendly @pkuwm 

