NealSun96 commented on a change in pull request #731: Add TrieRoutingData
constructor
URL: https://github.com/apache/helix/pull/731#discussion_r376584035
##########
File path:
helix-rest/src/main/java/org/apache/helix/rest/metadatastore/TrieRoutingData.java
##########
@@ -124,8 +128,91 @@ private TrieNode findTrieNode(String path, boolean findLeafAlongPath)
return curNode;
}
- // TODO: THE CLASS WILL BE CHANGED TO PRIVATE ONCE THE CONSTRUCTOR IS CREATED.
- static class TrieNode {
+ /**
+ * Checks for the edge case when the only sharding key in provided routing data is the delimiter
+ * or an empty string. When this is the case, the trie is valid and contains only one node, which
+ * is the root node, and the root node is a leaf node with a realm address associated with it.
+ * @param routingData - a mapping from "sharding keys" to "realm addresses" to be parsed into a
+ * trie
+ * @return whether the edge case is true
+ */
+ private boolean isRootShardingKey(Map<String, List<String>> routingData) {
+ if (routingData.values().size() == 1) {
+ for (List<String> shardingKeys : routingData.values()) {
+ return shardingKeys.size() == 1
+ && (shardingKeys.get(0).equals(DELIMITER) || shardingKeys.get(0).equals(""));
+ }
+ }
+
+ return false;
+ }
+
+ /**
+ * Constructs a trie based on the provided routing data. It loops through all sharding keys and
+ * construct the trie in a top down manner.
+ * @param routingData- a mapping from "sharding keys" to "realm addresses" to be parsed into a
+ * * trie
+ * @throws InvalidRoutingDataException - when there is an empty sharding key (edge case that
+ * always renders the routing data invalid); when there is a sharding key which already
+ * contains a sharding key (invalid); when there is a sharding key that is a part of
+ * another sharding key (invalid)
+ */
+ private void constructTrie(Map<String, List<String>> routingData)
+ throws InvalidRoutingDataException {
+ for (Map.Entry<String, List<String>> entry : routingData.entrySet()) {
+ for (String shardingKey : entry.getValue()) {
+ // Add a leading delimiter if there isn't any
+ if (!shardingKey.substring(0, 1).equals(DELIMITER)) {
+ shardingKey = DELIMITER + shardingKey;
+ }
+
+ // Root can only be a sharding key if it's the only sharding key. Since this method is
+ // running, the special case has already been checked, therefore it's definitely invalid
+ if (shardingKey.equals(DELIMITER)) {
+ throw new InvalidRoutingDataException(
+ "There exists other sharding keys. Root cannot be a sharding key.");
+ }
+
+ // Locate the next delimiter
+ int nextDelimiterIndex = shardingKey.indexOf(DELIMITER, 1);
Review comment:
After an offline discussion with @pkuwm, I've realized that the "storing
sharding key prefixes in nodes" approach is not as good as I used to think,
so I'm opening up a discussion on this. A quick analysis of the two
approaches:
| | Store sharding key prefixes in nodes ("/a", "/a/b", "/a/b/c") | Store sharding key sections in nodes ("/a", "/b", "/c") |
|---|---|---|
| Efficiency to return full sharding keys in `getAllMappingUnderPath` | O(1) accessing operations | O(N) joining operations; O(1) to add/remove key sections while traversing the trie; O(N) size map used to keep track whether a trie node has been visited |
| Memory consumption to store the keys in nodes | Strictly larger than the combined sizes of all keys (usually several times larger) | Equal or lower than the combined sizes of all keys |
| Efficiency to construct the trie | O(N) to get the prefix substring of sharding keys | O(1) since we already have the key sections |
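To make the "key sections" column concrete, here is a minimal sketch of that approach, not the code in this PR. The class `SectionTrieSketch`, its `Node` type, and the `insert` helper are hypothetical names; only `getAllMappingUnderPath` and `DELIMITER` come from the PR. It assumes one realm address per key, skips all validation, and rebuilds full sharding keys with a shared `StringBuilder` during a recursive traversal, which is one possible way to do the joining.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class SectionTrieSketch {
  static final String DELIMITER = "/";

  // Each node is reached by its own key section (e.g. "b"), not the full prefix ("/a/b").
  static class Node {
    final Map<String, Node> children = new HashMap<>();
    String realmAddress; // non-null only on nodes that end a complete sharding key
  }

  private final Node root = new Node();

  // Hypothetical helper: inserts a key such as "/a/b/c" without any validation.
  public void insert(String shardingKey, String realmAddress) {
    Node cur = root;
    for (String section : shardingKey.split(DELIMITER)) {
      if (section.isEmpty()) {
        continue; // skip the empty piece in front of the leading delimiter
      }
      cur = cur.children.computeIfAbsent(section, s -> new Node());
    }
    cur.realmAddress = realmAddress;
  }

  // Returns every "full sharding key -> realm address" pair at or below the given path.
  public Map<String, String> getAllMappingUnderPath(String path) {
    Map<String, String> result = new TreeMap<>();
    Node cur = root;
    StringBuilder prefix = new StringBuilder();
    for (String section : path.split(DELIMITER)) {
      if (section.isEmpty()) {
        continue;
      }
      cur = cur.children.get(section);
      if (cur == null) {
        return result; // the path does not exist in the trie
      }
      prefix.append(DELIMITER).append(section);
    }
    collect(cur, prefix, result);
    return result;
  }

  // Depth-first walk that appends a section on the way down and removes it on the way up.
  private void collect(Node node, StringBuilder prefix, Map<String, String> out) {
    if (node.realmAddress != null) {
      out.put(prefix.toString(), node.realmAddress); // the joining cost is paid here
    }
    for (Map.Entry<String, Node> child : node.children.entrySet()) {
      int mark = prefix.length();
      prefix.append(DELIMITER).append(child.getKey());
      collect(child.getValue(), prefix, out);
      prefix.setLength(mark);
    }
  }

  public static void main(String[] args) {
    SectionTrieSketch trie = new SectionTrieSketch();
    trie.insert("/a/b/c", "realm1");
    trie.insert("/a/b/d", "realm1");
    trie.insert("/a/e", "realm2");
    System.out.println(trie.getAllMappingUnderPath("/a/b"));
  }
}
```

In this sketch each node holds only one section, so the full key has to be materialized with `prefix.toString()` at every leaf; a prefix-storing node would instead return a string it already holds.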
Several additional points:
1. The efficiency of `getAllMappingUnderPath` is more important than the
efficiency of construction for two reasons: (a) `getAllMappingUnderPath` is
likely called more often than the trie is constructed; (b) its efficiency
matters more because it serves a REST endpoint.
2. In the discussion with @pkuwm, we came to a conclusion: if we are
storing sharding key prefixes, why don't we just use a hashmap instead,
because a hashmap consumes less memory? This reasoning is false, because a
hashmap provides slower string searching than a trie. Even if sharding key
prefixes are stored, a trie is still better.
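For reference, one way to read the hashmap alternative in point 2 is sketched below. This is an illustration under assumptions, not code from the PR: `HashMapLookupSketch` and `findShardingKey` are hypothetical names, and the map is assumed to hold full sharding keys as its keys. To find the sharding key along a path, the hashmap version has to build and hash every delimiter-bounded prefix of the path, whereas a trie walks the path once, one section at a time.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapLookupSketch {
  static final String DELIMITER = "/";

  // Returns the sharding key that is a delimiter-bounded prefix of path, or null if none.
  public static String findShardingKey(Map<String, String> keyToRealm, String path) {
    int end = path.indexOf(DELIMITER, 1);
    while (true) {
      // Each probe creates a new substring and hashes all of its characters.
      String candidate = (end == -1) ? path : path.substring(0, end);
      if (keyToRealm.containsKey(candidate)) {
        return candidate;
      }
      if (end == -1) {
        return null; // the whole path was tried without a match
      }
      end = path.indexOf(DELIMITER, end + 1);
    }
  }

  public static void main(String[] args) {
    Map<String, String> keyToRealm = new HashMap<>();
    keyToRealm.put("/a/b", "realm1");
    System.out.println(findShardingKey(keyToRealm, "/a/b/c/d"));
  }
}
```

Whether this difference matters in practice depends on path depth and key count, which the discussion above does not quantify.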
Please provide your opinion and I'll make a decision based on the
discussion. Thanks! @dasahcc @narendly @pkuwm
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------