I like to use URL encoding. That way I can use the JDK's URLEncoder.

====================
Jordan Zimmerman
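A minimal sketch of the URL-encoding approach, assuming each resource name becomes one znode under a parent such as /locks (hypothetical), so the reserved top-level "zookeeper" name never comes into play. ZnodeNames, encode and decode are illustrative names, not part of ZooKeeper or any library. URLEncoder percent-encodes "/" (as %2F) and "%" itself, so for well-formed strings the mapping is reversible and two different names cannot collide. It also encodes characters outside the 16-bit range as their full UTF-8 byte sequence. One gap: URLEncoder leaves "." unchanged, and ZooKeeper rejects "." and ".." as node names, so the sketch escapes those two cases by hand.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical helper: maps an arbitrary resource name to a single znode name and back. */
public final class ZnodeNames {
    private static final String UTF_8 = "UTF-8";

    private ZnodeNames() {}

    public static String encode(String resourceName) {
        try {
            // Percent-encodes "/", "%", and any non-ASCII character as UTF-8 bytes; space becomes "+".
            String encoded = URLEncoder.encode(resourceName, UTF_8);
            // URLEncoder never emits "%2E" on its own, so escaping these two cases keeps names unique.
            if (encoded.equals(".")) {
                return "%2E";
            }
            if (encoded.equals("..")) {
                return "%2E%2E";
            }
            return encoded;
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    public static String decode(String znodeName) {
        try {
            // Reverses encode(); "%2E" decodes back to ".".
            return URLDecoder.decode(znodeName, UTF_8);
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }
}
```

For example, encode("/var/lock/my file") gives %2Fvar%2Flock%2Fmy+file, which is still fairly readable when browsing the tree, unlike a hex dump.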
On Jul 6, 2012, at 9:11 AM, David Nickerson wrote:

> I'm writing a distributed locking API based on ZooKeeper. I create nodes
> based on the resource names, but I have no control over what the client
> chooses as their resource name. (Quite often the client uses Linux file
> paths, so I have to remove or escape all of the forward slashes.)
>
> To clean the node names, I wrote a method that escapes the bad characters.
> The method is called 'normalize': http://pastebin.com/hakkb9Nw .
>
> For example, a forward slash becomes \x2f. This method works, but it has a
> few drawbacks. It doesn't handle Unicode characters larger than 16 bits,
> and it's impossible to reverse the escape process. Also, crucially, two
> different resources can end up with the same znode name, which could
> cause all kinds of trouble.
>
> A more reliable approach would be to convert the resource name into hex.
> For example:
>
>     import java.nio.charset.StandardCharsets;
>     import javax.xml.bind.DatatypeConverter;
>
>     DatatypeConverter.printHexBinary(string.getBytes(StandardCharsets.UTF_8))
>
> This would always result in a safe and unique node name. (It will never
> result in the token "zookeeper" because "zookeeper" has an odd number of
> characters.) The only problem with this is that it becomes impossible to
> read and understand the resource names from ZooKeeper unless you reverse
> the process:
>
>     new String(DatatypeConverter.parseHexBinary(hex), StandardCharsets.UTF_8)
>
> So I'm wondering, is there a standard or recommended practice for
> sanitizing znode names? If not, which approach would you recommend?
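For comparison, a sketch of the hex approach from the quoted message. javax.xml.bind (JAXB) is no longer bundled with the JDK from Java 11 on, so this version does the conversion by hand instead of calling DatatypeConverter; HexNames, toHex and fromHex are hypothetical names. It pins the charset to UTF-8 so that encoding and decoding agree regardless of the platform default, which also covers characters outside the 16-bit range.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: hex-encodes a resource name so it is always a safe, unique znode name. */
public final class HexNames {
    private static final char[] DIGITS = "0123456789ABCDEF".toCharArray();

    private HexNames() {}

    public static String toHex(String resourceName) {
        // Note: an empty resource name yields "", which ZooKeeper would reject as a node name.
        byte[] bytes = resourceName.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(DIGITS[(b >> 4) & 0x0F]); // high nibble
            sb.append(DIGITS[b & 0x0F]);        // low nibble
        }
        return sb.toString();
    }

    public static String fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string: " + hex);
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex string: " + hex);
            }
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

For example, toHex("/tmp") gives 2F746D70: safe and unique, but not something you can read at a glance in the tree.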
