Worth noting here: the current Java client is entirely UTF-8 centric and explicitly converts those bytes to UTF-8 strings, so yes, that's probably the issue here, if I'm understanding things correctly.
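To illustrate why that kind of conversion is a problem for binary keys, here is a minimal sketch using only the JDK's own String decoding, not the client's code. The class name, method names, and the two sample 4-byte keys are made up for illustration; the only assumption is that a key contains bytes that are not valid UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    /** Decode raw key bytes as UTF-8, the way a String-based API would see them. */
    public static String decode(byte[] key) {
        // Malformed byte sequences are replaced with U+FFFD, not preserved.
        return new String(key, StandardCharsets.UTF_8);
    }

    /** Decode the key to a String, then encode it back to bytes. */
    public static byte[] roundTrip(byte[] key) {
        return decode(key).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical keys: one plain ASCII, two binary keys differing in one byte.
        byte[] asciiKey = "1234".getBytes(StandardCharsets.UTF_8);
        byte[] binaryKeyA = {0x00, 0x00, (byte) 0x9C, 0x40};
        byte[] binaryKeyB = {0x00, 0x00, (byte) 0x9D, 0x40};

        // ASCII survives the round trip unchanged.
        System.out.println(Arrays.equals(asciiKey, roundTrip(asciiKey)));      // true
        // 0x9C is not valid UTF-8 on its own, so the bytes come back different.
        System.out.println(Arrays.equals(binaryKeyA, roundTrip(binaryKeyA)));  // false
        // Two distinct binary keys collapse into the same String.
        System.out.println(decode(binaryKeyA).equals(decode(binaryKeyB)));     // true
    }
}
```

If a key passes through a String anywhere along the way, the bytes sent back to the server may no longer match the key that was stored, and distinct keys can look identical. That would be consistent with what is being described, though I have not verified that this is what is happening here.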
Almost everything is copied to/from the protocol buffer message to Java Strings using the ByteString.copyFromUtf8() and ByteString.toStringUtf8() methods. This is actually something that is addressed in the new 2.0 Java client Dave and I are working on.

Thanks,
- Roach

On Tue, Nov 5, 2013 at 5:40 PM, Toby Corkindale <[email protected]> wrote:
> On 06/11/13 11:30, Evan Vigil-McClanahan wrote:
>>
>> You can replace int_to_bin with int_to_str to make it easier to debug
>> in the future, I suppose. I am not sure how to get them to be fetched
>> as bytes, without may altering the client.
>>
>> You could just attach to the console and run whatever listing command
>> you're running there, which would give you the answer as unfiltered
>> erlang binaries, which are easy to understand.
>
> Ah, I'm really not familiar enough with Erlang and Riak to be doing that.
> Which API applies to console commands? I'll take a look. (Is it just the
> same as the Erlang client?)
>
>> Is this easily replicable on a new cluster?
>
> I think it should be -- the only difference over default configuration is
> that LevelDB is configured as the default backend.
> Run basho_bench with the pbc-client test to generate the initial keys and
> you should be set.
>
> T
>
>> On Tue, Nov 5, 2013 at 4:17 PM, Toby Corkindale
>> <[email protected]> wrote:
>>>
>>> Hi Evan,
>>> These keys were originally created by basho-bench, using:
>>> {key_generator, {int_to_bin, {uniform_int, 10000}}}.
>>>
>>> Of the 10k keys, it seems half could be removed, but not the other half.
>>>
>>> Now I've tried storing keys with the same key as the un-deleteable ones,
>>> waiting a minute, and then deleting them again.. this isn't seeming to
>>> help!
>>>
>>> I don't know if it's significant, but I'm working with the Java client here
>>> (protocol buffers). I note that the bad keys are basically just bytes, not
>>> actual ascii strings, and they do contain nulls.
>>>
>>> Actually, here's something I just noticed -- the keys I'm getting from the
>>> index are repeating! It's the same 39 keys, repeated 128 times.
>>>
>>> O.o
>>>
>>> Are there any known bugs in the PBC interface when it comes to binary keys?
>>> I know the HTTP interface just crashes out completely.
>>>
>>> I'm fetching the keys in a manner that returns strings; is there a way to
>>> fetch them as bytes? Maybe that would work better; I'm wondering if the
>>> client is attempting to convert the bytes into unicode strings and dropping
>>> invalid characters?
>>>
>>>
>>> On 05/11/13 03:44, Evan Vigil-McClanahan wrote:
>>>>
>>>> Hi Toby.
>>>>
>>>> It's possible, since they're stored separately, that the objects were
>>>> deleted but the indices were left in place because of some error (e.g.
>>>> the operation failed for some reason between the object removal and
>>>> the index removal). One of the things on the feature list for the
>>>> next release is AAE of index values, which should take care of this
>>>> case. This is really rare, but not unknown. It'd be interesting to
>>>> know how you ended up with so many.
>>>>
>>>> In the mean time, the only way I can think of to get rid of them
>>>> (other than deleting them from the console, which would require taking
>>>> nodes down and a lot of manual effort), would be to write another
>>>> value that would have the same index, then delete it, which should
>>>> normally succeed.
>>>>
>>>> I'll ask around to see if there is anything that might work better.
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
