I'm at my laptop now, so I can say a bit more about this.

Don't conflate the value type with the encodings. UUID is a field type, just as dates and integers are field types. A field type tells the Solr indexer how to reason about the value it gets. A value of field type string, "20140810", is treated differently from the integer value 20140810 or the date "20140810". This matters for the queries you can build: a date range query behaves differently from an integer or string range query.
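To make the point about field types concrete, here is a small sketch in plain Python. It does not touch Solr at all; it only illustrates how the same eight digits behave differently in range comparisons depending on the type they are interpreted as. The function name interpret is hypothetical and chosen for this illustration.

```python
from datetime import date, datetime

RAW = "20140810"


def interpret(raw: str):
    """Return the same digits as a string, an integer, and a calendar date."""
    return raw, int(raw), datetime.strptime(raw, "%Y%m%d").date()


as_string, as_integer, as_date = interpret(RAW)

# String ranges compare character by character, so "9" sorts after "20140810".
print("9" > as_string)   # True
# Integer ranges compare numerically, so 9 is far below 20140810.
print(9 > as_integer)    # False

# Only the date type understands the calendar: 31 July to 10 August is 10 days,
# while subtracting the same digits as integers gives a meaningless 79.
print((as_date - date(2014, 7, 31)).days)  # 10
print(as_integer - 20140731)               # 79
```

The same reasoning is why the type declared for a field in the schema decides which range queries make sense for it.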
That said, in Solr a UUID is usually generated on the backend, for example with UUIDUpdateProcessorFactory. Even so, you can no more send a binary UUID than you can send a binary date value.

There are two encodings you have to think about when dealing with Solr. Anything that's binary needs to be converted to a string that Solr can understand, and Base64 is how you convert a binary value to a string value. So in the case of your key (in Erlang):

    1> base64:encode(<<94,143,33,35,45,180,78,164,151,237,72,81,56,13,28,250>>).
    <<"Xo8hIy20TqSX7UhROA0c+g==">>

Base64 encoding libraries exist for virtually every language. Once you have this key as a Base64 string, Yokozuna will internally assume that the string is valid UTF8.

I was probably a bit hasty when I said "yokozuna only supports UTF8". What I should have said is that "yokozuna assumes types/buckets/keys are UTF8 and encodes values appropriately."

So in summation:

UUID: Solr field type
Base64: Encode binary values to a string
UTF8: The assumed string encoding

Does that help?

Eric

On Aug 10, 2014, at 5:03 PM, David James <[email protected]> wrote:

> Thanks for the quick responses.
>
> Eric: I don't understand. Why does Solr have the UUIDField
> (http://lucene.apache.org/solr/4_7_0/solr-core/org/apache/solr/schema/UUIDField.html)
> if it were not indexable? What is the nature of the limitation?
>
> Jason: Thanks, I will consider Base 64 encoding.
>
>
> On Sun, Aug 10, 2014 at 7:19 PM, Jason Campbell <[email protected]> wrote:
> I like UUIDs for everything as well, although I expected compatibility issues
> with something. Base 64 encoding the binary value is a nice compromise for
> me, and takes 22 characters (if you drop the padding) instead of the usual 36
> for the hyphenated hex format.
>
> It would still require re-encoding all the keys, but it's a partial solution.
>
> From: Eric Redmond
> Sent: Monday, 11 August 2014 9:15 AM
> To: David James
> Cc: riak-users
> Subject: Re: Using UUID as keys is problematic for Riak Search
>
> You're correct that yokozuna only supports utf8, because the Solr interface
> only supports utf8 (note that the failure happens when attempting to build a
> non-utf8 JSON add document command). There's not much we can do here at the
> moment, since we've yet to (if ever) support a custom interface to Solr that
> accepts arbitrary binary values. In the mean time, to use yokozuna, you'll
> have to encode your keys to utf8.
>
> Eric Redmond, Engineer @ Basho
>
>
> On Sun, Aug 10, 2014 at 4:01 PM, David James <[email protected]> wrote:
>
> I'm using UUIDs for keys in Riak -- converted to bytes, not UTF-8 strings.
> (I'd rather spend 16 bytes for each key, not 36.)
>
> As I understand it, Yokozuna maps the Riak key to _yz_id.
>
> Here is the suggested schema from the documentation:
>
> <!-- schema.xml -->
> <field name="_yz_id" type="_yz_str" indexed="true" stored="true"
> multiValued="false" required="true"/>
> <fieldType name="_yz_str" class="solr.StrField" sortMissingLast="true"/>
>
> Would you expect this to work with Riak Search? I would hope so.
>
> (Or must keys be UTF-8 strings?)
>
> I get this error, which does not surprise me, given that the _yz_id is
> defined as a string:
> ==> log/error.log <==
>
> 2014-08-10 18:24:16.221 [error] <0.610.0>@yz_kv:index:206 failed to index
> object
> {<<"test-0001">>,<<94,143,33,35,45,180,78,164,151,237,72,81,56,13,28,250>>}
> with error {ucs,{bad_utf8_character_code}} because
> [{xmerl_ucs,from_utf8,1,[{file,"xmerl_ucs.erl"},{line,185}]},{mochijson2,json_encode_string,2,[{file,"src/mochijson2.erl"},{line,186}]},{mochijson2,'-json_encode_proplist/2-fun-0-',3,[{file,"src/mochijson2.erl"},{line,167}]},{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},{mochijson2,json_encode_proplist,2,[{file,"src/mochijson2.erl"},{line,170}]},{mochijson2,'-json_encode_proplist/2-fun-0-',3,[{file,"src/mochijson2.erl"},{line,167}]},{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},{mochijson2,json_encode_proplist,2,[{file,"src/mochijson2.erl"},{line,170}]}]
>
> I don't think changing the schema.xml type for _yz_id to "solr.UUIDField" is
> a good idea.
>
> What can I do?
>
> Thanks,
> David
>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
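
As a footnote to the Base64 suggestion in this thread, here is a minimal sketch of the same encoding step in Python rather than Erlang, since the thread does not say which client language is in use. The helper names (is_valid_utf8, encode_key, decode_key) are hypothetical and are not part of Riak or Yokozuna; the key bytes are the ones from the error log.

```python
import base64
import uuid

# The 16-byte key from the error log in this thread.
KEY_BYTES = bytes([94, 143, 33, 35, 45, 180, 78, 164,
                   151, 237, 72, 81, 56, 13, 28, 250])


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether the bytes can be read as a UTF-8 string."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def encode_key(raw: bytes, keep_padding: bool = True) -> str:
    """Base64-encode a binary key; the result is ASCII, hence valid UTF-8."""
    text = base64.b64encode(raw).decode("ascii")
    return text if keep_padding else text.rstrip("=")


def decode_key(text: str) -> bytes:
    """Recover the binary key, restoring any '=' padding that was dropped."""
    return base64.b64decode(text + "=" * (-len(text) % 4))


print(is_valid_utf8(KEY_BYTES))                   # False: the raw key is not UTF-8
print(encode_key(KEY_BYTES))                      # Xo8hIy20TqSX7UhROA0c+g==
print(len(encode_key(KEY_BYTES, False)))          # 22 characters without padding
print(len(str(uuid.UUID(bytes=KEY_BYTES))))       # 36 characters as hyphenated hex
print(decode_key(encode_key(KEY_BYTES, False)) == KEY_BYTES)  # True
```

One design note that the thread does not cover: the standard Base64 alphabet includes "+" and "/", which need escaping when a key appears in a URL path. Python's base64.urlsafe_b64encode is one alternative if that matters for your client, though whether it fits a given setup is an assumption to check.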
