Wes,

Also, if you're using curl to load things into Riak, be sure to use --data-binary with your payload, which will not try to convert multibyte characters or line terminators.
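
For clients other than curl, the same two ideas (send the bytes untouched, and declare the charset in Content-Type) can be sketched in Python. This is only an illustration under stated assumptions: the URL is a placeholder in the host:port/riak/bucket/key shape used elsewhere in this thread, and build_put_request and misdecode are hypothetical helper names, not part of any Riak client. The second helper shows roughly how the garbling seen on the web page can arise when UTF-8 bytes are read as Latin-1.

```python
import urllib.request


def build_put_request(url: str, payload: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT whose body is the payload bytes, unmodified."""
    return urllib.request.Request(
        url,
        data=payload,  # raw bytes, comparable to curl --data-binary
        method="PUT",
        headers={"Content-Type": "text/html;charset=UTF-8"},
    )


def misdecode(text: str) -> str:
    """Return UTF-8 text as it appears when its bytes are read as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    # Placeholder URL: host, port, bucket and key are assumptions.
    req = build_put_request(
        "http://localhost:8098/riak/bucket/key",
        "Balátová-Tuláčková".encode("utf-8"),
    )
    print(req.get_method(), req.get_header("Content-type"))
    print(misdecode("á"))
    # To actually send the request: urllib.request.urlopen(req)
```

Encoding the text to bytes explicitly before building the request keeps the client from guessing an encoding, and the charset in the header tells whoever reads the value back how to decode it.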
On Sat, Apr 7, 2012 at 11:21 AM, Wes James <[email protected]> wrote:

> I found it. I thought if any web site might be able to handle unicode,
> it would be erlang.org, so I went and grabbed some of the header text:
>
> <?xml version='1.0' encoding='utf-8'?>
> <!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN'
> 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>
> <html xmlns='http://www.w3.org/1999/xhtml'>
> <head>
> <title>test</title>
> <meta http-equiv='Content-Type' content='text/html;charset=utf-8'/>
> </head>
>
> and it works correctly now.
>
> thanks
>
> On Fri, Apr 6, 2012 at 3:18 PM, Kresten Krab Thorup <[email protected]> wrote:
> > It looks like you may have missed specifying the charset when importing
> > your data; could that be the case?
> >
> > You need to specify the charset when importing 8-bit text. It looks
> > like your xml is utf-8 encoded, so it should be imported using something
> > like this:
> >
> > curl -H 'Content-Type: text/html;charset=UTF-8' -X PUT @datafile.xml http://host:port/riak/bucket/key
> >
> > The various language clients have different ways of specifying the
> > charset for a value; so if you imported the xml using some other method you
> > need to find out where to specify it.
> >
> > Perhaps to verify, you can check the result of a curl -v (verbose, print
> > the headers) for one of your values. If it does not come back with a
> > charset=XXX in the Content-Type header, then this is your problem.
> >
> > Kresten
> >
> > On Apr 6, 2012, at 4:44 PM, Wes James wrote:
> >
> > I imported many records, one of which looks like this:
> >
> > <add>
> > <doc>
> > <field name='id'>0</field>
> > <field name='title'>Ekologie lučních porostů (A)</field>
> > <field name='author_editor'>Rychnovská, Milena, Emilie Balátová-Tuláčková, Blanka Úlehlová, Jaroslav Pelikán</field>
> > <field name='date_of_publication'>1985</field>
> > <field name='publisher'>Academia</field>
> > <field name='keywords'>-</field>
> > <field name='notes'>amazon 5/22/09 Category: Ecology (Y)</field>
> > <field name='valuation'>8.00</field>
> > <field name='purchase_price'>10.00</field>
> > </doc>
> > </add>
> >
> > with
> >
> > bin/search-cmd solr books books.xml
> >
> > Notice the characters above. In the riak -> cowboy -> webpage it looks like:
> >
> > Id: 0
> > Title: title: Ekologie luÄ nÃch porostů (A)
> > Auther Editor: author_editor: Rychnovská, Milena, Emilie Balátová-TulÃ¡Ä ková, Blanka Úlehlová, Jaroslav Pelikán
> > Date of Publication: date_of_publication: 1985
> > Notes: publisher: Academia
> > Notes: notes: amazon 5/22/09 Category: Ecology (Y)
> > Purchase Price: purchase_price: 10.00
> > Valuation: valuation: 8.00
> >
> > Is there a way I can fix this?
> >
> > Doing an io:format it it looks like:
> >
> > Rychnovská, Milena, Emilie Balátová-TulÃ¡Ä ková, Blanka Úlehlová, Jaroslav Pelikán
> >
> > Thanks,
> >
> > Wes
> > _______________________________________________
> > riak-users mailing list
> > [email protected]
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >
> > Mobile: + 45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab
> > Trifork A/S | Margrethepladsen 4 | DK- 8000 Aarhus C | Phone : +45 8732 8787 | www.trifork.com

--
Sean Cribbs <[email protected]>
Software Engineer
Basho Technologies, Inc.
http://basho.com/
