The answer for my problem was to use regexp_replace. In this case just using an underscore to replace any characters outside of the expected range using the UTF8 character set table.
Instead of just selecting the column used this regexp_replace(<column or field name>, '[^\\x00-\\x7f-\\x80-\\xad]', '_') The ^ in the front is to select the inverse (or any characters not specified in the range of the square brackets [] (Make a note of this as I’m sure many will run into this issues in the future with various JSON data). —Andries On Mar 2, 2015, at 6:23 PM, Andries Engelbrecht <[email protected]> wrote: > How can I convert JSON data with various characters in a text field to a more > usable UTF8 encoding? (even if characters not in UTF8 is dropped) > Or if needs be filter out the records with characters not in UTF8? > > CONVERT_TO is for binary and cast as varchar still passes these characters > through. > > Results in sqlline works, but when using tools through ODBC various failures > occur (especially tools not capable of handling characters outside of UTF8). > > > Thanks > —Andries
