It's https://issues.apache.org/jira/browse/ARROW-1489
On Sat, May 30, 2020 at 9:56 AM Neal Richardson <[email protected]> wrote:
>
> Sounds reasonable, could you please open a JIRA issue?
>
> Neal
>
> On Sat, May 30, 2020 at 1:01 AM Yue Ni <[email protected]> wrote:
>>
>> Hi there,
>>
>> I find that Arrow compute provides a Cast API allowing users to cast from
>> string to number/boolean values, but sometimes the strings contain invalid
>> values that cannot be cast to a number/boolean (sorry, data is really
>> messy), for example, in a string array like ["1", "2", "3", "None", ""]. I
>> wonder if there is any way to handle those invalid values during casting.
>>
>> Currently, from the code I read (cast.h/cast.cc), it seems the cast will
>> fail and return when dealing with invalid values. I wonder if there is any
>> way I can ask the Cast API to return NULL for invalid values, so that it is
>> easier to process these NULL values later.
>>
>> And since it is rarely possible to guarantee all string values in an array
>> are valid, **any** invalid value in an array/entire data set will make the
>> cast process fail. This requires users of the Cast API to figure out
>> which value in the array is invalid by themselves, which is not
>> easy to do programmatically (only an error status message is set in the
>> context). IMHO the following strategy could be a better default strategy
>> when casting from string to number/boolean:
>> 1) when finding an invalid value, set NULL as its value
>> 2) set an error status indicating this array casting has some invalid values
>> 3) continue casting the remaining elements in the array
>> But I believe there are users who prefer bailing out as soon as possible as
>> well, so it would be great if we could provide different cast options to
>> make both strategies possible.
>>
>> Thanks so much.
>>
>> Regards,
>> Yue
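
For illustration, the two strategies described in the quoted message could be sketched roughly as below. This is plain Python and not the Arrow Cast API: the function name `cast_strings_to_int` and the `on_invalid` parameter are hypothetical, and Python's `int()` parsing rules (which, for example, tolerate surrounding whitespace) are assumed in place of Arrow's string parsing.

```python
from typing import List, Optional, Sequence, Tuple


def cast_strings_to_int(
    values: Sequence[Optional[str]],
    on_invalid: str = "null",  # hypothetical option: "null" or "raise"
) -> Tuple[List[Optional[int]], List[int]]:
    """Cast strings to ints, returning (casted values, indices of invalid inputs).

    on_invalid="null":  replace each unparsable string with None, record its
                        index, and keep casting the remaining elements.
    on_invalid="raise": bail out at the first unparsable string.
    """
    if on_invalid not in ("null", "raise"):
        raise ValueError("on_invalid must be 'null' or 'raise'")

    casted: List[Optional[int]] = []
    invalid_indices: List[int] = []
    for i, value in enumerate(values):
        if value is None:
            # An input that is already null stays null and is not an error.
            casted.append(None)
            continue
        try:
            casted.append(int(value))
        except ValueError:
            if on_invalid == "raise":
                raise ValueError(f"cannot cast value at index {i}: {value!r}")
            casted.append(None)
            invalid_indices.append(i)
    return casted, invalid_indices


if __name__ == "__main__":
    result, bad = cast_strings_to_int(["1", "2", "3", "None", ""])
    print(result)  # [1, 2, 3, None, None]
    print(bad)     # [3, 4]
```

Returning the invalid indices next to the values plays the role of step 2 in the list above: the caller can see that something was invalid, and where, without parsing an error message.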
