UDFs scare me in that the only Java I've conquered is evident from my empty
french press...

Same issue here. I have solved this in other platforms by pre-processing the data with a set of regex replacements in Awk:

~~~
# "Repair" invalid dates as stored in MySQL (3 replacements for readability no slower than one nested)
$0 = gensub(/0000-([0-9]{2}-[0-9]{2})/, "0001-\\1", "g", $0)
$0 = gensub(/([0-9]{4})-00-([0-9]{2})/, "\\1-01-\\2", "g", $0)
$0 = gensub(/([0-9]{4}-[0-9]{2})-00/, "\\1-01", "g", $0)
~~~

But of course this adds another step in the pipeline. Perhaps something similar to could be implemented via https://drill.apache.org/docs/string-manipulation/#regexp_replace ?



On 25 May 2016, at 12:55, John Omernik wrote:

Cool, I wasn't aware of SIMILAR to (I learned something) However, that doesn't work because my data is accurate i.e. '____-__-__' 2015-04-02 and
2015-00-23  but 00 doesn't work (bad data) .

UDFs scare me in that the only Java I've conquered is evident from my empty
french press...

I know I've brought it up in the past, but has anyone seen any community around UDFs start? I'd love to have a community that follows Apache like rules, and allows us to create and track UDFs to share... that would be pretty neat. I guess if we were to do something like that, should one of us (I can volunteer) just start a Github project and encourage folks to come to the table or is there better way via Apache to do something like
that?

On Wed, May 25, 2016 at 10:27 AM, Veera Naranammalpuram <
[email protected]> wrote:

You could write a UDF. Or you could do something like this:

cat data.csv
05/25/2016
20160525
May 25th 2016

0: jdbc:drill:> select case when columns[0] similar to '__/__/____' then to_date(columns[0],'MM/dd/yyyy') when columns[0] similar to '________' then
to_date(columns[0],'yyyyMMdd') else NULL end from `data.csv`;
+-------------+
|   EXPR$0    |
+-------------+
| 2016-05-25  |
| 2016-05-25  |
| null        |
+-------------+
3 rows selected (0.4 seconds)
0: jdbc:drill:>

-Veera

On Wed, May 25, 2016 at 11:12 AM, Vince Gonzalez <[email protected]>
wrote:

Sounds like a job for a UDF?

You could do the try/catch inside the UDF.

 ----
 Vince Gonzalez
 Systems Engineer
 212.694.3879

 mapr.com

On Wed, May 25, 2016 at 11:05 AM, John Omernik <[email protected]> wrote:

I have some DOBs, and some fields are empty others apparently were
filled
by trained monkeys, but while most data is accurate, some data is not.

As you saw from my other post, I am trying to get the age for those
DOBs
that are valid...

My function works, until I get to a record that is not valid and I get
something like this:

Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for
monthOfYear
must be in the range [1,12]


Is there a good "Try -> Except" type solution that will grant me the
valid
data if things worked, and just return 0 or whatever I specify if there
is
an error?

I could try casting the data, but if it fails won't it kill my query? Basically I want it to keep going if it fails... not sure if Drill has
this
ability, but thought I would ask.





--
Veera Naranammalpuram
Product Specialist - SQL on Hadoop
*MapR Technologies (www.mapr.com <http://www.mapr.com>)*
*(Email) [email protected] <[email protected]>*
*(Mobile) 917 683 8116 - can text *
*Timezone: ET (UTC -5:00 / -4:00)*

Reply via email to