select t.app.hcc.event_name as en
from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
where en in ('logout');

this yields the error:
  Error: SYSTEM ERROR: NumberFormatException: logout

ok, so let's explicitly cast

select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as en
from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
where en in ('logout');

now, just to humor drill

select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as en
from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
where en in ('123');

this runs, but returns no results - as expected, because we don't use
numbers as event names

Am I misunderstanding how drill types data in a schemaless record?
I would have thought the explicit cast would be enough.
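
One thing that might be worth checking, as a sketch under assumptions rather
than a confirmed explanation: in standard SQL the where clause is evaluated
before the select list, so the alias en would not be visible there. If drill
follows that rule and instead treats en as a missing column with a numeric
default type (an assumption on my part), the error would come from comparing
that column to 'logout', not from event_name itself. Filtering on the full
expression instead of the alias would test this:

```sql
-- filter on the source expression directly instead of the select alias;
-- same table and path as the queries above, no cast applied
select t.app.hcc.event_name as en
from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
where t.app.hcc.event_name in ('logout');
```

If this returns the logout events without any cast, then the typing of
event_name was never the problem.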


P.S. I ran another query like this one on a month's worth of logs (a lot of
json in HDFS) and it chewed through it in less time than it takes my
current Hive query to actually start, and all of this on a single AWS
m3.xlarge - this drill sucker is fast, we really want to use it.

john o schneider
408-203-7891
