The error seems to indicate that 'PAYLOAD' does not contain UTF8-encoded bytes. The like function is a string function: it only accepts varchar/char input, which it assumes to be UTF8-encoded bytes.
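To make the failure concrete, here is a minimal sketch in plain Java (JDK only, not Drill's code or API). It shows that a strict UTF8 decoder rejects a buffer containing the byte 0xfd, while a byte-wise containment check finds the pattern without decoding anything. The class name ByteMatchSketch and the helpers isValidUtf8 and containsBytes are made up for illustration, and the sample payload is invented.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names come from Drill.
class ByteMatchSketch {

    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Naive byte-wise substring search: compares raw bytes, never decodes.
    static boolean containsBytes(byte[] haystack, byte[] needle) {
        if (needle.length == 0) {
            return true;
        }
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Invented binary payload: 0xfd can never appear in well-formed UTF-8.
        byte[] payload = {0x01, (byte) 0xfd, 'a', 'b', 'c', 'd', 0x00};
        byte[] pattern = "abcd".getBytes(StandardCharsets.US_ASCII);

        System.out.println(isValidUtf8(payload));            // prints false
        System.out.println(containsBytes(payload, pattern)); // prints true
    }
}
```

A real UDF would have to wrap this kind of byte comparison in Drill's own function interfaces, which are not shown here.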
You may consider implementing a Drill UDF 'blike' which works similarly to the string function 'like', but can operate on non-UTF8 bytes.

On Fri, Apr 28, 2017 at 3:02 PM, Boaz Ben-Zvi <[email protected]> wrote:

> Hi Franca,
>
> This issue is specific to the “bytes” type; for other Avro types the LIKE
> clause matches the printed representation, like:
>
> select * from dfs.`/data/avro/twitter.snappy.avro` where `timestamp` like
> '%66%';
> +-------------+--------------------------------------+-------------+
> | username    | tweet                                | timestamp   |
> +-------------+--------------------------------------+-------------+
> | miguno      | Rock: Nerf paper, scissors is fine.  | 1366150681  |
> | BlizzardCS  | Works as intended. Terran is IMBA.   | 1366154481  |
> +-------------+--------------------------------------+-------------+
>
> Can you share some sample avro file with “bytes” type? (I couldn’t find any
> such sample online) Maybe we’ll need to open a Jira for this case …
>
> Thanks,
>
> -- Boaz
>
> On 4/25/17, 8:45 AM, "franca perrina" <[email protected]> wrote:
>
> Hi,
>
> I would like to use Drill to query data formatted in avro.
>
> My avro schema looks like
>
> ..
> {"name":"payload",
> "type":"bytes"}
> ..
>
> and the result to the query
>
> SELECT payload FROM `dfs`.`myfile.avro` LIMIT 1
>
> looks like:
>
> +-----------------+
> | payload         |
> +-----------------+
> | [B@3b8e004e     |
> +-----------------+
>
>
> My problem is that when I run a query like:
>
> SELECT * FROM `dfs`.`myfile.avro` WHERE `PAYLOAD` LIKE '%abcd%'
>
> then I have
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> DrillRuntimeException: Unexpected byte 0xfd at position 1008556
> encountered
> while decoding UTF8 string. Fragment 0:0 [Error Id:
> 0c247c14-0e51-402c-ad9a-411cbc445597
> on maprdemo:31010]
>
> It seems like drill tries to decode the payload's bytes to UTF8.
>
> What I would need is a grep like behaviour, where my payload data is
> considered as is, i.e. binary data, and it is not converted to a string
> data type.
>
> Thanks a lot for your help.
> franca
