The error seems to indicated 'PAYLOAD' does not contain UTF8-encoded
bytes. The like function is a string function, and it only accepts
varchar/char type, which assumes inputs are UTF8 bytes.

You may consider implementing a Drill UDF 'blike" which works similar
to string function 'like', but could operate on non-UTF8 bytes.

On Fri, Apr 28, 2017 at 3:02 PM, Boaz Ben-Zvi <[email protected]> wrote:
>  Hi Franca,
>
>     This issue is specific to the “bytes” type; for other Avro types the LIKE 
> clause matches the printed representation, like:
>
> select * from dfs.`/data/avro/twitter.snappy.avro` where `timestamp` like 
> '%66%';
> +-------------+--------------------------------------+-------------+
> |  username   |                tweet                 |  timestamp  |
> +-------------+--------------------------------------+-------------+
> | miguno      | Rock: Nerf paper, scissors is fine.  | 1366150681  |
> | BlizzardCS  | Works as intended.  Terran is IMBA.  | 1366154481  |
> +-------------+--------------------------------------+-------------+
>
> Can you share some sample avro file with “bytes” type?  (I couldn’t find any 
> such sample online)     Maybe we’ll need to open a Jira for this case …
>
>      Thanks,
>
>              -- Boaz
>
> On 4/25/17, 8:45 AM, "franca perrina" <[email protected]> wrote:
>
>     Hi,
>
>     I would like to use Drill to query data formatted in avro.
>
>     My avro schema looks like
>
>     ..
>     {"name":"payload",
>       "type":"bytes"}
>     ..
>
>     and the result to the query
>
>     SELECT payload FROM `dfs`.`myfile.avro` LIMIT 1
>
>     looks like:
>
>     +-----------------+
>     |     payload     |
>     +-----------------+
>     | [B@3b8e004e     |
>     +-----------------+
>
>
>     My problem is that when I run a query like:
>
>     SELECT * FROM  `dfs`.`myfile.avro` WHERE `PAYLOAD` LIKE '%abcd%'
>
>     then I have
>     org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>     DrillRuntimeException: Unexpected byte 0xfd at position 1008556 
> encountered
>     while decoding UTF8 string. Fragment 0:0 [Error Id:
>     0c247c14-0e51-402c-ad9a-411cbc445597
>     on maprdemo:31010]
>
>     It seems like drill tries to decode the payload's bytes to UTF8.
>
>     What I would need is a grep like behaviour, where my payload data is
>     considered as is, i.e. binary data, and it is not converted to a string
>     data type.
>
>     Thanks a lot for your help.
>     franca
>
>

Reply via email to