Hi Franca,
This issue is specific to the “bytes” type; for other Avro types the LIKE
clause matches against the printed representation of the value, for example:
select * from dfs.`/data/avro/twitter.snappy.avro` where `timestamp` like '%66%';
+-------------+--------------------------------------+-------------+
|  username   |                tweet                 |  timestamp  |
+-------------+--------------------------------------+-------------+
| miguno      | Rock: Nerf paper, scissors is fine.  | 1366150681  |
| BlizzardCS  | Works as intended. Terran is IMBA.   | 1366154481  |
+-------------+--------------------------------------+-------------+
Can you share a sample Avro file with a “bytes” field? (I couldn’t find any
such sample online.) Maybe we’ll need to open a Jira for this case …
Thanks,
-- Boaz
On 4/25/17, 8:45 AM, "franca perrina" <[email protected]> wrote:
Hi,
I would like to use Drill to query data stored in Avro format.
My Avro schema looks like:
..
{"name":"payload",
"type":"bytes"}
..
and the result of the query
SELECT payload FROM `dfs`.`myfile.avro` LIMIT 1
looks like:
+-----------------+
|     payload     |
+-----------------+
| [B@3b8e004e     |
+-----------------+
My problem is that when I run a query like:
SELECT * FROM `dfs`.`myfile.avro` WHERE `PAYLOAD` LIKE '%abcd%'
then I get:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
DrillRuntimeException: Unexpected byte 0xfd at position 1008556 encountered
while decoding UTF8 string. Fragment 0:0 [Error Id:
0c247c14-0e51-402c-ad9a-411cbc445597
on maprdemo:31010]
It seems that Drill tries to decode the payload's bytes as UTF-8.
What I need is grep-like behaviour, where my payload data is
treated as is, i.e. as binary data, and is not converted to a string
data type.
Thanks a lot for your help.
franca
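
For illustration only, here is a minimal Python sketch of the grep-like behaviour asked for above: the pattern is matched against the raw bytes, with no character decoding. It is not Drill code and says nothing about how Drill implements LIKE; the function name payload_contains and the sample payload are hypothetical. The sample contains the byte 0xfd, which never occurs in valid UTF-8, so decoding it as a string fails while the byte-level match still works.

```python
def payload_contains(payload: bytes, needle: bytes) -> bool:
    """Byte-level substring match; the payload is never decoded to text."""
    return needle in payload


# Hypothetical payload: binary data that includes 0xfd, a byte that is
# never valid in UTF-8.
sample = b"\x00\x01abcd\xfd\xff"

# Matching on raw bytes works regardless of the payload's encoding.
print(payload_contains(sample, b"abcd"))

# Decoding the same payload as UTF-8 raises an error instead.
try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

The point of the sketch is only that a substring test on bytes does not need the data to be valid text, whereas any approach that first converts the payload to a string will fail on arbitrary binary content.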