So, on B: I found BYTE_SUBSTR, but even when I send only 200 bytes to the string_binary function, I still get an error. Something else is happening here...

select `type`, `timestamp`, `src_ip`, `dst_ip`, `src_port`, `dst_port`,
string_binary(byte_substr(`data`, 1, 200)) as mydata
from `user/jomernik/bf2_7306.pcap` limit 10

I get the same error:

[Error Id: 213075e7-378a-437f-a5dc-408326f123f3 on zeta3.brewingintel.com:20005]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
IndexOutOfBoundsException: index: 0, length: 379 (expected: range(0, 256))

On Tue, Jul 17, 2018 at 12:56 PM, John Omernik <[email protected]> wrote:

> Thanks Vlad, a couple of thoughts.
>
> A. I think that should be fixed. That seems like a limitation that is both
> unexpected and undocumented.
>
> B. Is there a way, if my data in the table is returned as binary to start
> with, for me to return the first 256 bytes? I tried substring, and tried to
> force to UTF-8, and I am getting some issues there.
>
> On Tue, Jul 17, 2018 at 10:33 AM, Vlad Rozov <[email protected]> wrote:
>
>> In the case of DRILL-6607 the issue lies in the implementation of the
>> "string_binary" function: it is not prepared to handle incoming data that
>> when converted to a binary string would exceed 256 bytes, as it does not
>> reallocate the output buffer. Until the function code is fixed, the only
>> way to avoid the error is either not to use "string_binary" or to use it
>> with data that meets the "string_binary" limitation.
>>
>> Thank you,
>>
>> Vlad
>>
>> On 7/13/18 14:01, Ted Dunning wrote:
>>
>>> There are bounds for acceptable behavior for a function like this. Array
>>> index out of bounds is not acceptable. Aborting with a clean message
>>> about the true problem might be fine, as would be to return a null.
>>>
>>> On Fri, Jul 13, 2018, 13:46 John Omernik <[email protected]> wrote:
>>>
>>>> So, as to the actual problem, I opened a JIRA here:
>>>>
>>>> https://issues.apache.org/jira/browse/DRILL-6607
>>>>
>>>> The reason I brought this here is my own curiosity: does an issue in
>>>> using this function most likely lie in the function code itself not
>>>> handling good data, or is the issue in the pcap plugin, which produces
>>>> the data for this function to consume? I am just curious how something
>>>> like this could be avoided.
>>>>
>>>> John
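
One possible reason that trimming the input to 200 bytes still overflows, sketched under assumptions: if string_binary writes each non-printable byte as the four characters \xNN, the output can be up to four times the size of the input, so the 256-byte limit described in the thread applies to the converted string and not to the raw data. The Java below is a hypothetical illustration of that arithmetic only. The class name, the escaping rule, and the 256-byte capacity (read off the "range(0, 256)" in the error) are assumptions, not Drill's actual code.

```java
// Hypothetical sketch: NOT Drill's implementation of string_binary.
final class BinaryEscapeSketch {
    // Assumed fixed output capacity, taken from "range(0, 256)" in the error.
    static final int ASSUMED_BUFFER_BYTES = 256;

    // Assumed escaping rule: printable ASCII (except backslash) is copied
    // as-is; every other byte becomes the four characters \xNN.
    static String toBinaryString(byte[] data) {
        StringBuilder out = new StringBuilder(data.length);
        for (byte b : data) {
            int ch = b & 0xFF; // treat the byte as unsigned
            if (ch >= ' ' && ch <= '~' && ch != '\\') {
                out.append((char) ch);
            } else {
                out.append(String.format("\\x%02X", ch));
            }
        }
        return out.toString();
    }

    // Largest input size whose escaped form fits the buffer even when
    // every single byte needs the four-character escape.
    static int maxSafeInputBytes(int bufferBytes) {
        return bufferBytes / 4;
    }

    public static void main(String[] args) {
        byte[] packet = new byte[200]; // 200 zero bytes, all non-printable
        System.out.println(toBinaryString(packet).length());         // 800
        System.out.println(maxSafeInputBytes(ASSUMED_BUFFER_BYTES)); // 64
    }
}
```

If that assumption holds, byte_substr(`data`, 1, 64) would be the largest slice guaranteed to stay under the limit for arbitrary packet contents. Larger slices would only work when enough of the bytes happen to be printable.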
