So, on B: I found BYTE_SUBSTR and now send only 200 bytes to the
string_binary function, but I still get an error.  Something else is
happening here...

select `type`, `timestamp`, `src_ip`, `dst_ip`, `src_port`, `dst_port`,
string_binary(byte_substr(`data`, 1, 200)) as mydata from
`user/jomernik/bf2_7306.pcap` limit 10

I get the same error:

Error Id: 213075e7-378a-437f-a5dc-408326f123f3 on
zeta3.brewingintel.com:20005]

org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
IndexOutOfBoundsException: index: 0, length: 379 (expected: range(0, 256))
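
One possible explanation, offered as a guess rather than a diagnosis: if
string_binary escapes each non-printable byte as a four-character \xNN
sequence (an assumption, modeled on HBase-style "toStringBinary" helpers and
not checked against the Drill source), then the output can be up to four
times longer than the input, so 200 input bytes could still overflow a
256-byte output buffer. The Python sketch below only models that assumed
escaping; the function name and the printable set are hypothetical.

```python
# Hypothetical model of the escaping string_binary is ASSUMED to perform.
# This is not Drill's code; it only illustrates the length arithmetic.

# Bytes assumed to pass through unchanged (an assumption, HBase-style).
PRINTABLE = set(
    b"0123456789"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b" `~!@#$%^&*()-_=+[]{}|;:'\",.<>/?"
)


def to_string_binary(data: bytes) -> str:
    """Render bytes as text, escaping every other byte as \\xNN (4 chars)."""
    out = []
    for b in data:
        if b in PRINTABLE:
            out.append(chr(b))          # 1 input byte -> 1 output char
        else:
            out.append("\\x%02X" % b)   # 1 input byte -> 4 output chars
    return "".join(out)


if __name__ == "__main__":
    payload = bytes(200)                # worst case: 200 non-printable bytes
    escaped = to_string_binary(payload)
    print(len(payload), len(escaped))   # input length vs. escaped length
```

Under that assumption, an input would need to be limited to roughly 64 bytes
(64 * 4 = 256) to stay within the buffer in the worst case; whether the
256 limit is inclusive is not something the error message settles.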





On Tue, Jul 17, 2018 at 12:56 PM, John Omernik <[email protected]> wrote:

>
> Thanks Vlad a couple of thoughts.
>
>
> A. I think that should be fixed. That seems like a limitation that is both
> unexpected and undocumented.
>
> B.  Is there a way, if my data in the table is returned as binary to start
> with, for me to return the first 256 bytes? I tried substring, and tried to
> force to UTF-8, and I am getting some issues there.
>
> On Tue, Jul 17, 2018 at 10:33 AM, Vlad Rozov <[email protected]> wrote:
>
>> In case of DRILL-6607 the issue lies in the implementation of
>> "string_binary" function: it is not prepared to handle incoming data that
>> when converted to a binary string would exceed 256 bytes as it does not
>> reallocate the output buffer. Until the function code is fixed, the only
>> way to avoid the error is either not to use "string_binary" or to use it
>> with the data that meets "string_binary" limitation.
>>
>> Thank you,
>>
>> Vlad
>>
>>
>> On 7/13/18 14:01, Ted Dunning wrote:
>>
>>> There are bounds for acceptable behavior for a function like this.  Array
>>> index out of bounds is not acceptable. Aborting with a clean message about
>>> the true problem might be fine, as would returning a null.
>>>
>>> On Fri, Jul 13, 2018, 13:46 John Omernik <[email protected]> wrote:
>>>
>>>> So, as to the actual problem, I opened a JIRA here:
>>>>
>>>> https://issues.apache.org/jira/browse/DRILL-6607
>>>>
>>>> The reason I brought this here is my own curiosity: does an issue in using
>>>> this function most likely lie in the function code itself not handling good
>>>> data, or is the issue in the pcap plugin, which produces the data for this
>>>> function to consume? I am just curious how something like this could be
>>>> avoided.
>>>>
>>>> John
>>>>
>>>>
>>
>
