Adam Hooper created ARROW-7435:
----------------------------------

             Summary: Security issue: ValidateOffsets() does not prevent buffer over-read
                 Key: ARROW-7435
                 URL: https://issues.apache.org/jira/browse/ARROW-7435
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.15.1, 1.0.0
         Environment: Docker
            Reporter: Adam Hooper


Skimming through {{Validate()}} code in both 0.15 and master, I noticed an 
oversight in {{BinaryArray}} validation in C++ (and Python).

{{ValidateOffsets()}} checks that the first offset is 0, but it doesn't check 
that the offsets all point within the data buffer. A malicious Arrow file could 
declare {{offsets=[0,999999]}} and {{data=[]}}. A caller that reads the first 
value in that array would then read 999999 bytes from an empty buffer, which is 
a buffer over-read.

Validation is cheap, since Arrow already validates that offsets are 
monotonically increasing. One need only test that the last offset is less than 
or equal to the size of the data buffer.
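
To make the proposed check concrete, here is a minimal standalone sketch. It 
does not use Arrow's classes: {{OffsetsWithinData}} is a hypothetical helper 
name, the offsets are assumed to be 32-bit as in {{BinaryArray}}, and the 
sketch assumes an offsets buffer with at least one entry. It repeats the 
first-offset and monotonicity checks only so that it stands alone; the single 
new test is the last line.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Arrow's API): returns true when every
// offset points inside a data buffer of `data_size` bytes.
bool OffsetsWithinData(const std::vector<std::int32_t>& offsets,
                       std::size_t data_size) {
  // Assumption of this sketch: an empty offsets buffer is rejected.
  if (offsets.empty() || offsets.front() != 0) return false;

  // Offsets must never decrease. Together with the first offset being 0,
  // this guarantees every offset is non-negative.
  for (std::size_t i = 1; i < offsets.size(); ++i) {
    if (offsets[i] < offsets[i - 1]) return false;
  }

  // The proposed extra check: the last (largest) offset must not exceed
  // the size of the data buffer.
  return static_cast<std::size_t>(offsets.back()) <= data_size;
}
```

With this helper, the malformed example above ({{offsets=[0,999999]}} with an 
empty data buffer) is rejected, while {{offsets=[0,3,5]}} with a 5-byte data 
buffer is accepted.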

We at Workbench are letting untrusted programs write Arrow files that we then 
validate and read. We're keen to ensure Arrow files don't allow untrusted 
programs to plant data that leads to arbitrary code execution or arbitrary 
reads. We wrote a validation tool that checks for the buffer over-read 
described here: 
https://github.com/CJWorkbench/arrow-tools/blob/005fe582b428c1ab6a9ed5f6dc968387d77e9a80/src/arrow-validate.cc#L27
But it feels to me like Arrow's {{Validate()}} should be checking this.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)