Adam Hooper created ARROW-7435:
----------------------------------

             Summary: Security issue: ValidateOffsets() does not prevent buffer over-read
                 Key: ARROW-7435
                 URL: https://issues.apache.org/jira/browse/ARROW-7435
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.15.1, 1.0.0
         Environment: Docker
            Reporter: Adam Hooper

Skimming through the {{Validate()}} code in both 0.15 and master, I noticed an oversight in {{BinaryArray}} validation in C++ (and Python). {{ValidateOffsets()}} checks that the first offset is 0, but it doesn't check that all of the offsets point within the data buffer.

A nefarious Arrow file could write {{offsets=[0,999999]}} and {{data=[]}}. If a caller reads the first value in that array, that will produce a buffer over-read.

Validation is cheap, since Arrow already validates that offsets are monotonically increasing. One need only test that the last offset is less than or equal to the size of the data buffer.

We at Workbench are letting untrusted programs write Arrow files that we then validate and read. We're keen to ensure Arrow files don't allow untrusted programs to plant data that leads to arbitrary code execution or arbitrary reads.

We wrote a validation tool that checks for the buffer over-read I describe here: https://github.com/CJWorkbench/arrow-tools/blob/005fe582b428c1ab6a9ed5f6dc968387d77e9a80/src/arrow-validate.cc#L27. But it feels to me like Arrow's {{Validate()}} should be checking this.
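
For illustration, here is a minimal sketch of the check described above, written against plain standard-library types rather than Arrow's own classes. The function name {{OffsetsFitDataBuffer}} and its parameters are hypothetical and are not part of the Arrow API or of the linked tool. It also assumes that an empty offsets list describes an array with no readable values.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Arrow function): returns true only if every
// value described by `offsets` lies inside a data buffer of `data_size` bytes.
bool OffsetsFitDataBuffer(const std::vector<int32_t>& offsets,
                          int64_t data_size) {
  // Assumption: no offsets means there are no values a caller could read.
  if (offsets.empty()) return true;
  if (data_size < 0) return false;

  // The check the report says already exists: the first offset is 0.
  if (offsets.front() != 0) return false;

  // The other check the report says already exists: offsets never decrease.
  for (std::size_t i = 1; i < offsets.size(); ++i) {
    if (offsets[i] < offsets[i - 1]) return false;
  }

  // The missing check: because offsets never decrease, comparing only the
  // last one against the buffer size bounds all of them.
  return static_cast<int64_t>(offsets.back()) <= data_size;
}
```

With the example from the report, {{OffsetsFitDataBuffer({0, 999999}, 0)}} returns false, because the last offset points past the end of an empty data buffer. The added cost over the existing monotonicity pass is a single comparison.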