Hello,

(sorry, this is a rehash of a question asked on
https://issues.apache.org/jira/browse/THRIFT-5237, since I haven't
received any reply there)

In Apache Parquet, some of our users have encountered situations where
the message size limits introduced in Thrift 0.14 prevent reading
legitimate real-world data (see
https://issues.apache.org/jira/browse/ARROW-13655 ).  I have been
trying to understand what kind of vulnerability the new limits are
designed to address, but have failed to find any precise analysis of
the issue.

Therefore I went through the Thrift C++ library source code and came
to the understanding that the vulnerability arises when using one of
the streaming transports where the encoded message size isn't known in
advance (such as socket-based transports). However, in Parquet C++ we
read the full message in one block from the underlying random access
file, and therefore it seems that disabling the max message size check
is legitimate in our case.
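To make the distinction I have in mind concrete, here is a simplified
sketch in Python. It uses a hypothetical format (a big-endian int32
length followed by that many payload bytes); it is not Thrift's actual
wire format or API, and the function names are made up for
illustration. With a stream of unknown total size, a reader can only
guard against a bogus length field with an arbitrary cap. With the
whole message already in memory, the declared length can be checked
against the bytes actually present.

```python
import io
import struct


class MessageTooLarge(Exception):
    """Raised when a declared length cannot be trusted."""


def read_from_stream(stream, max_size):
    """Read one length-prefixed blob from a stream of unknown total size.

    Hypothetical model: nothing tells us how many bytes the peer will
    really send, so an arbitrary cap (max_size) is the only guard
    against a bogus length field.
    """
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("truncated header")
    (length,) = struct.unpack(">i", header)
    if length < 0 or length > max_size:
        raise MessageTooLarge(length)
    # Without the cap, a reader that preallocated `length` bytes here
    # could be made to allocate ~2 GB by a 4-byte header.
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("truncated payload")
    return data


def read_from_buffer(buf, offset=0):
    """Read the same hypothetical format when the message is in memory.

    The declared length is checked against the bytes actually remaining,
    so a bogus length is rejected before any allocation, with no cap.
    """
    (length,) = struct.unpack_from(">i", buf, offset)
    start = offset + 4
    if length < 0 or length > len(buf) - start:
        raise MessageTooLarge(length)
    return bytes(buf[start:start + length])
```

For example, `read_from_buffer(struct.pack(">i", 5) + b"hello")` returns
`b"hello"`, while a 4-byte header claiming 2,000,000,000 bytes is
rejected immediately because the buffer holds no payload at all.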

Is my understanding correct? If not, could somebody shed a bit more
light on what exactly the vulnerability is?

Regards

Antoine.