Chao Sun created PARQUET-2052:
---------------------------------

             Summary: Integer overflow when writing huge binary using dictionary encoding
                 Key: PARQUET-2052
                 URL: https://issues.apache.org/jira/browse/PARQUET-2052
             Project: Parquet
          Issue Type: Bug
            Reporter: Chao Sun
            Assignee: Chao Sun

To check whether it should fall back to plain encoding, {{DictionaryValuesWriter}} currently uses two variables, {{dictionaryByteSize}} and {{maxDictionaryByteSize}}, both of which are declared as {{int}}. This causes a problem when one first writes a relatively small binary that stays within the threshold and then writes a huge string, which causes {{dictionaryByteSize}} to overflow and become negative. A negative size never exceeds the threshold, so the fallback check does not trigger.
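
The overflow can be illustrated with a minimal, self-contained sketch. It is not the actual {{DictionaryValuesWriter}} code: the class name, the method names, and the 1 MiB threshold are hypothetical, chosen only to show how an {{int}} byte counter wraps to a negative value while a {{long}} counter (one possible remedy, not necessarily the fix adopted for this issue) does not.

```java
public class DictionarySizeOverflow {
    // Hypothetical threshold for illustration; not Parquet's actual setting.
    static final int MAX_DICTIONARY_BYTE_SIZE = 1024 * 1024;

    /** Accumulates value lengths in an int, so the sum can wrap past Integer.MAX_VALUE. */
    static boolean shouldFallBackWithInt(int[] valueLengths) {
        int dictionaryByteSize = 0;
        for (int length : valueLengths) {
            dictionaryByteSize += length; // silently wraps to a negative number on overflow
        }
        return dictionaryByteSize > MAX_DICTIONARY_BYTE_SIZE;
    }

    /** Same check with a long accumulator, which cannot overflow for these inputs. */
    static boolean shouldFallBackWithLong(int[] valueLengths) {
        long dictionaryByteSize = 0L;
        for (int length : valueLengths) {
            dictionaryByteSize += length;
        }
        return dictionaryByteSize > MAX_DICTIONARY_BYTE_SIZE;
    }

    public static void main(String[] args) {
        // A small value within the threshold, followed by a huge one.
        int[] lengths = {100, Integer.MAX_VALUE - 50};
        System.out.println("int counter falls back:  " + shouldFallBackWithInt(lengths));  // false
        System.out.println("long counter falls back: " + shouldFallBackWithLong(lengths)); // true
    }
}
```

With the {{int}} counter the sum of the two lengths exceeds {{Integer.MAX_VALUE}} and wraps to a negative value, so the comparison against the threshold is false and no fallback happens. The {{long}} counter keeps the true total and the comparison succeeds.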