[jira] [Assigned] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly

Wenchen Fan (Jira) Sat, 27 Apr 2024 20:11:04 -0700


     [ https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Wenchen Fan reassigned SPARK-48019:
-----------------------------------

    Assignee: Gene Pang

> ColumnVectors with dictionaries and nulls are not read/copied correctly
> -----------------------------------------------------------------------
>
>                 Key: SPARK-48019
>                 URL: https://issues.apache.org/jira/browse/SPARK-48019
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.3
>            Reporter: Gene Pang
>            Assignee: Gene Pang
>            Priority: Major
>              Labels: pull-request-available
>
> {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}} and so on. Those 
> return a primitive array with the contents of the vector. When the 
> ColumnVector has a dictionary, the values are decoded with the dictionary 
> before filling in the primitive array.
> However, {{ColumnVectors}} can contain nulls, and for those {{null}} entries
> the dictionary id is irrelevant and may even be invalid. The dictionary
> should not be consulted for the {{null}} entries of the vector; doing so can
> sometimes cause an {{ArrayIndexOutOfBoundsException}}.
> In addition to the possible exception, copying a {{ColumnarArray}} is not
> correct. A {{ColumnarArray}} contains a {{ColumnVector}}, so it can contain
> {{null}} values. However, {{copy()}} for primitive types ignores the
> null-ness of the entries and blindly copies every primitive value, so the
> null entries are lost.
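To illustrate the dictionary problem described above, here is a minimal sketch built on a simplified, hypothetical DictionaryIntColumn class (its name, fields, and the getIntsUnchecked/getInts methods are assumptions for illustration, not Spark's ColumnVector API). Each row stores a dictionary id plus a null flag, and the id stored for a null row can be garbage. The unchecked method decodes every row and can throw ArrayIndexOutOfBoundsException; the null-aware method consults the dictionary only for non-null rows.

```java
// Hypothetical, simplified model of a dictionary-encoded int column.
// This is NOT Spark's ColumnVector API; it only illustrates the bug pattern.
final class DictionaryIntColumn {
    private final int[] dictionary;    // decoded values, indexed by dictionary id
    private final int[] dictionaryIds; // one id per row; meaningless for null rows
    private final boolean[] isNull;    // null mask, one flag per row

    DictionaryIntColumn(int[] dictionary, int[] dictionaryIds, boolean[] isNull) {
        this.dictionary = dictionary;
        this.dictionaryIds = dictionaryIds;
        this.isNull = isNull;
    }

    // Buggy pattern: decodes every row, including null rows whose id may be
    // out of range for the dictionary.
    int[] getIntsUnchecked(int rowId, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = dictionary[dictionaryIds[rowId + i]];
        }
        return out;
    }

    // Null-aware pattern: skips the dictionary lookup for null rows and leaves
    // them at the array default (0). Callers still need the null mask to tell
    // a real 0 from a null.
    int[] getInts(int rowId, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            if (!isNull[rowId + i]) {
                out[i] = dictionary[dictionaryIds[rowId + i]];
            }
        }
        return out;
    }
}
```

In this sketch a null row holding id -1 makes getIntsUnchecked fail, while getInts returns the decoded values for the non-null rows and 0 for the null row.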

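The copy problem can be sketched in the same spirit. The NullableIntSlice class below is hypothetical and only loosely modeled on the idea of a {{ColumnarArray}} (a window of offset and length over a column with a null mask); it is not Spark's API, and returning Integer[] is just one illustrative way to keep null information. copyPrimitives shows the lossy pattern of copying raw primitive slots. copyPreservingNulls checks the null mask for each element.

```java
import java.util.Arrays;

// Hypothetical, simplified model of an array slice backed by a nullable
// int column. NOT Spark's ColumnarArray API; illustration only.
final class NullableIntSlice {
    private final int[] values;     // raw primitive storage of the backing column
    private final boolean[] isNull; // null mask of the backing column
    private final int offset;       // first element of this slice
    private final int length;       // number of elements in this slice

    NullableIntSlice(int[] values, boolean[] isNull, int offset, int length) {
        this.values = values;
        this.isNull = isNull;
        this.offset = offset;
        this.length = length;
    }

    // Lossy: copies the raw primitive slots, so a null element becomes whatever
    // value happens to be stored there and the null information is gone.
    int[] copyPrimitives() {
        return Arrays.copyOfRange(values, offset, offset + length);
    }

    // Null-preserving: boxes each element and maps null slots to Java null.
    Integer[] copyPreservingNulls() {
        Integer[] out = new Integer[length];
        for (int i = 0; i < length; i++) {
            out[i] = isNull[offset + i] ? null : values[offset + i];
        }
        return out;
    }
}
```

A real fix could keep a fast primitive copy when the slice has no nulls and fall back to a null-aware copy otherwise; the sketch only contrasts the two behaviors.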


--
This message was sent by Atlassian Jira
(v8.20.10#820010)