[ 
https://issues.apache.org/jira/browse/SPARK-29454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-29454:
-----------------------------
    Description: 
ParquetGroupConverter calls unsafeProjection to convert a 
SpecificInternalRow to an UnsafeRow every time a Parquet data file is read with 
ParquetRecordReader. When partitionSchema is not empty, ParquetFileFormat then 
calls unsafeProjection again to convert this UnsafeRow to another UnsafeRow. 
PartitionReaderWithPartitionValues, on the other hand, always performs this 
second conversion when DataSourceV2 is used.

I think the first conversion in ParquetGroupConverter is redundant, and it is 
enough for ParquetRecordReader to return a SpecificInternalRow.
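
The call pattern described above can be sketched in plain Python. This is a 
simplified model under stated assumptions, not Spark code: CountingProjection, 
read_rows and convert_in_reader are hypothetical names, the tuple copy only 
stands in for building a new UnsafeRow, and only the case where partition 
values are appended is covered.

```python
from typing import Iterable, Iterator, Sequence, Tuple


class CountingProjection:
    """Hypothetical stand-in for an unsafe projection.

    It copies a row into a new immutable tuple and counts how often it runs.
    """

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, row: Iterable[object]) -> Tuple[object, ...]:
        self.calls += 1
        return tuple(row)  # models the copy into a fresh UnsafeRow


def read_rows(
    records: Sequence[Sequence[object]],
    partition_values: Sequence[object],
    projection: CountingProjection,
    convert_in_reader: bool,
) -> Iterator[Tuple[object, ...]]:
    """Yield final rows with partition values appended."""
    for record in records:
        row = list(record)  # models the mutable SpecificInternalRow
        if convert_in_reader:
            # First conversion, done inside the reader (the one the issue
            # considers redundant).
            row = projection(row)
        # Second conversion: join with partition values, then project.
        yield projection(list(row) + list(partition_values))


records = [(1, "a"), (2, "b"), (3, "c")]
partition_values = ("2019-10-14",)

# Current behaviour: convert in the reader, then convert again.
current = CountingProjection()
rows_current = list(read_rows(records, partition_values, current, True))

# Proposed behaviour: the reader hands back the unconverted row.
proposed = CountingProjection()
rows_proposed = list(read_rows(records, partition_values, proposed, False))

print(current.calls, proposed.calls)
```

In this model both paths yield the same rows, while the proposed path runs the 
projection once per record instead of twice.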

  was:
ParquetGroupConverter call unsafeProjection function to covert 
SpecificInternalRow to UnsafeRow every times when read Parquet data file use 
ParquetRecordReader, then ParquetFileFormat will call unsafeProjection function 
to covert this UnsafeRow to another UnsafeRow again when partitionSchema is not 
empty , and on the other hand we PartitionReaderWithPartitionValues  always do 
this convert process when use DataSourceV2.

I think the first time convert in ParquetGroupConverter is redundant and 
ParquetRecordReader return a SpecificInternalRow is enough.


> Reduce one unsafeProjection call when read parquet file
> -------------------------------------------------------
>
>                 Key: SPARK-29454
>                 URL: https://issues.apache.org/jira/browse/SPARK-29454
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.3, 2.3.4, 2.4.4
>            Reporter: Yang Jie
>            Priority: Major
>
> ParquetGroupConverter calls unsafeProjection to convert a 
> SpecificInternalRow to an UnsafeRow every time a Parquet data file is read 
> with ParquetRecordReader. When partitionSchema is not empty, 
> ParquetFileFormat then calls unsafeProjection again to convert this 
> UnsafeRow to another UnsafeRow. PartitionReaderWithPartitionValues, on the 
> other hand, always performs this second conversion when DataSourceV2 is 
> used.
> I think the first conversion in ParquetGroupConverter is redundant, and it 
> is enough for ParquetRecordReader to return a SpecificInternalRow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
