[ https://issues.apache.org/jira/browse/SPARK-36696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takuya Ueshin resolved SPARK-36696.
-----------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

I confirmed the example file can be read after the upgrade. I'll close this now. Thanks!

> spark.read.parquet loads empty dataset
> --------------------------------------
>
>                 Key: SPARK-36696
>                 URL: https://issues.apache.org/jira/browse/SPARK-36696
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Takuya Ueshin
>            Priority: Blocker
>             Fix For: 3.2.0
>
>         Attachments: example.parquet
>
>
> Here's a parquet file Spark 3.2/master can't read properly.
> The file was stored by pandas and should contain 3650 rows, but Spark 3.2/master returns an empty dataset.
> {code:python}
> >>> import pandas as pd
> >>> len(pd.read_parquet('/path/to/example.parquet'))
> 3650
> >>> spark.read.parquet('/path/to/example.parquet').count()
> 0
> {code}
> I guess it's caused by the upgrade to Parquet 1.12.0.
> When I reverted the two commits related to Parquet 1.12.0 from branch-3.2:
> - https://github.com/apache/spark/commit/e40fce919ab77f5faeb0bbd34dc86c56c04adbaa
> - https://github.com/apache/spark/commit/cbffc12f90e45d33e651e38cf886d7ab4bcf96da
> it reads the data successfully.
> We need to add a workaround, or revert the commits.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org