Hi Yuming
thanks for the suggestion , now the partitionFilter is normal. Appreciate for
your help!
Best Regards
Kelly Zhang, Liyun Zhang
在 2023-03-06 11:04:55,"Yuming Wang" 写道:
Hi Liyun,
This is because of this change:
https://issues.apache.org/jira/browse/SPARK-27638.
You can set spark.sql.legacy.typeCoercion.datetimeToString.enabled to true to
restore the old behavior.
On Mon, Mar 6, 2023 at 10:27 AM zhangliyun wrote:
Hi all
i have a spark sql , before in spark 2.4.2 it runs correctly, when i
upgrade to spark 3.1.3, it has some problem.
the sql
```
select * from eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly where
dt >= date_sub('${today}',30);
```
it will load the data of past 30 days of table
eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly, here
today='2023-03-01'
in spark2 i saw the physical plan the partition Filter is PartitionFilters:
[isnotnull(dt#1461), (dt#1461 >= 2023-01-31)]
+- *(4) FileScan parquet
eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly[disputeid#1327,statuswork#1330,opTs#1457,trailSeqno#1459,trailRba#1460,dt#1461,hr#1462]
Batched: true, Format: Parquet, Location:
PrunedInMemoryFileIndex[gs://pypl-bkt-prd-row-std-gds-non-edw-tables/apps/risk/eds/eds_risk/eds_r...,
PartitionCount: 805, PartitionFilters: [isnotnull(dt#1461), (dt#1461 >=
2023-01-31)], PushedFilters: [IsNotNull(disputeid)], ReadSchema:
struct
in spark3 , i saw the physical plan , the partitionFilter is
[isnotnull(dt#1602), (cast(dt#1602 as date) >= 19387)]
```
(8) Scan parquet eds_rds.cdh_prpc63cgudba_pp_index_disputecasedetails_hourly
Output [7]: [disputeid#1468, statuswork#1471, opTs#1598, trailSeqno#1600,
trailRba#1601, dt#1602, hr#1603]
Batched: true
Location: InMemoryFileIndex
[gs://pypl-bkt-prd-row-std-gds-non-edw-tables/apps/risk/eds/eds_risk/eds_rds/cdh/prpc63cgudba_pp_index_disputecasedetails/dt=2023-01-30/hr=00,
... 784 entries]
PartitionFilters: [isnotnull(dt#1602), (cast(dt#1602 as date) >= 19387)]
PushedFilters: [IsNotNull(disputeid)]
ReadSchema:
struct
```
here i want to ask why there is big difference in partitionFitler in spark2 and
spark3, i guess most my spark configure is similar in spark2 and spark3 to run
the same sql