Hello Spark users,

I have an error that I would like to report as a Spark 3.1.1 bug, but I do not know how to create a reproducible example. I can provide a full stack trace if desired, but the most useful part seems to be:

E   py4j.protocol.Py4JJavaError: An error occurred while calling o3301.toJavaRDD.
E   : java.lang.IllegalStateException: UnspecifiedDistribution does not have default partitioning.
E       at org.apache.spark.sql.catalyst.plans.physical.UnspecifiedDistribution$.createPartitioning(partitioning.scala:52)
E       at org.apache.spark.sql.execution.exchange.EnsureRequirements$.$anonfun$ensureDistributionAndOrdering$1(EnsureRequirements.scala:54)

This error happens when I have spark.sql.adaptive.enabled=true but does not happen when I set it to false. It happens both in one of my unit tests (~30 rows) and with production data. Another workaround is to cache the DataFrame before calling the collect/toJSON statement.

I was not able to find any information about this kind of error on the Spark JIRA or on Stack Exchange. Has anyone seen this error before in connection with AQE, and does anyone have suggestions for how to report it?

Thanks,
Jesse
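
P.S. In case it helps anyone who hits the same thing, here is a rough sketch of the two workarounds described above. The helper names (aqe_disabled, to_json_rows_cached) are made up for illustration and are not part of Spark; `spark` is assumed to be an existing SparkSession and `df` the DataFrame that triggers the error. This is not a reproduction of the bug, only the shape of the workarounds.

```python
from contextlib import contextmanager

# Config key named in the report; AQE is off by default in Spark 3.1.x.
AQE_KEY = "spark.sql.adaptive.enabled"


@contextmanager
def aqe_disabled(spark):
    """Workaround 1 (hypothetical helper): turn AQE off for the duration
    of a block, then restore whatever value the session had before."""
    previous = spark.conf.get(AQE_KEY, "false")
    spark.conf.set(AQE_KEY, "false")
    try:
        yield spark
    finally:
        spark.conf.set(AQE_KEY, previous)


def to_json_rows_cached(df):
    """Workaround 2 (hypothetical helper): cache the DataFrame before
    calling toJSON()/collect(), then release the cache afterwards."""
    cached = df.cache()
    try:
        return cached.toJSON().collect()
    finally:
        cached.unpersist()


# Intended usage, assuming `spark` and `df` already exist:
#
#   with aqe_disabled(spark):
#       rows = df.toJSON().collect()
#
#   rows = to_json_rows_cached(df)
```

Restoring the previous setting in the first helper keeps AQE enabled for the rest of the job, so only the failing action runs without it.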