Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22275#discussion_r219557215
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -4434,6 +4434,12 @@ def test_timestamp_dst(self):
             self.assertPandasEqual(pdf, df_from_python.toPandas())
             self.assertPandasEqual(pdf, df_from_pandas.toPandas())
     
    +    def test_toPandas_batch_order(self):
    +        df = self.spark.range(64, numPartitions=8).toDF("a")
    +        with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": 4}):
    +            pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
    +            self.assertPandasEqual(pdf, pdf_arrow)
    --- End diff --
    
    This looks pretty similar to the kind of test case we could verify with 
something like Hypothesis (property-based testing). Integrating Hypothesis is 
probably too much work, but we could at least explore the num-partitions space 
in a quick loop here. Do you think that would help, @felixcheung?
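
    To illustrate the "sweep the parameter space in a loop" idea without 
pulling in Spark, here is a minimal, self-contained sketch. The `chunk` and 
`check_order` names are hypothetical stand-ins: `chunk` plays the role of 
partitioning a range into `num_partitions` ordered slices, and the loop checks 
the order-preservation property (the analogue of the Arrow batch-order check 
in the test above) across several partition counts rather than a single fixed 
one.

    ```python
    # Illustrative sketch only: plain-Python stand-in for looping over
    # numPartitions values in a test like test_toPandas_batch_order.
    def chunk(data, num_partitions):
        """Split data into num_partitions roughly equal, ordered chunks."""
        k, m = divmod(len(data), num_partitions)
        chunks, start = [], 0
        for i in range(num_partitions):
            end = start + k + (1 if i < m else 0)
            chunks.append(data[start:end])
            start = end
        return chunks

    def check_order(n=64):
        """Sweep several partition counts; flattening the chunks must
        reproduce the original order for every count."""
        data = list(range(n))
        for num_partitions in [1, 2, 4, 8, 16, 64]:
            flat = [x for part in chunk(data, num_partitions) for x in part]
            assert flat == data, num_partitions
        return True
    ```

    In the real test this would translate to wrapping the existing body in a 
`for num_partitions in [...]` loop around `self.spark.range(64, 
numPartitions=num_partitions)`, which is cheap compared to wiring up 
Hypothesis.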


---
