Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22275#discussion_r219557215

    --- Diff: python/pyspark/sql/tests.py ---
    @@ -4434,6 +4434,12 @@ def test_timestamp_dst(self):
             self.assertPandasEqual(pdf, df_from_python.toPandas())
             self.assertPandasEqual(pdf, df_from_pandas.toPandas())
    +
    +    def test_toPandas_batch_order(self):
    +        df = self.spark.range(64, numPartitions=8).toDF("a")
    +        with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": 4}):
    +            pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
    +            self.assertPandasEqual(pdf, pdf_arrow)
    --- End diff --

    This looks pretty similar to the kind of test case we could verify with something like hypothesis. Integrating hypothesis is probably too much work, but we could at least explore the num-partitions space in a loop quickly here. Would that help, do you think @felixcheung?
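[Editorial sketch] The loop-based exploration suggested above would, in the real PySpark test, rebuild the DataFrame with a different `numPartitions` on each iteration before toggling Arrow. The property under test — that results come back in the original order no matter how many partitions (and hence out-of-order Arrow batches) are involved — can be modeled standalone in plain Python. The helper name `collect_out_of_order` below is hypothetical and only illustrates the reassembly-by-partition-index idea, not Spark's actual implementation:

```python
import random

def collect_out_of_order(data, num_partitions):
    """Toy model: split data into contiguous partitions, simulate batches
    arriving in arbitrary order, then reassemble by partition index."""
    size = len(data)
    bounds = [size * i // num_partitions for i in range(num_partitions + 1)]
    batches = [(i, data[bounds[i]:bounds[i + 1]]) for i in range(num_partitions)]
    random.shuffle(batches)               # batches may arrive out of order
    batches.sort(key=lambda b: b[0])      # restore original partition order
    return [x for _, batch in batches for x in batch]

# Explore the num-partitions space in a loop, as suggested above.
data = list(range(64))
for num_partitions in range(1, 17):
    assert collect_out_of_order(data, num_partitions) == data
```

In the actual test this loop body would correspond to constructing `self.spark.range(64, numPartitions=num_partitions).toDF("a")` and asserting `pdf` equals `pdf_arrow`, so the existing single-case test generalizes cheaply without pulling in hypothesis.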