HyukjinKwon commented on a change in pull request #25515: [SPARK-27659][PYTHON] Allow PySpark to prefetch during toLocalIterator
URL: https://github.com/apache/spark/pull/25515#discussion_r316939353
 
 

 ##########
 File path: python/pyspark/tests/test_rdd.py
 ##########
 @@ -68,6 +70,27 @@ def test_to_localiterator(self):
         it2 = rdd2.toLocalIterator()
         self.assertEqual([1, 2, 3], sorted(it2))
 
+    def test_to_localiterator_prefetch(self):
+        # Test that we fetch the next partition in parallel
+        # We do this by returning the current time and:
+        # reading the first elem, waiting, and reading the second elem
+        # If not in parallel then these would be at different times
+        # But since they are being computed in parallel we see the time
+        # is "close enough" to the same.
+        rdd = self.sc.parallelize(range(2), 2)
+        times1 = rdd.map(lambda x: datetime.now())
+        times2 = rdd.map(lambda x: datetime.now())
+        timesIterPrefetch = times1.toLocalIterator(prefetchPartitions=True)
 
 Review comment:
   Shall we stick to the underscore naming rule?
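
   For illustration only, here is a minimal stand-alone sketch of the prefetch behaviour that the test comment describes, written with underscore names. It does not use Spark. `local_iterator`, `prefetch_partitions`, and `measure_gap` are hypothetical names made up for this sketch, and the single-worker thread pool is just an assumed stand-in for computing the next partition ahead of time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def local_iterator(partitions, prefetch_partitions=False):
    """Yield elements one partition at a time.

    `partitions` is a list of zero-argument callables, each returning a list.
    This is a hypothetical stand-in for computing one partition; it is not
    the PySpark implementation.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for index, compute in enumerate(partitions):
            # Use the prefetched result if there is one, otherwise compute now.
            current = pending.result() if pending is not None else compute()
            pending = None
            if prefetch_partitions and index + 1 < len(partitions):
                # Start the next partition before handing out the current one.
                pending = pool.submit(partitions[index + 1])
            yield from current


def measure_gap(prefetch_partitions):
    """Return the seconds between the compute times of two partitions."""
    partitions = [lambda: [time.monotonic()], lambda: [time.monotonic()]]
    times_iter = local_iterator(partitions, prefetch_partitions=prefetch_partitions)
    first_time = next(times_iter)
    time.sleep(0.5)  # the consumer waits between reading the two elements
    second_time = next(times_iter)
    return second_time - first_time


if __name__ == "__main__":
    print("gap with prefetch:   ", measure_gap(True))
    print("gap without prefetch:", measure_gap(False))
```

   With prefetching, the second timestamp is taken while the consumer is still sleeping, so the gap should be well under the 0.5 second wait. Without it, the second partition is only computed after the wait, so the gap should be at least roughly that long.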

