[GitHub] [spark] HyukjinKwon commented on a change in pull request #25515: [SPARK-27659][PYTHON] Allow PySpark to prefetch during toLocalIterator

GitBox Thu, 22 Aug 2019 17:52:01 -0700

HyukjinKwon commented on a change in pull request #25515: [SPARK-27659][PYTHON] 
Allow PySpark to prefetch during toLocalIterator
URL: https://github.com/apache/spark/pull/25515#discussion_r316939353


 ##########
 File path: python/pyspark/tests/test_rdd.py
 ##########
 @@ -68,6 +70,27 @@ def test_to_localiterator(self):
         it2 = rdd2.toLocalIterator()
         self.assertEqual([1, 2, 3], sorted(it2))
 
+    def test_to_localiterator_prefetch(self):
+        # Test that we fetch the next partition in parallel
+        # We do this by returning the current time and:
+        # reading the first elem, waiting, and reading the second elem
+        # If not in parallel then these would be at different times
+        # But since they are being computed in parallel we see the time
+        # is "close enough" to the same.
+        rdd = self.sc.parallelize(range(2), 2)
+        times1 = rdd.map(lambda x: datetime.now())
+        times2 = rdd.map(lambda x: datetime.now())
+        timesIterPrefetch = times1.toLocalIterator(prefetchPartitions=True)
 
 Review comment:
   Shall we stick to the underscore naming rule here (e.g. `times_iter_prefetch` rather than `timesIterPrefetch`)?
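
   For illustration only, here is a minimal pure-Python sketch of the idea the test exercises (computing the next partition while the current one is being consumed), written with underscore names. It does not use Spark and is not how `toLocalIterator` is implemented; `prefetching_iterator` and `partition_fns` are hypothetical names introduced for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def prefetching_iterator(partition_fns):
    """Yield elements partition by partition, computing the next partition
    in a background thread while the current one is being consumed.

    `partition_fns` is a hypothetical list of zero-argument callables,
    each returning the elements of one partition.
    """
    if not partition_fns:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start computing the first partition right away.
        next_future = pool.submit(partition_fns[0])
        for index in range(len(partition_fns)):
            current = next_future.result()
            # Kick off the following partition before yielding anything
            # from the current one; this is the "prefetch" step.
            if index + 1 < len(partition_fns):
                next_future = pool.submit(partition_fns[index + 1])
            for elem in current:
                yield elem


partition_fns = [lambda: [1], lambda: [2, 3]]
times_iter_prefetch = prefetching_iterator(partition_fns)
result = list(times_iter_prefetch)
```

   Elements still come back in partition order; only the computation of the following partition overlaps with consumption of the current one.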

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

