[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21437

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r192192843

--- Diff: python/pyspark/tests.py ---
@@ -543,6 +543,20 @@ def test_tc_on_driver(self):
         tc = TaskContext.get()
         self.assertTrue(tc is None)
+
+    def test_get_local_property(self):
+        """Verify that local properties set on the driver are available in TaskContext."""
+        key = "testkey"
+        value = "testvalue"
+        self.sc.setLocalProperty(key, value)
+        try:
+            rdd = self.sc.parallelize(range(1), 1)
+            prop1 = rdd.map(lambda x: TaskContext.get().getLocalProperty(key)).collect()[0]
--- End diff --

I can address that in a follow-up PR that I am working on for SPARK-24396.
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r192174378

--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
         TaskAttemptID.
         """
         return self._taskAttemptId
+
+    def getLocalProperty(self, key):
+        """
+        Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

Sounds good to me, since this matches the Scala API.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r192070134

--- Diff: python/pyspark/tests.py ---
@@ -543,6 +543,20 @@ def test_tc_on_driver(self):
         tc = TaskContext.get()
         self.assertTrue(tc is None)
+
+    def test_get_local_property(self):
+        """Verify that local properties set on the driver are available in TaskContext."""
+        key = "testkey"
+        value = "testvalue"
+        self.sc.setLocalProperty(key, value)
+        try:
+            rdd = self.sc.parallelize(range(1), 1)
+            prop1 = rdd.map(lambda x: TaskContext.get().getLocalProperty(key)).collect()[0]
--- End diff --

`x` -> `_`
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191969840

--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
         TaskAttemptID.
         """
         return self._taskAttemptId
+
+    def getLocalProperty(self, key):
+        """
+        Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

Makes sense, but +1 for leaving it out for now, since I don't know how commonly it will be used.
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191881226

--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
         TaskAttemptID.
         """
         return self._taskAttemptId
+
+    def getLocalProperty(self, key):
+        """
+        Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

FWIW we can always evolve what you have here towards the `getLocalProperty(key, default=None)` case; that evolution would maintain behavior and source compatibility. Therefore maybe it's fine to defer that until later if we're not sure whether that variant is useful.
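The compatible evolution described in that comment can be sketched in plain Python. `TaskContextSketch` below is an illustrative stand-alone stand-in, not PySpark's real `TaskContext`:

```python
# A sketch of the compatible evolution: adding an optional `default`
# parameter later keeps old single-argument calls working unchanged.
class TaskContextSketch(object):
    def __init__(self, local_properties):
        self._localProperties = local_properties

    def getLocalProperty(self, key, default=None):
        # Old callers that pass only `key` still get None for missing
        # keys, so behavior and source compatibility are preserved.
        return self._localProperties.get(key, default)

tc = TaskContextSketch({"k": "v"})
assert tc.getLocalProperty("k") == "v"
assert tc.getLocalProperty("missing") is None      # original behavior kept
assert tc.getLocalProperty("missing", "d") == "d"  # new, optional capability
```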
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191879730

--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
         TaskAttemptID.
         """
         return self._taskAttemptId
+
+    def getLocalProperty(self, key):
+        """
+        Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

I am trying to match the Java TaskContext API, which only has a `TaskContext.getLocalProperty(key)`.
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191877900

--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
         TaskAttemptID.
         """
         return self._taskAttemptId
+
+    def getLocalProperty(self, key):
+        """
+        Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

The Java / Scala equivalents of this API return `null` for missing keys, so on the one hand returning `None` is kinda consistent with that.

On the other hand, consider a case where you want to specify an alternative when a key is not set: with this API, you might think of doing something like `tc.getLocalProperty('key') or 'defaultValue'`, which could be a problem if a key that is set has a `False`-y value. I suppose we're only dealing with strings here, though, and that'd only happen for empty strings. If we allowed non-strings to be returned here, though, then we'd have problems with values like `0`. For that case, having a `getLocalProperty('key', 'defaultValue')` is a bit more useful.
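The falsy-value pitfall described in that comment can be shown with a plain dict standing in for the task's local properties (the keys and values here are hypothetical):

```python
# `or`-based defaulting vs. dict.get defaulting for falsy values.
props = {"empty.key": "", "set.key": "value"}

def get_or_default(key, default):
    # `or` falls back to `default` for ANY falsy value, including "".
    return props.get(key) or default

def get_with_default(key, default):
    # dict.get falls back only when the key is truly missing.
    return props.get(key, default)

# Both agree for a genuinely missing key:
assert get_or_default("missing.key", "fallback") == "fallback"
assert get_with_default("missing.key", "fallback") == "fallback"

# But they disagree when the stored value is an empty string:
assert get_or_default("empty.key", "fallback") == "fallback"  # "" is falsy
assert get_with_default("empty.key", "fallback") == ""        # key exists
```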
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191624414

--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
         TaskAttemptID.
         """
         return self._taskAttemptId
+
+    def getLocalProperty(self, key):
+        """
+        Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

No, it seems not. Shall we change it to `get(key)`?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191622793

--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
         TaskAttemptID.
         """
         return self._taskAttemptId
+
+    def getLocalProperty(self, key):
+        """
+        Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

Wait .. it's a dictionary. Does a dictionary's `get` have a `default` argument?
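To answer the question in that comment: yes, `dict.get` accepts a default, but only as a second positional argument; in CPython it is positional-only, so a `default=` keyword form is rejected:

```python
d = {"a": "1"}

# The default is the second positional argument:
assert d.get("a", "fallback") == "1"         # key present: default ignored
assert d.get("b", "fallback") == "fallback"  # key missing: default used
assert d.get("b") is None                    # the default defaults to None

# dict.get is positional-only in CPython, so the keyword form fails:
keyword_rejected = False
try:
    d.get("b", default="fallback")
except TypeError:
    keyword_rejected = True
assert keyword_rejected
```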
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191622279

--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
         TaskAttemptID.
         """
         return self._taskAttemptId
+
+    def getLocalProperty(self, key):
+        """
+        Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

Yeah. I added the `default=None` to make it super obvious. I see no harm in making the code more intuitive.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191621228

--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
         TaskAttemptID.
         """
         return self._taskAttemptId
+
+    def getLocalProperty(self, key):
+        """
+        Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

`get(key)` would return `None` if it's missing, though.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191620990

--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
         TaskAttemptID.
         """
         return self._taskAttemptId
+
+    def getLocalProperty(self, key):
+        """
+        Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

Just `get(key)` returns `None` if missing, though.
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191611421

--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
         TaskAttemptID.
         """
         return self._taskAttemptId
+
+    def getLocalProperty(self, key):
+        """
+        Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

Thanks @BryanCutler for catching this stupid stuff. I'm not that comfortable with Python.
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191596607

--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
         TaskAttemptID.
         """
         return self._taskAttemptId
+
+    def getLocalProperty(self, key):
+        """
+        Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

Right.
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191589537

--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
         TaskAttemptID.
         """
         return self._taskAttemptId
+
+    def getLocalProperty(self, key):
+        """
+        Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

If the key is missing, this will raise a `KeyError`; maybe you want `return self._localProperties.get(key)`, which returns `None` as the default? That seems better to me too, although you might want to add an optional `default` value.
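The difference that comment points out can be shown with a plain dict standing in for `self._localProperties`:

```python
props = {"testkey": "testvalue"}

# Indexing raises KeyError for a missing key...
raised = False
try:
    props["missing"]
except KeyError:
    raised = True
assert raised

# ...while .get returns None by default (matching the Scala API's null),
# and also accepts an optional default value:
assert props.get("missing") is None
assert props.get("missing", "fallback") == "fallback"
assert props.get("testkey") == "testvalue"
```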
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191548734

--- Diff: python/pyspark/tests.py ---
@@ -543,6 +543,15 @@ def test_tc_on_driver(self):
         tc = TaskContext.get()
         self.assertTrue(tc is None)
+
+    def test_get_local_property(self):
+        """Verify that local properties set on the driver are available in TaskContext."""
+        self.sc.setLocalProperty("testkey", "testvalue")
--- End diff --

There isn't a clear way to clear a local property in the SparkContext. What do you propose is the right approach here? Set the local property to `None`?
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21437#discussion_r191509834

--- Diff: python/pyspark/tests.py ---
@@ -543,6 +543,15 @@ def test_tc_on_driver(self):
         tc = TaskContext.get()
         self.assertTrue(tc is None)
+
+    def test_get_local_property(self):
+        """Verify that local properties set on the driver are available in TaskContext."""
+        self.sc.setLocalProperty("testkey", "testvalue")
--- End diff --

We should clean up the property after the test to avoid affecting other tests?
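One way to scope the property to a single test is a `try`/`finally` around the body. This is a self-contained sketch using a tiny fake context in place of the real `self.sc`; the assumption that setting a property to `None` clears it mirrors how the Scala `SparkContext` treats a `null` value and is an assumption here, not PySpark's documented behavior:

```python
# FakeContext is a hypothetical stand-in for SparkContext, used only so
# this cleanup pattern can run on its own.
class FakeContext(object):
    def __init__(self):
        self._props = {}

    def setLocalProperty(self, key, value):
        if value is None:
            self._props.pop(key, None)  # assume None clears the property
        else:
            self._props[key] = value

    def getLocalProperty(self, key):
        return self._props.get(key)

sc = FakeContext()
sc.setLocalProperty("testkey", "testvalue")
try:
    # ... body of the test would run tasks here ...
    assert sc.getLocalProperty("testkey") == "testvalue"
finally:
    sc.setLocalProperty("testkey", None)  # cleanup so other tests are unaffected

assert sc.getLocalProperty("testkey") is None
```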
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/21437

[SPARK-24397][PYSPARK] Added TaskContext.getLocalProperties in Python

## What changes were proposed in this pull request?

Added.

## How was this patch tested?

New test added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-24397

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21437.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21437

commit ac0270aa60a3f98abfc534be49c759fbeb6c901d
Author: Tathagata Das
Date: 2018-05-26T00:24:23Z

    Added