[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21437


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-31 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r192192843
  
--- Diff: python/pyspark/tests.py ---
@@ -543,6 +543,20 @@ def test_tc_on_driver(self):
 tc = TaskContext.get()
 self.assertTrue(tc is None)
 
+def test_get_local_property(self):
+"""Verify that local properties set on the driver are available in 
TaskContext."""
+key = "testkey"
+value = "testvalue"
+self.sc.setLocalProperty(key, value)
+try:
+rdd = self.sc.parallelize(range(1), 1)
+prop1 = rdd.map(lambda x: 
TaskContext.get().getLocalProperty(key)).collect()[0]
--- End diff --

I can address that in a follow-up PR that I am working on for SPARK-24396


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-31 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r192174378
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is 
missing.
--- End diff --

Sounds good to me since this matches the Scala API


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r192070134
  
--- Diff: python/pyspark/tests.py ---
@@ -543,6 +543,20 @@ def test_tc_on_driver(self):
 tc = TaskContext.get()
 self.assertTrue(tc is None)
 
+def test_get_local_property(self):
+"""Verify that local properties set on the driver are available in 
TaskContext."""
+key = "testkey"
+value = "testvalue"
+self.sc.setLocalProperty(key, value)
+try:
+rdd = self.sc.parallelize(range(1), 1)
+prop1 = rdd.map(lambda x: 
TaskContext.get().getLocalProperty(key)).collect()[0]
--- End diff --

`x` -> `_`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191969840
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is 
missing.
--- End diff --

Makes sense but +1 for leaving it out since I either don't know how 
commonly it will be used for now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191881226
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is 
missing.
--- End diff --

FWIW we can always evolve what you have here towards the 
`getLocalProperty(key, default=None)` case; that evolution would maintain 
behavior and source compatibility. Therefore maybe it's fine to defer that 
until later if we're not sure whether that variant is useful.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-30 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191879730
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is 
missing.
--- End diff --

I am trying to match the Java TaskContext API, which only has a 
`TaskContext.getLocalProperty(key)`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-30 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191877900
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is 
missing.
--- End diff --

The Java / Scala equivalents of this API return `null` for missing keys, so 
on the one hand returning `None` is kinda consistent with that.

On the other hand, consider a case where you want to specify an alternative 
in case a key is not set:

With this API, you might think of doing something like 
`tc.getLocalProperty('key') or 'defaultValue'`, which potentially could be a 
problem in case a non-None key could have a `False`-y value. I suppose we're 
only dealing with strings here, though, and that'd only happen for empty 
strings. If we allowed non-strings to be returned here, though, then we'd have 
problems if we're returning values like `0`. For that case, having a 
`getLocalProperty('key', 'defaultValue')` is a bit more useful.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191624414
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is 
missing.
--- End diff --

No, it seems not. Shall we change it to `get(key)`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191622793
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is 
missing.
--- End diff --

Wait .. it's a dictionary. Does a dictionary's `get` has `default` keyword 
argument?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-29 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191622279
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is 
missing.
--- End diff --

yeah. I added the default=None to make it super obvious. i see no harm in 
making the code more intuitive.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191621228
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is 
missing.
--- End diff --

get(key) would return `None` tho if it's missing.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191620990
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is 
missing.
--- End diff --

Just get(key) returns `None` if missing tho ..


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-29 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191611421
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is 
missing.
--- End diff --

Thanks @BryanCutler for catching this stupid stuff. not at comfortable with 
python. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-29 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191596607
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is 
missing.
--- End diff --

Right. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-29 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191589537
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is 
missing.
--- End diff --

If it's missing it will result in a `KeyError`, maybe you want `return 
self._localProperties.get(key)` which returns `None` as the default?  That 
seems better to me too, although you might want to add an optional `default` 
value.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-29 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191548734
  
--- Diff: python/pyspark/tests.py ---
@@ -543,6 +543,15 @@ def test_tc_on_driver(self):
 tc = TaskContext.get()
 self.assertTrue(tc is None)
 
+def test_get_local_property(self):
+"""Verify that local properties set on the driver are available in 
TaskContext."""
+self.sc.setLocalProperty("testkey", "testvalue")
--- End diff --

there isnt a clear way to clear local property in the spark context? what 
do you propose is the right approach here? set the local property as "None"?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191509834
  
--- Diff: python/pyspark/tests.py ---
@@ -543,6 +543,15 @@ def test_tc_on_driver(self):
 tc = TaskContext.get()
 self.assertTrue(tc is None)
 
+def test_get_local_property(self):
+"""Verify that local properties set on the driver are available in 
TaskContext."""
+self.sc.setLocalProperty("testkey", "testvalue")
--- End diff --

We should cleanup the property after the test to avoid affecting to other 
tests?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-25 Thread tdas
GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/21437

[SPARK-24397][PYSPARK] Added TaskContext.getLocalProperties in Python

## What changes were proposed in this pull request?

Added.

## How was this patch tested?

New test added.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark SPARK-24397

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21437.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21437


commit ac0270aa60a3f98abfc534be49c759fbeb6c901d
Author: Tathagata Das 
Date:   2018-05-26T00:24:23Z

Added




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org