[GitHub] spark pull request #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility...

2017-01-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16119


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility...

2016-12-22 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16119#discussion_r93598376
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -1851,6 +1851,71 @@ def test_hivecontext(self):
 self.assertIn("default", out.decode('utf-8'))
 self.assertTrue(os.path.exists(metastore_path))
 
+def test_hivecontext_create_dataframe(self):
--- End diff --

I think we can just create two `SQLContext`s and check whether they share the 
same `SparkSession` instance.
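A minimal sketch of that test idea, using plain-Python stand-ins (no Spark required) so the sharing logic can be checked in isolation. The `Fake*` class names and the `_instantiated` attribute below are illustrative, not PySpark's real internals; the if/else mirrors the check proposed in this PR's diff.

```python
# Plain-Python model of the PR's logic: two SQLContext wrappers built over
# the same active context should end up sharing one SparkSession instance.
class FakeSparkContext:
    _active_spark_context = None

    def __init__(self):
        FakeSparkContext._active_spark_context = self

class FakeSparkSession:
    _instantiated = None

    def __init__(self, sc):
        self._sc = sc

    @classmethod
    def get_or_create(cls, sc):
        # Reuse the live session if one exists, as builder.getOrCreate does.
        if cls._instantiated is None:
            cls._instantiated = cls(sc)
        return cls._instantiated

class FakeSQLContext:
    def __init__(self, sc, session=None):
        if session is None:
            if sc is FakeSparkContext._active_spark_context:
                session = FakeSparkSession.get_or_create(sc)
            else:
                session = FakeSparkSession(sc)
        self.sparkSession = session

sc = FakeSparkContext()
a = FakeSQLContext(sc)
b = FakeSQLContext(sc)
print(a.sparkSession is b.sparkSession)  # True: one shared session
```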





[GitHub] spark pull request #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility...

2016-12-22 Thread vijoshi
Github user vijoshi commented on a diff in the pull request:

https://github.com/apache/spark/pull/16119#discussion_r93596548
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -1851,6 +1851,71 @@ def test_hivecontext(self):
 self.assertIn("default", out.decode('utf-8'))
 self.assertTrue(os.path.exists(metastore_path))
 
+def test_hivecontext_create_dataframe(self):
--- End diff --

did you mean to keep the same scenario, where multiple `SQLContext` objects are 
constructed, but make the test's success criterion verifying that no more than 
a single `SparkSession` object ever gets created?





[GitHub] spark pull request #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility...

2016-12-22 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16119#discussion_r93596371
  
--- Diff: python/pyspark/sql/context.py ---
@@ -72,8 +72,13 @@ def __init__(self, sparkContext, sparkSession=None, 
jsqlContext=None):
 self._sc = sparkContext
 self._jsc = self._sc._jsc
 self._jvm = self._sc._jvm
+
 if sparkSession is None:
-sparkSession = SparkSession(sparkContext)
+if sparkContext is SparkContext._active_spark_context:
+sparkSession = SparkSession.builder.getOrCreate()
--- End diff --

I found a related discussion about multiple `SparkContext`s at 
http://stackoverflow.com/questions/28259756/how-to-create-multiple-sparkcontexts-in-a-console

I think the use of multiple contexts is discouraged, even though you can do it 
in Scala/Java. In particular, you can't have multiple `SparkContext`s in Python, 
so I am not worried about a `SparkContext` other than the active one being 
passed in when constructing a `SQLContext`.
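The one-live-context rule mentioned above can be sketched in plain Python. The class name and error message below are hypothetical stand-ins, not Spark's actual code; they only model the guard that rejects a second live `SparkContext`.

```python
# Illustrative one-live-instance guard, the kind of check that makes a second
# SparkContext construction fail in Python.
class SingleActiveContext:
    _active = None

    def __init__(self):
        if SingleActiveContext._active is not None:
            raise ValueError("Cannot run multiple contexts at once")
        SingleActiveContext._active = self

    def stop(self):
        # Releasing the slot allows a fresh context afterwards.
        SingleActiveContext._active = None

first = SingleActiveContext()
try:
    SingleActiveContext()  # second live instance is rejected
    second_allowed = True
except ValueError:
    second_allowed = False
print(second_allowed)  # False: only one active context at a time
```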






[GitHub] spark pull request #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility...

2016-12-22 Thread vijoshi
Github user vijoshi commented on a diff in the pull request:

https://github.com/apache/spark/pull/16119#discussion_r93594980
  
--- Diff: python/pyspark/sql/context.py ---
@@ -72,8 +72,13 @@ def __init__(self, sparkContext, sparkSession=None, 
jsqlContext=None):
 self._sc = sparkContext
 self._jsc = self._sc._jsc
 self._jvm = self._sc._jvm
+
 if sparkSession is None:
-sparkSession = SparkSession(sparkContext)
+if sparkContext is SparkContext._active_spark_context:
+sparkSession = SparkSession.builder.getOrCreate()
+else:
--- End diff --

@cloud-fan please refer to my comments above. 





[GitHub] spark pull request #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility...

2016-12-22 Thread vijoshi
Github user vijoshi commented on a diff in the pull request:

https://github.com/apache/spark/pull/16119#discussion_r93594802
  
--- Diff: python/pyspark/sql/context.py ---
@@ -72,8 +72,13 @@ def __init__(self, sparkContext, sparkSession=None, 
jsqlContext=None):
 self._sc = sparkContext
 self._jsc = self._sc._jsc
 self._jvm = self._sc._jvm
+
 if sparkSession is None:
-sparkSession = SparkSession(sparkContext)
+if sparkContext is SparkContext._active_spark_context:
+sparkSession = SparkSession.builder.getOrCreate()
--- End diff --


1. `SQLContext.__init__` suggests that any `sparkSession` object can be 
injected, and `SparkSession.__init__` suggests that any `sparkContext` can be 
injected. If that were not intended, I would have expected them to invoke 
`getOrCreate()` to obtain instances of each other.

2. I found a few hits around the `spark.driver.allowMultipleContexts` property 
which, though not a Python property, shows that there may be cases where 
someone (for testing, etc.) wants the ability to bypass the one-`SparkContext` 
limitation.

Combined, these two left me not 100% sure that someone couldn't pass in 
different `sparkContext` objects, or `sparkSession` objects tied to different 
`SparkContext`s, when constructing a `SQLContext`.

I will admit I am not familiar with the Python side of things, so I hoped this 
PR review would be the best way to get this clarified :).

If everyone agrees that we needn't worry about which `sparkSession` is passed 
in when constructing a `SQLContext`, then I can get rid of the `if sparkContext is 
SparkContext._active_spark_context:` check.





[GitHub] spark pull request #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility...

2016-12-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16119#discussion_r93576154
  
--- Diff: python/pyspark/sql/context.py ---
@@ -72,8 +72,13 @@ def __init__(self, sparkContext, sparkSession=None, 
jsqlContext=None):
 self._sc = sparkContext
 self._jsc = self._sc._jsc
 self._jvm = self._sc._jvm
+
 if sparkSession is None:
-sparkSession = SparkSession(sparkContext)
+if sparkContext is SparkContext._active_spark_context:
+sparkSession = SparkSession.builder.getOrCreate()
+else:
--- End diff --

when will we hit the else branch? The if-else seems to be the only difference 
between this PR and https://github.com/apache/spark/pull/16369





[GitHub] spark pull request #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility...

2016-12-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16119#discussion_r93575937
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -1851,6 +1851,71 @@ def test_hivecontext(self):
 self.assertIn("default", out.decode('utf-8'))
 self.assertTrue(os.path.exists(metastore_path))
 
+def test_hivecontext_create_dataframe(self):
--- End diff --

can we write a lower-level unit test to check whether we create two 
`SparkSession`s?





[GitHub] spark pull request #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility...

2016-12-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16119#discussion_r93556600
  
--- Diff: python/pyspark/sql/context.py ---
@@ -72,8 +72,13 @@ def __init__(self, sparkContext, sparkSession=None, 
jsqlContext=None):
 self._sc = sparkContext
 self._jsc = self._sc._jsc
 self._jvm = self._sc._jvm
+
 if sparkSession is None:
-sparkSession = SparkSession(sparkContext)
+if sparkContext is SparkContext._active_spark_context:
+sparkSession = SparkSession.builder.getOrCreate()
--- End diff --

So as you can see in that PR, I simply do `sparkSession = 
SparkSession.builder.getOrCreate()` here.





[GitHub] spark pull request #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility...

2016-12-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16119#discussion_r93556537
  
--- Diff: python/pyspark/sql/context.py ---
@@ -72,8 +72,13 @@ def __init__(self, sparkContext, sparkSession=None, 
jsqlContext=None):
 self._sc = sparkContext
 self._jsc = self._sc._jsc
 self._jvm = self._sc._jvm
+
 if sparkSession is None:
-sparkSession = SparkSession(sparkContext)
+if sparkContext is SparkContext._active_spark_context:
+sparkSession = SparkSession.builder.getOrCreate()
--- End diff --

Yeah, I noticed this when doing the other PR. But I think we can have only one 
`SparkContext`; in fact, when you try to create another `SparkContext`, you 
will see an error. So I don't know why we would need to pass in a 
`SparkContext` that is not the active one.







[GitHub] spark pull request #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility...

2016-12-21 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16119#discussion_r93456555
  
--- Diff: python/pyspark/sql/context.py ---
@@ -72,8 +72,13 @@ def __init__(self, sparkContext, sparkSession=None, 
jsqlContext=None):
 self._sc = sparkContext
 self._jsc = self._sc._jsc
 self._jvm = self._sc._jvm
+
 if sparkSession is None:
-sparkSession = SparkSession(sparkContext)
+if sparkContext is SparkContext._active_spark_context:
+sparkSession = SparkSession.builder.getOrCreate()
--- End diff --

@viirya since you opened a similar change, what do you think of this?





[GitHub] spark pull request #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility...

2016-12-13 Thread vijoshi
Github user vijoshi commented on a diff in the pull request:

https://github.com/apache/spark/pull/16119#discussion_r92333556
  
--- Diff: python/pyspark/sql/context.py ---
@@ -72,8 +72,13 @@ def __init__(self, sparkContext, sparkSession=None, 
jsqlContext=None):
 self._sc = sparkContext
 self._jsc = self._sc._jsc
 self._jvm = self._sc._jvm
+
 if sparkSession is None:
-sparkSession = SparkSession(sparkContext)
+if sparkContext is SparkContext._active_spark_context:
+sparkSession = SparkSession.builder.getOrCreate()
--- End diff --

okay - I wanted to avoid adding code to the new `SparkSession` class to handle 
a compatibility issue arising from the now-deprecated `SQLContext` class. It 
looks like the Python implementation of the `SparkSession` builder does not 
allow a `SparkContext` to be passed in. Do we want to change the public builder 
interface for this?





[GitHub] spark pull request #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility...

2016-12-13 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16119#discussion_r92134494
  
--- Diff: python/pyspark/sql/context.py ---
@@ -72,8 +72,13 @@ def __init__(self, sparkContext, sparkSession=None, 
jsqlContext=None):
 self._sc = sparkContext
 self._jsc = self._sc._jsc
 self._jvm = self._sc._jvm
+
 if sparkSession is None:
-sparkSession = SparkSession(sparkContext)
+if sparkContext is SparkContext._active_spark_context:
+sparkSession = SparkSession.builder.getOrCreate()
--- End diff --

I don't know enough to review this usefully, but would the better fix be to 
call the `getOrCreate` that lets you pass a context, and use that for both 
cases? I don't know whether it has the same behavior, but it seems more natural 
to use the logic in `SparkSession` for this.
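For illustration, here is the shape of a getOrCreate-style entry point that accepts a context, sketched in plain Python. The `Session` class and `get_or_create` name are hypothetical, not PySpark's actual API (whose `SparkSession.builder.getOrCreate()` takes no context argument); the last call shows the subtlety debated in this thread: once a session is live, a different context passed in is silently ignored.

```python
# Hypothetical getOrCreate that accepts a context; models the API shape
# under discussion, not PySpark's real signature.
class Session:
    _instantiated = None

    def __init__(self, context):
        self.context = context

    @classmethod
    def get_or_create(cls, context):
        # Reuse the live session when present; otherwise build one from
        # the supplied context, so callers never mint duplicates.
        if cls._instantiated is None:
            cls._instantiated = cls(context)
        return cls._instantiated

ctx = object()
s1 = Session.get_or_create(ctx)
s2 = Session.get_or_create(ctx)
print(s1 is s2)  # True: both calls yield the single live session

other = object()
s3 = Session.get_or_create(other)
print(s3.context is ctx)  # True: the existing session wins; `other` ignored
```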

