[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...

2018-11-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20788


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...

2018-11-20 Thread DylanGuedes
Github user DylanGuedes commented on a diff in the pull request:

https://github.com/apache/spark/pull/20788#discussion_r235183021
  
--- Diff: python/pyspark/sql/tests/test_dataframe.py ---
@@ -375,6 +375,19 @@ def test_generic_hints(self):
 plan = df1.join(df2.hint("broadcast"), 
"id")._jdf.queryExecution().executedPlan()
 self.assertEqual(1, plan.toString().count("BroadcastHashJoin"))
 
+# add tests for SPARK-23647 (test more types for hint)
+def test_extended_hint_types(self):
+from pyspark.sql import DataFrame
+
+df = self.spark.range(10e10).toDF("id")
+such_a_nice_list = ["itworks1", "itworks2", "itworks3"]
--- End diff --

@HyukjinKwon I tested locally and you are right, it is kinda weird, so I 
just removed it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...

2018-11-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20788#discussion_r233678942
  
--- Diff: python/pyspark/sql/tests/test_dataframe.py ---
@@ -375,6 +375,19 @@ def test_generic_hints(self):
 plan = df1.join(df2.hint("broadcast"), 
"id")._jdf.queryExecution().executedPlan()
 self.assertEqual(1, plan.toString().count("BroadcastHashJoin"))
 
+# add tests for SPARK-23647 (test more types for hint)
+def test_extended_hint_types(self):
+from pyspark.sql import DataFrame
+
+df = self.spark.range(10e10).toDF("id")
+such_a_nice_list = ["itworks1", "itworks2", "itworks3"]
--- End diff --

@DylanGuedes, what do we get if `dict` is given? looks not being tested. If 
that produces a weird result, we can disallow it here for now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...

2018-03-16 Thread DylanGuedes
Github user DylanGuedes commented on a diff in the pull request:

https://github.com/apache/spark/pull/20788#discussion_r175181523
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -437,10 +437,12 @@ def hint(self, name, *parameters):
 if not isinstance(name, str):
 raise TypeError("name should be provided as str, got 
{0}".format(type(name)))
 
+allowed_types = (basestring, list, float, int)
--- End diff --

Good catch - the Scala API accepts any, but since i don't know about the 
python->scala conversion, I can't answer if it is necessary to restrict to 
primite python types. Whatever, I get noticed that even dict works, so I 
extended it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...

2018-03-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20788#discussion_r175156065
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -437,10 +437,12 @@ def hint(self, name, *parameters):
 if not isinstance(name, str):
 raise TypeError("name should be provided as str, got 
{0}".format(type(name)))
 
+allowed_types = (basestring, list, float, int)
--- End diff --

If we allow `list`, should we check the types recursively?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...

2018-03-16 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20788#discussion_r175008195
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -437,10 +437,11 @@ def hint(self, name, *parameters):
 if not isinstance(name, str):
 raise TypeError("name should be provided as str, got 
{0}".format(type(name)))
 
+allowed_types = (basestring, list, float, int)
--- End diff --

yap ok - 
https://github.com/DylanGuedes/spark/blob/f5de3b8fce22a2d0c55b86066471579c5720ec97/python/pyspark/sql/dataframe.py#L22

no, this should be good


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...

2018-03-14 Thread DylanGuedes
Github user DylanGuedes commented on a diff in the pull request:

https://github.com/apache/spark/pull/20788#discussion_r174580902
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -437,10 +437,11 @@ def hint(self, name, *parameters):
 if not isinstance(name, str):
 raise TypeError("name should be provided as str, got 
{0}".format(type(name)))
 
+allowed_types = (basestring, list, float, int)
--- End diff --

At the start of the file, basestring, unicode and str becomes the same 
thing, so I don't think that the Python version can cause trouble at all.

Should I make tests for both types then? Any suggestion of how to test it? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...

2018-03-14 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20788#discussion_r174367197
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -437,10 +437,11 @@ def hint(self, name, *parameters):
 if not isinstance(name, str):
 raise TypeError("name should be provided as str, got 
{0}".format(type(name)))
 
+allowed_types = (basestring, list, float, int)
--- End diff --

I think the point is that `basestring` is in python2 
https://docs.python.org/2/library/functions.html#basestring

but not in python 3 
https://docs.python.org/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit

we should test this


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...

2018-03-13 Thread DylanGuedes
Github user DylanGuedes commented on a diff in the pull request:

https://github.com/apache/spark/pull/20788#discussion_r174208501
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -437,10 +437,11 @@ def hint(self, name, *parameters):
 if not isinstance(name, str):
 raise TypeError("name should be provided as str, got 
{0}".format(type(name)))
 
+allowed_types = (basestring, list, float, int)
--- End diff --

It looks like Scala can handle unicode, so basestring looks correct. What 
you guys think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...

2018-03-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20788#discussion_r173682841
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -437,10 +437,11 @@ def hint(self, name, *parameters):
 if not isinstance(name, str):
 raise TypeError("name should be provided as str, got 
{0}".format(type(name)))
 
+allowed_types = (basestring, list, float, int)
--- End diff --

Should it be `basestring` or `str` as previously here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org