[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20788 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...
Github user DylanGuedes commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r235183021 --- Diff: python/pyspark/sql/tests/test_dataframe.py --- @@ -375,6 +375,19 @@ def test_generic_hints(self): plan = df1.join(df2.hint("broadcast"), "id")._jdf.queryExecution().executedPlan() self.assertEqual(1, plan.toString().count("BroadcastHashJoin")) +# add tests for SPARK-23647 (test more types for hint) +def test_extended_hint_types(self): +from pyspark.sql import DataFrame + +df = self.spark.range(10e10).toDF("id") +such_a_nice_list = ["itworks1", "itworks2", "itworks3"] --- End diff -- @HyukjinKwon I tested locally and you are right, it is kinda weird, so I just removed it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r233678942 --- Diff: python/pyspark/sql/tests/test_dataframe.py --- @@ -375,6 +375,19 @@ def test_generic_hints(self): plan = df1.join(df2.hint("broadcast"), "id")._jdf.queryExecution().executedPlan() self.assertEqual(1, plan.toString().count("BroadcastHashJoin")) +# add tests for SPARK-23647 (test more types for hint) +def test_extended_hint_types(self): +from pyspark.sql import DataFrame + +df = self.spark.range(10e10).toDF("id") +such_a_nice_list = ["itworks1", "itworks2", "itworks3"] --- End diff -- @DylanGuedes, what do we get if `dict` is given? looks not being tested. If that produces a weird result, we can disallow it here for now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...
Github user DylanGuedes commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r175181523 --- Diff: python/pyspark/sql/dataframe.py --- @@ -437,10 +437,12 @@ def hint(self, name, *parameters): if not isinstance(name, str): raise TypeError("name should be provided as str, got {0}".format(type(name))) +allowed_types = (basestring, list, float, int) --- End diff -- Good catch - the Scala API accepts any, but since i don't know about the python->scala conversion, I can't answer if it is necessary to restrict to primite python types. Whatever, I get noticed that even dict works, so I extended it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r175156065 --- Diff: python/pyspark/sql/dataframe.py --- @@ -437,10 +437,12 @@ def hint(self, name, *parameters): if not isinstance(name, str): raise TypeError("name should be provided as str, got {0}".format(type(name))) +allowed_types = (basestring, list, float, int) --- End diff -- If we allow `list`, should we check the types recursively? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r175008195 --- Diff: python/pyspark/sql/dataframe.py --- @@ -437,10 +437,11 @@ def hint(self, name, *parameters): if not isinstance(name, str): raise TypeError("name should be provided as str, got {0}".format(type(name))) +allowed_types = (basestring, list, float, int) --- End diff -- yap ok - https://github.com/DylanGuedes/spark/blob/f5de3b8fce22a2d0c55b86066471579c5720ec97/python/pyspark/sql/dataframe.py#L22 no, this should be good --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...
Github user DylanGuedes commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r174580902 --- Diff: python/pyspark/sql/dataframe.py --- @@ -437,10 +437,11 @@ def hint(self, name, *parameters): if not isinstance(name, str): raise TypeError("name should be provided as str, got {0}".format(type(name))) +allowed_types = (basestring, list, float, int) --- End diff -- At the start of the file, basestring, unicode and str becomes the same thing, so I don't think that the Python version can cause trouble at all. Should I make tests for both types then? Any suggestion of how to test it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r174367197 --- Diff: python/pyspark/sql/dataframe.py --- @@ -437,10 +437,11 @@ def hint(self, name, *parameters): if not isinstance(name, str): raise TypeError("name should be provided as str, got {0}".format(type(name))) +allowed_types = (basestring, list, float, int) --- End diff -- I think the point is that `basestring` is in python2 https://docs.python.org/2/library/functions.html#basestring but not in python 3 https://docs.python.org/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit we should test this --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...
Github user DylanGuedes commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r174208501 --- Diff: python/pyspark/sql/dataframe.py --- @@ -437,10 +437,11 @@ def hint(self, name, *parameters): if not isinstance(name, str): raise TypeError("name should be provided as str, got {0}".format(type(name))) +allowed_types = (basestring, list, float, int) --- End diff -- It looks like Scala can handle unicode, so basestring looks correct. What you guys think? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r173682841 --- Diff: python/pyspark/sql/dataframe.py --- @@ -437,10 +437,11 @@ def hint(self, name, *parameters): if not isinstance(name, str): raise TypeError("name should be provided as str, got {0}".format(type(name))) +allowed_types = (basestring, list, float, int) --- End diff -- Should it be `basestring` or `str` as previously here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org