itholic commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2058148979
Because I called `PySparkCurrentOrigin` directly on the
`DataFrameQueryContext` without utilizing `withOrigin` in the initial
implementation. I realized it from the recent review from the
HyukjinKwon commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2058107235
> perfectly sync the data between two separately operating ThreadLocals,
> CurrentOrigin and PySparkCurrentOrigin.
Why is that?
itholic commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2056680916
> Let me give it a try and create a PR to refactor the current structure,
> and ping you guys.
Created https://github.com/apache/spark/pull/46063.
itholic commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2056611480
The difficulty with the previous method was that it was not easy to
perfectly sync the data between two separately operating ThreadLocals,
`CurrentOrigin` and `PySparkCurrentOrigin`.
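A minimal sketch of that sync problem, written as a Python analogy (the real holders live on the JVM side; the names and helpers below are illustrative, not the actual Spark classes):
```py
import threading

# Two independently managed holders, standing in for CurrentOrigin and
# PySparkCurrentOrigin. Keeping them in sync means every set/clear on one
# must be mirrored on the other at exactly the right moment.
_current_origin = threading.local()
_pyspark_origin = threading.local()


def set_pyspark_call_site(fragment: str, call_site: str) -> None:
    """Record the user-facing call site when a PySpark expression is declared."""
    _pyspark_origin.value = (fragment, call_site)


def with_origin(build_expr):
    """Copy the PySpark call site into the engine-side holder around build_expr.

    The hazard: if anything overwrites _pyspark_origin between the declaration
    and this copy (for example, another expression is declared first), the two
    holders end up describing different call sites.
    """
    _current_origin.value = getattr(_pyspark_origin, "value", None)
    try:
        return build_expr()
    finally:
        _current_origin.value = None
```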
cloud-fan commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1565491650
##
sql/core/src/main/scala/org/apache/spark/sql/Column.scala:
##
@@ -171,6 +171,29 @@ class Column(val expr: Expression) extends Logging {
Column.fn(name, this,
HyukjinKwon commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2056191793
Let's clarify why
https://github.com/apache/spark/pull/45377#issuecomment-2041315119 happens
before we move further. That shouldn't happen from my understanding.
If we go with
itholic commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2048796376
Thanks @cloud-fan @ueshin @HyukjinKwon @xinrong-meng for the review!
cloud-fan closed pull request #45377: [SPARK-47274][PYTHON][SQL] Provide more
useful context for PySpark DataFrame API errors
URL: https://github.com/apache/spark/pull/45377
cloud-fan commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2048784678
thanks, merging to master!
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1559135258
##
sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala:
##
@@ -134,7 +134,9 @@ case class SQLQueryContext(
override def callSite: String
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1559114967
##
sql/core/src/main/scala/org/apache/spark/sql/Column.scala:
##
@@ -171,6 +171,26 @@ class Column(val expr: Expression) extends Logging {
Column.fn(name, this,
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1559115572
##
sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala:
##
@@ -134,7 +134,9 @@ case class SQLQueryContext(
override def callSite: String
cloud-fan commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1558921090
##
sql/core/src/main/scala/org/apache/spark/sql/Column.scala:
##
@@ -171,6 +171,26 @@ class Column(val expr: Expression) extends Logging {
Column.fn(name, this,
cloud-fan commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1558919561
##
sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala:
##
@@ -134,7 +134,9 @@ case class SQLQueryContext(
override def callSite:
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1558840908
##
sql/core/src/main/scala/org/apache/spark/sql/package.scala:
##
@@ -111,6 +111,26 @@ package object sql {
}
}
+ private[sql] def withOrigin[T](
+
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1558831279
##
python/pyspark/sql/column.py:
##
@@ -174,16 +175,48 @@ def _bin_op(
["Column", Union["Column", "LiteralType", "DecimalLiteral",
"DateTimeLiteral"]], "Column"
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1558766748
##
sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala:
##
@@ -134,7 +134,9 @@ case class SQLQueryContext(
override def callSite: String
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1558765519
##
sql/core/src/main/scala/org/apache/spark/sql/package.scala:
##
@@ -111,6 +111,26 @@ package object sql {
}
}
+ private[sql] def withOrigin[T](
+
cloud-fan commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1558765078
##
sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala:
##
@@ -134,7 +134,9 @@ case class SQLQueryContext(
override def callSite:
cloud-fan commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1558761829
##
sql/core/src/main/scala/org/apache/spark/sql/package.scala:
##
@@ -111,6 +111,26 @@ package object sql {
}
}
+ private[sql] def withOrigin[T](
+
cloud-fan commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1558760260
##
sql/core/src/main/scala/org/apache/spark/sql/package.scala:
##
@@ -111,6 +111,26 @@ package object sql {
}
}
+ private[sql] def withOrigin[T](
+
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1558755161
##
sql/core/src/main/scala/org/apache/spark/sql/package.scala:
##
@@ -111,6 +111,26 @@ package object sql {
}
}
+ private[sql] def withOrigin[T](
+
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1558758511
##
sql/core/src/main/scala/org/apache/spark/sql/package.scala:
##
@@ -111,6 +111,26 @@ package object sql {
}
}
+ private[sql] def withOrigin[T](
+
xinrong-meng commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1558128891
##
python/pyspark/errors/exceptions/captured.py:
##
@@ -379,5 +379,13 @@ def fragment(self) -> str:
def callSite(self) -> str:
return
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1557621216
##
sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala:
##
@@ -134,7 +134,9 @@ case class SQLQueryContext(
override def callSite: String
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1557615484
##
python/pyspark/sql/column.py:
##
@@ -174,16 +175,48 @@ def _bin_op(
["Column", Union["Column", "LiteralType", "DecimalLiteral",
"DateTimeLiteral"]], "Column"
cloud-fan commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1557593182
##
python/pyspark/sql/column.py:
##
@@ -174,16 +175,48 @@ def _bin_op(
["Column", Union["Column", "LiteralType", "DecimalLiteral",
"DateTimeLiteral"]],
cloud-fan commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1557591426
##
sql/core/src/main/scala/org/apache/spark/sql/package.scala:
##
@@ -111,6 +111,26 @@ package object sql {
}
}
+ private[sql] def withOrigin[T](
+
cloud-fan commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1557589150
##
sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala:
##
@@ -134,7 +134,9 @@ case class SQLQueryContext(
override def callSite:
itholic commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2044191299
@cloud-fan @ueshin I believe the previous comments are now all resolved, and
I have also added more tests accordingly.
Could you take a look when you find some time?
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1556624078
##
sql/core/src/main/scala/org/apache/spark/sql/Column.scala:
##
@@ -699,6 +699,13 @@ class Column(val expr: Expression) extends Logging {
*/
def plus(other:
cloud-fan commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1555465767
##
sql/core/src/main/scala/org/apache/spark/sql/Column.scala:
##
@@ -699,6 +699,13 @@ class Column(val expr: Expression) extends Logging {
*/
def plus(other:
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1555359203
##
sql/core/src/main/scala/org/apache/spark/sql/Column.scala:
##
@@ -699,6 +699,13 @@ class Column(val expr: Expression) extends Logging {
*/
def plus(other:
itholic commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2041845268
> My idea: we add new Column creation methods for PySpark, which take
> Python call site information.
I'm not 100% sure if it will work, but it sounds worth trying. Let me
cloud-fan commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2041828693
@itholic what if we don't use a thread local? IIUC, PySpark calls JVM methods
to build the column instances at the end. On the JVM side, we wrap code with
`withOrigin` to capture the
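A rough sketch of what that could look like on the Python side, assuming a hypothetical JVM entry point (`fnWithCallSite` below is made up, not a method the PR actually adds): capture the user-level frame with the standard `traceback` machinery at the moment the Column is built, and pass it to the JVM as plain arguments, so no thread local is needed.
```py
import os
import traceback


def _capture_user_call_site():
    """Return (fragment, call_site) for the innermost frame outside pyspark/."""
    for frame in reversed(traceback.extract_stack()[:-1]):  # drop this frame
        if os.sep + "pyspark" + os.sep not in frame.filename:
            return frame.line or "", f"{frame.filename}:{frame.lineno}"
    return "", ""


def _invoke_binary_op(jvm_column, name, other):
    # Pass the Python call site explicitly instead of stashing it in a
    # thread local; fnWithCallSite is a hypothetical JVM-side method that
    # would wrap the expression construction in withOrigin itself.
    fragment, call_site = _capture_user_call_site()
    return jvm_column.fnWithCallSite(name, other, fragment, call_site)
```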
itholic commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2041315119
Hmm... I faced some problems resolving this case.
PySpark provides logs to the JVM at the time an expression is declared,
but the actual execution order on the JVM side could be
itholic commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2041313068
Hmm... I faced some problems resolving this case.
PySpark provides logs to the JVM at the time an expression is declared,
but the actual execution order on the JVM side could be
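To make that ordering issue concrete, a small illustration pieced together from the snippets elsewhere in this thread (it assumes a running PySpark shell with `spark` defined): call-site information is recorded when each expression is declared, but the failure only surfaces when the JVM evaluates the plan, which can happen later and in a different order.
```py
# Requires an active SparkSession named `spark`, as in the pyspark shell.
spark.conf.set("spark.sql.ansi.enabled", True)

df = spark.range(10)
a = df.id / 10   # declared first; never fails
b = df.id / 0    # declared second; fails only under ANSI mode, at execution time

# The divide-by-zero error is raised here, during execution, not at the line
# where `b` was declared, so whatever context was captured "at declaration"
# has to be matched up with the right expression at this point.
df.select(a, b).collect()
```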
itholic commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2041298301
Nice catch! Let me address this case as well.
ueshin commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2041273571
I'm afraid I still see a weird behavior:
```py
>>> spark.conf.set("spark.sql.ansi.enabled", True)
>>> df = spark.range(10)
>>> a = df.id / 10
>>> b = df.id / 0
>>>
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1553150727
##
python/pyspark/sql/tests/test_dataframe.py:
##
@@ -825,6 +828,231 @@ def test_duplicate_field_names(self):
self.assertEqual(df.schema, schema)
itholic commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2039207285
Thanks @cloud-fan @xinrong-meng @ueshin for the additional comments! Just
resolved all comments.
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1552967079
##
python/pyspark/errors/exceptions/captured.py:
##
@@ -379,5 +379,13 @@ def fragment(self) -> str:
def callSite(self) -> str:
return
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1552966218
##
python/pyspark/errors/exceptions/captured.py:
##
@@ -379,5 +379,13 @@ def fragment(self) -> str:
def callSite(self) -> str:
return
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1552964769
##
sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala:
##
@@ -165,6 +172,20 @@ case class DataFrameQueryContext(stackTrace:
itholic commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2039001191
> ```
> == DataFrame ==
> "divide" was called from
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> == PySpark call site ==
>
itholic commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2039000576
Thanks @ueshin for spotting the negative case! Let me add it to the test
case and fix it.
ueshin commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2038202349
When calling `df.select` with multiple expressions, both `fragment` and `callSite`
seem to differ from Scala's.
```py
>>> spark.conf.set("spark.sql.ansi.enabled", True)
>>> df =
xinrong-meng commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1552181315
##
python/pyspark/errors/exceptions/captured.py:
##
@@ -379,5 +379,13 @@ def fragment(self) -> str:
def callSite(self) -> str:
return
cloud-fan commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2036729930
```
== DataFrame ==
"divide" was called from
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
== PySpark call site ==
"divide" was
cloud-fan commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1551367597
##
sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala:
##
@@ -165,6 +172,20 @@ case class DataFrameQueryContext(stackTrace:
cloud-fan commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1551362982
##
sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala:
##
@@ -165,6 +172,20 @@ case class DataFrameQueryContext(stackTrace:
cloud-fan commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1551361278
##
python/pyspark/errors/exceptions/captured.py:
##
@@ -379,5 +379,13 @@ def fragment(self) -> str:
def callSite(self) -> str:
return
cloud-fan commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1551358514
##
python/pyspark/errors/exceptions/captured.py:
##
@@ -379,5 +379,13 @@ def fragment(self) -> str:
def callSite(self) -> str:
return
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1549517320
##
python/pyspark/sql/column.py:
##
@@ -195,6 +197,7 @@ def _(self: "Column", other: Union["LiteralType",
"DecimalLiteral"]) -> "Column"
return _
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1549476644
##
python/pyspark/errors/utils.py:
##
@@ -119,3 +124,61 @@ def get_message_template(self, error_class: str) -> str:
message_template =
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1549475904
##
python/pyspark/errors/utils.py:
##
@@ -119,3 +124,61 @@ def get_message_template(self, error_class: str) -> str:
message_template =
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1549469336
##
python/pyspark/errors/utils.py:
##
@@ -119,3 +124,61 @@ def get_message_template(self, error_class: str) -> str:
message_template =
HyukjinKwon commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1549242008
##
python/pyspark/errors/utils.py:
##
@@ -119,3 +124,61 @@ def get_message_template(self, error_class: str) -> str:
message_template =
HyukjinKwon commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1549240235
##
python/pyspark/sql/column.py:
##
@@ -195,6 +197,7 @@ def _(self: "Column", other: Union["LiteralType",
"DecimalLiteral"]) -> "Column"
return _
HyukjinKwon commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1549232872
##
python/pyspark/errors/utils.py:
##
@@ -119,3 +124,61 @@ def get_message_template(self, error_class: str) -> str:
message_template =
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1548983997
##
python/pyspark/sql/tests/test_dataframe.py:
##
@@ -825,6 +828,172 @@ def test_duplicate_field_names(self):
self.assertEqual(df.schema, schema)
HyukjinKwon commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-2033393150
cc @cloud-fan too
HyukjinKwon commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1548814091
##
python/pyspark/testing/utils.py:
##
@@ -280,7 +282,14 @@ def check_error(
exception: PySparkException,
error_class: str,
HyukjinKwon commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1548812966
##
python/pyspark/errors/exceptions/captured.py:
##
@@ -379,5 +379,17 @@ def fragment(self) -> str:
def callSite(self) -> str:
return
itholic commented on PR #45377:
URL: https://github.com/apache/spark/pull/45377#issuecomment-200087
Added QueryContext testing for DataFrameQueryContext and UTs. The CI failures
seem unrelated. cc @HyukjinKwon FYI
itholic commented on code in PR #45377:
URL: https://github.com/apache/spark/pull/45377#discussion_r1547836649
##
python/pyspark/errors/utils.py:
##
@@ -119,3 +124,62 @@ def get_message_template(self, error_class: str) -> str:
message_template =