Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-15 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2058148979 Because I called `PySparkCurrentOrigin` directly on the `DataFrameQueryContext` without utilizing `withOrigin` in the initial implementation. I realized it from a recent review from the
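
For readers following the discussion, here is a minimal Python sketch of the `withOrigin`-style scoping being referred to (not the PR's actual code; `_origin` and `with_origin` are illustrative names): the call site is set only for the duration of one API call and always restored, instead of being written to the thread-local store directly.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-in for a thread-local origin store such as
# PySparkCurrentOrigin; the names here are illustrative only.
_origin = threading.local()

@contextmanager
def with_origin(call_site: str):
    # Scope the origin to a single API call and always restore the previous
    # value, instead of mutating the store directly and hoping the consumer
    # reads it before the next write.
    previous = getattr(_origin, "value", None)
    _origin.value = call_site
    try:
        yield
    finally:
        _origin.value = previous

with with_origin("example.py:42"):
    print(_origin.value)                    # "example.py:42" inside the scope
print(getattr(_origin, "value", None))      # restored (None) afterwards
```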

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-15 Thread via GitHub
HyukjinKwon commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2058107235 > perfectly sync the data between two separately operating ThreadLocals, CurrentOrigin and PySparkCurrentOrigin. Why is that?

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-15 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2056680916 > Let me give it a try and create a PR to refactor the current structure, and ping you guys. Created https://github.com/apache/spark/pull/46063.

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-15 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2056611480 The difficulty with the previous method was that it was not easy to perfectly sync the data between two separately operating ThreadLocals, `CurrentOrigin` and `PySparkCurrentOrigin`.
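
A toy Python sketch (not Spark code; all names are illustrative) of the sync problem described above: two independently managed thread-local stores, one written by the Python side at declaration time and one read by the JVM side at evaluation time, with nothing keeping them in step.

```python
import threading

# Two independent thread-local stores, standing in for the JVM-side
# CurrentOrigin and the Python-side PySparkCurrentOrigin mentioned above.
jvm_origin = threading.local()
pyspark_origin = threading.local()

def record_pyspark_call_site(site: str) -> None:
    # The Python side records the call site when an expression is *declared*.
    pyspark_origin.value = site

def jvm_resolve_origin(expr: str) -> str:
    # The JVM side reads its *own* store at evaluation time; nothing here
    # forces it to match what the Python side last recorded.
    return getattr(jvm_origin, "value", None) or f"<no origin recorded for {expr}>"

record_pyspark_call_site("example.py:10")
print(jvm_resolve_origin("id / 0"))  # -> "<no origin recorded for id / 0>"
```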

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1565491650 ## sql/core/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -171,6 +171,29 @@ class Column(val expr: Expression) extends Logging { Column.fn(name, this,

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-15 Thread via GitHub
HyukjinKwon commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2056191793 Let's clarify why https://github.com/apache/spark/pull/45377#issuecomment-2041315119 happens before we move further. That shouldn't happen from my understanding. If we go with

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2048796376 Thanks @cloud-fan @ueshin @HyukjinKwon @xinrong-meng for the review!

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
cloud-fan closed pull request #45377: [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors URL: https://github.com/apache/spark/pull/45377

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2048784678 thanks, merging to master!

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1559135258 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -134,7 +134,9 @@ case class SQLQueryContext( override def callSite: String

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1559114967 ## sql/core/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -171,6 +171,26 @@ class Column(val expr: Expression) extends Logging { Column.fn(name, this,

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1559115572 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -134,7 +134,9 @@ case class SQLQueryContext( override def callSite: String

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1558921090 ## sql/core/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -171,6 +171,26 @@ class Column(val expr: Expression) extends Logging { Column.fn(name, this,

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-10 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1558919561 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -134,7 +134,9 @@ case class SQLQueryContext( override def callSite:

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-09 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1558840908 ## sql/core/src/main/scala/org/apache/spark/sql/package.scala: ## @@ -111,6 +111,26 @@ package object sql { } } + private[sql] def withOrigin[T]( +

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-09 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1558831279 ## python/pyspark/sql/column.py: ## @@ -174,16 +175,48 @@ def _bin_op( ["Column", Union["Column", "LiteralType", "DecimalLiteral", "DateTimeLiteral"]], "Column"

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-09 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1558766748 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -134,7 +134,9 @@ case class SQLQueryContext( override def callSite: String

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-09 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1558765519 ## sql/core/src/main/scala/org/apache/spark/sql/package.scala: ## @@ -111,6 +111,26 @@ package object sql { } } + private[sql] def withOrigin[T]( +

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-09 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1558765078 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -134,7 +134,9 @@ case class SQLQueryContext( override def callSite:

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-09 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1558761829 ## sql/core/src/main/scala/org/apache/spark/sql/package.scala: ## @@ -111,6 +111,26 @@ package object sql { } } + private[sql] def withOrigin[T]( +

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-09 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1558760260 ## sql/core/src/main/scala/org/apache/spark/sql/package.scala: ## @@ -111,6 +111,26 @@ package object sql { } } + private[sql] def withOrigin[T]( +

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-09 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1558755161 ## sql/core/src/main/scala/org/apache/spark/sql/package.scala: ## @@ -111,6 +111,26 @@ package object sql { } } + private[sql] def withOrigin[T]( +

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-09 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1558758511 ## sql/core/src/main/scala/org/apache/spark/sql/package.scala: ## @@ -111,6 +111,26 @@ package object sql { } } + private[sql] def withOrigin[T]( +

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-09 Thread via GitHub
xinrong-meng commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1558128891 ## python/pyspark/errors/exceptions/captured.py: ## @@ -379,5 +379,13 @@ def fragment(self) -> str: def callSite(self) -> str: return

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-09 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1557621216 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -134,7 +134,9 @@ case class SQLQueryContext( override def callSite: String

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-09 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1557615484 ## python/pyspark/sql/column.py: ## @@ -174,16 +175,48 @@ def _bin_op( ["Column", Union["Column", "LiteralType", "DecimalLiteral", "DateTimeLiteral"]], "Column"

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-09 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1557593182 ## python/pyspark/sql/column.py: ## @@ -174,16 +175,48 @@ def _bin_op( ["Column", Union["Column", "LiteralType", "DecimalLiteral", "DateTimeLiteral"]],

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-09 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1557591426 ## sql/core/src/main/scala/org/apache/spark/sql/package.scala: ## @@ -111,6 +111,26 @@ package object sql { } } + private[sql] def withOrigin[T]( +

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-09 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1557589150 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -134,7 +134,9 @@ case class SQLQueryContext( override def callSite:

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-08 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2044191299 @cloud-fan @ueshin I believe the previous comments are now all resolved, and I also added more tests accordingly. Could you take a look when you find some time?

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-08 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1556624078 ## sql/core/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -699,6 +699,13 @@ class Column(val expr: Expression) extends Logging { */ def plus(other:

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-08 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1555465767 ## sql/core/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -699,6 +699,13 @@ class Column(val expr: Expression) extends Logging { */ def plus(other:

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-08 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1555359203 ## sql/core/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -699,6 +699,13 @@ class Column(val expr: Expression) extends Logging { */ def plus(other:

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-07 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2041845268 > My idea: we add new Column creation methods for PySpark, which take Python call site information. I'm not 100% sure if it will work, but it sounds worth trying. Let me

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-07 Thread via GitHub
cloud-fan commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2041828693 @itholic what if we don't use thread local? IIUC, PySpark calls JVM methods to build the column instances at the end. On the JVM side, we wrap code with `withOrigin` to capture the
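
A rough sketch of this direction, assuming call-site information is attached to each column expression when it is created rather than stashed in a thread-local; the `Column`, `CallSite`, and `divide` names below are illustrative and not the PR's actual API.

```python
import inspect
from dataclasses import dataclass
from typing import Optional

@dataclass
class CallSite:
    filename: str
    lineno: int

@dataclass
class Column:
    name: str
    origin: Optional[CallSite] = None  # the call site travels with the expression

def divide(left: Column, right: Column) -> Column:
    # Capture the caller's frame when the expression is built and attach it
    # to the new Column, so no thread-local hand-off is needed later.
    caller = inspect.stack()[1]
    return Column(f"({left.name} / {right.name})",
                  CallSite(caller.filename, caller.lineno))

a = Column("id")
b = divide(a, Column("0"))
print(b.origin)  # points at the line where divide() was called
```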

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-06 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2041315119 Hmm... I faced a problem resolving this case. PySpark provides the logs to the JVM at the time an expression is declared, but the actual execution order on the JVM side could be
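
The mismatch can be mimicked with a toy sketch (not Spark code; all names are illustrative) in which call sites are queued at declaration time and consumed in that same order at evaluation time, even when evaluation actually happens in a different order.

```python
import threading
from collections import deque

# Toy model: the Python side queues a call site each time an expression is
# *declared*; the JVM side dequeues one each time an expression is *executed*.
# Nothing ties a dequeued call site to the expression actually being run.
origins = threading.local()
origins.queue = deque()

def declare(expr: str, call_site: str) -> str:
    origins.queue.append((expr, call_site))
    return expr

def execute(expr: str) -> str:
    declared_expr, call_site = origins.queue.popleft()
    return f"{expr} attributed to {call_site} (recorded for {declared_expr})"

declare("id / 10", "example.py:3")
declare("id / 0", "example.py:4")
# If the JVM happens to evaluate the second expression first, the error for
# `id / 0` is attributed to the call site recorded for `id / 10`:
print(execute("id / 0"))
```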

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-06 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2041313068 Hmm... I faced a problem resolving this case. PySpark provides the logs to the JVM at the time an expression is declared, but the actual execution order on the JVM side could be

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-06 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2041298301 Nice catch! Let me address this case as well.

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-06 Thread via GitHub
ueshin commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2041273571 I'm afraid I still see a weird behavior: ```py >>> spark.conf.set("spark.sql.ansi.enabled", True) >>> df = spark.range(10) >>> a = df.id / 10 >>> b = df.id / 0 >>>

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-05 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1553150727 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -825,6 +828,231 @@ def test_duplicate_field_names(self): self.assertEqual(df.schema, schema)

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-05 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2039207285 Thanks @cloud-fan @xinrong-meng @ueshin for the additional comments! Just resolved all comments.

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-04 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1552967079 ## python/pyspark/errors/exceptions/captured.py: ## @@ -379,5 +379,13 @@ def fragment(self) -> str: def callSite(self) -> str: return

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-04 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1552966218 ## python/pyspark/errors/exceptions/captured.py: ## @@ -379,5 +379,13 @@ def fragment(self) -> str: def callSite(self) -> str: return

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-04 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1552964769 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -165,6 +172,20 @@ case class DataFrameQueryContext(stackTrace:

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-04 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2039001191 > ``` > == DataFrame == > "divide" was called from > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > == PySpark call site == >

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-04 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2039000576 Thanks @ueshin for spotting the negative case! Let me add it to the test case and fix it.

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-04 Thread via GitHub
ueshin commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2038202349 When calling `df.select` with multiple expressions, both `fragment` and `callSite` seem to differ from Scala's. ```py >>> spark.conf.set("spark.sql.ansi.enabled", True) >>> df =
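
A minimal repro sketch of the multi-expression case, assuming a local `SparkSession` with ANSI mode enabled; the point of interest is which of the two expressions the reported `fragment`/`callSite` ends up describing.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.ansi.enabled", True)

df = spark.range(10)
try:
    # Two expressions in one select(); only the second one can fail.
    df.select(col("id") + lit(1), col("id") / lit(0)).collect()
except Exception as e:
    # The message carries the DataFrame query context; check whether the
    # reported fragment/call site points at the failing division.
    print(e)
```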

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-04 Thread via GitHub
xinrong-meng commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1552181315 ## python/pyspark/errors/exceptions/captured.py: ## @@ -379,5 +379,13 @@ def fragment(self) -> str: def callSite(self) -> str: return

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-04 Thread via GitHub
cloud-fan commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2036729930 ``` == DataFrame == "divide" was called from java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) == PySpark call site == "divide" was

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-04 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1551367597 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -165,6 +172,20 @@ case class DataFrameQueryContext(stackTrace:

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-04 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1551362982 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -165,6 +172,20 @@ case class DataFrameQueryContext(stackTrace:

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-04 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1551361278 ## python/pyspark/errors/exceptions/captured.py: ## @@ -379,5 +379,13 @@ def fragment(self) -> str: def callSite(self) -> str: return

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-04 Thread via GitHub
cloud-fan commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1551358514 ## python/pyspark/errors/exceptions/captured.py: ## @@ -379,5 +379,13 @@ def fragment(self) -> str: def callSite(self) -> str: return

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-03 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1549517320 ## python/pyspark/sql/column.py: ## @@ -195,6 +197,7 @@ def _(self: "Column", other: Union["LiteralType", "DecimalLiteral"]) -> "Column" return _

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-03 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1549476644 ## python/pyspark/errors/utils.py: ## @@ -119,3 +124,61 @@ def get_message_template(self, error_class: str) -> str: message_template =

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-03 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1549475904 ## python/pyspark/errors/utils.py: ## @@ -119,3 +124,61 @@ def get_message_template(self, error_class: str) -> str: message_template =

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-03 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1549469336 ## python/pyspark/errors/utils.py: ## @@ -119,3 +124,61 @@ def get_message_template(self, error_class: str) -> str: message_template =

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-03 Thread via GitHub
HyukjinKwon commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1549242008 ## python/pyspark/errors/utils.py: ## @@ -119,3 +124,61 @@ def get_message_template(self, error_class: str) -> str: message_template =

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-03 Thread via GitHub
HyukjinKwon commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1549240235 ## python/pyspark/sql/column.py: ## @@ -195,6 +197,7 @@ def _(self: "Column", other: Union["LiteralType", "DecimalLiteral"]) -> "Column" return _

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-03 Thread via GitHub
HyukjinKwon commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1549232872 ## python/pyspark/errors/utils.py: ## @@ -119,3 +124,61 @@ def get_message_template(self, error_class: str) -> str: message_template =

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-03 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1548983997 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -825,6 +828,172 @@ def test_duplicate_field_names(self): self.assertEqual(df.schema, schema)

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-02 Thread via GitHub
HyukjinKwon commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2033393150 cc @cloud-fan too

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-02 Thread via GitHub
HyukjinKwon commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1548814091 ## python/pyspark/testing/utils.py: ## @@ -280,7 +282,14 @@ def check_error( exception: PySparkException, error_class: str,

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-02 Thread via GitHub
HyukjinKwon commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1548812966 ## python/pyspark/errors/exceptions/captured.py: ## @@ -379,5 +379,17 @@ def fragment(self) -> str: def callSite(self) -> str: return

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-02 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-200087 Added QueryContext testing for DataFrameQueryContext and UTs. The CI failures seem unrelated. cc @HyukjinKwon FYI

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-02 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1547836649 ## python/pyspark/errors/utils.py: ## @@ -119,3 +124,62 @@ def get_message_template(self, error_class: str) -> str: message_template =