[GitHub] [spark] HyukjinKwon commented on a change in pull request #29242: [SPARK-31448] [PYTHON] Fix storage level used in cache() in dataframe.py

2020-09-06 Thread GitBox


HyukjinKwon commented on a change in pull request #29242:
URL: https://github.com/apache/spark/pull/29242#discussion_r484197550



##
File path: python/pyspark/storagelevel.py
##
@@ -56,3 +56,5 @@ def __str__(self):
 StorageLevel.MEMORY_AND_DISK = StorageLevel(True, True, False, False)
 StorageLevel.MEMORY_AND_DISK_2 = StorageLevel(True, True, False, False, 2)
 StorageLevel.OFF_HEAP = StorageLevel(True, True, True, False, 1)
+

Review comment:
   Let's remove newline here.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29242: [SPARK-31448] [PYTHON] Fix storage level used in cache() in dataframe.py

2020-09-06 Thread GitBox


cloud-fan commented on a change in pull request #29242:
URL: https://github.com/apache/spark/pull/29242#discussion_r484193839



##
File path: python/pyspark/sql/dataframe.py
##
@@ -678,13 +678,14 @@ def cache(self):
 return self
 
 @since(1.3)
-def persist(self, storageLevel=StorageLevel.MEMORY_AND_DISK):
+def persist(self, storageLevel=StorageLevel.MEMORY_AND_DISK_DESER):

Review comment:
   Now I see the confusion. In Scala, `MEMORY_AND_DISK` means 
`deserialized=true`, while in Python, `MEMORY_AND_DISK` means 
`deserialized=false`.
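The flag difference is visible in PySpark's `StorageLevel` constructor. Below is a minimal self-contained sketch modeled on `python/pyspark/storagelevel.py` (simplified; the real class has `__repr__`, `__str__`, etc.):

```python
# Minimal sketch of PySpark's StorageLevel flags, modeled on
# python/pyspark/storagelevel.py (a simplification of the real class).
class StorageLevel:
    def __init__(self, useDisk, useMemory, useOffHeap, deserialized, replication=1):
        self.useDisk = useDisk
        self.useMemory = useMemory
        self.useOffHeap = useOffHeap
        self.deserialized = deserialized
        self.replication = replication

# Python's MEMORY_AND_DISK keeps data serialized (deserialized=False) ...
MEMORY_AND_DISK = StorageLevel(True, True, False, False)
# ... while a level matching Scala's MEMORY_AND_DISK default sets deserialized=True.
MEMORY_AND_DISK_DESER = StorageLevel(True, True, False, True)

print(MEMORY_AND_DISK.deserialized, MEMORY_AND_DISK_DESER.deserialized)
```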

##
File path: python/pyspark/sql/dataframe.py
##
@@ -678,13 +678,14 @@ def cache(self):
 return self
 
 @since(1.3)
-def persist(self, storageLevel=StorageLevel.MEMORY_AND_DISK):
+def persist(self, storageLevel=StorageLevel.MEMORY_AND_DISK_DESER):
 """Sets the storage level to persist the contents of the 
:class:`DataFrame` across
 operations after the first time it is computed. This can only be used 
to assign
 a new storage level if the :class:`DataFrame` does not have a storage 
level set yet.
-If no storage level is specified defaults to (`MEMORY_AND_DISK`).
+If no storage level is specified defaults to (`MEMORY_AND_DISK_DESER`)
 
-.. note:: The default storage level has changed to `MEMORY_AND_DISK` 
to match Scala in 2.0.
+.. note:: The default storage level has changed to 
`MEMORY_AND_DISK_DESER` to match Scala
+in 2.0.

Review comment:
   `in 3.0`?









[GitHub] [spark] HyukjinKwon commented on a change in pull request #29649: [SPARK-32779][SQL] Avoid using synchronized API of SessionCatalog in withClient flow, this leads to DeadLock

2020-09-06 Thread GitBox


HyukjinKwon commented on a change in pull request #29649:
URL: https://github.com/apache/spark/pull/29649#discussion_r484189882



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##
@@ -1329,8 +1329,7 @@ private[client] class Shim_v3_0 extends Shim_v2_3 {
   isSrcLocal: Boolean): Unit = {
 val session = SparkSession.getActiveSession
 assert(session.nonEmpty)
-val database = session.get.sessionState.catalog.getCurrentDatabase
-val table = hive.getTable(database, tableName)
+val table = hive.getTable(tableName)

Review comment:
   I think it's okay not to change it, at least in this PR, since this PR will likely be backported. The minimised change here looks good.








[GitHub] [spark] sandeep-katta commented on a change in pull request #29649: [SPARK-32779][SQL] Avoid using synchronized API of SessionCatalog in withClient flow, this leads to DeadLock

2020-09-06 Thread GitBox


sandeep-katta commented on a change in pull request #29649:
URL: https://github.com/apache/spark/pull/29649#discussion_r484189537



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##
@@ -1329,8 +1329,7 @@ private[client] class Shim_v3_0 extends Shim_v2_3 {
   isSrcLocal: Boolean): Unit = {
 val session = SparkSession.getActiveSession
 assert(session.nonEmpty)
-val database = session.get.sessionState.catalog.getCurrentDatabase
-val table = hive.getTable(database, tableName)
+val table = hive.getTable(tableName)

Review comment:
   > If the question is why `Shim.loadPartition` doesn't take database as 
an argument, yes I think we can change to take. But looks like it's to match 
with `Hive.loadPartition`'s signature.
   
   +1 for this, I will update the API signature to take the database name.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29605: [SPARK-31511][SQL][2.4] Make BytesToBytesMap iterators thread-safe

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29605:
URL: https://github.com/apache/spark/pull/29605#issuecomment-688043820










[GitHub] [spark] KevinSmile edited a comment on pull request #29653: [SPARK-32804][Launcher] Fix run-example command builder bug

2020-09-06 Thread GitBox


KevinSmile edited a comment on pull request #29653:
URL: https://github.com/apache/spark/pull/29653#issuecomment-688031240


   I updated my patch code; the new version of the patch may explain my point better.
   
   The following snippet shows that **the first unrecognized arg is treated as the primaryResource** (using `Utils.resolveURI(opt).toString` to get the app-jar in the correct format):
   
https://github.com/apache/spark/blob/f5360e761ef161f7e04526b59a4baf53f1cf8cd5/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L450-L470
   
   
   Yes, when you do `run-example`, you just specify the class name (e.g. SparkPi) and don't need to specify the app-jar. So in the backend code, `appResource` should be auto-discovered and set to the examples' main app jar (e.g. `./examples/jars/spark-examples_2.12-3.0.0.jar`), and then added as an arg (this arg is the so-called `first unrecognized arg` and will later be used as the `primaryResource`).
   
   **So actually, the app-jar arg is always needed in the backend.**
   
   The bug is that the original backend code forgot to add the app-jar, so the first appArg (e.g. running the SparkPi example with `100` as its first arg) is treated as the app-jar; check the `:: _jarUrl ::` part in the following snippet:
   
https://github.com/apache/spark/blob/f55694638d45f34ab91f6f6ec2066cbf7631f4af/core/src/main/scala/org/apache/spark/deploy/ClientArguments.scala#L74-L89
   
   In the original code, `spark-internal` is added, so `spark-internal` is treated as the primaryResource (aka app-jar).
   
![image](https://user-images.githubusercontent.com/17903517/92350504-ca593680-f10b-11ea-9ed2-898f8b108587.png)
   
   `spark-internal` is useless in this case but useful in many other cases (covered in some unit tests), so I prefer not to omit it here, or to omit it only when you do `run-example`.
   
   P.S.
   Standalone cluster is a working deploy mode.
   
https://github.com/apache/spark/blob/de44e9cfa07e32d293d68355916ac0dbd31d5c54/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L297
   
https://github.com/apache/spark/blob/de44e9cfa07e32d293d68355916ac0dbd31d5c54/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L685
   
![image](https://user-images.githubusercontent.com/17903517/92350908-e90bfd00-f10c-11ea-9791-8aa44da9c328.png)
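The first-unrecognized-arg behaviour described above can be sketched as a toy parser. This is illustrative pseudologic only, not Spark's actual `SparkSubmitArguments` code; the flag names are made up:

```python
# Toy sketch of the parsing behaviour described in the comment: known options
# are consumed, the first unrecognized argument becomes the primary resource
# (app jar), and everything after it becomes application args.
KNOWN_FLAGS = {"--class", "--master"}

def parse(args):
    opts, primary, app_args = {}, None, []
    i = 0
    while i < len(args):
        if args[i] in KNOWN_FLAGS:
            opts[args[i]] = args[i + 1]
            i += 2
        else:
            primary = args[i]          # first unrecognized arg -> app jar
            app_args = args[i + 1:]    # the rest are app args
            break
    return opts, primary, app_args

# If the backend forgets to insert the jar, the first app arg ("100")
# is mistaken for the primary resource:
print(parse(["--class", "SparkPi", "100"]))
print(parse(["--class", "SparkPi", "examples.jar", "100"]))
```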
   






[GitHub] [spark] AmplabJenkins commented on pull request #29605: [SPARK-31511][SQL][2.4] Make BytesToBytesMap iterators thread-safe

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29605:
URL: https://github.com/apache/spark/pull/29605#issuecomment-688043820










[GitHub] [spark] SparkQA commented on pull request #29605: [SPARK-31511][SQL][2.4] Make BytesToBytesMap iterators thread-safe

2020-09-06 Thread GitBox


SparkQA commented on pull request #29605:
URL: https://github.com/apache/spark/pull/29605#issuecomment-688043369


   **[Test build #128336 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128336/testReport)**
 for PR 29605 at commit 
[`0e39f7a`](https://github.com/apache/spark/commit/0e39f7adca90b84fd39e6461b8864eeb6cffb634).






[GitHub] [spark] HyukjinKwon commented on a change in pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


HyukjinKwon commented on a change in pull request #29591:
URL: https://github.com/apache/spark/pull/29591#discussion_r484188746



##
File path: examples/src/main/python/ml/estimator_transformer_param_example.py
##
@@ -54,7 +56,7 @@
 print(model1.extractParamMap())
 
 # We may alternatively specify parameters using a Python dictionary as a 
paramMap
-paramMap = {lr.maxIter: 20}
+paramMap: Dict[Param, Any] = {lr.maxIter: 20}

Review comment:
   I don't object to having the type hints in the examples, but I guess we should then type everything in the examples. We could discuss and do that separately - we'd likely have to file multiple follow-up JIRAs under SPARK-32681.
   
   I was thinking it's good to port the PySpark type hints alone first, with a minimized diff against other code in Spark, in this JIRA/PR.
   








[GitHub] [spark] sandeep-katta commented on a change in pull request #29649: [SPARK-32779][SQL] Avoid using synchronized API of SessionCatalog in withClient flow, this leads to DeadLock

2020-09-06 Thread GitBox


sandeep-katta commented on a change in pull request #29649:
URL: https://github.com/apache/spark/pull/29649#discussion_r484188369



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##
@@ -1329,8 +1329,7 @@ private[client] class Shim_v3_0 extends Shim_v2_3 {
   isSrcLocal: Boolean): Unit = {
 val session = SparkSession.getActiveSession
 assert(session.nonEmpty)
-val database = session.get.sessionState.catalog.getCurrentDatabase
-val table = hive.getTable(database, tableName)
+val table = hive.getTable(tableName)

Review comment:
   `Hive.loadPartition` in Hive 3.1.x takes a `Table`, but other Hive versions such as 2.1.x take the `tableName` as a `String`.
   
   Hive-3.1.0
   ```
  public Partition loadPartition(Path loadPath, Table tbl, Map<String, String> partSpec,
      LoadFileType loadFileType, boolean inheritTableSpecs, boolean isSkewedStoreAsSubdir,
      boolean isSrcLocal, boolean isAcidIUDoperation, boolean hasFollowingStatsTask,
      Long writeId, int stmtId, boolean isInsertOverwrite)
   ```
   
   Hive-2.1.0
   
   ```
  public void loadPartition(Path loadPath, String tableName,
      Map<String, String> partSpec, boolean replace,
      boolean inheritTableSpecs, boolean isSkewedStoreAsSubdir,
      boolean isSrcLocal, boolean isAcid, boolean hasFollowingStatsTask)
   ```
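The version split above is why Spark needs per-version shims around the Hive client. A rough sketch of the dispatch idea, in Python for brevity (the names and shapes here are illustrative, not Spark's actual shim code):

```python
# Illustrative sketch of a version shim: each backend version exposes
# loadPartition with a different signature, so the caller picks an adapter
# by version instead of calling the API directly.
def load_partition_v2(table_name, part_spec):
    # 2.1.x-style API: identified by table name string
    return f"v2 load {table_name} {part_spec}"

def load_partition_v3(table_obj, part_spec):
    # 3.1.x-style API: identified by a table object
    return f"v3 load {table_obj['name']} {part_spec}"

def shim_load_partition(version, table, part_spec):
    if version >= 3:
        return load_partition_v3(table, part_spec)       # takes a Table object
    return load_partition_v2(table["name"], part_spec)   # takes a table name

print(shim_load_partition(3, {"name": "db.t"}, {"p": "1"}))
print(shim_load_partition(2, {"name": "db.t"}, {"p": "1"}))
```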








[GitHub] [spark] cloud-fan commented on a change in pull request #29579: [SPARK-32736][CORE] Avoid caching the removed decommissioned executors in TaskSchedulerImpl

2020-09-06 Thread GitBox


cloud-fan commented on a change in pull request #29579:
URL: https://github.com/apache/spark/pull/29579#discussion_r484188484



##
File path: core/src/main/scala/org/apache/spark/deploy/DeployMessage.scala
##
@@ -188,7 +188,7 @@ private[deploy] object DeployMessages {
   }
 
   case class ExecutorUpdated(id: Int, state: ExecutorState, message: 
Option[String],
-exitStatus: Option[Int], workerLost: Boolean)
+exitStatus: Option[Int], workerHost: Option[String])

Review comment:
   and how about  `hostOfLostWorker`?








[GitHub] [spark] cloud-fan commented on a change in pull request #29579: [SPARK-32736][CORE] Avoid caching the removed decommissioned executors in TaskSchedulerImpl

2020-09-06 Thread GitBox


cloud-fan commented on a change in pull request #29579:
URL: https://github.com/apache/spark/pull/29579#discussion_r484187912



##
File path: core/src/main/scala/org/apache/spark/deploy/DeployMessage.scala
##
@@ -188,7 +188,7 @@ private[deploy] object DeployMessages {
   }
 
   case class ExecutorUpdated(id: Int, state: ExecutorState, message: 
Option[String],
-exitStatus: Option[Int], workerLost: Boolean)
+exitStatus: Option[Int], workerHost: Option[String])

Review comment:
   We should add some comments to explain what `workerHost` indicates here.








[GitHub] [spark] cloud-fan commented on pull request #29605: [SPARK-31511][SQL][2.4] Make BytesToBytesMap iterators thread-safe

2020-09-06 Thread GitBox


cloud-fan commented on pull request #29605:
URL: https://github.com/apache/spark/pull/29605#issuecomment-688041790


   BTW, @cxzl25 you can use an empty git commit to trigger the tests






[GitHub] [spark] cloud-fan commented on pull request #29605: [SPARK-31511][SQL][2.4] Make BytesToBytesMap iterators thread-safe

2020-09-06 Thread GitBox


cloud-fan commented on pull request #29605:
URL: https://github.com/apache/spark/pull/29605#issuecomment-688041687


   retest this please






[GitHub] [spark] cloud-fan commented on a change in pull request #29649: [SPARK-32779][SQL] Avoid using synchronized API of SessionCatalog in withClient flow, this leads to DeadLock

2020-09-06 Thread GitBox


cloud-fan commented on a change in pull request #29649:
URL: https://github.com/apache/spark/pull/29649#discussion_r484187110



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##
@@ -1329,8 +1329,7 @@ private[client] class Shim_v3_0 extends Shim_v2_3 {
   isSrcLocal: Boolean): Unit = {
 val session = SparkSession.getActiveSession
 assert(session.nonEmpty)
-val database = session.get.sessionState.catalog.getCurrentDatabase
-val table = hive.getTable(database, tableName)
+val table = hive.getTable(tableName)

Review comment:
   The `Hive.loadPartition` API we are using here takes a `Table` instance 
as a parameter, not a single table name.








[GitHub] [spark] HyukjinKwon commented on a change in pull request #29649: [SPARK-32779][SQL] Avoid using synchronized API of SessionCatalog in withClient flow, this leads to DeadLock

2020-09-06 Thread GitBox


HyukjinKwon commented on a change in pull request #29649:
URL: https://github.com/apache/spark/pull/29649#discussion_r484186429



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##
@@ -1329,8 +1329,7 @@ private[client] class Shim_v3_0 extends Shim_v2_3 {
   isSrcLocal: Boolean): Unit = {
 val session = SparkSession.getActiveSession
 assert(session.nonEmpty)
-val database = session.get.sessionState.catalog.getCurrentDatabase
-val table = hive.getTable(database, tableName)
+val table = hive.getTable(tableName)

Review comment:
   If the question is why `Shim.loadPartition` doesn't take the database as an argument, yes, I think we can change it to do so. But it looks like that's to match `Hive.loadPartition`'s signature.








[GitHub] [spark] cloud-fan commented on pull request #29635: [SPARK-32785][SQL] Interval with dangling parts should not results null

2020-09-06 Thread GitBox


cloud-fan commented on pull request #29635:
URL: https://github.com/apache/spark/pull/29635#issuecomment-688039775


   thanks, merging to master!






[GitHub] [spark] cloud-fan closed pull request #29635: [SPARK-32785][SQL] Interval with dangling parts should not results null

2020-09-06 Thread GitBox


cloud-fan closed pull request #29635:
URL: https://github.com/apache/spark/pull/29635


   






[GitHub] [spark] cloud-fan commented on a change in pull request #29626: [SPARK-32777][SQL] Aggregation support aggregate function with multiple foldable expressions.

2020-09-06 Thread GitBox


cloud-fan commented on a change in pull request #29626:
URL: https://github.com/apache/spark/pull/29626#discussion_r484184462



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
##
@@ -293,12 +297,16 @@ object RewriteDistinctAggregates extends 
Rule[LogicalPlan] {
   val operators = expressions.map { e =>
 val af = e.aggregateFunction
 val naf = patchAggregateFunctionChildren(af) { x =>
-  val condition = if (e.filter.isDefined) {
-e.filter.map(distinctAggFilterAttrLookup.get(_)).get
+  val condition = 
e.filter.map(distinctAggFilterAttrLookup.get(_)).getOrElse(None)

Review comment:
   nit: is `.getOrElse(None)` needed?
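The nit here: `e.filter` is an `Option`, and the lookup already returns an `Option`, so `map(...).getOrElse(None)` merely flattens a nested `Option` - which is exactly what `flatMap` does in one step. A rough Python rendering of the equivalence, using `None` and `dict.get` to stand in for Scala's `Option` (an approximation, not the Catalyst code):

```python
# Rough Python rendering of the Scala equivalence:
#   e.filter.map(lookup.get).getOrElse(None)  ==  e.filter.flatMap(lookup.get)
lookup = {"filter_expr": "filter_attr"}

def map_then_get_or_else(opt):
    # map(...) would produce a nested optional; getOrElse(None) unwraps it
    return lookup.get(opt) if opt is not None else None

def flat_map(opt):
    # flatMap collapses the mapping and the flattening into one step
    return lookup.get(opt) if opt is not None else None

print(map_then_get_or_else("filter_expr"), flat_map("filter_expr"))
print(map_then_get_or_else("missing"), flat_map("missing"))
```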








[GitHub] [spark] zero323 commented on a change in pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


zero323 commented on a change in pull request #29591:
URL: https://github.com/apache/spark/pull/29591#discussion_r484184260



##
File path: examples/src/main/python/ml/estimator_transformer_param_example.py
##
@@ -54,7 +56,7 @@
 print(model1.extractParamMap())
 
 # We may alternatively specify parameters using a Python dictionary as a 
paramMap
-paramMap = {lr.maxIter: 20}
+paramMap: Dict[Param, Any] = {lr.maxIter: 20}

Review comment:
   It is. In such cases Mypy infers the type of `paramMap` to be 
`Dict[Param, int]` and will fail on a subsequent update:
   
   ```python
   paramMap.update({lr.regParam: 0.1, lr.threshold: 0.55}) 
   ```
   
   In general I'd prefer to keep tests against the examples, as these are the 
biggest chunks of "real-life" code (short of docstrings, but those create way 
more issues) that we have, and they nicely highlight many possible problems.
   
   In `pyspark-stubs` I simply clone the repo and [patch 
examples](https://github.com/zero323/pyspark-stubs/blob/faad51cfd4b8971cda889f79c6857dd28ad62078/.travis.yml#L11-L12)
 before tests. 
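The inference failure described above can be reproduced without PySpark at all; the sketch below uses plain string keys in place of `Param` objects (an assumption for brevity). Without an annotation, Mypy pins the value type from the first literal; annotating with `Any` keeps both updates valid:

```python
from typing import Any, Dict

# Without an annotation, Mypy infers Dict[str, int] from the first literal,
# so a later update with float values is flagged:
#     paramMap = {"maxIter": 20}
#     paramMap.update({"regParam": 0.1})  # Mypy error: expected "int"
# Annotating the value type as Any (as the diff does) keeps both valid:
paramMap: Dict[str, Any] = {"maxIter": 20}
paramMap.update({"regParam": 0.1, "threshold": 0.55})
print(paramMap)
```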
   








[GitHub] [spark] wangyum commented on a change in pull request #27518: [SPARK-30768][SQL] Constraints inferred from inequality attributes

2020-09-06 Thread GitBox


wangyum commented on a change in pull request #27518:
URL: https://github.com/apache/spark/pull/27518#discussion_r484184236



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala
##
@@ -78,6 +91,72 @@ trait ConstraintHelper {
 inferredConstraints -- constraints
   }
 
+  /**
+   * Infers an additional set of constraints from a given set of inequality 
constraints.
+   * For e.g., if an operator has constraints of the form (`a > b`, `b > 5`), 
this returns an
+   * additional constraint of the form `a > 5`.
+   */
+  def inferInequalityConstraints(constraints: Set[Expression]): 
Set[Expression] = {
+val binaryComparisons = constraints.filter {
+  case _: GreaterThan => true
+  case _: GreaterThanOrEqual => true
+  case _: LessThan => true
+  case _: LessThanOrEqual => true
+  case _: EqualTo => true

Review comment:
   For example: `cast(a as double) > cast(b as double) and cast(b as 
double) = 1`
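The transitive inference being discussed can be sketched in a few lines. This toy version works on `(left, op, right)` tuples and handles only strict `>` chains - the real Catalyst rule operates on `Expression` trees and also covers `>=`, `<`, `<=`, and `=`:

```python
# Toy sketch of inequality-constraint inference: from (a > b, b > 5),
# derive a > 5. Constraints are (left, op, right) tuples here, a
# simplification of Catalyst's Expression trees.
def infer_inequality_constraints(constraints):
    inferred = set()
    for (l1, op1, r1) in constraints:
        for (l2, op2, r2) in constraints:
            # a > b combined with b > 5 yields a > 5
            if op1 == ">" and op2 == ">" and r1 == l2:
                inferred.add((l1, ">", r2))
    # return only the newly derived constraints
    return inferred - set(constraints)

print(infer_inequality_constraints({("a", ">", "b"), ("b", ">", 5)}))
```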








[GitHub] [spark] cloud-fan commented on a change in pull request #29626: [SPARK-32777][SQL] Aggregation support aggregate function with multiple foldable expressions.

2020-09-06 Thread GitBox


cloud-fan commented on a change in pull request #29626:
URL: https://github.com/apache/spark/pull/29626#discussion_r484184121



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
##
@@ -216,20 +216,24 @@ object RewriteDistinctAggregates extends 
Rule[LogicalPlan] {
 val distinctAggs = aggExpressions.filter(_.isDistinct)
 
 // Extract distinct aggregate expressions.
-val distinctAggGroups = aggExpressions.filter(_.isDistinct).groupBy { e =>
-val unfoldableChildren = 
e.aggregateFunction.children.filter(!_.foldable).toSet
-if (unfoldableChildren.nonEmpty) {
-  // Only expand the unfoldable children
-  unfoldableChildren
-} else {
-  // If aggregateFunction's children are all foldable
-  // we must expand at least one of the children (here we take the 
first child),
-  // or If we don't, we will get the wrong result, for example:
-  // count(distinct 1) will be explained to count(1) after the rewrite 
function.
-  // Generally, the distinct aggregateFunction should not run
-  // foldable TypeCheck for the first child.
-  e.aggregateFunction.children.take(1).toSet
-}
+val distinctAggGroupMap = aggExpressions.filter(_.isDistinct).map { e =>
+  val unfoldableChildren = 
e.aggregateFunction.children.filter(!_.foldable).toSet
+  if (unfoldableChildren.nonEmpty) {
+// Only expand the unfoldable children
+e -> unfoldableChildren
+  } else {
+// If aggregateFunction's children are all foldable
+// we must expand at least one of the children (here we take the first 
child),
+// or If we don't, we will get the wrong result, for example:
+// count(distinct 1) will be explained to count(1) after the rewrite 
function.
+// Generally, the distinct aggregateFunction should not run
+// foldable TypeCheck for the first child.
+e -> e.aggregateFunction.children.take(1).toSet
+  }
+}
+val distinctAggGroupLookup = distinctAggGroupMap.toMap
+val distinctAggGroups = distinctAggGroupMap.groupBy(_._2).map{ kv =>

Review comment:
   nit: `mapValues`
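The nit: after `groupBy(_._2)`, only the values need reshaping, which is what Scala's `mapValues` expresses directly. A Python analogue of the same grouping, with illustrative names:

```python
# Group (aggExpr, keySet) pairs by key set, keeping only the expressions -
# i.e. Scala's pairs.groupBy(_._2).mapValues(_.map(_._1)).
pairs = [("agg1", "ks1"), ("agg2", "ks1"), ("agg3", "ks2")]

grouped = {}
for expr, ks in pairs:
    grouped.setdefault(ks, []).append(expr)

print(grouped)
```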








[GitHub] [spark] HyukjinKwon commented on a change in pull request #29649: [SPARK-32779][SQL] Avoid using synchronized API of SessionCatalog in withClient flow, this leads to DeadLock

2020-09-06 Thread GitBox


HyukjinKwon commented on a change in pull request #29649:
URL: https://github.com/apache/spark/pull/29649#discussion_r484183880



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##
@@ -1329,8 +1329,7 @@ private[client] class Shim_v3_0 extends Shim_v2_3 {
   isSrcLocal: Boolean): Unit = {
 val session = SparkSession.getActiveSession
 assert(session.nonEmpty)
-val database = session.get.sessionState.catalog.getCurrentDatabase
-val table = hive.getTable(database, tableName)
+val table = hive.getTable(tableName)

Review comment:
   We can. The problem is when we get the `database` here from 
`session.get.sessionState.catalog.getCurrentDatabase`.








[GitHub] [spark] cloud-fan commented on a change in pull request #29649: [SPARK-32779][SQL] Avoid using synchronized API of SessionCatalog in withClient flow, this leads to DeadLock

2020-09-06 Thread GitBox


cloud-fan commented on a change in pull request #29649:
URL: https://github.com/apache/spark/pull/29649#discussion_r484183375



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##
@@ -1329,8 +1329,7 @@ private[client] class Shim_v3_0 extends Shim_v2_3 {
   isSrcLocal: Boolean): Unit = {
 val session = SparkSession.getActiveSession
 assert(session.nonEmpty)
-val database = session.get.sessionState.catalog.getCurrentDatabase
-val table = hive.getTable(database, tableName)
+val table = hive.getTable(tableName)

Review comment:
   Just out of curiosity, why can't we use `hive.getTable(dbName, tblName)` 
here? It looks weird that the `loadPartition` method in `HiveShim` takes a 
single `tableName` parameter which is a qualified name.








[GitHub] [spark] KevinSmile commented on pull request #29653: [SPARK-32804][Launcher] Fix run-example command builder bug

2020-09-06 Thread GitBox


KevinSmile commented on pull request #29653:
URL: https://github.com/apache/spark/pull/29653#issuecomment-688031240


   When you do `run-example`, you just specify the class name (e.g. `SparkPi`); 
there is no need to specify the app jar. So in the backend code, `appResource` 
should be automatically found and set to the examples' main app jar (e.g. 
`./examples/jars/spark-examples_2.12-3.0.0.jar`), and then added as an arg 
(this arg will later be set as the so-called `primaryResource`).
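The resolution order described above can be sketched as follows. This is an illustration only, not the actual Java code in `SparkSubmitCommandBuilder`; `resolve_app_resource`, its parameters, and `args` are invented names for the sketch.

```python
# Hypothetical sketch of the run-example primary-resource resolution:
# when only a class name was given, fall back to the examples' main app jar.
def resolve_app_resource(is_example, app_resource, examples_jar):
    """Return the primary resource to append to the submit args."""
    if is_example and app_resource is None:
        # run-example: the user gave only a class name (e.g. SparkPi),
        # so auto-find the examples jar and use it as the app resource.
        return examples_jar
    return app_resource

args = []
primary = resolve_app_resource(
    is_example=True,
    app_resource=None,
    examples_jar="./examples/jars/spark-examples_2.12-3.0.0.jar",
)
if primary is not None:
    # this arg is later picked up as the so-called primaryResource
    args.append(primary)
```

An explicitly given app resource would win over the auto-found jar, matching the fallback-only behavior described above.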






[GitHub] [spark] KevinSmile commented on a change in pull request #29653: [SPARK-32804][Launcher] Fix run-example command builder bug

2020-09-06 Thread GitBox


KevinSmile commented on a change in pull request #29653:
URL: https://github.com/apache/spark/pull/29653#discussion_r484180285



##
File path: 
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java
##
@@ -241,9 +241,11 @@
 }
 
 args.addAll(parsedArgs);
+
 if (appResource != null) {
   args.add(appResource);

Review comment:
   When you do `run-example`, you just specify the class name (e.g. 
`SparkPi`); there is no need to specify the app jar. So in the backend code, 
`appResource` should be automatically found and set to the examples' main app jar (e.g. 
`./examples/jars/spark-examples_2.12-3.0.0.jar`), and then added as an arg 
(this arg will later be set as the so-called `primaryResource`).








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29656: [SPARK-32807][SQL] ThriftServer open session use direct API to init current DB

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29656:
URL: https://github.com/apache/spark/pull/29656#issuecomment-688027437










[GitHub] [spark] SparkQA removed a comment on pull request #29656: [SPARK-32807][SQL] ThriftServer open session use direct API to init current DB

2020-09-06 Thread GitBox


SparkQA removed a comment on pull request #29656:
URL: https://github.com/apache/spark/pull/29656#issuecomment-688016714


   **[Test build #128333 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128333/testReport)**
 for PR 29656 at commit 
[`1f2fa6c`](https://github.com/apache/spark/commit/1f2fa6cfc6dd841bda5c82fa8f428a54e1dfa8a5).






[GitHub] [spark] AmplabJenkins commented on pull request #29656: [SPARK-32807][SQL] ThriftServer open session use direct API to init current DB

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29656:
URL: https://github.com/apache/spark/pull/29656#issuecomment-688027437










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688026864










[GitHub] [spark] SparkQA commented on pull request #29656: [SPARK-32807][SQL] ThriftServer open session use direct API to init current DB

2020-09-06 Thread GitBox


SparkQA commented on pull request #29656:
URL: https://github.com/apache/spark/pull/29656#issuecomment-688027227


   **[Test build #128333 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128333/testReport)**
 for PR 29656 at commit 
[`1f2fa6c`](https://github.com/apache/spark/commit/1f2fa6cfc6dd841bda5c82fa8f428a54e1dfa8a5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


SparkQA removed a comment on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688018881


   **[Test build #128335 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128335/testReport)**
 for PR 29639 at commit 
[`8162dfc`](https://github.com/apache/spark/commit/8162dfc5ee6ee761d4362176406350e3250e11f9).






[GitHub] [spark] AmplabJenkins commented on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688026864










[GitHub] [spark] SparkQA commented on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


SparkQA commented on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688026561


   **[Test build #128335 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128335/testReport)**
 for PR 29639 at commit 
[`8162dfc`](https://github.com/apache/spark/commit/8162dfc5ee6ee761d4362176406350e3250e11f9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


HyukjinKwon commented on a change in pull request #29639:
URL: https://github.com/apache/spark/pull/29639#discussion_r484173595



##
File path: python/docs/source/development/debugging.rst
##
@@ -0,0 +1,280 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+..http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+=================
+Debugging PySpark
+=================
+
+PySpark uses Spark as an engine. PySpark uses `Py4J `_ 
to leverage Spark to submit and compute the jobs.
+
+On the driver side, PySpark communicates with the driver on JVM by using `Py4J 
`_.
+When :class:`pyspark.sql.SparkSession` or :class:`pyspark.SparkContext` is 
created and initialized, PySpark launches a JVM
+to communicate.
+
+On the executor side, Python workers execute and handle Python native 
functions or data. They are not launched if
+a PySpark application does not require interaction between Python workers and 
JVMs. They are lazily launched only when
+Python native functions or data have to be handled, for example, when you 
execute pandas UDFs or
+PySpark RDD APIs.
+
+This page focuses on debugging Python side of PySpark on both driver and 
executor sides instead of focusing on debugging
+with JVM. Profiling and debugging JVM is described at `Useful Developer Tools 
`_.
+
+Note that,
+
+- If you are running locally, you can directly debug the driver side by using 
your IDE without the remote debug feature.

Review comment:
   BTW, @itholic is working on documenting local PyCharm setup in another 
page (at SPARK-32189). We could add a link here once that page is finished.








[GitHub] [spark] SparkQA removed a comment on pull request #29656: [SPARK-32807][SQL] ThriftServer open session use direct API to init current DB

2020-09-06 Thread GitBox


SparkQA removed a comment on pull request #29656:
URL: https://github.com/apache/spark/pull/29656#issuecomment-688009132


   **[Test build #128331 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128331/testReport)**
 for PR 29656 at commit 
[`a1cf29a`](https://github.com/apache/spark/commit/a1cf29a0eb00ffc45296874ff445a75c5d7f42e7).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29656: [SPARK-32807][SQL] ThriftServer open session use direct API to init current DB

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29656:
URL: https://github.com/apache/spark/pull/29656#issuecomment-688019831










[GitHub] [spark] AmplabJenkins commented on pull request #29656: [SPARK-32807][SQL] ThriftServer open session use direct API to init current DB

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29656:
URL: https://github.com/apache/spark/pull/29656#issuecomment-688019831










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688019481










[GitHub] [spark] SparkQA commented on pull request #29656: [SPARK-32807][SQL] ThriftServer open session use direct API to init current DB

2020-09-06 Thread GitBox


SparkQA commented on pull request #29656:
URL: https://github.com/apache/spark/pull/29656#issuecomment-688019603


   **[Test build #128331 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128331/testReport)**
 for PR 29656 at commit 
[`a1cf29a`](https://github.com/apache/spark/commit/a1cf29a0eb00ffc45296874ff445a75c5d7f42e7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688019481










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688017442


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128332/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


SparkQA commented on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688018881


   **[Test build #128335 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128335/testReport)**
 for PR 29639 at commit 
[`8162dfc`](https://github.com/apache/spark/commit/8162dfc5ee6ee761d4362176406350e3250e11f9).






[GitHub] [spark] HyukjinKwon edited a comment on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


HyukjinKwon edited a comment on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688018098


   The link here 
https://hyukjin-spark.readthedocs.io/en/stable/development/debugging.html is 
also updated. FYI, you might need to use an incognito tab or something like that 
to prevent it from showing a cached version if it still shows the previous page.






[GitHub] [spark] HyukjinKwon commented on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


HyukjinKwon commented on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688018098


   The link here 
https://hyukjin-spark.readthedocs.io/en/stable/development/debugging.html is 
also updated.






[GitHub] [spark] HyukjinKwon commented on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


HyukjinKwon commented on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688017936


   Hey guys, sorry for a bit of noise here. Would you mind taking another look? 
I changed quite a lot while addressing @BryanCutler's and @zero323's comments.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688017434










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688017081










[GitHub] [spark] AmplabJenkins commented on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688017434










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29656: [SPARK-32807][SQL] ThriftServer open session use direct API to init current DB

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29656:
URL: https://github.com/apache/spark/pull/29656#issuecomment-688017138










[GitHub] [spark] AmplabJenkins commented on pull request #29656: [SPARK-32807][SQL] ThriftServer open session use direct API to init current DB

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29656:
URL: https://github.com/apache/spark/pull/29656#issuecomment-688017138










[GitHub] [spark] AmplabJenkins commented on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688017081










[GitHub] [spark] SparkQA commented on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


SparkQA commented on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688016741


   **[Test build #128334 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128334/testReport)**
 for PR 29639 at commit 
[`99b8c1e`](https://github.com/apache/spark/commit/99b8c1e5c89dfecf4e7fb694e22043cf5ef9f373).






[GitHub] [spark] SparkQA commented on pull request #29656: [SPARK-32807][SQL] ThriftServer open session use direct API to init current DB

2020-09-06 Thread GitBox


SparkQA commented on pull request #29656:
URL: https://github.com/apache/spark/pull/29656#issuecomment-688016714


   **[Test build #128333 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128333/testReport)**
 for PR 29656 at commit 
[`1f2fa6c`](https://github.com/apache/spark/commit/1f2fa6cfc6dd841bda5c82fa8f428a54e1dfa8a5).






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29656: [SPARK-32807][SQL] ThriftServer open session use direct API to init current DB

2020-09-06 Thread GitBox


AngersZhuuuu commented on a change in pull request #29656:
URL: https://github.com/apache/spark/pull/29656#discussion_r484171144



##
File path: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala
##
@@ -69,7 +69,7 @@ private[hive] class SparkSQLSessionManager(hiveServer: 
HiveServer2, sqlContext:
 setConfMap(ctx, hiveSessionState.getOverriddenConfigurations)
 setConfMap(ctx, hiveSessionState.getHiveVariables)
 if (sessionConf != null && sessionConf.containsKey("use:database")) {
-  ctx.sql(s"use ${sessionConf.get("use:database")}")
+  
ctx.sparkSession.sessionState.catalog.setCurrentDatabase(sessionConf.get("use:database"))

Review comment:
   > `ctx.sessionState.catalog.setCurrentDatabase`?
   
   Updated.
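The change agreed above replaces executing a full `use <db>` SQL statement with a direct call on the session catalog. As a minimal self-contained sketch of that design choice (the `ToyCatalog`/`ToyContext` classes below are invented for illustration and are not Spark's real `SessionCatalog` or context APIs):

```python
# Toy illustration: a direct catalog setter changes state without going
# through the SQL parse/analyze/execute path that ctx.sql("use db") takes.
class ToyCatalog:
    def __init__(self):
        self.current_db = "default"

    def set_current_database(self, db):
        # Direct state change; no SQL statement is parsed or executed.
        self.current_db = db

class ToyContext:
    def __init__(self):
        self.catalog = ToyCatalog()
        self.sql_calls = []  # records every statement routed through sql()

    def sql(self, statement):
        # A real sql() call would parse, analyze and execute the statement.
        self.sql_calls.append(statement)
        if statement.lower().startswith("use "):
            self.catalog.set_current_database(statement.split()[1])

ctx = ToyContext()
# Before the patch: ctx.sql("use mydb") -- goes through the whole SQL path.
# After the patch: a direct API call on the catalog, as in the diff above.
ctx.catalog.set_current_database("mydb")
```

The direct call reaches the same end state while keeping the session-open path out of the SQL execution machinery, which is the motivation discussed in this thread.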








[GitHub] [spark] viirya commented on a change in pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


viirya commented on a change in pull request #29639:
URL: https://github.com/apache/spark/pull/29639#discussion_r484171058



##
File path: python/docs/source/development/debugging.rst
##
@@ -0,0 +1,280 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+..http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+=================
+Debugging PySpark
+=================
+
+PySpark uses Spark as an engine. PySpark uses `Py4J `_ 
to leverage Spark to submit and compute the jobs.
+
+On the driver side, PySpark communicates with the driver on JVM by using `Py4J 
`_.
+When :class:`pyspark.sql.SparkSession` or :class:`pyspark.SparkContext` is 
created and initialized, PySpark launches a JVM
+to communicate.
+
+On the executor side, Python workers execute and handle Python native 
functions or data. They are not launched if
+a PySpark application does not require interaction between Python workers and 
JVMs. They are lazily launched only when
+Python native functions or data have to be handled, for example, when you 
execute pandas UDFs or
+PySpark RDD APIs.
+
+This page focuses on debugging Python side of PySpark on both driver and 
executor sides instead of focusing on debugging
+with JVM. Profiling and debugging JVM is described at `Useful Developer Tools 
`_.
+
+Note that,
+
+- If you are running locally, you can directly debug the driver side by using 
your IDE without the remote debug feature.
+- *There are many other ways of debugging PySpark applications*. For example, 
you can remotely debug by using the open source `Remote Debugger 
`_ instead of using 
PyCharm Professional documented here.
+
+
+Remote Debugging (PyCharm Professional)
+---------------------------------------
+
+This section describes remote debugging on both driver and executor sides 
within a single machine for easy demonstration.
+The way of debugging PySpark on the executor side is different from doing it 
on the driver side. Therefore, they will be demonstrated separately.
+In order to debug PySpark applications on other machines, please refer to the 
full instructions that are specific
+to PyCharm, documented `here 
`_.
+
+Firstly, choose **Edit Configuration...** from the *Run* menu. It opens the 
**Run/Debug Configurations dialog**.
+You have to click ``+`` configuration on the toolbar, and from the list of 
available configurations, select **Python Debug Server**.
+Enter the name of this new configuration, for example, ``MyRemoteDebugger`` 
and also specify the port number, for example ``12345``.
+
+.. image:: ../../../../docs/img/pyspark-remote-debug1.png
+:alt: PyCharm remote debugger setting
+
+| After that, you should install the corresponding version of the 
``pydevd-pycharm`` package in all the machines which will connect to your 
PyCharm debugger. The previous dialog shows the command to install it.
+
+.. code-block:: text
+
+pip install pydevd-pycharm~=
+
+Driver Side
+~~~~~~~~~~~
+
+To debug on the driver side, your application should connect to the debugging 
server. Copy and paste the code
+with ``pydevd_pycharm.settrace`` at the top of your PySpark script. Suppose 
the file name is ``app.py``:
+
+.. code-block:: bash
+
+echo "#==Copy and paste from the previous 
dialog===
+import pydevd_pycharm
+pydevd_pycharm.settrace('localhost', port=12345, stdoutToServer=True, 
stderrToServer=True)
+
#
+# Your PySpark application code:
+from pyspark.sql import SparkSession
+spark = SparkSession.builder.getOrCreate()
+spark.range(10).show()" > app.py
+
+Start debugging with your ``MyRemoteDebugger``.
+
+.. image:: ../../../../docs/img/pyspark-remote-debug2.png
+:alt: PyCharm run remote debugger
+
+| After that, submit your application. This will connect to your PyCharm debugging server and enable you to debug the driver side remotely.
+
+.. code-block:: bash
+
+spark-submit app.py
+
+Executor Side

[GitHub] [spark] wangyum commented on a change in pull request #29656: [SPARK-32807][SQL] ThriftServer open session use direct API to init current DB

2020-09-06 Thread GitBox


wangyum commented on a change in pull request #29656:
URL: https://github.com/apache/spark/pull/29656#discussion_r484170616



##
File path: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala
##
@@ -69,7 +69,7 @@ private[hive] class SparkSQLSessionManager(hiveServer: 
HiveServer2, sqlContext:
 setConfMap(ctx, hiveSessionState.getOverriddenConfigurations)
 setConfMap(ctx, hiveSessionState.getHiveVariables)
 if (sessionConf != null && sessionConf.containsKey("use:database")) {
-  ctx.sql(s"use ${sessionConf.get("use:database")}")
+  
ctx.sparkSession.sessionState.catalog.setCurrentDatabase(sessionConf.get("use:database"))

Review comment:
   `ctx.sessionState.catalog.setCurrentDatabase`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688014688










[GitHub] [spark] AmplabJenkins commented on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688014688










[GitHub] [spark] SparkQA commented on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


SparkQA commented on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-688014191


   **[Test build #128332 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128332/testReport)**
 for PR 29639 at commit 
[`8709fb7`](https://github.com/apache/spark/commit/8709fb73ec396390e939d2c5c30eaf7329b482d8).






[GitHub] [spark] cxzl25 commented on pull request #29605: [SPARK-31511][SQL][2.4] Make BytesToBytesMap iterators thread-safe

2020-09-06 Thread GitBox


cxzl25 commented on pull request #29605:
URL: https://github.com/apache/spark/pull/29605#issuecomment-688013875


   I removed the code I changed in the commit (0e39f7a), but the Jenkins and 
GitHub Actions tests still don't pass, which is very strange.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29656: [SPARK-32807][SQL] ThriftServer open session slow when high concurrent when init current DB

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29656:
URL: https://github.com/apache/spark/pull/29656#issuecomment-688009576










[GitHub] [spark] wangyum commented on a change in pull request #27518: [SPARK-30768][SQL] Constraints inferred from inequality attributes

2020-09-06 Thread GitBox


wangyum commented on a change in pull request #27518:
URL: https://github.com/apache/spark/pull/27518#discussion_r484168351



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala
##
@@ -78,6 +91,72 @@ trait ConstraintHelper {
 inferredConstraints -- constraints
   }
 
+  /**
+   * Infers an additional set of constraints from a given set of inequality 
constraints.
+   * For e.g., if an operator has constraints of the form (`a > b`, `b > 5`), 
this returns an
+   * additional constraint of the form `a > 5`.
+   */
+  def inferInequalityConstraints(constraints: Set[Expression]): 
Set[Expression] = {
+val binaryComparisons = constraints.filter {
+  case _: GreaterThan => true
+  case _: GreaterThanOrEqual => true
+  case _: LessThan => true
+  case _: LessThanOrEqual => true
+  case _: EqualTo => true
+  case _ => false
+}
+
+val greaterThans = binaryComparisons.map {
+  case EqualTo(l, r) if l.foldable => EqualTo(r, l)
+  case LessThan(l, r) => GreaterThan(r, l)
+  case LessThanOrEqual(l, r) => GreaterThanOrEqual(r, l)
+  case other => other
+}
+
+val lessThans = binaryComparisons.map {
+  case EqualTo(l, r) if l.foldable => EqualTo(r, l)
+  case GreaterThan(l, r) => LessThan(r, l)
+  case GreaterThanOrEqual(l, r) => LessThanOrEqual(r, l)
+  case other => other
+}

Review comment:
   No. For example, from `a > b and 5 > a` alone we cannot infer anything directly, but we can infer `b < 5` after rewriting it as `b < a and a < 5`.
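
   The rewriting step described above can be sketched in plain Python. This is an illustrative tuple model of constraints only; it is not Catalyst's actual `Expression` classes or the PR's code:

```python
# Constraints are ('op', left, right) tuples; strings stand for
# attributes and ints for foldable literals. Illustration only.
def as_less_thans(constraints):
    """Orient every comparison so it reads 'left < right'."""
    out = set()
    for op, l, r in constraints:
        if op == '>':
            out.add(('<', r, l))
        elif op == '<':
            out.add(('<', l, r))
    return out

def infer_less_than_literals(constraints):
    """Chain 'a < b' and 'b < literal' into 'a < literal'."""
    lts = as_less_thans(constraints)
    inferred = set()
    for _, a, b in lts:
        for _, c, d in lts:
            # only keep attribute-vs-literal results
            if b == c and isinstance(a, str) and isinstance(d, int):
                inferred.add(('<', a, d))
    return inferred - lts

# 'a > b and 5 > a' becomes 'b < a and a < 5', which yields 'b < 5':
print(infer_less_than_literals({('>', 'a', 'b'), ('>', 5, 'a')}))
# -> {('<', 'b', 5)}
```

   Without the reorientation step, pattern-matching on `>` alone would miss this inference, which is the point of the comment above.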
   








[GitHub] [spark] AmplabJenkins commented on pull request #29656: [SPARK-32807][SQL] ThriftServer open session slow when high concurrent when init current DB

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29656:
URL: https://github.com/apache/spark/pull/29656#issuecomment-688009576










[GitHub] [spark] wangyum commented on a change in pull request #27518: [SPARK-30768][SQL] Constraints inferred from inequality attributes

2020-09-06 Thread GitBox


wangyum commented on a change in pull request #27518:
URL: https://github.com/apache/spark/pull/27518#discussion_r484167230



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala
##
@@ -78,6 +91,72 @@ trait ConstraintHelper {
 inferredConstraints -- constraints
   }
 
+  /**
+   * Infers an additional set of constraints from a given set of inequality 
constraints.
+   * For e.g., if an operator has constraints of the form (`a > b`, `b > 5`), 
this returns an
+   * additional constraint of the form `a > 5`.
+   */
+  def inferInequalityConstraints(constraints: Set[Expression]): 
Set[Expression] = {
+val binaryComparisons = constraints.filter {
+  case _: GreaterThan => true
+  case _: GreaterThanOrEqual => true
+  case _: LessThan => true
+  case _: LessThanOrEqual => true
+  case _: EqualTo => true
+  case _ => false
+}
+
+val greaterThans = binaryComparisons.map {
+  case EqualTo(l, r) if l.foldable => EqualTo(r, l)
+  case LessThan(l, r) => GreaterThan(r, l)
+  case LessThanOrEqual(l, r) => GreaterThanOrEqual(r, l)
+  case other => other
+}
+
+val lessThans = binaryComparisons.map {
+  case EqualTo(l, r) if l.foldable => EqualTo(r, l)
+  case GreaterThan(l, r) => LessThan(r, l)
+  case GreaterThanOrEqual(l, r) => LessThanOrEqual(r, l)
+  case other => other
+}
+
+var inferredConstraints = Set.empty[Expression]
+greaterThans.foreach {
+  case op @ BinaryComparison(source: Attribute, destination: Expression)
+if destination.foldable =>

Review comment:
   To avoid generating too many constraints. For example, for `a > b > c > 1`, 
the expected inferred constraints are `a > 1 and b > 1`; `a > c` is useless.
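
   The restriction described above can be illustrated with a small sketch. The `(attr, rhs)` pair encoding and helper name are hypothetical, not Spark's real classes:

```python
def infer_attr_gt_literal(constraints):
    """Transitively infer only 'attribute > literal' constraints.

    Illustrative sketch: each constraint is an (attr, rhs) pair meaning
    attr > rhs, where strings are attributes and ints are literals.
    From a > b, b > c, c > 1 we infer a > 1 and b > 1, but we never
    emit attribute-vs-attribute results such as a > c.
    """
    gts = set(constraints)
    changed = True
    while changed:  # iterate to a fixpoint over chained comparisons
        changed = False
        for a, b in list(gts):
            for c, d in list(gts):
                if b == c and isinstance(d, int):
                    new = (a, d)  # a > b and b > d  =>  a > d
                    if new not in gts:
                        gts.add(new)
                        changed = True
    # keep only the newly inferred attribute > literal constraints
    return {(a, r) for a, r in gts if isinstance(r, int)} - set(constraints)

print(sorted(infer_attr_gt_literal({('a', 'b'), ('b', 'c'), ('c', 1)})))
# -> [('a', 1), ('b', 1)]
```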
   
   








[GitHub] [spark] SparkQA commented on pull request #29656: [SPARK-32807][SQL] ThriftServer open session slow when high concurrent when init current DB

2020-09-06 Thread GitBox


SparkQA commented on pull request #29656:
URL: https://github.com/apache/spark/pull/29656#issuecomment-688009132


   **[Test build #128331 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128331/testReport)**
 for PR 29656 at commit 
[`a1cf29a`](https://github.com/apache/spark/commit/a1cf29a0eb00ffc45296874ff445a75c5d7f42e7).






[GitHub] [spark] AngersZhuuuu opened a new pull request #29656: [SPARK-32807][SQL] ThriftServer open session slow when high concurrent when init current DB

2020-09-06 Thread GitBox


AngersZh opened a new pull request #29656:
URL: https://github.com/apache/spark/pull/29656


   ### What changes were proposed in this pull request?
   When initializing the current database, we can use the direct API; we don't need to go through SQL.
   
   ### Why are the changes needed?
   No
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Not needed






[GitHub] [spark] AngersZhuuuu commented on pull request #29656: [SPARK-32807][SQL] ThriftServer open session slow when high concurrent when init current DB

2020-09-06 Thread GitBox


AngersZh commented on pull request #29656:
URL: https://github.com/apache/spark/pull/29656#issuecomment-688008917


   cc @wangyum @juliuszsompolski 






[GitHub] [spark] wangyum commented on a change in pull request #27518: [SPARK-30768][SQL] Constraints inferred from inequality attributes

2020-09-06 Thread GitBox


wangyum commented on a change in pull request #27518:
URL: https://github.com/apache/spark/pull/27518#discussion_r484166395



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala
##
@@ -78,6 +91,72 @@ trait ConstraintHelper {
 inferredConstraints -- constraints
   }
 
+  /**
+   * Infers an additional set of constraints from a given set of inequality 
constraints.
+   * For e.g., if an operator has constraints of the form (`a > b`, `b > 5`), 
this returns an
+   * additional constraint of the form `a > 5`.
+   */
+  def inferInequalityConstraints(constraints: Set[Expression]): 
Set[Expression] = {
+val binaryComparisons = constraints.filter {
+  case _: GreaterThan => true
+  case _: GreaterThanOrEqual => true
+  case _: LessThan => true
+  case _: LessThanOrEqual => true
+  case _: EqualTo => true

Review comment:
   `inferEqualityConstraints` can not handle all cases, such as constraint 
with cast.








[GitHub] [spark] LuciferYang commented on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-09-06 Thread GitBox


LuciferYang commented on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-688007790


   Gentle ping @cloud-fan  for further review
   
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29655: [SPARK-32806][SQL] SortMergeJoin with partial hash distribution can be optimized to remove shuffle

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29655:
URL: https://github.com/apache/spark/pull/29655#issuecomment-687991585










[GitHub] [spark] AmplabJenkins commented on pull request #29655: [SPARK-32806][SQL] SortMergeJoin with partial hash distribution can be optimized to remove shuffle

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29655:
URL: https://github.com/apache/spark/pull/29655#issuecomment-687991585










[GitHub] [spark] SparkQA commented on pull request #29655: [SPARK-32806][SQL] SortMergeJoin with partial hash distribution can be optimized to remove shuffle

2020-09-06 Thread GitBox


SparkQA commented on pull request #29655:
URL: https://github.com/apache/spark/pull/29655#issuecomment-687991152


   **[Test build #128330 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128330/testReport)**
 for PR 29655 at commit 
[`1b5c4e9`](https://github.com/apache/spark/commit/1b5c4e963f0562c38f52f55703efad2c9056a12f).






[GitHub] [spark] imback82 opened a new pull request #29655: [SPARK-32806][SQL] SortMergeJoin with partial hash distribution can be optimized to remove shuffle

2020-09-06 Thread GitBox


imback82 opened a new pull request #29655:
URL: https://github.com/apache/spark/pull/29655


   
   
   ### What changes were proposed in this pull request?
   
   This PR proposes to optimize SortMergeJoin (SMJ) if each of its children has 
hash output partitioning which "partially" satisfies the required distribution. 
In this case where the child's output partitioning expressions are a subset of 
required distribution expressions (join keys expressions), the shuffle can be 
removed because rows will be sorted by join keys before rows are joined (the 
required child ordering for SMJ is on join keys).
   
   This PR introduces `OptimizeSortMergeJoinWithPartialHashDistribution`, which 
removes the shuffle for the sort merge join if the following conditions are met:
- The child of ShuffleExchangeExec has HashPartitioning with the same 
number of partitions as the other side of join.
- The child of ShuffleExchangeExec has output partitioning which has the 
subset of join keys on the respective join side.
   
   This rule can be turned on by setting 
`spark.sql.execution.sortMergeJoin.optimizePartialHashDistribution.enabled` to 
`true` (`false` by default).
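   
   The precondition the rule checks can be illustrated with a minimal sketch. The helper and parameter names here are hypothetical; the real implementation works on Catalyst `HashPartitioning`/`Distribution` objects:

```python
def shuffle_removable(child_part_exprs, child_num_parts,
                      join_keys, other_side_num_parts):
    """Sketch of the rule's precondition: the child's hash partitioning
    expressions must be a non-empty subset of the join keys, and both
    sides must agree on the number of partitions."""
    return (child_num_parts == other_side_num_parts
            and len(child_part_exprs) > 0
            and set(child_part_exprs) <= set(join_keys))

# t1 bucketed by i1, joined on (i1, j1), 8 buckets on both sides:
print(shuffle_removable(['i1'], 8, ['i1', 'j1'], 8))   # True
# Partitioned on a column that is not a join key:
print(shuffle_removable(['k1'], 8, ['i1', 'j1'], 8))   # False
```

   The subset check is safe because SMJ's required child ordering is on the full join keys, so rows are still sorted on all keys before they are joined.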
   
   ### Why are the changes needed?
   
   To remove unnecessary shuffles in certain scenarios.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Suppose the following case where `t1` is bucketed by `i1`, and `t2` by `i2`:
   ```scala
   val df1 = (0 until 100).map(i => (i % 5, i % 13, i.toString)).toDF("i1", 
"j1", "k1")
   val df2 = (0 until 100).map(i => (i % 3, i % 17, i.toString)).toDF("i2", 
"j2", "k2")
   df1.write.format("parquet").bucketBy(8, "i1").saveAsTable("t1")
   df2.write.format("parquet").bucketBy(8, "i2").saveAsTable("t2")
   val t1 = spark.table("t1")
   val t2 = spark.table("t2")
   ```
   Now if you join two tables by `t1("i1") === t2("i2") && t1("j1") === 
t2("j2")`
   Before this change:
   ```scala
   scala> spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "0")
   scala> t1.join(t2, t1("i1") === t2("i2") && t1("j1") === t2("j2")).explain
   == Physical Plan ==
   *(5) SortMergeJoin [i1#161, j1#162], [i2#167, j2#168], Inner
   :- *(2) Sort [i1#161 ASC NULLS FIRST, j1#162 ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(i1#161, j1#162, 200), true, [id=#196]
   : +- *(1) Filter (isnotnull(i1#161) AND isnotnull(j1#162))
   :+- *(1) ColumnarToRow
   :   +- FileScan parquet default.t1[i1#161,j1#162,k1#163] Batched: 
true, DataFilters: [isnotnull(i1#161), isnotnull(j1#162)], Format: Parquet, 
Location: InMemoryFileIndex[], PartitionFilters: [], PushedFilters: 
[IsNotNull(i1), IsNotNull(j1)], ReadSchema: struct, 
SelectedBucketsCount: 8 out of 8
   +- *(4) Sort [i2#167 ASC NULLS FIRST, j2#168 ASC NULLS FIRST], false, 0
  +- Exchange hashpartitioning(i2#167, j2#168, 200), true, [id=#205]
 +- *(3) Filter (isnotnull(i2#167) AND isnotnull(j2#168))
+- *(3) ColumnarToRow
   +- FileScan parquet default.t2[i2#167,j2#168,k2#169] Batched: 
true, DataFilters: [isnotnull(i2#167), isnotnull(j2#168)], Format: Parquet, 
Location: InMemoryFileIndex[], PartitionFilters: [], PushedFilters: 
[IsNotNull(i2), IsNotNull(j2)], ReadSchema: struct, 
SelectedBucketsCount: 8 out of 8
   ```
   
   After the PR:
   ```scala
   scala> 
spark.conf.set("spark.sql.execution.sortMergeJoin.optimizePartialHashDistribution.enabled",
 "true")
   scala> t1.join(t2, t1("i1") === t2("i2") && t1("j1") === t2("j2")).explain
   == Physical Plan ==
   *(3) SortMergeJoin [i1#161, j1#162], [i2#167, j2#168], Inner
   :- *(1) Sort [i1#161 ASC NULLS FIRST, j1#162 ASC NULLS FIRST], false, 0
   :  +- *(1) Filter (isnotnull(i1#161) AND isnotnull(j1#162))
   : +- *(1) ColumnarToRow
   :+- FileScan parquet default.t1[i1#161,j1#162,k1#163] Batched: true, 
DataFilters: [isnotnull(i1#161), isnotnull(j1#162)], Format: Parquet, Location: 
InMemoryFileIndex[], PartitionFilters: [], PushedFilters: [IsNotNull(i1), 
IsNotNull(j1)], ReadSchema: struct, 
SelectedBucketsCount: 8 out of 8
   +- *(2) Sort [i2#167 ASC NULLS FIRST, j2#168 ASC NULLS FIRST], false, 0
  +- *(2) Filter (isnotnull(i2#167) AND isnotnull(j2#168))
 +- *(2) ColumnarToRow
+- FileScan parquet default.t2[i2#167,j2#168,k2#169] Batched: true, 
DataFilters: [isnotnull(i2#167), isnotnull(j2#168)], Format: Parquet, Location: 
InMemoryFileIndex[], PartitionFilters: [], PushedFilters: [IsNotNull(i2), 
IsNotNull(j2)], ReadSchema: struct, 
SelectedBucketsCount: 8 out of 8
   ```
   
   ### How was this patch tested?
   
   Added tests.







[GitHub] [spark] Ngone51 commented on pull request #29579: [SPARK-32736][CORE] Avoid caching the removed decommissioned executors in TaskSchedulerImpl

2020-09-06 Thread GitBox


Ngone51 commented on pull request #29579:
URL: https://github.com/apache/spark/pull/29579#issuecomment-687984920


   @cloud-fan @holdenk Could you take a look?
   






[GitHub] [spark] moomindani commented on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-09-06 Thread GitBox


moomindani commented on pull request #28953:
URL: https://github.com/apache/spark/pull/28953#issuecomment-687959216


   @gatorsmile Thank you for your comment.
   I understand that this kind of hook is not specific to JDBC; it can be 
generalized to all data sources.
   However, for JDBC, it should be executed on the data source side. This part can 
be specific to JDBC (or other data sources that include an execution engine).
   
   Of course we can discuss this in the DSv2 API design; however, can we proceed 
with this PR separately?






[GitHub] [spark] moomindani commented on pull request #29330: [SPARK-32432][SQL] Added support for reading ORC/Parquet files with SymlinkTextInputFormat

2020-09-06 Thread GitBox


moomindani commented on pull request #29330:
URL: https://github.com/apache/spark/pull/29330#issuecomment-687960686


   Could you please take a look and review this PR?






[GitHub] [spark] HyukjinKwon commented on a change in pull request #29651: [SPARK-32794][SS] Fixed rare corner case error in micro-batch engine with some stateful queries + no-data-batches + V1 source

2020-09-06 Thread GitBox


HyukjinKwon commented on a change in pull request #29651:
URL: https://github.com/apache/spark/pull/29651#discussion_r484010520



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala
##
@@ -306,6 +306,14 @@ trait StreamTest extends QueryTest with SharedSparkSession 
with TimeLimits with
 def apply(func: StreamExecution => Any): AssertOnQuery = 
apply("Execute")(func)
   }
 
+  /** Call [[StreamingQuery.processAllAvailable()]] to wait. */

Review comment:
   Ah, seems like here is the problem about Unidoc:
   
   ```
   [error] 
/home/runner/work/spark/spark/sql/core/target/java/org/apache/spark/sql/streaming/StreamTest.java:358:
 error: unexpected text
   [error]   /** Call {@link StreamingQuery.processAllAvailable()} to wait. */
   [error]^
   [error] 
/home/runner/work/spark/spark/sql/core/target/java/org/apache/spark/sql/streaming/StreamTest.java:362:
 error: unexpected text
   [error]   /** Call {@link StreamingQuery.processAllAvailable()} to wait. */
   [error]^
   ```
   
   When Unidoc produces Javadoc, it compiles the Scala code, extracts the Scaladoc, 
and converts it into Javadoc. The problem is that it looks like `trait` doesn't 
work well out of the box with it.
   
   Probably we could just change it to `` 
`StreamingQuery.processAllAvailable()` ``.








[GitHub] [spark] HyukjinKwon commented on pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


HyukjinKwon commented on pull request #29591:
URL: https://github.com/apache/spark/pull/29591#issuecomment-687956805


   @zero323, I usually prefer not to block something on an env issue in 
Jenkins so such issues can be handled with enough time - @shaneknapp is sort of 
busy at this moment IIRC. We could work around it for now, and file a separate 
JIRA for him about the dependency upgrade.
   
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-687956266










[GitHub] [spark] SparkQA commented on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


SparkQA commented on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-687956009


   **[Test build #128329 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128329/testReport)**
 for PR 29639 at commit 
[`9a675d3`](https://github.com/apache/spark/commit/9a675d335078164b072e2f2c1c788abfcbc19db6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-687956266










[GitHub] [spark] SparkQA removed a comment on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


SparkQA removed a comment on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-687948078


   **[Test build #128329 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128329/testReport)**
 for PR 29639 at commit 
[`9a675d3`](https://github.com/apache/spark/commit/9a675d335078164b072e2f2c1c788abfcbc19db6).






[GitHub] [spark] HyukjinKwon commented on a change in pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


HyukjinKwon commented on a change in pull request #29591:
URL: https://github.com/apache/spark/pull/29591#discussion_r484138235



##
File path: dev/tox.ini
##
@@ -20,5 +20,16 @@ 
exclude=python/pyspark/cloudpickle/*.py,shared.py,python/docs/source/conf.py,wor
 
 [flake8]
 select = E901,E999,F821,F822,F823,F401,F405
-exclude = python/pyspark/cloudpickle/*.py,shared.py,python/docs/source/conf.py,work/*/*.py,python/.eggs/*,dist/*,.git/*
+exclude = python/pyspark/cloudpickle/*.py,shared.py*,python/docs/source/conf.py,work/*/*.py,python/.eggs/*,dist/*,.git/*,python/out
 max-line-length = 100
+per-file-ignores =
+python/pyspark/sql/tests/test_arrow.py: F405
+python/pyspark/sql/tests/test_dataframe.py: F405
+python/pyspark/sql/tests/test_pandas_udf_scalar.py: F405
+python/pyspark/sql/tests/test_udf.py: F405
+python/pyspark/testing/streamingutils.py: F821
+python/pyspark/testing/mlutils.py: F821

Review comment:
   Looks like these are mostly because flake8 doesn't understand `# type: ignore[]` - it seems to treat `` as a name to check whether it is being imported or not(?). If that's the case, let's just use `# type: ignore`. I think this is the easiest way.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29591:
URL: https://github.com/apache/spark/pull/29591#issuecomment-687955824










[GitHub] [spark] HyukjinKwon commented on a change in pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


HyukjinKwon commented on a change in pull request #29591:
URL: https://github.com/apache/spark/pull/29591#discussion_r484137854



##
File path: examples/src/main/python/ml/estimator_transformer_param_example.py
##
@@ -54,7 +56,7 @@
 print(model1.extractParamMap())
 
 # We may alternatively specify parameters using a Python dictionary as a paramMap
-paramMap = {lr.maxIter: 20}
+paramMap: Dict[Param, Any] = {lr.maxIter: 20}

Review comment:
   @zero323, is it because mypy complains? We can exclude the example files from the mypy check for the time being, I guess, rather than adding the types into the code for now.








[GitHub] [spark] SparkQA removed a comment on pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


SparkQA removed a comment on pull request #29591:
URL: https://github.com/apache/spark/pull/29591#issuecomment-687917811


   **[Test build #128328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128328/testReport)** for PR 29591 at commit [`601a577`](https://github.com/apache/spark/commit/601a57787e5e0224ee000ccef0a817c2ee493b65).






[GitHub] [spark] SparkQA commented on pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


SparkQA commented on pull request #29591:
URL: https://github.com/apache/spark/pull/29591#issuecomment-687955217


   **[Test build #128328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128328/testReport)** for PR 29591 at commit [`601a577`](https://github.com/apache/spark/commit/601a57787e5e0224ee000ccef0a817c2ee493b65).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class _empty_cell_value(object):`
 * `skeleton_class = types.new_class(`
 * `enum_class = metacls.__new__(metacls, name, bases, classdict)`
 * `class CloudPickler(Pickler):`
 * `is_anyclass = issubclass(t, type)`
 * `except TypeError:  # t is not a class (old Boost; see SF #502085)`






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-687948686










[GitHub] [spark] AmplabJenkins commented on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-687948686










[GitHub] [spark] SparkQA commented on pull request #29639: [SPARK-32186][DOCS][PYTHON] Development - Debugging

2020-09-06 Thread GitBox


SparkQA commented on pull request #29639:
URL: https://github.com/apache/spark/pull/29639#issuecomment-687948078


   **[Test build #128329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128329/testReport)** for PR 29639 at commit [`9a675d3`](https://github.com/apache/spark/commit/9a675d335078164b072e2f2c1c788abfcbc19db6).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29591:
URL: https://github.com/apache/spark/pull/29591#issuecomment-687941944


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128327/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29591:
URL: https://github.com/apache/spark/pull/29591#issuecomment-687941936


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29591:
URL: https://github.com/apache/spark/pull/29591#issuecomment-687941936


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


SparkQA removed a comment on pull request #29591:
URL: https://github.com/apache/spark/pull/29591#issuecomment-687916040


   **[Test build #128327 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128327/testReport)** for PR 29591 at commit [`1aede7c`](https://github.com/apache/spark/commit/1aede7c7616fa045dae2bd9483e24ed48f089a02).






[GitHub] [spark] AmplabJenkins commented on pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29591:
URL: https://github.com/apache/spark/pull/29591#issuecomment-687941944


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128327/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


SparkQA commented on pull request #29591:
URL: https://github.com/apache/spark/pull/29591#issuecomment-687941639


   **[Test build #128327 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128327/testReport)** for PR 29591 at commit [`1aede7c`](https://github.com/apache/spark/commit/1aede7c7616fa045dae2bd9483e24ed48f089a02).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29591:
URL: https://github.com/apache/spark/pull/29591#issuecomment-687929852


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128325/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


AmplabJenkins removed a comment on pull request #29591:
URL: https://github.com/apache/spark/pull/29591#issuecomment-687929845


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-06 Thread GitBox


AmplabJenkins commented on pull request #29591:
URL: https://github.com/apache/spark/pull/29591#issuecomment-687929845









