[GitHub] [spark] HyukjinKwon commented on pull request #28106: [SPARK-31335][SQL] Add try function support

2020-05-05 Thread GitBox


HyukjinKwon commented on pull request #28106:
URL: https://github.com/apache/spark/pull/28106#issuecomment-624456778







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #27473: [SPARK-30699][WIP][ML][PYSPARK] GMM blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #27473:
URL: https://github.com/apache/spark/pull/27473#issuecomment-624455787


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122341/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #27473: [SPARK-30699][WIP][ML][PYSPARK] GMM blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #27473:
URL: https://github.com/apache/spark/pull/27473#issuecomment-624455782


   Build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #27473: [SPARK-30699][WIP][ML][PYSPARK] GMM blockify input vectors

2020-05-05 Thread GitBox


SparkQA removed a comment on pull request #27473:
URL: https://github.com/apache/spark/pull/27473#issuecomment-624448823


   **[Test build #122341 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122341/testReport)**
 for PR 27473 at commit 
[`448ed80`](https://github.com/apache/spark/commit/448ed80bf72c56cfa7b25f5174f9e674ab22ddcf).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #27473: [SPARK-30699][WIP][ML][PYSPARK] GMM blockify input vectors

2020-05-05 Thread GitBox


SparkQA commented on pull request #27473:
URL: https://github.com/apache/spark/pull/27473#issuecomment-624455749


   **[Test build #122341 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122341/testReport)**
 for PR 27473 at commit 
[`448ed80`](https://github.com/apache/spark/commit/448ed80bf72c56cfa7b25f5174f9e674ab22ddcf).
* This patch **fails MiMa tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #27473: [SPARK-30699][WIP][ML][PYSPARK] GMM blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #27473:
URL: https://github.com/apache/spark/pull/27473#issuecomment-624455782







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon edited a comment on pull request #28445: [SPARK-31212][SQL][2.4] Fix Failure of casting the '1000-02-29' string to the date type

2020-05-05 Thread GitBox


HyukjinKwon edited a comment on pull request #28445:
URL: https://github.com/apache/spark/pull/28445#issuecomment-624455200


   Okay, I had some time to investigate the related items here. I think there 
are some more places to fix, e.g. 
[here](https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L610).
   
   Let's don't fix these in branch-2.4 only because:
   
   - These were "fixed" in the master branch by switching the calendar. I think 
we should take this fix was superseded by it in the master. We can't backport 
it to branch-2.4 because it's too much breaking change, though.
   
   - It's pretty risky. Given the history of fixing it in the master, fixing 
one caused a bunch of other related issues. These new issues have been fixed 
and found over one year, and it's still in progress. I would like to avoid to 
keep fixing by completely different approaches in 2.4 compared to 3.0.
   
   - It will causes a bunch of behaviour changes. Yes, these will be bug fixes 
but I tend to be more conservative. See also [Bug 
compatibility](https://en.wikipedia.org/wiki/Bug_compatibility).
   
   Considering these risks, let's don't land these fixes. We can more 
conservatively just document these in Spark 2.4 specifically given the 
potential maintenance overhead and technical difficulty, if we should.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon edited a comment on pull request #28445: [SPARK-31212][SQL][2.4] Fix Failure of casting the '1000-02-29' string to the date type

2020-05-05 Thread GitBox


HyukjinKwon edited a comment on pull request #28445:
URL: https://github.com/apache/spark/pull/28445#issuecomment-624455200


   Okay, I had some time to investigate the related items here. I think there 
are some more places to fix, e.g. 
[here](https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L610).
   
   Let's don't fix these in branch-2.4 only because:
   
   - These were "fixed" in the master branch by switching the calendar. I think 
we should take this fix was superseded by it in the master. We can't backport 
it to branch-2.4 because it's too much breaking change, though.
   
   - It's pretty risky. Given the history of fixing it in the master, fixing 
one caused a bunch of other related issues. These new issues have been fixed 
and found over one year, and it's still in progress. I would like to avoid to 
keep fixing by completely different approaches in 2.4 compared to 3.0.
   
   - It will causes a bunch of behaviour changes. Yes, these will be bug fixes 
but I tend to be more conservative. See also [Bug 
compatibility](https://en.wikipedia.org/wiki/Bug_compatibility).
   
   Considering these risks, let's don't land these fixes. We can more 
conservatively just document these in Spark 2.4 specifically given the 
potential maintenance overhead and technical difficulty.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #28445: [SPARK-31212][SQL][2.4] Fix Failure of casting the '1000-02-29' string to the date type

2020-05-05 Thread GitBox


HyukjinKwon commented on pull request #28445:
URL: https://github.com/apache/spark/pull/28445#issuecomment-624455200


   Okay, I had some time to investigate the related items here. I think there 
are some more places to fix, e.g. 
[here](https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L610).
   
   Let's don't fix these in branch-2.4 only because:
   
   - These were "fixed" in the master branch by switching the calendar. I think 
we should take this fix was superseded by it in the master. We can't backport 
it to branch-2.4 because it's too much breaking change, though.
   
   - It's pretty risky. Given the history of fixing it in the master, fixing 
one caused a bunch of other related issues. These new issues have been fixed 
and found over one year, and it's still in progress. I would like to avoid to 
keep fixing by completely different approaches in 2.4 and 3.0.
   
   - It will causes a bunch of behaviour changes. Yes, these will be bug fixes 
but I tend to be more conservative. See also [Bug 
compatibility](https://en.wikipedia.org/wiki/Bug_compatibility).
   
   Considering these risks, let's don't land these fixes. We can more 
conservatively just document these in Spark 2.4 specifically given the 
potential maintenance overhead and technical difficulty.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624454747


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122338/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624454742


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624454742







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


SparkQA removed a comment on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624436741


   **[Test build #122338 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122338/testReport)**
 for PR 28458 at commit 
[`0577bb2`](https://github.com/apache/spark/commit/0577bb2c4c94b61ec9e09ab40b8dceca4e02584c).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


SparkQA commented on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624454509


   **[Test build #122338 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122338/testReport)**
 for PR 28458 at commit 
[`0577bb2`](https://github.com/apache/spark/commit/0577bb2c4c94b61ec9e09ab40b8dceca4e02584c).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-05 Thread GitBox


cloud-fan commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r420557946



##
File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
##
@@ -157,23 +259,23 @@ private[spark] object ThreadUtils {
*/
   def newDaemonFixedThreadPool(nThreads: Int, prefix: String): 
ThreadPoolExecutor = {
 val threadFactory = namedThreadFactory(prefix)
-Executors.newFixedThreadPool(nThreads, 
threadFactory).asInstanceOf[ThreadPoolExecutor]
+MDCAwareThreadPoolExecutor.newFixedThreadPool(nThreads, threadFactory)

Review comment:
   After a second thought, I'm wondering what's the use case for this 
change.
   
   For example, `newDaemonFixedThreadPool` is used in `TaskResultGetter`. I 
don't think we can get MDC properties there. In fact, we only set the MDC 
properties in `Executor.run`, and these thread pools are mostly created by the 
driver.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #27473: [SPARK-30699][WIP][ML][PYSPARK] GMM blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #27473:
URL: https://github.com/apache/spark/pull/27473#issuecomment-624450968







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #27473: [SPARK-30699][WIP][ML][PYSPARK] GMM blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #27473:
URL: https://github.com/apache/spark/pull/27473#issuecomment-624450975


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/27009/
   Test PASSed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #27473: [SPARK-30699][WIP][ML][PYSPARK] GMM blockify input vectors

2020-05-05 Thread GitBox


SparkQA commented on pull request #27473:
URL: https://github.com/apache/spark/pull/27473#issuecomment-624450647


   **[Test build #122342 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122342/testReport)**
 for PR 27473 at commit 
[`473e5a2`](https://github.com/apache/spark/commit/473e5a20d23555cf2b22fb6da51175efb891538b).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #27473: [SPARK-30699][WIP][ML][PYSPARK] GMM blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #27473:
URL: https://github.com/apache/spark/pull/27473#issuecomment-624449141







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28459: [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28459:
URL: https://github.com/apache/spark/pull/28459#issuecomment-624449133







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28425:
URL: https://github.com/apache/spark/pull/28425#issuecomment-624449089







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #28425:
URL: https://github.com/apache/spark/pull/28425#issuecomment-624449089







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng removed a comment on pull request #27473: [SPARK-30699][WIP][ML][PYSPARK] GMM blockify input vectors

2020-05-05 Thread GitBox


zhengruifeng removed a comment on pull request #27473:
URL: https://github.com/apache/spark/pull/27473#issuecomment-582765435


   data: `a9a`: numFeatures=123, numInstances=32,561
   
   testCode:
   ```scala
   
   import org.apache.spark.ml.clustering._
   import org.apache.spark.storage.StorageLevel
   import org.apache.spark.ml.linalg._
   
   val df = spark.read.format("libsvm").load("/data1/Datasets/a9a/a9a")
   df.persist(StorageLevel.MEMORY_AND_DISK)
   df.count
   
   new GaussianMixture().fit(df)
   
   val results = Seq(2, 8, 32).map { k => val start = System.currentTimeMillis; 
val model = new 
GaussianMixture().setK(k).setSeed(0).setTol(0).setMaxIter(20).fit(df); val dur 
= System.currentTimeMillis - start; (model, dur) }
   
   results.map(_._2)
   results.map(_._1.summary.numIter)
   results.map(_._1.summary.logLikelihood)
   results.map(t => t._2.toDouble / t._1.summary.numIter)
   ```
   
   result:
   
   |Impl| This PR(k=2) | This PR(k=8) | This PR(k=32) | Master(k=2) | 
Master(k=8) | Master(k=32)  |
   
|--|--||--||--|--|
   |numIter|3|3|5|3|8|4|
   
|logLikelihood|2814835.470027733|2817523.6371536762|2820228.372200876|2814835.4700277536|2817523.6371535994|2820228.3722007386|
   |dur per iteration| 950.0 | 2744.5 | 9780.6 | 1192.7 
| 3446.375| 15724.75 |
   
   
   
   With NativeBLAS with OPENBLAS_NUM_THREADS=1:
   
   |Impl| This PR(k=2) | This PR(k=8) | This PR(k=32) | Master(k=2) | 
Master(k=8) | Master(k=32)  |
   
|--|--||--||--|--|
   |numIter|3|3|20|3|20|8|
   |logLikelihood|2814835.4700277243 | 2817523.6371536693 | 
2820228.372200879|2814835.4700277452|2817523.6371535957|2820228.372200736|
   |dur per iteration| 780.0| 2356.5 | 8192.95 | 1183.0 | 3022.1 | 
13073.75 |
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #27473: [SPARK-30699][WIP][ML][PYSPARK] GMM blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #27473:
URL: https://github.com/apache/spark/pull/27473#issuecomment-624449141







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28459: [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #28459:
URL: https://github.com/apache/spark/pull/28459#issuecomment-624449133







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-05 Thread GitBox


SparkQA commented on pull request #28425:
URL: https://github.com/apache/spark/pull/28425#issuecomment-624448806


   **[Test build #122340 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122340/testReport)**
 for PR 28425 at commit 
[`7468251`](https://github.com/apache/spark/commit/746825139c345192086492e8e256752c30d3e101).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #27473: [SPARK-30699][WIP][ML][PYSPARK] GMM blockify input vectors

2020-05-05 Thread GitBox


SparkQA commented on pull request #27473:
URL: https://github.com/apache/spark/pull/27473#issuecomment-624448823


   **[Test build #122341 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122341/testReport)**
 for PR 27473 at commit 
[`448ed80`](https://github.com/apache/spark/commit/448ed80bf72c56cfa7b25f5174f9e674ab22ddcf).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28459: [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration

2020-05-05 Thread GitBox


SparkQA commented on pull request #28459:
URL: https://github.com/apache/spark/pull/28459#issuecomment-624448813


   **[Test build #122339 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122339/testReport)**
 for PR 28459 at commit 
[`34420fb`](https://github.com/apache/spark/commit/34420fb12fcfd64e345c8487beef9c05d39b10bc).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng opened a new pull request #27473: [SPARK-30699][WIP][ML][PYSPARK] GMM blockify input vectors

2020-05-05 Thread GitBox


zhengruifeng opened a new pull request #27473:
URL: https://github.com/apache/spark/pull/27473


   ### What changes were proposed in this pull request?
   1, add Level-2 BLAS routine `ger`;
   2, stack input vectors to blocks to use Level-3 BLAS routine in training
   
   
   ### Why are the changes needed?
   for performance:
   1, save about 40% RAM;
   2, 25% ~ 60% faster; 30% ~ 60% faster with openBLAS 
   
   ### Does this PR introduce any user-facing change?
   add a new expert param `blockSize`
   
   
   ### How was this patch tested?
   added testsuites



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng edited a comment on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


zhengruifeng edited a comment on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624427337


   performace test on **sparse dataset**: the first 10,000 instances of 
`webspam_wc_normalized_trigram`
   
   code:
   ```scala
   val df = spark.read.option("numFeatures", 
"8289919").format("libsvm").load("/data1/Datasets/webspam/webspam_wc_normalized_trigram.svm.10k").withColumn("label",
 (col("label")+1)/2)
   df.persist(StorageLevel.MEMORY_AND_DISK)
   df.count
   
   val lr = new LogisticRegression().setBlockSize(1).setMaxIter(10)
   lr.fit(df)
   
   val results = Seq(1, 4, 16, 64, 256, 1024, 4096).map { size => val start = 
System.currentTimeMillis; val model = lr.setBlockSize(size).fit(df); val end = 
System.currentTimeMillis; (size, model.coefficients, end - start) }
   ```
   
   results:
   ```
   scala> results.map(_._3)
   res17: Seq[Long] = List(33948, 425923, 129811, 56288, 47587, 42816, 39809)
   
   
   scala> results.map(_._2).foreach(coef => println(coef.toString.take(100)))
   
(8289919,[549219,551719,592137,592138,592141,592154,592160,592162,592163,592164,592166,592167,592168
   
(8289919,[549219,551719,592137,592138,592141,592154,592160,592162,592163,592164,592166,592167,592168
   
(8289919,[549219,551719,592137,592138,592141,592154,592160,592162,592163,592164,592166,592167,592168
   
(8289919,[549219,551719,592137,592138,592141,592154,592160,592162,592163,592164,592166,592167,592168
   
(8289919,[549219,551719,592137,592138,592141,592154,592160,592162,592163,592164,592166,592167,592168
   
(8289919,[549219,551719,592137,592138,592141,592154,592160,592162,592163,592164,592166,592167,592168
   
(8289919,[549219,551719,592137,592138,592141,592154,592160,592162,592163,592164,592166,592167,592168
   
   scala> results.map(_._2).foreach(coef => 
println(coef.toString.takeRight(100)))
   
87,-1188.1053920127556,335.5565308836645,-135.79302172669907,849.0515530033497,-27.040836637047736])
   
91,-1188.105392012755,335.55653088366444,-135.79302172669907,849.0515530033497,-27.040836637047736])
   
9,-1188.1053920127551,335.55653088366444,-135.79302172669904,849.0515530033495,-27.040836637047725])
   
94,-1188.1053920127556,335.55653088366444,-135.79302172669904,849.0515530033495,-27.04083663704773])
   
1,-1188.1053920127551,335.55653088366444,-135.79302172669904,849.0515530033493,-27.040836637047722])
   
5,-1188.1053920127556,335.55653088366444,-135.79302172669904,849.0515530033495,-27.040836637047736])
   
29,-1188.105392012756,335.55653088366444,-135.79302172669904,849.0515530033495,-27.040836637047736])
   ```
   
   **blockSize==1**
   
![lr_sparse_1](https://user-images.githubusercontent.com/7322292/81136686-e8acfb00-8f8e-11ea-9267-cf847b37eb81.png)
   
   **blockSize=16**
   
![lr_sparse_16](https://user-images.githubusercontent.com/7322292/81136711-f19dcc80-8f8e-11ea-990a-7b8fe32d81d9.png)
   
   
   
   test with **Master**:
   ```
   import org.apache.spark.ml.classification._
   import org.apache.spark.storage.StorageLevel
   
   val df = spark.read.option("numFeatures", 
"8289919").format("libsvm").load("/data1/Datasets/webspam/webspam_wc_normalized_trigram.svm.10k").withColumn("label",
 (col("label")+1)/2)
   df.persist(StorageLevel.MEMORY_AND_DISK)
   df.count
   
   val lr = new LogisticRegression().setMaxIter(10)
   lr.fit(df)
   
   val start = System.currentTimeMillis; val model = lr.setMaxIter(10).fit(df); 
val end = System.currentTimeMillis; end - start
   
   
   
   scala> val start = System.currentTimeMillis; val model = 
lr.setMaxIter(10).fit(df); val end = System.currentTimeMillis; end - start
   start: Long = 1588735447883  
   
   model: org.apache.spark.ml.classification.LogisticRegressionModel = 
LogisticRegressionModel: uid=logreg_99d29a0ecc13, numClasses=2, 
numFeatures=8289919
   end: Long = 1588735483170
   res3: Long = 35287
   ```
   In this PR, when blockSize==1, the duration is 33948, so there will be no 
performance regression on sparse datasets.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-05 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r420550918



##
File path: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
##
@@ -343,6 +343,54 @@ class ExplainSuite extends ExplainSuiteHelper with 
DisableAdaptiveExecutionSuite
   assert(getNormalizedExplain(df1, FormattedMode) === 
getNormalizedExplain(df2, FormattedMode))
 }
   }
+
+  test("Explain formatted output for scan operator for datasource V2") {

Review comment:
   +1. Thank you.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] baohe-zhang commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


baohe-zhang commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624445143


   Another way is to keep a thread-safe variable called availableMemory in 
FsHistoryProvider. The initial value can be set as a percentage of Xmx. When we 
parse a file via hybrid kvstore, we subtract an approximate memory usage from 
availableMemory, and when the hybrid store switches to leveldb, we add back 
this approximate memory usage. When availableMemory is below a threshold, we 
can disable hybridKVstore.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gengliangwang commented on a change in pull request #28459: [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration

2020-05-05 Thread GitBox


gengliangwang commented on a change in pull request #28459:
URL: https://github.com/apache/spark/pull/28459#discussion_r420545275



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -845,8 +845,9 @@ object SQLConf {
 .doc("When true, enable the metadata-only query optimization that use the 
table's metadata " +
   "to produce the partition columns instead of table scans. It applies 
when all the columns " +
   "scanned are partition columns and the query has an aggregate operator 
that satisfies " +
-  "distinct semantics. By default the optimization is disabled, since it 
may return " +
-  "incorrect results when the files are empty.")
+  "distinct semantics. By default the optimization is disabled, and 
deprecated as of Spark " +

Review comment:
   nit: could you add this note in the comment of the rule as well?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624441325


   I'm not sure that's fairly simple to do. Concurrent load of applications can 
be happening in SHS, right? The default value of 
`spark.history.retainedApplications` is 50, which means maximum 50 apps can be 
loaded into cache at the same time.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #28366: [SPARK-31365][SQL] Enable nested predicate pushdown per data sources

2020-05-05 Thread GitBox


cloud-fan commented on pull request #28366:
URL: https://github.com/apache/spark/pull/28366#issuecomment-624439850


   thanks, merging to master/3.0!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] baohe-zhang commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


baohe-zhang commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624439430


   @HeartSaVioR One way in my mind is that we can monitor the memory usage of 
SHS. If the memory usage or event log size exceeds a threshold (e.g, over 50% 
of Xmx), we can use leveldb to parse event log, instead of hybrid kvstore.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-05 Thread GitBox


cloud-fan commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r420540911



##
File path: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
##
@@ -343,6 +343,54 @@ class ExplainSuite extends ExplainSuiteHelper with 
DisableAdaptiveExecutionSuite
   assert(getNormalizedExplain(df1, FormattedMode) === 
getNormalizedExplain(df2, FormattedMode))
 }
   }
+
+  test("Explain formatted output for scan operator for datasource V2") {

Review comment:
   Oh I see. Currently DS v2 scan is enabled only in `DataFrameReader`, so 
we can't get it through pure SQL. Then this is fine.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28456: [SPARK-31235][YARN] Fix test "specify a more specific type for the ap…

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28456:
URL: https://github.com/apache/spark/pull/28456#issuecomment-624437356







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28456: [SPARK-31235][YARN] Fix test "specify a more specific type for the ap…

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #28456:
URL: https://github.com/apache/spark/pull/28456#issuecomment-624437356







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28456: [SPARK-31235][YARN] Fix test "specify a more specific type for the ap…

2020-05-05 Thread GitBox


SparkQA removed a comment on pull request #28456:
URL: https://github.com/apache/spark/pull/28456#issuecomment-624431979


   **[Test build #122337 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122337/testReport)**
 for PR 28456 at commit 
[`fd08a7a`](https://github.com/apache/spark/commit/fd08a7a8bfd98a916bda4cc4d60d74a3c02f4e26).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28456: [SPARK-31235][YARN] Fix test "specify a more specific type for the ap…

2020-05-05 Thread GitBox


SparkQA commented on pull request #28456:
URL: https://github.com/apache/spark/pull/28456#issuecomment-624437271


   **[Test build #122337 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122337/testReport)**
 for PR 28456 at commit 
[`fd08a7a`](https://github.com/apache/spark/commit/fd08a7a8bfd98a916bda4cc4d60d74a3c02f4e26).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624437029







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624437029







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #28393: [SPARK-31595][SQL] Spark sql should allow unescaped quote mark in quoted string

2020-05-05 Thread GitBox


cloud-fan commented on pull request #28393:
URL: https://github.com/apache/spark/pull/28393#issuecomment-624436648


   thanks, merging to master/3.0!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


SparkQA commented on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624436741


   **[Test build #122338 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122338/testReport)**
 for PR 28458 at commit 
[`0577bb2`](https://github.com/apache/spark/commit/0577bb2c4c94b61ec9e09ab40b8dceca4e02584c).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #28393: [SPARK-31595][SQL] Spark sql should allow unescaped quote mark in quoted string

2020-05-05 Thread GitBox


cloud-fan commented on pull request #28393:
URL: https://github.com/apache/spark/pull/28393#issuecomment-624436428


   cc @juliuszsompolski as well



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] erenavsarogullari edited a comment on pull request #28208: [SPARK-31440][SQL] Improve SQL Rest API

2020-05-05 Thread GitBox


erenavsarogullari edited a comment on pull request #28208:
URL: https://github.com/apache/spark/pull/28208#issuecomment-624435102


   @gengliangwang There is optional Http parameter: `details` (default: 
`false`). It needs to be set in order to fetch `node`, `edge` and 
`planDescription` details as well:
   `http://localhost:4040/api/v1/applications//sql/0?details=true`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] erenavsarogullari commented on pull request #28208: [SPARK-31440][SQL] Improve SQL Rest API

2020-05-05 Thread GitBox


erenavsarogullari commented on pull request #28208:
URL: https://github.com/apache/spark/pull/28208#issuecomment-624435102


   @gengliangwang There is optional http parameter: `details` (default: 
`false`). It needs to be set in order to fetch `node`, `edge` and 
`planDescription` details as well:
   `http://localhost:4040/api/v1/applications//sql/0?details=true`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


HeartSaVioR commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624434103


   Let's discuss first with the plan how to address the major concerns in 
comments, especially how to restrict the overall memory usage. I think that's a 
blocker for the production use.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #28442: [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry

2020-05-05 Thread GitBox


HyukjinKwon commented on a change in pull request #28442:
URL: https://github.com/apache/spark/pull/28442#discussion_r420536516



##
File path: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
##
@@ -131,11 +130,7 @@ class KafkaTestUtils(
   }
 
   private def setUpMiniKdc(): Unit = {
-val kdcDir = Utils.createTempDir()

Review comment:
   Yeah, can we just use `eventually` instead of having another class? This 
is in the guide, see also 
https://github.com/databricks/scala-style-guide#misc_well_tested_method





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28456: [SPARK-31235][YARN] Fix test "specify a more specific type for the ap…

2020-05-05 Thread GitBox


SparkQA commented on pull request #28456:
URL: https://github.com/apache/spark/pull/28456#issuecomment-624431979


   **[Test build #122337 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122337/testReport)**
 for PR 28456 at commit 
[`fd08a7a`](https://github.com/apache/spark/commit/fd08a7a8bfd98a916bda4cc4d60d74a3c02f4e26).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28459: [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28459:
URL: https://github.com/apache/spark/pull/28459#issuecomment-624430843







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28459: [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #28459:
URL: https://github.com/apache/spark/pull/28459#issuecomment-624430843







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28459: [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration

2020-05-05 Thread GitBox


SparkQA commented on pull request #28459:
URL: https://github.com/apache/spark/pull/28459#issuecomment-624430556


   **[Test build #122336 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122336/testReport)**
 for PR 28459 at commit 
[`39c1b3c`](https://github.com/apache/spark/commit/39c1b3c22e2e00b96e0cef6efed46070b833917b).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #28459: [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration

2020-05-05 Thread GitBox


HyukjinKwon commented on a change in pull request #28459:
URL: https://github.com/apache/spark/pull/28459#discussion_r420533700



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2605,7 +2606,10 @@ object SQLConf {
   DeprecatedConfig(ARROW_FALLBACK_ENABLED.key, "3.0",
 s"Use '${ARROW_PYSPARK_FALLBACK_ENABLED.key}' instead of it."),
   DeprecatedConfig(SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE.key, "3.0",
-s"Use '${ADVISORY_PARTITION_SIZE_IN_BYTES.key}' instead of it.")
+s"Use '${ADVISORY_PARTITION_SIZE_IN_BYTES.key}' instead of it."),
+  DeprecatedConfig(OPTIMIZER_METADATA_ONLY.key, "3.0",

Review comment:
   Technically we shouldn't necessarily go through deprecation to remove as 
it's an internal configuration but let me stay conservative here by deprecating 
first.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #28459: [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration

2020-05-05 Thread GitBox


HyukjinKwon commented on a change in pull request #28459:
URL: https://github.com/apache/spark/pull/28459#discussion_r420533700



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2605,7 +2606,10 @@ object SQLConf {
   DeprecatedConfig(ARROW_FALLBACK_ENABLED.key, "3.0",
 s"Use '${ARROW_PYSPARK_FALLBACK_ENABLED.key}' instead of it."),
   DeprecatedConfig(SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE.key, "3.0",
-s"Use '${ADVISORY_PARTITION_SIZE_IN_BYTES.key}' instead of it.")
+s"Use '${ADVISORY_PARTITION_SIZE_IN_BYTES.key}' instead of it."),
+  DeprecatedConfig(OPTIMIZER_METADATA_ONLY.key, "3.0",

Review comment:
   Technically we shouldn't go through deprecation to remove as it's an 
internal configuration but let me stay conservative here by deprecating first.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon opened a new pull request #28459: [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration

2020-05-05 Thread GitBox


HyukjinKwon opened a new pull request #28459:
URL: https://github.com/apache/spark/pull/28459


   ### What changes were proposed in this pull request?
   
   This PR proposes to deprecate 'spark.sql.optimizer.metadataOnly' 
configuration and remove it in the future release.
   
   ### Why are the changes needed?
   
   This optimization can cause a potential correctness issue, see also 
SPARK-26709.
   Also, it seems difficult to extend the optimization. Basically you should 
whitelist all available functions. It costs some maintenance overhead, see also 
SPARK-31590.
   
   Looks we should just better let users use `SparkSessionExtensions` instead 
if they must use, and remove it in Spark side.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, setting `spark.sql.optimizer.metadataOnly` will shod a deprecation 
warning:
   
   ```scala
   scala> spark.conf.unset("spark.sql.optimizer.metadataOnly")
   ```
   ```
   20/05/06 12:57:23 WARN SQLConf: The SQL config 
'spark.sql.optimizer.metadataOnly' has been
deprecated in Spark v3.0 and may be removed in the future. Avoid to depend 
on this optimization
to prevent a potential correctness issue. If you must use, use 
'SparkSessionExtensions' instead to 
   inject it as a custom rule.
   ```
   ```scala
   scala> spark.conf.set("spark.sql.optimizer.metadataOnly", "true")
   ```
   ```
   20/05/06 12:57:44 WARN SQLConf: The SQL config 
'spark.sql.optimizer.metadataOnly' has been 
   deprecated in Spark v3.0 and may be removed in the future. Avoid to depend 
on this optimization
to prevent a potential correctness issue.If you must use, use 
'SparkSessionExtensions' instead to 
   inject it as a custom rule.
   ```
   
   ### How was this patch tested?
   
   Manually tested.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624428995


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122334/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


SparkQA commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624429091


   **[Test build #122335 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122335/testReport)**
 for PR 28412 at commit 
[`141feed`](https://github.com/apache/spark/commit/141feed4a2537ff0481e4d03abad858df3540777).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624428988


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


SparkQA removed a comment on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624426364


   **[Test build #122334 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122334/testReport)**
 for PR 28458 at commit 
[`563cee9`](https://github.com/apache/spark/commit/563cee9e8a2edd2a0f1025de531002488de1e63a).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


SparkQA commented on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624428969


   **[Test build #122334 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122334/testReport)**
 for PR 28458 at commit 
[`563cee9`](https://github.com/apache/spark/commit/563cee9e8a2edd2a0f1025de531002488de1e63a).
* This patch **fails to generate documentation**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `  instr.logWarning(s\"All labels belong to a single class and 
fitIntercept=false. It's a \" +`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624428988







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624428013







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] baohe-zhang commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


baohe-zhang commented on a change in pull request #28412:
URL: https://github.com/apache/spark/pull/28412#discussion_r420531106



##
File path: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##
@@ -1167,6 +1168,58 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
 // At this point the disk data either does not exist or was deleted 
because it failed to
 // load, so the event log needs to be replayed.
 
+// TODO: Maybe need to do other check to see if there's enough memory to
+// use inMemoryStore.
+if (hybridKVStoreEnabled) {
+  logInfo("Using HybridKVStore as KVStore")
+  var retried = false
+  var store: HybridKVStore = null
+  while(store == null) {
+val reader = EventLogFileReader(fs, new Path(logDir, attempt.logPath),
+  attempt.lastIndex)
+val isCompressed = reader.compressionCodec.isDefined
+logInfo(s"Leasing disk manager space for app $appId / 
${attempt.info.attemptId}...")
+val lease = dm.lease(reader.totalSize, isCompressed)
+try {
+  val s = new HybridKVStore()
+  val levelDB = KVUtils.open(lease.tmpPath, metadata)
+  s.setLevelDB(levelDB)
+
+  s.startBackgroundThreadToWriteToDB(new 
HybridKVStore.SwitchingToLevelDBListener {
+override def onSwitchingToLevelDBSuccess: Unit = {
+  levelDB.close()
+  val newStorePath = lease.commit(appId, attempt.info.attemptId)
+  s.setLevelDB(KVUtils.open(newStorePath, metadata))
+  logInfo(s"Completely switched to use leveldb for app" +
+  s" $appId / ${attempt.info.attemptId}")
+}
+
+override def onSwitchingToLevelDBFail(e: Exception): Unit = {
+  logWarning(s"Failed to switch to use LevelDb for app" +
+  s" $appId / ${attempt.info.attemptId}")
+  levelDB.close()
+  throw e

Review comment:
   1 is addressed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624428013







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] baohe-zhang commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


baohe-zhang commented on a change in pull request #28412:
URL: https://github.com/apache/spark/pull/28412#discussion_r420530840



##
File path: core/src/main/scala/org/apache/spark/internal/config/History.scala
##
@@ -195,4 +195,9 @@ private[spark] object History {
   .version("3.0.0")
   .booleanConf
   .createWithDefault(true)
+
+  val HYBRID_KVSTORE_ENABLED = 
ConfigBuilder("spark.history.store.hybridKVStore.enabled")
+.version("3.0.1")

Review comment:
   Addressed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


zhengruifeng commented on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624427337


   performace test on the first 10,000 instances of 
`webspam_wc_normalized_trigram`
   
   code:
   ```scala
   val df = spark.read.option("numFeatures", 
"8289919").format("libsvm").load("/data1/Datasets/webspam/webspam_wc_normalized_trigram.svm.10k").withColumn("label",
 (col("label")+1)/2)
   df.persist(StorageLevel.MEMORY_AND_DISK)
   df.count
   
   val lr = new LogisticRegression().setBlockSize(1).setMaxIter(10)
   lr.fit(df)
   
   val results = Seq(1, 4, 16, 64, 256, 1024, 4096).map { size => val start = 
System.currentTimeMillis; val model = lr.setBlockSize(size).fit(df); val end = 
System.currentTimeMillis; (size, model.coefficients, end - start) }
   ```
   
   results:
   ```
   scala> results.map(_._3)
   res17: Seq[Long] = List(33948, 425923, 129811, 56288, 47587, 42816, 39809)
   
   
   scala> results.map(_._2).foreach(coef => println(coef.toString.take(100)))
   
(8289919,[549219,551719,592137,592138,592141,592154,592160,592162,592163,592164,592166,592167,592168
   
(8289919,[549219,551719,592137,592138,592141,592154,592160,592162,592163,592164,592166,592167,592168
   
(8289919,[549219,551719,592137,592138,592141,592154,592160,592162,592163,592164,592166,592167,592168
   
(8289919,[549219,551719,592137,592138,592141,592154,592160,592162,592163,592164,592166,592167,592168
   
(8289919,[549219,551719,592137,592138,592141,592154,592160,592162,592163,592164,592166,592167,592168
   
(8289919,[549219,551719,592137,592138,592141,592154,592160,592162,592163,592164,592166,592167,592168
   
(8289919,[549219,551719,592137,592138,592141,592154,592160,592162,592163,592164,592166,592167,592168
   
   scala> results.map(_._2).foreach(coef => 
println(coef.toString.takeRight(100)))
   
87,-1188.1053920127556,335.5565308836645,-135.79302172669907,849.0515530033497,-27.040836637047736])
   
91,-1188.105392012755,335.55653088366444,-135.79302172669907,849.0515530033497,-27.040836637047736])
   
9,-1188.1053920127551,335.55653088366444,-135.79302172669904,849.0515530033495,-27.040836637047725])
   
94,-1188.1053920127556,335.55653088366444,-135.79302172669904,849.0515530033495,-27.04083663704773])
   
1,-1188.1053920127551,335.55653088366444,-135.79302172669904,849.0515530033493,-27.040836637047722])
   
5,-1188.1053920127556,335.55653088366444,-135.79302172669904,849.0515530033495,-27.040836637047736])
   
29,-1188.105392012756,335.55653088366444,-135.79302172669904,849.0515530033495,-27.040836637047736])
   ```
   
   **blockSize==1**
   
![lr_sparse_1](https://user-images.githubusercontent.com/7322292/81136686-e8acfb00-8f8e-11ea-9267-cf847b37eb81.png)
   
   **blockSize=16**
   
![lr_sparse_16](https://user-images.githubusercontent.com/7322292/81136711-f19dcc80-8f8e-11ea-990a-7b8fe32d81d9.png)
   
   
   
   test with **Master**:
   ```
   import org.apache.spark.ml.classification._
   import org.apache.spark.storage.StorageLevel
   
   val df = spark.read.option("numFeatures", 
"8289919").format("libsvm").load("/data1/Datasets/webspam/webspam_wc_normalized_trigram.svm.10k").withColumn("label",
 (col("label")+1)/2)
   df.persist(StorageLevel.MEMORY_AND_DISK)
   df.count
   
   val lr = new LogisticRegression().setMaxIter(10)
   lr.fit(df)
   
   val start = System.currentTimeMillis; val model = lr.setMaxIter(10).fit(df); 
val end = System.currentTimeMillis; end - start
   
   
   
   scala> val start = System.currentTimeMillis; val model = 
lr.setMaxIter(10).fit(df); val end = System.currentTimeMillis; end - start
   start: Long = 1588735447883  
   
   model: org.apache.spark.ml.classification.LogisticRegressionModel = 
LogisticRegressionModel: uid=logreg_99d29a0ecc13, numClasses=2, 
numFeatures=8289919
   end: Long = 1588735483170
   res3: Long = 35287
   ```
   In this PR, when blockSize==1, the duration is 33948, so there will be no 
performance regression on sparse datasets.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624426569







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624426569







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


SparkQA commented on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624426364


   **[Test build #122334 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122334/testReport)**
 for PR 28458 at commit 
[`563cee9`](https://github.com/apache/spark/commit/563cee9e8a2edd2a0f1025de531002488de1e63a).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


zhengruifeng commented on pull request #28458:
URL: https://github.com/apache/spark/pull/28458#issuecomment-624426340


   performace test on 
[`epsilon_normalized.t`](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html)
   
   code:
   ```scala
   import org.apache.spark.ml.classification._
   import org.apache.spark.storage.StorageLevel
   
   
   val df = spark.read.option("numFeatures", 
"2000").format("libsvm").load("/data1/Datasets/epsilon/epsilon_normalized.t").withColumn("label",
 (col("label")+1)/2)
   df.persist(StorageLevel.MEMORY_AND_DISK)
   df.count
   
   val lr = new LogisticRegression().setBlockSize(1).setMaxIter(10)
   lr.fit(df)
   
   
   val results = Seq(1, 4, 16, 64, 256, 1024, 4096).map { size => val start = 
System.currentTimeMillis; val model = lr.setBlockSize(size).fit(df); val end = 
System.currentTimeMillis; (size, model.coefficients, end - start) }
   ```
   
   results:
   ```
   scala> results.map(_._3)
   res3: Seq[Long] = List(31076, 6771, 6732, 7590, 7186, 7094, 7276)
   
   scala> results.map(_._2).foreach(coef => println(coef.toString.take(100)))
   
[2.1557250220880024,-0.22767392418436572,4.569220246330072,0.04739667339597046,0.14605181933865558,-
   
[2.1557250220880064,-0.22767392418436597,4.56922024633007,0.04739667339596951,0.14605181933865646,-0
   
[2.1557250220880064,-0.2276739241843657,4.569220246330077,0.04739667339597028,0.14605181933865646,-0
   
[2.155725022088007,-0.2276739241843664,4.569220246330073,0.047396673395969764,0.14605181933865688,-0
   
[2.1557250220880038,-0.22767392418436605,4.569220246330073,0.04739667339597022,0.14605181933865458,-
   
[2.1557250220880033,-0.22767392418436683,4.569220246330072,0.047396673395970035,0.14605181933865727,
   
[2.1557250220880033,-0.22767392418436613,4.56922024633007,0.0473966733959703,0.1460518193386559,-0.0
   ```
   
   **blockSize==1**
   
![lr_dense_1](https://user-images.githubusercontent.com/7322292/81136589-966bda00-8f8e-11ea-8613-724333ce0af2.png)
   
   **blockSize==256**
   
![lr_dense_256](https://user-images.githubusercontent.com/7322292/81136603-a4b9f600-8f8e-11ea-9c5d-47b9ed44f5dc.png)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng opened a new pull request #28458: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors

2020-05-05 Thread GitBox


zhengruifeng opened a new pull request #28458:
URL: https://github.com/apache/spark/pull/28458


   ### What changes were proposed in this pull request?
   1, reorg the `fit` method in LR to several blocks (`createModel`, 
`createBounds`, `createOptimizer`, `createInitCoefWithInterceptMatrix`);
   2, add new param blockSize;
   3, if blockSize==1, keep original behavior, code path `trainOnRows`; 
   4, if blockSize>1, standardize and stack input vectors to blocks (like 
ALS/MLP), code path `trainOnBlocks`
   
   ### Why are the changes needed?
   On dense dataset `epsilon_normalized.t`:
   1, reduce RAM to persist traing dataset; (save about 40% RAM)
   2, use Level-2 BLAS routines; (4x ~ 5x faster)
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, a new param is added
   
   ### How was this patch tested?
   existing and added testsuites



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28456: [SPARK-31235][YARN] Fix test "specify a more specific type for the ap…

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28456:
URL: https://github.com/apache/spark/pull/28456#issuecomment-624425092







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28456: [SPARK-31235][YARN] Fix test "specify a more specific type for the ap…

2020-05-05 Thread GitBox


SparkQA removed a comment on pull request #28456:
URL: https://github.com/apache/spark/pull/28456#issuecomment-624419881


   **[Test build #122331 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122331/testReport)**
 for PR 28456 at commit 
[`42f090c`](https://github.com/apache/spark/commit/42f090cb386dc2e95641ce991e3911b0a60e6f55).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624425131







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624425131







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28456: [SPARK-31235][YARN] Fix test "specify a more specific type for the ap…

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #28456:
URL: https://github.com/apache/spark/pull/28456#issuecomment-624425092







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28456: [SPARK-31235][YARN] Fix test "specify a more specific type for the ap…

2020-05-05 Thread GitBox


SparkQA commented on pull request #28456:
URL: https://github.com/apache/spark/pull/28456#issuecomment-624425018


   **[Test build #122331 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122331/testReport)**
 for PR 28456 at commit 
[`42f090c`](https://github.com/apache/spark/commit/42f090cb386dc2e95641ce991e3911b0a60e6f55).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


SparkQA commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624424796


   **[Test build #122333 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122333/testReport)**
 for PR 28412 at commit 
[`34e4564`](https://github.com/apache/spark/commit/34e4564bcf17b1e9875b61445fb4af963af89449).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624423684


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122332/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624423679


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


SparkQA removed a comment on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624423178


   **[Test build #122332 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122332/testReport)**
 for PR 28412 at commit 
[`e6707bd`](https://github.com/apache/spark/commit/e6707bd61e747131b118b8326d01cba159df614a).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


SparkQA commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624423673


   **[Test build #122332 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122332/testReport)**
 for PR 28412 at commit 
[`e6707bd`](https://github.com/apache/spark/commit/e6707bd61e747131b118b8326d01cba159df614a).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624423679







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624423497







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624423497







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


SparkQA commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624423178


   **[Test build #122332 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122332/testReport)**
 for PR 28412 at commit 
[`e6707bd`](https://github.com/apache/spark/commit/e6707bd61e747131b118b8326d01cba159df614a).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] baohe-zhang commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


baohe-zhang commented on a change in pull request #28412:
URL: https://github.com/apache/spark/pull/28412#discussion_r420525489



##
File path: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##
@@ -1167,6 +1168,58 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
 // At this point the disk data either does not exist or was deleted 
because it failed to
 // load, so the event log needs to be replayed.
 
+// TODO: Maybe need to do other check to see if there's enough memory to
+// use inMemoryStore.
+if (hybridKVStoreEnabled) {
+  logInfo("Using HybridKVStore as KVStore")
+  var retried = false
+  var store: HybridKVStore = null
+  while(store == null) {
+val reader = EventLogFileReader(fs, new Path(logDir, attempt.logPath),
+  attempt.lastIndex)
+val isCompressed = reader.compressionCodec.isDefined
+logInfo(s"Leasing disk manager space for app $appId / 
${attempt.info.attemptId}...")
+val lease = dm.lease(reader.totalSize, isCompressed)
+try {
+  val s = new HybridKVStore()
+  val levelDB = KVUtils.open(lease.tmpPath, metadata)
+  s.setLevelDB(levelDB)
+
+  s.startBackgroundThreadToWriteToDB(new 
HybridKVStore.SwitchingToLevelDBListener {
+override def onSwitchingToLevelDBSuccess: Unit = {
+  levelDB.close()
+  val newStorePath = lease.commit(appId, attempt.info.attemptId)
+  s.setLevelDB(KVUtils.open(newStorePath, metadata))
+  logInfo(s"Completely switched to use leveldb for app" +
+  s" $appId / ${attempt.info.attemptId}")
+}
+
+override def onSwitchingToLevelDBFail(e: Exception): Unit = {
+  logWarning(s"Failed to switch to use LevelDb for app" +
+  s" $appId / ${attempt.info.attemptId}")
+  levelDB.close()
+  throw e
+}
+  })
+
+  rebuildAppStore(s, reader, attempt.info.lastUpdated.getTime())
+  s.stopBackgroundThreadAndSwitchToLevelDB()
+  store = s
+} catch {
+  case _: IOException if !retried =>
+// compaction may touch the file(s) which app rebuild wants to read
+// compaction wouldn't run in short interval, so try again...
+logWarning(s"Exception occurred while rebuilding app $appId - 
trying again...")
+lease.rollback()

Review comment:
   Hybrid kvstore is now cleaned up explicitly.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] baohe-zhang commented on a change in pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-05-05 Thread GitBox


baohe-zhang commented on a change in pull request #28412:
URL: https://github.com/apache/spark/pull/28412#discussion_r420524061



##
File path: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##
@@ -1167,6 +1168,58 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
 // At this point the disk data either does not exist or was deleted 
because it failed to
 // load, so the event log needs to be replayed.
 
+// TODO: Maybe need to do other check to see if there's enough memory to
+// use inMemoryStore.
+if (hybridKVStoreEnabled) {

Review comment:
   I found it's difficult to pass Lease to hybrid kvstore, since Lease 
class is only visible within the History package.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28393: [SPARK-31595][SQL] Spark sql should allow unescaped quote mark in quoted string

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #28393:
URL: https://github.com/apache/spark/pull/28393#issuecomment-624420189







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28393: [SPARK-31595][SQL] Spark sql should allow unescaped quote mark in quoted string

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28393:
URL: https://github.com/apache/spark/pull/28393#issuecomment-624420189







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28393: [SPARK-31595][SQL] Spark sql should allow unescaped quote mark in quoted string

2020-05-05 Thread GitBox


SparkQA removed a comment on pull request #28393:
URL: https://github.com/apache/spark/pull/28393#issuecomment-624410522


   **[Test build #122329 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122329/testReport)**
 for PR 28393 at commit 
[`d185264`](https://github.com/apache/spark/commit/d185264ae4acc585855b70b2ec3caa0e63c83043).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28393: [SPARK-31595][SQL] Spark sql should allow unescaped quote mark in quoted string

2020-05-05 Thread GitBox


SparkQA commented on pull request #28393:
URL: https://github.com/apache/spark/pull/28393#issuecomment-624420070


   **[Test build #122329 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122329/testReport)**
 for PR 28393 at commit 
[`d185264`](https://github.com/apache/spark/commit/d185264ae4acc585855b70b2ec3caa0e63c83043).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28456: [SPARK-31235][YARN] Fix test "specify a more specific type for the ap…

2020-05-05 Thread GitBox


SparkQA commented on pull request #28456:
URL: https://github.com/apache/spark/pull/28456#issuecomment-624419881


   **[Test build #122331 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122331/testReport)**
 for PR 28456 at commit 
[`42f090c`](https://github.com/apache/spark/commit/42f090cb386dc2e95641ce991e3911b0a60e6f55).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28456: [SPARK-31235][YARN] Fix test "specify a more specific type for the ap…

2020-05-05 Thread GitBox


AmplabJenkins removed a comment on pull request #28456:
URL: https://github.com/apache/spark/pull/28456#issuecomment-624419421







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28456: [SPARK-31235][YARN] Fix test "specify a more specific type for the ap…

2020-05-05 Thread GitBox


SparkQA commented on pull request #28456:
URL: https://github.com/apache/spark/pull/28456#issuecomment-624419352


   **[Test build #122330 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122330/testReport)**
 for PR 28456 at commit 
[`1e415a9`](https://github.com/apache/spark/commit/1e415a903e07b4ff8d0eb9d5296f8208d7a22a3b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28456: [SPARK-31235][YARN] Fix test "specify a more specific type for the ap…

2020-05-05 Thread GitBox


SparkQA removed a comment on pull request #28456:
URL: https://github.com/apache/spark/pull/28456#issuecomment-624413848


   **[Test build #122330 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122330/testReport)**
 for PR 28456 at commit 
[`1e415a9`](https://github.com/apache/spark/commit/1e415a903e07b4ff8d0eb9d5296f8208d7a22a3b).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28456: [SPARK-31235][YARN] Fix test "specify a more specific type for the ap…

2020-05-05 Thread GitBox


AmplabJenkins commented on pull request #28456:
URL: https://github.com/apache/spark/pull/28456#issuecomment-624419421







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on pull request #28164: [SPARK-31393][SQL] Show the correct alias in schema for expression

2020-05-05 Thread GitBox


beliefer commented on pull request #28164:
URL: https://github.com/apache/spark/pull/28164#issuecomment-624418126


   cc @HyukjinKwon 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   >