
[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest 
Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602399307
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org




[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest 
Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602399308
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24891/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

2020-03-22 Thread GitBox
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest 
Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602398991
 
 
   **[Test build #120178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120178/testReport)** for PR 27982 at commit [`594b830`](https://github.com/apache/spark/commit/594b830450a15c67746a47cc37f4baa01354).





[GitHub] [spark] MaxGekk commented on issue #27980: [SPARK-31221][SQL] Rebase any date-times in conversions to/from Java types

2020-03-22 Thread GitBox
MaxGekk commented on issue #27980: [SPARK-31221][SQL] Rebase any date-times in 
conversions to/from Java types
URL: https://github.com/apache/spark/pull/27980#issuecomment-602398253
 
 
   @cloud-fan @HyukjinKwon Please, review this PR.







[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest 
Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602397225
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27982: [SPARK-31222][ML] Make ANOVATest 
Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602397229
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24890/
   Test PASSed.





[GitHub] [spark] zhengruifeng commented on a change in pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

2020-03-22 Thread GitBox
zhengruifeng commented on a change in pull request #27982: [SPARK-31222][ML] 
Make ANOVATest Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#discussion_r396222712
 
 

 ##
 File path: mllib/src/test/scala/org/apache/spark/ml/stat/ANOVATestSuite.scala
 ##
 @@ -144,22 +144,30 @@ class ANOVATestSuite
   }
 
   test("test DataFrame with sparse vector") {
-    val df = spark.createDataFrame(Seq(
-      (3, Vectors.sparse(6, Array((0, 6.0), (1, 7.0), (3, 7.0), (4, 6.0)))),
-      (1, Vectors.sparse(6, Array((1, 9.0), (2, 6.0), (4, 5.0), (5, 9.0)))),
-      (3, Vectors.sparse(6, Array((1, 9.0), (2, 3.0), (4, 5.0), (5, 5.0)))),
-      (2, Vectors.dense(Array(0.0, 9.0, 8.0, 5.0, 6.0, 4.0))),
-      (2, Vectors.dense(Array(8.0, 9.0, 6.0, 5.0, 4.0, 4.0))),
-      (3, Vectors.dense(Array(8.0, 9.0, 6.0, 4.0, 0.0, 0.0)))
-    )).toDF("label", "features")
+    val data = Seq(
+      (3, Vectors.dense(Array(6.0, 7.0, 0.0, 7.0, 6.0, 0.0, 0.0))),
+      (1, Vectors.dense(Array(0.0, 9.0, 6.0, 0.0, 5.0, 9.0, 0.0))),
+      (3, Vectors.dense(Array(0.0, 9.0, 3.0, 0.0, 5.0, 5.0, 0.0))),
+      (2, Vectors.dense(Array(0.0, 9.0, 8.0, 5.0, 6.0, 4.0, 0.0))),
+      (2, Vectors.dense(Array(8.0, 9.0, 6.0, 5.0, 4.0, 4.0, 0.0))),
+      (3, Vectors.dense(Array(8.0, 9.0, 6.0, 4.0, 0.0, 0.0, 0.0))))
 
-    val anovaResult = ANOVATest.test(df, "features", "label")
-    val (pValues: Vector, fValues: Vector) =
-      anovaResult.select("pValues", "fValues")
-        .as[(Vector, Vector)].head()
-    assert(pValues ~== Vectors.dense(0.71554175, 0.71554175, 0.34278574, 0.45824059, 0.84633632,
-      0.15673368) relTol 1e-6)
-    assert(fValues ~== Vectors.dense(0.375, 0.375, 1.5625, 1.02364865, 0.17647059,
-      3.66) relTol 1e-6)
+    val df1 = spark.createDataFrame(data.map(t => (t._1, t._2.toDense)))
+      .toDF("label", "features")
+    val df2 = spark.createDataFrame(data.map(t => (t._1, t._2.toSparse)))
+      .toDF("label", "features")
+    val df3 = spark.createDataFrame(data.map(t => (t._1, t._2.compressed)))
+      .toDF("label", "features")
+
+    Seq(df1, df2, df3).foreach { df =>
+      val anovaResult = ANOVATest.test(df, "features", "label")
+      val (pValues: Vector, fValues: Vector) =
+        anovaResult.select("pValues", "fValues")
+          .as[(Vector, Vector)].head()
+      assert(pValues ~== Vectors.dense(0.71554175, 0.71554175, 0.34278574, 0.45824059, 0.84633632,
+        0.15673368, Double.NaN) relTol 1e-6)
 
 Review comment:
   for a column containing only zero values, sklearn also returns `nan`:
   ```python
   X = np.zeros([3, 5])
   y = [1, 2, 3]
   f_classif(X, y)
   /home/zrf/Applications/anaconda3/lib/python3.7/site-packages/sklearn/feature_selection/_univariate_selection.py:110:
   RuntimeWarning: invalid value encountered in true_divide
     msw = sswn / float(dfwn)
   Out[24]: (array([nan, nan, nan, nan, nan]), array([nan, nan, nan, nan, nan]))
   ```
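[Editor's note] The `nan` above can be reproduced from first principles. The sketch below (my own illustration, not sklearn's or Spark's code) computes the one-way ANOVA F statistic for an all-zero feature: both the between-class and within-class sums of squares are zero, so the F ratio is 0/0, which IEEE floating point evaluates to NaN.

```python
import numpy as np

# Illustration (not library code): an all-zero feature has zero between-class
# and zero within-class sums of squares, so F = (ssb/dfb) / (ssw/dfw) is 0/0.
x = np.zeros(6)                       # an all-zero feature column
labels = np.array([3, 1, 3, 2, 2, 3])
classes = np.unique(labels)

ssb = np.float64(0.0)                 # between-class sum of squares
ssw = np.float64(0.0)                 # within-class sum of squares
for c in classes:
    xc = x[labels == c]
    ssb += len(xc) * (xc.mean() - x.mean()) ** 2
    ssw += ((xc - xc.mean()) ** 2).sum()

k, n = len(classes), len(x)
with np.errstate(invalid="ignore"):   # silence the 0/0 RuntimeWarning
    f = (ssb / (k - 1)) / (ssw / (n - k))
assert np.isnan(f)
```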





[GitHub] [spark] cloud-fan commented on a change in pull request #27972: [SPARK-31207][CORE] Ensure the total number of blocks to fetch equals to the sum of local/hostLocal/remote blocks

2020-03-22 Thread GitBox
cloud-fan commented on a change in pull request #27972: [SPARK-31207][CORE] 
Ensure the total number of blocks to fetch equals to the sum of 
local/hostLocal/remote blocks
URL: https://github.com/apache/spark/pull/27972#discussion_r396222718
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
 ##
 @@ -312,19 +310,57 @@ final class ShuffleBlockFetcherIterator(
         hostLocalBlocks ++= blocksForAddress.map(info => (info._1, info._3))
         hostLocalBlockBytes += mergedBlockInfos.map(_.size).sum
       } else {
-        numRemoteBlocks += blockInfos.size
         remoteBlockBytes += blockInfos.map(_._2).sum
         collectFetchRequests(address, blockInfos, collectedRemoteRequests)
       }
     }
+    val numRemoteBlocks = collectedRemoteRequests.map(_.blocks.size).sum
     val totalBytes = localBlockBytes + remoteBlockBytes + hostLocalBlockBytes
+    assert(numBlocksToFetch == localBlocks.size + hostLocalBlocks.size + numRemoteBlocks,
+      s"The number of non-empty blocks $numBlocksToFetch doesn't equal to the number of local " +
+        s"blocks ${localBlocks.size} + the number of host-local blocks ${hostLocalBlocks.size} " +
+        s"+ the number of remote blocks ${numRemoteBlocks}.")
     logInfo(s"Getting $numBlocksToFetch (${Utils.bytesToString(totalBytes)}) non-empty blocks " +
       s"including ${localBlocks.size} (${Utils.bytesToString(localBlockBytes)}) local and " +
       s"${hostLocalBlocks.size} (${Utils.bytesToString(hostLocalBlockBytes)}) " +
       s"host-local and $numRemoteBlocks (${Utils.bytesToString(remoteBlockBytes)}) remote blocks")
     collectedRemoteRequests
   }
 
+  def createFetchRequest(
+      blocks: Seq[FetchBlockInfo],
+      address: BlockManagerId,
+      curRequestSize: Long): FetchRequest = {
+    logDebug(s"Creating fetch request of $curRequestSize at $address "
+      + s"with ${blocks.size} blocks")
+    FetchRequest(address, blocks)
+  }
+
+  def createFetchRequests(
+      curBlocks: ArrayBuffer[FetchBlockInfo],
 Review comment:
   does this parameter have to be an `ArrayBuffer` here? We don't mutate it in this method.
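[Editor's note] The reviewer's API-design point, transposed to Python with hypothetical names (the real method is Scala): a function that only reads its argument can declare the read-only abstraction (`Sequence`, analogous to Scala's `Seq`) instead of the concrete mutable type (`list`, analogous to `ArrayBuffer`), which widens the set of acceptable callers without changing behavior.

```python
from typing import Sequence

# Hypothetical sketch of the review suggestion: since the function never
# mutates its input, it accepts any read-only Sequence rather than demanding
# a concrete mutable list (Scala: Seq vs. ArrayBuffer).
def create_fetch_requests(cur_blocks: Sequence[str]) -> list:
    # reads cur_blocks but never mutates it
    return [b for b in cur_blocks if b]

# Callers may now pass a list, a tuple, or any other sequence type.
assert create_fetch_requests(["a", "", "b"]) == ["a", "b"]
assert create_fetch_requests(("a", "", "b")) == ["a", "b"]
```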





[GitHub] [spark] SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

2020-03-22 Thread GitBox
SparkQA commented on issue #27982: [SPARK-31222][ML] Make ANOVATest 
Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982#issuecomment-602396898
 
 
   **[Test build #120177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120177/testReport)** for PR 27982 at commit [`cd968ff`](https://github.com/apache/spark/commit/cd968ffe90aef52e37acdb37d5fc6261143fb20c).






[GitHub] [spark] zhengruifeng opened a new pull request #27982: [SPARK-31222][ML] Make ANOVATest Sparsity-Aware

2020-03-22 Thread GitBox
zhengruifeng opened a new pull request #27982: [SPARK-31222][ML] Make ANOVATest 
Sparsity-Aware
URL: https://github.com/apache/spark/pull/27982
 
 
   ### What changes were proposed in this pull request?
   When the input dataset is sparse, make `ANOVATest` process only the non-zero values.
   
   
   ### Why are the changes needed?
   For performance.
   
   
   ### Does this PR introduce any user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing test suites
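[Editor's note] The idea behind the PR can be sketched outside Spark. The following is a hypothetical illustration in Python (not the actual `ANOVATest` code, and the data and names are mine): zeros contribute nothing to a sum or a sum of squares, so the per-class statistics that one-way ANOVA needs can be accumulated by touching only a sparse vector's stored non-zero entries, which matches the dense computation exactly.

```python
import numpy as np

# Hypothetical sketch of the sparsity-aware idea (not Spark's implementation).
labels = np.array([3, 1, 3, 2, 2, 3])
dense = np.array([
    [6.0, 7.0, 0.0, 7.0, 6.0, 0.0, 0.0],
    [0.0, 9.0, 6.0, 0.0, 5.0, 9.0, 0.0],
    [0.0, 9.0, 3.0, 0.0, 5.0, 5.0, 0.0],
    [0.0, 9.0, 8.0, 5.0, 6.0, 4.0, 0.0],
    [8.0, 9.0, 6.0, 5.0, 4.0, 4.0, 0.0],
    [8.0, 9.0, 6.0, 4.0, 0.0, 0.0, 0.0],
])

n_features = dense.shape[1]
sums = {c: np.zeros(n_features) for c in np.unique(labels)}
sumsq = {c: np.zeros(n_features) for c in np.unique(labels)}
for y, row in zip(labels, dense):
    # iterate a "sparse view" of the row: stored non-zero entries only
    for j in np.nonzero(row)[0]:
        sums[y][j] += row[j]
        sumsq[y][j] += row[j] ** 2

# Aggregating non-zeros only matches the dense computation exactly.
for c in np.unique(labels):
    assert np.allclose(sums[c], dense[labels == c].sum(axis=0))
    assert np.allclose(sumsq[c], (dense[labels == c] ** 2).sum(axis=0))
```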
   







[GitHub] [spark] AmplabJenkins commented on issue #27959: [SPARK-31190][SQL] ScalaReflection should erasure non user defined AnyVal type

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27959: [SPARK-31190][SQL] ScalaReflection 
should erasure non user defined AnyVal type
URL: https://github.com/apache/spark/pull/27959#issuecomment-602394345
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120172/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27959: [SPARK-31190][SQL] ScalaReflection should erasure non user defined AnyVal type

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27959: [SPARK-31190][SQL] ScalaReflection 
should erasure non user defined AnyVal type
URL: https://github.com/apache/spark/pull/27959#issuecomment-602394335
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #27959: [SPARK-31190][SQL] ScalaReflection should erasure non user defined AnyVal type

2020-03-22 Thread GitBox
SparkQA removed a comment on issue #27959: [SPARK-31190][SQL] ScalaReflection 
should erasure non user defined AnyVal type
URL: https://github.com/apache/spark/pull/27959#issuecomment-602314748
 
 
   **[Test build #120172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120172/testReport)** for PR 27959 at commit [`ceb2ce6`](https://github.com/apache/spark/commit/ceb2ce6eb49a39ab82568522af369b4e5c3ecd13).





[GitHub] [spark] cloud-fan commented on issue #27977: [SPARK-31183][SQL][FOLLOWUP][3.0] Move rebase tests to `AvroSuite` and check the rebase flag out of function bodies

2020-03-22 Thread GitBox
cloud-fan commented on issue #27977: [SPARK-31183][SQL][FOLLOWUP][3.0] Move 
rebase tests to `AvroSuite` and check the rebase flag out of function bodies
URL: https://github.com/apache/spark/pull/27977#issuecomment-602393765
 
 
   thanks!





[GitHub] [spark] SparkQA commented on issue #27959: [SPARK-31190][SQL] ScalaReflection should erasure non user defined AnyVal type

2020-03-22 Thread GitBox
SparkQA commented on issue #27959: [SPARK-31190][SQL] ScalaReflection should 
erasure non user defined AnyVal type
URL: https://github.com/apache/spark/pull/27959#issuecomment-602393783
 
 
   **[Test build #120172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120172/testReport)** for PR 27959 at commit [`ceb2ce6`](https://github.com/apache/spark/commit/ceb2ce6eb49a39ab82568522af369b4e5c3ecd13).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
      * `// `case class Foo(i: Int) extends AnyVal` will return type `Int` instead of `Foo`.`





[GitHub] [spark] cloud-fan commented on issue #27977: [SPARK-31183][SQL][FOLLOWUP][3.0] Move rebase tests to `AvroSuite` and check the rebase flag out of function bodies

2020-03-22 Thread GitBox
cloud-fan commented on issue #27977: [SPARK-31183][SQL][FOLLOWUP][3.0] Move 
rebase tests to `AvroSuite` and check the rebase flag out of function bodies
URL: https://github.com/apache/spark/pull/27977#issuecomment-602393675
 
 
   thanks, merging to 3.0!






[GitHub] [spark] dongjoon-hyun commented on issue #27185: [SPARK-30494][SQL] Fix cached data leakage during replacing an existing view

2020-03-22 Thread GitBox
dongjoon-hyun commented on issue #27185: [SPARK-30494][SQL] Fix cached data 
leakage during replacing an existing view
URL: https://github.com/apache/spark/pull/27185#issuecomment-602392216
 
 
   Hi, @LantaoJin . Could you make a backport PR against `branch-2.4`?





[GitHub] [spark] dongjoon-hyun closed pull request #27185: [SPARK-30494][SQL] Fix cached data leakage during replacing an existing view

2020-03-22 Thread GitBox
dongjoon-hyun closed pull request #27185: [SPARK-30494][SQL] Fix cached data 
leakage during replacing an existing view
URL: https://github.com/apache/spark/pull/27185
 
 
   





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27185: [SPARK-30494][SQL] Fix the leak of cached data when replace an existing temp view

2020-03-22 Thread GitBox
dongjoon-hyun commented on a change in pull request #27185: [SPARK-30494][SQL] 
Fix the leak of cached data when replace an existing temp view
URL: https://github.com/apache/spark/pull/27185#discussion_r396217365
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
 ##
 @@ -1122,4 +1122,47 @@ class CachedTableSuite extends QueryTest with SQLTestUtils
       assert(!spark.catalog.isCached("t1"))
     }
   }
+
+  test("SPARK-30494 avoid duplicated cached RDD when replace an existing view") {
+    withTempView("tempView") {
+      spark.catalog.clearCache()
+      sql("create or replace temporary view tempView as select 1")
+      sql("cache table tempView")
+      assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)
+      sql("create or replace temporary view tempView as select 1, 2")
+      assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isEmpty)
+      sql("cache table tempView")
+      assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
+      assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isEmpty)
+    }
+
+    withGlobalTempView("tempGlobalTempView") {
+      spark.catalog.clearCache()
+      sql("create or replace global temporary view tempGlobalTempView as select 1")
+      sql("cache table global_temp.tempGlobalTempView")
+      assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isDefined)
+      sql("create or replace global temporary view tempGlobalTempView as select 1, 2")
+      assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isEmpty)
+      sql("cache table global_temp.tempGlobalTempView")
+      assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1, 2")).isDefined)
+      assert(spark.sharedState.cacheManager.lookupCachedData(sql("select 1")).isEmpty)
+    }
+
+    withView("view1") {
+      spark.catalog.clearCache()
+      sql("create or replace view view1 as select 1")
+      sql("cache table view1")
+      sql("create or replace view view1 as select 1, 2")
+      sql("cache table view1")
+      // the cached plan of persisted view likes below,
+      // so we cannot use the same assertion of temp view.
+      // SubqueryAlias
+      //    |
+      //    + View
+      //       |
+      //       + Project[1 AS 1]
+      spark.sharedState.cacheManager.uncacheQuery(spark.table("view1"), cascade = false)
+      assert(spark.sharedState.cacheManager.isEmpty)
 
 Review comment:
   Then, please remove this misleading test case between line 1149 and 1165.
   > no cached data leak for persisted view
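
   [Editor's note] The distinction the test's comment draws -- a persisted
   view is cached under wrapper plan nodes, so a lookup by the bare query plan
   misses -- can be sketched without Spark. The classes below are simplified,
   hypothetical stand-ins for Spark's logical plan nodes, and the `Map` stands
   in for `CacheManager`'s plan-keyed cache; this is an analogy, not Spark's
   actual implementation.

```scala
// Hypothetical, simplified stand-ins for Spark's logical plan node classes.
sealed trait LogicalPlan
case class Project(output: String) extends LogicalPlan
case class View(child: LogicalPlan) extends LogicalPlan
case class SubqueryAlias(name: String, child: LogicalPlan) extends LogicalPlan

object CacheSketch extends App {
  // A cache keyed by plan equality, loosely like lookupCachedData.
  var cache = Map.empty[LogicalPlan, String]

  // Caching a temp view stores the bare query plan,
  // so a temp-view-style lookup by the bare plan hits.
  cache += Project("1") -> "cached entry for temp view"
  assert(cache.contains(Project("1")))

  // A persisted view is cached under extra wrapper nodes,
  // so looking up the bare Project misses even for the same query text.
  cache = Map(SubqueryAlias("view1", View(Project("1"))) -> "cached entry for view1")
  assert(!cache.contains(Project("1")))
  assert(cache.contains(SubqueryAlias("view1", View(Project("1")))))
}
```

   This is why the test above asserts emptiness via `uncacheQuery` plus
   `cacheManager.isEmpty` for the persisted view instead of reusing the
   `lookupCachedData(sql("select 1"))` assertions.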


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org




[GitHub] [spark] AmplabJenkins removed a comment on issue #27916: [SPARK-30532] DataFrameStatFunctions to work with TABLE.COLUMN syntax

2020-03-22 Thread GitBox
AmplabJenkins removed a comment on issue #27916: [SPARK-30532] 
DataFrameStatFunctions to work with TABLE.COLUMN syntax
URL: https://github.com/apache/spark/pull/27916#issuecomment-602383435
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24889/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #27916: [SPARK-30532] DataFrameStatFunctions to work with TABLE.COLUMN syntax

2020-03-22 Thread GitBox
AmplabJenkins removed a comment on issue #27916: [SPARK-30532] 
DataFrameStatFunctions to work with TABLE.COLUMN syntax
URL: https://github.com/apache/spark/pull/27916#issuecomment-602383432
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #27916: [SPARK-30532] DataFrameStatFunctions to work with TABLE.COLUMN syntax

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27916: [SPARK-30532] DataFrameStatFunctions 
to work with TABLE.COLUMN syntax
URL: https://github.com/apache/spark/pull/27916#issuecomment-602383435
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24889/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #27916: [SPARK-30532] DataFrameStatFunctions to work with TABLE.COLUMN syntax

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27916: [SPARK-30532] DataFrameStatFunctions 
to work with TABLE.COLUMN syntax
URL: https://github.com/apache/spark/pull/27916#issuecomment-602383432
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #27916: [SPARK-30532] DataFrameStatFunctions to work with TABLE.COLUMN syntax

2020-03-22 Thread GitBox
SparkQA commented on issue #27916: [SPARK-30532] DataFrameStatFunctions to work 
with TABLE.COLUMN syntax
URL: https://github.com/apache/spark/pull/27916#issuecomment-602383174
 
 
   **[Test build #120176 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120176/testReport)**
 for PR 27916 at commit 
[`37f441e`](https://github.com/apache/spark/commit/37f441ed424d2ae2088565cf0ad6ddc68f039b85).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on issue #27916: [SPARK-30532] DataFrameStatFunctions to work with TABLE.COLUMN syntax

2020-03-22 Thread GitBox
HyukjinKwon commented on issue #27916: [SPARK-30532] DataFrameStatFunctions to 
work with TABLE.COLUMN syntax
URL: https://github.com/apache/spark/pull/27916#issuecomment-602381874
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on issue #27977: [SPARK-31183][SQL][FOLLOWUP][3.0] Move rebase tests to `AvroSuite` and check the rebase flag out of function bodies

2020-03-22 Thread GitBox
dongjoon-hyun commented on issue #27977: [SPARK-31183][SQL][FOLLOWUP][3.0] Move 
rebase tests to `AvroSuite` and check the rebase flag out of function bodies
URL: https://github.com/apache/spark/pull/27977#issuecomment-602380901
 
 
   Thank you, @MaxGekk and @HyukjinKwon !


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nchammas commented on a change in pull request #27928: [SPARK-31167][BUILD] Refactor how we track Python test/build dependencies

2020-03-22 Thread GitBox
nchammas commented on a change in pull request #27928: [SPARK-31167][BUILD] 
Refactor how we track Python test/build dependencies
URL: https://github.com/apache/spark/pull/27928#discussion_r396197611
 
 

 ##
 File path: dev/requirements.txt
 ##
 @@ -1,5 +1,10 @@
-flake8==3.5.0
+flake8==3.7.*
 
 Review comment:
   I mentioned it elsewhere but I'll mention it again here: Linters like flake8 
and pycodestyle introduce new checks in minor/feature releases. There is a 
very high chance that every new check they introduce will flag new problems and fail 
the build.
   
   In fact, we saw exactly that behavior with pydocstyle just before we removed 
it. And I [experienced 
this](https://github.com/nchammas/flintrock/commit/9157b25d735ff6ef690cfdfb761f336bf999fc82)
 with pycodestyle in Flintrock before [pinning the 
version](https://github.com/nchammas/flintrock/commit/6c6d9562673b430d142d9062a99e9ac1c87366b8).
   
   I don't understand the point of waiting for the build to break before 
pinning or severely limiting the versions for libraries like these.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #27248: [WIP][SPARK-30538][SQL] Control spark sql output small file by merge small partition

2020-03-22 Thread GitBox
AmplabJenkins removed a comment on issue #27248: [WIP][SPARK-30538][SQL] 
Control spark sql output small file by merge small partition
URL: https://github.com/apache/spark/pull/27248#issuecomment-575452587
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #27248: [WIP][SPARK-30538][SQL] Control spark sql output small file by merge small partition

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27248: [WIP][SPARK-30538][SQL] Control spark 
sql output small file by merge small partition
URL: https://github.com/apache/spark/pull/27248#issuecomment-602376013
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nchammas commented on a change in pull request #27928: [SPARK-31167][BUILD] Refactor how we track Python test/build dependencies

2020-03-22 Thread GitBox
nchammas commented on a change in pull request #27928: [SPARK-31167][BUILD] 
Refactor how we track Python test/build dependencies
URL: https://github.com/apache/spark/pull/27928#discussion_r396197611
 
 

 ##
 File path: dev/requirements.txt
 ##
 @@ -1,5 +1,10 @@
-flake8==3.5.0
+flake8==3.7.*
 
 Review comment:
   I mentioned it elsewhere but I'll mention it again here: Linters like flake8 
and pycodestyle introduce new checks in minor/feature releases. There is a 
very high chance that every new check they introduce will flag new problems and fail 
the build. (In fact, we saw exactly that behavior with pydocstyle just before 
we removed it. And I experienced this with flake8 in Flintrock before [pinning 
the 
version](https://github.com/nchammas/flintrock/blob/52c6c84c9a1845b0ce89ca138172a6ec4cf0d632/requirements/developer.in#L4).)
   
   I don't understand the point of waiting for the build to break before 
pinning or severely limiting the versions for libraries like these.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nchammas commented on issue #27928: [SPARK-31167][BUILD] Refactor how we track Python test/build dependencies

2020-03-22 Thread GitBox
nchammas commented on issue #27928: [SPARK-31167][BUILD] Refactor how we track 
Python test/build dependencies
URL: https://github.com/apache/spark/pull/27928#issuecomment-602375329
 
 
   > With this change, we will have to maintain and keep `dev/requirements.txt` 
up to date. 
   
   Maybe this is the disconnect between our points of view, because so far I 
haven't really been following your objections to pinning. Assuming we pin every 
library, why do we have to keep `dev/requirements.txt` up-to-date?
   
   As long as we can build the docs, run tests, and do whatever else we need to 
do as part of regular development, that file can remain frozen as-is for years.
   
   It's only when we specifically want to use some new feature of, say, Sphinx, 
that we need to bump versions. But that will happen very rarely, I imagine not 
more than once every couple of years.
   
   Does that address your concern? Why do you think we'd need to touch that 
file more often than once in a long while?
   
   > We shouldn't pin `numpy` to encourage people to test the highest versions. 
It should ideally be `numpy>=1.7` according to `setup.py`.
   > 
   > * `numpy` is an explicit dependency for ML/MLlib in PySpark.
   
   But the specification of numpy in `dev/requirements.txt` is so that we can 
build our docs. (It seems strange, but yes, numpy is a requirement to build our 
Python API docs.)
   
   Maybe we can improve this by replacing numpy in `dev/requirements.txt` with 
a reference to `setup.py`. That way we can track PySpark dependencies (whether 
for building the docs or for general execution) in one place. This will also 
pick up the Pandas requirement. How does that sound?
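
   [Editor's note] One hedged sketch of what "replacing numpy with a
   reference to `setup.py`" could look like, relying on pip's editable-install
   support to pull the package's own `install_requires` from `setup.py`; the
   paths and pins are illustrative, not the actual Spark files:

```text
# dev/requirements.txt -- sketch only
# Pull PySpark's runtime deps (numpy, pandas, ...) from python/setup.py:
-e ./python
# Build/test-only tools stay pinned here:
sphinx==2.4.4
flake8==3.7.9
```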
   
   A separate issue I raised earlier is that, if we don't want to pin our 
build/test dependencies, we need to figure out what to do about the Spark 
Docker image and CI. Either those will also source the unpinned requirements 
from the same file, or we go back to having the requirements specified in 
duplicate: pinned versions for Docker and CI, unpinned versions for 
developers.
   
   Obviously, I'd prefer to pin everything and keep it in one place, but if you 
want to go one of these routes I guess I'll do that. I just want to understand 
and try to address your objections before going there.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] liupc edited a comment on issue #27871: [SPARK-31105][CORE]Respect sql execution id for FIFO scheduling mode

2020-03-22 Thread GitBox
liupc edited a comment on issue #27871: [SPARK-31105][CORE]Respect sql 
execution id for FIFO scheduling mode
URL: https://github.com/apache/spark/pull/27871#issuecomment-602373911
 
 
   Thanks @dongjoon-hyun, let's split the scopes and add an option to respect 
`jobGroup`-level priority in the `core` module. 
   And I think even with the current approach the congestion issue is serious; 
this PR is not meant to solve it, so I proposed another PR for that: 
https://github.com/apache/spark/pull/27862
   I really think this is helpful for OLAP scenarios, and we have tested it on 
real workloads at Xiaomi.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on issue #27977: [SPARK-31183][SQL][FOLLOWUP][3.0] Move rebase tests to `AvroSuite` and check the rebase flag out of function bodies

2020-03-22 Thread GitBox
HyukjinKwon commented on issue #27977: [SPARK-31183][SQL][FOLLOWUP][3.0] Move 
rebase tests to `AvroSuite` and check the rebase flag out of function bodies
URL: https://github.com/apache/spark/pull/27977#issuecomment-602374308
 
 
   Merged to branch-3.0.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #27977: [SPARK-31183][SQL][FOLLOWUP][3.0] Move rebase tests to `AvroSuite` and check the rebase flag out of function bodies

2020-03-22 Thread GitBox
HyukjinKwon closed pull request #27977: [SPARK-31183][SQL][FOLLOWUP][3.0] Move 
rebase tests to `AvroSuite` and check the rebase flag out of function bodies
URL: https://github.com/apache/spark/pull/27977
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] liupc edited a comment on issue #27871: [SPARK-31105][CORE]Respect sql execution id for FIFO scheduling mode

2020-03-22 Thread GitBox
liupc edited a comment on issue #27871: [SPARK-31105][CORE]Respect sql 
execution id for FIFO scheduling mode
URL: https://github.com/apache/spark/pull/27871#issuecomment-602373911
 
 
   Thanks @dongjoon-hyun, let's split the scopes and add an option to respect 
`jobGroup`-level priority in the `core` module. 
   I really think this is helpful for OLAP scenarios, and that's what we do at 
Xiaomi.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] liupc commented on issue #27871: [SPARK-31105][CORE]Respect sql execution id for FIFO scheduling mode

2020-03-22 Thread GitBox
liupc commented on issue #27871: [SPARK-31105][CORE]Respect sql execution id 
for FIFO scheduling mode
URL: https://github.com/apache/spark/pull/27871#issuecomment-602373911
 
 
   Thanks @dongjoon-hyun, let's split the scopes and add an option to respect 
`jobGroup`-level priority in the `core` module. 
   I really think this is helpful for OLAP scenarios.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #27898: [SPARK-31141][DSTREAMS][DOC] Add version information to the configuration of Dstreams

2020-03-22 Thread GitBox
HyukjinKwon closed pull request #27898: [SPARK-31141][DSTREAMS][DOC] Add 
version information to the configuration of Dstreams
URL: https://github.com/apache/spark/pull/27898
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on issue #27898: [SPARK-31141][DSTREAMS][DOC] Add version information to the configuration of Dstreams

2020-03-22 Thread GitBox
HyukjinKwon commented on issue #27898: [SPARK-31141][DSTREAMS][DOC] Add version 
information to the configuration of Dstreams
URL: https://github.com/apache/spark/pull/27898#issuecomment-602372894
 
 
   Merged to master.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] nchammas commented on a change in pull request #27928: [SPARK-31167][BUILD] Refactor how we track Python test/build dependencies

2020-03-22 Thread GitBox
nchammas commented on a change in pull request #27928: [SPARK-31167][BUILD] 
Refactor how we track Python test/build dependencies
URL: https://github.com/apache/spark/pull/27928#discussion_r396197611
 
 

 ##
 File path: dev/requirements.txt
 ##
 @@ -1,5 +1,10 @@
-flake8==3.5.0
+flake8==3.7.*
 
 Review comment:
   I mentioned it elsewhere but I'll mention it again here: Linters like flake8 
and pycodestyle introduce new checks in minor/feature releases. There is a 
very high chance that every new check they introduce will flag new problems and fail 
the build. (In fact, we saw exactly that behavior with pydocstyle just before 
we removed it.)
   
   I don't understand the point of waiting for the build to break before 
pinning or severely limiting the versions for libraries like these.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on issue #27497: [SPARK-30245][SQL][FOLLOWUP] Improve regex expression when pattern not changed

2020-03-22 Thread GitBox
beliefer commented on issue #27497: [SPARK-30245][SQL][FOLLOWUP] Improve regex 
expression when pattern not changed
URL: https://github.com/apache/spark/pull/27497#issuecomment-602364756
 
 
   @HyukjinKwon With pleasure, I'll do it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #27981: [SPARK-31215][SQL][DOC] Add version information to the static configuration of SQL

2020-03-22 Thread GitBox
SparkQA commented on issue #27981: [SPARK-31215][SQL][DOC] Add version 
information to the static configuration of SQL
URL: https://github.com/apache/spark/pull/27981#issuecomment-602364594
 
 
   **[Test build #120175 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120175/testReport)**
 for PR 27981 at commit 
[`86d4eff`](https://github.com/apache/spark/commit/86d4eff3d3f8d5ba37d22eed81de912d8929309b).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #27898: [SPARK-31141][DSTREAMS][DOC] Add version information to the configuration of Dstreams

2020-03-22 Thread GitBox
beliefer commented on a change in pull request #27898: 
[SPARK-31141][DSTREAMS][DOC] Add version information to the configuration of 
Dstreams
URL: https://github.com/apache/spark/pull/27898#discussion_r396195734
 
 

 ##
 File path: docs/configuration.md
 ##
 @@ -2483,6 +2483,7 @@ Spark subsystems.
 spark.streaming.receiver.maxRate and 
spark.streaming.kafka.maxRatePerPartition
 
 Review comment:
   Thanks for your explanation.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on issue #27898: [SPARK-31141][DSTREAMS][DOC] Add version information to the configuration of Dstreams

2020-03-22 Thread GitBox
beliefer commented on issue #27898: [SPARK-31141][DSTREAMS][DOC] Add version 
information to the configuration of Dstreams
URL: https://github.com/apache/spark/pull/27898#issuecomment-602363897
 
 
   @HyukjinKwon OK. I updated them.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on issue #27979: [SPARK-31138][ML][FOLLOWUP] ANOVA optimization

2020-03-22 Thread GitBox
zhengruifeng commented on issue #27979: [SPARK-31138][ML][FOLLOWUP] ANOVA 
optimization
URL: https://github.com/apache/spark/pull/27979#issuecomment-602363561
 
 
   Merged to master, thanks @srowen @huaxingao 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #27981: [SPARK-31215][SQL][DOC] Add version information to the static configuration of SQL

2020-03-22 Thread GitBox
AmplabJenkins removed a comment on issue #27981: [SPARK-31215][SQL][DOC] Add 
version information to the static configuration of SQL
URL: https://github.com/apache/spark/pull/27981#issuecomment-602363292
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #27981: [SPARK-31215][SQL][DOC] Add version information to the static configuration of SQL

2020-03-22 Thread GitBox
AmplabJenkins removed a comment on issue #27981: [SPARK-31215][SQL][DOC] Add 
version information to the static configuration of SQL
URL: https://github.com/apache/spark/pull/27981#issuecomment-602363298
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24888/
   Test PASSed.





[GitHub] [spark] zhengruifeng closed pull request #27979: [SPARK-31138][ML][FOLLOWUP] ANOVA optimization

2020-03-22 Thread GitBox
zhengruifeng closed pull request #27979: [SPARK-31138][ML][FOLLOWUP] ANOVA 
optimization
URL: https://github.com/apache/spark/pull/27979
 
 
   





[GitHub] [spark] AmplabJenkins commented on issue #27981: [SPARK-31215][SQL][DOC] Add version information to the static configuration of SQL

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27981: [SPARK-31215][SQL][DOC] Add version 
information to the static configuration of SQL
URL: https://github.com/apache/spark/pull/27981#issuecomment-602363292
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27981: [SPARK-31215][SQL][DOC] Add version information to the static configuration of SQL

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27981: [SPARK-31215][SQL][DOC] Add version 
information to the static configuration of SQL
URL: https://github.com/apache/spark/pull/27981#issuecomment-602363298
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24888/
   Test PASSed.





[GitHub] [spark] beliefer commented on issue #27931: [SPARK-31002][CORE][DOC][FOLLOWUP] Add version information to the configuration of Core

2020-03-22 Thread GitBox
beliefer commented on issue #27931: [SPARK-31002][CORE][DOC][FOLLOWUP] Add 
version information to the configuration of Core
URL: https://github.com/apache/spark/pull/27931#issuecomment-602362761
 
 
   @HyukjinKwon Thanks!





[GitHub] [spark] beliefer commented on issue #27981: [SPARK-31215][SQL][DOC] Add version information to the static configuration of SQL

2020-03-22 Thread GitBox
beliefer commented on issue #27981: [SPARK-31215][SQL][DOC] Add version 
information to the static configuration of SQL
URL: https://github.com/apache/spark/pull/27981#issuecomment-602362943
 
 
   retest this please





[GitHub] [spark] AmplabJenkins removed a comment on issue #27861: [SPARK-30707][SQL]Window function set partitionSpec as order spec when orderSpec is empty

2020-03-22 Thread GitBox
AmplabJenkins removed a comment on issue #27861: [SPARK-30707][SQL]Window 
function set partitionSpec as order spec when orderSpec is empty
URL: https://github.com/apache/spark/pull/27861#issuecomment-602333577
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27861: [SPARK-30707][SQL]Window function set partitionSpec as order spec when orderSpec is empty

2020-03-22 Thread GitBox
AmplabJenkins removed a comment on issue #27861: [SPARK-30707][SQL]Window 
function set partitionSpec as order spec when orderSpec is empty
URL: https://github.com/apache/spark/pull/27861#issuecomment-602333612
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24887/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27861: [SPARK-30707][SQL]Window function set partitionSpec as order spec when orderSpec is empty

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27861: [SPARK-30707][SQL]Window function set 
partitionSpec as order spec when orderSpec is empty
URL: https://github.com/apache/spark/pull/27861#issuecomment-602333577
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27861: [SPARK-30707][SQL]Window function set partitionSpec as order spec when orderSpec is empty

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27861: [SPARK-30707][SQL]Window function set 
partitionSpec as order spec when orderSpec is empty
URL: https://github.com/apache/spark/pull/27861#issuecomment-602333612
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24887/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #27861: [SPARK-30707][SQL]Window function set partitionSpec as order spec when orderSpec is empty

2020-03-22 Thread GitBox
SparkQA commented on issue #27861: [SPARK-30707][SQL]Window function set 
partitionSpec as order spec when orderSpec is empty
URL: https://github.com/apache/spark/pull/27861#issuecomment-602332620
 
 
   **[Test build #120174 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120174/testReport)**
 for PR 27861 at commit 
[`d369cbc`](https://github.com/apache/spark/commit/d369cbc61811f4334f511fcda9e191a89c040e3c).





[GitHub] [spark] HyukjinKwon commented on issue #27861: [SPARK-30707][SQL]Window function set partitionSpec as order spec when orderSpec is empty

2020-03-22 Thread GitBox
HyukjinKwon commented on issue #27861: [SPARK-30707][SQL]Window function set 
partitionSpec as order spec when orderSpec is empty
URL: https://github.com/apache/spark/pull/27861#issuecomment-602330738
 
 
   retest this please





[GitHub] [spark] HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-03-22 Thread GitBox
HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] 
Added a new option to use non-blobstore storage when writing into blobstore 
storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396186608
 
 

 ##
 File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##
 @@ -125,10 +163,22 @@ private[hive] trait SaveAsHiveFile extends 
DataWritingCommand {
 val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
 val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
 
+// Hive sets session_path as 
HDFS_SESSION_PATH_KEY(_hive.hdfs.session.path) in hive config
+val sessionScratchDir = 
externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog]
+  .client.getConf("_hive.hdfs.session.path", "")
 
 Review comment:
   Thanks, @moomindani. How does Hive behave when `_hive.hdfs.session.path` is 
not set?





[GitHub] [spark] HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-03-22 Thread GitBox
HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] 
Added a new option to use non-blobstore storage when writing into blobstore 
storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396185493
 
 

 ##
 File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##
 @@ -97,7 +99,43 @@ private[hive] trait SaveAsHiveFile extends 
DataWritingCommand {
   options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  /*
+   * Mostly copied from Context.java#getMRTmpPath of Hive 2.3
+   *
+   */
+  def getMRTmpPath(
+  hadoopConf: Configuration,
+  sessionScratchDir: String,
+  scratchDir: String): Path = {
+
+// Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-1',
+// which is ruled by 'hive.exec.scratchdir' including file system.
+// This is the same as Spark's #oldVersionExternalTempPath
+// Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is 
HIVE-7090
+// HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'
+// Here it uses session_path unless it's empty; otherwise it uses scratchDir
+val sessionPath = 
Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir)
 
 Review comment:
   Why do we need to wrap this in `Option` here if it is already an empty 
string when `_hive.hdfs.session.path` is empty?
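For readers following this thread: the `Option` idiom under discussion guards against both `null` and the empty string. A minimal, self-contained sketch of that behavior (names here are illustrative, not taken from the PR):

```scala
// Sketch of the fallback pattern from the review thread above.
// SessionPathFallback and resolve are illustrative names, not the PR's code.
object SessionPathFallback {
  def resolve(sessionScratchDir: String, scratchDir: String): String =
    Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir)

  def main(args: Array[String]): Unit = {
    assert(resolve(null, "/tmp/hive") == "/tmp/hive")  // Option(null) is None
    assert(resolve("", "/tmp/hive") == "/tmp/hive")    // filterNot drops Some("")
    assert(resolve("/sess", "/tmp/hive") == "/sess")   // non-empty value wins
  }
}
```

As the reviewer points out, if the config lookup is guaranteed to return an empty string rather than `null`, the `Option(...)` wrapper is redundant and a plain emptiness check would suffice.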





[GitHub] [spark] HyukjinKwon commented on issue #27497: [SPARK-30245][SQL][FOLLOWUP] Improve regex expression when pattern not changed

2020-03-22 Thread GitBox
HyukjinKwon commented on issue #27497: [SPARK-30245][SQL][FOLLOWUP] Improve 
regex expression when pattern not changed
URL: https://github.com/apache/spark/pull/27497#issuecomment-602326247
 
 
   Thank you so much for investigations @beliefer.





[GitHub] [spark] HyukjinKwon commented on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static

2020-03-22 Thread GitBox
HyukjinKwon commented on issue #26875: [SPARK-30245][SQL] Add cache for Like 
and RLike when pattern is not static
URL: https://github.com/apache/spark/pull/26875#issuecomment-602326196
 
 
   Thank you so much @beliefer.





[GitHub] [spark] HyukjinKwon commented on issue #27898: [SPARK-31141][DSTREAMS][DOC] Add version information to the configuration of Dstreams

2020-03-22 Thread GitBox
HyukjinKwon commented on issue #27898: [SPARK-31141][DSTREAMS][DOC] Add version 
information to the configuration of Dstreams
URL: https://github.com/apache/spark/pull/27898#issuecomment-602325359
 
 
   Looks like `spark.streaming.kafka.maxRatePerPartition` and 
`spark.streaming.kafka.minRatePerPartition` were removed from this PR. Can you 
also update the PR description to exclude them, please, @beliefer?





[GitHub] [spark] HyukjinKwon edited a comment on issue #27898: [SPARK-31141][DSTREAMS][DOC] Add version information to the configuration of Dstreams

2020-03-22 Thread GitBox
HyukjinKwon edited a comment on issue #27898: [SPARK-31141][DSTREAMS][DOC] Add 
version information to the configuration of Dstreams
URL: https://github.com/apache/spark/pull/27898#issuecomment-602325359
 
 
   Looks like `spark.streaming.kafka.maxRatePerPartition` and 
`spark.streaming.kafka.minRatePerPartition` were removed from this PR. Can you 
also update the PR description to exclude them, please, @beliefer?





[GitHub] [spark] AmplabJenkins removed a comment on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.

2020-03-22 Thread GitBox
AmplabJenkins removed a comment on issue #27207: [SPARK-18886][CORE] Make 
Locality wait time measure resource under utilization due to delay scheduling.
URL: https://github.com/apache/spark/pull/27207#issuecomment-602324849
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] HyukjinKwon commented on a change in pull request #27898: [SPARK-31141][DSTREAMS][DOC] Add version information to the configuration of Dstreams

2020-03-22 Thread GitBox
HyukjinKwon commented on a change in pull request #27898: 
[SPARK-31141][DSTREAMS][DOC] Add version information to the configuration of 
Dstreams
URL: https://github.com/apache/spark/pull/27898#discussion_r396182832
 
 

 ##
 File path: docs/configuration.md
 ##
 @@ -2551,21 +2558,24 @@ Spark subsystems.
 Kafka Integration guide
 for more details.
   
+  1.3.0
 
 
-spark.streaming.kafka.minRatePerPartition
-1
-
-  Minimum rate (number of records per second) at which data will be read 
from each Kafka
-  partition when using the new Kafka direct stream API.
-
+  spark.streaming.kafka.minRatePerPartition
 
 Review comment:
   @beliefer, I think you should have clarified that you're working on this 
file by file as the logical grouping. It seems that missing context is causing 
the miscommunication here with @gaborgsomogyi.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.

2020-03-22 Thread GitBox
AmplabJenkins removed a comment on issue #27207: [SPARK-18886][CORE] Make 
Locality wait time measure resource under utilization due to delay scheduling.
URL: https://github.com/apache/spark/pull/27207#issuecomment-602324857
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120171/
   Test PASSed.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #27898: [SPARK-31141][DSTREAMS][DOC] Add version information to the configuration of Dstreams

2020-03-22 Thread GitBox
HyukjinKwon commented on a change in pull request #27898: 
[SPARK-31141][DSTREAMS][DOC] Add version information to the configuration of 
Dstreams
URL: https://github.com/apache/spark/pull/27898#discussion_r396182847
 
 

 ##
 File path: docs/configuration.md
 ##
 @@ -2483,6 +2483,7 @@ Spark subsystems.
 spark.streaming.receiver.maxRate and 
spark.streaming.kafka.maxRatePerPartition
 
 Review comment:
   @beliefer, I think you should have clarified that you're working on this 
file by file as the logical grouping. It seems that missing context is causing 
the miscommunication here with @gaborgsomogyi.





[GitHub] [spark] AmplabJenkins commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27207: [SPARK-18886][CORE] Make Locality wait 
time measure resource under utilization due to delay scheduling.
URL: https://github.com/apache/spark/pull/27207#issuecomment-602324849
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27207: [SPARK-18886][CORE] Make Locality wait 
time measure resource under utilization due to delay scheduling.
URL: https://github.com/apache/spark/pull/27207#issuecomment-602324857
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120171/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.

2020-03-22 Thread GitBox
SparkQA removed a comment on issue #27207: [SPARK-18886][CORE] Make Locality 
wait time measure resource under utilization due to delay scheduling.
URL: https://github.com/apache/spark/pull/27207#issuecomment-602299952
 
 
   **[Test build #120171 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120171/testReport)**
 for PR 27207 at commit 
[`97688e6`](https://github.com/apache/spark/commit/97688e618cacc8442629300600ca2d3461891715).





[GitHub] [spark] AmplabJenkins removed a comment on issue #27937: [SPARK-30127][SQL] Support case class parameter for typed Scala UDF

2020-03-22 Thread GitBox
AmplabJenkins removed a comment on issue #27937: [SPARK-30127][SQL] Support 
case class parameter for typed Scala UDF
URL: https://github.com/apache/spark/pull/27937#issuecomment-602324391
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27937: [SPARK-30127][SQL] Support case class parameter for typed Scala UDF

2020-03-22 Thread GitBox
AmplabJenkins removed a comment on issue #27937: [SPARK-30127][SQL] Support 
case class parameter for typed Scala UDF
URL: https://github.com/apache/spark/pull/27937#issuecomment-602324393
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24886/
   Test PASSed.





[GitHub] [spark] HyukjinKwon closed pull request #27931: [SPARK-31002][CORE][DOC][FOLLOWUP] Add version information to the configuration of Core

2020-03-22 Thread GitBox
HyukjinKwon closed pull request #27931: [SPARK-31002][CORE][DOC][FOLLOWUP] Add 
version information to the configuration of Core
URL: https://github.com/apache/spark/pull/27931
 
 
   





[GitHub] [spark] AmplabJenkins commented on issue #27937: [SPARK-30127][SQL] Support case class parameter for typed Scala UDF

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27937: [SPARK-30127][SQL] Support case class 
parameter for typed Scala UDF
URL: https://github.com/apache/spark/pull/27937#issuecomment-602324393
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24886/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27937: [SPARK-30127][SQL] Support case class parameter for typed Scala UDF

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27937: [SPARK-30127][SQL] Support case class 
parameter for typed Scala UDF
URL: https://github.com/apache/spark/pull/27937#issuecomment-602324391
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.

2020-03-22 Thread GitBox
SparkQA commented on issue #27207: [SPARK-18886][CORE] Make Locality wait time 
measure resource under utilization due to delay scheduling.
URL: https://github.com/apache/spark/pull/27207#issuecomment-602324470
 
 
   **[Test build #120171 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120171/testReport)**
 for PR 27207 at commit 
[`97688e6`](https://github.com/apache/spark/commit/97688e618cacc8442629300600ca2d3461891715).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] HyukjinKwon commented on issue #27931: [SPARK-31002][CORE][DOC][FOLLOWUP] Add version information to the configuration of Core

2020-03-22 Thread GitBox
HyukjinKwon commented on issue #27931: [SPARK-31002][CORE][DOC][FOLLOWUP] Add 
version information to the configuration of Core
URL: https://github.com/apache/spark/pull/27931#issuecomment-602324227
 
 
   Merged to master.





[GitHub] [spark] SparkQA commented on issue #27937: [SPARK-30127][SQL] Support case class parameter for typed Scala UDF

2020-03-22 Thread GitBox
SparkQA commented on issue #27937: [SPARK-30127][SQL] Support case class 
parameter for typed Scala UDF
URL: https://github.com/apache/spark/pull/27937#issuecomment-602324100
 
 
   **[Test build #120173 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120173/testReport)** for PR 27937 at commit [`8e82f3f`](https://github.com/apache/spark/commit/8e82f3f75a770fc9c6163a483f297eac38c30edd).





[GitHub] [spark] Ngone51 commented on a change in pull request #27937: [SPARK-30127][SQL] Support case class parameter for typed Scala UDF

2020-03-22 Thread GitBox
Ngone51 commented on a change in pull request #27937: [SPARK-30127][SQL] 
Support case class parameter for typed Scala UDF
URL: https://github.com/apache/spark/pull/27937#discussion_r396181906
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ##
 @@ -48,25 +46,87 @@ case class ScalaUDF(
     function: AnyRef,
     dataType: DataType,
     children: Seq[Expression],
-    inputPrimitives: Seq[Boolean],
-    inputTypes: Seq[AbstractDataType] = Nil,
+    inputEncoders: Seq[Option[ExpressionEncoder[_]]] = Nil,
     udfName: Option[String] = None,
     nullable: Boolean = true,
     udfDeterministic: Boolean = true)
   extends Expression with NonSQLExpression with UserDefinedExpression {
 
   override lazy val deterministic: Boolean = udfDeterministic && children.forall(_.deterministic)
 
+  private lazy val resolvedEnc = mutable.HashMap[Int, ExpressionEncoder[_]]()
+
   override def toString: String = s"${udfName.getOrElse("UDF")}(${children.mkString(", ")})"
 
+  /**
+   * The analyzer should be aware of Scala primitive types so as to make the
+   * UDF return null if there is any null input value of these types. On the
+   * other hand, Java UDFs can only have boxed types, thus this parameter will
+   * always be all false.
+   */
+  def inputPrimitives: Seq[Boolean] = {
+    inputEncoders.map { encoderOpt =>
+      // It's possible that some of the inputs don't have a specific encoder (e.g. `Any`)
+      if (encoderOpt.isDefined) {
+        val encoder = encoderOpt.get
+        if (encoder.isSerializedAsStruct) {
+          // struct type is not primitive
+          false
+        } else {
+          // `nullable` is false iff the type is primitive
+          !encoder.schema.head.nullable
+        }
+      } else {
+        // Any type is not primitive
+        false
+      }
+    }
+  }
+
+  /**
+   * The expected input types of this UDF, used to perform type coercion. If we do
+   * not want to perform coercion, simply use "Nil". Note that it would've been
+   * better to use Option of Seq[DataType] so we can use "None" as the case for no
+   * type coercion. However, that would require more refactoring of the codebase.
+   */
+  def inputTypes: Seq[AbstractDataType] = {
+    inputEncoders.map { encoderOpt =>
+      if (encoderOpt.isDefined) {
+        val encoder = encoderOpt.get
+        if (encoder.isSerializedAsStruct) {
+          encoder.schema
+        } else {
+          encoder.schema.head.dataType
+        }
+      } else {
+        AnyDataType
+      }
+    }
+  }
+
+  private def createToScalaConverter(i: Int, dataType: DataType): Any => Any = {
+    if (inputEncoders.isEmpty) {
+      // for untyped Scala UDF
+      CatalystTypeConverters.createToScalaConverter(dataType)
+    } else {
+      val encoder = inputEncoders(i)
+      if (encoder.isDefined && encoder.get.isSerializedAsStructForTopLevel) {
+        val enc = resolvedEnc.getOrElseUpdate(i, encoder.get.resolveAndBind())
 
 Review comment:
   Makes sense, updated.
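The `inputPrimitives` derivation in the hunk above maps each optional input encoder to a primitive-or-not flag. A simplified Python sketch of that mapping, where `FakeEncoder` is a hypothetical stand-in for Spark's `ExpressionEncoder` that models only the two facts the derivation consults:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FakeEncoder:
    # Hypothetical stand-in for ExpressionEncoder: only the two facts
    # the inputPrimitives derivation looks at are modeled here.
    is_serialized_as_struct: bool  # struct-typed inputs are never primitive
    head_nullable: bool            # nullable is False iff the type is primitive

def input_primitives(encoders: List[Optional[FakeEncoder]]) -> List[bool]:
    flags = []
    for enc in encoders:
        if enc is None:
            flags.append(False)  # no specific encoder (e.g. `Any`): not primitive
        elif enc.is_serialized_as_struct:
            flags.append(False)  # struct type is not primitive
        else:
            flags.append(not enc.head_nullable)
    return flags

# An Int-like input, a case-class (struct) input, and an `Any` input:
print(input_primitives([FakeEncoder(False, False), FakeEncoder(True, False), None]))
# → [True, False, False]
```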





[GitHub] [spark] Ngone51 commented on a change in pull request #27937: [SPARK-30127][SQL] Support case class parameter for typed Scala UDF

2020-03-22 Thread GitBox
Ngone51 commented on a change in pull request #27937: [SPARK-30127][SQL] 
Support case class parameter for typed Scala UDF
URL: https://github.com/apache/spark/pull/27937#discussion_r396181856
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala
 ##
 @@ -200,8 +200,8 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends
    */
   def register[RT: TypeTag, A1: TypeTag](name: String, func: Function1[A1, RT]): UserDefinedFunction = {
     val ScalaReflection.Schema(dataType, nullable) = ScalaReflection.schemaFor[RT]
-    val inputSchemas: Seq[Option[ScalaReflection.Schema]] = Try(ScalaReflection.schemaFor[A1]).toOption :: Nil
-    val udf = SparkUserDefinedFunction(func, dataType, inputSchemas).withName(name)
+    val inputEncoders: Seq[Option[ExpressionEncoder[_]]] = Try(ExpressionEncoder[A1]()).toOption :: Nil
 
 Review comment:
   Removed.
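The `Try(ExpressionEncoder[A1]()).toOption :: Nil` pattern above records `None` for any input type whose encoder cannot be constructed. A minimal Python analogue of that attempt-and-fall-back step (the failing thunk below is purely illustrative):

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def try_to_option(thunk: Callable[[], T]) -> Optional[T]:
    # Analogue of Scala's Try(expr).toOption: the value on success, None on failure.
    try:
        return thunk()
    except Exception:
        return None

def no_encoder():
    raise TypeError("no encoder for this type")  # illustrative failure case

print([try_to_option(lambda: "encoder-for-A1"), try_to_option(no_encoder)])
# → ['encoder-for-A1', None]
```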





[GitHub] [spark] Ngone51 commented on a change in pull request #27937: [SPARK-30127][SQL] Support case class parameter for typed Scala UDF

2020-03-22 Thread GitBox
Ngone51 commented on a change in pull request #27937: [SPARK-30127][SQL] 
Support case class parameter for typed Scala UDF
URL: https://github.com/apache/spark/pull/27937#discussion_r396181776
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ##
 @@ -48,25 +46,87 @@ case class ScalaUDF(
     function: AnyRef,
     dataType: DataType,
     children: Seq[Expression],
-    inputPrimitives: Seq[Boolean],
-    inputTypes: Seq[AbstractDataType] = Nil,
+    inputEncoders: Seq[Option[ExpressionEncoder[_]]] = Nil,
     udfName: Option[String] = None,
     nullable: Boolean = true,
     udfDeterministic: Boolean = true)
   extends Expression with NonSQLExpression with UserDefinedExpression {
 
   override lazy val deterministic: Boolean = udfDeterministic && children.forall(_.deterministic)
 
+  private lazy val resolvedEnc = mutable.HashMap[Int, ExpressionEncoder[_]]()
+
   override def toString: String = s"${udfName.getOrElse("UDF")}(${children.mkString(", ")})"
 
+  /**
+   * The analyzer should be aware of Scala primitive types so as to make the
+   * UDF return null if there is any null input value of these types. On the
+   * other hand, Java UDFs can only have boxed types, thus this parameter will
+   * always be all false.
+   */
+  def inputPrimitives: Seq[Boolean] = {
 
 Review comment:
   It can be `Nil`. Previously, the Java UDF returned `children.map(_ => false)`, which indeed has the same effect as `Nil`. Likewise, an untyped Scala UDF always passes `Nil`.
   
   But a typed Scala UDF will always have `inputPrimitives` and `inputTypes`.
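The equivalence claimed above (`Nil` versus `children.map(_ => false)`) holds because a consumer that only acts on `true` flags selects nothing in either case. A small Python sketch under that assumption (the `primitive_inputs` helper is hypothetical, not Spark code):

```python
def primitive_inputs(children, flags):
    # Hypothetical analyzer-style pass: keep only children whose flag is
    # True; positions with no flag default to False.
    padded = list(flags) + [False] * (len(children) - len(flags))
    return [child for child, flag in zip(children, padded) if flag]

children = ["c1", "c2", "c3"]
# An empty flag list and an all-False flag list select the same (empty) set:
print(primitive_inputs(children, []) == primitive_inputs(children, [False, False, False]))
# → True
```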





[GitHub] [spark] Ngone51 commented on a change in pull request #27937: [SPARK-30127][SQL] Support case class parameter for typed Scala UDF

2020-03-22 Thread GitBox
Ngone51 commented on a change in pull request #27937: [SPARK-30127][SQL] 
Support case class parameter for typed Scala UDF
URL: https://github.com/apache/spark/pull/27937#discussion_r396181782
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ##
 @@ -48,25 +46,87 @@ case class ScalaUDF(
     function: AnyRef,
     dataType: DataType,
     children: Seq[Expression],
-    inputPrimitives: Seq[Boolean],
-    inputTypes: Seq[AbstractDataType] = Nil,
+    inputEncoders: Seq[Option[ExpressionEncoder[_]]] = Nil,
     udfName: Option[String] = None,
     nullable: Boolean = true,
     udfDeterministic: Boolean = true)
   extends Expression with NonSQLExpression with UserDefinedExpression {
 
   override lazy val deterministic: Boolean = udfDeterministic && children.forall(_.deterministic)
 
+  private lazy val resolvedEnc = mutable.HashMap[Int, ExpressionEncoder[_]]()
+
   override def toString: String = s"${udfName.getOrElse("UDF")}(${children.mkString(", ")})"
 
+  /**
+   * The analyzer should be aware of Scala primitive types so as to make the
+   * UDF return null if there is any null input value of these types. On the
+   * other hand, Java UDFs can only have boxed types, thus this parameter will
+   * always be all false.
+   */
+  def inputPrimitives: Seq[Boolean] = {
+    inputEncoders.map { encoderOpt =>
+      // It's possible that some of the inputs don't have a specific encoder (e.g. `Any`)
+      if (encoderOpt.isDefined) {
+        val encoder = encoderOpt.get
+        if (encoder.isSerializedAsStruct) {
+          // struct type is not primitive
+          false
+        } else {
+          // `nullable` is false iff the type is primitive
+          !encoder.schema.head.nullable
+        }
+      } else {
+        // Any type is not primitive
+        false
+      }
+    }
+  }
+
+  /**
+   * The expected input types of this UDF, used to perform type coercion. If we do
+   * not want to perform coercion, simply use "Nil". Note that it would've been
+   * better to use Option of Seq[DataType] so we can use "None" as the case for no
+   * type coercion. However, that would require more refactoring of the codebase.
+   */
+  def inputTypes: Seq[AbstractDataType] = {
 
 Review comment:
   Similarly, the input types of a Java UDF and an untyped Scala UDF are always `Nil`.





[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-03-22 Thread GitBox
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] 
Added a new option to use non-blobstore storage when writing into blobstore 
storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396179637
 
 

 ##
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##
 @@ -125,10 +163,22 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
 
+    // Hive sets session_path as HDFS_SESSION_PATH_KEY (_hive.hdfs.session.path) in the Hive config
+    val sessionScratchDir = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog]
+      .client.getConf("_hive.hdfs.session.path", "")
 
 Review comment:
   It is tested in this unit test:
   https://github.com/apache/spark/pull/27690/files#diff-ee422d26750ba346c81b7f85b4b14577R46





[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

2020-03-22 Thread GitBox
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] 
Added a new option to use non-blobstore storage when writing into blobstore 
storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396178891
 
 

 ##
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##
 @@ -125,10 +163,22 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
 
+    // Hive sets session_path as HDFS_SESSION_PATH_KEY (_hive.hdfs.session.path) in the Hive config
+    val sessionScratchDir = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog]
+      .client.getConf("_hive.hdfs.session.path", "")
 
 Review comment:
   If `_hive.hdfs.session.path` is empty, `getMRTmpPath` uses `scratchDir` instead of `sessionScratchDir` in this line: `val sessionPath = Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir)`.
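The fallback described in the comment can be sketched in Python (a simplified model of the quoted Scala line, not the actual `getMRTmpPath` implementation):

```python
from typing import Optional

def session_path(session_scratch_dir: Optional[str], scratch_dir: str) -> str:
    # Mirrors Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir):
    # a missing or empty session scratch dir falls back to the global scratch dir.
    if session_scratch_dir:  # truthy only when non-None and non-empty
        return session_scratch_dir
    return scratch_dir

print(session_path("", "/tmp/hive"))         # → /tmp/hive
print(session_path("/sess/x", "/tmp/hive"))  # → /sess/x
```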





[GitHub] [spark] HyukjinKwon commented on a change in pull request #27928: [SPARK-31167][BUILD] Refactor how we track Python test/build dependencies

2020-03-22 Thread GitBox
HyukjinKwon commented on a change in pull request #27928: [SPARK-31167][BUILD] 
Refactor how we track Python test/build dependencies
URL: https://github.com/apache/spark/pull/27928#discussion_r396177554
 
 

 ##
 File path: dev/requirements.txt
 ##
 @@ -1,5 +1,10 @@
-flake8==3.5.0
+flake8==3.7.*
 
 Review comment:
   If we can assume these dependencies follow SemVer, we should use wildcards on the minor versions ...
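For illustration, a wildcard pin in a requirements file looks like this; the second line is a hypothetical example, not taken from dev/requirements.txt:

```
flake8==3.7.*        # any 3.7.x patch release satisfies the pin
pycodestyle==2.5.*   # hypothetical companion pin, for illustration only
```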





[GitHub] [spark] beliefer commented on issue #27497: [SPARK-30245][SQL][FOLLOWUP] Improve regex expression when pattern not changed

2020-03-22 Thread GitBox
beliefer commented on issue #27497: [SPARK-30245][SQL][FOLLOWUP] Improve regex 
expression when pattern not changed
URL: https://github.com/apache/spark/pull/27497#issuecomment-602318148
 
 
   @maropu Thanks!





[GitHub] [spark] AmplabJenkins removed a comment on issue #27959: [SPARK-31190][SQL] ScalaReflection should erasure non user defined AnyVal type

2020-03-22 Thread GitBox
AmplabJenkins removed a comment on issue #27959: [SPARK-31190][SQL] 
ScalaReflection should erasure non user defined AnyVal type
URL: https://github.com/apache/spark/pull/27959#issuecomment-602315001
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24885/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #27959: [SPARK-31190][SQL] ScalaReflection should erasure non user defined AnyVal type

2020-03-22 Thread GitBox
AmplabJenkins removed a comment on issue #27959: [SPARK-31190][SQL] 
ScalaReflection should erasure non user defined AnyVal type
URL: https://github.com/apache/spark/pull/27959#issuecomment-602314994
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27959: [SPARK-31190][SQL] ScalaReflection should erasure non user defined AnyVal type

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27959: [SPARK-31190][SQL] ScalaReflection 
should erasure non user defined AnyVal type
URL: https://github.com/apache/spark/pull/27959#issuecomment-602314994
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #27959: [SPARK-31190][SQL] ScalaReflection should erasure non user defined AnyVal type

2020-03-22 Thread GitBox
AmplabJenkins commented on issue #27959: [SPARK-31190][SQL] ScalaReflection 
should erasure non user defined AnyVal type
URL: https://github.com/apache/spark/pull/27959#issuecomment-602315001
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24885/
   Test PASSed.




