[GitHub] [spark] AmplabJenkins removed a comment on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest
AmplabJenkins removed a comment on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest URL: https://github.com/apache/spark/pull/27594#issuecomment-586574117 Can one of the admins verify this patch?
[GitHub] [spark] huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR URL: https://github.com/apache/spark/pull/27570#discussion_r379880577 ## File path: R/pkg/tests/fulltests/test_mllib_classification.R ## @@ -488,4 +488,36 @@ test_that("spark.naiveBayes", { expect_equal(class(collect(predictions)$clicked[1]), "character") }) +test_that("spark.fmClassifier", { + df <- withColumn( +suppressWarnings(createDataFrame(iris)), +"Species", otherwise(when(column("Species") == "Setosa", "Setosa"), "Not-Setosa") + ) + + model1 <- spark.fmClassifier( +df, Species ~ ., +regParam = 0.01, maxIter = 10, fitLinear = TRUE, factorSize = 3 + ) + + prediction1 <- predict(model1, df) + expect_is(prediction1, "SparkDataFrame") + expect_equal(summary(model1)$factorSize, 3) + + # Test model save/load + if (windows_with_hadoop()) { +modelPath <- tempfile(pattern = "spark-fmclassifier", fileext = ".tmp") +write.ml(model1, modelPath) +model2 <- read.ml(modelPath) + +expect_is(model2, "FMClassificationModel") + +prediction2 <- predict(model2, df) +expect_equal( + collect(drop(prediction1, c("rawPrediction", "probability"))), + collect(drop(prediction2, c("rawPrediction", "probability"))) +) + } +}) + + Review comment: nit: delete extra line This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR URL: https://github.com/apache/spark/pull/27570#discussion_r379880985 ## File path: R/pkg/R/mllib_classification.R ## @@ -649,3 +655,155 @@ setMethod("write.ml", signature(object = "NaiveBayesModel", path = "character"), function(object, path, overwrite = FALSE) { write_internal(object, path, overwrite) }) + + +#' Factorization Machines Classification Model +#' +#' \code{spark.fmClassifier} fits a factorization classification model against a SparkDataFrame. +#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make +#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' Only categorical data is supported. +#' +#' @param data a \code{SparkDataFrame} of observations and labels for model fitting. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', '.', ':', '+', and '-'. +#' @param factorSize dimensionality of the factors. +#' @param fitLinear whether to fit linear term. # TODO Can we express this with formula? +#' @param regParam the regularization parameter. +#' @param miniBatchFraction the mini-batch fraction parameter. +#' @param initStd the standard deviation of initial coefficients. +#' @param maxIter maximum iteration number. +#' @param stepSize stepSize parameter. +#' @param tol convergence tolerance of iterations. +#' @param solver solver parameter, supported options: "gd" (minibatch gradient descent) or "adamW". +#' @param thresholds in binary classification, in range [0, 1]. If the estimated probability of +#' class label 1 is > threshold, then predict 1, else 0. A high threshold +#' encourages the model to predict 0 more often; a low threshold encourages the +#' model to predict 1 more often. Note: Setting this with threshold p is +#' equivalent to setting thresholds c(1-p, p). +#' @param seed seed parameter for weights initialization. +#' @param handleInvalid How to handle invalid data (unseen labels or NULL values) in features and +#' label column of string type. +#' Supported options: "skip" (filter out rows with invalid data), +#' "error" (throw an error), "keep" (put invalid data in +#' a special additional bucket, at index numLabels). Default +#' is "error". +#' @param ... additional arguments passed to the method. +#' @return \code{spark.fmClassifier} returns a fitted Factorization Machines Classification Model. 
+#' @rdname spark.fmClassifier +#' @aliases spark.fmClassifier,SparkDataFrame,formula-method +#' @name spark.fmClassifier +#' @seealso \link{read.ml} +#' @examples +#' \dontrun{ +#' df <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm") +#' +#' # fit Factorization Machines Classification Model +#' model <- spark.fmClassifier( +#'df, label ~ features, +#'regParam = 0.01, maxIter = 10, fitLinear = TRUE +#' ) +#' +#' # get the summary of the model +#' summary(model) +#' +#' # make predictions +#' predictions <- predict(model, df) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' } +#' @note spark.fmClassifier since 3.0.0 +setMethod("spark.fmClassifier", signature(data = "SparkDataFrame", formula = "formula"), + function(data, formula, factorSize = 8, fitLinear = TRUE, regParam = 0.0, + miniBatchFraction = 1.0, initStd = 0.01, maxIter = 100, stepSize=1.0, + tol = 1e-6, solver = c("adamW", "gd"), thresholds = NULL, seed = NULL, + handleInvalid = c("error", "keep", "skip")) { Review comment: any reason why ```fitIntercept``` is not here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
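One likely answer, judging from the wrapper code quoted later in this thread (val fitIntercept = rFormula.hasIntercept), is that the intercept is meant to be controlled through the formula itself rather than by a separate argument. A sketch of what that looks like, assuming Spark's RFormula follows the usual R convention where appending - 1 drops the intercept:

```r
# Intercept fitted (default behaviour of the formula)
model_a <- spark.fmClassifier(df, label ~ features)

# Intercept dropped via the formula, the usual R idiom; the wrapper
# later reads this through rFormula.hasIntercept
model_b <- spark.fmClassifier(df, label ~ features - 1)
```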
[GitHub] [spark] huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR URL: https://github.com/apache/spark/pull/27570#discussion_r379879918 ## File path: examples/src/main/r/ml/fmClassifier.R ## @@ -0,0 +1,38 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# To run this example use +# ./bin/spark-submit examples/src/main/r/ml/decisionTree.R Review comment: decisionTree.R -> fmClassifier.R This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR URL: https://github.com/apache/spark/pull/27570#discussion_r379880122 ## File path: examples/src/main/r/ml/fmClassifier.R ## @@ -0,0 +1,38 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# To run this example use +# ./bin/spark-submit examples/src/main/r/ml/decisionTree.R + +# Load SparkR library into your R session +library(SparkR) + +# Initialize SparkSession +sparkR.session(appName = "SparkR-ML-fmclasfier-example") + +# $example on:classification$ +# Load training data +df <- read.df("data/mllib/sample_libsvm_data.txt", source = "libsvm") +training <- df +test <- df + +# Fit a FM classification model +model <- spark.fmClassifier(df, label ~ features) + Review comment: add ```summary(model)``` as an example too? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR URL: https://github.com/apache/spark/pull/27570#discussion_r379880075 ## File path: examples/src/main/r/ml/fmClassifier.R ## @@ -0,0 +1,38 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# To run this example use +# ./bin/spark-submit examples/src/main/r/ml/decisionTree.R + +# Load SparkR library into your R session +library(SparkR) + +# Initialize SparkSession +sparkR.session(appName = "SparkR-ML-fmclasfier-example") + +# $example on:classification$ +# Load training data +df <- read.df("data/mllib/sample_libsvm_data.txt", source = "libsvm") +training <- df +test <- df + +# Fit a FM classification model +model <- spark.fmClassifier(df, label ~ features) + +# Prediction +predictions <- predict(model, test) Review comment: add ```head(predictions)```? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest
maropu commented on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest URL: https://github.com/apache/spark/pull/27594#issuecomment-586677693 Since this fix is trivial, it looks fine to me as long as the tests pass. cc: @srowen
[GitHub] [spark] huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR URL: https://github.com/apache/spark/pull/27570#discussion_r379880740 ## File path: R/pkg/R/mllib_classification.R ## @@ -649,3 +655,155 @@ setMethod("write.ml", signature(object = "NaiveBayesModel", path = "character"), function(object, path, overwrite = FALSE) { write_internal(object, path, overwrite) }) + + +#' Factorization Machines Classification Model +#' +#' \code{spark.fmClassifier} fits a factorization classification model against a SparkDataFrame. +#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make +#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' Only categorical data is supported. +#' +#' @param data a \code{SparkDataFrame} of observations and labels for model fitting. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', '.', ':', '+', and '-'. +#' @param factorSize dimensionality of the factors. +#' @param fitLinear whether to fit linear term. # TODO Can we express this with formula? +#' @param regParam the regularization parameter. +#' @param miniBatchFraction the mini-batch fraction parameter. +#' @param initStd the standard deviation of initial coefficients. +#' @param maxIter maximum iteration number. +#' @param stepSize stepSize parameter. +#' @param tol convergence tolerance of iterations. +#' @param solver solver parameter, supported options: "gd" (minibatch gradient descent) or "adamW". +#' @param thresholds in binary classification, in range [0, 1]. If the estimated probability of +#' class label 1 is > threshold, then predict 1, else 0. A high threshold +#' encourages the model to predict 0 more often; a low threshold encourages the +#' model to predict 1 more often. Note: Setting this with threshold p is +#' equivalent to setting thresholds c(1-p, p). +#' @param seed seed parameter for weights initialization. +#' @param handleInvalid How to handle invalid data (unseen labels or NULL values) in features and +#' label column of string type. +#' Supported options: "skip" (filter out rows with invalid data), +#' "error" (throw an error), "keep" (put invalid data in +#' a special additional bucket, at index numLabels). Default +#' is "error". +#' @param ... additional arguments passed to the method. +#' @return \code{spark.fmClassifier} returns a fitted Factorization Machines Classification Model. +#' @rdname spark.fmClassifier +#' @aliases spark.fmClassifier,SparkDataFrame,formula-method +#' @name spark.fmClassifier +#' @seealso \link{read.ml} +#' @examples +#' \dontrun{ +#' df <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm") +#' +#' # fit Factorization Machines Classification Model +#' model <- spark.fmClassifier( +#'df, label ~ features, +#'regParam = 0.01, maxIter = 10, fitLinear = TRUE +#' ) +#' +#' # get the summary of the model +#' summary(model) +#' +#' # make predictions +#' predictions <- predict(model, df) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' } +#' @note spark.fmClassifier since 3.0.0 Review comment: 3.1.0? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest
maropu commented on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest URL: https://github.com/apache/spark/pull/27594#issuecomment-586677656 ok to test
[GitHub] [spark] huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR URL: https://github.com/apache/spark/pull/27570#discussion_r379880095 ## File path: examples/src/main/r/ml/fmClassifier.R ## @@ -0,0 +1,38 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# To run this example use +# ./bin/spark-submit examples/src/main/r/ml/decisionTree.R + +# Load SparkR library into your R session +library(SparkR) + +# Initialize SparkSession +sparkR.session(appName = "SparkR-ML-fmclasfier-example") + +# $example on:classification$ +# Load training data +df <- read.df("data/mllib/sample_libsvm_data.txt", source = "libsvm") +training <- df +test <- df + +# Fit a FM classification model +model <- spark.fmClassifier(df, label ~ features) + +# Prediction +predictions <- predict(model, test) +# $example off:classification$ Review comment: add ```sparkR.session.stop()```? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
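Taken together, the last three review comments (summary, head, session stop) would make the tail of fmClassifier.R look roughly like the sketch below, reusing the df and test variables from the quoted diff:

```r
# Fit a FM classification model
model <- spark.fmClassifier(df, label ~ features)

# Print a summary of the fitted model (suggested addition)
summary(model)

# Prediction
predictions <- predict(model, test)

# Show the first rows of the prediction DataFrame (suggested addition)
head(predictions)
# $example off:classification$

# Stop the SparkSession (suggested addition)
sparkR.session.stop()
```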
[GitHub] [spark] maropu commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
maropu commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-586678147 Looks fine now except for one comment.
[GitHub] [spark] liangz1 commented on a change in pull request #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset
liangz1 commented on a change in pull request #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset URL: https://github.com/apache/spark/pull/27565#discussion_r379881028 ## File path: python/pyspark/sql/dataframe.py ## @@ -2153,6 +2153,59 @@ def transform(self, func): "should have been DataFrame." % type(result) return result +@since(3.1) +def sameSemantics(self, other): +""" +Returns `True` when the logical query plans inside both :class:`DataFrame`\\s are equal and +therefore return same results. + +.. note:: The equality comparison here is simplified by tolerating the cosmetic differences +such as attribute names. + +.. note::This API can compare both :class:`DataFrame`\\s very fast but can still return +`False` on the :class:`DataFrame` that return the same results, for instance, from +different plans. Such false negative semantic can be useful when caching as an example. + +>>> df1 = spark.range(100) +>>> df2 = spark.range(100) +>>> df3 = spark.range(100) +>>> df4 = spark.range(100) +>>> df1.withColumn("col1", df1.id * 2).sameSemantics(df2.withColumn("col1", df2.id * 2)) +True +>>> df1.withColumn("col1", df1.id * 2).sameSemantics(df3.withColumn("col1", df3.id + 2)) +False +>>> df1.withColumn("col1", df1.id * 2).sameSemantics(df4.withColumn("col0", df4.id * 2)) +True +""" +if not isinstance(other, DataFrame): +raise ValueError("other parameter should be of DataFrame; however, got %s" + % type(other)) +return self._jdf.sameSemantics(other._jdf) + +@since(3.1) +def semanticHash(self): +""" +Returns a hash code of the logical query plan against this :class:`DataFrame`. + +.. note:: Unlike the standard hash code, the hash is calculated against the query plan +simplified by tolerating the cosmetic differences such as attribute names. + +>>> df1 = spark.range(100) +>>> df2 = spark.range(100) +>>> df3 = spark.range(100) +>>> df4 = spark.range(100) +>>> df1.withColumn("col1", df1.id * 2).semanticHash() == \ +df2.withColumn("col1", df2.id * 2).semanticHash() +True +>>> df1.withColumn("col1", df1.id * 2).semanticHash() == \ +df3.withColumn("col1", df3.id + 2).semanticHash() +False Review comment: ``` Failed example: df1.withColumn("col1", df1.id * 2).semanticHash() == df3.withColumn("col1", df3.id + 2).semanticHash() Differences (ndiff with -expected +actual): - False + True ``` Now we have another unexpected result. (Note L2176 passed, which is expected.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest
AmplabJenkins commented on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest URL: https://github.com/apache/spark/pull/27594#issuecomment-586677912 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest
AmplabJenkins commented on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest URL: https://github.com/apache/spark/pull/27594#issuecomment-586677913 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23250/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest
AmplabJenkins removed a comment on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest URL: https://github.com/apache/spark/pull/27594#issuecomment-586677912 Merged build finished. Test PASSed.
[GitHub] [spark] maropu commented on a change in pull request #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
maropu commented on a change in pull request #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#discussion_r379882153 ## File path: sql/core/src/test/resources/sql-tests/inputs/postgreSQL/comments.sql ## @@ -47,4 +45,5 @@ Now just one deep... */ 'deeply nested example' AS sixth; --QUERY-DELIMITER-END -/* and this is the end of the file */ +-- [SPARK-30824] Support submit sql content only contains comments. Review comment: Is this an ANSI-related issue? https://issues.apache.org/jira/browse/SPARK-30824 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest
AmplabJenkins removed a comment on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest URL: https://github.com/apache/spark/pull/27594#issuecomment-586678768 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118494/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest
AmplabJenkins commented on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest URL: https://github.com/apache/spark/pull/27594#issuecomment-586678766 Merged build finished. Test PASSed.
[GitHub] [spark] huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR URL: https://github.com/apache/spark/pull/27570#discussion_r379880677 ## File path: R/pkg/tests/fulltests/test_mllib_classification.R ## @@ -488,4 +488,36 @@ test_that("spark.naiveBayes", { expect_equal(class(collect(predictions)$clicked[1]), "character") }) +test_that("spark.fmClassifier", { + df <- withColumn( +suppressWarnings(createDataFrame(iris)), +"Species", otherwise(when(column("Species") == "Setosa", "Setosa"), "Not-Setosa") + ) + + model1 <- spark.fmClassifier( +df, Species ~ ., +regParam = 0.01, maxIter = 10, fitLinear = TRUE, factorSize = 3 + ) + + prediction1 <- predict(model1, df) + expect_is(prediction1, "SparkDataFrame") Review comment: Can we also check the predict result here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
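A sketch of what such a check could look like, following the pattern of the spark.naiveBayes test just above it; the decoded label column name (prediction) and the two class labels are assumptions based on the IndexToString stage and the Setosa/Not-Setosa recoding in the test fixture:

```r
# Hypothetical assertions on the decoded prediction labels
prediction_labels <- collect(prediction1)$prediction
expect_equal(class(prediction_labels[1]), "character")
expect_true(all(prediction_labels %in% c("Setosa", "Not-Setosa")))
```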
[GitHub] [spark] liangz1 commented on a change in pull request #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset
liangz1 commented on a change in pull request #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset URL: https://github.com/apache/spark/pull/27565#discussion_r379881773 ## File path: python/pyspark/sql/dataframe.py ## @@ -2153,6 +2153,59 @@ def transform(self, func): "should have been DataFrame." % type(result) return result +@since(3.1) +def sameSemantics(self, other): +""" +Returns `True` when the logical query plans inside both :class:`DataFrame`\\s are equal and +therefore return same results. + +.. note:: The equality comparison here is simplified by tolerating the cosmetic differences +such as attribute names. + +.. note::This API can compare both :class:`DataFrame`\\s very fast but can still return +`False` on the :class:`DataFrame` that return the same results, for instance, from +different plans. Such false negative semantic can be useful when caching as an example. + +>>> df1 = spark.range(100) +>>> df2 = spark.range(100) +>>> df3 = spark.range(100) +>>> df4 = spark.range(100) +>>> df1.withColumn("col1", df1.id * 2).sameSemantics(df2.withColumn("col1", df2.id * 2)) +True +>>> df1.withColumn("col1", df1.id * 2).sameSemantics(df3.withColumn("col1", df3.id + 2)) +False +>>> df1.withColumn("col1", df1.id * 2).sameSemantics(df4.withColumn("col0", df4.id * 2)) +True +""" +if not isinstance(other, DataFrame): +raise ValueError("other parameter should be of DataFrame; however, got %s" + % type(other)) +return self._jdf.sameSemantics(other._jdf) + +@since(3.1) +def semanticHash(self): +""" +Returns a hash code of the logical query plan against this :class:`DataFrame`. + +.. note:: Unlike the standard hash code, the hash is calculated against the query plan +simplified by tolerating the cosmetic differences such as attribute names. + +>>> df1 = spark.range(100) +>>> df2 = spark.range(100) +>>> df3 = spark.range(100) +>>> df4 = spark.range(100) +>>> df1.withColumn("col1", df1.id * 2).semanticHash() == \ +df2.withColumn("col1", df2.id * 2).semanticHash() +True +>>> df1.withColumn("col1", df1.id * 2).semanticHash() == \ +df3.withColumn("col1", df3.id + 2).semanticHash() +False Review comment: More tests: ``` >>> df1=spark.range(100) >>> df2=spark.range(100) >>> df3=spark.range(100) >>> df11=df1.withColumn("col1", df1.id +1) >>> df21=df2.withColumn("col1", df2.id -1) >>> df31=df3.withColumn("col1", df3.id *2) >>> df32=df3.withColumn("col1", df3.id +2) >>> df33=df3.withColumn("col1", df3.id /2) >>> df34=df3.withColumn("col1", df3.id -2) >>> df11.semanticHash() 1855039936 >>> df21.semanticHash() 1855039936 >>> df31.semanticHash() -1719131362 >>> df32.semanticHash() -1719131362 >>> df32.semanticHash() -1719131362 >>> df34.semanticHash() -706037631 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR URL: https://github.com/apache/spark/pull/27570#discussion_r379881557 ## File path: mllib/src/main/scala/org/apache/spark/ml/r/FMClassifierWrapper.scala ## @@ -0,0 +1,176 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.{Pipeline, PipelineModel} +import org.apache.spark.ml.classification.{FMClassificationModel, FMClassifier} +import org.apache.spark.ml.feature.{IndexToString, RFormula} +import org.apache.spark.ml.r.RWrapperUtils._ +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class FMClassifierWrapper private ( +val pipeline: PipelineModel, +val features: Array[String], +val labels: Array[String]) extends MLWritable { + import FMClassifierWrapper._ + + private val fmClassificationModel: FMClassificationModel = +pipeline.stages(1).asInstanceOf[FMClassificationModel] + + lazy val rFeatures: Array[String] = if (fmClassificationModel.getFitIntercept) { +Array("(Intercept)") ++ features + } else { +features + } + + lazy val rCoefficients: Array[Double] = if (fmClassificationModel.getFitIntercept) { +Array(fmClassificationModel.intercept) ++ fmClassificationModel.linear.toArray + } else { +fmClassificationModel.linear.toArray + } + + lazy val rFactors = fmClassificationModel.factors.toArray + + lazy val numClasses: Int = fmClassificationModel.numClasses + + lazy val numFeatures: Int = fmClassificationModel.numFeatures + + lazy val factorSize: Int = fmClassificationModel.getFactorSize + + def transform(dataset: Dataset[_]): DataFrame = { +pipeline.transform(dataset) + .drop(PREDICTED_LABEL_INDEX_COL) + .drop(fmClassificationModel.getFeaturesCol) + .drop(fmClassificationModel.getLabelCol) + } + + override def write: MLWriter = new FMClassifierWrapper.FMClassifierWrapperWriter(this) +} + +private[r] object FMClassifierWrapper + extends MLReadable[FMClassifierWrapper] { + + val PREDICTED_LABEL_INDEX_COL = "pred_label_idx" + val PREDICTED_LABEL_COL = "prediction" + + def fit( // scalastyle:ignore + data: DataFrame, + formula: String, + factorSize: Int, + fitLinear: Boolean, + regParam: Double, + miniBatchFraction: Double, + initStd: Double, + maxIter: Int, + stepSize: Double, + tol: Double, + solver: String, + seed: String, + thresholds: Array[Double], + handleInvalid: String): FMClassifierWrapper = { + +val rFormula = new RFormula() + .setFormula(formula) + .setForceIndexLabel(true) + .setHandleInvalid(handleInvalid) +checkDataColumns(rFormula, data) +val rFormulaModel = rFormula.fit(data) + +val 
fitIntercept = rFormula.hasIntercept + +// get labels and feature names from output schema +val (features, labels) = getFeaturesAndLabels(rFormulaModel, data) + +// assemble and fit the pipeline +val fmc = new FMClassifier() + .setFactorSize(factorSize) + .setFitLinear(fitLinear) + .setRegParam(regParam) + .setMiniBatchFraction(miniBatchFraction) + .setInitStd(initStd) + .setMaxIter(maxIter) + .setTol(tol) + .setSolver(solver) + .setFitIntercept(fitIntercept) + .setFeaturesCol(rFormula.getFeaturesCol) + .setLabelCol(rFormula.getLabelCol) + .setPredictionCol(PREDICTED_LABEL_INDEX_COL) Review comment: add ```setStepSize```? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:
[GitHub] [spark] MaxGekk commented on a change in pull request #27596: [WIP] Fix getting of time components before 1582 year
MaxGekk commented on a change in pull request #27596: [WIP] Fix getting of time components before 1582 year URL: https://github.com/apache/spark/pull/27596#discussion_r379881626 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ## @@ -290,32 +293,38 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper { } test("hours") { -var input = date(2015, 3, 18, 13, 2, 11, 0, TimeZonePST) -assert(getHours(input, TimeZonePST) === 13) -assert(getHours(input, TimeZoneGMT) === 20) -input = date(2015, 12, 8, 2, 7, 9, 0, TimeZonePST) -assert(getHours(input, TimeZonePST) === 2) -assert(getHours(input, TimeZoneGMT) === 10) +var input = date(2015, 3, 18, 13, 2, 11, 0, zonePST) +assert(getHours(input, zonePST) === 13) +assert(getHours(input, zoneGMT) === 20) +input = date(2015, 12, 8, 2, 7, 9, 0, zonePST) +assert(getHours(input, zonePST) === 2) +assert(getHours(input, zoneGMT) === 10) +input = date(10, 1, 1, 0, 0, 0, 0, zonePST) +assert(getHours(input, zonePST) === 0) Review comment: Before the changes: ```sql spark-sql> select hour(timestamp '0010-01-01 00:00:00'); 23 ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
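The same behaviour is easy to reproduce from SparkR, which gives a quick way to check the fix; a sketch, assuming the session runs in the America/Los_Angeles time zone as in the test:

```r
# Before the fix this returned 23 for a pre-1582 timestamp in PST;
# after the fix it should return 0
head(sql("SELECT hour(TIMESTAMP '0010-01-01 00:00:00')"))
```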
[GitHub] [spark] beliefer commented on a change in pull request #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder
beliefer commented on a change in pull request #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder URL: https://github.com/apache/spark/pull/27592#discussion_r379881659 ## File path: core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala ## @@ -74,7 +76,8 @@ private[spark] abstract class ConfigEntry[T] ( def defaultValue: Option[T] = None override def toString: String = { -s"ConfigEntry(key=$key, defaultValue=$defaultValueString, doc=$doc, public=$isPublic)" +s"ConfigEntry(key=$key, defaultValue=$defaultValueString, doc=$doc, " + + s"public=$isPublic, version = $version)" Review comment: OK
[GitHub] [spark] huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR
huaxingao commented on a change in pull request #27570: [SPARK-30820][SPARKR][ML] Add FMClassifier to SparkR URL: https://github.com/apache/spark/pull/27570#discussion_r379880497 ## File path: mllib/src/main/scala/org/apache/spark/ml/r/FMClassifierWrapper.scala ## @@ -0,0 +1,176 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.{Pipeline, PipelineModel} +import org.apache.spark.ml.classification.{FMClassificationModel, FMClassifier} +import org.apache.spark.ml.feature.{IndexToString, RFormula} +import org.apache.spark.ml.r.RWrapperUtils._ +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class FMClassifierWrapper private ( +val pipeline: PipelineModel, +val features: Array[String], +val labels: Array[String]) extends MLWritable { + import FMClassifierWrapper._ + + private val fmClassificationModel: FMClassificationModel = +pipeline.stages(1).asInstanceOf[FMClassificationModel] + + lazy val rFeatures: Array[String] = if (fmClassificationModel.getFitIntercept) { +Array("(Intercept)") ++ features + } else { +features + } + + lazy val rCoefficients: Array[Double] = if (fmClassificationModel.getFitIntercept) { +Array(fmClassificationModel.intercept) ++ fmClassificationModel.linear.toArray + } else { +fmClassificationModel.linear.toArray + } + + lazy val rFactors = fmClassificationModel.factors.toArray + + lazy val numClasses: Int = fmClassificationModel.numClasses + + lazy val numFeatures: Int = fmClassificationModel.numFeatures + + lazy val factorSize: Int = fmClassificationModel.getFactorSize + + def transform(dataset: Dataset[_]): DataFrame = { +pipeline.transform(dataset) + .drop(PREDICTED_LABEL_INDEX_COL) + .drop(fmClassificationModel.getFeaturesCol) + .drop(fmClassificationModel.getLabelCol) + } + + override def write: MLWriter = new FMClassifierWrapper.FMClassifierWrapperWriter(this) +} + +private[r] object FMClassifierWrapper + extends MLReadable[FMClassifierWrapper] { + + val PREDICTED_LABEL_INDEX_COL = "pred_label_idx" + val PREDICTED_LABEL_COL = "prediction" + + def fit( // scalastyle:ignore + data: DataFrame, + formula: String, + factorSize: Int, + fitLinear: Boolean, + regParam: Double, + miniBatchFraction: Double, + initStd: Double, + maxIter: Int, + stepSize: Double, + tol: Double, + solver: String, + seed: String, + thresholds: Array[Double], + handleInvalid: String): FMClassifierWrapper = { + +val rFormula = new RFormula() + .setFormula(formula) + .setForceIndexLabel(true) + .setHandleInvalid(handleInvalid) +checkDataColumns(rFormula, data) +val rFormulaModel = rFormula.fit(data) + +val 
fitIntercept = rFormula.hasIntercept + +// get labels and feature names from output schema +val (features, labels) = getFeaturesAndLabels(rFormulaModel, data) + +// assemble and fit the pipeline +val fmc = new FMClassifier() + .setFactorSize(factorSize) + .setFitLinear(fitLinear) + .setRegParam(regParam) + .setMiniBatchFraction(miniBatchFraction) + .setInitStd(initStd) + .setMaxIter(maxIter) + .setTol(tol) + .setSolver(solver) + .setFitIntercept(fitIntercept) + .setFeaturesCol(rFormula.getFeaturesCol) + .setLabelCol(rFormula.getLabelCol) + .setPredictionCol(PREDICTED_LABEL_INDEX_COL) + +if (seed != null) { Review comment: ```if (seed != null && seed.length > 0)```? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail:
[GitHub] [spark] AmplabJenkins commented on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset
AmplabJenkins commented on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset URL: https://github.com/apache/spark/pull/27565#issuecomment-586676945 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset
AmplabJenkins commented on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset URL: https://github.com/apache/spark/pull/27565#issuecomment-586676946 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23249/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest
SparkQA commented on issue #27594: [GRAPHX] [MINOR] Fix typo setRest => setDest URL: https://github.com/apache/spark/pull/27594#issuecomment-586677822 **[Test build #118494 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118494/testReport)** for PR 27594 at commit [`0bb6301`](https://github.com/apache/spark/commit/0bb630176af979773779ff2516a8f82d37970150).
[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#discussion_r379882027 ## File path: R/pkg/R/mllib_regression.R ## @@ -540,3 +546,150 @@ setMethod("write.ml", signature(object = "AFTSurvivalRegressionModel", path = "c function(object, path, overwrite = FALSE) { write_internal(object, path, overwrite) }) + + +#' Factorization Machines Regression Model Model Review comment: nit: ```Model Model``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#discussion_r379882145 ## File path: R/pkg/R/mllib_regression.R ## @@ -540,3 +546,150 @@ setMethod("write.ml", signature(object = "AFTSurvivalRegressionModel", path = "c function(object, path, overwrite = FALSE) { write_internal(object, path, overwrite) }) + + +#' Factorization Machines Regression Model Model +#' +#' \code{spark.fmRegressor} fits a factorization regression model against a SparkDataFrame. +#' Users can call \code{predict} to make +#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. Review comment: I guess also mention ```summary``` here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
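For reference, a minimal sketch of the usage pattern the paragraph would then describe, assuming the spark.fmRegressor interface from this PR and the sample dataset used elsewhere in the diff:

```r
df <- read.df("data/mllib/sample_linear_regression_data.txt", source = "libsvm")

# Fit, summarize and predict
model <- spark.fmRegressor(df, label ~ features, regParam = 0.01, maxIter = 10)
summary(model)
predictions <- predict(model, df)
head(predictions)

# Save and reload the fitted model
modelPath <- tempfile(pattern = "spark-fmregressor", fileext = ".tmp")
write.ml(model, modelPath)
savedModel <- read.ml(modelPath)
summary(savedModel)
```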
[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#discussion_r379882502 ## File path: R/pkg/tests/fulltests/test_mllib_regression.R ## @@ -551,4 +551,33 @@ test_that("spark.survreg", { } }) + +test_that("spark.fmRegressor", { + df <- suppressWarnings(createDataFrame(iris)) + + model <- spark.fmRegressor( +df, Sepal_Width ~ ., +regParam = 0.01, maxIter = 10, fitLinear = TRUE + ) + + prediction1 <- predict(model, df) + expect_is(prediction1, "SparkDataFrame") + + # Test model save/load + if (windows_with_hadoop()) { +modelPath <- tempfile(pattern = "spark-fmregressor", fileext = ".tmp") +write.ml(model, modelPath) +model2 <- read.ml(modelPath) + +expect_is(model2, "FMRegressionModel") + +prediction2 <- predict(model2, df) +expect_equal( + collect(prediction1), + collect(prediction2) +) + } +}) + + Review comment: nit: delete extra line This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#discussion_r379881943 ## File path: R/pkg/R/mllib_regression.R ## @@ -540,3 +546,150 @@ setMethod("write.ml", signature(object = "AFTSurvivalRegressionModel", path = "c function(object, path, overwrite = FALSE) { write_internal(object, path, overwrite) }) + + Review comment: nit: delete extra line? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#discussion_r379882615 ## File path: examples/src/main/r/ml/fmRegressor.R ## @@ -0,0 +1,40 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# To run this example use +# ./bin/spark-submit examples/src/main/r/ml/decisionTree.R Review comment: change this ```decisionTree.R```? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#discussion_r379882600 ## File path: examples/src/main/r/ml/fmRegressor.R ## @@ -0,0 +1,40 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# To run this example use +# ./bin/spark-submit examples/src/main/r/ml/decisionTree.R + +# Load SparkR library into your R session +library(SparkR) + +# Initialize SparkSession +sparkR.session(appName = "SparkR-ML-fmRegressor-example") + +# $example on +# Load training data +df <- read.df("data/mllib/sample_linear_regression_data.txt", source = "libsvm") +training_test <- randomSplit(df, c(0.7, 0.3)) +training <- training_test[[1]] +test <- training_test[[2]] + + Review comment: nit: delete extra line This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#discussion_r379883014 ## File path: mllib/src/main/scala/org/apache/spark/ml/r/FMRegressorWrapper.scala ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.{Pipeline, PipelineModel} +import org.apache.spark.ml.attribute.AttributeGroup +import org.apache.spark.ml.feature.RFormula +import org.apache.spark.ml.r.RWrapperUtils._ +import org.apache.spark.ml.regression.{FMRegressionModel, FMRegressor} +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class FMRegressorWrapper private ( +val pipeline: PipelineModel, +val features: Array[String]) extends MLWritable { + import FMRegressorWrapper._ + + private val fmRegressionModel: FMRegressionModel = +pipeline.stages(1).asInstanceOf[FMRegressionModel] + + lazy val rFeatures: Array[String] = if (fmRegressionModel.getFitIntercept) { +Array("(Intercept)") ++ features + } else { +features + } + + lazy val rCoefficients: Array[Double] = if (fmRegressionModel.getFitIntercept) { +Array(fmRegressionModel.intercept) ++ fmRegressionModel.linear.toArray + } else { +fmRegressionModel.linear.toArray + } + + lazy val rFactors = fmRegressionModel.factors.toArray + + lazy val numFeatures: Int = fmRegressionModel.numFeatures + + lazy val factorSize: Int = fmRegressionModel.getFactorSize + + def transform(dataset: Dataset[_]): DataFrame = { +pipeline.transform(dataset) + .drop(fmRegressionModel.getFeaturesCol) + } + + override def write: MLWriter = new FMRegressorWrapper.FMRegressorWrapperWriter(this) +} + +private[r] object FMRegressorWrapper + extends MLReadable[FMRegressorWrapper] { + + def fit( // scalastyle:ignore + data: DataFrame, + formula: String, + factorSize: Int, + fitLinear: Boolean, + regParam: Double, + miniBatchFraction: Double, + initStd: Double, + maxIter: Int, + stepSize: Double, + tol: Double, + solver: String, + seed: String, + stringIndexerOrderType: String): FMRegressorWrapper = { + +val rFormula = new RFormula() + .setFormula(formula) + .setStringIndexerOrderType(stringIndexerOrderType) +checkDataColumns(rFormula, data) +val rFormulaModel = rFormula.fit(data) + +val fitIntercept = rFormula.hasIntercept + +// get feature names from output schema +val schema = rFormulaModel.transform(data).schema +val featureAttrs = AttributeGroup.fromStructField(schema(rFormulaModel.getFeaturesCol)) + .attributes.get +val features = featureAttrs.map(_.name.get) + +// assemble and fit the pipeline +val 
fmr = new FMRegressor() + .setFactorSize(factorSize) + .setFitLinear(fitLinear) + .setRegParam(regParam) + .setMiniBatchFraction(miniBatchFraction) + .setInitStd(initStd) + .setMaxIter(maxIter) + .setTol(tol) + .setSolver(solver) + .setFitIntercept(fitIntercept) + .setFeaturesCol(rFormula.getFeaturesCol) + +if (seed != null) { Review comment: also check ```seed.length > 0```? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
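For illustration, a minimal sketch of the guard the reviewer is suggesting. The quoted hunk stops right at `if (seed != null) {`, so the body of that branch is not shown; the `setSeed` call below is an assumption made for the sketch (by analogy with the other SparkR wrappers), not a quote from the PR.

```scala
// Hypothetical sketch, not the PR's code: treat `seed` as set only when it is
// both non-null and non-empty, as the reviewer suggests.
if (seed != null && seed.length > 0) {
  fmr.setSeed(seed.toLong)  // assumed setter; the actual branch body is not in the quoted diff
}
```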
[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#discussion_r379882860 ## File path: examples/src/main/r/ml/fmRegressor.R ## @@ -0,0 +1,40 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# To run this example use +# ./bin/spark-submit examples/src/main/r/ml/decisionTree.R + +# Load SparkR library into your R session +library(SparkR) + +# Initialize SparkSession +sparkR.session(appName = "SparkR-ML-fmRegressor-example") + +# $example on +# Load training data +df <- read.df("data/mllib/sample_linear_regression_data.txt", source = "libsvm") +training_test <- randomSplit(df, c(0.7, 0.3)) +training <- training_test[[1]] +test <- training_test[[2]] + + +# Fit a FM regression model +model <- spark.fmRegressor(training, label ~ features) + +# Prediction +predictions <- predict(model, test) Review comment: same as the classifier example, I guess add ```summary(model)```, ```head(predictions)``` and also add ```sparkR.session.stop()``` in the end? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#discussion_r379882920 ## File path: mllib/src/main/scala/org/apache/spark/ml/r/FMRegressorWrapper.scala ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.r + +import org.apache.hadoop.fs.Path +import org.json4s._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods._ + +import org.apache.spark.ml.{Pipeline, PipelineModel} +import org.apache.spark.ml.attribute.AttributeGroup +import org.apache.spark.ml.feature.RFormula +import org.apache.spark.ml.r.RWrapperUtils._ +import org.apache.spark.ml.regression.{FMRegressionModel, FMRegressor} +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} + +private[r] class FMRegressorWrapper private ( +val pipeline: PipelineModel, +val features: Array[String]) extends MLWritable { + import FMRegressorWrapper._ + + private val fmRegressionModel: FMRegressionModel = +pipeline.stages(1).asInstanceOf[FMRegressionModel] + + lazy val rFeatures: Array[String] = if (fmRegressionModel.getFitIntercept) { +Array("(Intercept)") ++ features + } else { +features + } + + lazy val rCoefficients: Array[Double] = if (fmRegressionModel.getFitIntercept) { +Array(fmRegressionModel.intercept) ++ fmRegressionModel.linear.toArray + } else { +fmRegressionModel.linear.toArray + } + + lazy val rFactors = fmRegressionModel.factors.toArray + + lazy val numFeatures: Int = fmRegressionModel.numFeatures + + lazy val factorSize: Int = fmRegressionModel.getFactorSize + + def transform(dataset: Dataset[_]): DataFrame = { +pipeline.transform(dataset) + .drop(fmRegressionModel.getFeaturesCol) + } + + override def write: MLWriter = new FMRegressorWrapper.FMRegressorWrapperWriter(this) +} + +private[r] object FMRegressorWrapper + extends MLReadable[FMRegressorWrapper] { + + def fit( // scalastyle:ignore + data: DataFrame, + formula: String, + factorSize: Int, + fitLinear: Boolean, + regParam: Double, + miniBatchFraction: Double, + initStd: Double, + maxIter: Int, + stepSize: Double, + tol: Double, + solver: String, + seed: String, + stringIndexerOrderType: String): FMRegressorWrapper = { + +val rFormula = new RFormula() + .setFormula(formula) + .setStringIndexerOrderType(stringIndexerOrderType) +checkDataColumns(rFormula, data) +val rFormulaModel = rFormula.fit(data) + +val fitIntercept = rFormula.hasIntercept + +// get feature names from output schema +val schema = rFormulaModel.transform(data).schema +val featureAttrs = AttributeGroup.fromStructField(schema(rFormulaModel.getFeaturesCol)) + .attributes.get +val features = featureAttrs.map(_.name.get) + +// assemble and fit the pipeline +val 
fmr = new FMRegressor() + .setFactorSize(factorSize) + .setFitLinear(fitLinear) + .setRegParam(regParam) + .setMiniBatchFraction(miniBatchFraction) + .setInitStd(initStd) + .setMaxIter(maxIter) + .setTol(tol) + .setSolver(solver) + .setFitIntercept(fitIntercept) + .setFeaturesCol(rFormula.getFeaturesCol) Review comment: add ```setStepSize```? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
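Applied to the quoted hunk, the suggestion would roughly look like the excerpt below. All setters except the marked one are copied from the PR's builder chain; `setStepSize` is the addition being asked for, and `stepSize` is already a parameter of the surrounding `fit` method, so the excerpt only makes sense in that context.

```scala
// Excerpt-style sketch of the builder chain with the suggested setter added.
val fmr = new FMRegressor()
  .setFactorSize(factorSize)
  .setFitLinear(fitLinear)
  .setRegParam(regParam)
  .setMiniBatchFraction(miniBatchFraction)
  .setInitStd(initStd)
  .setMaxIter(maxIter)
  .setStepSize(stepSize)   // suggested addition: forward the stepSize argument
  .setTol(tol)
  .setSolver(solver)
  .setFitIntercept(fitIntercept)
  .setFeaturesCol(rFormula.getFeaturesCol)
```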
[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#discussion_r379882476 ## File path: R/pkg/tests/fulltests/test_mllib_regression.R ## @@ -551,4 +551,33 @@ test_that("spark.survreg", { } }) + +test_that("spark.fmRegressor", { + df <- suppressWarnings(createDataFrame(iris)) + + model <- spark.fmRegressor( +df, Sepal_Width ~ ., +regParam = 0.01, maxIter = 10, fitLinear = TRUE + ) + + prediction1 <- predict(model, df) + expect_is(prediction1, "SparkDataFrame") Review comment: I guess we may want to check the predict result too? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#discussion_r379882343 ## File path: R/pkg/R/mllib_regression.R ## @@ -540,3 +546,150 @@ setMethod("write.ml", signature(object = "AFTSurvivalRegressionModel", path = "c function(object, path, overwrite = FALSE) { write_internal(object, path, overwrite) }) + + +#' Factorization Machines Regression Model Model +#' +#' \code{spark.fmRegressor} fits a factorization regression model against a SparkDataFrame. +#' Users can call \code{predict} to make +#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' @param data a \code{SparkDataFrame} of observations and labels for model fitting. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', '.', ':', '+', and '-'. +#' @param factorSize dimensionality of the factors. +#' @param fitLinear whether to fit linear term. # TODO Can we express this with formula? +#' @param regParam the regularization parameter. +#' @param miniBatchFraction the mini-batch fraction parameter. +#' @param initStd the standard deviation of initial coefficients. +#' @param maxIter maximum iteration number. +#' @param stepSize stepSize parameter. +#' @param tol convergence tolerance of iterations. +#' @param solver solver parameter, supported options: "gd" (minibatch gradient descent) or "adamW". +#' @param seed seed parameter for weights initialization. +#' @param stringIndexerOrderType how to order categories of a string feature column. This is used to +#' decide the base level of a string feature as the last category +#' after ordering is dropped when encoding strings. Supported options +#' are "frequencyDesc", "frequencyAsc", "alphabetDesc", and +#' "alphabetAsc". The default value is "frequencyDesc". When the +#' ordering is set to "alphabetDesc", this drops the same category +#' as R when encoding strings. +#' @param ... additional arguments passed to the method. +#' @return \code{spark.fmRegressor} returns a fitted Factorization Machines Regression Model. 
+#' +#' @rdname spark.fmRegressor +#' @aliases spark.fmRegressor,SparkDataFrame,formula-method +#' @name spark.fmRegressor +#' @seealso \link{read.ml} +#' @examples +#' \dontrun{ +#' df <- read.df("data/mllib/sample_linear_regression_data.txt", source = "libsvm") +#' +#' # fit Factorization Machines Regression Model +#' model <- spark.fmRegressor( +#'df, label ~ features, +#'regParam = 0.01, maxIter = 10, fitLinear = TRUE +#' ) +#' +#' # get the summary of the model +#' summary(model) +#' +#' # make predictions +#' predictions <- predict(model, df) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' } +#' @note spark.fmRegressor since 3.1.0 +setMethod("spark.fmRegressor", signature(data = "SparkDataFrame", formula = "formula"), + function(data, formula, factorSize = 8, fitLinear = TRUE, regParam = 0.0, + miniBatchFraction = 1.0, initStd = 0.01, maxIter = 100, stepSize=1.0, + tol = 1e-6, solver = c("adamW", "gd"), seed = NULL, + stringIndexerOrderType = c("frequencyDesc", "frequencyAsc", + "alphabetDesc", "alphabetAsc")) { + +formula <- paste(deparse(formula), collapse = "") + +if (!is.null(seed)) { + seed <- as.character(as.integer(seed)) +} + +solver <- match.arg(solver) +stringIndexerOrderType <- match.arg(stringIndexerOrderType) + +jobj <- callJStatic("org.apache.spark.ml.r.FMRegressorWrapper", +"fit", +data@sdf, +formula, +as.integer(factorSize), +as.logical(fitLinear), +as.numeric(regParam), +as.numeric(miniBatchFraction), +as.numeric(initStd), +as.integer(maxIter), +as.numeric(stepSize), +as.numeric(tol), +solver, +seed, +stringIndexerOrderType) +new("FMRegressionModel", jobj = jobj) + }) + + Review comment: nit: delete extra line?
[GitHub] [spark] huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR
huaxingao commented on a change in pull request #27571: [SPARK-30819][SPARKR][ML] Add FMRegressor wrapper to SparkR URL: https://github.com/apache/spark/pull/27571#discussion_r379882358 ## File path: R/pkg/R/mllib_regression.R ## @@ -540,3 +546,150 @@ setMethod("write.ml", signature(object = "AFTSurvivalRegressionModel", path = "c function(object, path, overwrite = FALSE) { write_internal(object, path, overwrite) }) + + +#' Factorization Machines Regression Model Model +#' +#' \code{spark.fmRegressor} fits a factorization regression model against a SparkDataFrame. +#' Users can call \code{predict} to make +#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models. +#' +#' @param data a \code{SparkDataFrame} of observations and labels for model fitting. +#' @param formula a symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', '.', ':', '+', and '-'. +#' @param factorSize dimensionality of the factors. +#' @param fitLinear whether to fit linear term. # TODO Can we express this with formula? +#' @param regParam the regularization parameter. +#' @param miniBatchFraction the mini-batch fraction parameter. +#' @param initStd the standard deviation of initial coefficients. +#' @param maxIter maximum iteration number. +#' @param stepSize stepSize parameter. +#' @param tol convergence tolerance of iterations. +#' @param solver solver parameter, supported options: "gd" (minibatch gradient descent) or "adamW". +#' @param seed seed parameter for weights initialization. +#' @param stringIndexerOrderType how to order categories of a string feature column. This is used to +#' decide the base level of a string feature as the last category +#' after ordering is dropped when encoding strings. Supported options +#' are "frequencyDesc", "frequencyAsc", "alphabetDesc", and +#' "alphabetAsc". The default value is "frequencyDesc". When the +#' ordering is set to "alphabetDesc", this drops the same category +#' as R when encoding strings. +#' @param ... additional arguments passed to the method. +#' @return \code{spark.fmRegressor} returns a fitted Factorization Machines Regression Model. 
+#' +#' @rdname spark.fmRegressor +#' @aliases spark.fmRegressor,SparkDataFrame,formula-method +#' @name spark.fmRegressor +#' @seealso \link{read.ml} +#' @examples +#' \dontrun{ +#' df <- read.df("data/mllib/sample_linear_regression_data.txt", source = "libsvm") +#' +#' # fit Factorization Machines Regression Model +#' model <- spark.fmRegressor( +#'df, label ~ features, +#'regParam = 0.01, maxIter = 10, fitLinear = TRUE +#' ) +#' +#' # get the summary of the model +#' summary(model) +#' +#' # make predictions +#' predictions <- predict(model, df) +#' +#' # save and load the model +#' path <- "path/to/model" +#' write.ml(model, path) +#' savedModel <- read.ml(path) +#' summary(savedModel) +#' } +#' @note spark.fmRegressor since 3.1.0 +setMethod("spark.fmRegressor", signature(data = "SparkDataFrame", formula = "formula"), + function(data, formula, factorSize = 8, fitLinear = TRUE, regParam = 0.0, + miniBatchFraction = 1.0, initStd = 0.01, maxIter = 100, stepSize=1.0, + tol = 1e-6, solver = c("adamW", "gd"), seed = NULL, + stringIndexerOrderType = c("frequencyDesc", "frequencyAsc", + "alphabetDesc", "alphabetAsc")) { + +formula <- paste(deparse(formula), collapse = "") + +if (!is.null(seed)) { + seed <- as.character(as.integer(seed)) +} + +solver <- match.arg(solver) +stringIndexerOrderType <- match.arg(stringIndexerOrderType) + +jobj <- callJStatic("org.apache.spark.ml.r.FMRegressorWrapper", +"fit", +data@sdf, +formula, +as.integer(factorSize), +as.logical(fitLinear), +as.numeric(regParam), +as.numeric(miniBatchFraction), +as.numeric(initStd), +as.integer(maxIter), +as.numeric(stepSize), +as.numeric(tol), +solver, +seed, +stringIndexerOrderType) +new("FMRegressionModel", jobj = jobj) + }) + + +# Returns the summary of a FM Regression model produced by \code{spark.fmRegressor} + +#' @param
[GitHub] [spark] maropu commented on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder
maropu commented on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder URL: https://github.com/apache/spark/pull/27592#issuecomment-586677152 For reviewers, can you add a screenshot of an HTML document generated by `gen-sql-config-docs.py` in the PR description? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset
SparkQA commented on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset URL: https://github.com/apache/spark/pull/27565#issuecomment-586676866 **[Test build #118493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118493/testReport)** for PR 27565 at commit [`61f7ca1`](https://github.com/apache/spark/commit/61f7ca11af14d399d0e2512c51c2f37c4aa4a38f). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on a change in pull request #27596: [WIP] Fix getting of time components before 1582 year
MaxGekk commented on a change in pull request #27596: [WIP] Fix getting of time components before 1582 year URL: https://github.com/apache/spark/pull/27596#discussion_r379881509 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ## @@ -290,32 +293,38 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper { } test("hours") { -var input = date(2015, 3, 18, 13, 2, 11, 0, TimeZonePST) -assert(getHours(input, TimeZonePST) === 13) -assert(getHours(input, TimeZoneGMT) === 20) -input = date(2015, 12, 8, 2, 7, 9, 0, TimeZonePST) -assert(getHours(input, TimeZonePST) === 2) -assert(getHours(input, TimeZoneGMT) === 10) +var input = date(2015, 3, 18, 13, 2, 11, 0, zonePST) +assert(getHours(input, zonePST) === 13) +assert(getHours(input, zoneGMT) === 20) +input = date(2015, 12, 8, 2, 7, 9, 0, zonePST) +assert(getHours(input, zonePST) === 2) +assert(getHours(input, zoneGMT) === 10) +input = date(10, 1, 1, 0, 0, 0, 0, zonePST) +assert(getHours(input, zonePST) === 0) Review comment: This is new test. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder
maropu commented on a change in pull request #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder URL: https://github.com/apache/spark/pull/27592#discussion_r379881204 ## File path: core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala ## @@ -74,7 +76,8 @@ private[spark] abstract class ConfigEntry[T] ( def defaultValue: Option[T] = None override def toString: String = { -s"ConfigEntry(key=$key, defaultValue=$defaultValueString, doc=$doc, public=$isPublic)" +s"ConfigEntry(key=$key, defaultValue=$defaultValueString, doc=$doc, " + + s"public=$isPublic, version = $version)" Review comment: nit: `version=$version` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
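The nit concerns only the spacing inside the interpolated string; with it applied, the quoted `toString` would read roughly as in this sketch:

```scala
// Sketch with the nit applied: "version=$version" instead of "version = $version".
override def toString: String = {
  s"ConfigEntry(key=$key, defaultValue=$defaultValueString, doc=$doc, " +
    s"public=$isPublic, version=$version)"
}
```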
[GitHub] [spark] AmplabJenkins removed a comment on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset
AmplabJenkins removed a comment on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset URL: https://github.com/apache/spark/pull/27565#issuecomment-586676945 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset
AmplabJenkins removed a comment on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset URL: https://github.com/apache/spark/pull/27565#issuecomment-586676946 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23249/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] liangz1 commented on a change in pull request #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset
liangz1 commented on a change in pull request #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset URL: https://github.com/apache/spark/pull/27565#discussion_r379882384 ## File path: python/pyspark/sql/dataframe.py ## @@ -2153,6 +2153,59 @@ def transform(self, func): "should have been DataFrame." % type(result) return result +@since(3.1) +def sameSemantics(self, other): +""" +Returns `True` when the logical query plans inside both :class:`DataFrame`\\s are equal and +therefore return same results. + +.. note:: The equality comparison here is simplified by tolerating the cosmetic differences +such as attribute names. + +.. note::This API can compare both :class:`DataFrame`\\s very fast but can still return +`False` on the :class:`DataFrame` that return the same results, for instance, from +different plans. Such false negative semantic can be useful when caching as an example. + +>>> df1 = spark.range(100) +>>> df2 = spark.range(100) +>>> df3 = spark.range(100) +>>> df4 = spark.range(100) +>>> df1.withColumn("col1", df1.id * 2).sameSemantics(df2.withColumn("col1", df2.id * 2)) +True +>>> df1.withColumn("col1", df1.id * 2).sameSemantics(df3.withColumn("col1", df3.id + 2)) +False +>>> df1.withColumn("col1", df1.id * 2).sameSemantics(df4.withColumn("col0", df4.id * 2)) +True +""" +if not isinstance(other, DataFrame): +raise ValueError("other parameter should be of DataFrame; however, got %s" + % type(other)) +return self._jdf.sameSemantics(other._jdf) + +@since(3.1) +def semanticHash(self): +""" +Returns a hash code of the logical query plan against this :class:`DataFrame`. + +.. note:: Unlike the standard hash code, the hash is calculated against the query plan +simplified by tolerating the cosmetic differences such as attribute names. + +>>> df1 = spark.range(100) +>>> df2 = spark.range(100) +>>> df3 = spark.range(100) +>>> df4 = spark.range(100) +>>> df1.withColumn("col1", df1.id * 2).semanticHash() == \ +df2.withColumn("col1", df2.id * 2).semanticHash() +True +>>> df1.withColumn("col1", df1.id * 2).semanticHash() == \ +df3.withColumn("col1", df3.id + 2).semanticHash() +False Review comment: Same behavior for dataframe from `spark.read.load()` ``` >>> df4=spark.read.load(csv_file_path, format="csv", inferSchema="true", header="true") >>> df4.schema StructType(List(StructField(bool_col,BooleanType,true),StructField(float_col,DoubleType,true),StructField(double_col,DoubleType,true),StructField(int_col,IntegerType,true),StructField(long_col,IntegerType,true))) >>> df4.withColumn("col1", df4.int_col *2).semanticHash() -1746346451 >>> df4.withColumn("col1", df4.int_col +2).semanticHash() -1746346451 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on a change in pull request #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
HeartSaVioR commented on a change in pull request #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#discussion_r379874303 ## File path: docs/monitoring.md ## @@ -95,6 +95,44 @@ The history server can be configured as follows: +### Applying compaction of old event log files + +A long-running streaming application can bring a huge single event log file which may cost a lot to maintain and +also requires a bunch of resource to replay per each update in Spark History Server. + +Enabling spark.eventLog.rolling.enabled and spark.eventLog.rolling.maxFileSize would +let you have multiple event log files instead of single huge event log file which may help some scenarios on its own, +but it still doesn't help you reducing the overall size of logs. + +Spark History Server can apply 'compaction' on the rolling event log files to reduce the overall size of +logs, via setting the configuration spark.history.fs.eventLog.rolling.maxFilesToRetain on the +Spark History Server. + +When the compaction happens, History Server lists all the available event log files, and considers the event log files older than +retained log files as a target of compaction. For example, if the application A has 5 event log files and +spark.history.fs.eventLog.rolling.maxFilesToRetain is set to 2, first 3 log files will be selected to be compacted. + +Once it selects the files, it analyzes these files to figure out which events can be excluded, and rewrites these files Review comment: I'll try to rephrase - maybe we can refer as "target" or "candidates" instead of "files". This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
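Since the quoted monitoring.md text only names the configuration keys, here is a hedged sketch of how an application might enable rolling event logs so that compaction has files to work on; the values are illustrative only, and the history-server setting is shown as a comment because it belongs to the History Server's configuration rather than the application's.

```scala
import org.apache.spark.sql.SparkSession

// Application side: write rolling event log files instead of one huge file.
// Illustrative values, not recommendations.
val spark = SparkSession.builder()
  .appName("rolling-event-log-sketch")
  .config("spark.eventLog.enabled", "true")
  .config("spark.eventLog.rolling.enabled", "true")
  .config("spark.eventLog.rolling.maxFileSize", "128m")
  .getOrCreate()

// History Server side (e.g. in its spark-defaults.conf), mirroring the doc's example
// where only the 2 newest event log files are kept as-is and older ones are compacted:
//   spark.history.fs.eventLog.rolling.maxFilesToRetain  2
```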
[GitHub] [spark] HeartSaVioR commented on a change in pull request #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
HeartSaVioR commented on a change in pull request #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#discussion_r379874393 ## File path: docs/monitoring.md ## @@ -95,6 +95,44 @@ The history server can be configured as follows: +### Applying compaction of old event log files + +A long-running streaming application can bring a huge single event log file which may cost a lot to maintain and +also requires a bunch of resource to replay per each update in Spark History Server. + +Enabling spark.eventLog.rolling.enabled and spark.eventLog.rolling.maxFileSize would +let you have multiple event log files instead of single huge event log file which may help some scenarios on its own, +but it still doesn't help you reducing the overall size of logs. + +Spark History Server can apply 'compaction' on the rolling event log files to reduce the overall size of +logs, via setting the configuration spark.history.fs.eventLog.rolling.maxFilesToRetain on the +Spark History Server. + +When the compaction happens, History Server lists all the available event log files, and considers the event log files older than +retained log files as a target of compaction. For example, if the application A has 5 event log files and +spark.history.fs.eventLog.rolling.maxFilesToRetain is set to 2, first 3 log files will be selected to be compacted. + +Once it selects the files, it analyzes these files to figure out which events can be excluded, and rewrites these files +into one compact file with discarding some events. Once rewriting is done, original log files will be deleted. Review comment: Yeah it wouldn't matter for the logic as listing event log files would take the "last" compact file, and the right side of event log files. And I'd agree to worth to mention the deletion is best effort. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on a change in pull request #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
HeartSaVioR commented on a change in pull request #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#discussion_r379874393 ## File path: docs/monitoring.md ## @@ -95,6 +95,44 @@ The history server can be configured as follows: +### Applying compaction of old event log files + +A long-running streaming application can bring a huge single event log file which may cost a lot to maintain and +also requires a bunch of resource to replay per each update in Spark History Server. + +Enabling spark.eventLog.rolling.enabled and spark.eventLog.rolling.maxFileSize would +let you have multiple event log files instead of single huge event log file which may help some scenarios on its own, +but it still doesn't help you reducing the overall size of logs. + +Spark History Server can apply 'compaction' on the rolling event log files to reduce the overall size of +logs, via setting the configuration spark.history.fs.eventLog.rolling.maxFilesToRetain on the +Spark History Server. + +When the compaction happens, History Server lists all the available event log files, and considers the event log files older than +retained log files as a target of compaction. For example, if the application A has 5 event log files and +spark.history.fs.eventLog.rolling.maxFilesToRetain is set to 2, first 3 log files will be selected to be compacted. + +Once it selects the files, it analyzes these files to figure out which events can be excluded, and rewrites these files +into one compact file with discarding some events. Once rewriting is done, original log files will be deleted. Review comment: Yeah it wouldn't matter for the entire logic as listing event log files would take the "last" compact file, and the right side of event log files. But we don't retry deleting them. I'd agree to worth to mention the deletion is best effort. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on a change in pull request #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
HeartSaVioR commented on a change in pull request #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#discussion_r379874443 ## File path: docs/monitoring.md ## @@ -95,6 +95,44 @@ The history server can be configured as follows: +### Applying compaction of old event log files + +A long-running streaming application can bring a huge single event log file which may cost a lot to maintain and +also requires a bunch of resource to replay per each update in Spark History Server. + +Enabling spark.eventLog.rolling.enabled and spark.eventLog.rolling.maxFileSize would +let you have multiple event log files instead of single huge event log file which may help some scenarios on its own, +but it still doesn't help you reducing the overall size of logs. + +Spark History Server can apply 'compaction' on the rolling event log files to reduce the overall size of +logs, via setting the configuration spark.history.fs.eventLog.rolling.maxFilesToRetain on the +Spark History Server. + +When the compaction happens, History Server lists all the available event log files, and considers the event log files older than +retained log files as a target of compaction. For example, if the application A has 5 event log files and +spark.history.fs.eventLog.rolling.maxFilesToRetain is set to 2, first 3 log files will be selected to be compacted. + +Once it selects the files, it analyzes these files to figure out which events can be excluded, and rewrites these files +into one compact file with discarding some events. Once rewriting is done, original log files will be deleted. + +The compaction tries to exclude the events which point to the outdated things like jobs, and so on. As of now, below describes +the candidates of events to be excluded: + +* Events for the job which is finished, and related stage/tasks events +* Events for the executor which is terminated +* Events for the SQL execution which is finished, and related job/stage/tasks events + +but the details can be changed afterwards. Review comment: OK agreed. Once we change the logic we may just need to change here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on a change in pull request #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
HeartSaVioR commented on a change in pull request #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#discussion_r379874489 ## File path: docs/monitoring.md ## @@ -95,6 +95,44 @@ The history server can be configured as follows: +### Applying compaction of old event log files + +A long-running streaming application can bring a huge single event log file which may cost a lot to maintain and +also requires a bunch of resource to replay per each update in Spark History Server. + +Enabling spark.eventLog.rolling.enabled and spark.eventLog.rolling.maxFileSize would +let you have multiple event log files instead of single huge event log file which may help some scenarios on its own, +but it still doesn't help you reducing the overall size of logs. + +Spark History Server can apply 'compaction' on the rolling event log files to reduce the overall size of +logs, via setting the configuration spark.history.fs.eventLog.rolling.maxFilesToRetain on the +Spark History Server. + +When the compaction happens, History Server lists all the available event log files, and considers the event log files older than +retained log files as a target of compaction. For example, if the application A has 5 event log files and +spark.history.fs.eventLog.rolling.maxFilesToRetain is set to 2, first 3 log files will be selected to be compacted. + +Once it selects the files, it analyzes these files to figure out which events can be excluded, and rewrites these files +into one compact file with discarding some events. Once rewriting is done, original log files will be deleted. + +The compaction tries to exclude the events which point to the outdated things like jobs, and so on. As of now, below describes +the candidates of events to be excluded: + +* Events for the job which is finished, and related stage/tasks events +* Events for the executor which is terminated +* Events for the SQL execution which is finished, and related job/stage/tasks events + +but the details can be changed afterwards. Review comment: I thought the effect is intuitive as we "exclude" events during rewriting, but if explicitly mentioning would make it clearer, let's do it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on a change in pull request #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
HeartSaVioR commented on a change in pull request #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#discussion_r379874721 ## File path: docs/monitoring.md ## @@ -95,6 +95,44 @@ The history server can be configured as follows: +### Applying compaction of old event log files + +A long-running streaming application can bring a huge single event log file which may cost a lot to maintain and +also requires a bunch of resource to replay per each update in Spark History Server. + +Enabling spark.eventLog.rolling.enabled and spark.eventLog.rolling.maxFileSize would +let you have multiple event log files instead of single huge event log file which may help some scenarios on its own, +but it still doesn't help you reducing the overall size of logs. + +Spark History Server can apply 'compaction' on the rolling event log files to reduce the overall size of +logs, via setting the configuration spark.history.fs.eventLog.rolling.maxFilesToRetain on the +Spark History Server. + +When the compaction happens, History Server lists all the available event log files, and considers the event log files older than +retained log files as a target of compaction. For example, if the application A has 5 event log files and +spark.history.fs.eventLog.rolling.maxFilesToRetain is set to 2, first 3 log files will be selected to be compacted. + +Once it selects the files, it analyzes these files to figure out which events can be excluded, and rewrites these files +into one compact file with discarding some events. Once rewriting is done, original log files will be deleted. + +The compaction tries to exclude the events which point to the outdated things like jobs, and so on. As of now, below describes +the candidates of events to be excluded: + +* Events for the job which is finished, and related stage/tasks events +* Events for the executor which is terminated +* Events for the SQL execution which is finished, and related job/stage/tasks events + +but the details can be changed afterwards. + +Please note that Spark History Server may not compact the old event log files if figures out not a lot of space Review comment: We already described which events are the candidates in above, so it's saying "Please note that Spark History Server may not compact the old event log files if figures out not a lot of space would be reduced during compaction because these event log files majorly fill with running jobs or SQL executions." Does it answer your question? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on a change in pull request #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
HeartSaVioR commented on a change in pull request #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#discussion_r379874734 ## File path: docs/monitoring.md ## @@ -95,6 +95,44 @@ The history server can be configured as follows: +### Applying compaction of old event log files + +A long-running streaming application can bring a huge single event log file which may cost a lot to maintain and +also requires a bunch of resource to replay per each update in Spark History Server. + +Enabling spark.eventLog.rolling.enabled and spark.eventLog.rolling.maxFileSize would +let you have multiple event log files instead of single huge event log file which may help some scenarios on its own, +but it still doesn't help you reducing the overall size of logs. + +Spark History Server can apply 'compaction' on the rolling event log files to reduce the overall size of +logs, via setting the configuration spark.history.fs.eventLog.rolling.maxFilesToRetain on the +Spark History Server. + +When the compaction happens, History Server lists all the available event log files, and considers the event log files older than +retained log files as a target of compaction. For example, if the application A has 5 event log files and +spark.history.fs.eventLog.rolling.maxFilesToRetain is set to 2, first 3 log files will be selected to be compacted. + +Once it selects the files, it analyzes these files to figure out which events can be excluded, and rewrites these files +into one compact file with discarding some events. Once rewriting is done, original log files will be deleted. + +The compaction tries to exclude the events which point to the outdated things like jobs, and so on. As of now, below describes +the candidates of events to be excluded: + +* Events for the job which is finished, and related stage/tasks events +* Events for the executor which is terminated +* Events for the SQL execution which is finished, and related job/stage/tasks events + +but the details can be changed afterwards. + +Please note that Spark History Server may not compact the old event log files if figures out not a lot of space +would be reduced during compaction. For streaming query (including Structured Streaming) we normally expect compaction Review comment: OK looks redundant. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] WinkerDu commented on a change in pull request #26971: [SPARK-30320][SQL] Fix insert overwrite to DataSource table with dynamic partition error
WinkerDu commented on a change in pull request #26971: [SPARK-30320][SQL] Fix insert overwrite to DataSource table with dynamic partition error URL: https://github.com/apache/spark/pull/26971#discussion_r379874760 ## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala ## @@ -1521,4 +1521,10 @@ package object config { .bytesConf(ByteUnit.BYTE) .createOptional + private[spark] val MAX_LOCAL_TASK_FAILURES = ConfigBuilder("spark.task.local.maxFailures") +.doc("The max failure times for a task while SparkContext running in Local mode, " + Review comment: In the UT class InsertWithMultipleTaskAttemptSuite, I don't expect speculative tasks to be launched in local mode. Actually, I made a customized commit protocol named "InsertExceptionCommitProtocol" in InsertWithMultipleTaskAttemptSuite, which overrides the commitTask method to fail the first task commit on purpose and then behaves normally for subsequent commits. This scenario is similar to what happens when a speculative task attempt fails. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
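To make the description above concrete, a hedged, self-contained sketch of what such a test-only commit protocol might look like follows. The class name comes from the comment, but the base class choice and the one-shot flag are assumptions for the sketch, not the PR's actual code.

```scala
import java.util.concurrent.atomic.AtomicBoolean

import org.apache.hadoop.mapreduce.TaskAttemptContext

import org.apache.spark.internal.io.FileCommitProtocol.TaskCommitMessage
import org.apache.spark.internal.io.HadoopMapReduceCommitProtocol

// Hypothetical sketch: fail the very first task commit on purpose, then behave
// normally, imitating a failed task attempt followed by a successful retry.
class InsertExceptionCommitProtocol(jobId: String, path: String)
  extends HadoopMapReduceCommitProtocol(jobId, path) {

  override def commitTask(taskContext: TaskAttemptContext): TaskCommitMessage = {
    if (InsertExceptionCommitProtocol.failNextCommit.compareAndSet(true, false)) {
      throw new RuntimeException("injected failure on the first task commit")
    }
    super.commitTask(taskContext)
  }
}

object InsertExceptionCommitProtocol {
  // One-shot flag; usable in local mode because driver and executors share a JVM.
  val failNextCommit = new AtomicBoolean(true)
}
```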
[GitHub] [spark] SparkQA commented on issue #27596: [WIP] Fix getting of time components before 1582 year
SparkQA commented on issue #27596: [WIP] Fix getting of time components before 1582 year URL: https://github.com/apache/spark/pull/27596#issuecomment-586669414 **[Test build #118483 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118483/testReport)** for PR 27596 at commit [`4759725`](https://github.com/apache/spark/commit/475972516fc2fe50496d480eab334a29e05c50bd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #27596: [WIP] Fix getting of time components before 1582 year
SparkQA removed a comment on issue #27596: [WIP] Fix getting of time components before 1582 year URL: https://github.com/apache/spark/pull/27596#issuecomment-586657190 **[Test build #118483 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118483/testReport)** for PR 27596 at commit [`4759725`](https://github.com/apache/spark/commit/475972516fc2fe50496d480eab334a29e05c50bd). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27596: [WIP] Fix getting of time components before 1582 year
AmplabJenkins removed a comment on issue #27596: [WIP] Fix getting of time components before 1582 year URL: https://github.com/apache/spark/pull/27596#issuecomment-586669521 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27596: [WIP] Fix getting of time components before 1582 year
AmplabJenkins commented on issue #27596: [WIP] Fix getting of time components before 1582 year URL: https://github.com/apache/spark/pull/27596#issuecomment-586669521 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27596: [WIP] Fix getting of time components before 1582 year
AmplabJenkins commented on issue #27596: [WIP] Fix getting of time components before 1582 year URL: https://github.com/apache/spark/pull/27596#issuecomment-586669523 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118483/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27596: [WIP] Fix getting of time components before 1582 year
AmplabJenkins removed a comment on issue #27596: [WIP] Fix getting of time components before 1582 year URL: https://github.com/apache/spark/pull/27596#issuecomment-586669523 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118483/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586669868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23244/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586669867 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586669868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23244/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586669867 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression
SparkQA commented on issue #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression URL: https://github.com/apache/spark/pull/27429#issuecomment-586670026 **[Test build #118486 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118486/testReport)** for PR 27429 at commit [`14ca8d3`](https://github.com/apache/spark/commit/14ca8d326fcc5035da5b7a99772aacff8edad163). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class WorkerDecommission(` * ` case class DecommissionExecutor(executorId: String) extends CoarseGrainedClusterMessage` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
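For context, SPARK-28330 is about letting a SQL query skip a fixed number of leading result rows. A minimal sketch of the kind of query the feature targets, assuming the syntax lands as `LIMIT ... OFFSET ...` on top of Spark SQL (the table and data below are made up, and the query only works on a build that already includes the feature):

```scala
import org.apache.spark.sql.SparkSession

object OffsetClauseSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("offset-clause-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical table: ten rows with an increasing id.
    (0 until 10).toDF("id").createOrReplaceTempView("t")

    // Skip the first 3 rows of the ordered result, then return the next 4.
    // The OFFSET clause itself is what this PR proposes, so the query assumes
    // the feature is available in the build under test.
    spark.sql("SELECT id FROM t ORDER BY id LIMIT 4 OFFSET 3").show()

    spark.stop()
  }
}
```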
[GitHub] [spark] SparkQA commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
SparkQA commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586670141 **[Test build #118487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118487/testReport)** for PR 27398 at commit [`841d4d0`](https://github.com/apache/spark/commit/841d4d0c2de6fae1d8cc9dbaee424079630a0540). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression
AmplabJenkins removed a comment on issue #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression URL: https://github.com/apache/spark/pull/27429#issuecomment-586670154 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression
AmplabJenkins commented on issue #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression URL: https://github.com/apache/spark/pull/27429#issuecomment-586670157 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118486/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression
AmplabJenkins removed a comment on issue #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression URL: https://github.com/apache/spark/pull/27429#issuecomment-586670157 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118486/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression
SparkQA removed a comment on issue #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression URL: https://github.com/apache/spark/pull/27429#issuecomment-586657759 **[Test build #118486 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118486/testReport)** for PR 27429 at commit [`14ca8d3`](https://github.com/apache/spark/commit/14ca8d326fcc5035da5b7a99772aacff8edad163). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression
AmplabJenkins commented on issue #27429: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression URL: https://github.com/apache/spark/pull/27429#issuecomment-586670154 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
HeartSaVioR commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586671121 @dongjoon-hyun @tgravescs @gaborgsomogyi I've addressed the review comments and updated the PR description. Since the comments are not only about typos or syntax, I guess my changes may not be enough. Please take a second look and provide suggestions. Thanks in advance! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586670958 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118487/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586670956 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
SparkQA commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586670941 **[Test build #118487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118487/testReport)** for PR 27398 at commit [`841d4d0`](https://github.com/apache/spark/commit/841d4d0c2de6fae1d8cc9dbaee424079630a0540). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586670958 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118487/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
SparkQA removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586670141 **[Test build #118487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118487/testReport)** for PR 27398 at commit [`841d4d0`](https://github.com/apache/spark/commit/841d4d0c2de6fae1d8cc9dbaee424079630a0540). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586670956 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] kiszk commented on issue #27577: [DOC] add config naming guideline
kiszk commented on issue #27577: [DOC] add config naming guideline URL: https://github.com/apache/spark/pull/27577#issuecomment-586671458 Beyond this PR, can we create a sanity checker? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
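A checker along those lines could start as a simple pattern check over proposed config keys. A minimal sketch, assuming the guideline reduces to dot-separated camelCase segments under the `spark.` prefix (the rules encoded here are illustrative, not the actual guideline text):

```scala
object ConfigNameChecker {
  // Illustrative rule only: dot-separated camelCase segments under "spark.".
  private val NamePattern = "spark\\.([a-z][a-zA-Z0-9]*\\.)*[a-z][a-zA-Z0-9]*"

  def violations(key: String): Seq[String] = {
    val errors = scala.collection.mutable.ArrayBuffer.empty[String]
    if (!key.matches(NamePattern))
      errors += s"'$key' does not match the spark.<segment>.<camelCaseSegment> pattern"
    if (key.contains("_") || key.contains("-"))
      errors += s"'$key' uses '_' or '-'; prefer camelCase segments"
    errors.toSeq
  }

  def main(args: Array[String]): Unit = {
    Seq("spark.sql.ansi.enabled", "spark.sql.my_new_flag").foreach { key =>
      val errs = violations(key)
      if (errs.isEmpty) println(s"OK    $key") else errs.foreach(e => println(s"WARN  $e"))
    }
  }
}
```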
[GitHub] [spark] AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586671326 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23245/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
SparkQA commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586671228 **[Test build #118488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118488/testReport)** for PR 27398 at commit [`803663f`](https://github.com/apache/spark/commit/803663fd3e8f6d73ee5731f0f5a0228e4f65d776). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586671325 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586671326 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23245/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586671325 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586671777 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118488/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
SparkQA commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586671757 **[Test build #118488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118488/testReport)** for PR 27398 at commit [`803663f`](https://github.com/apache/spark/commit/803663fd3e8f6d73ee5731f0f5a0228e4f65d776). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586671777 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118488/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
SparkQA removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586671228 **[Test build #118488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118488/testReport)** for PR 27398 at commit [`803663f`](https://github.com/apache/spark/commit/803663fd3e8f6d73ee5731f0f5a0228e4f65d776). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins removed a comment on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586671776 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md
AmplabJenkins commented on issue #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md URL: https://github.com/apache/spark/pull/27398#issuecomment-586671776 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset
AmplabJenkins removed a comment on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset URL: https://github.com/apache/spark/pull/27565#issuecomment-586674848 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset
AmplabJenkins commented on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset URL: https://github.com/apache/spark/pull/27565#issuecomment-586674848 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset
AmplabJenkins removed a comment on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset URL: https://github.com/apache/spark/pull/27565#issuecomment-586674850 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23246/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset
AmplabJenkins commented on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset URL: https://github.com/apache/spark/pull/27565#issuecomment-586674850 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23246/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset
SparkQA commented on issue #27565: [WIP][SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'sementicHash' methods in Dataset URL: https://github.com/apache/spark/pull/27565#issuecomment-586674793 **[Test build #118489 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118489/testReport)** for PR 27565 at commit [`a1d4ba1`](https://github.com/apache/spark/commit/a1d4ba1f33c81435da84cbeee3c7e579e5dd8061). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
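As background, the methods this PR proposes expose the analyzer's notion of plan equivalence: two Datasets are "semantically equal" when their analyzed plans are the same up to cosmetic differences such as attribute ids. A minimal usage sketch, assuming the API lands on `Dataset` as `sameSemantics` and `semanticHash` (the latter presumably the intended spelling of the title's "sementicHash"):

```scala
import org.apache.spark.sql.SparkSession

object SameSemanticsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("same-semantics-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

    // The same logical transformation written twice; the analyzed plans should
    // be equivalent even though the Dataset objects differ.
    val q1 = df.filter($"id" > 0).select($"name")
    val q2 = df.filter($"id" > 0).select($"name")

    // Proposed API (assumed signatures): sameSemantics returns Boolean,
    // semanticHash returns Int.
    println(q1.sameSemantics(q2))
    println(q1.semanticHash() == q2.semanticHash())

    spark.stop()
  }
}
```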
[GitHub] [spark] MaxGekk commented on a change in pull request #27596: [WIP] Fix getting of time components before 1582 year
MaxGekk commented on a change in pull request #27596: [WIP] Fix getting of time components before 1582 year URL: https://github.com/apache/spark/pull/27596#discussion_r379880203 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -51,7 +51,6 @@ trait TimeZoneAwareExpression extends Expression { /** Returns a copy of this expression with the specified timeZoneId. */ def withTimeZone(timeZoneId: String): TimeZoneAwareExpression - @transient lazy val timeZone: TimeZone = DateTimeUtils.getTimeZone(timeZoneId.get) Review comment: Finally, all expressions bound to the legacy `TimeZone` are gone. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
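The removed line bound the expression to the legacy `java.util.TimeZone` API; the direction of the change, as the comment suggests, is to resolve time zones through `java.time.ZoneId` instead. A minimal sketch of a ZoneId-based binding, with the trait and members simplified (these names are illustrative, not the exact Spark code):

```scala
import java.time.ZoneId
import java.util.TimeZone

// Simplified stand-in for a time-zone-aware expression: it carries an optional
// time zone id and resolves it lazily to a java.time.ZoneId.
trait ZoneIdAwareSketch {
  def timeZoneId: Option[String]

  // Resolves names like "UTC", "Europe/Berlin", or legacy short ids such as "PST".
  @transient lazy val zoneId: ZoneId =
    ZoneId.of(timeZoneId.get, ZoneId.SHORT_IDS)

  // If a caller still needs the legacy type, it can be derived on demand
  // instead of being cached as a separate lazy val.
  def legacyTimeZone: TimeZone = TimeZone.getTimeZone(zoneId)
}

object ZoneIdAwareSketchDemo extends App {
  val expr = new ZoneIdAwareSketch { val timeZoneId = Some("Europe/Berlin") }
  println(expr.zoneId)         // Europe/Berlin
  println(expr.legacyTimeZone) // legacy TimeZone for the same zone
}
```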
[GitHub] [spark] SparkQA commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark plus allowed delay
SparkQA commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark plus allowed delay URL: https://github.com/apache/spark/pull/24936#issuecomment-586675997 **[Test build #118490 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118490/testReport)** for PR 24936 at commit [`f99a528`](https://github.com/apache/spark/commit/f99a528fafa9f59a37de8fd7b824c4168d7f4e26). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
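As background for the proposed metric: in Structured Streaming the watermark is derived from the maximum observed event time minus the allowed delay, and rows whose event time falls behind the watermark are treated as too late. A minimal sketch of counting such rows over a plain collection (timestamps and threshold are made up; this is not the streaming machinery itself):

```scala
import java.time.{Duration, Instant}

object LateRowCountSketch {
  final case class Event(id: String, eventTime: Instant)

  def main(args: Array[String]): Unit = {
    val allowedDelay = Duration.ofMinutes(10)

    val events = Seq(
      Event("a", Instant.parse("2020-02-15T10:00:00Z")),
      Event("b", Instant.parse("2020-02-15T10:12:00Z")),
      Event("c", Instant.parse("2020-02-15T09:55:00Z")) // far behind the others
    )

    // Watermark: max observed event time minus the allowed delay.
    val maxEventTime = events.map(_.eventTime).maxBy(_.toEpochMilli)
    val watermark = maxEventTime.minus(allowedDelay)

    // The quantity of interest: rows whose event time is behind the watermark.
    val lateRows = events.count(_.eventTime.isBefore(watermark))
    println(s"watermark=$watermark lateRows=$lateRows")
  }
}
```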
[GitHub] [spark] AmplabJenkins removed a comment on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark plus allowed delay
AmplabJenkins removed a comment on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark plus allowed delay URL: https://github.com/apache/spark/pull/24936#issuecomment-586676073 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23247/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark plus allowed delay
AmplabJenkins commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark plus allowed delay URL: https://github.com/apache/spark/pull/24936#issuecomment-586676073 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23247/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark plus allowed delay
AmplabJenkins removed a comment on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark plus allowed delay URL: https://github.com/apache/spark/pull/24936#issuecomment-586676070 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org