[GitHub] spark pull request: [SPARK-11691][SQL] Support setting hadoop comp...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11324#issuecomment-188661197
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51935/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11691][SQL] Support setting hadoop comp...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11324#issuecomment-188660473
  
**[Test build #51935 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51935/consoleFull)** for PR 11324 at commit [`86c8c0c`](https://github.com/apache/spark/commit/86c8c0c48477170fb856e93db0c1c46018c9f7e5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/11136#discussion_r54059562
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 ---
@@ -0,0 +1,565 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.regression
+
+import breeze.stats.distributions.{Gaussian => GD}
+
+import org.apache.spark.{Logging, SparkException}
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.PredictorParams
+import org.apache.spark.ml.feature.Instance
+import org.apache.spark.ml.optim._
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.util.Identifiable
+import org.apache.spark.mllib.linalg.{BLAS, Vector}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.functions._
+
+/**
+ * Params for Generalized Linear Regression.
+ */
+private[regression] trait GeneralizedLinearRegressionBase extends PredictorParams
+  with HasFitIntercept with HasMaxIter with HasTol with HasRegParam with HasWeightCol
+  with HasSolver with Logging {
+
+  /**
+   * Param for the name of family which is a description of the error distribution
+   * to be used in the model.
+   * Supported options: "gaussian", "binomial", "poisson" and "gamma".
+   * Default is "gaussian".
+   * @group param
+   */
+  @Since("2.0.0")
+  final val family: Param[String] = new Param(this, "family",
+    "The name of family which is a description of the error distribution to be used in the " +
+      "model. Supported options: gaussian(default), binomial, poisson and gamma.",
+    ParamValidators.inArray[String](GeneralizedLinearRegression.supportedFamilyNames.toArray))
+
+  /** @group getParam */
+  @Since("2.0.0")
+  def getFamily: String = $(family)
+
+  /**
+   * Param for the name of link function which provides the relationship
+   * between the linear predictor and the mean of the distribution function.
+   * Supported options: "identity", "log", "inverse", "logit", "probit", "cloglog" and "sqrt".
+   * @group param
+   */
+  @Since("2.0.0")
+  final val link: Param[String] = new Param(this, "link", "The name of link function " +
+    "which provides the relationship between the linear predictor and the mean of the " +
+    "distribution function. Supported options: identity, log, inverse, logit, probit, " +
+    "cloglog and sqrt.",
+    ParamValidators.inArray[String](GeneralizedLinearRegression.supportedLinkNames.toArray))
+
+  /** @group getParam */
+  @Since("2.0.0")
+  def getLink: String = $(link)
+
+  import GeneralizedLinearRegression._
+  protected lazy val familyObj = Family.fromName($(family))
+  protected lazy val linkObj = if (isDefined(link)) {
+    Link.fromName($(link))
+  } else {
+    familyObj.defaultLink
+  }
+  protected lazy val familyAndLink = new FamilyAndLink(familyObj, linkObj)
+
+  @Since("2.0.0")
+  override def validateParams(): Unit = {
+    if ($(solver) == "irls") {
+      setDefault(maxIter -> 25)
+    }
+    if (isDefined(link)) {
+      require(GeneralizedLinearRegression.supportedFamilyAndLinkPairs.contains(
+        familyObj -> linkObj), s"Generalized Linear Regression with ${$(family)} family " +
+        s"does not support ${$(link)} link function.")
+    }
+  }
+}
+
+/**
+ * :: Experimental ::
+ *
+ * Fit a Generalized Linear Model ([[https://en.wikipedia.org/wiki/Generalized_linear_model]])
+ * specified by giving a symbolic description of the linear predictor (link function) and
+ * a description of the error distribution (family).
+ * It supports "gaussian", "binomial", "poisson" and "gamma" as family.
+ * Valid link functions for each family are listed below. The 
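For readers following the thread: given the params quoted above, usage after merge would presumably follow the standard builder-style ML API. A minimal sketch (`glr` and `trainingDF` are placeholder names, and `trainingDF` is assumed to be a DataFrame with "label" and "features" columns):

```scala
import org.apache.spark.ml.regression.GeneralizedLinearRegression

// Sketch only: family defaults to "gaussian"; when link is unset,
// the family's default link is used per the quoted code.
val glr = new GeneralizedLinearRegression()
  .setFamily("poisson")
  .setLink("log")
  .setMaxIter(25)  // matches the default set for the "irls" solver

// val model = glr.fit(trainingDF)  // trainingDF is a placeholder
```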

[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/11136#discussion_r54059544
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 ---

[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/11136#discussion_r54059585
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 ---

[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/11136#discussion_r54059572
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 ---

[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/11136#discussion_r54059552
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 ---

[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/11136#discussion_r54059534
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 ---
+  import GeneralizedLinearRegression._
+  protected lazy val familyObj = Family.fromName($(family))
--- End diff --

This cannot be a member `val`. Users can set param `family` multiple times. 
We should move it and `linkObj` to `fit/train`.
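A minimal sketch of the suggested change (hypothetical shape only; the actual `train` signature and model construction in the PR are not shown here), resolving family and link on each fit instead of caching them in lazy members:

```scala
// Sketch: resolve family/link inside train() so that later
// setFamily(...)/setLink(...) calls take effect on the next fit.
override protected def train(dataset: DataFrame): GeneralizedLinearRegressionModel = {
  val familyObj = Family.fromName($(family))
  val linkObj =
    if (isDefined(link)) Link.fromName($(link)) else familyObj.defaultLink
  val familyAndLink = new FamilyAndLink(familyObj, linkObj)
  // ... run the solver with familyAndLink and build the model ...
}
```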





[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/11136#discussion_r54059489
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
 ---
@@ -0,0 +1,499 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.regression
+
+import scala.util.Random
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.param.ParamsSuite
+import org.apache.spark.ml.util.MLTestingUtils
+import org.apache.spark.mllib.classification.LogisticRegressionSuite._
+import org.apache.spark.mllib.linalg.{BLAS, DenseVector, Vectors}
+import org.apache.spark.mllib.random._
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.mllib.util.TestingUtils._
+import org.apache.spark.sql.{DataFrame, Row}
+
+class GeneralizedLinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
+
+  private val seed: Int = 42
+  @transient var datasetGaussianIdentity: DataFrame = _
+  @transient var datasetGaussianLog: DataFrame = _
+  @transient var datasetGaussianInverse: DataFrame = _
+  @transient var datasetBinomial: DataFrame = _
+  @transient var datasetPoissonLog: DataFrame = _
+  @transient var datasetPoissonIdentity: DataFrame = _
+  @transient var datasetPoissonSqrt: DataFrame = _
+  @transient var datasetGammaInverse: DataFrame = _
+  @transient var datasetGammaIdentity: DataFrame = _
+  @transient var datasetGammaLog: DataFrame = _
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+
+    import GeneralizedLinearRegressionSuite._
+
+    datasetGaussianIdentity = sqlContext.createDataFrame(
+      sc.parallelize(generateGeneralizedLinearRegressionInput(
+        intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5),
+        xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+        family = "gaussian", link = "identity"), 2))
+
+    datasetGaussianLog = sqlContext.createDataFrame(
+      sc.parallelize(generateGeneralizedLinearRegressionInput(
+        intercept = 0.25, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5),
+        xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+        family = "gaussian", link = "log"), 2))
+
+    datasetGaussianInverse = sqlContext.createDataFrame(
+      sc.parallelize(generateGeneralizedLinearRegressionInput(
+        intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5),
+        xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+        family = "gaussian", link = "inverse"), 2))
+
+    datasetBinomial = {
+      val nPoints = 1
+      val coefficients = Array(-0.57997, 0.912083, -0.371077, -0.819866, 2.688191)
+      val xMean = Array(5.843, 3.057, 3.758, 1.199)
+      val xVariance = Array(0.6856, 0.1899, 3.116, 0.581)
+
+      val testData =
+        generateMultinomialLogisticInput(coefficients, xMean, xVariance, true, nPoints, seed)
+
+      sqlContext.createDataFrame(sc.parallelize(testData, 4))
+    }
+
+    datasetPoissonLog = sqlContext.createDataFrame(
+      sc.parallelize(generateGeneralizedLinearRegressionInput(
+        intercept = 0.25, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5),
+        xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+        family = "poisson", link = "log"), 2))
+
+    datasetPoissonIdentity = sqlContext.createDataFrame(
+      sc.parallelize(generateGeneralizedLinearRegressionInput(
+        intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5),
+        xVariance = Array(0.7, 1.2), nPoints = 1, seed, eps = 0.01,
+        family = "poisson", link = "identity"), 2))
+
+    datasetPoissonSqrt = sqlContext.createDataFrame(
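For intuition, the dataset construction above can be sketched outside Spark as a plain generator. This is a hypothetical Python transliteration (`generate_glm_input` and the tuple layout are illustrative, not Spark APIs): draw features around `xMean` with the given `xVariance`, form the linear predictor, invert the link function to get the mean response, and add gaussian noise scaled by `eps`.

```python
import math
import random

# Inverse link functions: map the linear predictor eta to the mean response mu.
INVERSE_LINKS = {
    "identity": lambda eta: eta,
    "log": math.exp,
    "inverse": lambda eta: 1.0 / eta,
    "sqrt": lambda eta: eta * eta,
}

def generate_glm_input(intercept, coefficients, x_mean, x_variance,
                       n_points, seed, eps, link):
    """Generate (label, features) pairs for a GLM with the given link."""
    rnd = random.Random(seed)
    points = []
    for _ in range(n_points):
        # Features drawn independently around x_mean with variance x_variance.
        x = [m + math.sqrt(v) * rnd.gauss(0, 1)
             for m, v in zip(x_mean, x_variance)]
        eta = intercept + sum(b * v for b, v in zip(coefficients, x))
        mu = INVERSE_LINKS[link](eta)
        # Label is the mean response plus small gaussian noise.
        points.append((mu + eps * rnd.gauss(0, 1), x))
    return points

sample = generate_glm_input(intercept=2.5, coefficients=[2.2, 0.6],
                            x_mean=[2.9, 10.5], x_variance=[0.7, 1.2],
                            n_points=100, seed=42, eps=0.01, link="identity")
```

With the identity link the label mean should land near intercept + b·xMean = 2.5 + 2.2·2.9 + 0.6·10.5 ≈ 15.18.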

[GitHub] spark pull request: [SPARK-12811] [ML] Estimator for Generalized L...

2016-02-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/11136#discussion_r54059524
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -156,6 +156,8 @@ private[ml] class WeightedLeastSquares(
 
 private[ml] object WeightedLeastSquares {
 
+  val MaxNumFeatures: Int = 4096
--- End diff --

add doc


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

2016-02-24 Thread vectorijk
Github user vectorijk commented on the pull request:

https://github.com/apache/spark/pull/11321#issuecomment-188658913
  
@mengxr, Thanks for replying!

Definitely, I will post a rough draft proposal on JIRA later.

On Wed, Feb 24, 2016 at 11:31 PM, Xiangrui Meng wrote:

> LGTM. Merged into master. Thanks!
>
> For GSoC, I created https://issues.apache.org/jira/browse/SPARK-13489 to
> collect some project ideas. Let's move our discussion there. If I don't
> have time to mentor a GSoC project, other committers might be interested.
> Could you prepare a draft proposal and post it on the JIRA?
>
> —
> Reply to this email directly or view it on GitHub
> .
>






[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r54059178
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -78,6 +78,9 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
   // Executors that have been lost, but for which we don't yet know the real exit reason.
   protected val executorsPendingLossReason = new HashSet[String]
 
+  // The current max executor ID, used when re-registering the appMaster
+  var currentExecutorIdCounter = 0
--- End diff --

I don't think so. Though they're in different modules, they're still under the same package; please see other variables like `hostToLocalTaskCount`.





[GitHub] spark pull request: [Minor][SPARK-13482][Configuration]Make consis...

2016-02-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11360





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r54058886
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -81,8 +82,20 @@ private[yarn] class YarnAllocator(
     new ConcurrentHashMap[ContainerId, java.lang.Boolean])
 
   @volatile private var numExecutorsRunning = 0
-  // Used to generate a unique ID per executor
-  private var executorIdCounter = 0
+
+  /**
+   * Used to generate a unique ID per executor
+   *
+   * Init `executorIdCounter`. When the AM restarts, `executorIdCounter` is reset
+   * to 0, so IDs of new executors would start from 1 and conflict with executors
+   * created before the restart. Therefore, we initialize `executorIdCounter` with
+   * the max executor ID obtained from the driver.
+   *
+   * @see SPARK-12864
+   */
+  private var executorIdCounter: Int = {
+    driverRef.askWithRetry[Int](RetrieveMaxExecutorId) + 1
--- End diff --

I see, thanks.





[GitHub] spark pull request: [Minor][SPARK-13482][Configuration]Make consis...

2016-02-24 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11360#issuecomment-188658196
  
Thanks - going to merge this in master and branch-1.6.






[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-24 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r54058736
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -81,8 +82,20 @@ private[yarn] class YarnAllocator(
     new ConcurrentHashMap[ContainerId, java.lang.Boolean])
 
   @volatile private var numExecutorsRunning = 0
-  // Used to generate a unique ID per executor
-  private var executorIdCounter = 0
+
+  /**
+   * Used to generate a unique ID per executor
+   *
+   * Init `executorIdCounter`. When the AM restarts, `executorIdCounter` is reset
+   * to 0, so IDs of new executors would start from 1 and conflict with executors
+   * created before the restart. Therefore, we initialize `executorIdCounter` with
+   * the max executor ID obtained from the driver.
+   *
+   * @see SPARK-12864
+   */
+  private var executorIdCounter: Int = {
+    driverRef.askWithRetry[Int](RetrieveMaxExecutorId) + 1
--- End diff --

Hi @jerryshao , thanks for reviewing. Allocating a new executor won't execute that code; it only requests the executor ID once, when the `YarnAllocator` is created.
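The behavior being discussed can be sketched as a tiny simulation (illustrative Python, not the actual Spark classes; `MiniDriver` and `MiniAllocator` are hypothetical names): the driver tracks the max executor ID ever registered, and a freshly created allocator seeds its counter from that value once, instead of starting from 0.

```python
class MiniDriver:
    """Stands in for the driver endpoint answering RetrieveMaxExecutorId."""
    def __init__(self):
        self.max_executor_id = 0

    def register(self, executor_id):
        self.max_executor_id = max(self.max_executor_id, executor_id)

    def retrieve_max_executor_id(self):
        return self.max_executor_id


class MiniAllocator:
    """Stands in for YarnAllocator: the counter is seeded once, at creation."""
    def __init__(self, driver):
        self.driver = driver
        # Runs once when the allocator is created -- not on every allocation.
        self.executor_id_counter = driver.retrieve_max_executor_id() + 1

    def allocate(self):
        executor_id = self.executor_id_counter
        self.executor_id_counter += 1
        self.driver.register(executor_id)
        return executor_id


driver = MiniDriver()
first_am = MiniAllocator(driver)
ids_before = [first_am.allocate() for _ in range(3)]

restarted_am = MiniAllocator(driver)  # simulated AM restart
id_after = restarted_am.allocate()    # continues past the old ids
```

Without the seeding step, `restarted_am` would hand out ID 1 again and collide with a still-running executor.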





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-24 Thread zhonghaihua
Github user zhonghaihua commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r54058368
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -78,6 +78,9 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
   // Executors that have been lost, but for which we don't yet know the real exit reason.
   protected val executorsPendingLossReason = new HashSet[String]
 
+  // The current max executor ID, used when re-registering the appMaster
+  var currentExecutorIdCounter = 0
--- End diff --

Hi @jerryshao , thanks for your comments. The master branch is different from branch-1.5.x. In the master branch, `CoarseGrainedSchedulerBackend` belongs to module `core` and `YarnSchedulerBackend` belongs to module `yarn`, while in branch-1.5.x they are in the same package. So, from my understanding, `protected` is unsuitable here, right?





[GitHub] spark pull request: [SPARK-12613] [SQL] Outer Join Elimination by ...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10566#issuecomment-188656355
  
**[Test build #51943 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51943/consoleFull)**
 for PR 10566 at commit 
[`1a9ebdf`](https://github.com/apache/spark/commit/1a9ebdff1da8b0661738db1c7cf466344261ed33).





[GitHub] spark pull request: [SPARK-13069][STREAMING]. Adds synchronous sto...

2016-02-24 Thread lin-zhao
Github user lin-zhao commented on a diff in the pull request:

https://github.com/apache/spark/pull/11176#discussion_r54057678
  
--- Diff: external/akka/src/main/scala/org/apache/spark/streaming/akka/ActorReceiver.scala ---
@@ -162,6 +177,18 @@ abstract class JavaActorReceiver extends UntypedActor {
   def store[T](item: T) {
     context.parent ! SingleItemData(item)
   }
+
+  /**
+   * Store a single item of received data to Spark's memory synchronously.
+   * These single items will be aggregated together into data blocks before
+   * being pushed into Spark's memory.
+   *
+   * As opposed to [[ActorReceiver.store[T]: Unit]], this method allows flow control
+   * (maxRate, backpressure) to block the input.
+   */
+  def storeSync[T](item: T)(timeout: Timeout) {
+    Await.ready(context.parent.ask(SingleItemDataSync(item))(timeout), timeout.duration)
--- End diff --

Addressed.
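The flow-control idea above can be sketched without Akka (a hypothetical Python analogue; `MiniReceiver`, `store_sync`, and `drain_one` are illustrative names): the producer blocks on an acknowledgement until the receiver has taken the item, which is how back-pressure can throttle the input.

```python
import queue
import threading

class MiniReceiver:
    def __init__(self):
        self._queue = queue.Queue()

    def store_sync(self, item, timeout):
        """Block the caller until the receiver acknowledges the item."""
        ack = threading.Event()
        self._queue.put((item, ack))
        if not ack.wait(timeout):
            raise TimeoutError("item was not acknowledged in time")

    def drain_one(self):
        item, ack = self._queue.get()
        ack.set()  # release the blocked producer
        return item


receiver = MiniReceiver()
result = []
consumer = threading.Thread(target=lambda: result.append(receiver.drain_one()))
consumer.start()
receiver.store_sync(7, timeout=5.0)  # returns only after drain_one has run
consumer.join()
```

The timeout plays the same role as the `Timeout` argument to `storeSync`: the producer is never blocked indefinitely.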





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r54057627
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -81,8 +82,20 @@ private[yarn] class YarnAllocator(
     new ConcurrentHashMap[ContainerId, java.lang.Boolean])
 
   @volatile private var numExecutorsRunning = 0
-  // Used to generate a unique ID per executor
-  private var executorIdCounter = 0
+
+  /**
+   * Used to generate a unique ID per executor
+   *
+   * Init `executorIdCounter`. When the AM restarts, `executorIdCounter` is reset
+   * to 0, so IDs of new executors would start from 1 and conflict with executors
+   * created before the restart. Therefore, we initialize `executorIdCounter` with
+   * the max executor ID obtained from the driver.
+   *
+   * @see SPARK-12864
+   */
+  private var executorIdCounter: Int = {
+    driverRef.askWithRetry[Int](RetrieveMaxExecutorId) + 1
--- End diff --

Is it necessary to request the executor ID every time a new executor is allocated?





[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

2016-02-24 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11321#issuecomment-188651306
  
LGTM. Merged into master. Thanks!

For GSoC, I created https://issues.apache.org/jira/browse/SPARK-13489 to 
collect some project ideas. Let's move our discussion there. If I don't have 
time to mentor a GSoC project, other committers might be interested. Could you 
prepare a draft proposal and post it on the JIRA?





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/10794#discussion_r54057531
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -78,6 +78,9 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
   // Executors that have been lost, but for which we don't yet know the real exit reason.
   protected val executorsPendingLossReason = new HashSet[String]
 
+  // The current max executor ID, used when re-registering the appMaster
+  var currentExecutorIdCounter = 0
--- End diff --

Please add the scope keyword `protected`.





[GitHub] spark pull request: [SPARK-7106][MLlib][PySpark] Support model sav...

2016-02-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11321





[GitHub] spark pull request: [SPARK-13479] [SQL] [PYTHON] Added Python API ...

2016-02-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11356





[GitHub] spark pull request: [SPARK-13479] [SQL] [PYTHON] Added Python API ...

2016-02-24 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11356#issuecomment-188647418
  
Merged into master. Thanks!





[GitHub] spark pull request: [SPARK-13454] [SQL] Allow users to drop a tabl...

2016-02-24 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/11349#discussion_r54056540
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala ---
@@ -47,6 +47,17 @@ class HiveMetastoreCatalogSuite extends SparkFunSuite with TestHiveSingleton {
     logInfo(df.queryExecution.toString)
     df.as('a).join(df.as('b), $"a.key" === $"b.key")
   }
+
+  test("SPARK-13454: drop a table with a name starting with underscore") {
+    hiveContext.range(10).write.saveAsTable("_spark13454")
+    hiveContext.sql("drop table `_spark13454`")
+
+    hiveContext.range(10).write.saveAsTable("_spark13454")
+    hiveContext.sql("drop table default.`_spark13454`")
+
+    hiveContext.range(10).write.saveAsTable("spark13454")
+    hiveContext.sql("drop table spark13454")
+  }
--- End diff --

Let's add test cases that:

1. drop temporary tables
2. drop persistent and temporary tables with the same name





[GitHub] spark pull request: [SPARK-13233][SQL][WIP] Python Dataset (basic ...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11347#issuecomment-188647094
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51926/
Test FAILed.





[GitHub] spark pull request: [SPARK-13233][SQL][WIP] Python Dataset (basic ...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11347#issuecomment-188647091
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13233][SQL][WIP] Python Dataset (basic ...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11347#issuecomment-188646971
  
**[Test build #51926 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51926/consoleFull)**
 for PR 11347 at commit 
[`e549d48`](https://github.com/apache/spark/commit/e549d48b44199a339e908ec9a807a84cb7fb80a1).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13454] [SQL] Allow users to drop a tabl...

2016-02-24 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/11349#discussion_r54055804
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -542,7 +542,14 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
     case Token("TOK_DROPTABLE",
            Token("TOK_TABNAME", tableNameParts) ::
            ifExists) =>
-      val tableName = tableNameParts.map { case Token(p, Nil) => p }.mkString(".")
+      // Hive's parser will unquote an identifier (see the rule of QuotedIdentifier in
+      // HiveLexer.g of Hive 1.2.1). So, we will quote a table name part if it starts with
+      // underscore (_). Please note that although the QuotedIdentifier rule allows backticks
+      // appearing in an identifier, Hive does not actually allow such an identifier to be
+      // a table name. So, we
+      val tableName = tableNameParts.map {
+        case Token(p, Nil) => if (p.startsWith("_")) s"`$p`" else p
+      }.mkString(".")
--- End diff --

`DropTable` also tries to unregister temporary tables of the same name using:

```scala
hiveContext.catalog.unregisterTable(TableIdentifier(tableName))
```

Now the `tableName` argument of `DropTable` is quoted when it starts with `_`, but `TableIdentifier` doesn't parse and unquote the table name, so the temporary table can't be dropped in this case.

Another problem with the above line is that `DropTable.tableName` can take the form `db.table`, but we throw the whole string into `TableIdentifier` without splitting it. But that's another bug.
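The quoting rule from the diff, restated standalone (a Python sketch; `quote_table_name` and `unquote_part` are illustrative helpers, not Spark code): backtick-quote a name part iff it starts with an underscore, then join the parts with dots. The inverse helper shows the unquoting step a `TableIdentifier`-style consumer would need for the round trip to work.

```python
def quote_part(part):
    # Quote only names Hive's parser would otherwise mangle, i.e. those
    # starting with an underscore (per the QuotedIdentifier rule in HiveLexer.g).
    return "`%s`" % part if part.startswith("_") else part

def quote_table_name(parts):
    """Join possibly-quoted name parts into db.table form."""
    return ".".join(quote_part(p) for p in parts)

def unquote_part(part):
    # The inverse step the consumer side would need before lookup.
    return part[1:-1] if part.startswith("`") and part.endswith("`") else part
```

Without `unquote_part` on the consumer side, a lookup for `` `_spark13454` `` never matches the registered name `_spark13454`, which is the bug described above.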





[GitHub] spark pull request: [SPARK-13233][SQL][WIP] Python Dataset (basic ...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11347#issuecomment-188640712
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13233][SQL][WIP] Python Dataset (basic ...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11347#issuecomment-188640714
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51925/
Test FAILed.





[GitHub] spark pull request: [SPARK-13233][SQL][WIP] Python Dataset (basic ...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11347#issuecomment-188640208
  
**[Test build #51925 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51925/consoleFull)**
 for PR 11347 at commit 
[`e549d48`](https://github.com/apache/spark/commit/e549d48b44199a339e908ec9a807a84cb7fb80a1).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-8813][SQL]Combine splits by size

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9097#issuecomment-188638097
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-8813][SQL]Combine splits by size

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9097#issuecomment-188638099
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51929/
Test PASSed.





[GitHub] spark pull request: [SPARK-8813][SQL]Combine splits by size

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9097#issuecomment-188637662
  
**[Test build #51929 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51929/consoleFull)**
 for PR 9097 at commit 
[`085ce5f`](https://github.com/apache/spark/commit/085ce5feca2294f81f9ec7a5660635be13c70a4a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12820][SQL]Resolve db.table.column

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10753#issuecomment-188634972
  
Build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12820][SQL]Resolve db.table.column

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10753#issuecomment-188634976
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51930/
Test FAILed.





[GitHub] spark pull request: [SPARK-12820][SQL]Resolve db.table.column

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10753#issuecomment-188634859
  
**[Test build #51930 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51930/consoleFull)**
 for PR 10753 at commit 
[`4155ffe`](https://github.com/apache/spark/commit/4155ffe6b01ec5d6091946877175b71ecbc823b9).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13487][SQL] User-facing RuntimeConfig i...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11364#issuecomment-188634685
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51939/
Test FAILed.





[GitHub] spark pull request: [SPARK-13487][SQL] User-facing RuntimeConfig i...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11364#issuecomment-188634677
  
**[Test build #51939 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51939/consoleFull)**
 for PR 11364 at commit 
[`232d23c`](https://github.com/apache/spark/commit/232d23cc7bf7d0b070eb8afbe0b931c02a0a7718).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13487][SQL] User-facing RuntimeConfig i...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11364#issuecomment-188634683
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13486][SQL] Move SQLConf into an intern...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11363#issuecomment-188634681
  
**[Test build #51942 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51942/consoleFull)**
 for PR 11363 at commit 
[`cb751e7`](https://github.com/apache/spark/commit/cb751e78c75ca35c0a7f2a477b1d95b2c1020d09).





[GitHub] spark pull request: [SPARK-13361][SQL] Add benchmark codes for Enc...

2016-02-24 Thread maropu
Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/11236#issuecomment-188634551
  
@nongli @rxin ping





[GitHub] spark pull request: [SPARK-13486][SQL] Move SQLConf into an intern...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11363#issuecomment-188634349
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13487][SQL] User-facing RuntimeConfig i...

2016-02-24 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11364#issuecomment-188634330
  
Note that Jenkins will fail until we merge #11363





[GitHub] spark pull request: [SPARK-13487][SQL] User-facing RuntimeConfig i...

2016-02-24 Thread rxin
Github user rxin closed the pull request at:

https://github.com/apache/spark/pull/11364





[GitHub] spark pull request: [SPARK-13486][SQL] Move SQLConf into an intern...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11363#issuecomment-188634322
  
**[Test build #51937 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51937/consoleFull)**
 for PR 11363 at commit 
[`acbb3b6`](https://github.com/apache/spark/commit/acbb3b6954df6f100d492e33e151c8288451fcf2).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13486][SQL] Move SQLConf into an intern...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11363#issuecomment-188634351
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51937/
Test FAILed.





[GitHub] spark pull request: [SPARK-13376] [SPARK-13476] [SQL] improve colu...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11354#issuecomment-188633958
  
**[Test build #51940 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51940/consoleFull)**
 for PR 11354 at commit 
[`dc576be`](https://github.com/apache/spark/commit/dc576beb9bdcf008f918e1a6665a85fa6df9a483).





[GitHub] spark pull request: [SPARK-11691][SQL] Support setting hadoop comp...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11324#issuecomment-188633964
  
**[Test build #51941 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51941/consoleFull)**
 for PR 11324 at commit 
[`86c8c0c`](https://github.com/apache/spark/commit/86c8c0c48477170fb856e93db0c1c46018c9f7e5).





[GitHub] spark pull request: [SPARK-13487][SQL] User-facing RuntimeConfig i...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11364#issuecomment-188633957
  
**[Test build #51939 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51939/consoleFull)**
 for PR 11364 at commit 
[`232d23c`](https://github.com/apache/spark/commit/232d23cc7bf7d0b070eb8afbe0b931c02a0a7718).





[GitHub] spark pull request: [SPARK-13487][SQL] User-facing RuntimeConfig i...

2016-02-24 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11364#issuecomment-188633707
  
cc @marmbrus 





[GitHub] spark pull request: [SPARK-13486][SQL] Move SQLConf into an intern...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11363#issuecomment-188633649
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13487][SQL] User-facing RuntimeConfig i...

2016-02-24 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/11364

[SPARK-13487][SQL] User-facing RuntimeConfig interface

## What changes were proposed in this pull request?
This patch creates the public API for runtime configuration and an 
implementation for it. The public runtime configuration includes configs for 
existing SQL, as well as Hadoop Configuration.

This new interface is currently dead code. It will be added to SQLContext, and to a 
session entry point for Spark, when we add that.

## How was this patch tested?
a new unit test suite
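
As a rough illustration of what such a user-facing runtime configuration interface could look like, here is a minimal sketch. The class and method names below are assumptions for the sketch, not the actual API in this PR:

```scala
// Hypothetical sketch of a user-facing runtime config: mutable string
// key-value settings with get/set/unset, as the PR description suggests.
class RuntimeConfigSketch {
  private val settings = scala.collection.mutable.HashMap.empty[String, String]

  // Setters return `this` so calls can be chained.
  def set(key: String, value: String): RuntimeConfigSketch = {
    settings(key) = value
    this
  }

  // Throws when the key is absent, mirroring a strict getter.
  def get(key: String): String =
    settings.getOrElse(key, throw new NoSuchElementException(key))

  // Non-throwing variant for optional lookups.
  def getOption(key: String): Option[String] = settings.get(key)

  def unset(key: String): Unit = settings.remove(key)
}
```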


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-13487

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11364.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11364


commit 934cb3fc395ddfb152cba063c264f35a0f9dd371
Author: Reynold Xin 
Date:   2016-02-25T06:15:19Z

[SPARK-13487][SQL] User-facing RuntimeConfig interface

commit e61e4a3c4c8fccd2d950d1804e3c52f34392ddf2
Author: Reynold Xin 
Date:   2016-02-25T06:16:36Z

Remove SparkSession

commit e197546524f847e25d87e380cf32bb5819def715
Author: Reynold Xin 
Date:   2016-02-25T06:24:18Z

unit test







[GitHub] spark pull request: [SPARK-13486][SQL] Move SQLConf into an intern...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11363#issuecomment-188633638
  
**[Test build #51936 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51936/consoleFull)**
 for PR 11363 at commit 
[`3bb44da`](https://github.com/apache/spark/commit/3bb44da10c03ffd3f8d1fe132cbf44d953867857).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13486][SQL] Move SQLConf into an intern...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11363#issuecomment-188633652
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51936/
Test FAILed.





[GitHub] spark pull request: [SPARK-11691][SQL] Support setting hadoop comp...

2016-02-24 Thread maropu
Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/11324#issuecomment-188633506
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-13479] [SQL] [PYTHON] Added Python API ...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11356#issuecomment-188633385
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51933/
Test PASSed.





[GitHub] spark pull request: [SPARK-13479] [SQL] [PYTHON] Added Python API ...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11356#issuecomment-188633381
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13479] [SQL] [PYTHON] Added Python API ...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11356#issuecomment-188633059
  
**[Test build #51933 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51933/consoleFull)**
 for PR 11356 at commit 
[`4f21f06`](https://github.com/apache/spark/commit/4f21f06a21e5ffa415ed4d63d9d1cada078d108a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10794#issuecomment-188633095
  
**[Test build #51938 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51938/consoleFull)**
 for PR 10794 at commit 
[`3a1724c`](https://github.com/apache/spark/commit/3a1724c19ad3eb9e87d9a5b10007b6c53424aac0).





[GitHub] spark pull request: [SPARK-13486][SQL] Move SQLConf into an intern...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11363#issuecomment-188632407
  
**[Test build #51937 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51937/consoleFull)**
 for PR 11363 at commit 
[`acbb3b6`](https://github.com/apache/spark/commit/acbb3b6954df6f100d492e33e151c8288451fcf2).





[GitHub] spark pull request: [SPARK-13486][SQL] Move SQLConf into an intern...

2016-02-24 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11363#issuecomment-188632012
  
cc @cloud-fan for review





[GitHub] spark pull request: [SPARK-13486][SQL] Move SQLConf into an intern...

2016-02-24 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11363#discussion_r54052828
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/test/TestSQLContext.scala ---
@@ -39,7 +39,7 @@ private[sql] class TestSQLContext(sc: SparkContext) 
extends SQLContext(sc) { sel
   super.clear()
 
   // Make sure we start with the default test configs even after clear
-  TestSQLContext.overrideConfs.map {
+  TestSQLContext.overrideConfs.foreach {
--- End diff --

found a potential bug here ...
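
For context, the issue with `map` in this diff is that it is the wrong combinator for pure side effects. A minimal sketch (the config key below is illustrative, not taken from the patch):

```scala
val overrideConfs = Map("spark.sql.shuffle.partitions" -> "5")

// `map` runs the side effect but also builds and discards an Iterable[Unit],
// which obscures intent and wastes an allocation.
val discarded: Iterable[Unit] =
  overrideConfs.map { case (k, v) => println(s"$k=$v") }

// `foreach` returns Unit and states that only the side effect matters.
overrideConfs.foreach { case (k, v) => println(s"$k=$v") }
```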


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13486][SQL] Move SQLConf into an intern...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11363#issuecomment-188631746
  
**[Test build #51936 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51936/consoleFull)**
 for PR 11363 at commit 
[`3bb44da`](https://github.com/apache/spark/commit/3bb44da10c03ffd3f8d1fe132cbf44d953867857).





[GitHub] spark pull request: [SPARK-13486][SQL] Move SQLConf into an intern...

2016-02-24 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/11363

[SPARK-13486][SQL] Move SQLConf into an internal package

## What changes were proposed in this pull request?
This patch moves SQLConf into org.apache.spark.sql.internal package to make 
it very explicit that it is internal. Soon I will also submit more API work 
that creates implementations of interfaces in this internal package.

## How was this patch tested?
If it compiles, then the refactoring should work.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-13486

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11363.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11363


commit 3bb44da10c03ffd3f8d1fe132cbf44d953867857
Author: Reynold Xin 
Date:   2016-02-25T06:10:05Z

[SPARK-13486][SQL] Move SQLConf into an internal package







[GitHub] spark pull request: [SPARK-13479] [SQL] [PYTHON] Added Python API ...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11356#issuecomment-188631576
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13479] [SQL] [PYTHON] Added Python API ...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11356#issuecomment-188631579
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51932/
Test PASSed.





[GitHub] spark pull request: [SPARK-13479] [SQL] [PYTHON] Added Python API ...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11356#issuecomment-188631382
  
**[Test build #51932 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51932/consoleFull)**
 for PR 11356 at commit 
[`3f18d78`](https://github.com/apache/spark/commit/3f18d78502ce89ab961c75afd857ffd8cfd0d5f1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13321][SQL] Support nested UNION in par...

2016-02-24 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11361#issuecomment-188629681
  
Can you take a look at why these 2 queries would take 13 mins?

```
EXPLAIN
SELECT count(1) FROM (
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL

  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL

  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL

  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL

  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src) src;


SELECT count(1) FROM (
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL

  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL

  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL

  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL

  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src UNION ALL
  SELECT key, value FROM src) src;
```


When I ran this, it stayed in the parser forever.
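For convenience, the 25-way query above can be generated programmatically rather than pasted by hand. A small sketch (the helper name is my own, not from the PR):

```python
def union_query(n=25):
    """Build an n-way UNION ALL over the Hive test table `src`."""
    selects = "\nUNION ALL\n".join("SELECT key, value FROM src" for _ in range(n))
    return "SELECT count(1) FROM (\n%s) src;" % selects

query = union_query()
print(query.count("UNION ALL"))  # 24 UNION ALLs join 25 SELECTs
```

Varying `n` makes it easy to check whether parse time grows super-linearly with the number of UNION ALL branches.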



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11691][SQL] Support setting hadoop comp...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11324#issuecomment-188627486
  
**[Test build #51934 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51934/consoleFull)**
 for PR 11324 at commit 
[`14ef39a`](https://github.com/apache/spark/commit/14ef39a3093c46072f793010e9345c8f53a49938).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11691][SQL] Support setting hadoop comp...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11324#issuecomment-188627492
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51934/
Test FAILed.





[GitHub] spark pull request: [SPARK-11691][SQL] Support setting hadoop comp...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11324#issuecomment-188627490
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11691][SQL] Support setting hadoop comp...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11324#issuecomment-188627467
  
**[Test build #51935 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51935/consoleFull)**
 for PR 11324 at commit 
[`86c8c0c`](https://github.com/apache/spark/commit/86c8c0c48477170fb856e93db0c1c46018c9f7e5).





[GitHub] spark pull request: [SPARK-13321][SQL] Support nested UNION in par...

2016-02-24 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11361#issuecomment-188627434
  
@rxin Looks like HiveCompatibilitySuite.union16 doesn't hang because of this, but it does take a long time to finish (`[info] - union16 (13 minutes, 
21 seconds)`). Since this patch only changes a parser rule, I don't think it should affect how long the union test takes to run.





[GitHub] spark pull request: [SPARK-13321][SQL] Support nested UNION in par...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11361#issuecomment-188626798
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51928/
Test PASSed.





[GitHub] spark pull request: [SPARK-13321][SQL] Support nested UNION in par...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11361#issuecomment-188626796
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11691][SQL] Support setting hadoop comp...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11324#issuecomment-188626411
  
**[Test build #51934 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51934/consoleFull)**
 for PR 11324 at commit 
[`14ef39a`](https://github.com/apache/spark/commit/14ef39a3093c46072f793010e9345c8f53a49938).





[GitHub] spark pull request: [SPARK-13321][SQL] Support nested UNION in par...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11361#issuecomment-188626175
  
**[Test build #51928 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51928/consoleFull)**
 for PR 11361 at commit 
[`5ff5ac2`](https://github.com/apache/spark/commit/5ff5ac2c336c63d718c98e27216c7510b6ab4b64).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [Minor][SPARK-13482][Configuration]Make consis...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11360#issuecomment-188625878
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [Minor][SPARK-13482][Configuration]Make consis...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11360#issuecomment-188625881
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51927/
Test PASSed.





[GitHub] spark pull request: [Minor][SPARK-13482][Configuration]Make consis...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11360#issuecomment-188624899
  
**[Test build #51927 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51927/consoleFull)**
 for PR 11360 at commit 
[`f8367ee`](https://github.com/apache/spark/commit/f8367ee7f9685503b8ef495b1cd34047e4926af4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13471] [SQL]: WiP update hive version t...

2016-02-24 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/11346#issuecomment-188622694
  
@steveloughran do we also need an update for the master branch?





[GitHub] spark pull request: [SPARK-13478] [yarn] Use real user when fetchi...

2016-02-24 Thread harishreedharan
Github user harishreedharan commented on the pull request:

https://github.com/apache/spark/pull/11358#issuecomment-188621725
  
This looks like it might affect HDFS tokens as well, and an error like this 
might come up during the initial token renewal:
```
WARN UserGroupInformation: PriviledgedActionException as:hari (auth:PROXY) 
via hdfs@EXAMPLE (auth:KERBEROS) 
cause:org.apache.hadoop.security.AccessControlException: hari tries to renew a 
token with renewer hdfs
```

In addition to the code that gets the new tokens, I think the 
[`getTokenRenewalInterval`](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L580)
 method also needs to be run as the real user. 





[GitHub] spark pull request: [SPARK-6735][YARN] Add window based executor f...

2016-02-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/10241#discussion_r54049969
  
--- Diff: docs/running-on-yarn.md ---
@@ -318,6 +318,14 @@ If you need a reference to the proper location to put 
log files in the YARN so t
   
 
 
+  spark.yarn.executor.failuresValidityInterval
--- End diff --

I would keep the original name, since we have an equivalent setting for the AM 
with the same naming.





[GitHub] spark pull request: [SPARK-13478] [yarn] Use real user when fetchi...

2016-02-24 Thread harishreedharan
Github user harishreedharan commented on the pull request:

https://github.com/apache/spark/pull/11358#issuecomment-188615270
  
So I have not tested keytab-based login with a proxy user at all. We get 
delegation tokens there too - does this issue affect that as well? 





[GitHub] spark pull request: [SPARK-13123][SQL] Implement whole state codeg...

2016-02-24 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11359#issuecomment-188613970
  
@sameeragarwal you should update the PR description to actually describe 
what this patch does (in addition to noting that it was built on an earlier PR).

For codegen PRs, it would be great to paste in the generated code.






[GitHub] spark pull request: [SPARK-13320] [SQL] Support Star in CreateStru...

2016-02-24 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/11208#discussion_r54048383
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -369,28 +370,83 @@ class Analyzer(
   }
 
   /**
-   * Replaces [[UnresolvedAttribute]]s with concrete 
[[AttributeReference]]s from
-   * a logical plan node's children.
+   * Expand [[UnresolvedStar]] or [[ResolvedStar]] to the matching 
attributes in child's output.
*/
-  object ResolveReferences extends Rule[LogicalPlan] {
+  object ResolveStar extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+  case p: LogicalPlan if !p.childrenResolved => p
+
+  // If the projection list contains Stars, expand it.
+  case p: Project if containsStar(p.projectList) =>
+val expanded = p.projectList.flatMap {
+  case s: Star => s.expand(p.child, resolver)
+  case ua @ UnresolvedAlias(_: UnresolvedFunction | _: CreateArray 
| _: CreateStruct, _) =>
+UnresolvedAlias(child = expandStarExpression(ua.child, 
p.child)) :: Nil
+  case a @ Alias(_: UnresolvedFunction | _: CreateArray | _: 
CreateStruct, _) =>
+Alias(child = expandStarExpression(a.child, p.child), a.name)(
+  isGenerated = a.isGenerated) :: Nil
+  case o => o :: Nil
+}
+Project(projectList = expanded, p.child)
+  // If the aggregate function argument contains Stars, expand it.
+  case a: Aggregate if containsStar(a.aggregateExpressions) =>
+val expanded = a.aggregateExpressions.flatMap {
--- End diff --

Why does `expandStarExpressions(a.aggregateExpression, a.child)` not work here?





[GitHub] spark pull request: [SPARK-13479] [SQL] [PYTHON] Added Python API ...

2016-02-24 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11356#issuecomment-188613807
  
LGTM





[GitHub] spark pull request: [SPARK-13479] [SQL] [PYTHON] Added Python API ...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11356#issuecomment-18865
  
**[Test build #51933 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51933/consoleFull)**
 for PR 11356 at commit 
[`4f21f06`](https://github.com/apache/spark/commit/4f21f06a21e5ffa415ed4d63d9d1cada078d108a).





[GitHub] spark pull request: [SPARK-13292][ML][PYTHON] QuantileDiscretizer ...

2016-02-24 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/11362#issuecomment-188610588
  
@mengxr can you review it when you have time? Thanks!





[GitHub] spark pull request: [SPARK-13479] [SQL] [PYTHON] Added Python API ...

2016-02-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/11356#discussion_r54047942
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
@@ -71,6 +71,41 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
   }
 
   /**
+   * Calculates the approximate quantiles of a numerical column of a 
DataFrame.
+   * Provided for the Python API.
+   *
+   * The result of this algorithm has the following deterministic bound:
+   * If the DataFrame has N elements and if we request the quantile at 
probability `p` up to error
+   * `err`, then the algorithm will return a sample `x` from the DataFrame 
so that the *exact* rank
+   * of `x` is close to (p * N).
+   * More precisely,
+   *
+   *   floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).
+   *
+   * This method implements a variation of the Greenwald-Khanna algorithm 
(with some speed
+   * optimizations).
+   * The algorithm was first present in 
[[http://dx.doi.org/10.1145/375663.375670 Space-efficient
+   * Online Computation of Quantile Summaries]] by Greenwald and Khanna.
+   *
+   * @param col the name of the numerical column
+   * @param probabilities a list of quantile probabilities
+   *   Each number must belong to [0, 1].
+   *   For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
+   * @param relativeError The relative target precision to achieve (>= 0).
+   *   If set to zero, the exact quantiles are computed, which could be 
very expensive.
+   *   Note that values greater than 1 are accepted but give the same 
result as 1.
+   * @return the approximate quantiles at the given probabilities
+   *
+   * @since 2.0.0
+   */
+  private[spark] def approxQuantile(
+  col: String,
+  probabilities: List[Double],
+  relativeError: Double): Array[Double] = {
--- End diff --

Yeah, again I was trying to follow other code there...but probably 
shouldn't.  Fixed now.
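The deterministic rank guarantee quoted in the doc comment above — floor((p - err) * N) <= rank(x) <= ceil((p + err) * N) — can be checked numerically. A minimal sketch in plain Python (an editorial illustration, not the Spark implementation):

```python
import math

def rank_within_bound(data, p, err, x):
    """Check floor((p - err) * N) <= rank(x) <= ceil((p + err) * N) for sample x."""
    n = len(data)
    rank = sorted(data).index(x) + 1  # 1-based rank of x in the data
    return math.floor((p - err) * n) <= rank <= math.ceil((p + err) * n)

data = list(range(1, 101))                     # N = 100 distinct values
print(rank_within_bound(data, 0.5, 0.05, 50))  # the exact median satisfies the bound
```

With `relativeError = 0` the bound collapses to the exact rank, which is why the doc warns that exact quantiles can be very expensive.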





[GitHub] spark pull request: [SPARK-13292][ML][PYTHON] QuantileDiscretizer ...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11362#issuecomment-188609863
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51931/
Test PASSed.





[GitHub] spark pull request: [SPARK-13292][ML][PYTHON] QuantileDiscretizer ...

2016-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11362#issuecomment-188609862
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13292][ML][PYTHON] QuantileDiscretizer ...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11362#issuecomment-188609806
  
**[Test build #51931 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51931/consoleFull)**
 for PR 11362 at commit 
[`02ffa76`](https://github.com/apache/spark/commit/02ffa763358ba0300f0ffcf6f8755951336a5b17).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class QuantileDiscretizer(JavaEstimator, HasInputCol, HasOutputCol, 
HasSeed):`
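For intuition, the quantile-based bucketing that `QuantileDiscretizer` performs can be sketched in plain Python (illustrative helpers only, not PySpark's implementation):

```python
import bisect

def quantile_splits(values, num_buckets):
    """Interior split points taken at the i/num_buckets sample quantiles."""
    s = sorted(values)
    n = len(s)
    return [s[min(n - 1, i * n // num_buckets)] for i in range(1, num_buckets)]

def bucketize(value, splits):
    """Bucket index of `value` given sorted split points."""
    return bisect.bisect_right(splits, value)

splits = quantile_splits(range(100), 4)  # quartile boundaries -> [25, 50, 75]
print(bucketize(60, splits))             # 60 falls between 50 and 75 -> bucket 2
```

The real estimator computes the split points with approximate quantiles over a distributed column rather than an exact sort, which is where the seed and relative-error parameters come in.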





[GitHub] spark pull request: [SPARK-13479] [SQL] [PYTHON] Added Python API ...

2016-02-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/11356#discussion_r54047585
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
@@ -71,6 +71,41 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
   }
 
   /**
+   * Calculates the approximate quantiles of a numerical column of a 
DataFrame.
+   * Provided for the Python API.
+   *
+   * The result of this algorithm has the following deterministic bound:
+   * If the DataFrame has N elements and if we request the quantile at 
probability `p` up to error
+   * `err`, then the algorithm will return a sample `x` from the DataFrame 
so that the *exact* rank
+   * of `x` is close to (p * N).
+   * More precisely,
+   *
+   *   floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).
+   *
+   * This method implements a variation of the Greenwald-Khanna algorithm 
(with some speed
+   * optimizations).
+   * The algorithm was first present in 
[[http://dx.doi.org/10.1145/375663.375670 Space-efficient
+   * Online Computation of Quantile Summaries]] by Greenwald and Khanna.
+   *
+   * @param col the name of the numerical column
+   * @param probabilities a list of quantile probabilities
+   *   Each number must belong to [0, 1].
+   *   For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
+   * @param relativeError The relative target precision to achieve (>= 0).
+   *   If set to zero, the exact quantiles are computed, which could be 
very expensive.
+   *   Note that values greater than 1 are accepted but give the same 
result as 1.
+   * @return the approximate quantiles at the given probabilities
+   *
+   * @since 2.0.0
--- End diff --

I would have, but I was just following other conventions in the DataFrame 
code.  I'll change it though.





[GitHub] spark pull request: [SPARK-13376] [SPARK-13476] [SQL] improve colu...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11354#issuecomment-188608671
  
**[Test build #2583 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2583/consoleFull)**
 for PR 11354 at commit 
[`52e98b8`](https://github.com/apache/spark/commit/52e98b8afe603766193196776e13077b1eebd5fb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13479] [SQL] [PYTHON] Added Python API ...

2016-02-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11356#issuecomment-188608553
  
**[Test build #51932 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51932/consoleFull)**
 for PR 11356 at commit 
[`3f18d78`](https://github.com/apache/spark/commit/3f18d78502ce89ab961c75afd857ffd8cfd0d5f1).





[GitHub] spark pull request: [SPARK-13479] [SQL] [PYTHON] Added Python API ...

2016-02-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/11356#discussion_r54047396
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
@@ -71,6 +71,41 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
   }
 
   /**
+   * Calculates the approximate quantiles of a numerical column of a 
DataFrame.
+   * Provided for the Python API.
+   *
+   * The result of this algorithm has the following deterministic bound:
+   * If the DataFrame has N elements and if we request the quantile at 
probability `p` up to error
+   * `err`, then the algorithm will return a sample `x` from the DataFrame 
so that the *exact* rank
+   * of `x` is close to (p * N).
+   * More precisely,
+   *
+   *   floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).
+   *
+   * This method implements a variation of the Greenwald-Khanna algorithm 
(with some speed
+   * optimizations).
+   * The algorithm was first present in 
[[http://dx.doi.org/10.1145/375663.375670 Space-efficient
+   * Online Computation of Quantile Summaries]] by Greenwald and Khanna.
+   *
+   * @param col the name of the numerical column
+   * @param probabilities a list of quantile probabilities
+   *   Each number must belong to [0, 1].
+   *   For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
+   * @param relativeError The relative target precision to achieve (>= 0).
+   *   If set to zero, the exact quantiles are computed, which could be 
very expensive.
+   *   Note that values greater than 1 are accepted but give the same 
result as 1.
+   * @return the approximate quantiles at the given probabilities
+   *
+   * @since 2.0.0
+   */
+  private[spark] def approxQuantile(
+  col: String,
+  probabilities: List[Double],
+  relativeError: Double): Array[Double] = {
--- End diff --

We can return `java.util.List[Double]`, which would simplify the Python 
implementation.




