[GitHub] spark pull request: [SPARK-11701] dynamic allocation and speculati...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10951#issuecomment-175885311
  
**[Test build #50219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50219/consoleFull)** for PR 10951 at commit [`249fc78`](https://github.com/apache/spark/commit/249fc78fd0fe7b3cbe5430a075ab5f9e281c015c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-27 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10944#issuecomment-175885463
  
cc @nongli @rxin 





[GitHub] spark pull request: [SPARK-11701] dynamic allocation and speculati...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10951#issuecomment-175885501
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11701] dynamic allocation and speculati...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10951#issuecomment-175885503
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50219/
Test FAILed.





[GitHub] spark pull request: [SPARK-12230][ML] WeightedLeastSquares.fit() s...

2016-01-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10274#discussion_r51059677
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/WeightedLeastSquaresSuite.scala ---
@@ -74,6 +89,35 @@ class WeightedLeastSquaresSuite extends SparkFunSuite with MLlibTestSparkContext
     }
   }
 
+  test("WLS against lm when label is constant") {
+    /*
+       R code:
+       # here b is constant
+       df <- as.data.frame(cbind(A, b))
+       for (formula in c(b ~ . -1, b ~ .)) {
+         model <- lm(formula, data=df, weights=w)
+         print(as.vector(coef(model)))
+       }
+
+      [1] -9.221298  3.394343
+      [1] 17  0  0
+    */
+
+    val expected = Seq(
+      Vectors.dense(0.0, -9.221298, 3.394343),
+      Vectors.dense(17.0, 0.0, 0.0))
+
+    var idx = 0
+    for (fitIntercept <- Seq(false, true)) {
+      val wls = new WeightedLeastSquares(
+        fitIntercept, regParam = 0.0, standardizeFeatures = false, standardizeLabel = true)
+        .fit(instancesConstLabel)
--- End diff --

Sorry for getting back to you so late. The difference is due to the fact that `glmnet` 
always standardizes labels, even when `standardization == false`; that flag only turns 
off the standardization of the features. As a result, at least in `glmnet`, when 
`ystd == 0.0` the training is not valid. 
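
The point about label standardization can be sketched numerically. This is a hypothetical illustration in plain Python, not Spark or glmnet code; the function name and the weighted formulas are assumptions made for the example:

```python
import math

def standardize_label(y, w):
    """Weighted standardization of the label, as glmnet does internally
    regardless of the feature-standardization flag (illustrative only)."""
    total = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    std = math.sqrt(sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y)) / total)
    if std == 0.0:
        # ystd == 0.0: (y - mean) / std is undefined, so training is not valid.
        raise ValueError("label standard deviation is zero; training is not valid")
    return [(yi - mean) / std for yi in y]

# A varying label standardizes fine...
print(standardize_label([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
# ...but a constant label cannot be standardized, whatever the feature flag says.
try:
    standardize_label([17.0, 17.0, 17.0], [1.0, 2.0, 3.0])
except ValueError as e:
    print(e)  # label standard deviation is zero; training is not valid
```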





[GitHub] spark pull request: [SPARK-13049] Add First/last with ignore nulls...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10957#issuecomment-175889533
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50230/
Test FAILed.





[GitHub] spark pull request: [SPARK-13049] Add First/last with ignore nulls...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10957#issuecomment-175889530
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...

2016-01-27 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10835#discussion_r51061122
  
--- Diff: project/MimaExcludes.scala ---
@@ -145,6 +145,15 @@ object MimaExcludes {
         // SPARK-12510 Refactor ActorReceiver to support Java
         ProblemFilters.exclude[AbstractClassProblem]("org.apache.spark.streaming.receiver.ActorReceiver")
       ) ++ Seq(
+        // SPARK-12895 Implement TaskMetrics using accumulators
+        ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.TaskContext.internalMetricsToAccumulators"),
+        ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.TaskContext.collectInternalAccumulators"),
+        ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.TaskContext.collectAccumulators")
+      ) ++ Seq(
+        // SPARK-12896 Send only accumulator updates to driver, not TaskMetrics
+        ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.Accumulable.this"),
--- End diff --

I just did an audit. See message on main thread for more detail.





[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...

2016-01-27 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10835#issuecomment-175893531
  
I was able to verify that the changes in `Accumulable` and `Accumulator` do 
not break compatibility.





[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...

2016-01-27 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10835#discussion_r51062164
  
--- Diff: core/src/main/scala/org/apache/spark/InternalAccumulator.scala ---
@@ -17,42 +17,193 @@
 
 package org.apache.spark
 
+import org.apache.spark.storage.{BlockId, BlockStatus}
 
-// This is moved to its own file because many more things will be added to it in SPARK-10620.
+
+/**
+ * A collection of fields and methods concerned with internal accumulators that represent
+ * task level metrics.
+ */
 private[spark] object InternalAccumulator {
-  val PEAK_EXECUTION_MEMORY = "peakExecutionMemory"
-  val TEST_ACCUMULATOR = "testAccumulator"
-
-  // For testing only.
-  // This needs to be a def since we don't want to reuse the same accumulator across stages.
-  private def maybeTestAccumulator: Option[Accumulator[Long]] = {
-    if (sys.props.contains("spark.testing")) {
-      Some(new Accumulator(
-        0L, AccumulatorParam.LongAccumulatorParam, Some(TEST_ACCUMULATOR), internal = true))
-    } else {
-      None
+
+  import AccumulatorParam._
+
+  // Prefixes used in names of internal task level metrics
+  val METRICS_PREFIX = "internal.metrics."
+  val SHUFFLE_READ_METRICS_PREFIX = METRICS_PREFIX + "shuffle.read."
+  val SHUFFLE_WRITE_METRICS_PREFIX = METRICS_PREFIX + "shuffle.write."
+  val OUTPUT_METRICS_PREFIX = METRICS_PREFIX + "output."
+  val INPUT_METRICS_PREFIX = METRICS_PREFIX + "input."
+
+  // Names of internal task level metrics
+  val EXECUTOR_DESERIALIZE_TIME = METRICS_PREFIX + "executorDeserializeTime"
+  val EXECUTOR_RUN_TIME = METRICS_PREFIX + "executorRunTime"
+  val RESULT_SIZE = METRICS_PREFIX + "resultSize"
+  val JVM_GC_TIME = METRICS_PREFIX + "jvmGCTime"
+  val RESULT_SERIALIZATION_TIME = METRICS_PREFIX + "resultSerializationTime"
+  val MEMORY_BYTES_SPILLED = METRICS_PREFIX + "memoryBytesSpilled"
+  val DISK_BYTES_SPILLED = METRICS_PREFIX + "diskBytesSpilled"
+  val PEAK_EXECUTION_MEMORY = METRICS_PREFIX + "peakExecutionMemory"
+  val UPDATED_BLOCK_STATUSES = METRICS_PREFIX + "updatedBlockStatuses"
+  val TEST_ACCUM = METRICS_PREFIX + "testAccumulator"
+
+  // scalastyle:off
+
+  // Names of shuffle read metrics
+  object shuffleRead {
+    val REMOTE_BLOCKS_FETCHED = SHUFFLE_READ_METRICS_PREFIX + "remoteBlocksFetched"
+    val LOCAL_BLOCKS_FETCHED = SHUFFLE_READ_METRICS_PREFIX + "localBlocksFetched"
+    val REMOTE_BYTES_READ = SHUFFLE_READ_METRICS_PREFIX + "remoteBytesRead"
+    val LOCAL_BYTES_READ = SHUFFLE_READ_METRICS_PREFIX + "localBytesRead"
+    val FETCH_WAIT_TIME = SHUFFLE_READ_METRICS_PREFIX + "fetchWaitTime"
+    val RECORDS_READ = SHUFFLE_READ_METRICS_PREFIX + "recordsRead"
+  }
+
+  // Names of shuffle write metrics
+  object shuffleWrite {
+    val BYTES_WRITTEN = SHUFFLE_WRITE_METRICS_PREFIX + "bytesWritten"
+    val RECORDS_WRITTEN = SHUFFLE_WRITE_METRICS_PREFIX + "recordsWritten"
+    val WRITE_TIME = SHUFFLE_WRITE_METRICS_PREFIX + "writeTime"
+  }
+
+  // Names of output metrics
+  object output {
+    val WRITE_METHOD = OUTPUT_METRICS_PREFIX + "writeMethod"
+    val BYTES_WRITTEN = OUTPUT_METRICS_PREFIX + "bytesWritten"
+    val RECORDS_WRITTEN = OUTPUT_METRICS_PREFIX + "recordsWritten"
+  }
+
+  // Names of input metrics
+  object input {
+    val READ_METHOD = INPUT_METRICS_PREFIX + "readMethod"
+    val BYTES_READ = INPUT_METRICS_PREFIX + "bytesRead"
+    val RECORDS_READ = INPUT_METRICS_PREFIX + "recordsRead"
+  }
+
+  // scalastyle:on
+
+  /**
+   * Create an internal [[Accumulator]] by name, which must begin with [[METRICS_PREFIX]].
+   */
+  def create(name: String): Accumulator[_] = {
+    assert(name.startsWith(METRICS_PREFIX),
+      s"internal accumulator name must start with '$METRICS_PREFIX': $name")
+    getParam(name) match {
+      case p @ LongAccumulatorParam => newMetric[Long](0L, name, p)
+      case p @ IntAccumulatorParam => newMetric[Int](0, name, p)
+      case p @ StringAccumulatorParam => newMetric[String]("", name, p)
+      case p @ UpdatedBlockStatusesAccumulatorParam =>
+        newMetric[Seq[(BlockId, BlockStatus)]](Seq(), name, p)
+      case p => throw new IllegalArgumentException(
+        s"unsupported accumulator param '${p.getClass.getSimpleName}' for metric '$name'.")
+    }
+  }
+
+  /**
+   * Get the [[AccumulatorParam]] associated with the internal metric name,
+   * which must begin with [[METRICS_PREFIX]].
+   */
+  def getParam(name: String): AccumulatorParam[_] = {
+    assert(name.startsWith(METRICS_PREFIX),
+      s"internal accumulator name must 

[GitHub] spark pull request: [SPARK-10620] Minor addendum to #10835

2016-01-27 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10958#issuecomment-175897116
  
@JoshRosen





[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10702#discussion_r51064724
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
@@ -558,6 +575,47 @@ class LinearRegressionSuite
     }
   }
 
+  test("linear regression model with constant label") {
+    /*
+       R code:
+       for (formula in c(b.const ~ . -1, b.const ~ .)) {
+         model <- lm(formula, data=df.const.label, weights=w)
+         print(as.vector(coef(model)))
+       }
+      [1] -9.221298  3.394343
+      [1] 17  0  0
+    */
+    val expected = Seq(
+      Vectors.dense(0.0, -9.221298, 3.394343),
+      Vectors.dense(17.0, 0.0, 0.0))
+
+    Seq("auto", "l-bfgs", "normal").foreach { solver =>
+      var idx = 0
+      for (fitIntercept <- Seq(false, true)) {
+        val model = new LinearRegression()
+          .setFitIntercept(fitIntercept)
+          .setWeightCol("weight")
+          .setSolver(solver)
+          .fit(datasetWithWeightConstantLabel)
+        val actual = Vectors.dense(model.intercept, model.coefficients(0), model.coefficients(1))
+        assert(actual ~== expected(idx) absTol 1e-4)
+        idx += 1
+      }
+    }
+  }
+
+  test("regularized linear regression through origin with constant label") {
+    // The problem is ill-defined if fitIntercept=false, regParam is non-zero and \
--- End diff --

Remove the `\` at the end of the line.





[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10702#discussion_r51064615
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
@@ -558,6 +575,47 @@ class LinearRegressionSuite
     }
   }
 
+  test("linear regression model with constant label") {
+    /*
+       R code:
+       for (formula in c(b.const ~ . -1, b.const ~ .)) {
+         model <- lm(formula, data=df.const.label, weights=w)
+         print(as.vector(coef(model)))
+       }
+      [1] -9.221298  3.394343
+      [1] 17  0  0
+    */
+    val expected = Seq(
+      Vectors.dense(0.0, -9.221298, 3.394343),
+      Vectors.dense(17.0, 0.0, 0.0))
+
+    Seq("auto", "l-bfgs", "normal").foreach { solver =>
+      var idx = 0
+      for (fitIntercept <- Seq(false, true)) {
+        val model = new LinearRegression()
+          .setFitIntercept(fitIntercept)
+          .setWeightCol("weight")
+          .setSolver(solver)
+          .fit(datasetWithWeightConstantLabel)
+        val actual = Vectors.dense(model.intercept, model.coefficients(0), model.coefficients(1))
+        assert(actual ~== expected(idx) absTol 1e-4)
+        idx += 1
+      }
+    }
+  }
+
+  test("regularized linear regression through origin with constant label") {
+    // The problem is ill-defined if fitIntercept=false, regParam is non-zero and \
+    // standardization=true. An exception is thrown in this case.
--- End diff --

When `standardization=false`, the problem is still ill-defined, since GLMNET 
always standardizes the labels; that's why you see it in the analytical 
solution. Let's throw an exception when `fitIntercept=false` and `regParam != 0.0`.
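
A minimal sketch of the guard being proposed, in hypothetical Python mirroring the suggested `require` in the Scala trainer (the function name and signature are illustrative, not Spark API):

```python
def check_constant_label(y_std, fit_intercept, reg_param):
    """Reject the ill-defined configuration: constant label (y_std == 0),
    no intercept, and nonzero regularization. Because labels are always
    standardized, no standardization flag can rescue this case."""
    if y_std == 0.0 and not fit_intercept and reg_param != 0.0:
        raise ValueError(
            "The standard deviation of the label is zero; the model cannot be "
            "regularized with fitIntercept=false and regParam != 0.0")

check_constant_label(y_std=0.0, fit_intercept=True, reg_param=0.1)   # OK: intercept absorbs the label
check_constant_label(y_std=1.3, fit_intercept=False, reg_param=0.1)  # OK: label is not constant
try:
    check_constant_label(y_std=0.0, fit_intercept=False, reg_param=0.1)
except ValueError as e:
    print(e)
```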





[GitHub] spark pull request: [SPARK-13049] Add First/last with ignore nulls...

2016-01-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10957#issuecomment-175904468
  
Why is this a bug fix?





[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10702#discussion_r51069962
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
     }
 
     val yMean = ySummarizer.mean(0)
-    val yStd = math.sqrt(ySummarizer.variance(0))
-
-    // If the yStd is zero, then the intercept is yMean with zero coefficient;
-    // as a result, training is not needed.
-    if (yStd == 0.0) {
-      logWarning(s"The standard deviation of the label is zero, so the coefficients will be " +
-        s"zeros and the intercept will be the mean of the label; as a result, " +
-        s"training is not needed.")
-      if (handlePersistence) instances.unpersist()
-      val coefficients = Vectors.sparse(numFeatures, Seq())
-      val intercept = yMean
-
-      val model = new LinearRegressionModel(uid, coefficients, intercept)
-      // Handle possible missing or invalid prediction columns
-      val (summaryModel, predictionColName) = model.findSummaryModelAndPredictionCol()
-
-      val trainingSummary = new LinearRegressionTrainingSummary(
-        summaryModel.transform(dataset),
-        predictionColName,
-        $(labelCol),
-        model,
-        Array(0D),
-        $(featuresCol),
-        Array(0D))
-      return copyValues(model.setSummary(trainingSummary))
+    val rawYStd = math.sqrt(ySummarizer.variance(0))
+    if (rawYStd == 0.0) {
+      if ($(fitIntercept)) {
+        // If the rawYStd is zero and fitIntercept=true, then the intercept is yMean with
+        // zero coefficient; as a result, training is not needed.
+        logWarning(s"The standard deviation of the label is zero, so the coefficients will be " +
+          s"zeros and the intercept will be the mean of the label; as a result, " +
+          s"training is not needed.")
+        if (handlePersistence) instances.unpersist()
+        val coefficients = Vectors.sparse(numFeatures, Seq())
+        val intercept = yMean
+
+        val model = new LinearRegressionModel(uid, coefficients, intercept)
+        // Handle possible missing or invalid prediction columns
+        val (summaryModel, predictionColName) = model.findSummaryModelAndPredictionCol()
+
+        val trainingSummary = new LinearRegressionTrainingSummary(
+          summaryModel.transform(dataset),
+          predictionColName,
+          $(labelCol),
+          model,
+          Array(0D),
+          $(featuresCol),
+          Array(0D))
+        return copyValues(model.setSummary(trainingSummary))
+      } else {
+        require(!($(regParam) > 0.0 && $(standardization)),
--- End diff --

remove `&& $(standardization)`





[GitHub] spark pull request: [SPARK-13047][PYSPARK][ML] Pyspark Params.hasP...

2016-01-27 Thread sethah
GitHub user sethah opened a pull request:

https://github.com/apache/spark/pull/10962

[SPARK-13047][PYSPARK][ML] Pyspark Params.hasParam should not throw an error

The PySpark `Params` class has a method `hasParam(paramName)` which returns 
`True` if the class has a parameter by that name, but throws an 
`AttributeError` otherwise. There is currently no way of getting a Boolean 
that indicates whether a class has a parameter. With Spark 2.0 we could modify 
the existing behavior of `hasParam` or add an additional method with this 
functionality.

In Python:
```python
from pyspark.ml.classification import NaiveBayes
nb = NaiveBayes(smoothing=0.5)
print nb.hasParam("smoothing")
print nb.hasParam("notAParam")
```
produces:
> True
> AttributeError: 'NaiveBayes' object has no attribute 'notAParam'

However, in Scala:
```scala
import org.apache.spark.ml.classification.NaiveBayes
val nb  = new NaiveBayes()
nb.hasParam("smoothing")
nb.hasParam("notAParam")
```
produces:
> true
> false
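
A sketch of the non-throwing behavior being proposed. This is hypothetical and self-contained (the `Param` stand-in and the toy estimator class are invented for illustration; the real fix would live in `pyspark.ml.param.Params`):

```python
class Param(object):
    """Minimal stand-in for pyspark.ml.param.Param (illustrative only)."""
    def __init__(self, name):
        self.name = name

class NaiveBayesLike(object):
    """Toy class with one Param attribute, mimicking an ML estimator."""
    smoothing = Param("smoothing")

    def hasParam(self, paramName):
        # Return False instead of raising AttributeError, matching the
        # Scala behavior shown above.
        p = getattr(self, paramName, None)
        return isinstance(p, Param)

nb = NaiveBayesLike()
print(nb.hasParam("smoothing"))  # True
print(nb.hasParam("notAParam"))  # False
```

With this shape, callers can branch on the result instead of wrapping every probe in try/except.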

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sethah/spark SPARK-13047

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10962.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10962


commit d52b1de1adefedb6938130d0530ea46fdb3f64f7
Author: sethah 
Date:   2016-01-27T23:55:04Z

hasParam returns False instead of throwing an error







[GitHub] spark pull request: [SPARK-12177] [STREAMING] Update KafkaDStreams...

2016-01-27 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/10953#issuecomment-175891692
  
MiMa is a binary compatibility checker. It's complaining that some changes 
you made caused the public APIs exposed in the compiled classes to change, 
meaning existing code compiled against the current Spark version might not run 
against the next Spark release.

First I'd look at whether those changes are necessary; if they are, it 
might be OK to add exclusions, because we're being a bit lenient with API 
breakages in 2.0.
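
If the changes do turn out to be necessary, an exclusion is a one-line entry in `project/MimaExcludes.scala`. The filter below is purely illustrative (the problem type and the method name come from MiMa's actual report; this method name is invented):

```scala
// Hypothetical exclusion entry: tells MiMa the reported break is intentional.
ProblemFilters.exclude[MissingMethodProblem](
  "org.apache.spark.streaming.kafka.KafkaUtils.someRemovedMethod")
```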





[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51061765
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -388,57 +445,18 @@ class Analyzer(
           .map(_.asInstanceOf[NamedExpression])
         a.copy(aggregateExpressions = expanded)
 
-      // Special handling for cases when self-join introduce duplicate expression ids.
-      case j @ Join(left, right, _, _) if !j.selfJoinResolved =>
-        val conflictingAttributes = left.outputSet.intersect(right.outputSet)
-        logDebug(s"Conflicting attributes ${conflictingAttributes.mkString(",")} in $j")
-
-        right.collect {
-          // Handle base relations that might appear more than once.
-          case oldVersion: MultiInstanceRelation
-              if oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
-            val newVersion = oldVersion.newInstance()
-            (oldVersion, newVersion)
-
-          // Handle projects that create conflicting aliases.
-          case oldVersion @ Project(projectList, _)
-              if findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-            (oldVersion, oldVersion.copy(projectList = newAliases(projectList)))
-
-          case oldVersion @ Aggregate(_, aggregateExpressions, _)
-              if findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
-            (oldVersion, oldVersion.copy(aggregateExpressions = newAliases(aggregateExpressions)))
-
-          case oldVersion: Generate
-              if oldVersion.generatedSet.intersect(conflictingAttributes).nonEmpty =>
-            val newOutput = oldVersion.generatorOutput.map(_.newInstance())
-            (oldVersion, oldVersion.copy(generatorOutput = newOutput))
-
-          case oldVersion @ Window(_, windowExpressions, _, _, child)
-              if AttributeSet(windowExpressions.map(_.toAttribute)).intersect(conflictingAttributes)
-                .nonEmpty =>
-            (oldVersion, oldVersion.copy(windowExpressions = newAliases(windowExpressions)))
-        }
-        // Only handle first case, others will be fixed on the next pass.
-        .headOption match {
-          case None =>
-            /*
-             * No result implies that there is a logical plan node that produces new references
-             * that this rule cannot handle. When that is the case, there must be another rule
-             * that resolves these conflicts. Otherwise, the analysis will fail.
-             */
-            j
-          case Some((oldRelation, newRelation)) =>
-            val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
-            val newRight = right transformUp {
-              case r if r == oldRelation => newRelation
-            } transformUp {
-              case other => other transformExpressions {
-                case a: Attribute => attributeRewrites.get(a).getOrElse(a)
-              }
-            }
-            j.copy(right = newRight)
-        }
+      // To resolve duplicate expression IDs for all the BinaryNode
+      case b: BinaryNode if !b.duplicateResolved => b match {
+        case j @ Join(left, right, _, _) =>
+          j.copy(right = dedupRight(left, right))
+        case i @ Intersect(left, right) =>
+          i.copy(right = dedupRight(left, right))
+        case e @ Except(left, right) =>
+          e.copy(right = dedupRight(left, right))
+        case cg: CoGroup =>
--- End diff --

For other operators, can we construct test cases that can make them fail 
without de-duplication? If we can, then we should create a JIRA and fix it in 
another PR.





[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10930#issuecomment-175896474
  
**[Test build #50205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50205/consoleFull)** for PR 10930 at commit [`e627f5b`](https://github.com/apache/spark/commit/e627f5b96a21ccc748c75c7fa0a4c4839cdc63c5).
 * This patch **fails from timeout after a configured wait of `250m`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10620] Minor addendum to #10835

2016-01-27 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10958#issuecomment-175896506
  
@JoshRosen





[GitHub] spark pull request: [HOTFIX] Fix Scala 2.11 compilation

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10955#issuecomment-175903354
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10620] [SPARK-13054] Minor addendum to ...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10958#issuecomment-175908428
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50237/
Test FAILed.





[GitHub] spark pull request: [SPARK-10620] [SPARK-13054] Minor addendum to ...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10958#issuecomment-175908426
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13045][SQL] Remove ColumnVector.Struct ...

2016-01-27 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10952#issuecomment-175909805
  
LGTM





[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-27 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10844#discussion_r51068499
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -88,6 +88,12 @@ case class Generate(
 
 case class Filter(condition: Expression, child: LogicalPlan) extends 
UnaryNode {
   override def output: Seq[Attribute] = child.output
+
+  override def constraints: Set[Expression] = {
+val newConstraint = splitConjunctivePredicates(condition).filter(
+  _.references.subsetOf(outputSet)).toSet
--- End diff --

style nit: we typically avoid breaking in the middle of a function call and 
instead prefer to break in between calls (always pick the highest syntactic 
level)

```scala
val newConstraint = splitConjunctivePredicates(condition)
  .filter(_.references.subsetOf(outputSet))
  .toSet
```





[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-27 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10844#discussion_r51068376
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala 
---
@@ -17,16 +17,31 @@
 
 package org.apache.spark.sql.catalyst.plans
 
-import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, 
Expression, VirtualColumn}
+import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.trees.TreeNode
 import org.apache.spark.sql.types.{DataType, StructType}
 
-abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends 
TreeNode[PlanType] {
+abstract class QueryPlan[PlanType <: TreeNode[PlanType]]
+  extends TreeNode[PlanType] with PredicateHelper {
   self: PlanType =>
 
   def output: Seq[Attribute]
 
   /**
+   * Extracts the output property from a given child.
+   */
+  def extractConstraintsFromChild(child: QueryPlan[PlanType]): 
Set[Expression] = {
--- End diff --

`protected`?

Also I'm not sure I get the Scaladoc. Maybe `getRelevantConstraints` is a 
better name? It is taking the constraints and removing those that no longer 
apply because we removed columns, right?
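
A minimal sketch of what the reviewer seems to be suggesting (the name and 
the `protected` modifier are the reviewer's suggestions, not the final API; 
the body assumes the surrounding `QueryPlan` class, where `outputSet` is the 
set of attributes this plan still produces):

```scala
// Sketch only: keep a child's constraints whose referenced attributes
// all survive in this plan's output; constraints over removed columns
// no longer apply.
protected def getRelevantConstraints(child: QueryPlan[PlanType]): Set[Expression] =
  child.constraints.filter(_.references.subsetOf(outputSet))
```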





[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10702#discussion_r51070090
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0") 
(@Since("1.3.0") override val uid: String
 }
 
 val yMean = ySummarizer.mean(0)
-val yStd = math.sqrt(ySummarizer.variance(0))
-
-// If the yStd is zero, then the intercept is yMean with zero 
coefficient;
-// as a result, training is not needed.
-if (yStd == 0.0) {
-  logWarning(s"The standard deviation of the label is zero, so the 
coefficients will be " +
-s"zeros and the intercept will be the mean of the label; as a 
result, " +
-s"training is not needed.")
-  if (handlePersistence) instances.unpersist()
-  val coefficients = Vectors.sparse(numFeatures, Seq())
-  val intercept = yMean
-
-  val model = new LinearRegressionModel(uid, coefficients, intercept)
-  // Handle possible missing or invalid prediction columns
-  val (summaryModel, predictionColName) = 
model.findSummaryModelAndPredictionCol()
-
-  val trainingSummary = new LinearRegressionTrainingSummary(
-summaryModel.transform(dataset),
-predictionColName,
-$(labelCol),
-model,
-Array(0D),
-$(featuresCol),
-Array(0D))
-  return copyValues(model.setSummary(trainingSummary))
+val rawYStd = math.sqrt(ySummarizer.variance(0))
+if (rawYStd == 0.0) {
+  if ($(fitIntercept)) {
+// If the rawYStd is zero and fitIntercept=true, then the 
intercept is yMean with
+// zero coefficient; as a result, training is not needed.
+logWarning(s"The standard deviation of the label is zero, so the 
coefficients will be " +
+  s"zeros and the intercept will be the mean of the label; as a 
result, " +
+  s"training is not needed.")
+if (handlePersistence) instances.unpersist()
+val coefficients = Vectors.sparse(numFeatures, Seq())
+val intercept = yMean
+
+val model = new LinearRegressionModel(uid, coefficients, intercept)
+// Handle possible missing or invalid prediction columns
+val (summaryModel, predictionColName) = 
model.findSummaryModelAndPredictionCol()
+
+val trainingSummary = new LinearRegressionTrainingSummary(
+  summaryModel.transform(dataset),
+  predictionColName,
+  $(labelCol),
+  model,
+  Array(0D),
+  $(featuresCol),
+  Array(0D))
+return copyValues(model.setSummary(trainingSummary))
+  } else {
+require(!($(regParam) > 0.0 && $(standardization)),
--- End diff --

Just `require($(regParam) != 0.0)`





[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...

2016-01-27 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/10835#issuecomment-175892453
  
I have compiled a list of breaking `@DeveloperApi` changes:

ExceptionFailure:
- changed: `apply`, `unapply`, `copy`
- removed: old constructor
- deprecated: `metrics`

InputMetrics:
- removed: old constructor, all case class methods, `updateBytesRead`, 
`setBytesReadCallback`, `var bytesReadCallback`
- deprecated: `apply`, `unapply`, `incBytesRead`, `incRecordsRead`

OutputMetrics:
- removed: old constructor, all case class methods
- deprecated: `apply`, `unapply`

ShuffleReadMetrics:
- removed: old constructor

ShuffleWriteMetrics:
- removed: old constructor

TaskMetrics:
- changed: `accumulatorUpdates` return type (Map[Long, Any] -> 
Seq[AccumulableInfo])
- removed: `hostname`
- deprecated: `var updatedBlocks`, set `var outputMetrics`, set `var 
shuffleWriteMetrics`

AccumulableInfo:
- changed: `update` type (Option[String] -> Option[Any]), `value` type 
(String -> Option[Any]), `name` type (String -> Option[String])
- removed: `internal`
- deprecated: all existing `apply` methods

SparkListenerTaskEnd:
- changed: `taskMetrics` is now @Nullable

SparkListenerExecutorMetricsUpdate:
- changed: `apply`, `unapply`, `copy`
- removed: old constructor, `taskMetrics`






[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10944#issuecomment-175892206
  
Can you paste some generated code? (Actually, I think that's useful for most 
of the codegen PRs.)





[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10930#issuecomment-175892315
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50200/
Test FAILed.





[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10930#issuecomment-175892138
  
**[Test build #50200 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50200/consoleFull)**
 for PR 10930 at commit 
[`2c94ebf`](https://github.com/apache/spark/commit/2c94ebf360512fb6c58c0cf199122f349eafa0cb).
 * This patch **fails from timeout after a configured wait of `250m`**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class CountMinSketchImpl extends CountMinSketch implements 
Serializable `
  * `class DefaultSource extends HadoopFsRelationProvider with 
DataSourceRegister `
  * `class ChiSqSelector(JavaEstimator, HasFeaturesCol, HasOutputCol, 
HasLabelCol):`
  * `class ChiSqSelectorModel(JavaModel):`
  * `  public static final class Array extends ArrayData `
  * `  public static final class Struct extends InternalRow `
  * `public class ColumnVectorUtils `
  * `  public static final class Row extends InternalRow `





[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10940#issuecomment-175898494
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50229/
Test PASSed.





[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10940#issuecomment-175898492
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

2016-01-27 Thread dilipbiswal
Github user dilipbiswal commented on the pull request:

https://github.com/apache/spark/pull/10943#issuecomment-175899786
  
@cloud-fan Thank you Wenchen for your comments. In my understanding, users 
need to use back-ticks to quote a column name when they want it treated as a 
literal column name rather than a column path. I tried the following example:

val df = Seq((1, 2, 3)).toDF("a_b", "a.c", "b.c")
df.select("a.c") => fails to resolve
df.select("`a.c`") => works fine.

Is this not how it is supposed to work? Can you please elaborate with a small
example? Thanks in advance.
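
A slightly fuller sketch of the quoting convention described above (assumes a 
Spark shell; only `select` is shown, since the behavior of `drop` for dotted 
names is exactly what this PR is debating):

```scala
val df = Seq((1, 2, 3)).toDF("a_b", "a.c", "b.c")

df.select("a_b")     // fine: no dot, resolved as a plain column name
// df.select("a.c")  // fails: "a.c" is parsed as a path (field c of column a)
df.select("`a.c`")   // back-ticks force the literal column name "a.c"
```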





[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10702#discussion_r51070105
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0") 
(@Since("1.3.0") override val uid: String
 }
 
 val yMean = ySummarizer.mean(0)
-val yStd = math.sqrt(ySummarizer.variance(0))
-
-// If the yStd is zero, then the intercept is yMean with zero 
coefficient;
-// as a result, training is not needed.
-if (yStd == 0.0) {
-  logWarning(s"The standard deviation of the label is zero, so the 
coefficients will be " +
-s"zeros and the intercept will be the mean of the label; as a 
result, " +
-s"training is not needed.")
-  if (handlePersistence) instances.unpersist()
-  val coefficients = Vectors.sparse(numFeatures, Seq())
-  val intercept = yMean
-
-  val model = new LinearRegressionModel(uid, coefficients, intercept)
-  // Handle possible missing or invalid prediction columns
-  val (summaryModel, predictionColName) = 
model.findSummaryModelAndPredictionCol()
-
-  val trainingSummary = new LinearRegressionTrainingSummary(
-summaryModel.transform(dataset),
-predictionColName,
-$(labelCol),
-model,
-Array(0D),
-$(featuresCol),
-Array(0D))
-  return copyValues(model.setSummary(trainingSummary))
+val rawYStd = math.sqrt(ySummarizer.variance(0))
+if (rawYStd == 0.0) {
+  if ($(fitIntercept)) {
+// If the rawYStd is zero and fitIntercept=true, then the 
intercept is yMean with
+// zero coefficient; as a result, training is not needed.
+logWarning(s"The standard deviation of the label is zero, so the 
coefficients will be " +
+  s"zeros and the intercept will be the mean of the label; as a 
result, " +
+  s"training is not needed.")
+if (handlePersistence) instances.unpersist()
+val coefficients = Vectors.sparse(numFeatures, Seq())
+val intercept = yMean
+
+val model = new LinearRegressionModel(uid, coefficients, intercept)
+// Handle possible missing or invalid prediction columns
+val (summaryModel, predictionColName) = 
model.findSummaryModelAndPredictionCol()
+
+val trainingSummary = new LinearRegressionTrainingSummary(
+  summaryModel.transform(dataset),
+  predictionColName,
+  $(labelCol),
+  model,
+  Array(0D),
+  $(featuresCol),
+  Array(0D))
+return copyValues(model.setSummary(trainingSummary))
+  } else {
+require(!($(regParam) > 0.0 && $(standardization)),
--- End diff --

Also change the message.





[GitHub] spark pull request: [SPARK-12963] Improve performance of stddev/va...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10960#issuecomment-175915586
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50239/
Test FAILed.





[GitHub] spark pull request: [SPARK-12963] Improve performance of stddev/va...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10960#issuecomment-175915582
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13047][PYSPARK][ML] Pyspark Params.hasP...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10962#issuecomment-175920740
  
**[Test build #50242 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50242/consoleFull)**
 for PR 10962 at commit 
[`d52b1de`](https://github.com/apache/spark/commit/d52b1de1adefedb6938130d0530ea46fdb3f64f7).





[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-27 Thread iyounus
Github user iyounus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10702#discussion_r51071809
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0") 
(@Since("1.3.0") override val uid: String
 }
 
 val yMean = ySummarizer.mean(0)
-val yStd = math.sqrt(ySummarizer.variance(0))
-
-// If the yStd is zero, then the intercept is yMean with zero 
coefficient;
-// as a result, training is not needed.
-if (yStd == 0.0) {
-  logWarning(s"The standard deviation of the label is zero, so the 
coefficients will be " +
-s"zeros and the intercept will be the mean of the label; as a 
result, " +
-s"training is not needed.")
-  if (handlePersistence) instances.unpersist()
-  val coefficients = Vectors.sparse(numFeatures, Seq())
-  val intercept = yMean
-
-  val model = new LinearRegressionModel(uid, coefficients, intercept)
-  // Handle possible missing or invalid prediction columns
-  val (summaryModel, predictionColName) = 
model.findSummaryModelAndPredictionCol()
-
-  val trainingSummary = new LinearRegressionTrainingSummary(
-summaryModel.transform(dataset),
-predictionColName,
-$(labelCol),
-model,
-Array(0D),
-$(featuresCol),
-Array(0D))
-  return copyValues(model.setSummary(trainingSummary))
+val rawYStd = math.sqrt(ySummarizer.variance(0))
+if (rawYStd == 0.0) {
+  if ($(fitIntercept)) {
+// If the rawYStd is zero and fitIntercept=true, then the 
intercept is yMean with
+// zero coefficient; as a result, training is not needed.
+logWarning(s"The standard deviation of the label is zero, so the 
coefficients will be " +
+  s"zeros and the intercept will be the mean of the label; as a 
result, " +
+  s"training is not needed.")
+if (handlePersistence) instances.unpersist()
+val coefficients = Vectors.sparse(numFeatures, Seq())
+val intercept = yMean
+
+val model = new LinearRegressionModel(uid, coefficients, intercept)
+// Handle possible missing or invalid prediction columns
+val (summaryModel, predictionColName) = 
model.findSummaryModelAndPredictionCol()
+
+val trainingSummary = new LinearRegressionTrainingSummary(
+  summaryModel.transform(dataset),
+  predictionColName,
+  $(labelCol),
+  model,
+  Array(0D),
+  $(featuresCol),
+  Array(0D))
+return copyValues(model.setSummary(trainingSummary))
+  } else {
+require(!($(regParam) > 0.0 && $(standardization)),
--- End diff --

I can change this. But, the behaviour of WeightedLeastSquares should also 
be the same. Should I also make changes to WeightedLeastSquares?





[GitHub] spark pull request: [SPARK-10620] Minor addendum to #10835

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10958#issuecomment-175899434
  
**[Test build #50234 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50234/consoleFull)**
 for PR 10958 at commit 
[`5404254`](https://github.com/apache/spark/commit/540425450ea0e5376d99f6ccb43857b74f34204e).





[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-27 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/10702#issuecomment-175901752
  
@iyounus For `standardizeLabel = false/true` with a non-zero `regParam`, 
let's throw the exception. I explained the mismatch against the analytic 
normal equation in the other PR.

Thanks.





[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-27 Thread iyounus
Github user iyounus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10702#discussion_r51071489
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0") 
(@Since("1.3.0") override val uid: String
 }
 
 val yMean = ySummarizer.mean(0)
-val yStd = math.sqrt(ySummarizer.variance(0))
-
-// If the yStd is zero, then the intercept is yMean with zero 
coefficient;
-// as a result, training is not needed.
-if (yStd == 0.0) {
-  logWarning(s"The standard deviation of the label is zero, so the 
coefficients will be " +
-s"zeros and the intercept will be the mean of the label; as a 
result, " +
-s"training is not needed.")
-  if (handlePersistence) instances.unpersist()
-  val coefficients = Vectors.sparse(numFeatures, Seq())
-  val intercept = yMean
-
-  val model = new LinearRegressionModel(uid, coefficients, intercept)
-  // Handle possible missing or invalid prediction columns
-  val (summaryModel, predictionColName) = 
model.findSummaryModelAndPredictionCol()
-
-  val trainingSummary = new LinearRegressionTrainingSummary(
-summaryModel.transform(dataset),
-predictionColName,
-$(labelCol),
-model,
-Array(0D),
-$(featuresCol),
-Array(0D))
-  return copyValues(model.setSummary(trainingSummary))
+val rawYStd = math.sqrt(ySummarizer.variance(0))
+if (rawYStd == 0.0) {
+  if ($(fitIntercept)) {
+// If the rawYStd is zero and fitIntercept=true, then the 
intercept is yMean with
+// zero coefficient; as a result, training is not needed.
+logWarning(s"The standard deviation of the label is zero, so the 
coefficients will be " +
+  s"zeros and the intercept will be the mean of the label; as a 
result, " +
+  s"training is not needed.")
+if (handlePersistence) instances.unpersist()
+val coefficients = Vectors.sparse(numFeatures, Seq())
+val intercept = yMean
+
+val model = new LinearRegressionModel(uid, coefficients, intercept)
+// Handle possible missing or invalid prediction columns
+val (summaryModel, predictionColName) = 
model.findSummaryModelAndPredictionCol()
+
+val trainingSummary = new LinearRegressionTrainingSummary(
+  summaryModel.transform(dataset),
+  predictionColName,
+  $(labelCol),
+  model,
+  Array(0D),
+  $(featuresCol),
+  Array(0D))
+return copyValues(model.setSummary(trainingSummary))
+  } else {
+require(!($(regParam) > 0.0 && $(standardization)),
+  "The standard deviation of the label is zero. " +
+"Model cannot be regularized with standardization=true")
+logWarning(s"The standard deviation of the label is zero. " +
+  "Consider setting fitIntercept=true.")
+  }
 }
 
+// if y is constant (rawYStd is zero), then y cannot be scaled. In 
this case
+// setting yStd=1.0 ensures that y is not scaled anymore in l-bfgs 
algorithm.
+val yStd = if (rawYStd > 0) rawYStd else 1.0
--- End diff --

It's not clear to me why you would set `yStd = abs(yMean)` if the label is 
constant.
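
The fallback in the diff above, isolated as plain Scala (illustrative only; 
`effectiveStd` is not a name from the patch):

```scala
// A constant label has zero variance; dividing by a zero standard
// deviation would blow up the scaling, so fall back to 1.0, which
// leaves the label unscaled inside the l-bfgs optimizer.
def effectiveStd(rawStd: Double): Double =
  if (rawStd > 0) rawStd else 1.0

println(effectiveStd(0.0)) // 1.0
println(effectiveStd(2.5)) // 2.5
```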





[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51059167
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -388,57 +445,18 @@ class Analyzer(
 .map(_.asInstanceOf[NamedExpression])
 a.copy(aggregateExpressions = expanded)
 
-      // Special handling for cases when self-join introduce duplicate expression ids.
-      case j @ Join(left, right, _, _) if !j.selfJoinResolved =>
-        val conflictingAttributes = left.outputSet.intersect(right.outputSet)
-        logDebug(s"Conflicting attributes ${conflictingAttributes.mkString(",")} in $j")
-
-        right.collect {
-          // Handle base relations that might appear more than once.
-          case oldVersion: MultiInstanceRelation
-              if oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
-            val newVersion = oldVersion.newInstance()
-            (oldVersion, newVersion)
-
-          // Handle projects that create conflicting aliases.
-          case oldVersion @ Project(projectList, _)
-              if findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-            (oldVersion, oldVersion.copy(projectList = newAliases(projectList)))
-
-          case oldVersion @ Aggregate(_, aggregateExpressions, _)
-              if findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
-            (oldVersion, oldVersion.copy(aggregateExpressions = newAliases(aggregateExpressions)))
-
-          case oldVersion: Generate
-              if oldVersion.generatedSet.intersect(conflictingAttributes).nonEmpty =>
-            val newOutput = oldVersion.generatorOutput.map(_.newInstance())
-            (oldVersion, oldVersion.copy(generatorOutput = newOutput))
-
-          case oldVersion @ Window(_, windowExpressions, _, _, child)
-              if AttributeSet(windowExpressions.map(_.toAttribute)).intersect(conflictingAttributes)
-                .nonEmpty =>
-            (oldVersion, oldVersion.copy(windowExpressions = newAliases(windowExpressions)))
-        }
-        // Only handle first case, others will be fixed on the next pass.
-        .headOption match {
-          case None =>
-            /*
-             * No result implies that there is a logical plan node that produces new references
-             * that this rule cannot handle. When that is the case, there must be another rule
-             * that resolves these conflicts. Otherwise, the analysis will fail.
-             */
-            j
-          case Some((oldRelation, newRelation)) =>
-            val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
-            val newRight = right transformUp {
-              case r if r == oldRelation => newRelation
-            } transformUp {
-              case other => other transformExpressions {
-                case a: Attribute => attributeRewrites.get(a).getOrElse(a)
-              }
-            }
-            j.copy(right = newRight)
-        }
+      // To resolve duplicate expression IDs for all the BinaryNode
+      case b: BinaryNode if !b.duplicateResolved => b match {
+        case j @ Join(left, right, _, _) =>
+          j.copy(right = dedupRight(left, right))
+        case i @ Intersect(left, right) =>
+          i.copy(right = dedupRight(left, right))
+        case e @ Except(left, right) =>
+          e.copy(right = dedupRight(left, right))
+        case cg: CoGroup =>
--- End diff --

In this case, it should work! Let me know if we should deduplicate the 
expression IDs for the other operators. Thanks!
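For readers following along, the attribute-rewriting idea behind `dedupRight` can be sketched in plain Scala. This is an illustrative stand-in, not Spark's actual classes: the `Attr` type, `freshId`, and `dedupRight` signature here are all simplified assumptions. The point is that every right-side attribute whose expression ID collides with the left side gets a fresh ID, and an old-to-new mapping is kept so references can be rewritten afterwards.

```scala
// Toy model: an attribute is identified by (name, exprId).
case class Attr(name: String, exprId: Long)

// Give the right side fresh expression IDs for every attribute that
// conflicts with the left side; return the new output plus the rewrites.
def dedupRight(
    left: Set[Attr],
    right: Seq[Attr],
    freshId: () => Long): (Seq[Attr], Map[Attr, Attr]) = {
  val conflicting = right.filter(left.contains)
  val rewrites = conflicting.map(a => a -> a.copy(exprId = freshId())).toMap
  (right.map(a => rewrites.getOrElse(a, a)), rewrites)
}

var nextId = 100L
val fresh = () => { nextId += 1; nextId }

val leftOut  = Set(Attr("id", 1), Attr("name", 2))
val rightOut = Seq(Attr("id", 1), Attr("age", 3)) // self-join: "id" collides

val (newRight, rewrites) = dedupRight(leftOut, rightOut, fresh)
// "id" receives a fresh exprId; "age" is untouched.
```

Because the same helper only depends on the two children's outputs, it applies uniformly to `Join`, `Intersect`, and `Except`, which is what the `BinaryNode` match above exploits.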


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10620] Minor addendum to #10835

2016-01-27 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/10958

[SPARK-10620] Minor addendum to #10835



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark task-metrics-to-accums-followups

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10958.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10958


commit 8be6c863097d4eef0ac1b03b94165b2e61f1df7d
Author: Andrew Or 
Date:   2016-01-27T22:31:09Z

Fix indentations, visibility, deprecation etc.

commit 9de795b67ed52068472bffcce119989efd4aed43
Author: Andrew Or 
Date:   2016-01-27T22:32:47Z

Merge branch 'master' of github.com:apache/spark into task-metrics-to-accums-followups

Conflicts:
core/src/main/scala/org/apache/spark/Accumulable.scala







[GitHub] spark pull request: [SPARK-10620] [SPARK-13054] Minor addendum to ...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10958#issuecomment-175908753
  
**[Test build #50238 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50238/consoleFull)**
 for PR 10958 at commit 
[`6e4859d`](https://github.com/apache/spark/commit/6e4859d0aff3dbbd1c59e88101b7112610eb7d3c).





[GitHub] spark pull request: [SPARK-13050] [Build] Scalatest tags fail buil...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10954#issuecomment-175894464
  
**[Test build #50221 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50221/consoleFull)**
 for PR 10954 at commit 
[`6ab0ec9`](https://github.com/apache/spark/commit/6ab0ec9ce3748ce395885bcefeeacc4178e31d3d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10930#issuecomment-175900800
  
**[Test build #50233 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50233/consoleFull)**
 for PR 10930 at commit 
[`e627f5b`](https://github.com/apache/spark/commit/e627f5b96a21ccc748c75c7fa0a4c4839cdc63c5).





[GitHub] spark pull request: [SPARK-10620] Minor addendum to #10835

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10958#issuecomment-175903012
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10620] Minor addendum to #10835

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10958#issuecomment-175902964
  
**[Test build #50234 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50234/consoleFull)**
 for PR 10958 at commit 
[`5404254`](https://github.com/apache/spark/commit/540425450ea0e5376d99f6ccb43857b74f34204e).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10620] Minor addendum to #10835

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10958#issuecomment-175903014
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50234/
Test FAILed.





[GitHub] spark pull request: [SPARK-13049] Add First/last with ignore nulls...

2016-01-27 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/10957#issuecomment-175908190
  
A user is trying to get this working on 1.6 using the DataFrame API. That doesn't work directly because functions.scala is missing the functions implemented in this PR. The indirect approach using ```expr(...)``` doesn't work either, because ```WindowSpec``` does not support ```UnresolvedFunction```s.

I guess this is more a feature than a bug fix.
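For reference, the semantics being requested here — a `last` that skips nulls — can be sketched over plain Scala options. This is only an illustration of the intended behavior; the actual DataFrame-side API shape added by this PR may differ.

```scala
// last(ignoreNulls = true): the most recent non-null value in the frame;
// last(ignoreNulls = false): simply the final value, null or not.
def last[A](xs: Seq[Option[A]], ignoreNulls: Boolean): Option[A] =
  if (ignoreNulls) xs.reverse.find(_.isDefined).flatten
  else xs.lastOption.flatten

val col = Seq(Some(1), Some(2), None)
// last(col, ignoreNulls = true)  -> Some(2)
// last(col, ignoreNulls = false) -> None
```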





[GitHub] spark pull request: [SPARK-12963] Improve performance of stddev/va...

2016-01-27 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/10960

[SPARK-12963] Improve performance of stddev/variance

As benchmarked and discussed here: https://github.com/apache/spark/pull/10786/files#r50038294: thanks to codegen, a declarative aggregate function can be much faster than an imperative one.

This PR is based on #10944 
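To sketch why the declarative form helps: instead of an opaque per-row update object, variance can be expressed as a few arithmetic expressions over running moments (count, sum, sum of squares) that a code generator can inline straight into the consuming loop. The plain-Scala illustration below shows only the moment-based math, not Spark's actual `DeclarativeAggregate` API; note that the real implementation uses a numerically stabler central-moment formulation than this textbook two-moment version.

```scala
// Aggregation buffer: (n, sum, sumSq), updated with simple arithmetic.
case class Moments(n: Long, sum: Double, sumSq: Double) {
  def update(x: Double): Moments = Moments(n + 1, sum + x, sumSq + x * x)
  def merge(o: Moments): Moments = Moments(n + o.n, sum + o.sum, sumSq + o.sumSq)
  // Sample variance from the moments (guards for n < 2 omitted for brevity).
  def variance: Double = (sumSq - sum * sum / n) / (n - 1)
}

val data = Seq(1.0, 2.0, 3.0, 4.0)
val m = data.foldLeft(Moments(0, 0.0, 0.0))(_ update _)
// Sample variance of 1..4 is 5/3.
```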

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark stddev

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10960.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10960


commit b4db00675bc3c51ddf8735cace522a5d771cf7e2
Author: Davies Liu 
Date:   2016-01-27T07:43:40Z

cleanup whole stage codegen

commit 70a7c7edd1988c7dd69bccc8e563c9943775bd2c
Author: Davies Liu 
Date:   2016-01-27T23:22:33Z

improve stddev and variance







[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

2016-01-27 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10844#discussion_r51068566
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
@@ -146,6 +172,26 @@ case class Union(children: Seq[LogicalPlan]) extends LogicalPlan {
     val sizeInBytes = children.map(_.statistics.sizeInBytes).sum
     Statistics(sizeInBytes = sizeInBytes)
   }
+
+  override def extractConstraintsFromChild(child: QueryPlan[LogicalPlan]): Set[Expression] = {
+    child.constraints.filter(_.references.subsetOf(child.outputSet))
+  }
+
+  def rewriteConstraints(
+      planA: LogicalPlan,
+      planB: LogicalPlan,
+      constraints: Set[Expression]): Set[Expression] = {
+    require(planA.output.size == planB.output.size)
+    val attributeRewrites = AttributeMap(planB.output.zip(planA.output))
+    constraints.map(_ transform {
+      case a: Attribute => attributeRewrites(a)
+    })
+  }
+
+  override def constraints: Set[Expression] = {
+    children.map(child => rewriteConstraints(children.head, child,
+      extractConstraintsFromChild(child))).reduce(_ intersect _)
--- End diff --

same style nit
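The rewrite-then-intersect idea in the diff above can be sketched in plain Scala. This is a toy stand-in (the `Attr` type and the tuple encoding of a constraint are illustrative, not Spark's `Expression` machinery): each child's constraints are re-expressed in terms of the first child's output attributes by positional zip, and only constraints that hold for every child survive the intersection.

```scala
// Toy attribute: identified by (name, exprId).
case class Attr(name: String, exprId: Long)

// Express constraints over planB's output in terms of planA's output.
def rewrite(
    planA: Seq[Attr],
    planB: Seq[Attr],
    constraints: Set[(Attr, String)]): Set[(Attr, String)] = {
  require(planA.size == planB.size)
  val rewrites = planB.zip(planA).toMap // B's attribute -> A's attribute
  constraints.map { case (a, pred) => (rewrites.getOrElse(a, a), pred) }
}

val headOut  = Seq(Attr("x", 1))
val childOut = Seq(Attr("x", 7)) // same column, different exprId
val childConstraints = Set((Attr("x", 7), "IsNotNull"))

// After rewriting, the constraint refers to the head child's attribute,
// so intersecting across children compares like with like.
val unioned = rewrite(headOut, childOut, childConstraints)
```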





[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10940#issuecomment-175911254
  
**[Test build #50227 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50227/consoleFull)**
 for PR 10940 at commit 
[`914cffc`](https://github.com/apache/spark/commit/914cffc6f0a9e0d847f486916ff89941c55c63ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13045][SQL] Remove ColumnVector.Struct ...

2016-01-27 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10952#issuecomment-175911147
  
Merging this into master, thanks!





[GitHub] spark pull request: [SPARK-10810] [SPARK-10902] [SQL] Improve sess...

2016-01-27 Thread Neuw84
Github user Neuw84 commented on the pull request:

https://github.com/apache/spark/pull/8909#issuecomment-175916186
  
@deenar, I saw it after many hours of reading the code and the web docs. That said, I think the HiveSparkContext should implement the same logic as the SparkContext, where you can get the same session programmatically. Thanks by the way!
 





[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10702#discussion_r51070506
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
     }
 
     val yMean = ySummarizer.mean(0)
-    val yStd = math.sqrt(ySummarizer.variance(0))
-
-    // If the yStd is zero, then the intercept is yMean with zero coefficient;
-    // as a result, training is not needed.
-    if (yStd == 0.0) {
-      logWarning(s"The standard deviation of the label is zero, so the coefficients will be " +
-        s"zeros and the intercept will be the mean of the label; as a result, " +
-        s"training is not needed.")
-      if (handlePersistence) instances.unpersist()
-      val coefficients = Vectors.sparse(numFeatures, Seq())
-      val intercept = yMean
-
-      val model = new LinearRegressionModel(uid, coefficients, intercept)
-      // Handle possible missing or invalid prediction columns
-      val (summaryModel, predictionColName) = model.findSummaryModelAndPredictionCol()
-
-      val trainingSummary = new LinearRegressionTrainingSummary(
-        summaryModel.transform(dataset),
-        predictionColName,
-        $(labelCol),
-        model,
-        Array(0D),
-        $(featuresCol),
-        Array(0D))
-      return copyValues(model.setSummary(trainingSummary))
+    val rawYStd = math.sqrt(ySummarizer.variance(0))
+    if (rawYStd == 0.0) {
+      if ($(fitIntercept)) {
+        // If the rawYStd is zero and fitIntercept=true, then the intercept is yMean with
+        // zero coefficient; as a result, training is not needed.
+        logWarning(s"The standard deviation of the label is zero, so the coefficients will be " +
+          s"zeros and the intercept will be the mean of the label; as a result, " +
+          s"training is not needed.")
+        if (handlePersistence) instances.unpersist()
+        val coefficients = Vectors.sparse(numFeatures, Seq())
+        val intercept = yMean
+
+        val model = new LinearRegressionModel(uid, coefficients, intercept)
+        // Handle possible missing or invalid prediction columns
+        val (summaryModel, predictionColName) = model.findSummaryModelAndPredictionCol()
+
+        val trainingSummary = new LinearRegressionTrainingSummary(
+          summaryModel.transform(dataset),
+          predictionColName,
+          $(labelCol),
+          model,
+          Array(0D),
+          $(featuresCol),
+          Array(0D))
+        return copyValues(model.setSummary(trainingSummary))
+      } else {
+        require(!($(regParam) > 0.0 && $(standardization)),
+          "The standard deviation of the label is zero. " +
+            "Model cannot be regularized with standardization=true")
+        logWarning(s"The standard deviation of the label is zero. " +
+          "Consider setting fitIntercept=true.")
+      }
     }
 
+    // if y is constant (rawYStd is zero), then y cannot be scaled. In this case
+    // setting yStd=1.0 ensures that y is not scaled anymore in l-bfgs algorithm.
+    val yStd = if (rawYStd > 0) rawYStd else 1.0
--- End diff --

Actually, in the case of `yMean == 0.0 && yStd == 0.0`, the coefficients will all be zeros as well, even when `fitIntercept == false`. This is a rare case, so we can let model training figure it out. But if you want to handle it explicitly, that's great.
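To make the edge case under discussion concrete, here is a small sketch (plain Scala, outside Spark) of the label-scaling guard: when the label is constant, its standard deviation is zero, so falling back to `yStd = 1.0` leaves the labels unscaled instead of dividing by zero. Names here are illustrative, and this is only the scaling step, not the training loop.

```scala
// Mean and (population) standard deviation of the label column.
def labelStats(y: Seq[Double]): (Double, Double) = {
  val mean = y.sum / y.size
  val variance = y.map(v => (v - mean) * (v - mean)).sum / y.size
  (mean, math.sqrt(variance))
}

val labels = Seq(3.0, 3.0, 3.0) // constant label
val (yMean, rawYStd) = labelStats(labels)

// Constant label: rawYStd == 0. Fall back to 1.0 so the scaled labels
// stay finite; with fitIntercept=true the intercept absorbs yMean anyway.
val yStd = if (rawYStd > 0) rawYStd else 1.0
val scaled = labels.map(v => (v - yMean) / yStd)
```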





[GitHub] spark pull request: [SPARK-13049] Add First/last with ignore nulls...

2016-01-27 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/10957#issuecomment-175881244
  
@yhuai ```expr("last(r, true)")``` would return an ```UnresolvedFunction(UnresolvedAttribute(r), Literal(true))```. The problem is that ```WindowSpec``` does not recognize ```UnresolvedFunction```s.

This is the cleaner fix. We could also add a match for unresolved functions to the ```WindowSpec``` function.





[GitHub] spark pull request: [SPARK-12177] [STREAMING] Update KafkaDStreams...

2016-01-27 Thread markgrover
Github user markgrover commented on the pull request:

https://github.com/apache/spark/pull/10953#issuecomment-175887290
  
OK, I have no idea what MiMa is, but I will take a look, try to run the tests locally, and fix the issues.





[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10930#issuecomment-175892312
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10930#issuecomment-175896643
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10930#issuecomment-175896645
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50205/
Test FAILed.





[GitHub] spark pull request: [SPARK-13049] Add First/last with ignore nulls...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10957#issuecomment-175896538
  
**[Test build #50231 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50231/consoleFull)**
 for PR 10957 at commit 
[`defcc02`](https://github.com/apache/spark/commit/defcc02a8885e884d5140b11705b764a51753162).





[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...

2016-01-27 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10835#discussion_r51062999
  
--- Diff: core/src/main/scala/org/apache/spark/Accumulator.scala ---
@@ -75,43 +84,65 @@ private[spark] object Accumulators extends Logging {
    * This global map holds the original accumulator objects that are created on the driver.
    * It keeps weak references to these objects so that accumulators can be garbage-collected
    * once the RDDs and user-code that reference them are cleaned up.
+   * TODO: Don't use a global map; these should be tied to a SparkContext at the very least.
--- End diff --
--- End diff --

https://issues.apache.org/jira/browse/SPARK-13051





[GitHub] spark pull request: SPARK-13052 waitingApps metric doesn't show th...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10959#issuecomment-175904859
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-13049] Add First/last with ignore nulls...

2016-01-27 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/10957#issuecomment-175881484
  
retest this please





[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-27 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10944#issuecomment-175895315
  
Here is the generated code for `sqlContext.range(values).filter("(id & 1) = 1").count()`

```
/* 001 */
/* 002 */ public Object generate(Object[] references) {
/* 003 */   return new GeneratedIterator(references);
/* 004 */ }
/* 005 */
/* 006 */ class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 007 */
/* 008 */   private Object[] references;
/* 009 */   private boolean TungstenAggregate_initAgg0;
/* 010 */   private boolean TungstenAggregate_bufIsNull1;
/* 011 */   private long TungstenAggregate_bufValue2;
/* 012 */   private boolean Range_initRange6;
/* 013 */   private long Range_partitionEnd7;
/* 014 */   private long Range_number8;
/* 015 */   private boolean Range_overflow9;
/* 016 */   private UnsafeRow TungstenAggregate_result29;
/* 017 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder TungstenAggregate_holder30;
/* 018 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter TungstenAggregate_rowWriter31;
/* 019 */
/* 020 */   private void initRange(int idx) {
/* 021 */     java.math.BigInteger index = java.math.BigInteger.valueOf(idx);
/* 022 */     java.math.BigInteger numSlice = java.math.BigInteger.valueOf(1L);
/* 023 */     java.math.BigInteger numElement = java.math.BigInteger.valueOf(209715200L);
/* 024 */     java.math.BigInteger step = java.math.BigInteger.valueOf(1L);
/* 025 */     java.math.BigInteger start = java.math.BigInteger.valueOf(0L);
/* 026 */
/* 027 */     java.math.BigInteger st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
/* 028 */     if (st.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
/* 029 */       Range_number8 = Long.MAX_VALUE;
/* 030 */     } else if (st.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
/* 031 */       Range_number8 = Long.MIN_VALUE;
/* 032 */     } else {
/* 033 */       Range_number8 = st.longValue();
/* 034 */     }
/* 035 */
/* 036 */     java.math.BigInteger end = index.add(java.math.BigInteger.ONE).multiply(numElement).divide(numSlice)
/* 037 */         .multiply(step).add(start);
/* 038 */     if (end.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
/* 039 */       Range_partitionEnd7 = Long.MAX_VALUE;
/* 040 */     } else if (end.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
/* 041 */       Range_partitionEnd7 = Long.MIN_VALUE;
/* 042 */     } else {
/* 043 */       Range_partitionEnd7 = end.longValue();
/* 044 */     }
/* 045 */   }
/* 046 */
/* 047 */
/* 048 */   private void TungstenAggregate_doAgg5() {
/* 049 */     // initialize aggregation buffer
/* 050 */     /* 0 */
/* 051 */
/* 052 */     TungstenAggregate_bufIsNull1 = false;
/* 053 */     TungstenAggregate_bufValue2 = 0L;
/* 054 */
/* 055 */
/* 056 */
/* 057 */     // initialize Range
/* 058 */     if (!Range_initRange6) {
/* 059 */       Range_initRange6 = true;
/* 060 */       if (input.hasNext()) {
/* 061 */         initRange(((InternalRow) input.next()).getInt(0));
/* 062 */       } else {
/* 063 */         return;
/* 064 */       }
/* 065 */     }
/* 066 */
/* 067 */     while (!Range_overflow9 && Range_number8 < Range_partitionEnd7) {
/* 068 */       long Range_value10 = Range_number8;
/* 069 */       Range_number8 += 1L;
/* 070 */       if (Range_number8 < Range_value10 ^ 1L < 0) {
/* 071 */         Range_overflow9 = true;
/* 072 */       }
/* 073 */
/* 074 */       /* ((input[0, bigint] & 1) = 1) */
/* 075 */       /* (input[0, bigint] & 1) */
/* 076 */       /* input[0, bigint] */
/* 077 */
/* 078 */       /* 1 */
/* 079 */
/* 080 */       long Filter_value14 = -1L;
/* 081 */       Filter_value14 = Range_value10 & 1L;
/* 082 */       /* 1 */
/* 083 */
/* 084 */       boolean Filter_value12 = false;
/* 085 */       Filter_value12 = Filter_value14 == 1L;
/* 086 */       if (!false && Filter_value12) {
/* 087 */
/* 088 */
/* 089 */
/* 090 */
/* 091 */         // do aggregate and update aggregation buffer
/* 092 */
/* 093 */         /* (input[0, bigint] + 1) */
/* 094 */         /* input[0, bigint] */
/* 095 */
/* 096 */         /* 1 */
/* 097 */
/* 098 */         long TungstenAggregate_value22 = -1L;
/* 099 */         TungstenAggregate_value22 = TungstenAggregate_bufValue2 + 1L;
/* 100 */         TungstenAggregate_bufIsNull1 = false;
/* 101 */         TungstenAggregate_bufValue2 = TungstenAggregate_value22;
/* 102 */
/* 103 */
/* 104 */
/* 105 */
```
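The generated code fuses Range, Filter, and TungstenAggregate into one loop rather than producing and consuming intermediate rows. Logically, it computes the following plain-Scala equivalent of `range(n).filter((id & 1) = 1).count()` (shown only for readability; `start`/`end` here stand in for the partition bounds the generated class computes in `initRange`):

```scala
// What the fused whole-stage loop computes: iterate the range, keep rows
// where (id & 1) == 1, and count them in a local variable instead of
// materializing intermediate rows between operators.
def fusedCount(start: Long, end: Long): Long = {
  var count = 0L
  var i = start
  while (i < end) {
    if ((i & 1L) == 1L) count += 1
    i += 1
  }
  count
}
// fusedCount(0L, 10L) counts the odd ids 1, 3, 5, 7, 9 -> 5
```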

[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...

2016-01-27 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/10930#issuecomment-175895328
  
retest this please





[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10940#issuecomment-175898182
  
**[Test build #50229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50229/consoleFull)** for PR 10940 at commit [`914cffc`](https://github.com/apache/spark/commit/914cffc6f0a9e0d847f486916ff89941c55c63ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-6363][BUILD] Make Scala 2.11 the defaul...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10608#issuecomment-175907303
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-6363][BUILD] Make Scala 2.11 the defaul...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10608#issuecomment-175907259
  
**[Test build #50236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50236/consoleFull)** for PR 10608 at commit [`18c5223`](https://github.com/apache/spark/commit/18c5223bef0330085f0f577fea49581aa82e2ca1).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12926][SQL] SQLContext to disallow user...

2016-01-27 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10849#issuecomment-175907368
  
Please update the title and description (these become the commit message when merging).





[GitHub] spark pull request: [SPARK-6363][BUILD] Make Scala 2.11 the defaul...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10608#issuecomment-175907305
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50236/
Test FAILed.





[GitHub] spark pull request: [SPARK-13045][SQL] Remove ColumnVector.Struct ...

2016-01-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10952#issuecomment-175909089
  
cc @davies





[GitHub] spark pull request: [SPARK-13045][SQL] Remove ColumnVector.Struct ...

2016-01-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10952





[GitHub] spark pull request: [SPARK-12963] Improve performance of stddev/va...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10960#issuecomment-175914531
  
**[Test build #50240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50240/consoleFull)** for PR 10960 at commit [`61edd5e`](https://github.com/apache/spark/commit/61edd5e3a2c030d7387db5283eee5ada13553505).





[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10961#issuecomment-175917592
  
**[Test build #50241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50241/consoleFull)** for PR 10961 at commit [`24ca13c`](https://github.com/apache/spark/commit/24ca13c7f8b2ac5fbc4a9600539bb02d22b56a91).





[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

2016-01-27 Thread dilipbiswal
Github user dilipbiswal commented on the pull request:

https://github.com/apache/spark/pull/10943#issuecomment-175917517
  
@cloud-fan Thank you Wenchen.





[GitHub] spark pull request: [SPARK-11955][SQL] Mark optional fields in mer...

2016-01-27 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/9940#issuecomment-175927505
  
ping @liancheng Please see if the latest updates look good to you. Thanks.





[GitHub] spark pull request: SPARK-11565 Replace deprecated DigestUtils.sha...

2016-01-27 Thread gliptak
Github user gliptak commented on the pull request:

https://github.com/apache/spark/pull/9532#issuecomment-175927249
  
Github is hiccuping ...





[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10527#issuecomment-175927638
  
**[Test build #50243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50243/consoleFull)** for PR 10527 at commit [`9476822`](https://github.com/apache/spark/commit/94768224126acf303e9e8b6d2697388f0fec1d23).





[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...

2016-01-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10916#discussion_r51073364
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala
 ---
@@ -183,7 +183,7 @@ class CliSuite extends SparkFunSuite with 
BeforeAndAfterAll with Logging {
   "CREATE DATABASE hive_test_db;"
 -> "OK",
   "USE hive_test_db;"
--> "OK",
+-> "",
--- End diff --

Returning OK will break the Hive compatibility test. I tried that in previous commits.





[GitHub] spark pull request: [SPARK-12995][GraphX] Remove deprecate APIs fr...

2016-01-27 Thread maropu
Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/10918#issuecomment-175927636
  
@ankurdave @srowen ping





[GitHub] spark pull request: [SPARK-12966][SQL] Support ArrayType(DecimalTy...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10898#issuecomment-175978667
  
**[Test build #50251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50251/consoleFull)** for PR 10898 at commit [`52eaebe`](https://github.com/apache/spark/commit/52eaebea0cf2650ee1aff4c0eb2d7dfd706d655b).





[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...

2016-01-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10916#discussion_r51082907
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
 ---
@@ -1262,6 +1263,21 @@ class HiveQuerySuite extends HiveComparisonTest with 
BeforeAndAfter {
 
   }
 
+  test("use database") {
+val currentDatabase = sql("select 
current_database()").first().getString(0)
+
+sql("CREATE DATABASE hive_test_db")
+sql("USE hive_test_db")
+assert("hive_test_db" == sql("select 
current_database()").first().getString(0))
--- End diff --

Do we already have database support in `SQLContext`?





[GitHub] spark pull request: [SPARK-13057][SQL] Add benchmark codes and the...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10965#issuecomment-176000161
  
**[Test build #50254 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50254/consoleFull)** for PR 10965 at commit [`b3bf70c`](https://github.com/apache/spark/commit/b3bf70c810baefaa6fb374d6b8052341b847e0d7).





[GitHub] spark pull request: [SPARK-11780][SQL] Add catalyst type aliases b...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10915#issuecomment-176007544
  
**[Test build #50256 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50256/consoleFull)** for PR 10915 at commit [`c338300`](https://github.com/apache/spark/commit/c3383003dff0c6c49849dad89da7a3fac906cab5).





[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...

2016-01-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10916#discussion_r51073555
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientInterface.scala 
---
@@ -109,6 +109,9 @@ private[hive] trait ClientInterface {
   /** Returns the name of the active database. */
   def currentDatabase: String
--- End diff --

Yeah. I don't think we need to address database support for all catalogs in this PR.





[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...

2016-01-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10916#discussion_r51073506
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala
 ---
@@ -46,6 +46,10 @@ trait Catalog {
 
   def lookupRelation(tableIdent: TableIdentifier, alias: Option[String] = 
None): LogicalPlan
 
+  def setCurrentDatabase(databaseName: String): Unit = {
+throw new UnsupportedOperationException
--- End diff --

I think not all catalogs support the database concept, so inheriting catalogs can choose whether to implement it.
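The pattern described here — a base catalog whose default implementation throws, with database-aware catalogs overriding it — can be sketched in Python (the class and method names below are illustrative, not the actual Spark API):

```python
class Catalog:
    # Base catalog: not every catalog supports databases, so the
    # default implementation refuses the operation.
    def set_current_database(self, database_name):
        raise NotImplementedError("this catalog does not support databases")


class HiveCatalog(Catalog):
    # A catalog that does support databases overrides the default.
    def __init__(self):
        self._current = "default"

    def set_current_database(self, database_name):
        self._current = database_name

    @property
    def current_database(self):
        return self._current
```

Callers that hold a plain `Catalog` get a clear failure instead of silently wrong behavior, while Hive-backed sessions can switch databases freely.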





[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10527#issuecomment-175932469
  
Build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10527#issuecomment-175932477
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50245/
Test FAILed.





[GitHub] spark pull request: [SPARK-12792][SPARKR] Refactor RRDD to support...

2016-01-27 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/10947#issuecomment-175946838
  
cc @davies. Thanks @sunrui for the PR. I'll review this later today.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-175955462
  
**[Test build #50247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50247/consoleFull)** for PR 9893 at commit [`ab6d601`](https://github.com/apache/spark/commit/ab6d6016a9edc42bc5fae3eebff63fca518912d8).
 * This patch **fails build dependency tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-175955469
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50247/
Test FAILed.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-175955464
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10873] Support column sort and search f...

2016-01-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10648#issuecomment-176012531
  
@tgravescs @zhuoliu are you guys interested in more UI work? I have some ideas that I never found the time or people to work on ... I think they would make the UI a lot more useful.






[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...

2016-01-27 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/10944#issuecomment-176013043
  
@nongli Does this one look good to you? This one blocks others.





[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...

2016-01-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10723#discussion_r51074963
  
--- Diff: 
sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser/SparkSqlLexer.g
 ---
@@ -465,7 +467,7 @@ Identifier
 fragment
 QuotedIdentifier 
 :
-'`'  ( '``' | ~('`') )* '`' { setText(getText().substring(1, 
getText().length() -1 ).replaceAll("``", "`")); }
+'`'  ( '``' | ~('`') )* '`' { setText(getText().replaceAll("``", 
"`")); }
--- End diff --

The above query gets a ParseException: mismatched character '' expecting '`'.
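The behavioral difference in this diff can be sketched outside ANTLR. Here `unquote_old` mirrors the original lexer action (strip the surrounding backticks, then collapse escaped ones) and `unquote_new` mirrors the patched action (collapse escapes only, keep the outer backticks); the function names are illustrative:

```python
def unquote_old(token_text):
    # original action: drop the surrounding backticks,
    # then turn each escaped `` into a single `
    return token_text[1:-1].replace("``", "`")


def unquote_new(token_text):
    # patched action: keep the surrounding backticks,
    # only collapse the escaped `` sequences
    return token_text.replace("``", "`")
```

For the quoted identifier `` `a``b` ``, the old action yields ``a`b`` while the patched one yields `` `a`b` `` — the outer backticks now survive into the token text.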





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-175955168
  
**[Test build #50247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50247/consoleFull)** for PR 9893 at commit [`ab6d601`](https://github.com/apache/spark/commit/ab6d6016a9edc42bc5fae3eebff63fca518912d8).





[GitHub] spark pull request: [SPARK-10873] Support column sort and search f...

2016-01-27 Thread zhuoliu
Github user zhuoliu commented on the pull request:

https://github.com/apache/spark/pull/10648#issuecomment-175966503
  
Hi @tgravescs , finally fixed the paging stuff in RowsGrouping. :)




