[GitHub] spark issue #15015: [SPARK-16445][MLlib][SparkR] Fix @return description for...

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15015
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15015: [SPARK-16445][MLlib][SparkR] Fix @return description for...

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15015
  
**[Test build #65135 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65135/consoleFull)** for PR 15015 at commit [`91ae8d6`](https://github.com/apache/spark/commit/91ae8d65891d8bc8e1895a3821dd10cec7c52efb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15015: [SPARK-16445][MLlib][SparkR] Fix @return description for...

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15015
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65135/
Test PASSed.





[GitHub] spark issue #14912: [SPARK-17357][SQL] Fix current predicate pushdown

2016-09-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14912
  
Could you define the conditions under which the predicates cannot be pushed down? Then we can more easily judge the significance.





[GitHub] spark issue #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract `outpu...

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14864
  
**[Test build #65136 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65136/consoleFull)** for PR 14864 at commit [`445549b`](https://github.com/apache/spark/commit/445549b81c97f2c3024bbaff97bbca371cb37558).





[GitHub] spark issue #15015: [SPARK-16445][MLlib][SparkR] Fix @return description for...

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15015
  
**[Test build #65135 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65135/consoleFull)** for PR 15015 at commit [`91ae8d6`](https://github.com/apache/spark/commit/91ae8d65891d8bc8e1895a3821dd10cec7c52efb).





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14980
  
**[Test build #65134 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65134/consoleFull)** for PR 14980 at commit [`7b55255`](https://github.com/apache/spark/commit/7b552557a0fdbfbac6fa11ae578171ac42516cd6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14980
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14980
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65134/
Test PASSed.





[GitHub] spark issue #15020: Spark 2.0 error in Intellij

2016-09-08 Thread yintengfei
Github user yintengfei commented on the issue:

https://github.com/apache/spark/pull/15020
  
You can go to File -> Project Structure -> Modules -> Dependencies and change the scope from `provided` to `compile`.

As HyukjinKwon says, you'd better ask this on the user mailing list.
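For reference, here is a minimal sketch of the equivalent dependency-scope choice in an sbt build (the coordinates and version are assumptions, not from this thread):

```scala
// build.sbt -- a sketch, assuming Spark 2.0.0 artifacts.
// "provided" keeps the Spark jars off the runtime classpath (appropriate for
// cluster deploys, where Spark is already present), which is exactly what
// breaks in-IDE runs; the default compile scope makes them available locally.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" // add % "provided" only for cluster builds
```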





[GitHub] spark pull request #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/14864#discussion_r78129784
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
@@ -156,24 +155,57 @@ case class FileSourceScanExec(
     false
   }
 
-  override val outputPartitioning: Partitioning = {
+  @transient private lazy val selectedPartitions = relation.location.listFiles(partitionFilters)
+
+  override val (outputPartitioning, outputOrdering): (Partitioning, Seq[SortOrder]) = {
     val bucketSpec = if (relation.sparkSession.sessionState.conf.bucketingEnabled) {
       relation.bucketSpec
     } else {
       None
     }
-    bucketSpec.map { spec =>
-      val numBuckets = spec.numBuckets
-      val bucketColumns = spec.bucketColumnNames.flatMap { n =>
-        output.find(_.name == n)
-      }
-      if (bucketColumns.size == spec.bucketColumnNames.size) {
-        HashPartitioning(bucketColumns, numBuckets)
-      } else {
-        UnknownPartitioning(0)
-      }
-    }.getOrElse {
-      UnknownPartitioning(0)
+    bucketSpec match {
+      case Some(spec) =>
+        val numBuckets = spec.numBuckets
+
+        def toAttribute(colName: String, columnType: String): Attribute =
+          output.find(_.name == colName).getOrElse {
+            throw new AnalysisException(s"Could not find $columnType column $colName for " +
--- End diff --

I see what you meant earlier. I have updated the PR to follow this:

For bucketed columns:
`HashPartitioning` is used only when:
- ALL the bucketing columns are being read from the table

For sorted columns:
Sort ordering is used when ALL of these criteria are met:
1. `HashPartitioning` is being used
2. A prefix (or all) of the sort columns is being read from the table

The sort ordering is over the prefix subset of sort columns being read from the table (see the sketch below). E.g., assume (col0, col2, col3) are the columns read from the table:
- If the sort columns are (col0, col1), then the sort ordering is (col0)
- If the sort columns are (col1, col0), then the sort ordering is empty, per rule #2 above
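A minimal sketch of that prefix rule (a hypothetical helper, not the PR's actual code):

```scala
// Keep the longest prefix of the table's sort columns that is covered by the
// columns actually read; an empty result means no sort ordering can be reported.
def sortPrefix(sortColumns: Seq[String], readColumns: Set[String]): Seq[String] =
  sortColumns.takeWhile(readColumns.contains)

val read = Set("col0", "col2", "col3")
assert(sortPrefix(Seq("col0", "col1"), read) == Seq("col0")) // prefix (col0) survives
assert(sortPrefix(Seq("col1", "col0"), read) == Seq.empty)   // col1 not read => empty, per rule #2
```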





[GitHub] spark pull request #14750: [SPARK-17183][SQL] put hive serde table schema to...

2016-09-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14750#discussion_r78129508
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -169,7 +169,10 @@ case class InsertIntoHiveTable(
 
     // All partition column names in the format of "//..."
     val partitionColumns = fileSinkConf.getTableInfo.getProperties.getProperty("partition_columns")
-    val partitionColumnNames = Option(partitionColumns).map(_.split("/")).getOrElse(Array.empty)
+    // As the keys of partition spec `partition` is always lowercase, we should also lowercase the
+    // partition column names of the table here.
+    val partitionColumnNames =
+      Option(partitionColumns).map(_.split("/").map(_.toLowerCase)).getOrElse(Array.empty)
--- End diff --

Can we combine the above two lines into a single one?
```Scala
val partitionColumnNames = table.catalogTable.partitionColumnNames.map(_.toLowerCase)
```





[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14452
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14452
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65131/
Test FAILed.





[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14452
  
**[Test build #65131 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65131/consoleFull)** for PR 14452 at commit [`bc70354`](https://github.com/apache/spark/commit/bc70354fefeb2ff2cac57d869b6d342230859fd3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14980
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65133/
Test PASSed.





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14980
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14980
  
**[Test build #65133 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65133/consoleFull)** for PR 14980 at commit [`1142fac`](https://github.com/apache/spark/commit/1142facf3f02ededdc57a006ea065b6014510eae).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14834#discussion_r78128581
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -460,33 +564,74 @@ class LogisticRegression @Since("1.2.0") (
            as a result, no scaling is needed.
          */
         val rawCoefficients = state.x.toArray.clone()
-        var i = 0
-        while (i < numFeatures) {
-          rawCoefficients(i) *= { if (featuresStd(i) != 0.0) 1.0 / featuresStd(i) else 0.0 }
-          i += 1
+        val coefficientArray = Array.tabulate(numCoefficientSets * numFeatures) { i =>
+          // flatIndex will loop though rawCoefficients, and skip the intercept terms.
+          val flatIndex = if ($(fitIntercept)) i + i / numFeatures else i
+          val featureIndex = i % numFeatures
+          if (featuresStd(featureIndex) != 0.0) {
+            rawCoefficients(flatIndex) / featuresStd(featureIndex)
+          } else {
+            0.0
+          }
+        }
+        val coefficientMatrix =
+          new DenseMatrix(numCoefficientSets, numFeatures, coefficientArray, isTransposed = true)
+
+        if ($(regParam) == 0.0 && isMultinomial) {
+          /*
+            When no regularization is applied, the coefficients lack identifiability because
+            we do not use a pivot class. We can add any constant value to the coefficients and
+            get the same likelihood. So here, we choose the mean centered coefficients for
+            reproducibility. This method follows the approach in glmnet, described here:
+
+            Friedman, et al. "Regularization Paths for Generalized Linear Models via
+              Coordinate Descent," https://core.ac.uk/download/files/153/6287975.pdf
+           */
+          val coefficientMean = coefficientMatrix.values.sum / coefficientMatrix.values.length
+          coefficientMatrix.update(_ - coefficientMean)
         }
-        bcFeaturesStd.destroy(blocking = false)
 
-        if ($(fitIntercept)) {
-          (Vectors.dense(rawCoefficients.dropRight(1)).compressed, rawCoefficients.last,
-            arrayBuilder.result())
+        val interceptsArray: Array[Double] = if ($(fitIntercept)) {
+          Array.tabulate(numCoefficientSets) { i =>
+            val coefIndex = (i + 1) * numFeaturesPlusIntercept - 1
+            rawCoefficients(coefIndex)
+          }
+        } else {
+          Array[Double]()
+        }
+        /*
+          The intercepts are never regularized, so we always center the mean.
+         */
+        val interceptVector = if (interceptsArray.nonEmpty && isMultinomial) {
+          val interceptMean = interceptsArray.sum / numClasses
+          interceptsArray.indices.foreach { i => interceptsArray(i) -= interceptMean }
+          Vectors.dense(interceptsArray)
+        } else if (interceptsArray.length == 1) {
+          Vectors.dense(interceptsArray)
         } else {
-          (Vectors.dense(rawCoefficients).compressed, 0.0, arrayBuilder.result())
+          Vectors.sparse(numCoefficientSets, Seq())
         }
+        (coefficientMatrix, interceptVector, arrayBuilder.result())
       }
     }
 
     if (handlePersistence) instances.unpersist()
 
-    val model = copyValues(new LogisticRegressionModel(uid, coefficients, intercept))
-    val (summaryModel, probabilityColName) = model.findSummaryModelAndProbabilityCol()
-    val logRegSummary = new BinaryLogisticRegressionTrainingSummary(
-      summaryModel.transform(dataset),
-      probabilityColName,
-      $(labelCol),
-      $(featuresCol),
-      objectiveHistory)
--- End diff --

Change the outer

```scala
val (coefficients, intercept, objectiveHistory) = {
  ...
}
```

to

```scala
val (coefficientMatrix, interceptVector, objectiveHistory) = {
  ...
}
```

for clarity.





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14980
  
**[Test build #65134 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65134/consoleFull)** for PR 14980 at commit [`7b55255`](https://github.com/apache/spark/commit/7b552557a0fdbfbac6fa11ae578171ac42516cd6).





[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14834#discussion_r78128374
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -460,33 +564,74 @@ class LogisticRegression @Since("1.2.0") (
            as a result, no scaling is needed.
          */
         val rawCoefficients = state.x.toArray.clone()
-        var i = 0
-        while (i < numFeatures) {
-          rawCoefficients(i) *= { if (featuresStd(i) != 0.0) 1.0 / featuresStd(i) else 0.0 }
-          i += 1
+        val coefficientArray = Array.tabulate(numCoefficientSets * numFeatures) { i =>
+          // flatIndex will loop though rawCoefficients, and skip the intercept terms.
+          val flatIndex = if ($(fitIntercept)) i + i / numFeatures else i
+          val featureIndex = i % numFeatures
+          if (featuresStd(featureIndex) != 0.0) {
+            rawCoefficients(flatIndex) / featuresStd(featureIndex)
+          } else {
+            0.0
+          }
+        }
+        val coefficientMatrix =
+          new DenseMatrix(numCoefficientSets, numFeatures, coefficientArray, isTransposed = true)
+
+        if ($(regParam) == 0.0 && isMultinomial) {
+          /*
+            When no regularization is applied, the coefficients lack identifiability because
+            we do not use a pivot class. We can add any constant value to the coefficients and
+            get the same likelihood. So here, we choose the mean centered coefficients for
+            reproducibility. This method follows the approach in glmnet, described here:
+
+            Friedman, et al. "Regularization Paths for Generalized Linear Models via
+              Coordinate Descent," https://core.ac.uk/download/files/153/6287975.pdf
+           */
+          val coefficientMean = coefficientMatrix.values.sum / coefficientMatrix.values.length
+          coefficientMatrix.update(_ - coefficientMean)
         }
-        bcFeaturesStd.destroy(blocking = false)
 
-        if ($(fitIntercept)) {
-          (Vectors.dense(rawCoefficients.dropRight(1)).compressed, rawCoefficients.last,
-            arrayBuilder.result())
+        val interceptsArray: Array[Double] = if ($(fitIntercept)) {
+          Array.tabulate(numCoefficientSets) { i =>
+            val coefIndex = (i + 1) * numFeaturesPlusIntercept - 1
+            rawCoefficients(coefIndex)
+          }
+        } else {
+          Array[Double]()
+        }
+        /*
+          The intercepts are never regularized, so we always center the mean.
+         */
+        val interceptVector = if (interceptsArray.nonEmpty && isMultinomial) {
+          val interceptMean = interceptsArray.sum / numClasses
+          interceptsArray.indices.foreach { i => interceptsArray(i) -= interceptMean }
+          Vectors.dense(interceptsArray)
--- End diff --

Let's track this in a TODO. It will not work if one of the classes is absent from the training data: the intercept corresponding to that class will be negative infinity, so there is no well-defined interceptMean.
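A toy illustration of the problem (illustrative only, not the PR's code): with an unseen class, the fitted intercept diverges, and the mean is no longer well defined.

```scala
// The third class never appears in training, so its intercept tends to
// -Infinity; the mean of the intercepts is then -Infinity, and centering
// subtracts -Infinity from every class, destroying the finite intercepts too.
val intercepts = Array(1.2, -0.3, Double.NegativeInfinity)
val interceptMean = intercepts.sum / intercepts.length // -Infinity
val centered = intercepts.map(_ - interceptMean)       // Array(Infinity, Infinity, NaN)
println(centered.mkString(", "))
```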





[GitHub] spark issue #15021: [SPARK-17464][SparkR][ML] SparkR spark.als argument reg ...

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15021
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15021: [SPARK-17464][SparkR][ML] SparkR spark.als argument reg ...

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15021
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65132/
Test PASSed.





[GitHub] spark issue #15021: [SPARK-17464][SparkR][ML] SparkR spark.als argument reg ...

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15021
  
**[Test build #65132 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65132/consoleFull)** for PR 15021 at commit [`275fd85`](https://github.com/apache/spark/commit/275fd85df4367ffc7c17b1bb61494ba7d03c497f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14834#discussion_r78128102
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -452,6 +555,7 @@ class LogisticRegression @Since("1.2.0") (
           logError(msg)
           throw new SparkException(msg)
         }
+        bcFeaturesStd.destroy(blocking = false)
--- End diff --

You could move it up, right after the `while` loop.





[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14834#discussion_r78127943
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -370,49 +420,102 @@ class LogisticRegression @Since("1.2.0") (
 
         val bcFeaturesStd = instances.context.broadcast(featuresStd)
         val costFun = new LogisticCostFun(instances, numClasses, $(fitIntercept),
-          $(standardization), bcFeaturesStd, regParamL2, multinomial = false, $(aggregationDepth))
+          $(standardization), bcFeaturesStd, regParamL2, multinomial = isMultinomial,
+          $(aggregationDepth))
 
         val optimizer = if ($(elasticNetParam) == 0.0 || $(regParam) == 0.0) {
           new BreezeLBFGS[BDV[Double]]($(maxIter), 10, $(tol))
         } else {
           val standardizationParam = $(standardization)
           def regParamL1Fun = (index: Int) => {
             // Remove the L1 penalization on the intercept
-            if (index == numFeatures) {
+            val isIntercept = $(fitIntercept) && ((index + 1) % numFeaturesPlusIntercept == 0)
+            if (isIntercept) {
               0.0
             } else {
               if (standardizationParam) {
                 regParamL1
               } else {
+                val featureIndex = if ($(fitIntercept)) {
+                  index % numFeaturesPlusIntercept
+                } else {
+                  index % numFeatures
+                }
                 // If `standardization` is false, we still standardize the data
                 // to improve the rate of convergence; as a result, we have to
                 // perform this reverse standardization by penalizing each component
                 // differently to get effectively the same objective function when
                 // the training dataset is not standardized.
-                if (featuresStd(index) != 0.0) regParamL1 / featuresStd(index) else 0.0
+                if (featuresStd(featureIndex) != 0.0) {
+                  regParamL1 / featuresStd(featureIndex)
+                } else {
+                  0.0
+                }
               }
             }
           }
           new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, $(tol))
         }
 
         val initialCoefficientsWithIntercept =
-          Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
-
-        if (optInitialModel.isDefined && optInitialModel.get.coefficients.size != numFeatures) {
-          val vecSize = optInitialModel.get.coefficients.size
-          logWarning(
-            s"Initial coefficients will be ignored!! As its size $vecSize did not match the " +
-            s"expected size $numFeatures")
+          Vectors.zeros(numCoefficientSets * numFeaturesPlusIntercept)
+
+        val initialModelIsValid = optInitialModel.exists { model =>
+          val providedCoefs = model.coefficientMatrix
+          val modelValid = (providedCoefs.numRows == numCoefficientSets) &&
+            (providedCoefs.numCols == numFeatures) &&
+            (model.interceptVector.size == numCoefficientSets)
+          if (!modelValid) {
+            logWarning(s"Initial coefficients will be ignored! Its dimensions " +
+              s"(${providedCoefs.numRows}, ${providedCoefs.numCols}) did not match the expected " +
+              s"size ($numCoefficientSets, $numFeatures)")
+          }
+          modelValid
         }
 
-        if (optInitialModel.isDefined && optInitialModel.get.coefficients.size == numFeatures) {
-          val initialCoefficientsWithInterceptArray = initialCoefficientsWithIntercept.toArray
-          optInitialModel.get.coefficients.foreachActive { case (index, value) =>
-            initialCoefficientsWithInterceptArray(index) = value
+        if (initialModelIsValid) {
+          val initialCoefArray = initialCoefficientsWithIntercept.toArray
+          val providedCoef = optInitialModel.get.coefficientMatrix
+          providedCoef.foreachActive { (row, col, value) =>
+            val flatIndex = row * numFeaturesPlusIntercept + col
+            // We need to scale the coefficients since they will be trained in the scaled space
+            initialCoefArray(flatIndex) = value * featuresStd(col)
--- End diff --

Cool. I think the original code doesn't do `value * featuresStd(col)`, which is a bug. Thank you for finding it. Can you rename `initialCoefArray` to something containing "WithIntercept"? Also, please check whether the parent initialModel has fitIntercept set or not.
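For intuition, a tiny sketch of why the scaling is needed (illustrative only, not the PR's code): training runs on standardized features x / std, so a coefficient w from the original space must be handed to the optimizer as w * std to produce the same margin.

```scala
// (w * std) * (x / std) == w * x, so scaling the initial coefficients by the
// feature standard deviation preserves the model's predictions in the scaled space.
val (w, x, std) = (0.5, 4.0, 2.0)
assert((w * std) * (x / std) == w * x)
```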



[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14980
  
**[Test build #65133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65133/consoleFull)** for PR 14980 at commit [`1142fac`](https://github.com/apache/spark/commit/1142facf3f02ededdc57a006ea065b6014510eae).





[GitHub] spark issue #15015: [SPARK-16445][MLlib][SparkR] Fix @return description for...

2016-09-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15015
  
To my knowledge, the INFRA ticket filed by @shivaram is enough for now. For Apache REEF, we filed a similar INFRA issue:

https://issues.apache.org/jira/browse/INFRA-11411





[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14834#discussion_r78127321
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -370,49 +420,102 @@ class LogisticRegression @Since("1.2.0") (
 
         val bcFeaturesStd = instances.context.broadcast(featuresStd)
         val costFun = new LogisticCostFun(instances, numClasses, $(fitIntercept),
-          $(standardization), bcFeaturesStd, regParamL2, multinomial = false, $(aggregationDepth))
+          $(standardization), bcFeaturesStd, regParamL2, multinomial = isMultinomial,
+          $(aggregationDepth))
 
         val optimizer = if ($(elasticNetParam) == 0.0 || $(regParam) == 0.0) {
           new BreezeLBFGS[BDV[Double]]($(maxIter), 10, $(tol))
         } else {
           val standardizationParam = $(standardization)
           def regParamL1Fun = (index: Int) => {
             // Remove the L1 penalization on the intercept
-            if (index == numFeatures) {
+            val isIntercept = $(fitIntercept) && ((index + 1) % numFeaturesPlusIntercept == 0)
+            if (isIntercept) {
               0.0
             } else {
               if (standardizationParam) {
                 regParamL1
               } else {
+                val featureIndex = if ($(fitIntercept)) {
+                  index % numFeaturesPlusIntercept
+                } else {
+                  index % numFeatures
+                }
                 // If `standardization` is false, we still standardize the data
                 // to improve the rate of convergence; as a result, we have to
                 // perform this reverse standardization by penalizing each component
                 // differently to get effectively the same objective function when
                 // the training dataset is not standardized.
-                if (featuresStd(index) != 0.0) regParamL1 / featuresStd(index) else 0.0
+                if (featuresStd(featureIndex) != 0.0) {
+                  regParamL1 / featuresStd(featureIndex)
+                } else {
+                  0.0
+                }
              }
            }
          }
          new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, $(tol))
        }
 
         val initialCoefficientsWithIntercept =
-          Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
-
-        if (optInitialModel.isDefined && optInitialModel.get.coefficients.size != numFeatures) {
-          val vecSize = optInitialModel.get.coefficients.size
-          logWarning(
-            s"Initial coefficients will be ignored!! As its size $vecSize did not match the " +
-            s"expected size $numFeatures")
+          Vectors.zeros(numCoefficientSets * numFeaturesPlusIntercept)
+
+        val initialModelIsValid = optInitialModel.exists { model =>
+          val providedCoefs = model.coefficientMatrix
+          val modelValid = (providedCoefs.numRows == numCoefficientSets) &&
+            (providedCoefs.numCols == numFeatures) &&
+            (model.interceptVector.size == numCoefficientSets)
+          if (!modelValid) {
+            logWarning(s"Initial coefficients will be ignored! Its dimensions " +
+              s"(${providedCoefs.numRows}, ${providedCoefs.numCols}) did not match the expected " +
+              s"size ($numCoefficientSets, $numFeatures)")
+          }
+          modelValid
         }
--- End diff --

```scala
val isValidInitialModel = optInitialModel match {
  case Some(model) =>
    val providedCoefs = model.coefficientMatrix
    if ((providedCoefs.numRows == numCoefficientSets) &&
        (providedCoefs.numCols == numFeatures) &&
        (model.interceptVector.size == numCoefficientSets)) {
      true
    } else {
      logWarning(...)
      false
    }
  case None => false
}
```





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-08 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/14980
  
Hmm the Jenkins error was
```
find: cannot delete `pkg/vignettes': Directory not empty
```






[GitHub] spark pull request #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression...

2016-09-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14834#discussion_r78126776
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -370,49 +420,102 @@ class LogisticRegression @Since("1.2.0") (
 
         val bcFeaturesStd = instances.context.broadcast(featuresStd)
         val costFun = new LogisticCostFun(instances, numClasses, $(fitIntercept),
-          $(standardization), bcFeaturesStd, regParamL2, multinomial = false, $(aggregationDepth))
+          $(standardization), bcFeaturesStd, regParamL2, multinomial = isMultinomial,
+          $(aggregationDepth))
 
         val optimizer = if ($(elasticNetParam) == 0.0 || $(regParam) == 0.0) {
           new BreezeLBFGS[BDV[Double]]($(maxIter), 10, $(tol))
         } else {
           val standardizationParam = $(standardization)
           def regParamL1Fun = (index: Int) => {
             // Remove the L1 penalization on the intercept
-            if (index == numFeatures) {
+            val isIntercept = $(fitIntercept) && ((index + 1) % numFeaturesPlusIntercept == 0)
+            if (isIntercept) {
               0.0
             } else {
               if (standardizationParam) {
                 regParamL1
               } else {
+                val featureIndex = if ($(fitIntercept)) {
+                  index % numFeaturesPlusIntercept
+                } else {
+                  index % numFeatures
+                }
                 // If `standardization` is false, we still standardize the data
                 // to improve the rate of convergence; as a result, we have to
                 // perform this reverse standardization by penalizing each component
                 // differently to get effectively the same objective function when
                 // the training dataset is not standardized.
-                if (featuresStd(index) != 0.0) regParamL1 / featuresStd(index) else 0.0
+                if (featuresStd(featureIndex) != 0.0) {
+                  regParamL1 / featuresStd(featureIndex)
+                } else {
+                  0.0
+                }
              }
            }
          }
          new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, $(tol))
        }
 
         val initialCoefficientsWithIntercept =
-          Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
-
-        if (optInitialModel.isDefined && optInitialModel.get.coefficients.size != numFeatures) {
-          val vecSize = optInitialModel.get.coefficients.size
-          logWarning(
-            s"Initial coefficients will be ignored!! As its size $vecSize did not match the " +
-            s"expected size $numFeatures")
+          Vectors.zeros(numCoefficientSets * numFeaturesPlusIntercept)
+
+        val initialModelIsValid = optInitialModel.exists { model =>
+          val providedCoefs = model.coefficientMatrix
+          val modelValid = (providedCoefs.numRows == numCoefficientSets) &&
--- End diff --

How about naming it `isValidModel`?





[GitHub] spark pull request #14719: [SPARK-17154][SQL] Wrong result can be returned o...

2016-09-08 Thread sarutak
Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/14719#discussion_r78126865
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -683,8 +710,14 @@ class Analyzer(
         try {
           expr transformUp {
             case GetColumnByOrdinal(ordinal, _) => plan.output(ordinal)
-            case u @ UnresolvedAttribute(nameParts) =>
-              withPosition(u) { plan.resolve(nameParts, resolver).getOrElse(u) }
+            case u @ UnresolvedAttribute(nameParts, targetPlanIdOpt) =>
+              withPosition(u) {
+                targetPlanIdOpt match {
+                  case Some(targetPlanId) =>
+                    resolveExpressionFromSpecificLogicalPlan(nameParts, plan, targetPlanId)
--- End diff --

Thank you for taking a look.
If resolved attributes are not in the output of a child logical plan even though they are in the output of a sub-tree, `CheckAnalysis` catches this and raises an `AnalysisException`, just as it did before this patch.





[GitHub] spark issue #15020: Spark 2.0 error in Intellij

2016-09-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15020
  
Maybe you'd better close this PR and ask this on the user mailing list; I think you will get a better answer there. Please refer to http://spark.apache.org/community.html to subscribe.





[GitHub] spark issue #15021: [SPARK-17464][SparkR][ML] SparkR spark.als argument reg ...

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15021
  
**[Test build #65132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65132/consoleFull)** for PR 15021 at commit [`275fd85`](https://github.com/apache/spark/commit/275fd85df4367ffc7c17b1bb61494ba7d03c497f).





[GitHub] spark pull request #15021: [SPARK-17464][SparkR][ML] SparkR spark.als argume...

2016-09-08 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/15021

[SPARK-17464][SparkR][ML] SparkR spark.als argument reg should be 0.1 by default.

## What changes were proposed in this pull request?
The SparkR ```spark.als``` argument ```reg``` should be 0.1 by default, to be consistent with ML.

## How was this patch tested?
Existing tests.
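For reference, a small sketch of the Scala ML side this aligns with (assuming the Spark 2.0 `ml.recommendation.ALS` API):

```scala
import org.apache.spark.ml.recommendation.ALS

// ML's ALS defaults regParam to 0.1; the SparkR wrapper's `reg` should match.
val als = new ALS()
println(als.getRegParam) // 0.1
```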

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-17464

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15021.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15021


commit 275fd85df4367ffc7c17b1bb61494ba7d03c497f
Author: Yanbo Liang 
Date:   2016-09-09T04:08:34Z

SparkR spark.als arguments reg should be 0.1 by default.







[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14980
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14980
  
**[Test build #65130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65130/consoleFull)** for PR 14980 at commit [`adabb2d`](https://github.com/apache/spark/commit/adabb2d6b4c8359b02bbfeacd36b6793c354274b).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14980
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65130/
Test FAILed.





[GitHub] spark issue #15015: [SPARK-16445][MLlib][SparkR] Fix @return description for...

2016-09-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15015
  
@dongjoon-hyun Meanwhile, do you mind if I ask whether you have any idea about this?





[GitHub] spark issue #15015: [SPARK-16445][MLlib][SparkR] Fix @return description for...

2016-09-08 Thread spark-test-client
Github user spark-test-client commented on the issue:

https://github.com/apache/spark/pull/15015
  
@dongjoon-hyun Meanwhile, do you mind if I ask whether you have any idea about this?





[GitHub] spark issue #14737: [SPARK-17171][WEB UI] DAG will list all partitions in th...

2016-09-08 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14737
  
@srowen If it is OK, can you merge this PR to master? Thank you.





[GitHub] spark issue #14912: [SPARK-17357][SQL] Fix current predicate pushdown

2016-09-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14912
  
@srinathshankar @gatorsmile I think CNF is a separate issue from the one this PR was originally proposed to solve. I would like to fix the original adjoining-Filter pushdown problem here, and leave the CNF issue (it is not trivial, and I don't expect it to be solved soon) for later PRs (see the sketch below).

What do you think? Thanks.
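For context, a small illustration of the logical identity CNF conversion exploits (illustrative only, plain Scala booleans rather than Catalyst expressions):

```scala
// (a && b) || (a && c) is equivalent to a && (b || c); after the rewrite, the
// conjunct `a` can be pushed down on its own, even though the original
// disjunction as a whole could not be.
def original(a: Boolean, b: Boolean, c: Boolean) = (a && b) || (a && c)
def cnf(a: Boolean, b: Boolean, c: Boolean)      = a && (b || c)

for (a <- Seq(true, false); b <- Seq(true, false); c <- Seq(true, false))
  assert(original(a, b, c) == cnf(a, b, c))
```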





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-08 Thread junyangq
Github user junyangq commented on the issue:

https://github.com/apache/spark/pull/14980
  
@felixcheung do we still need to deal with the leftover files in that case?





[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14452
  
**[Test build #65131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65131/consoleFull)** for PR 14452 at commit [`bc70354`](https://github.com/apache/spark/commit/bc70354fefeb2ff2cac57d869b6d342230859fd3).





[GitHub] spark issue #15015: [SPARK-16445][MLlib][SparkR] Fix @return description for...

2016-09-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15015
  
I will check this out as far as I can and get back to you.





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14980
  
**[Test build #65130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65130/consoleFull)** for PR 14980 at commit [`adabb2d`](https://github.com/apache/spark/commit/adabb2d6b4c8359b02bbfeacd36b6793c354274b).





[GitHub] spark issue #15015: [SPARK-16445][MLlib][SparkR] Fix @return description for...

2016-09-08 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/15015
  
Created https://issues.apache.org/jira/browse/INFRA-12590





[GitHub] spark pull request #14750: [SPARK-17183][SQL] put hive serde table schema to...

2016-09-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14750#discussion_r78124662
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -169,7 +169,10 @@ case class InsertIntoHiveTable(
 
     // All partition column names in the format of "<column name 1>/<column name 2>/..."
     val partitionColumns = fileSinkConf.getTableInfo.getProperties.getProperty("partition_columns")
-    val partitionColumnNames = Option(partitionColumns).map(_.split("/")).getOrElse(Array.empty)
+    // As the keys of the partition spec `partition` are always lowercase, we should also
+    // lowercase the partition column names of the table here.
+    val partitionColumnNames =
+      Option(partitionColumns).map(_.split("/").map(_.toLowerCase)).getOrElse(Array.empty)
--- End diff --

After more investigation, 
[`fileSinkConf.getTableInfo`](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala#L171)
 is 
[`table.tableDesc`](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala#L145).
 That means we do not need to convert it to lowercase.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15015: [SPARK-16445][MLlib][SparkR] Fix @return description for...

2016-09-08 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/15015
  
I think we need to file an INFRA ticket like 
https://issues.apache.org/jira/browse/INFRA-11294 -- I'll file one now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15015: [SPARK-16445][MLlib][SparkR] Fix @return description for...

2016-09-08 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/15015
  
@HyukjinKwon I thought it should have run the tests? Any ideas why it's not 
getting picked up? Is there some other set of steps that we need to do? (I'm 
now investigating how it works for Apache Thrift)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65129/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13571
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #65129 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65129/consoleFull)**
 for PR 11105 at commit 
[`2a8d2b2`](https://github.com/apache/spark/commit/2a8d2b2e250d97cc563a170161e9a2bfc0c45864).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13571
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65128/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13571
  
**[Test build #65128 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65128/consoleFull)**
 for PR 13571 at commit 
[`fbe4549`](https://github.com/apache/spark/commit/fbe4549e82150154c835f01db7d3b56e33671b93).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15015: [SPARK-16445][MLlib][SparkR] Fix @return description for...

2016-09-08 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/15015
  
@keypointt let's add a check for the first few values, perhaps?
@HyukjinKwon It seems it would be good to start running R tests, though I'm 
not sure, actually - I'd defer to @shivaram 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14995: [Test Only][SPARK-6235][CORE]Address various 2G limits

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14995
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65126/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14995: [Test Only][SPARK-6235][CORE]Address various 2G limits

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14995
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14995: [Test Only][SPARK-6235][CORE]Address various 2G limits

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14995
  
**[Test build #65126 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65126/consoleFull)**
 for PR 14995 at commit 
[`a8f89d4`](https://github.com/apache/spark/commit/a8f89d46816133119187828f35ea38dea7d256ae).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15020: Spark 2.0 error in Intellij

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15020
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15020: Spark 2.0 error in Intellij

2016-09-08 Thread bigdatatraining
GitHub user bigdatatraining opened a pull request:

https://github.com/apache/spark/pull/15020

Spark 2.0 error in Intellij


If I run the Twitter code in the console it works fine, but if I run the 
same code against Spark 2.0 in IntelliJ I get this error:

Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/Logging

Most of my programs hit the same error, not just this one - please let me 
know why.
import org.apache.spark.Logging
It's not available in Spark 2.0. How do I resolve this issue?
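
For what it's worth, `org.apache.spark.Logging` was moved to 
`org.apache.spark.internal.Logging` and made `private[spark]` in Spark 2.0, so 
application code can no longer extend it; holding an slf4j logger directly is 
the usual workaround. A minimal sketch (the object name is illustrative):

```
import org.slf4j.{Logger, LoggerFactory}

object TwitterStreamApp {
  // slf4j ships on Spark's classpath, so no extra dependency is needed.
  private val log: Logger = LoggerFactory.getLogger(getClass)

  def main(args: Array[String]): Unit = {
    log.info("starting the stream")
  }
}
```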


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15020.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15020


commit fb944a1e85a4d0e618cf7485afb0d0b39367fbda
Author: Tom Graves 
Date:   2016-07-22T11:41:38Z

[SPARK-16650] Improve documentation of spark.task.maxFailures

Clarify documentation on spark.task.maxFailures

No tests run as its documentation

Author: Tom Graves 

Closes #14287 from tgravescs/SPARK-16650.

(cherry picked from commit 6c56fff118ff2380c661456755db17976040de66)
Signed-off-by: Sean Owen 

commit 28bb2b0447e9b47c4c568de983adde4a49b29263
Author: Dongjoon Hyun 
Date:   2016-07-22T12:20:06Z

[SPARK-16651][PYSPARK][DOC] Make `withColumnRenamed/drop` description more 
consistent with Scala API

## What changes were proposed in this pull request?

`withColumnRenamed` and `drop` are no-ops if the given column name does not 
exist. The Python documentation already describes that, but this PR adds a more 
explicit line, consistent with Scala, to reduce the ambiguity.
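
A minimal, self-contained sketch of the documented no-op behavior (the app 
name and column names are illustrative):

```
import org.apache.spark.sql.SparkSession

object NoOpDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("noop-demo").getOrCreate()
    val df = spark.range(3).toDF("id")

    // Both calls are silent no-ops because no column named "nonExistent" exists.
    val renamed = df.withColumnRenamed("nonExistent", "renamed")
    val dropped = df.drop("nonExistent")
    assert(renamed.columns.sameElements(df.columns))
    assert(dropped.columns.sameElements(df.columns))

    spark.stop()
  }
}
```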

## How was this patch tested?

It's about docs.

Author: Dongjoon Hyun 

Closes #14288 from dongjoon-hyun/SPARK-16651.

(cherry picked from commit 47f5b88db4d65f1870b16745d3c93d01051ba20b)
Signed-off-by: Sean Owen 

commit da34e8e8faaf7239f6dfe248812c83e1b2e2c1fd
Author: Cheng Lian 
Date:   2016-07-23T18:41:24Z

[SPARK-16380][EXAMPLES] Update SQL examples and programming guide for 
Python language binding

This PR is based on PR #14098 authored by wangmiao1981.

## What changes were proposed in this pull request?

This PR replaces the original Python Spark SQL example file with the 
following three files:

- `sql/basic.py`

  Demonstrates basic Spark SQL features.

- `sql/datasource.py`

  Demonstrates various Spark SQL data sources.

- `sql/hive.py`

  Demonstrates Spark SQL Hive interaction.

This PR also removes hard-coded Python example snippets in the SQL 
programming guide by extracting snippets from the above files using the 
`include_example` Liquid template tag.

## How was this patch tested?

Manually tested.

Author: wm...@hotmail.com 
Author: Cheng Lian 

Closes #14317 from liancheng/py-examples-update.

(cherry picked from commit 53b2456d1de38b9d4f18509e7b36eb3fbe09e050)
Signed-off-by: Reynold Xin 

commit 31c3bcb46cb56b57d3cdcb8c42e7056dab0f7601
Author: Wenchen Fan 
Date:   2016-07-23T18:39:48Z

[SPARK-16690][TEST] rename SQLTestUtils.withTempTable to withTempView

After https://github.com/apache/spark/pull/12945, we renamed 
`registerTempTable` to `createTempView`, since it actually creates a view. This 
PR renames `SQLTestUtils.withTempTable` to reflect that change.
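
A sketch of how the renamed helper reads in a test, assuming a suite that 
mixes in `SQLTestUtils` (for example via `SharedSQLContext`); the view name is 
illustrative:

```
test("temp view helper") {
  withTempView("people") {
    // createTempView pairs with the renamed helper; the view is dropped on exit.
    spark.range(3).toDF("id").createTempView("people")
    assert(spark.table("people").count() == 3)
  }
}
```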

N/A

Author: Wenchen Fan 

Closes #14318 from cloud-fan/minor4.

(cherry picked from commit 86c275206605c44e1ebca2f166d62868e44bf029)
Signed-off-by: Reynold Xin 

commit 198b0426e07f3d4b1fbbef21d39daa32a75da36c
Author: Liwei Lin 
Date:   2016-07-24T07:35:57Z

[SPARK-16515][SQL][FOLLOW-UP] Fix test `script` on OS X/Windows...

The current `sed` in `test_script.sh` is missing a `$`, leading to the 
failure of the `script` test on OS X:
```
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
![x1_y1][x1]
![x2_y2][x2]
```

In addition, this `script` test would also fail on systems like Windows, 
where we are unable to invoke `bash` or `echo | sed`.

This patch
- fixes `sed` in `test_script.sh`
- adds command guards so that the `script` test would pass on systems like 
Windows

- Jenkins
- Manually verified tests pass on OS 

[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14702
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65127/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14702
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14702
  
**[Test build #65127 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65127/consoleFull)**
 for PR 14702 at commit 
[`f5256dd`](https://github.com/apache/spark/commit/f5256dd6f4a3ca743aac6e7351baa3613523b963).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

2016-09-08 Thread JustinPihony
Github user JustinPihony commented on the issue:

https://github.com/apache/spark/pull/12601
  
Can this be merged now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14969: [SPARK-17406][WEB UI] limit timeline executor events

2016-09-08 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/spark/pull/14969
  
@srowen I removed the parallel maps; please review the latest code. Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15013: [SPARK-17451] [CORE] CoarseGrainedExecutorBackend should...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/15013
  
Done with all changes. Ready for review.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15013: [SPARK-17451] [CORE] CoarseGrainedExecutorBackend should...

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15013
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15013: [SPARK-17451] [CORE] CoarseGrainedExecutorBackend should...

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15013
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65124/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15013: [SPARK-17451] [CORE] CoarseGrainedExecutorBackend should...

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15013
  
**[Test build #65124 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65124/consoleFull)**
 for PR 15013 at commit 
[`71fa2e3`](https://github.com/apache/spark/commit/71fa2e32bba298920f974c652b24983306068b09).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14750: [SPARK-17183][SQL] put hive serde table schema to...

2016-09-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14750#discussion_r78120149
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -169,7 +169,10 @@ case class InsertIntoHiveTable(
 
     // All partition column names in the format of "<column name 1>/<column name 2>/..."
     val partitionColumns = fileSinkConf.getTableInfo.getProperties.getProperty("partition_columns")
-    val partitionColumnNames = Option(partitionColumns).map(_.split("/")).getOrElse(Array.empty)
+    // As the keys of the partition spec `partition` are always lowercase, we should also
+    // lowercase the partition column names of the table here.
+    val partitionColumnNames =
+      Option(partitionColumns).map(_.split("/").map(_.toLowerCase)).getOrElse(Array.empty)
--- End diff --

It sounds like this comes from the Hive side. Thus, it should always be 
lowercase? I ran the build without these changes, and it seems to work well.

Can we use ExternalCatalog to get the partition info? I can give it a try to 
see if it still works.
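
A hedged sketch of what going through the catalog could look like, assuming 
code inside Spark SQL where the session state is visible (the table name is 
illustrative):

```
import org.apache.spark.sql.catalyst.TableIdentifier

// Assumes a SparkSession `sparkSession` in scope; sessionState is
// private[sql], so this only compiles from within org.apache.spark.sql.
val meta = sparkSession.sessionState.catalog.getTableMetadata(TableIdentifier("some_table"))
val partitionColumnNames: Seq[String] = meta.partitionColumnNames
```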


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14678: [MINOR][SQL] Add missing functions for some options in S...

2016-09-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14678
  
@cloud-fan Do you mind taking a look, please?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14660: [SPARK-17071][SQL] Add an option to support for reading ...

2016-09-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14660
  
Gentle ping @liancheng


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14399: [SPARK-16777][SQL] Do not use deprecated listType API in...

2016-09-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14399
  
Gentle ping @liancheng 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14172: [SPARK-16516][SQL] Support for pushing down filters for ...

2016-09-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14172
  
Could you take a look, please, @yhuai and @liancheng?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15006: [SPARK-17364][SQL] Antlr lexer wrongly treats full quali...

2016-09-08 Thread clockfly
Github user clockfly commented on the issue:

https://github.com/apache/spark/pull/15006
  
@hvanhovell 

What do you think about the case @dilipbiswal posted?

Currently, there is a semantic mismatch between Spark and Postgres.
In Spark 1.6/2.0, `SELECT 123X` is interpreted as ```SELECT `123X` ```. In 
Postgres, `SELECT 123X` is interpreted as `SELECT 123 AS X`.

I think our current way of handling `123X` makes more sense.
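
A small self-contained sketch of the two readings (object and view names are 
illustrative):

```
import org.apache.spark.sql.SparkSession

object LexerDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("lexer-demo").getOrCreate()
    spark.range(1).toDF("123X").createTempView("t")

    // Spark keeps 123X together as a single token, so SELECT 123X resolves
    // the column literally named "123X" (written with backticks for clarity).
    spark.sql("SELECT `123X` FROM t").show()

    // Postgres splits the token instead, reading the same text as a literal
    // plus an alias, i.e. the equivalent of:
    spark.sql("SELECT 123 AS X").show()

    spark.stop()
  }
}
```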


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #65129 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65129/consoleFull)**
 for PR 11105 at commit 
[`2a8d2b2`](https://github.com/apache/spark/commit/2a8d2b2e250d97cc563a170161e9a2bfc0c45864).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13571
  
**[Test build #65128 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65128/consoleFull)**
 for PR 13571 at commit 
[`fbe4549`](https://github.com/apache/spark/commit/fbe4549e82150154c835f01db7d3b56e33671b93).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14750: [SPARK-17183][SQL] put hive serde table schema to table ...

2016-09-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14750
  
In `HiveExternalCatalog`, `client.getTable` does not call 
`restoreTableMetadata`. I found you replaced multiple `getTable` calls with 
`client.getTable`. The only remaining call of `getTable` is in 
[requireTableExists](https://github.com/cloud-fan/spark/blob/eb3bba416beedcff5a7f050415f79390a5b0b21f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L95-L97).
 Maybe we should replace that one too?

In addition, other readers of the code might not notice this difference. How 
about creating a private wrapper function `getRawTable`?
```
  private def getRawTable(db: String, table: String): CatalogTable = withClient {
    client.getTable(db, table)
  }
```
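
If that wrapper lands, `requireTableExists` could simply delegate to it; a 
sketch under that assumption (placement inside `HiveExternalCatalog` is 
implied):

```
  private def requireTableExists(db: String, table: String): Unit = {
    // The Hive client throws NoSuchTableException when the table is absent,
    // so resolving the raw table doubles as the existence check.
    getRawTable(db, table)
  }
```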



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14702: [SPARK-15694] Implement ScriptTransformation in s...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/14702#discussion_r78115155
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/script/ScriptTransformationExec.scala
 ---
@@ -0,0 +1,313 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.script
+
+import java.io._
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark.{SparkException, TaskContext}
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical.ScriptInputOutputSchema
+import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}
+import org.apache.spark.sql.types.{DataType, StructType}
+import org.apache.spark.util.{CircularBuffer, RedirectThread, 
SerializableConfiguration, Utils}
+
+/**
+ * Transforms the input by forking and running the specified script.
+ *
+ * @param input the set of expression that should be passed to the script.
+ * @param script the command that should be executed.
+ * @param output the attributes that are produced by the script.
+ */
+private[sql]
+case class ScriptTransformationExec(
+input: Seq[Expression],
+script: String,
+output: Seq[Attribute],
+child: SparkPlan,
+ioschema: ScriptTransformIOSchema)
+  extends UnaryExecNode with ScriptTransformBase {
+
+  override def producedAttributes: AttributeSet = outputSet -- inputSet
+
+  protected override def doExecute(): RDD[InternalRow] =
+execute(sqlContext, child, schema)
+
+  override def processIterator(
+  inputIterator: Iterator[InternalRow],
+  hadoopConf: Configuration) : Iterator[InternalRow] = {
+
+val (proc, inputStream, outputStream, stderrBuffer, outputProjection) =
+  init(input, script, child)
+
+// This new thread will consume the ScriptTransformation's input rows 
and write them to the
+// external process. That process's output will be read by this 
current thread.
+val writerThread = new ScriptTransformationWriterThread(
+  inputIterator,
+  input.map(_.dataType),
+  outputProjection,
+  ioschema,
+  outputStream,
+  proc,
+  stderrBuffer,
+  TaskContext.get(),
+  hadoopConf
+)
+
+val reader = createReader(inputStream)
+
+val outputIterator: Iterator[InternalRow] = new Iterator[InternalRow] {
+  var curLine: String = null
+  val mutableRow = new SpecificMutableRow(output.map(_.dataType))
+
+  override def hasNext: Boolean = {
+try {
+  if (curLine == null) {
+curLine = reader.readLine()
+if (curLine == null) {
+  checkFailureAndPropagate(writerThread.exception, null, proc, 
stderrBuffer)
+  return false
+}
+  }
+  true
+} catch {
+  case NonFatal(e) =>
+// If this exception is due to abrupt / unclean termination of 
`proc`,
+// then detect it and propagate a better exception message for 
end users
+checkFailureAndPropagate(writerThread.exception, e, proc, 
stderrBuffer)
+
+throw e
+}
+  }
+
+  override def next(): InternalRow = {
+if (!hasNext) {
+  throw new NoSuchElementException
+}
+val prevLine = curLine
+curLine = reader.readLine()
+if (!ioschema.isSchemaLess) {
+  new GenericInternalRow(
+

[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14702
  
**[Test build #65127 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65127/consoleFull)**
 for PR 14702 at commit 
[`f5256dd`](https://github.com/apache/spark/commit/f5256dd6f4a3ca743aac6e7351baa3613523b963).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14702: [SPARK-15694] Implement ScriptTransformation in s...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/14702#discussion_r78115022
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/script/ScriptTransformationExec.scala
 ---
@@ -0,0 +1,313 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.script
+
+import java.io._
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark.{SparkException, TaskContext}
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical.ScriptInputOutputSchema
+import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}
+import org.apache.spark.sql.types.{DataType, StructType}
+import org.apache.spark.util.{CircularBuffer, RedirectThread, 
SerializableConfiguration, Utils}
+
+/**
+ * Transforms the input by forking and running the specified script.
+ *
+ * @param input the set of expression that should be passed to the script.
+ * @param script the command that should be executed.
+ * @param output the attributes that are produced by the script.
+ */
+private[sql]
+case class ScriptTransformationExec(
+input: Seq[Expression],
+script: String,
+output: Seq[Attribute],
+child: SparkPlan,
+ioschema: ScriptTransformIOSchema)
+  extends UnaryExecNode with ScriptTransformBase {
+
+  override def producedAttributes: AttributeSet = outputSet -- inputSet
+
+  protected override def doExecute(): RDD[InternalRow] =
+execute(sqlContext, child, schema)
+
+  override def processIterator(
+  inputIterator: Iterator[InternalRow],
+  hadoopConf: Configuration) : Iterator[InternalRow] = {
+
+val (proc, inputStream, outputStream, stderrBuffer, outputProjection) =
+  init(input, script, child)
+
+// This new thread will consume the ScriptTransformation's input rows 
and write them to the
+// external process. That process's output will be read by this 
current thread.
+val writerThread = new ScriptTransformationWriterThread(
+  inputIterator,
+  input.map(_.dataType),
+  outputProjection,
+  ioschema,
+  outputStream,
+  proc,
+  stderrBuffer,
+  TaskContext.get(),
+  hadoopConf
+)
+
+val reader = createReader(inputStream)
+
+val outputIterator: Iterator[InternalRow] = new Iterator[InternalRow] {
+  var curLine: String = null
+  val mutableRow = new SpecificMutableRow(output.map(_.dataType))
+
+  override def hasNext: Boolean = {
+try {
+  if (curLine == null) {
+curLine = reader.readLine()
+if (curLine == null) {
+  checkFailureAndPropagate(writerThread.exception, null, proc, 
stderrBuffer)
+  return false
+}
+  }
+  true
+} catch {
+  case NonFatal(e) =>
+// If this exception is due to abrupt / unclean termination of 
`proc`,
+// then detect it and propagate a better exception message for 
end users
+checkFailureAndPropagate(writerThread.exception, e, proc, 
stderrBuffer)
+
+throw e
+}
+  }
+
+  override def next(): InternalRow = {
+if (!hasNext) {
+  throw new NoSuchElementException
+}
+val prevLine = curLine
+curLine = reader.readLine()
+if (!ioschema.isSchemaLess) {
+  new GenericInternalRow(
+

[GitHub] spark pull request #14702: [SPARK-15694] Implement ScriptTransformation in s...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/14702#discussion_r78115006
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/script/ScriptTransformationExec.scala
 ---
@@ -0,0 +1,313 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.script
+
+import java.io._
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark.{SparkException, TaskContext}
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical.ScriptInputOutputSchema
+import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}
+import org.apache.spark.sql.types.{DataType, StructType}
+import org.apache.spark.util.{CircularBuffer, RedirectThread, 
SerializableConfiguration, Utils}
+
+/**
+ * Transforms the input by forking and running the specified script.
+ *
+ * @param input the set of expression that should be passed to the script.
+ * @param script the command that should be executed.
+ * @param output the attributes that are produced by the script.
+ */
+private[sql]
+case class ScriptTransformationExec(
+input: Seq[Expression],
+script: String,
+output: Seq[Attribute],
+child: SparkPlan,
+ioschema: ScriptTransformIOSchema)
+  extends UnaryExecNode with ScriptTransformBase {
+
+  override def producedAttributes: AttributeSet = outputSet -- inputSet
+
+  protected override def doExecute(): RDD[InternalRow] =
+execute(sqlContext, child, schema)
+
+  override def processIterator(
+  inputIterator: Iterator[InternalRow],
+  hadoopConf: Configuration) : Iterator[InternalRow] = {
+
+val (proc, inputStream, outputStream, stderrBuffer, outputProjection) =
+  init(input, script, child)
+
+// This new thread will consume the ScriptTransformation's input rows 
and write them to the
+// external process. That process's output will be read by this 
current thread.
+val writerThread = new ScriptTransformationWriterThread(
+  inputIterator,
+  input.map(_.dataType),
+  outputProjection,
+  ioschema,
+  outputStream,
+  proc,
+  stderrBuffer,
+  TaskContext.get(),
+  hadoopConf
+)
+
+val reader = createReader(inputStream)
+
+val outputIterator: Iterator[InternalRow] = new Iterator[InternalRow] {
+  var curLine: String = null
+  val mutableRow = new SpecificMutableRow(output.map(_.dataType))
+
+  override def hasNext: Boolean = {
+try {
+  if (curLine == null) {
+curLine = reader.readLine()
+if (curLine == null) {
+  checkFailureAndPropagate(writerThread.exception, null, proc, 
stderrBuffer)
+  return false
+}
+  }
+  true
+} catch {
+  case NonFatal(e) =>
+// If this exception is due to abrupt / unclean termination of 
`proc`,
+// then detect it and propagate a better exception message for 
end users
+checkFailureAndPropagate(writerThread.exception, e, proc, 
stderrBuffer)
+
+throw e
+}
+  }
+
+  override def next(): InternalRow = {
+if (!hasNext) {
+  throw new NoSuchElementException
+}
+val prevLine = curLine
+curLine = reader.readLine()
+if (!ioschema.isSchemaLess) {
+  new GenericInternalRow(
+

[GitHub] spark pull request #14702: [SPARK-15694] Implement ScriptTransformation in s...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/14702#discussion_r78114829
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/script/ScriptTransformationExec.scala
 ---
@@ -0,0 +1,313 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.script
+
+import java.io._
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark.{SparkException, TaskContext}
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical.ScriptInputOutputSchema
+import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}
+import org.apache.spark.sql.types.{DataType, StructType}
+import org.apache.spark.util.{CircularBuffer, RedirectThread, 
SerializableConfiguration, Utils}
+
+/**
+ * Transforms the input by forking and running the specified script.
+ *
+ * @param input the set of expression that should be passed to the script.
+ * @param script the command that should be executed.
+ * @param output the attributes that are produced by the script.
+ */
+private[sql]
+case class ScriptTransformationExec(
+input: Seq[Expression],
+script: String,
+output: Seq[Attribute],
+child: SparkPlan,
+ioschema: ScriptTransformIOSchema)
+  extends UnaryExecNode with ScriptTransformBase {
+
+  override def producedAttributes: AttributeSet = outputSet -- inputSet
+
+  protected override def doExecute(): RDD[InternalRow] =
+execute(sqlContext, child, schema)
+
+  override def processIterator(
+  inputIterator: Iterator[InternalRow],
+  hadoopConf: Configuration) : Iterator[InternalRow] = {
+
+val (proc, inputStream, outputStream, stderrBuffer, outputProjection) =
+  init(input, script, child)
+
+// This new thread will consume the ScriptTransformation's input rows 
and write them to the
+// external process. That process's output will be read by this 
current thread.
+val writerThread = new ScriptTransformationWriterThread(
+  inputIterator,
+  input.map(_.dataType),
+  outputProjection,
+  ioschema,
+  outputStream,
+  proc,
+  stderrBuffer,
+  TaskContext.get(),
+  hadoopConf
+)
+
+val reader = createReader(inputStream)
+
+val outputIterator: Iterator[InternalRow] = new Iterator[InternalRow] {
+  var curLine: String = null
+  val mutableRow = new SpecificMutableRow(output.map(_.dataType))
+
+  override def hasNext: Boolean = {
+try {
+  if (curLine == null) {
+curLine = reader.readLine()
+if (curLine == null) {
+  checkFailureAndPropagate(writerThread.exception, null, proc, 
stderrBuffer)
+  return false
+}
+  }
+  true
+} catch {
+  case NonFatal(e) =>
+// If this exception is due to abrupt / unclean termination of 
`proc`,
+// then detect it and propagate a better exception message for 
end users
+checkFailureAndPropagate(writerThread.exception, e, proc, 
stderrBuffer)
+
+throw e
+}
+  }
+
+  override def next(): InternalRow = {
+if (!hasNext) {
+  throw new NoSuchElementException
+}
+val prevLine = curLine
+curLine = reader.readLine()
+if (!ioschema.isSchemaLess) {
+  new GenericInternalRow(
+

[GitHub] spark pull request #14702: [SPARK-15694] Implement ScriptTransformation in s...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/14702#discussion_r78114838
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/script/ScriptTransformationExec.scala
 ---
@@ -0,0 +1,313 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.script
+
+import java.io._
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark.{SparkException, TaskContext}
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical.ScriptInputOutputSchema
+import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}
+import org.apache.spark.sql.types.{DataType, StructType}
+import org.apache.spark.util.{CircularBuffer, RedirectThread, 
SerializableConfiguration, Utils}
+
+/**
+ * Transforms the input by forking and running the specified script.
+ *
+ * @param input the set of expression that should be passed to the script.
+ * @param script the command that should be executed.
+ * @param output the attributes that are produced by the script.
+ */
+private[sql]
+case class ScriptTransformationExec(
+input: Seq[Expression],
+script: String,
+output: Seq[Attribute],
+child: SparkPlan,
+ioschema: ScriptTransformIOSchema)
+  extends UnaryExecNode with ScriptTransformBase {
+
+  override def producedAttributes: AttributeSet = outputSet -- inputSet
+
+  protected override def doExecute(): RDD[InternalRow] =
+execute(sqlContext, child, schema)
+
+  override def processIterator(
+  inputIterator: Iterator[InternalRow],
+  hadoopConf: Configuration) : Iterator[InternalRow] = {
+
+val (proc, inputStream, outputStream, stderrBuffer, outputProjection) =
+  init(input, script, child)
+
+// This new thread will consume the ScriptTransformation's input rows 
and write them to the
+// external process. That process's output will be read by this 
current thread.
+val writerThread = new ScriptTransformationWriterThread(
+  inputIterator,
+  input.map(_.dataType),
+  outputProjection,
+  ioschema,
+  outputStream,
+  proc,
+  stderrBuffer,
+  TaskContext.get(),
+  hadoopConf
+)
+
+val reader = createReader(inputStream)
+
+val outputIterator: Iterator[InternalRow] = new Iterator[InternalRow] {
+  var curLine: String = null
+  val mutableRow = new SpecificMutableRow(output.map(_.dataType))
+
+  override def hasNext: Boolean = {
+try {
+  if (curLine == null) {
+curLine = reader.readLine()
+if (curLine == null) {
+  checkFailureAndPropagate(writerThread.exception, null, proc, 
stderrBuffer)
+  return false
+}
+  }
+  true
+} catch {
+  case NonFatal(e) =>
+// If this exception is due to abrupt / unclean termination of 
`proc`,
+// then detect it and propagate a better exception message for 
end users
+checkFailureAndPropagate(writerThread.exception, e, proc, 
stderrBuffer)
+
+throw e
+}
+  }
+
+  override def next(): InternalRow = {
+if (!hasNext) {
+  throw new NoSuchElementException
+}
+val prevLine = curLine
+curLine = reader.readLine()
+if (!ioschema.isSchemaLess) {
+  new GenericInternalRow(
+

[GitHub] spark issue #14995: [Test Only][SPARK-6235][CORE]Address various 2G limits

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14995
  
**[Test build #65126 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65126/consoleFull)**
 for PR 14995 at commit 
[`a8f89d4`](https://github.com/apache/spark/commit/a8f89d46816133119187828f35ea38dea7d256ae).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15019: [SPARK-17387][PYSPARK] Allow passing of args to gateway ...

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15019
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15019: [SPARK-17387][PYSPARK] Allow passing of args to gateway ...

2016-09-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15019
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65125/
Test PASSed.


[GitHub] spark issue #15019: [SPARK-17387][PYSPARK] Allow passing of args to gateway ...

2016-09-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15019
  
**[Test build #65125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65125/consoleFull)** for PR 15019 at commit [`100e442`](https://github.com/apache/spark/commit/100e442983bd5f546a8dad149543a26be90ee622).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #14995: [Test Only][SPARK-6235][CORE]Address various 2G limits

2016-09-08 Thread witgo
Github user witgo commented on the issue:

https://github.com/apache/spark/pull/14995
  
Jenkins, retest this please


[GitHub] spark pull request #14750: [SPARK-17183][SQL] put hive serde table schema to...

2016-09-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14750#discussion_r78114385
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -446,29 +449,46 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       table
     } else {
       getProviderFromTableProperties(table).map { provider =>
-        assert(provider != "hive", "Hive serde table should not save provider in table properties.")
-        // SPARK-15269: Persisted data source tables always store the location URI as a storage
-        // property named "path" instead of standard Hive `dataLocation`, because Hive only
-        // allows directory paths as location URIs while Spark SQL data source tables also
-        // allows file paths. So the standard Hive `dataLocation` is meaningless for Spark SQL
-        // data source tables.
-        // Spark SQL may also save external data source in Hive compatible format when
-        // possible, so that these tables can be directly accessed by Hive. For these tables,
-        // `dataLocation` is still necessary. Here we also check for input format because only
-        // these Hive compatible tables set this field.
-        val storage = if (table.tableType == EXTERNAL && table.storage.inputFormat.isEmpty) {
-          table.storage.copy(locationUri = None)
+        if (provider == "hive") {
+          val schemaFromTableProps = getSchemaFromTableProperties(table)
+          if (DataType.equalsIgnoreCaseAndNullability(schemaFromTableProps, table.schema)) {
--- End diff --

Yeah, but Hive allows users to change the physical storage by issuing `ALTER TABLE` DDL:
```
ALTER TABLE table_name CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name, ...)]
  INTO num_buckets BUCKETS;
```

That means that even if the bucket and sort columns are part of the schema, the `bucketSpec` could still be different.
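
For illustration only (hypothetical table and column names; `BucketSpec` as defined in `org.apache.spark.sql.catalyst.catalog`), the column schema is identical before and after such a DDL while the bucketing metadata is not:
```scala
import org.apache.spark.sql.catalyst.catalog.BucketSpec
import org.apache.spark.sql.types._

// Column schema of a hypothetical table; unchanged by ALTER TABLE ... INTO n BUCKETS.
val schema = new StructType().add("user_id", IntegerType).add("ts", LongType)

// Same schema, but the bucketSpec differs after the DDL above.
val specBefore = BucketSpec(4, bucketColumnNames = Seq("user_id"), sortColumnNames = Seq("ts"))
val specAfter  = BucketSpec(8, bucketColumnNames = Seq("user_id"), sortColumnNames = Seq("ts"))
```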


[GitHub] spark issue #15015: [SPARK-16445][MLlib][SparkR] Fix @return description for...

2016-09-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15015
  
@felixcheung I think this is an unrelated comment, but is there any issue with enabling the Windows tests for now? It seems a test should have run for this PR.


[GitHub] spark pull request #14750: [SPARK-17183][SQL] put hive serde table schema to...

2016-09-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14750#discussion_r78113922
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -446,29 +449,46 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       table
     } else {
       getProviderFromTableProperties(table).map { provider =>
-        assert(provider != "hive", "Hive serde table should not save provider in table properties.")
-        // SPARK-15269: Persisted data source tables always store the location URI as a storage
-        // property named "path" instead of standard Hive `dataLocation`, because Hive only
-        // allows directory paths as location URIs while Spark SQL data source tables also
-        // allows file paths. So the standard Hive `dataLocation` is meaningless for Spark SQL
-        // data source tables.
-        // Spark SQL may also save external data source in Hive compatible format when
-        // possible, so that these tables can be directly accessed by Hive. For these tables,
-        // `dataLocation` is still necessary. Here we also check for input format because only
-        // these Hive compatible tables set this field.
-        val storage = if (table.tableType == EXTERNAL && table.storage.inputFormat.isEmpty) {
-          table.storage.copy(locationUri = None)
+        if (provider == "hive") {
+          val schemaFromTableProps = getSchemaFromTableProperties(table)
+          if (DataType.equalsIgnoreCaseAndNullability(schemaFromTableProps, table.schema)) {
--- End diff --

> Schema includes partitioning columns, but it does not include the info of bucketSpec

hmmm? Don't the bucket columns and sort columns have to be part of the table schema?
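
As a minimal sketch of the relationship in question (assuming the `BucketSpec` shape from `org.apache.spark.sql.catalyst.catalog`; the helper below is hypothetical): bucket and sort columns are stored as names that must resolve against the table schema, but the schema alone does not pin down the bucketing, so a schema comparison cannot detect a changed `bucketSpec`:
```scala
import org.apache.spark.sql.catalyst.catalog.BucketSpec
import org.apache.spark.sql.types.StructType

// Hypothetical check: every bucket/sort column must name an existing schema column.
def bucketColumnsResolve(schema: StructType, spec: BucketSpec): Boolean = {
  val fields = schema.fieldNames.map(_.toLowerCase).toSet
  (spec.bucketColumnNames ++ spec.sortColumnNames).forall(c => fields.contains(c.toLowerCase))
}
// Note: numBuckets appears nowhere in the schema, so two tables with equal
// schemas (even ignoring case and nullability) can still be bucketed differently.
```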


[GitHub] spark issue #14474: [SPARK-16853][SQL] fixes encoder error in DataSet typed ...

2016-09-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14474
  
@clockfly can you create a new PR against 2.0? thanks!


[GitHub] spark issue #15017: [SPARK-17456][CORE] Utility for parsing Spark versions

2016-09-08 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15017
  
Sounds good :)


[GitHub] spark issue #14995: [Test Only][not ready for review][SPARK-6235][CORE]Addre...

2016-09-08 Thread witgo
Github user witgo commented on the issue:

https://github.com/apache/spark/pull/14995
  
retest please. 


[GitHub] spark issue #15017: [SPARK-17456][CORE] Utility for parsing Spark versions

2016-09-08 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15017
  
I like splitting things into separate PRs, but I'll go ahead and make a 
follow-up task to check for places which can be fixed.


[GitHub] spark issue #15019: [SPARK-17387][PYSPARK] Allow passing of args to gateway ...

2016-09-08 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15019
  
Ok, I'll close this off then


[GitHub] spark pull request #15019: [SPARK-17387][PYSPARK] Allow passing of args to g...

2016-09-08 Thread BryanCutler
Github user BryanCutler closed the pull request at:

https://github.com/apache/spark/pull/15019

