[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/8880
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14676
  
**[Test build #63901 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63901/consoleFull)** for PR 14676 at commit [`4723902`](https://github.com/apache/spark/commit/47239020ac7008d8630a5c810df3db29a77795cf).





[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/8880
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63894/





[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/8880
  
**[Test build #63894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63894/consoleFull)** for PR 8880 at commit [`77122bb`](https://github.com/apache/spark/commit/77122bb3662c65ffa5596d740efab41f5dfc3a0f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14678: [MINOR][SQL] Add missing functions for some optio...

2016-08-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14678#discussion_r75065275
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -322,11 +322,6 @@ object SQLConf {
     .intConf
     .createWithDefault(4000)
 
-  val PARTITION_DISCOVERY_ENABLED = SQLConfigBuilder("spark.sql.sources.partitionDiscovery.enabled")
--- End diff --

It seems we always enable this, and this option is not referenced anywhere.





[GitHub] spark issue #14670: [SPARK-15285][SQL] Generated SpecificSafeProjection.appl...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14670
  
**[Test build #63900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63900/consoleFull)** for PR 14670 at commit [`86258eb`](https://github.com/apache/spark/commit/86258eb3ca13284ad5593a105b2ddc6d0cde58e4).





[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...

2016-08-16 Thread petermaxlee
Github user petermaxlee commented on a diff in the pull request:

https://github.com/apache/spark/pull/14676#discussion_r75065140
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/inline-table.sql ---
@@ -0,0 +1,39 @@
+
+-- single row, without table and column alias
+select * from values ("one", 1);
+
+-- single row, without column alias
+select * from values ("one", 1) as data;
+
+-- single row
+select * from values ("one", 1) as data(a, b);
--- End diff --

added





[GitHub] spark issue #14660: [SPARK-17071][SQL] Add an option to support for reading ...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14660
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...

2016-08-16 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/14155
  
I have taken a close look at `HiveExternalCatalog`. My overall feeling is
that the current version is still not very clear, and people may have a hard
time understanding the code. Let me also think about it and see how to
improve it.





[GitHub] spark issue #14660: [SPARK-17071][SQL] Add an option to support for reading ...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14660
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63895/





[GitHub] spark issue #14660: [SPARK-17071][SQL] Add an option to support for reading ...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14660
  
**[Test build #63895 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63895/consoleFull)** for PR 14660 at commit [`0130c39`](https://github.com/apache/spark/commit/0130c39450c7bcecfb3a14db1e581c3f3a9f6a20).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14678: [MINOR][SQL] Add missing functions for some options in S...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14678
  
**[Test build #63899 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63899/consoleFull)** for PR 14678 at commit [`a57dd5e`](https://github.com/apache/spark/commit/a57dd5ebaef470dbc311e31e35b2f457321b7a9f).





[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...

2016-08-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r75064724
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -81,6 +86,18 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu
     withClient { getTable(db, table) }
   }
 
+  /**
+   * If the given table properties contains datasource properties, throw an exception.
+   */
+  private def verifyTableProperties(table: CatalogTable): Unit = {
+    val datasourceKeys = table.properties.keys.filter(_.startsWith(DATASOURCE_PREFIX))
+    if (datasourceKeys.nonEmpty) {
+      throw new AnalysisException(s"Cannot persistent ${table.qualifiedName} into hive metastore " +
+        s"as table property keys may not start with '$DATASOURCE_PREFIX': " +
+        datasourceKeys.mkString("[", ", ", "]"))
+    }
+  }
--- End diff --

Just realized one thing. Is it possible that we somehow create `table` 
based on a `CatalogTable` generated from `restoreTableMetadata`?





[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...

2016-08-16 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13796#discussion_r75064596
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -1082,57 +1343,62 @@ private class LogisticCostFun(
     fitIntercept: Boolean,
     standardization: Boolean,
     bcFeaturesStd: Broadcast[Array[Double]],
-    regParamL2: Double) extends DiffFunction[BDV[Double]] {
+    regParamL2: Double,
+    multinomial: Boolean) extends DiffFunction[BDV[Double]] {
 
   val featuresStd = bcFeaturesStd.value
 
   override def calculate(coefficients: BDV[Double]): (Double, BDV[Double]) = {
-    val numFeatures = featuresStd.length
     val coeffs = Vectors.fromBreeze(coefficients)
     val bcCoeffs = instances.context.broadcast(coeffs)
-    val n = coeffs.size
+    val localFeaturesStd = featuresStd
+    val numFeatures = localFeaturesStd.length
+    val numFeaturesPlusIntercept = if (fitIntercept) numFeatures + 1 else numFeatures
 
     val logisticAggregator = {
-      val seqOp = (c: LogisticAggregator, instance: Instance) => c.add(instance)
+      val seqOp = (c: LogisticAggregator, instance: Instance) =>
+        c.add(instance)
       val combOp = (c1: LogisticAggregator, c2: LogisticAggregator) => c1.merge(c2)
 
       instances.treeAggregate(
-        new LogisticAggregator(bcCoeffs, bcFeaturesStd, numFeatures, numClasses, fitIntercept)
+        new LogisticAggregator(bcCoeffs, bcFeaturesStd, numFeatures, numClasses, fitIntercept,
+          multinomial)
       )(seqOp, combOp)
     }
 
     val totalGradientArray = logisticAggregator.gradient.toArray
-
     // regVal is the sum of coefficients squares excluding intercept for L2 regularization.
     val regVal = if (regParamL2 == 0.0) {
       0.0
     } else {
+      val K = if (multinomial) numClasses else numClasses - 1
       var sum = 0.0
-      coeffs.foreachActive { (index, value) =>
-        // If `fitIntercept` is true, the last term which is intercept doesn't
-        // contribute to the regularization.
-        if (index != numFeatures) {
+      (0 until K).foreach { k =>
+        var j = 0
+        while (j < numFeatures) {
           // The following code will compute the loss of the regularization; also
           // the gradient of the regularization, and add back to totalGradientArray.
+          val value = coeffs(k * numFeaturesPlusIntercept + j)
--- End diff --

Why are you not using `foreachActive`? We know that `coeffs` is a dense
array today, but if we implement a strong rule that can tell which columns of
`coeffs` will be zero before the optimization, we may store it as a sparse
vector. As a result, `foreachActive` would be a good abstraction.
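The abstraction being suggested can be sketched outside Spark. This is a hypothetical minimal version (not the Spark ML `Vector` implementation): the same traversal code works whether the coefficients are stored densely or sparsely, which is exactly why `foreachActive` decouples callers from the storage format.

```scala
// Minimal sketch of the `foreachActive` abstraction (illustrative only).
sealed trait Vec {
  // Visit only the entries that are (potentially) non-zero.
  def foreachActive(f: (Int, Double) => Unit): Unit
}

final case class DenseVec(values: Array[Double]) extends Vec {
  def foreachActive(f: (Int, Double) => Unit): Unit = {
    var i = 0
    while (i < values.length) { f(i, values(i)); i += 1 }
  }
}

final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) extends Vec {
  def foreachActive(f: (Int, Double) => Unit): Unit = {
    var k = 0
    while (k < indices.length) { f(indices(k), values(k)); k += 1 }
  }
}

// L2 penalty over active coefficients, skipping the intercept index,
// in the spirit of the regVal computation under review.
def l2Penalty(coeffs: Vec, interceptIndex: Int): Double = {
  var sum = 0.0
  coeffs.foreachActive { (i, v) => if (i != interceptIndex) sum += v * v }
  0.5 * sum
}
```

A caller written against `foreachActive` needs no change if the storage later switches from `DenseVec` to `SparseVec`.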





[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...

2016-08-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r75064560
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -200,22 +348,73 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu
    * Alter a table whose name that matches the one specified in `tableDefinition`,
    * assuming the table exists.
    *
-   * Note: As of now, this only supports altering table properties, serde properties,
-   * and num buckets!
+   * Note: As of now, this only supports altering table properties and serde properties.
    */
   override def alterTable(tableDefinition: CatalogTable): Unit = withClient {
     assert(tableDefinition.identifier.database.isDefined)
     val db = tableDefinition.identifier.database.get
     requireTableExists(db, tableDefinition.identifier.table)
-    client.alterTable(tableDefinition)
+    verifyTableProperties(tableDefinition)
+
+    if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) {
+      client.alterTable(tableDefinition)
+    } else {
+      val oldDef = client.getTable(db, tableDefinition.identifier.table)
+      // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition,
+      // to retain the spark specific format if it is.
+      // Also add table meta properties to table properties, to retain the data source table format.
+      val newDef = tableDefinition.copy(
+        schema = oldDef.schema,
+        partitionColumnNames = oldDef.partitionColumnNames,
+        bucketSpec = oldDef.bucketSpec,
+        properties = tableMetadataToProperties(tableDefinition) ++ tableDefinition.properties)
+
+      client.alterTable(newDef)
+    }
   }
 
   override def getTable(db: String, table: String): CatalogTable = withClient {
-    client.getTable(db, table)
+    restoreTableMetadata(client.getTable(db, table))
   }
 
   override def getTableOption(db: String, table: String): Option[CatalogTable] = withClient {
-    client.getTableOption(db, table)
+    client.getTableOption(db, table).map(restoreTableMetadata)
+  }
+
+  /**
+   * Restores table metadata from the table properties if it's a datasouce table. This method is
+   * kind of a opposite version of [[createTable]].
+   */
+  private def restoreTableMetadata(table: CatalogTable): CatalogTable = {
+    if (table.tableType == VIEW) {
+      table
+    } else {
+      getProviderFromTableProperties(table).map { provider =>
--- End diff --

`provider` can be `hive`, right?
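The round-trip under discussion can be sketched in isolation. This is a hypothetical illustration (the names `toProperties`, `providerFromProperties`, and the prefix constant are assumptions, not the actual `HiveExternalCatalog` helpers): datasource metadata is flattened into string table properties on create, and read back on restore.

```scala
// Assumed reserved prefix, for illustration only.
val DatasourcePrefix = "spark.sql.sources."

// Flatten datasource metadata into metastore-safe string properties
// (the createTable direction).
def toProperties(provider: String, schemaJson: String): Map[String, String] =
  Map(
    DatasourcePrefix + "provider" -> provider,
    DatasourcePrefix + "schema"   -> schemaJson)

// Read the provider back (the restoreTableMetadata direction).
// None means "no datasource properties, so not a datasource table".
def providerFromProperties(props: Map[String, String]): Option[String] =
  props.get(DatasourcePrefix + "provider")
```

Note that in this sketch nothing prevents the stored `provider` from being `"hive"`, which is the case the comment above is asking about; a real implementation would need to decide how to handle it.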





[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...

2016-08-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r75064362
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -200,22 +348,73 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu
    * Alter a table whose name that matches the one specified in `tableDefinition`,
    * assuming the table exists.
    *
-   * Note: As of now, this only supports altering table properties, serde properties,
-   * and num buckets!
+   * Note: As of now, this only supports altering table properties and serde properties.
    */
   override def alterTable(tableDefinition: CatalogTable): Unit = withClient {
     assert(tableDefinition.identifier.database.isDefined)
     val db = tableDefinition.identifier.database.get
     requireTableExists(db, tableDefinition.identifier.table)
-    client.alterTable(tableDefinition)
+    verifyTableProperties(tableDefinition)
+
+    if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) {
+      client.alterTable(tableDefinition)
+    } else {
+      val oldDef = client.getTable(db, tableDefinition.identifier.table)
+      // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition,
+      // to retain the spark specific format if it is.
+      // Also add table meta properties to table properties, to retain the data source table format.
+      val newDef = tableDefinition.copy(
+        schema = oldDef.schema,
+        partitionColumnNames = oldDef.partitionColumnNames,
+        bucketSpec = oldDef.bucketSpec,
+        properties = tableMetadataToProperties(tableDefinition) ++ tableDefinition.properties)
--- End diff --

If we only look at this method, it is not clear whether the new
`tableDefinition` changes other fields like `storage`. Also, we are using the
existing `bucketSpec`. But is it possible that we have a new `bucketSpec` in
`tableDefinition`?
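The concern above can be made concrete with a toy model. This is a hypothetical sketch (the `TableDef` type and `mergeForAlter` are illustrative, not the Spark API): when the alter path pins layout fields to the old definition, any new `bucketSpec` supplied by the caller is silently discarded.

```scala
// Toy stand-in for CatalogTable with just the fields relevant here.
case class TableDef(
  schema: List[String],
  bucketSpec: Option[Int],
  properties: Map[String, String])

// Mirrors the merge strategy in the diff above: layout-defining fields
// (schema, bucketSpec) come from the OLD definition; only properties
// from the NEW definition take effect.
def mergeForAlter(oldDef: TableDef, newDef: TableDef): TableDef =
  newDef.copy(
    schema = oldDef.schema,
    bucketSpec = oldDef.bucketSpec)
```

Running this with a changed `bucketSpec` shows the new value never reaches the merged definition, which is the behavior the reviewer is questioning.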





[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...

2016-08-16 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13796#discussion_r75064380
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -1082,57 +1343,62 @@ private class LogisticCostFun(
     fitIntercept: Boolean,
     standardization: Boolean,
     bcFeaturesStd: Broadcast[Array[Double]],
-    regParamL2: Double) extends DiffFunction[BDV[Double]] {
+    regParamL2: Double,
+    multinomial: Boolean) extends DiffFunction[BDV[Double]] {
 
   val featuresStd = bcFeaturesStd.value
 
   override def calculate(coefficients: BDV[Double]): (Double, BDV[Double]) = {
-    val numFeatures = featuresStd.length
     val coeffs = Vectors.fromBreeze(coefficients)
     val bcCoeffs = instances.context.broadcast(coeffs)
-    val n = coeffs.size
+    val localFeaturesStd = featuresStd
+    val numFeatures = localFeaturesStd.length
+    val numFeaturesPlusIntercept = if (fitIntercept) numFeatures + 1 else numFeatures
 
     val logisticAggregator = {
-      val seqOp = (c: LogisticAggregator, instance: Instance) => c.add(instance)
+      val seqOp = (c: LogisticAggregator, instance: Instance) =>
+        c.add(instance)
       val combOp = (c1: LogisticAggregator, c2: LogisticAggregator) => c1.merge(c2)
 
       instances.treeAggregate(
-        new LogisticAggregator(bcCoeffs, bcFeaturesStd, numFeatures, numClasses, fitIntercept)
+        new LogisticAggregator(bcCoeffs, bcFeaturesStd, numFeatures, numClasses, fitIntercept,
+          multinomial)
      )(seqOp, combOp)
     }
 
     val totalGradientArray = logisticAggregator.gradient.toArray
-
     // regVal is the sum of coefficients squares excluding intercept for L2 regularization.
     val regVal = if (regParamL2 == 0.0) {
       0.0
     } else {
+      val K = if (multinomial) numClasses else numClasses - 1
--- End diff --

just make the else branch `1` for clarity.





[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...

2016-08-16 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13796#discussion_r75064330
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -1082,57 +1343,62 @@ private class LogisticCostFun(
     fitIntercept: Boolean,
     standardization: Boolean,
     bcFeaturesStd: Broadcast[Array[Double]],
-    regParamL2: Double) extends DiffFunction[BDV[Double]] {
+    regParamL2: Double,
+    multinomial: Boolean) extends DiffFunction[BDV[Double]] {
 
   val featuresStd = bcFeaturesStd.value
 
   override def calculate(coefficients: BDV[Double]): (Double, BDV[Double]) = {
-    val numFeatures = featuresStd.length
     val coeffs = Vectors.fromBreeze(coefficients)
     val bcCoeffs = instances.context.broadcast(coeffs)
-    val n = coeffs.size
+    val localFeaturesStd = featuresStd
+    val numFeatures = localFeaturesStd.length
+    val numFeaturesPlusIntercept = if (fitIntercept) numFeatures + 1 else numFeatures
 
     val logisticAggregator = {
-      val seqOp = (c: LogisticAggregator, instance: Instance) => c.add(instance)
+      val seqOp = (c: LogisticAggregator, instance: Instance) =>
+        c.add(instance)
       val combOp = (c1: LogisticAggregator, c2: LogisticAggregator) => c1.merge(c2)
 
       instances.treeAggregate(
-        new LogisticAggregator(bcCoeffs, bcFeaturesStd, numFeatures, numClasses, fitIntercept)
+        new LogisticAggregator(bcCoeffs, bcFeaturesStd, numFeatures, numClasses, fitIntercept,
+          multinomial)
      )(seqOp, combOp)
     }
 
     val totalGradientArray = logisticAggregator.gradient.toArray
-
--- End diff --

revert this if not needed.





[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...

2016-08-16 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13796#discussion_r75064275
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -1082,57 +1343,62 @@ private class LogisticCostFun(
     fitIntercept: Boolean,
     standardization: Boolean,
     bcFeaturesStd: Broadcast[Array[Double]],
-    regParamL2: Double) extends DiffFunction[BDV[Double]] {
+    regParamL2: Double,
+    multinomial: Boolean) extends DiffFunction[BDV[Double]] {
 
   val featuresStd = bcFeaturesStd.value
 
   override def calculate(coefficients: BDV[Double]): (Double, BDV[Double]) = {
-    val numFeatures = featuresStd.length
     val coeffs = Vectors.fromBreeze(coefficients)
     val bcCoeffs = instances.context.broadcast(coeffs)
-    val n = coeffs.size
+    val localFeaturesStd = featuresStd
--- End diff --

Where is `localFeaturesStd` being used? Why not move `val featuresStd =
bcFeaturesStd.value` into the `calculate` method?





[GitHub] spark pull request #13796: [SPARK-7159][ML] Add multiclass logistic regressi...

2016-08-16 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13796#discussion_r75064097
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -1082,57 +1343,62 @@ private class LogisticCostFun(
     fitIntercept: Boolean,
     standardization: Boolean,
     bcFeaturesStd: Broadcast[Array[Double]],
-    regParamL2: Double) extends DiffFunction[BDV[Double]] {
+    regParamL2: Double,
+    multinomial: Boolean) extends DiffFunction[BDV[Double]] {
 
   val featuresStd = bcFeaturesStd.value
 
   override def calculate(coefficients: BDV[Double]): (Double, BDV[Double]) = {
-    val numFeatures = featuresStd.length
     val coeffs = Vectors.fromBreeze(coefficients)
     val bcCoeffs = instances.context.broadcast(coeffs)
-    val n = coeffs.size
+    val localFeaturesStd = featuresStd
+    val numFeatures = localFeaturesStd.length
+    val numFeaturesPlusIntercept = if (fitIntercept) numFeatures + 1 else numFeatures
 
     val logisticAggregator = {
-      val seqOp = (c: LogisticAggregator, instance: Instance) => c.add(instance)
+      val seqOp = (c: LogisticAggregator, instance: Instance) =>
+        c.add(instance)
--- End diff --

revert this if nothing changed.





[GitHub] spark issue #14678: [MINOR][SQL] Add missing functions for some options in S...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14678
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...

2016-08-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r75063920
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu
     assert(tableDefinition.identifier.database.isDefined)
     val db = tableDefinition.identifier.database.get
     requireDbExists(db)
+    verifyTableProperties(tableDefinition)
+
+    if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) {
+      client.createTable(tableDefinition, ignoreIfExists)
+    } else {
+      val tableProperties = tableMetadataToProperties(tableDefinition)
+
+      def newSparkSQLSpecificMetastoreTable(): CatalogTable = {
+        tableDefinition.copy(
+          schema = new StructType,
+          partitionColumnNames = Nil,
+          bucketSpec = None,
+          properties = tableDefinition.properties ++ tableProperties)
+      }
+
+      def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): CatalogTable = {
+        tableDefinition.copy(
+          storage = tableDefinition.storage.copy(
+            locationUri = Some(new Path(path).toUri.toString),
+            inputFormat = serde.inputFormat,
+            outputFormat = serde.outputFormat,
+            serde = serde.serde
+          ),
+          properties = tableDefinition.properties ++ tableProperties)
+      }
+
+      val qualifiedTableName = tableDefinition.identifier.quotedString
+      val maybeSerde = HiveSerDe.sourceToSerDe(tableDefinition.provider.get)
+      val maybePath = new CaseInsensitiveMap(tableDefinition.storage.properties).get("path")
+      val skipHiveMetadata = tableDefinition.storage.properties
+        .getOrElse("skipHiveMetadata", "false").toBoolean
+
+      val (hiveCompatibleTable, logMessage) = (maybeSerde, maybePath) match {
+        case _ if skipHiveMetadata =>
+          val message =
+            s"Persisting data source table $qualifiedTableName into Hive metastore in" +
+              "Spark SQL specific format, which is NOT compatible with Hive."
+          (None, message)
+
+        // our bucketing is un-compatible with hive(different hash function)
+        case _ if tableDefinition.bucketSpec.nonEmpty =>
+          val message =
+            s"Persisting bucketed data source table $qualifiedTableName into " +
+              "Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. "
+          (None, message)
+
+        case (Some(serde), Some(path)) =>
+          val message =
+            s"Persisting data source table $qualifiedTableName with a single input path " +
+              s"into Hive metastore in Hive compatible format."
+          (Some(newHiveCompatibleMetastoreTable(serde, path)), message)
+
+        case (Some(_), None) =>
+          val message =
+            s"Data source table $qualifiedTableName is not file based. Persisting it into " +
+              s"Hive metastore in Spark SQL specific format, which is NOT compatible with Hive."
+          (None, message)
+
+        case _ =>
+          val provider = tableDefinition.provider.get
+          val message =
+            s"Couldn't find corresponding Hive SerDe for data source provider $provider. " +
+              s"Persisting data source table $qualifiedTableName into Hive metastore in " +
+              s"Spark SQL specific format, which is NOT compatible with Hive."
+          (None, message)
+      }
+
+      (hiveCompatibleTable, logMessage) match {
+        case (Some(table), message) =>
+          // We first try to save the metadata of the table in a Hive compatible way.
+          // If Hive throws an error, we fall back to save its metadata in the Spark SQL
+          // specific way.
+          try {
+            logInfo(message)
+            saveTableIntoHive(table, ignoreIfExists)
+          } catch {
+            case NonFatal(e) =>
+              val warningMessage =
+                s"Could not persist ${tableDefinition.identifier.quotedString} in a Hive " +
+                  "compatible way. Persisting it into Hive metastore in Spark SQL specific format."
+              logWarning(warningMessage, e)
+              saveTableIntoHive(newSparkSQLSpecificMetastoreTable(), ignoreIfExists)
+          }
+
+        case (None, message) =>
+          logWarning(message)
+          saveTableIntoHive(newSparkSQLSpecificMetastoreTable(), ignoreIfExists)
+      }
+    }
+  }
+
+  private def 
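The `createTable` logic quoted above reduces to a try-then-fall-back flow: build a Hive-compatible `CatalogTable` when a SerDe and a single input path are available, attempt to persist it, and on any non-fatal error persist the Spark SQL specific layout instead. A minimal self-contained Scala sketch of that control flow (the `save*` helpers and the bucketed-table failure below are hypothetical stand-ins, not the actual `HiveExternalCatalog` API):

```scala
import scala.util.control.NonFatal

// Standalone sketch of the try-Hive-compatible-then-fall-back flow.
// saveHiveCompatible / saveSparkSpecific are hypothetical stand-ins for
// HiveExternalCatalog.saveTableIntoHive called with the two table layouts.
object FallbackSketch {
  def saveHiveCompatible(name: String): Unit =
    // Simulate Hive rejecting metadata it cannot represent (e.g. bucketing).
    if (name.contains("bucketed")) throw new RuntimeException("rejected by Hive")

  def saveSparkSpecific(name: String): Unit = () // always succeeds in this sketch

  // Returns which layout was actually persisted.
  def persist(name: String, tryHiveCompatible: Boolean): String = {
    if (tryHiveCompatible) {
      try {
        saveHiveCompatible(name)
        "hive-compatible"
      } catch {
        case NonFatal(_) =>
          // The real code logs a warning here, then retries with the safe layout.
          saveSparkSpecific(name)
          "spark-specific (fallback)"
      }
    } else {
      saveSparkSpecific(name)
      "spark-specific"
    }
  }
}
```

Catching `NonFatal` rather than `Throwable` keeps fatal JVM errors (OutOfMemoryError and the like) from being silently swallowed by the fallback.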

[GitHub] spark issue #14678: [MINOR][SQL] Add missing functions for some options in S...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14678
  
**[Test build #63898 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63898/consoleFull)**
 for PR 14678 at commit 
[`c959f3b`](https://github.com/apache/spark/commit/c959f3b4e9ed23a9cee67db64b35fdc4d0e2301d).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14678: [MINOR][SQL] Add missing functions for some options in S...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14678
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63898/
Test FAILed.





[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...

2016-08-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r75063736
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: 
HiveClient, hadoopConf: Configu
 assert(tableDefinition.identifier.database.isDefined)
 val db = tableDefinition.identifier.database.get
 requireDbExists(db)
+verifyTableProperties(tableDefinition)
+
+if (tableDefinition.provider == Some("hive") || 
tableDefinition.tableType == VIEW) {
+  client.createTable(tableDefinition, ignoreIfExists)
+} else {
+  val tableProperties = tableMetadataToProperties(tableDefinition)
+
+  def newSparkSQLSpecificMetastoreTable(): CatalogTable = {
+tableDefinition.copy(
+  schema = new StructType,
+  partitionColumnNames = Nil,
+  bucketSpec = None,
+  properties = tableDefinition.properties ++ tableProperties)
+  }
+
+  def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): 
CatalogTable = {
+tableDefinition.copy(
+  storage = tableDefinition.storage.copy(
+locationUri = Some(new Path(path).toUri.toString),
+inputFormat = serde.inputFormat,
+outputFormat = serde.outputFormat,
+serde = serde.serde
+  ),
+  properties = tableDefinition.properties ++ tableProperties)
+  }
+
+  val qualifiedTableName = tableDefinition.identifier.quotedString
+  val maybeSerde = 
HiveSerDe.sourceToSerDe(tableDefinition.provider.get)
+  val maybePath = new 
CaseInsensitiveMap(tableDefinition.storage.properties).get("path")
+  val skipHiveMetadata = tableDefinition.storage.properties
+.getOrElse("skipHiveMetadata", "false").toBoolean
+
+  val (hiveCompatibleTable, logMessage) = (maybeSerde, maybePath) 
match {
+case _ if skipHiveMetadata =>
+  val message =
+s"Persisting data source table $qualifiedTableName into Hive metastore in " +
+  "Spark SQL specific format, which is NOT compatible with Hive."
+  (None, message)
+
+// our bucketing is not compatible with Hive's (it uses a different hash function)
+case _ if tableDefinition.bucketSpec.nonEmpty =>
+  val message =
+s"Persisting bucketed data source table $qualifiedTableName into " +
+  "Hive metastore in Spark SQL specific format, which is NOT compatible with Hive."
+  (None, message)
+
+case (Some(serde), Some(path)) =>
+  val message =
+s"Persisting data source table $qualifiedTableName with a 
single input path " +
+  s"into Hive metastore in Hive compatible format."
+  (Some(newHiveCompatibleMetastoreTable(serde, path)), message)
+
+case (Some(_), None) =>
+  val message =
+s"Data source table $qualifiedTableName is not file based. 
Persisting it into " +
+  s"Hive metastore in Spark SQL specific format, which is NOT 
compatible with Hive."
+  (None, message)
+
+case _ =>
+  val provider = tableDefinition.provider.get
+  val message =
+s"Couldn't find corresponding Hive SerDe for data source 
provider $provider. " +
+  s"Persisting data source table $qualifiedTableName into Hive 
metastore in " +
+  s"Spark SQL specific format, which is NOT compatible with 
Hive."
+  (None, message)
+  }
+
+  (hiveCompatibleTable, logMessage) match {
+case (Some(table), message) =>
+  // We first try to save the metadata of the table in a Hive 
compatible way.
+  // If Hive throws an error, we fall back to save its metadata in 
the Spark SQL
+  // specific way.
+  try {
+logInfo(message)
+saveTableIntoHive(table, ignoreIfExists)
+  } catch {
+case NonFatal(e) =>
+  val warningMessage =
+s"Could not persist 
${tableDefinition.identifier.quotedString} in a Hive " +
+  "compatible way. Persisting it into Hive metastore in 
Spark SQL specific format."
+  logWarning(warningMessage, e)
+  saveTableIntoHive(newSparkSQLSpecificMetastoreTable(), 
ignoreIfExists)
+  }
+
+case (None, message) =>
+  logWarning(message)
+  saveTableIntoHive(newSparkSQLSpecificMetastoreTable(), 
ignoreIfExists)
+  }
+}
+  }
+
+  private def 

[GitHub] spark issue #14678: [MINOR][SQL] Add missing functions for some options in S...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14678
  
**[Test build #63898 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63898/consoleFull)**
 for PR 14678 at commit 
[`c959f3b`](https://github.com/apache/spark/commit/c959f3b4e9ed23a9cee67db64b35fdc4d0e2301d).





[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...

2016-08-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r75063676
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: 
HiveClient, hadoopConf: Configu
 assert(tableDefinition.identifier.database.isDefined)
 val db = tableDefinition.identifier.database.get
 requireDbExists(db)
+verifyTableProperties(tableDefinition)
+
+if (tableDefinition.provider == Some("hive") || 
tableDefinition.tableType == VIEW) {
+  client.createTable(tableDefinition, ignoreIfExists)
+} else {
+  val tableProperties = tableMetadataToProperties(tableDefinition)
+
+  def newSparkSQLSpecificMetastoreTable(): CatalogTable = {
+tableDefinition.copy(
+  schema = new StructType,
+  partitionColumnNames = Nil,
+  bucketSpec = None,
+  properties = tableDefinition.properties ++ tableProperties)
+  }
+
+  def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): 
CatalogTable = {
+tableDefinition.copy(
+  storage = tableDefinition.storage.copy(
+locationUri = Some(new Path(path).toUri.toString),
+inputFormat = serde.inputFormat,
+outputFormat = serde.outputFormat,
+serde = serde.serde
+  ),
+  properties = tableDefinition.properties ++ tableProperties)
+  }
+
+  val qualifiedTableName = tableDefinition.identifier.quotedString
+  val maybeSerde = 
HiveSerDe.sourceToSerDe(tableDefinition.provider.get)
+  val maybePath = new 
CaseInsensitiveMap(tableDefinition.storage.properties).get("path")
+  val skipHiveMetadata = tableDefinition.storage.properties
+.getOrElse("skipHiveMetadata", "false").toBoolean
+
+  val (hiveCompatibleTable, logMessage) = (maybeSerde, maybePath) 
match {
+case _ if skipHiveMetadata =>
+  val message =
+s"Persisting data source table $qualifiedTableName into Hive metastore in " +
+  "Spark SQL specific format, which is NOT compatible with Hive."
+  (None, message)
+
+// our bucketing is not compatible with Hive's (it uses a different hash function)
+case _ if tableDefinition.bucketSpec.nonEmpty =>
+  val message =
+s"Persisting bucketed data source table $qualifiedTableName into " +
+  "Hive metastore in Spark SQL specific format, which is NOT compatible with Hive."
+  (None, message)
+
+case (Some(serde), Some(path)) =>
+  val message =
+s"Persisting data source table $qualifiedTableName with a 
single input path " +
+  s"into Hive metastore in Hive compatible format."
+  (Some(newHiveCompatibleMetastoreTable(serde, path)), message)
+
+case (Some(_), None) =>
+  val message =
+s"Data source table $qualifiedTableName is not file based. 
Persisting it into " +
+  s"Hive metastore in Spark SQL specific format, which is NOT 
compatible with Hive."
+  (None, message)
+
+case _ =>
+  val provider = tableDefinition.provider.get
+  val message =
+s"Couldn't find corresponding Hive SerDe for data source 
provider $provider. " +
+  s"Persisting data source table $qualifiedTableName into Hive 
metastore in " +
+  s"Spark SQL specific format, which is NOT compatible with 
Hive."
+  (None, message)
+  }
+
+  (hiveCompatibleTable, logMessage) match {
+case (Some(table), message) =>
+  // We first try to save the metadata of the table in a Hive 
compatible way.
+  // If Hive throws an error, we fall back to save its metadata in 
the Spark SQL
+  // specific way.
+  try {
+logInfo(message)
+saveTableIntoHive(table, ignoreIfExists)
+  } catch {
+case NonFatal(e) =>
+  val warningMessage =
+s"Could not persist 
${tableDefinition.identifier.quotedString} in a Hive " +
+  "compatible way. Persisting it into Hive metastore in 
Spark SQL specific format."
+  logWarning(warningMessage, e)
+  saveTableIntoHive(newSparkSQLSpecificMetastoreTable(), 
ignoreIfExists)
+  }
+
+case (None, message) =>
+  logWarning(message)
+  saveTableIntoHive(newSparkSQLSpecificMetastoreTable(), 
ignoreIfExists)
+  }
+}
+  }
+
+  private def 

[GitHub] spark pull request #14678: [MINOR][SQL] Add missing functions for some optio...

2016-08-16 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/14678

[MINOR][SQL] Add missing functions for some options in SQLConf and use them 
where applicable

## What changes were proposed in this pull request?

I first thought these were left out intentionally because they are kind of hidden options, but it seems they are simply missing.

For example, `spark.sql.parquet.mergeSchema` is documented in 
[sql-programming-guide.md](https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md)
 but has no accessor function, whereas many options such as 
`spark.sql.join.preferSortMergeJoin` are not documented but do have their own 
functions.

So, this PR makes them consistent by adding the missing accessor functions 
for some options in `SQLConf` and using them where applicable. 
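
For illustration, the kind of typed accessor being added looks roughly like the following (a standalone sketch, not Spark's actual `SQLConf` code; the option key is taken from the description above):

```scala
// Standalone sketch of a SQLConf-style typed accessor (illustrative only).
object SQLConfSketch {
  private val settings = scala.collection.mutable.Map[String, String]()

  def setConfString(key: String, value: String): Unit = settings(key) = value

  def getConfString(key: String, default: String): String =
    settings.getOrElse(key, default)

  // A typed accessor for spark.sql.parquet.mergeSchema, so call sites can
  // write `isParquetSchemaMergingEnabled` instead of repeating the raw key.
  def isParquetSchemaMergingEnabled: Boolean =
    getConfString("spark.sql.parquet.mergeSchema", "false").toBoolean
}
```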

## How was this patch tested?

Existing tests should cover this.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark sqlconf-cleanup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14678.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14678


commit 27fbbd902dbfca34dd5edd5a219dc64abf9691cf
Author: hyukjinkwon 
Date:   2016-08-17T04:50:15Z

Add missing functions for some options and use them where applicable

commit c959f3b4e9ed23a9cee67db64b35fdc4d0e2301d
Author: hyukjinkwon 
Date:   2016-08-17T04:59:22Z

Fix typos







[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...

2016-08-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r75063449
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: 
HiveClient, hadoopConf: Configu
 assert(tableDefinition.identifier.database.isDefined)
 val db = tableDefinition.identifier.database.get
 requireDbExists(db)
+verifyTableProperties(tableDefinition)
+
+if (tableDefinition.provider == Some("hive") || 
tableDefinition.tableType == VIEW) {
+  client.createTable(tableDefinition, ignoreIfExists)
+} else {
+  val tableProperties = tableMetadataToProperties(tableDefinition)
+
+  def newSparkSQLSpecificMetastoreTable(): CatalogTable = {
+tableDefinition.copy(
+  schema = new StructType,
+  partitionColumnNames = Nil,
+  bucketSpec = None,
+  properties = tableDefinition.properties ++ tableProperties)
+  }
+
+  def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): 
CatalogTable = {
+tableDefinition.copy(
+  storage = tableDefinition.storage.copy(
+locationUri = Some(new Path(path).toUri.toString),
+inputFormat = serde.inputFormat,
+outputFormat = serde.outputFormat,
+serde = serde.serde
+  ),
+  properties = tableDefinition.properties ++ tableProperties)
+  }
+
+  val qualifiedTableName = tableDefinition.identifier.quotedString
+  val maybeSerde = 
HiveSerDe.sourceToSerDe(tableDefinition.provider.get)
+  val maybePath = new 
CaseInsensitiveMap(tableDefinition.storage.properties).get("path")
--- End diff --

I think this path will be set by the ddl command (e.g. 
`CreateDataSourceTableAsSelectCommand`).
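
As an aside, the `CaseInsensitiveMap` lookup in the quoted diff finds `path` regardless of how the option key was spelled. A self-contained sketch of that behavior (illustrative; not Spark's actual implementation):

```scala
// Minimal case-insensitive map sketch: keys are normalized to lower case,
// so get("PATH"), get("Path"), and get("path") all hit the same entry.
class CaseInsensitiveMapSketch(underlying: Map[String, String]) {
  private val normalized = underlying.map { case (k, v) => k.toLowerCase -> v }
  def get(key: String): Option[String] = normalized.get(key.toLowerCase)
}
```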





[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...

2016-08-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r75063156
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: 
HiveClient, hadoopConf: Configu
 assert(tableDefinition.identifier.database.isDefined)
 val db = tableDefinition.identifier.database.get
 requireDbExists(db)
+verifyTableProperties(tableDefinition)
+
+if (tableDefinition.provider == Some("hive") || 
tableDefinition.tableType == VIEW) {
+  client.createTable(tableDefinition, ignoreIfExists)
+} else {
+  val tableProperties = tableMetadataToProperties(tableDefinition)
+
+  def newSparkSQLSpecificMetastoreTable(): CatalogTable = {
+tableDefinition.copy(
+  schema = new StructType,
+  partitionColumnNames = Nil,
+  bucketSpec = None,
+  properties = tableDefinition.properties ++ tableProperties)
+  }
+
+  def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): 
CatalogTable = {
+tableDefinition.copy(
+  storage = tableDefinition.storage.copy(
+locationUri = Some(new Path(path).toUri.toString),
+inputFormat = serde.inputFormat,
+outputFormat = serde.outputFormat,
+serde = serde.serde
+  ),
+  properties = tableDefinition.properties ++ tableProperties)
+  }
+
+  val qualifiedTableName = tableDefinition.identifier.quotedString
+  val maybeSerde = 
HiveSerDe.sourceToSerDe(tableDefinition.provider.get)
+  val maybePath = new 
CaseInsensitiveMap(tableDefinition.storage.properties).get("path")
--- End diff --

If the create table command does not specify the location, does this 
`maybePath` contain the default location?





[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...

2016-08-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r75063094
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: 
HiveClient, hadoopConf: Configu
 assert(tableDefinition.identifier.database.isDefined)
 val db = tableDefinition.identifier.database.get
 requireDbExists(db)
+verifyTableProperties(tableDefinition)
+
+if (tableDefinition.provider == Some("hive") || 
tableDefinition.tableType == VIEW) {
+  client.createTable(tableDefinition, ignoreIfExists)
+} else {
+  val tableProperties = tableMetadataToProperties(tableDefinition)
--- End diff --

Let's explain what will be put into this `tableProperties`.





[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...

2016-08-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r75062850
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: 
HiveClient, hadoopConf: Configu
 assert(tableDefinition.identifier.database.isDefined)
 val db = tableDefinition.identifier.database.get
 requireDbExists(db)
+verifyTableProperties(tableDefinition)
+
+if (tableDefinition.provider == Some("hive") || 
tableDefinition.tableType == VIEW) {
+  client.createTable(tableDefinition, ignoreIfExists)
+} else {
+  val tableProperties = tableMetadataToProperties(tableDefinition)
+
+  def newSparkSQLSpecificMetastoreTable(): CatalogTable = {
+tableDefinition.copy(
+  schema = new StructType,
+  partitionColumnNames = Nil,
+  bucketSpec = None,
+  properties = tableDefinition.properties ++ tableProperties)
+  }
+
+  def newHiveCompatibleMetastoreTable(serde: HiveSerDe, path: String): 
CatalogTable = {
--- End diff --

comment





[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...

2016-08-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r75062846
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: 
HiveClient, hadoopConf: Configu
 assert(tableDefinition.identifier.database.isDefined)
 val db = tableDefinition.identifier.database.get
 requireDbExists(db)
+verifyTableProperties(tableDefinition)
+
+if (tableDefinition.provider == Some("hive") || 
tableDefinition.tableType == VIEW) {
+  client.createTable(tableDefinition, ignoreIfExists)
+} else {
+  val tableProperties = tableMetadataToProperties(tableDefinition)
+
+  def newSparkSQLSpecificMetastoreTable(): CatalogTable = {
--- End diff --

comment





[GitHub] spark issue #14539: [SPARK-16947][SQL] Improve type coercion for inline tabl...

2016-08-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14539
  
How do we specify `null` when creating an inline table? 
```
sql(
  """
|create temporary view src as select * from values
|(201, null),
|(86, "val_86")
  """.stripMargin)
```

Is that supported?





[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...

2016-08-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r75062743
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -144,16 +161,147 @@ private[spark] class HiveExternalCatalog(client: 
HiveClient, hadoopConf: Configu
 assert(tableDefinition.identifier.database.isDefined)
 val db = tableDefinition.identifier.database.get
 requireDbExists(db)
+verifyTableProperties(tableDefinition)
+
+if (tableDefinition.provider == Some("hive") || 
tableDefinition.tableType == VIEW) {
+  client.createTable(tableDefinition, ignoreIfExists)
+} else {
--- End diff --

Let's add comment to explain what we are doing at here.





[GitHub] spark pull request #14676: [SPARK-16947][SQL] Support type coercion and fold...

2016-08-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14676#discussion_r75062469
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/inline-table.sql ---
@@ -0,0 +1,39 @@
+
+-- single row, without table and column alias
+select * from values ("one", 1);
+
+-- single row, without column alias
+select * from values ("one", 1) as data;
+
+-- single row
+select * from values ("one", 1) as data(a, b);
--- End diff --

Could you add a case for `NULL`?





[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14676
  
**[Test build #63897 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63897/consoleFull)**
 for PR 14676 at commit 
[`092605b`](https://github.com/apache/spark/commit/092605be786adae0aa241da43e25d1f1be5de492).





[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14676
  
**[Test build #63896 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63896/consoleFull)**
 for PR 14676 at commit 
[`fcc3caf`](https://github.com/apache/spark/commit/fcc3cafe08c9549be51b33b5ac993fbb3fa46d37).





[GitHub] spark pull request #14665: [SPARK-17084][SQL] Rename ParserUtils.assert to v...

2016-08-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14665





[GitHub] spark issue #14551: [SPARK-16961][CORE] Fixed off-by-one error that biased r...

2016-08-16 Thread wangmiao1981
Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/14551
  
`model.gaussiansDF.show()` displays the `mean` and `variance` of the 
Gaussians, which are DataFrames.





[GitHub] spark issue #14665: [SPARK-17084][SQL] Rename ParserUtils.assert to validate

2016-08-16 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14665
  
Merging in master/2.0.






[GitHub] spark issue #14648: [SPARK-16995][SQL] TreeNodeException when flat mapping R...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14648
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63892/
Test PASSed.





[GitHub] spark issue #14648: [SPARK-16995][SQL] TreeNodeException when flat mapping R...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14648
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14648: [SPARK-16995][SQL] TreeNodeException when flat mapping R...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14648
  
**[Test build #63892 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63892/consoleFull)**
 for PR 14648 at commit 
[`0008c3e`](https://github.com/apache/spark/commit/0008c3e11dfb85523f9f4606d5dec714339d5f43).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14660: [SPARK-17071][SQL] Add an option to support for reading ...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14660
  
**[Test build #63895 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63895/consoleFull)**
 for PR 14660 at commit 
[`0130c39`](https://github.com/apache/spark/commit/0130c39450c7bcecfb3a14db1e581c3f3a9f6a20).





[GitHub] spark issue #8880: [SPARK-5682][Core] Add encrypted shuffle in spark

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/8880
  
**[Test build #63894 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63894/consoleFull)**
 for PR 8880 at commit 
[`77122bb`](https://github.com/apache/spark/commit/77122bb3662c65ffa5596d740efab41f5dfc3a0f).





[GitHub] spark issue #14677: [MINOR][DOC] Fix the descriptions for `properties` argum...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14677
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14677: [MINOR][DOC] Fix the descriptions for `properties` argum...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63893/
Test PASSed.





[GitHub] spark issue #14677: [MINOR][DOC] Fix the descriptions for `properties` argum...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14677
  
**[Test build #63893 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63893/consoleFull)**
 for PR 14677 at commit 
[`f501861`](https://github.com/apache/spark/commit/f5018616eee50544c22432c2256c75325b537e82).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14677: [MINOR][DOC] Fix the descriptions for `properties` argum...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14677
  
**[Test build #63893 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63893/consoleFull)**
 for PR 14677 at commit 
[`f501861`](https://github.com/apache/spark/commit/f5018616eee50544c22432c2256c75325b537e82).





[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...

2016-08-16 Thread mpjlu
Github user mpjlu closed the pull request at:

https://github.com/apache/spark/pull/14597





[GitHub] spark issue #14620: [SPARK-17032][SQL] Add test cases for methods in ParserU...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14620
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14620: [SPARK-17032][SQL] Add test cases for methods in ParserU...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14620
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63888/
Test PASSed.





[GitHub] spark issue #14620: [SPARK-17032][SQL] Add test cases for methods in ParserU...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14620
  
**[Test build #63888 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63888/consoleFull)**
 for PR 14620 at commit 
[`36e049a`](https://github.com/apache/spark/commit/36e049a8fdafb3d311029b292e7d2de5efce4c6a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14677: [MINOR][DOC] Fix the descriptions for `properties` argum...

2016-08-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14677
  
cc @srowen and @mvervuurt 

I just opened this to avoid duplicate verification, as I already verified the change.





[GitHub] spark pull request #14677: [MINOR][DOC] Fix the descriptions for `properties...

2016-08-16 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/14677

[MINOR][DOC] Fix the descriptions for the `properties` argument in the 
documentation for jdbc APIs

## What changes were proposed in this pull request?

This should be credited to @mvervuurt. The main purpose of this PR is
 - to include the same change for the matching instance in `DataFrameReader`, 
so the two APIs match up, and
 - to avoid verifying the PR twice (as I already did).

The documentation for both should be the same because both take the same 
`dict` for the `properties` argument.

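The docstring fix can be illustrated with a minimal sketch; the URL, table name, and credentials below are hypothetical placeholders, not values from this PR:

```python
# Sketch of the corrected usage: `properties` for DataFrameReader.jdbc /
# DataFrameWriter.jdbc is a dict of JDBC connection arguments
# (string key/value pairs), not a list.
properties = {"user": "scott", "password": "tiger"}

# With a live SparkSession and database, the write would look like
# (placeholders; not runnable without a JDBC-accessible database):
# df.write.jdbc(url="jdbc:postgresql://localhost/testdb",
#               table="people", mode="append", properties=properties)

# The point of the docstring fix: it is a dict of string pairs.
assert isinstance(properties, dict)
assert all(isinstance(k, str) and isinstance(v, str)
           for k, v in properties.items())
```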
## How was this patch tested?

Manually building Python documentation.

This will produce the output as below:

- `DataFrameReader`

![2016-08-17 11 12 
00](https://cloud.githubusercontent.com/assets/6477701/17722764/b3f6568e-646f-11e6-8b75-4fb672f3f366.png)

- `DataFrameWriter`

![2016-08-17 11 12 
10](https://cloud.githubusercontent.com/assets/6477701/17722765/b58cb308-646f-11e6-841a-32f19800d139.png)

Closes #14624


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark typo-python

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14677


commit b0864ecb51a28452f6e33f4bdcd30795c7c2ec99
Author: mvervuurt 
Date:   2016-08-12T18:42:40Z

Fix the docstring of the PySpark DataFrameWriter.jdbc method: a dictionary of 
JDBC connection arguments is used instead of a list.

commit f5018616eee50544c22432c2256c75325b537e82
Author: hyukjinkwon 
Date:   2016-08-17T02:17:48Z

Fix the descriptions for the `properties` argument in the documentation for 
jdbc APIs







[GitHub] spark issue #14648: [SPARK-16995][SQL] TreeNodeException when flat mapping R...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14648
  
**[Test build #63892 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63892/consoleFull)**
 for PR 14648 at commit 
[`0008c3e`](https://github.com/apache/spark/commit/0008c3e11dfb85523f9f4606d5dec714339d5f43).





[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14676
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14676
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63889/
Test FAILed.





[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14676
  
**[Test build #63889 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63889/consoleFull)**
 for PR 14676 at commit 
[`2327b79`](https://github.com/apache/spark/commit/2327b7971a845ce01b0b18fd5ccd2b1f0bb99be0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14547: [SPARK-16718][MLlib] gbm-style treeboost [WIP]

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14547
  
**[Test build #3224 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3224/consoleFull)**
 for PR 14547 at commit 
[`a040da5`](https://github.com/apache/spark/commit/a040da5ea64778d766720ecd6a8859893d7204f0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14676
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63887/
Test FAILed.





[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14676
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14676
  
**[Test build #63887 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63887/consoleFull)**
 for PR 14676 at commit 
[`d7acae5`](https://github.com/apache/spark/commit/d7acae55034d4ff5da3e7579cf44acb7b704b4a1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedInlineTable(`





[GitHub] spark issue #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-16 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/spark/pull/14639
  
Thanks @sun-rui. Another commit resolved the downloading issue.





[GitHub] spark issue #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-16 Thread sun-rui
Github user sun-rui commented on the issue:

https://github.com/apache/spark/pull/14639
  
This is not only about using the correct cache dir on Mac OS; in yarn-cluster 
mode there should be no downloading of Spark at all.





[GitHub] spark issue #14675: [SPARK-17096][SQL][STREAMING] Improve exception string r...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14675
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #14670: [SPARK-15285][SQL] Generated SpecificSafeProjecti...

2016-08-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14670#discussion_r75052840
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala ---
@@ -58,4 +59,43 @@ class DataFrameComplexTypeSuite extends QueryTest with 
SharedSQLContext {
 val nullIntRow = df.selectExpr("i[1]").collect()(0)
 assert(nullIntRow == org.apache.spark.sql.Row(null))
   }
+
+  test("SPARK-15285 Generated SpecificSafeProjection.apply method grows 
beyond 64KB") {
+val ds100_5 = Seq(S100_5()).toDS()
+ds100_5.rdd.count
+  }
 }
+
+class S100(
+  val s1: String = "1", val s2: String = "2", val s3: String = "3", val 
s4: String = "4",
+  val s5: String = "5", val s6: String = "6", val s7: String = "7", val 
s8: String = "8",
+  val s9: String = "9", val s10: String = "10", val s11: String = "11", 
val s12: String = "12",
+  val s13: String = "13", val s14: String = "14", val s15: String = "15", 
val s16: String = "16",
+  val s17: String = "17", val s18: String = "18", val s19: String = "19", 
val s20: String = "20",
+  val s21: String = "21", val s22: String = "22", val s23: String = "23", 
val s24: String = "24",
+  val s25: String = "25", val s26: String = "26", val s27: String = "27", 
val s28: String = "28",
+  val s29: String = "29", val s30: String = "30", val s31: String = "31", 
val s32: String = "32",
+  val s33: String = "33", val s34: String = "34", val s35: String = "35", 
val s36: String = "36",
+  val s37: String = "37", val s38: String = "38", val s39: String = "39", 
val s40: String = "40",
+  val s41: String = "41", val s42: String = "42", val s43: String = "43", 
val s44: String = "44",
+  val s45: String = "45", val s46: String = "46", val s47: String = "47", 
val s48: String = "48",
+  val s49: String = "49", val s50: String = "50", val s51: String = "51", 
val s52: String = "52",
+  val s53: String = "53", val s54: String = "54", val s55: String = "55", 
val s56: String = "56",
+  val s57: String = "57", val s58: String = "58", val s59: String = "59", 
val s60: String = "60",
+  val s61: String = "61", val s62: String = "62", val s63: String = "63", 
val s64: String = "64",
+  val s65: String = "65", val s66: String = "66", val s67: String = "67", 
val s68: String = "68",
+  val s69: String = "69", val s70: String = "70", val s71: String = "71", 
val s72: String = "72",
+  val s73: String = "73", val s74: String = "74", val s75: String = "75", 
val s76: String = "76",
+  val s77: String = "77", val s78: String = "78", val s79: String = "79", 
val s80: String = "80",
+  val s81: String = "81", val s82: String = "82", val s83: String = "83", 
val s84: String = "84",
+  val s85: String = "85", val s86: String = "86", val s87: String = "87", 
val s88: String = "88",
+  val s89: String = "89", val s90: String = "90", val s91: String = "91", 
val s92: String = "92",
+  val s93: String = "93", val s94: String = "94", val s95: String = "95", 
val s96: String = "96",
+  val s97: String = "97", val s98: String = "98", val s99: String = "99", 
val s100: String = "100")
+extends DefinedByConstructorParams {}
--- End diff --

Remove the useless `{}`.





[GitHub] spark issue #14675: [SPARK-17096][SQL][STREAMING] Improve exception string r...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14675
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63885/
Test PASSed.





[GitHub] spark issue #14670: [SPARK-15285][SQL] Generated SpecificSafeProjection.appl...

2016-08-16 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14670
  
can you add a link to the original PR? thanks!





[GitHub] spark issue #14675: [SPARK-17096][SQL][STREAMING] Improve exception string r...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14675
  
**[Test build #63885 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63885/consoleFull)**
 for PR 14675 at commit 
[`60eabcc`](https://github.com/apache/spark/commit/60eabcc4716f26e0c6d688d14b518f7321313f88).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14639
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63891/
Test FAILed.





[GitHub] spark issue #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14639
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14639
  
**[Test build #63891 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63891/consoleFull)**
 for PR 14639 at commit 
[`c50102c`](https://github.com/apache/spark/commit/c50102cccd34c50f727b9e8873fed1e7f87983d3).
 * This patch **fails R style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14639
  
**[Test build #63891 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63891/consoleFull)**
 for PR 14639 at commit 
[`c50102c`](https://github.com/apache/spark/commit/c50102cccd34c50f727b9e8873fed1e7f87983d3).





[GitHub] spark issue #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14229
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14229
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63890/
Test PASSed.





[GitHub] spark issue #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14229
  
**[Test build #63890 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63890/consoleFull)**
 for PR 14229 at commit 
[`84cc5e7`](https://github.com/apache/spark/commit/84cc5e73523dc9a306c45ca8bc994dc05984424d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14671: [SPARK-17091][SQL] ParquetFilters rewrite IN to OR of Eq

2016-08-16 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14671
  
Yea unfortunately the row-by-row filtering doesn't make much sense in 
Parquet.






[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-16 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75050631
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -159,55 +159,81 @@ sparkR.sparkContext <- function(
       warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell",
                     " please use the --packages commandline instead", sep = ","))
     }
+    host <- "localhost"
     backendPort <- existingPort
   } else {
-    path <- tempfile(pattern = "backend_port")
-    submitOps <- getClientModeSparkSubmitOpts(
+
+    if (!nzchar(master) || is_master_local(master)) {
+      path <- tempfile(pattern = "backend_port")
+      submitOps <- getClientModeSparkSubmitOpts(
         Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"),
         sparkEnvirMap)
-    launchBackend(
+      launchBackend(
         args = path,
         sparkHome = sparkHome,
         jars = jars,
         sparkSubmitOpts = submitOps,
         packages = packages)
-    # wait atmost 100 seconds for JVM to launch
-    wait <- 0.1
-    for (i in 1:25) {
-      Sys.sleep(wait)
-      if (file.exists(path)) {
-        break
+      # wait atmost 100 seconds for JVM to launch
+      wait <- 0.1
+      for (i in 1:25) {
+        Sys.sleep(wait)
+        if (file.exists(path)) {
+          break
+        }
+        wait <- wait * 1.25
       }
-      wait <- wait * 1.25
-    }
-    if (!file.exists(path)) {
-      stop("JVM is not ready after 10 seconds")
-    }
-    f <- file(path, open = "rb")
-    backendPort <- readInt(f)
-    monitorPort <- readInt(f)
-    rLibPath <- readString(f)
-    close(f)
-    file.remove(path)
-    if (length(backendPort) == 0 || backendPort == 0 ||
-        length(monitorPort) == 0 || monitorPort == 0 ||
-        length(rLibPath) != 1) {
-      stop("JVM failed to launch")
+      if (!file.exists(path)) {
+        stop("JVM is not ready after 10 seconds")
+      }
+      f <- file(path, open = "rb")
+      backendPort <- readInt(f)
+      monitorPort <- readInt(f)
+      rLibPath <- readString(f)
+      close(f)
+      file.remove(path)
+      if (length(backendPort) == 0 || backendPort == 0 ||
+          length(monitorPort) == 0 || monitorPort == 0 ||
+          length(rLibPath) != 1) {
+        stop("JVM failed to launch")
+      }
+      if (rLibPath != "") {
+        assign(".libPath", rLibPath, envir = .sparkREnv)
+        .libPaths(c(rLibPath, .libPaths()))
+      }
+      host <- "localhost"
+    } else {
+      backendPort <- if (!is.null(sparkEnvirMap[["backend.port"]])) {
+        sparkEnvirMap[["backend.port"]]
--- End diff --

How is backend.port passed to the R process? I don't see where this
environment variable is set.





[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14359
  
**[Test build #63886 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63886/consoleFull)**
 for PR 14359 at commit 
[`41f4297`](https://github.com/apache/spark/commit/41f4297f7602c062c78c76b2215397830ed7b6af).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14359
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63886/
Test PASSed.





[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14359
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for Spa...

2016-08-16 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/spark/pull/14555
  
@srowen Do you mean moving the validation back into the Vectors.sparse()
method? The constructor of SparseVector is public, so I think we should keep
the validation in SparseVector rather than in its Vectors factory.
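
The point generalizes beyond MLlib: if validation lives only in a factory,
callers of the public constructor bypass it. A minimal Python sketch of the
argument (hypothetical stand-ins, not the actual MLlib classes):

```python
class SparseVector:
    """Stand-in for MLlib's SparseVector: validation sits in the
    constructor, so every caller is checked, not only the factory."""

    def __init__(self, size, indices, values):
        if len(indices) != len(values):
            raise ValueError("indices and values must have the same length")
        if any(i < 0 or i >= size for i in indices):
            raise ValueError("index out of bounds for vector of size %d" % size)
        self.size = size
        self.indices = list(indices)
        self.values = list(values)


class Vectors:
    """Factory that delegates to the constructor, so the bound checks
    are not duplicated and cannot be skipped."""

    @staticmethod
    def sparse(size, indices, values):
        return SparseVector(size, indices, values)
```

With the check in the constructor, both `Vectors.sparse(...)` and a direct
`SparseVector(...)` call reject out-of-bounds indices the same way.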





[GitHub] spark pull request #14671: [SPARK-17091][SQL] ParquetFilters rewrite IN to O...

2016-08-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14671#discussion_r75049585
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -369,7 +369,7 @@ class ParquetFilterSuite extends QueryTest with 
ParquetTest with SharedSQLContex
 
   test("SPARK-11103: Filter applied on merged Parquet schema with new 
column fails") {
 import testImplicits._
-Seq("true", "false").map { vectorized =>
+Seq("true", "false").foreach { vectorized =>
--- End diff --

Yes, I remember being told that this pattern is even essential in some cases:
https://github.com/apache/spark/pull/14416#discussion_r72886131
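
The map-vs-foreach change above is Scala, but the rationale carries over to
any language: when a loop body runs only for its side effects, a statement
form states the intent and avoids building a throwaway result collection. A
rough Python analog of the test pattern (hypothetical `run_variant` standing
in for running the Parquet suite with the vectorized reader on or off):

```python
def run_variant(vectorized, results):
    # Stand-in for executing one test variant; only the side effect
    # (recording/running it) matters, not a return value.
    results.append(vectorized)

def run_all(variants):
    results = []
    # Statement form (like Scala's foreach): unlike a list
    # comprehension or map(), no discarded list of None is built.
    for v in variants:
        run_variant(v, results)
    return results
```

Calling `run_all(["true", "false"])` runs both variants, mirroring the
`Seq("true", "false").foreach { ... }` idiom in the diff.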





[GitHub] spark issue #14674: [SPARK-17002][CORE]: Document that spark.ssl.protocol. i...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14674
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63882/
Test PASSed.





[GitHub] spark issue #14674: [SPARK-17002][CORE]: Document that spark.ssl.protocol. i...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14674
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14674: [SPARK-17002][CORE]: Document that spark.ssl.protocol. i...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14674
  
**[Test build #63882 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63882/consoleFull)**
 for PR 14674 at commit 
[`6cc46b9`](https://github.com/apache/spark/commit/6cc46b927bda28a707fcb5a3471cf3ed16ca).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14229
  
**[Test build #63890 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63890/consoleFull)**
 for PR 14229 at commit 
[`84cc5e7`](https://github.com/apache/spark/commit/84cc5e73523dc9a306c45ca8bc994dc05984424d).





[GitHub] spark issue #14671: [SPARK-17091][SQL] ParquetFilters rewrite IN to OR of Eq

2016-08-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14671
  
Thanks for cc'ing me! As you might already know, I think it makes sense to 
allow filtering row groups, but this would also re-enable row-by-row filtering 
in the normal Parquet reader, which was removed by 
[SPARK-16400](https://issues.apache.org/jira/browse/SPARK-16400). So let me 
cc @rxin and @liancheng here.

IMHO, I remember there was a concern (sorry, I can't find the reference) that 
Spark-side codegen row-by-row filtering might generally be faster than 
Parquet's, because Spark's avoids type boxing and virtual function calls.

So, actually, I was thinking of bringing this back after (maybe) Parquet 
row-by-row filtering is disabled in Spark, so that row groups can be filtered 
properly.

I am pretty sure filtering row groups makes sense, but I am a bit hesitant 
about the row-by-row part: it seems it was removed for better performance, 
and bringing it back might be a performance regression even though the 
implementation is different. Do we maybe need a benchmark?

Otherwise, maybe we should experiment to check whether Spark's codegen 
filtering is actually faster than Parquet's, so that we can decide to disable 
row-by-row filtering first (unless this was already done somewhere or offline).
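
The suggested experiment can be prototyped before touching Spark: time a
per-row predicate against a batch-at-a-time one on identical input and check
that they agree. The sketch below is pure Python, not Spark codegen or the
Parquet reader, so it only illustrates the benchmarking methodology, not the
actual performance question:

```python
import time

def row_by_row(rows):
    # Evaluate the predicate one row at a time, loosely analogous to
    # Parquet's record-level filtering (a function-call-per-row shape).
    return [r for r in rows if r % 7 == 0]

def batched(rows, batch_size=65536):
    # Evaluate the predicate a batch at a time, loosely analogous to a
    # vectorized/codegen path that amortizes per-row overhead.
    out = []
    for i in range(0, len(rows), batch_size):
        out.extend(r for r in rows[i:i + batch_size] if r % 7 == 0)
    return out

def bench(fn, data):
    # Return (elapsed seconds, number of surviving rows); a real
    # benchmark would take the best of several warmed-up runs.
    t0 = time.perf_counter()
    result = fn(data)
    return time.perf_counter() - t0, len(result)
```

The essential check is that both strategies return identical results; only
then do the timing numbers mean anything.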





[GitHub] spark issue #14670: [SPARK-15285][SQL] Generated SpecificSafeProjection.appl...

2016-08-16 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/14670
  
@cloud-fan @rxin Is this test case what you suggested? I would appreciate 
your comment.





[GitHub] spark issue #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14229
  
**[Test build #63884 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63884/consoleFull)**
 for PR 14229 at commit 
[`8280b41`](https://github.com/apache/spark/commit/8280b414443eece95130abb053be16c0c57aa9f0).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14229
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63884/
Test FAILed.





[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14676
  
**[Test build #63889 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63889/consoleFull)**
 for PR 14676 at commit 
[`2327b79`](https://github.com/apache/spark/commit/2327b7971a845ce01b0b18fd5ccd2b1f0bb99be0).





[GitHub] spark issue #14229: [SPARK-16447][ML][SparkR] LDA wrapper in SparkR

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14229
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14656: [SPARK-17069] Expose spark.range() as table-valued funct...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14656
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63881/
Test PASSed.





[GitHub] spark issue #14656: [SPARK-17069] Expose spark.range() as table-valued funct...

2016-08-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14656
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14620: [SPARK-17032][SQL] Add test cases for methods in ParserU...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14620
  
**[Test build #63888 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63888/consoleFull)**
 for PR 14620 at commit 
[`36e049a`](https://github.com/apache/spark/commit/36e049a8fdafb3d311029b292e7d2de5efce4c6a).





[GitHub] spark issue #14656: [SPARK-17069] Expose spark.range() as table-valued funct...

2016-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14656
  
**[Test build #63881 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63881/consoleFull)**
 for PR 14656 at commit 
[`1fa57c4`](https://github.com/apache/spark/commit/1fa57c4d63c8ece438fde7f8d23d0b0698d22cd9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14676: [SPARK-16947][SQL] Support type coercion and foldable ex...

2016-08-16 Thread petermaxlee
Github user petermaxlee commented on the issue:

https://github.com/apache/spark/pull/14676
  
cc @hvanhovell and @cloud-fan. I followed @cloud-fan's suggestion to use 
the analyzer to replace inline tables with LocalRelation.

One thing that is broken about inline tables is SQL generation. We might 
need to implement SQL generation for LocalRelation.
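
The type-coercion half of the PR title can be illustrated outside Spark: an
inline table such as `VALUES (1), (2.0)` needs each column coerced to a
common type before it can be materialized as a LocalRelation. A simplified
sketch of that widening logic (hypothetical helpers, not Spark's actual
TypeCoercion rules, and with a toy int -> float -> str chain in place of
Spark's real type lattice):

```python
def common_type(types):
    # Pick the widest type along a toy widening chain; Spark's coercion
    # lattice is much richer, this only mirrors the idea.
    order = {int: 0, float: 1, str: 2}
    return max(types, key=lambda t: order[t])

def coerce_inline_table(rows):
    # Transpose to columns, find each column's common type, then cast
    # every value so all rows share one schema (as a LocalRelation must).
    columns = list(zip(*rows))
    targets = [common_type([type(v) for v in col]) for col in columns]
    return [tuple(t(v) for t, v in zip(targets, row)) for row in rows]
```

For example, the mixed column `[1, 2.0]` widens to float, so every row ends
up with the same column types.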





