[GitHub] spark issue #14476: [SPARK-16867][SQL] createTable and alterTable in Externa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14476 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14476: [SPARK-16867][SQL] createTable and alterTable in Externa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14476 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63212/ Test FAILed.
[GitHub] spark issue #14476: [SPARK-16867][SQL] createTable and alterTable in Externa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14476 **[Test build #63212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63212/consoleFull)** for PR 14476 at commit [`2093906`](https://github.com/apache/spark/commit/20939066b99fd5892a123177deafe24bfb7607d0).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14470: [SPARK-16863][ML] ProbabilisticClassifier.fit check thre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14470 **[Test build #63213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63213/consoleFull)** for PR 14470 at commit [`df5af72`](https://github.com/apache/spark/commit/df5af7247960e44281ec64bb141c7a499eaa80cd).
[GitHub] spark pull request #14474: [SPARK-16853][SQL] fixes encoder error in DataSet...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14474#discussion_r73465751

Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala

```scala
@@ -184,6 +184,17 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
       2, 3, 4)
   }

+  test("SPARK-16853: select, case class and tuple") {
```

how about `typed select that returns case class or tuple`?
[GitHub] spark issue #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulato...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14467 Merged build finished. Test FAILed.
[GitHub] spark issue #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulato...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14467 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63205/ Test FAILed.
[GitHub] spark issue #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulato...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14467 **[Test build #63205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63205/consoleFull)** for PR 14467 at commit [`cc5f435`](https://github.com/apache/spark/commit/cc5f4352950f338afeecf1e4f5eaceae853b1520).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14476: [SPARK-16867][SQL] createTable and alterTable in Externa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14476 **[Test build #63212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63212/consoleFull)** for PR 14476 at commit [`2093906`](https://github.com/apache/spark/commit/20939066b99fd5892a123177deafe24bfb7607d0).
[GitHub] spark pull request #14470: [SPARK-16863][ML] ProbabilisticClassifier.fit che...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/14470#discussion_r73465384

Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala

```scala
@@ -84,6 +84,13 @@ class DecisionTreeClassifier @Since("1.4.0") (
     val categoricalFeatures: Map[Int, Int] =
       MetadataUtils.getCategoricalFeatures(dataset.schema($(featuresCol)))
     val numClasses: Int = getNumClasses(dataset)
+
+    if (isDefined(thresholds)) {
+      require($(thresholds).length == numClasses, this.getClass.getSimpleName +
```

Because `ProbabilisticClassificationModel.transform` performs this check first, I just followed that style.
[GitHub] spark pull request #14470: [SPARK-16863][ML] ProbabilisticClassifier.fit che...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/14470#discussion_r73465400

Diff: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala

```scala
@@ -101,6 +101,14 @@ class NaiveBayes @Since("1.5.0") (
   setDefault(modelType -> OldNaiveBayes.Multinomial)

   override protected def train(dataset: Dataset[_]): NaiveBayesModel = {
+    val numClasses: Int = getNumClasses(dataset)
```

Thanks, I will remove it.
[GitHub] spark issue #14486: [SQL][SPARK-16888] Implements eval method for expression...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14486 thanks, merging to master!
[GitHub] spark pull request #14486: [SQL][SPARK-16888] Implements eval method for exp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14486
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73465089

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala

```scala
@@ -19,50 +19,25 @@ package org.apache.spark.sql.execution.datasources

 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.TableIdentifier
-import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
 import org.apache.spark.sql.catalyst.expressions.Attribute
-import org.apache.spark.sql.catalyst.plans.logical
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.execution.command.RunnableCommand
 import org.apache.spark.sql.types._

+case class CreateTable(tableDesc: CatalogTable, mode: SaveMode, query: Option[LogicalPlan])
+  extends LogicalPlan {
+  assert(tableDesc.provider.isDefined, "The table to be created must have a provider.")

-/**
- * Used to represent the operation of create table using a data source.
- *
- * @param allowExisting If it is true, we will do nothing when the table already exists.
- *                      If it is false, an exception will be thrown
- */
-case class CreateTableUsing(
-    tableIdent: TableIdentifier,
-    userSpecifiedSchema: Option[StructType],
-    provider: String,
-    temporary: Boolean,
-    options: Map[String, String],
-    partitionColumns: Array[String],
-    bucketSpec: Option[BucketSpec],
-    allowExisting: Boolean,
-    managedIfNoPath: Boolean) extends LogicalPlan with logical.Command {
-
-  override def output: Seq[Attribute] = Seq.empty
-  override def children: Seq[LogicalPlan] = Seq.empty
-}

+  if (query.isEmpty) {
+    assert(
+      mode == SaveMode.ErrorIfExists || mode == SaveMode.Ignore,
+      "create table without data insertion can only use ErrorIfExists or Ignore as SaveMode.")
+  }

-/**
- * A node used to support CTAS statements and saveAsTable for the data source API.
- * This node is a [[logical.UnaryNode]] instead of a [[logical.Command]] because we want the
- * analyzer to analyze the logical plan that will be used to populate the table,
- * so [[PreWriteCheck]] can detect cases that are not allowed.
- */
-case class CreateTableUsingAsSelect(
-    tableIdent: TableIdentifier,
-    provider: String,
-    partitionColumns: Array[String],
-    bucketSpec: Option[BucketSpec],
-    mode: SaveMode,
-    options: Map[String, String],
-    child: LogicalPlan) extends logical.UnaryNode {

   override def output: Seq[Attribute] = Seq.empty[Attribute]
+
+  override def children: Seq[LogicalPlan] = query.toSeq
```

This is great! Sometimes the plan of `query` may not be analyzed at the end; this resolves an existing bug.
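The SaveMode restriction in the new `CreateTable` plan above can be sketched in plain Python (not Spark's Scala; the function name is made up for illustration): when there is no query to insert, only `ErrorIfExists` and `Ignore` are meaningful save modes.

```python
# Illustrative sketch of the SaveMode assert in the CreateTable plan:
# CTAS (query present) may use any mode, while a bare CREATE TABLE
# may only use ErrorIfExists or Ignore.
VALID_MODES_WITHOUT_QUERY = {"ErrorIfExists", "Ignore"}

def check_create_table(mode, query):
    if query is None and mode not in VALID_MODES_WITHOUT_QUERY:
        raise AssertionError(
            "create table without data insertion can only use "
            "ErrorIfExists or Ignore as SaveMode.")
```

`Append` or `Overwrite` describe how to write data, so they only make sense when a query supplies rows to write.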
[GitHub] spark issue #14478: [SPARK-16875][SQL] Add args checking for DataSet randomS...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/14478 @srowen Right. I just added those checks for RDD.
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73464594

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala

```scala
@@ -154,6 +274,21 @@ private[sql] case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)

   def apply(plan: LogicalPlan): Unit = {
     plan.foreach {
+      case c @ CreateTable(tableDesc, mode, query) if c.resolved =>
+        // Since we are saving table metadata to metastore, we should make sure the table name
+        // and database name don't break some common restrictions, e.g. special chars except
+        // underscore are not allowed.
+        val pattern = Pattern.compile("[\\w_]+")
```

cc @hvanhovell, I think this is the only place where we need this check, as `CreateTable` is the only plan that can save table metadata into the metastore. What do you think?
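A rough Python equivalent of the `Pattern.compile("[\\w_]+")` name check quoted above (the helper name is hypothetical; Java's `\w` is ASCII-only, so `re.ASCII` is used to approximate it):

```python
import re

# Table and database names may only contain word characters
# (letters, digits, underscore) -- no dots, dashes or spaces.
_NAME_PATTERN = re.compile(r"[\w_]+", re.ASCII)

def is_valid_metastore_name(name: str) -> bool:
    # Require the whole name to match, so "my-table" or "a.b" are rejected.
    return _NAME_PATTERN.fullmatch(name) is not None
```

Note that `fullmatch` anchors the pattern at both ends, which is what the metastore restriction needs; a bare `match` would accept `"tbl.part"` because it matches only the prefix.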
[GitHub] spark issue #14482: [SPARK-16879][SQL] unify logical plans for CREATE TABLE ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14482 cc @gatorsmile, IIRC you have some PRs about error handling. After this PR we can have a central place for basic error handling; is it good enough for all the error cases you found?
[GitHub] spark pull request #14480: [MINOR][SQL] Fix minor formatting issue of SortAg...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14480
[GitHub] spark issue #14482: [SPARK-16879][SQL] unify logical plans for CREATE TABLE ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14482 **[Test build #63211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63211/consoleFull)** for PR 14482 at commit [`ec47911`](https://github.com/apache/spark/commit/ec479111f18257286e09723049badf402b1fad1a).
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73464308

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala

```scala
@@ -206,22 +206,22 @@ private[sql] case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)
         // The relation in l is not an InsertableRelation.
         failAnalysis(s"$l does not allow insertion.")

-      case c: CreateTableUsingAsSelect =>
+      case CreateTable(tableDesc, mode, Some(query)) =>
```

Now this rule only checks whether the table is an input table of the query; it won't do anything for Hive serde tables.
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73464228

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala

```scala
@@ -62,6 +66,122 @@ private[sql] class ResolveDataSource(sparkSession: SparkSession) extends Rule[Lo
 }

 /**
+ * Preprocess some DDL plans, e.g. [[CreateTable]], to do some normalization and checking.
+ */
+case class PreprocessDDL(conf: SQLConf) extends Rule[LogicalPlan] {
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    // When we CREATE TABLE without specifying the table schema, we should fail the query if
+    // bucketing information is specified, as we can't infer bucketing from data files currently,
+    // and we should ignore the partition columns if it's specified, as we will infer it later, at
+    // runtime.
+    case c @ CreateTable(tableDesc, _, None) if tableDesc.schema.isEmpty =>
+      if (tableDesc.bucketSpec.isDefined) {
+        failAnalysis("Cannot specify bucketing information if the table schema is not specified " +
+          "when creating and will be inferred at runtime")
+      }
+
+      val partitionColumnNames = tableDesc.partitionColumnNames
+      if (partitionColumnNames.nonEmpty) {
+        // The table does not have a specified schema, which means that the schema will be inferred
+        // at runtime. So, we are not expecting partition columns and we will discover partitions
+        // at runtime. However, if there are specified partition columns, we simply ignore them and
+        // provide a warning message.
+        logWarning(
+          s"Specified partition columns (${partitionColumnNames.mkString(",")}) will be " +
+            s"ignored. The schema and partition columns of table ${tableDesc.identifier} will " +
+            "be inferred.")
+        c.copy(tableDesc = tableDesc.copy(partitionColumnNames = Nil))
+      } else {
+        c
+      }
+
+    // Here we normalize partition, bucket and sort column names, w.r.t. the case sensitivity
+    // config, and do various checks:
+    //   * column names in table definition can't be duplicated.
+    //   * partition, bucket and sort column names must exist in table definition.
+    //   * partition, bucket and sort column names can't be duplicated.
+    //   * can't use all table columns as partition columns.
+    //   * partition columns' type must be AtomicType.
+    //   * sort columns' type must be orderable.
```

cc @gatorsmile, I think all these checks are general and can be applied to hive serde tables too.
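The name-based checks in that comment block can be sketched in plain Python (not Spark's rule; the function name and error messages are made up), with names case-folded to mimic a case-insensitive resolver:

```python
# Illustrative sketch of the column-name checks on a table definition.
def check_column_names(columns, partition_columns):
    cols = [c.lower() for c in columns]
    parts = [p.lower() for p in partition_columns]
    # column names in table definition can't be duplicated
    if len(set(cols)) != len(cols):
        raise ValueError("duplicate column names in table definition")
    # partition column names must exist in table definition
    missing = [p for p in parts if p not in cols]
    if missing:
        raise ValueError("partition columns not found: %s" % missing)
    # partition column names can't be duplicated
    if len(set(parts)) != len(parts):
        raise ValueError("duplicate partition columns")
    # can't use all table columns as partition columns
    if parts and len(parts) == len(cols):
        raise ValueError("cannot use all columns as partition columns")
```

The type-based checks (partition columns must be `AtomicType`, sort columns must be orderable) need the resolved schema, so they are omitted from this sketch.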
[GitHub] spark issue #14480: [MINOR][SQL] Fix minor formatting issue of SortAggregate...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14480 Thanks, merging to master.
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73464090

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala

```scala
@@ -420,45 +420,40 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {

   object DDLStrategy extends Strategy {
     def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-      case c: CreateTableUsing if c.temporary && !c.allowExisting =>
-        logWarning(
-          s"CREATE TEMPORARY TABLE ${c.tableIdent.identifier} USING... is deprecated, " +
-            s"please use CREATE TEMPORARY VIEW viewName USING... instead")
-        ExecutedCommandExec(
-          CreateTempViewUsing(
-            c.tableIdent, c.userSpecifiedSchema, replace = true, c.provider, c.options)) :: Nil
-
-      case c: CreateTableUsing if !c.temporary =>
+      case CreateTable(tableDesc, mode, None) if tableDesc.provider.get == "hive" =>
```

no, the `provider` is always defined, see https://github.com/apache/spark/pull/14482/files#diff-ea32a127bbe0c2bab24b0bbc8c333982R30
[GitHub] spark issue #14482: [SPARK-16879][SQL] unify logical plans for CREATE TABLE ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14482 **[Test build #63209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63209/consoleFull)** for PR 14482 at commit [`108e385`](https://github.com/apache/spark/commit/108e3859d81391def31a381518a072a21f6c4567).
[GitHub] spark issue #14478: [SPARK-16875][SQL] Add args checking for DataSet randomS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14478 **[Test build #63210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63210/consoleFull)** for PR 14478 at commit [`4b0efea`](https://github.com/apache/spark/commit/4b0efea99f2d3e48124040b5f1cace28d603e386).
[GitHub] spark issue #14472: [SPARK-16866][SQL] Infrastructure for file-based SQL end...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14472 **[Test build #3201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3201/consoleFull)** for PR 14472 at commit [`a1e1b57`](https://github.com/apache/spark/commit/a1e1b578cd8f7aa45fd0db107b194c500284ae79).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14492 Merged build finished. Test PASSed.
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14492 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63203/ Test PASSed.
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14492 **[Test build #63203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63203/consoleFull)** for PR 14492 at commit [`ffc0e4a`](https://github.com/apache/spark/commit/ffc0e4a363968fa62a592f96e37669ca1bcbf099).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/14492 More specifically, the Spark distribution has the jars needed by the launcher in `$SPARK_HOME/jars`, so basically this is extra code in Spark to support non-standard distributions.
[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12135 **[Test build #63208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63208/consoleFull)** for PR 12135 at commit [`785a667`](https://github.com/apache/spark/commit/785a66703cfe4b2de29047994ab6b0bb38065c43).
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/14492 But what's the goal of that? If there's nothing in `$SPARK_HOME/jars`, why not create a symlink instead to the location where the jars are? The change itself doesn't really cause any problems, I just don't understand the need.
[GitHub] spark pull request #12135: [SPARK-14352][SQL] approxQuantile should support ...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/12135#discussion_r73462027

--- Diff: python/pyspark/sql/dataframe.py ---

```
@@ -1181,18 +1181,33 @@ def approxQuantile(self, col, probabilities, relativeError):
         Space-efficient Online Computation of Quantile Summaries]]
         by Greenwald and Khanna.
 
-        :param col: the name of the numerical column
+        :param col: the name of the numerical column, or a list/tuple of
+            numerical columns.
         :param probabilities: a list of quantile probabilities
           Each number must belong to [0, 1].
           For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
         :param relativeError: The relative target precision to achieve
           (>= 0). If set to zero, the exact quantiles are computed, which
           could be very expensive. Note that values greater than 1 are
           accepted but give the same result as 1.
-        :return: the approximate quantiles at the given probabilities
+        :return: the approximate quantiles at the given probabilities. If
+            the input `col` is a string, the output is a list of float. If the
+            input `col` is a list or tuple of strings, the output is also a
+            list, but each element in it is a list of float, i.e., the output
+            is a list of list of float.
         """
-        if not isinstance(col, str):
-            raise ValueError("col should be a string.")
+        if not isinstance(col, (str, list, tuple)):
+            raise ValueError("col should be a string, list or tuple.")
+
+        isStr = isinstance(col, str)
```

--- End diff --

Thanks for helping to review this PR; it has been quite a while. The type of `col` determines the type of the return value. If I made `col = [col]` here, I would not know whether to return a `list` or a `list of list`.

Like this:

```
>>> dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")
>>> dataset.stat.approxQuantile(['label'], [0.1, 0.2], 0.1)
[[0.0, 1.0]]
>>> dataset.stat.approxQuantile('label', [0.1, 0.2], 0.1)
[0.0, 1.0]
```
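The dispatch described above, wrapping a single column name into a list internally and unwrapping the result only when the caller passed a string, can be sketched in plain Python. This is a hedged illustration, not the real PySpark code: `_quantiles_for` is a hypothetical stand-in for the per-column quantile computation on the JVM side.

```python
def _quantiles_for(name, probabilities):
    # Hypothetical placeholder: return one float per requested probability.
    return [0.0 for _ in probabilities]

def approx_quantile(col, probabilities):
    """Mimic the col-type dispatch: str -> list of float,
    list/tuple of str -> list of list of float."""
    if not isinstance(col, (str, list, tuple)):
        raise ValueError("col should be a string, list or tuple.")

    is_str = isinstance(col, str)
    cols = [col] if is_str else list(col)

    results = [_quantiles_for(c, probabilities) for c in cols]
    # Unwrap only when the caller passed a single column name,
    # so the return type mirrors the input type.
    return results[0] if is_str else results
```

This shows why the `isStr` flag has to be captured before normalizing `col` to a list: after `col = [col]`, the two input shapes would be indistinguishable.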
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73461786

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---

```
@@ -233,12 +233,11 @@ private[sql] case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)
         }
 
         PartitioningUtils.validatePartitionColumn(
-          c.child.schema, c.partitionColumns, conf.caseSensitiveAnalysis)
+          query.schema, tableDesc.partitionColumnNames, conf.caseSensitiveAnalysis)
 
         for {
-          spec <- c.bucketSpec
-          sortColumnName <- spec.sortColumnNames
-          sortColumn <- c.child.schema.find(_.name == sortColumnName)
+          spec <- tableDesc.bucketSpec
+          sortColumn <- tableDesc.schema.filter(spec.sortColumnNames.contains)
```

--- End diff --

Below is the logic for bucketed tables. If we do not plan to support Hive bucketed tables, maybe we should just issue an exception?
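One behavioral detail of the rewritten line: `tableDesc.schema.filter(spec.sortColumnNames.contains)` yields the matching columns in schema order, whereas the old per-name `find` loop followed the declared sort-column order. A small Python illustration of the difference (the column names are hypothetical):

```python
schema = ["a", "b", "c"]   # columns in table-schema order
sort_cols = ["c", "a"]     # declared sort-column order

# Old shape: iterate the sort columns, look each one up in the schema.
by_sort_order = [c for name in sort_cols for c in schema if c == name]

# New shape: keep schema columns whose name is a sort column.
by_schema_order = [c for c in schema if c in sort_cols]

print(by_sort_order)    # ['c', 'a']
print(by_schema_order)  # ['a', 'c']
```

Whether the reordering matters here depends on what the downstream check does with the columns, but it is worth noting during review.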
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14490 **[Test build #63207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63207/consoleFull)** for PR 14490 at commit [`79eaef7`](https://github.com/apache/spark/commit/79eaef7f55779e949d7e8dc0b4e4749d76f99c9f).
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73461643

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---

```
@@ -233,12 +233,11 @@ private[sql] case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)
         }
 
         PartitioningUtils.validatePartitionColumn(
-          c.child.schema, c.partitionColumns, conf.caseSensitiveAnalysis)
+          query.schema, tableDesc.partitionColumnNames, conf.caseSensitiveAnalysis)
```

--- End diff --

`validatePartitionColumn` is for data source tables only. Right?
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14490 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63201/ Test FAILed.
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14490 Merged build finished. Test FAILed.
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73461508

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---

```
@@ -206,22 +206,22 @@ private[sql] case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)
             // The relation in l is not an InsertableRelation.
             failAnalysis(s"$l does not allow insertion.")
 
-      case c: CreateTableUsingAsSelect =>
+      case CreateTable(tableDesc, mode, Some(query)) =>
```

--- End diff --

Previously, this was only applicable to data source tables. After this change, it also applies to CREATE HIVE TABLE AS SELECT, so some of the validations might not be right for Hive tables. We have to check them carefully, one by one.
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14490 retest this please
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14490 **[Test build #63201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63201/consoleFull)** for PR 14490 at commit [`79eaef7`](https://github.com/apache/spark/commit/79eaef7f55779e949d7e8dc0b4e4749d76f99c9f).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14490: [SPARK-16877][BUILD] Add rules for preventing to ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14490#discussion_r73460838

--- Diff: scalastyle-config.xml ---

```
@@ -250,6 +250,14 @@ This file is divided into 3 sections:
     Omit braces in case clauses.
+
+
+^Override$
```

--- End diff --

Yes, I just ran a test after cloning [scalariform](https://github.com/scala-ide/scalariform), which [scalastyle](https://github.com/scalastyle/scalastyle) uses, as below:

```
ScalaLexer.rawTokenise("@Override")
```

It seems this is split into different tokens:

![2016-08-04 1 29 11](https://cloud.githubusercontent.com/assets/6477701/17390365/78afcb96-5a47-11e6-9e9f-8d9a2d6c4ddf.png)

(BTW, maybe we should avoid writing `@Override` as-is.. I started to feel guilty for cc'ing him/her here and there.)
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14492 Sure. This change is for putting Spark jars in a different dir than the default dir in `spark/assembly` or `spark/jars`. So, in this case, the main class is not in `SPARK_JARS_DIR`.
[GitHub] spark issue #14065: [SPARK-14743][YARN] Add a configurable credential manage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14065 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63200/ Test PASSed.
[GitHub] spark issue #14065: [SPARK-14743][YARN] Add a configurable credential manage...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14065 Merged build finished. Test PASSed.
[GitHub] spark issue #14065: [SPARK-14743][YARN] Add a configurable credential manage...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14065 **[Test build #63200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63200/consoleFull)** for PR 14065 at commit [`127d85e`](https://github.com/apache/spark/commit/127d85ed54f057581a35c88fc7f85e1b8e13de38).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14474: [SPARK-16853][SQL] fixes encoder error in DataSet typed ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14474 **[Test build #63206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63206/consoleFull)** for PR 14474 at commit [`3d90a68`](https://github.com/apache/spark/commit/3d90a68d84ff55249fb50c463a2bc0674d6fc79b).
[GitHub] spark issue #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulato...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14467 **[Test build #63205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63205/consoleFull)** for PR 14467 at commit [`cc5f435`](https://github.com/apache/spark/commit/cc5f4352950f338afeecf1e4f5eaceae853b1520).
[GitHub] spark issue #14474: [SPARK-16853][SQL] fixes encoder error in DataSet typed ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14474 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63204/ Test FAILed.
[GitHub] spark issue #14474: [SPARK-16853][SQL] fixes encoder error in DataSet typed ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14474 **[Test build #63204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63204/consoleFull)** for PR 14474 at commit [`d9b5a40`](https://github.com/apache/spark/commit/d9b5a40d2d28d9e2fcc0f0605550f57b37634a0c).

* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14474: [SPARK-16853][SQL] fixes encoder error in DataSet typed ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14474 Merged build finished. Test FAILed.
[GitHub] spark issue #14473: [SPARK-16495] [MLlib]Add ADMM optimizer in mllib package
Github user ZunwenYou commented on the issue: https://github.com/apache/spark/pull/14473 @MLnick please have a look at this.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14452 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63202/ Test FAILed.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14452 Merged build finished. Test FAILed.
[GitHub] spark pull request #14490: [SPARK-16877][BUILD] Add rules for preventing to ...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/14490#discussion_r73459086

--- Diff: scalastyle-config.xml ---

```
@@ -250,6 +250,14 @@ This file is divided into 3 sections:
     Omit braces in case clauses.
+
+
+^Override$
```

--- End diff --

I just reproduced this locally on my machine as well (it doesn't trigger with `@Override` or `\@Override`). One guess is that the token checker has split them into separate tokens?
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14452 **[Test build #63202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63202/consoleFull)** for PR 14452 at commit [`7fe57a0`](https://github.com/apache/spark/commit/7fe57a0666f5d5f489d5b09a6cc20f784611dcf8).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14428: [SPARK-16810] Refactor registerSinks with multiple const...
Github user lovexi commented on the issue: https://github.com/apache/spark/pull/14428 I think this one is ready for reviews. :)
[GitHub] spark pull request #14490: [SPARK-16877][BUILD] Add rules for preventing to ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14490#discussion_r73458908

--- Diff: scalastyle-config.xml ---

```
@@ -250,6 +250,14 @@ This file is divided into 3 sections:
     Omit braces in case clauses.
+
+
+^Override$
```

--- End diff --

BTW, I actually tried `RegexChecker` as well to grep this case, but then found I would have to come up with a complex regular expression for some exceptional cases, such as:

- `@Override` in comments

  ```scala
  /** ...
   *@Override
   *public void close(Throwable errorOrNull) {
   *    // close the connection
   *}
  ...
  ```

- `@Override` in codegen

  ```scala
  ...
  " @Override public String toString() { return \"" + toStringValue + "\"; }}"
  ...
  ```

So, I had to use `TokenChecker`.
[GitHub] spark issue #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace function i...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14156 That's the question indeed. I'm not sure, because the function that's supplied could be anything; I don't see how it could automatically be converted to a vectorized operation.
[GitHub] spark issue #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace function i...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14156 Yeah, currently it seems to add a little overhead (it does a copy), but I think it will take advantage of Breeze optimizations in the future, e.g., SIMD instructions or something?
[GitHub] spark pull request #14488: [SPARK-16826][SQL] Switch to java.net.URI for par...
Github user sylvinus commented on a diff in the pull request: https://github.com/apache/spark/pull/14488#discussion_r73458498

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---

```
@@ -749,25 +749,44 @@ case class ParseUrl(children: Seq[Expression])
       Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX)
   }
 
-  private def getUrl(url: UTF8String): URL = {
+  private def getUrl(url: UTF8String): URI = {
     try {
-      new URL(url.toString)
+      new URI(url.toString)
     } catch {
-      case e: MalformedURLException => null
+      case e: URISyntaxException => null
     }
   }
 
-  private def getExtractPartFunc(partToExtract: UTF8String): URL => String = {
+  private def getExtractPartFunc(partToExtract: UTF8String): URI => String = {
+
+    // partToExtract match {
+    //   case HOST => _.toURL().getHost
+    //   case PATH => _.toURL().getPath
+    //   case QUERY => _.toURL().getQuery
+    //   case REF => _.toURL().getRef
+    //   case PROTOCOL => _.toURL().getProtocol
+    //   case FILE => _.toURL().getFile
+    //   case AUTHORITY => _.toURL().getAuthority
+    //   case USERINFO => _.toURL().getUserInfo
+    //   case _ => (url: URI) => null
+    // }
+
     partToExtract match {
       case HOST => _.getHost
-      case PATH => _.getPath
-      case QUERY => _.getQuery
-      case REF => _.getRef
-      case PROTOCOL => _.getProtocol
-      case FILE => _.getFile
-      case AUTHORITY => _.getAuthority
-      case USERINFO => _.getUserInfo
-      case _ => (url: URL) => null
+      case PATH => _.getRawPath
+      case QUERY => _.getRawQuery
+      case REF => _.getRawFragment
+      case PROTOCOL => _.getScheme
+      case FILE =>
+        (url: URI) =>
+          if (url.getRawQuery ne null) {
```

--- End diff --

It does seem so:

```
scala> new URL("http://example.com/path%20?query=x%20#hash%20").getQuery()
res1: String = query=x%20

scala> new URL("http://example.com/path%20?query=x%20#hash%20").getRef()
res2: String = hash%20
```
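For comparison, Python's `urllib.parse.urlsplit` behaves like the raw accessors discussed above: it leaves percent-escapes in the path, query, and fragment undecoded. This is just an analogous illustration in Python, not Spark code:

```python
from urllib.parse import urlsplit

parts = urlsplit("http://example.com/path%20?query=x%20#hash%20")

# Like java.net.URI's getRaw* accessors, the percent-escapes survive intact.
print(parts.path)      # /path%20
print(parts.query)     # query=x%20
print(parts.fragment)  # hash%20
```

The distinction matters because `URI.getPath`/`getQuery` decode escapes, while the `getRaw*` variants used in the diff return the component exactly as it appeared in the input string.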
[GitHub] spark pull request #14490: [SPARK-16877][BUILD] Add rules for preventing to ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14490#discussion_r73458483

--- Diff: scalastyle-config.xml ---

```
@@ -250,6 +250,14 @@ This file is divided into 3 sections:
     Omit braces in case clauses.
+
+
+^Override$
```

--- End diff --

Hm, so this matches "@Override" but not "Override" now, but would reverse if the regex included "@"? That sounds flipped. @ isn't a special char. You're doubly sure that's right? It also seems like this is matching "Override" on a line alone, but should be looking for "@Override" anywhere.
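The token-splitting behavior discussed in this thread can be illustrated outside scalastyle: if `@` and `Override` are lexed as separate tokens, a per-token pattern `^Override$` fires, while a whole-line search would have to mention the `@` explicitly. A rough Python analogy follows; the two-token split is an assumption about scalariform's lexer based on the screenshot above:

```python
import re

line = "  @Override"
tokens = ["@", "Override"]   # assumed lexer output for "@Override"

# Per-token check, the way a TokenChecker-style rule applies its regex:
hit = any(re.match(r"^Override$", t) for t in tokens)
print(hit)  # True

# A whole-line regex check would instead need the "@" in the pattern:
print(re.search(r"@Override", line) is not None)   # True
print(re.search(r"^Override$", line) is not None)  # False
```

This would explain srowen's observation: the anchors `^` and `$` delimit a single token's text, not a line of source.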
[GitHub] spark issue #14474: [SPARK-16853][SQL] fixes encoder error in DataSet typed ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14474 **[Test build #63204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63204/consoleFull)** for PR 14474 at commit [`d9b5a40`](https://github.com/apache/spark/commit/d9b5a40d2d28d9e2fcc0f0605550f57b37634a0c).
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/14492 @yhuai can you explain the use case you're trying to cover here with an example? `LAUNCH_CLASSPATH` is the classpath of the launcher process (the process that creates the command line to then run the `SparkSubmit` class). The launcher itself already adds `SPARK_DIST_CLASSPATH` to Spark's classpath: https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java#L204

```
addToClassPath(cp, getenv("HADOOP_CONF_DIR"));
addToClassPath(cp, getenv("YARN_CONF_DIR"));
addToClassPath(cp, getenv("SPARK_DIST_CLASSPATH"));
```
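The `addToClassPath` calls quoted above simply append each environment value, when set, to the classpath using the platform separator. A minimal Python sketch of that merging; the variable names come from the quoted snippet, but `build_classpath` itself is hypothetical, not a real Spark function:

```python
import os

def build_classpath(entries, env):
    """Append each non-empty env value to the classpath, in order,
    mirroring the shape of AbstractCommandBuilder.addToClassPath."""
    cp = list(entries)
    for var in ("HADOOP_CONF_DIR", "YARN_CONF_DIR", "SPARK_DIST_CLASSPATH"):
        value = env.get(var)
        if value:  # skip unset or empty variables
            cp.append(value)
    return os.pathsep.join(cp)
```

Since `SPARK_DIST_CLASSPATH` already lands on the application's classpath this way, the open question in the thread is only whether the *launcher* process itself also needs those entries.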
[GitHub] spark issue #14445: [SPARK-16320] [SQL] Fix performance regression for parqu...
Github user clockfly commented on the issue: https://github.com/apache/spark/pull/14445 @rxin maybe we can still use this Jira Id by @maver1ck's comment?
[GitHub] spark issue #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace function i...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14156 I see, this copies x to y and then modifies y in place. OK. Is that more efficient? It seems like extra work, but does the transform method make up for it? Just checking whether this has actually been observed to speed things up.
[GitHub] spark pull request #14476: [SPARK-16867][SQL] createTable and alterTable in ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14476#discussion_r73458295 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala --- @@ -82,7 +82,7 @@ abstract class ExternalCatalog { * Note: If the underlying implementation does not support altering a certain field, * this becomes a no-op. */ - def alterTable(db: String, tableDefinition: CatalogTable): Unit + def alterTable(tableDefinition: CatalogTable): Unit --- End diff -- Let's add a doc comment explaining that this does not support moving a table to another database, since a developer might try to use it that way just from looking at the API (tableDefinition has a db field).
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73458290 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -367,15 +368,16 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { throw new AnalysisException(s"Table $tableIdent already exists.") case _ => -val cmd = - CreateTableUsingAsSelect( -tableIdent, -source, - partitioningColumns.map(_.toArray).getOrElse(Array.empty[String]), -getBucketSpec, -mode, -extraOptions.toMap, -df.logicalPlan) +val tableDesc = CatalogTable( + identifier = tableIdent, + tableType = CatalogTableType.EXTERNAL, + storage = CatalogStorageFormat.empty.copy(properties = extraOptions.toMap), + schema = new StructType, + provider = Some(source), + partitionColumnNames = partitioningColumns.getOrElse(Nil), + bucketSpec = getBucketSpec +) +val cmd = CreateTable(tableDesc, mode, Some(df.logicalPlan)) --- End diff -- hmmm, do we have to use `Option` even though the parameter is guaranteed to be non-null? In this case we can't use `Option`, or the behaviour will change. Previously, if `df.logicalPlan` was null, that was a bug and we would throw an NPE somewhere. If we use `Option` here, we silently convert a CTAS into a plain CREATE TABLE, which is not expected.
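The hazard raised here, an `Option` silently swallowing an unexpected null, can be sketched in plain Java with `Optional` (this is an illustration, not Spark code; the `describe` helper is hypothetical):

```java
import java.util.Optional;

public class OptionalNullDemo {
    // Hypothetical stand-in for CreateTable: a present query means CTAS,
    // an absent query means a plain CREATE TABLE.
    static String describe(Optional<String> query) {
        return query.isPresent() ? "CTAS" : "CREATE TABLE";
    }

    public static void main(String[] args) {
        String plan = null; // an upstream bug: the plan should never be null

        // Optional.ofNullable hides the bug: null quietly becomes "absent",
        // so the CTAS is silently treated as a plain CREATE TABLE.
        System.out.println(describe(Optional.ofNullable(plan)));

        // Optional.of fails fast instead, surfacing the bug as an exception.
        try {
            describe(Optional.of(plan));
        } catch (NullPointerException e) {
            System.out.println("NPE surfaced the bug");
        }
    }
}
```

This is why wrapping a guaranteed-non-null parameter with `Optional.ofNullable` (or Scala's `Option(...)`) can mask bugs rather than document intent.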
[GitHub] spark pull request #14490: [SPARK-16877][BUILD] Add rules for preventing to ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14490#discussion_r73458250 --- Diff: scalastyle-config.xml --- @@ -250,6 +250,14 @@ This file is divided into 3 sections: Omit braces in case clauses. + + +^Override$ --- End diff -- I actually tried `^@Override$` first, but I found this Scala checker recognises the token as just `Override`.
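The behaviour described above, where the checker sees the token without its `@`, can be sketched with a plain regex check (an illustration of the anchored pattern from the diff, not the scalastyle implementation):

```java
import java.util.regex.Pattern;

public class TokenRegexDemo {
    public static void main(String[] args) {
        // The rule's pattern as quoted in the diff.
        Pattern p = Pattern.compile("^Override$");

        // If the checker hands the rule the bare token "Override"
        // (with the '@' already consumed), the pattern matches...
        System.out.println(p.matcher("Override").matches());

        // ...while the same input would not match "^@Override$"-style
        // anchoring against a token that lacks the '@'.
        System.out.println(p.matcher("@Override").matches());
    }
}
```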
[GitHub] spark pull request #14490: [SPARK-16877][BUILD] Add rules for preventing to ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14490#discussion_r73458156 --- Diff: scalastyle-config.xml --- @@ -250,6 +250,14 @@ This file is divided into 3 sections: Omit braces in case clauses. + + +^Override$ --- End diff -- Does this need to look for a leading "@" as well?
[GitHub] spark issue #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace function i...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14156 @srowen The `:=` operator in BDM simply copies one BDM into another, and it is widely used in the Breeze source. For example, see `DenseMatrix.copy` in Breeze: it first uses `DenseMatrix.create` to create a new matrix with the same dimensions (`val result = DenseMatrix.create(...)`), then uses `result := this` to copy itself into the matrix just created. The mechanism behind the `:=` operator for `DenseMatrix` is that `DenseMatrix` implements the `OpSet` trait. In the `DenseMatrix` source file in Breeze, at line 985, there is: implicit val setMV_D:OpSet.InPlaceImpl2[...] = new SetDMDVOp[Double]() So the implementation code is in the `SetDMDVOp` class, and we can see that `SetDMDVOp` does type specialization for the `Double` type, so the compiled code is efficient.
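The copy-then-modify-in-place pattern under discussion can be sketched without Breeze, using plain Java primitive arrays (the point of Breeze's `Double` specialization is the same: a tight primitive loop with no boxing and no extra allocation):

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

public class ApplyInPlaceDemo {
    // Copy x into y (the analogue of Breeze's `y := x`), then apply f to y in place.
    static void applyInPlace(double[] x, double[] y, DoubleUnaryOperator f) {
        System.arraycopy(x, 0, y, 0, x.length); // bulk primitive copy, no boxing
        for (int i = 0; i < y.length; i++) {
            y[i] = f.applyAsDouble(y[i]);       // in-place transform, no new array
        }
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = new double[3];
        applyInPlace(x, y, v -> v * 2);
        System.out.println(Arrays.toString(y)); // y now holds 2x; x is untouched
    }
}
```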
[GitHub] spark issue #14477: [SPARK-16870][docs]Summary:add "spark.sql.broadcastTimeo...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14477 Any objection to documenting it @liancheng ?
[GitHub] spark pull request #14488: [SPARK-16826][SQL] Switch to java.net.URI for par...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14488#discussion_r73457928 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -749,25 +749,44 @@ case class ParseUrl(children: Seq[Expression]) Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX) } - private def getUrl(url: UTF8String): URL = { + private def getUrl(url: UTF8String): URI = { try { - new URL(url.toString) + new URI(url.toString) } catch { - case e: MalformedURLException => null + case e: URISyntaxException => null } } - private def getExtractPartFunc(partToExtract: UTF8String): URL => String = { + private def getExtractPartFunc(partToExtract: UTF8String): URI => String = { +// partToExtract match { +// case HOST => _.toURL().getHost +// case PATH => _.toURL().getPath +// case QUERY => _.toURL().getQuery +// case REF => _.toURL().getRef +// case PROTOCOL => _.toURL().getProtocol +// case FILE => _.toURL().getFile +// case AUTHORITY => _.toURL().getAuthority +// case USERINFO => _.toURL().getUserInfo +// case _ => (url: URI) => null +// } + partToExtract match { case HOST => _.getHost - case PATH => _.getPath - case QUERY => _.getQuery - case REF => _.getRef - case PROTOCOL => _.getProtocol - case FILE => _.getFile - case AUTHORITY => _.getAuthority - case USERINFO => _.getUserInfo - case _ => (url: URL) => null + case PATH => _.getRawPath + case QUERY => _.getRawQuery + case REF => _.getRawFragment + case PROTOCOL => _.getScheme + case FILE => +(url: URI) => + if (url.getRawQuery ne null) { --- End diff -- Do we really need the 'raw' elements in each of these? I'd think they need to be parsed. Your tests show this code not parsing escapes. Is that how it behaved before? If so, OK.
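The 'raw' versus parsed distinction at issue is visible directly in `java.net.URI`: the `getRaw*` accessors preserve percent-escapes, while the plain getters decode them. A small check:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RawUriDemo {
    public static void main(String[] args) throws URISyntaxException {
        URI uri = new URI("http://example.com/a%20b?q=x%26y#f%2Fg");

        // Path: raw keeps the escape, plain decodes it.
        System.out.println(uri.getRawPath());  // /a%20b
        System.out.println(uri.getPath());     // /a b

        // Query: %26 is an escaped '&'.
        System.out.println(uri.getRawQuery()); // q=x%26y
        System.out.println(uri.getQuery());    // q=x&y

        // Fragment (what URL calls the "ref"): %2F is an escaped '/'.
        System.out.println(uri.getRawFragment()); // f%2Fg
        System.out.println(uri.getFragment());    // f/g
    }
}
```

So choosing `getRawPath` over `getPath` is exactly the choice between Hive-style verbatim extraction and decoded components.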
[GitHub] spark pull request #14488: [SPARK-16826][SQL] Switch to java.net.URI for par...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14488#discussion_r73457856 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -749,25 +749,44 @@ case class ParseUrl(children: Seq[Expression]) Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX) } - private def getUrl(url: UTF8String): URL = { + private def getUrl(url: UTF8String): URI = { try { - new URL(url.toString) + new URI(url.toString) } catch { - case e: MalformedURLException => null + case e: URISyntaxException => null --- End diff -- Don't change this unless you need to make another change anyway, but this can be `case _: ...` I know it wasn't like that before
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73457639 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -61,6 +64,38 @@ trait CheckAnalysis extends PredicateHelper { } } + private def checkColumnNames(tableDesc: CatalogTable): Unit = { +val colNames = tableDesc.schema.map(_.name) +val colNamesSet = colNames.toSet +checkDuplicatedColumnNames(colNames, colNamesSet, "table definition of " + tableDesc.identifier) + +def requireSubsetOfSchema(subColNames: Seq[String], colType: String): Unit = { + val subColNamesSet = subColNames.toSet + checkDuplicatedColumnNames(subColNames, subColNamesSet, colType) + if (!subColNamesSet.subsetOf(colNamesSet)) { +failAnalysis(s"$colType columns (${subColNames.mkString(", ")}) must be a subset of " + + s"schema (${colNames.mkString(", ")}) in table '${tableDesc.identifier}'") + } +} + +// Verify that the provided columns are part of the schema +requireSubsetOfSchema(tableDesc.partitionColumnNames, "partition") + requireSubsetOfSchema(tableDesc.bucketSpec.map(_.bucketColumnNames).getOrElse(Nil), "bucket") + requireSubsetOfSchema(tableDesc.bucketSpec.map(_.sortColumnNames).getOrElse(Nil), "sort") + } + + private def checkDuplicatedColumnNames( + colNames: Seq[String], + colNamesSet: Set[String], + colType: String): Unit = { +if (colNamesSet.size != colNames.length) { + val duplicateColumns = colNames.groupBy(identity).collect { +case (x, ys) if ys.length > 1 => quoteIdentifier(x) + } --- End diff -- we should, but the previous code doesn't consider case sensitivity either; we can do it in a follow-up.
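The duplicate-detection idiom in the diff (compare the set's size with the list's length, then group to name the offenders) can be sketched in Java, with a lowercased pass showing the case-sensitivity gap raised in the comment:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateColumnsDemo {
    // Returns the column names that occur more than once.
    static List<String> duplicates(List<String> colNames) {
        return colNames.stream()
                .collect(Collectors.groupingBy(n -> n, Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("id", "name", "ID", "name");

        // Case-sensitive check: only "name" is flagged; "id"/"ID" slips through.
        System.out.println(duplicates(cols));

        // Case-insensitive check (the proposed follow-up): normalize first,
        // and "id"/"ID" is also reported as a clash.
        List<String> lowered = cols.stream()
                .map(s -> s.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
        System.out.println(duplicates(lowered));
    }
}
```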
[GitHub] spark pull request #14482: [SPARK-16879][SQL] unify logical plans for CREATE...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14482#discussion_r73457371 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -349,6 +384,27 @@ trait CheckAnalysis extends PredicateHelper { |${s.catalogTable.identifier} """.stripMargin) + case c @ CreateTable(tableDesc, mode, query) if c.resolved => +// Since we are saving table metadata to metastore, we should make sure the table name +// and database name don't break some common restrictions, e.g. special chars except +// underscore are not allowed. +val pattern = Pattern.compile("[\\w_]+") +if (!pattern.matcher(tableDesc.identifier.table).matches()) { + failAnalysis(s"Table name ${tableDesc.identifier.table} is not a valid name for " + +s"metastore, it only accepts table name containing characters, numbers and _.") +} +if (tableDesc.identifier.database.isDefined && + !pattern.matcher(tableDesc.identifier.database.get).matches()) { + failAnalysis(s"Database name ${tableDesc.identifier.table} is not a valid name for " + --- End diff -- `${tableDesc.identifier.table}` -> `${tableDesc.identifier.database.get}`
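The name validation quoted in the diff can be checked in isolation. One side note: in Java regexes `\w` already includes the underscore, so `[\w_]+` is equivalent to `\w+`:

```java
import java.util.regex.Pattern;

public class TableNameCheckDemo {
    // Same pattern as in the diff: word characters only (letters, digits, '_').
    static boolean isValidName(String name) {
        return Pattern.compile("[\\w_]+").matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("my_table_1")); // valid
        System.out.println(isValidName("my-table"));   // invalid: '-' not allowed
        System.out.println(isValidName("db.table"));   // invalid: '.' not allowed
    }
}
```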
[GitHub] spark issue #14488: [SPARK-16826][SQL] Switch to java.net.URI for parse_url(...
Github user sylvinus commented on the issue: https://github.com/apache/spark/pull/14488 rebase done!
[GitHub] spark issue #14491: [SPARK-16886] [EXAMPLES][SQL] structured streaming netwo...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14491 OK, doesn't need a JIRA. Looks like there are similar occurrences in the Python code, and streaming docs.
[GitHub] spark issue #13738: [SPARK-11227][CORE] UnknownHostException can be thrown w...
Github user soldiershen commented on the issue: https://github.com/apache/spark/pull/13738 @sarutak got it. I added the HDFS conf file to specify the host. Thanks.
[GitHub] spark issue #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace function i...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14156 Is there reasonable evidence this speeds things up? I just want to make sure this does not make it slower. Help me understand the `:=` operator? I don't see how it helps compute y as a function of x here. I assume the method below can't use the same mechanism?
[GitHub] spark issue #14488: [SPARK-16826][SQL] Switch to java.net.URI for parse_url(...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14488 Description looks good. You can use `git rebase -i HEAD~4` or similar to `drop` the extra commit here. Pending that and tests passing, looks good.
[GitHub] spark issue #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPA...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14492 **[Test build #63203 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63203/consoleFull)** for PR 14492 at commit [`ffc0e4a`](https://github.com/apache/spark/commit/ffc0e4a363968fa62a592f96e37669ca1bcbf099).
[GitHub] spark pull request #14492: [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_...
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/14492 [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPATH ## What changes were proposed in this pull request? To deploy Spark, it can be pretty convenient to put all jars (spark jars, hadoop jars, and other libs' jars) that we want to include in the classpath of Spark in the same dir, which may not be spark's assembly dir. So, I am proposing to also add SPARK_DIST_CLASSPATH to the LAUNCH_CLASSPATH. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yhuai/spark SPARK-16887 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14492.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14492 commit ffc0e4a363968fa62a592f96e37669ca1bcbf099 Author: Yin Huai Date: 2016-08-04T02:57:47Z [SPARK-16887] Add SPARK_DIST_CLASSPATH to LAUNCH_CLASSPATH
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14490 LGTM pending Jenkins.
[GitHub] spark issue #14472: [SPARK-16866][SQL] Infrastructure for file-based SQL end...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14472 **[Test build #3201 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3201/consoleFull)** for PR 14472 at commit [`a1e1b57`](https://github.com/apache/spark/commit/a1e1b578cd8f7aa45fd0db107b194c500284ae79).
[GitHub] spark issue #14491: [SPARK-16886] [EXAMPLES][SQL] structured streaming netwo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14491 Can one of the admins verify this patch?
[GitHub] spark issue #14472: [SPARK-16866][SQL] Infrastructure for file-based SQL end...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14472 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63197/ Test FAILed.
[GitHub] spark issue #14472: [SPARK-16866][SQL] Infrastructure for file-based SQL end...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14472 Merged build finished. Test FAILed.
[GitHub] spark issue #14472: [SPARK-16866][SQL] Infrastructure for file-based SQL end...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14472 **[Test build #63197 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63197/consoleFull)** for PR 14472 at commit [`a1e1b57`](https://github.com/apache/spark/commit/a1e1b578cd8f7aa45fd0db107b194c500284ae79). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14491: [SPARK-16886] [EXAMPLES][SQL] structured streamin...
GitHub user ganeshchand opened a pull request: https://github.com/apache/spark/pull/14491 [SPARK-16886] [EXAMPLES][SQL] structured streaming network word count examples … ## What changes were proposed in this pull request? Fixed a minor code comment typo by replacing DataFrame with Dataset ## How was this patch tested? Run Locally You can merge this pull request into a Git repository by running: $ git pull https://github.com/ganeshchand/spark SPARK-16886 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14491.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14491 commit 8751e08b18b8f8a4467cf15f88076c0f93294fc2 Author: Ganesh Chand Date: 2016-08-04T02:29:40Z [SPARK-16886] [SQL] structured streaming network word count examples code comments Replaced Dataframe with Dataset in code comments
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14490 **[Test build #63201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63201/consoleFull)** for PR 14490 at commit [`79eaef7`](https://github.com/apache/spark/commit/79eaef7f55779e949d7e8dc0b4e4749d76f99c9f).
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14452 **[Test build #63202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63202/consoleFull)** for PR 14452 at commit [`7fe57a0`](https://github.com/apache/spark/commit/7fe57a0666f5d5f489d5b09a6cc20f784611dcf8).
[GitHub] spark pull request #12913: [SPARK-928][CORE] Add support for Unsafe-based se...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/12913#discussion_r73454617 --- Diff: core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala --- @@ -399,6 +399,14 @@ class KryoSerializerSuite extends SparkFunSuite with SharedSparkContext { assert(!ser2.getAutoReset) } + private def testBothUnsafeAndSafe(f: SparkConf => Unit): Unit = { --- End diff -- Yes, will update the PR today.
[GitHub] spark issue #14490: [SPARK-16877][BUILD] Add rules for preventing to use Jav...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14490 cc @srowen
[GitHub] spark pull request #14490: [SPARK-16877][BUILD] Add rules for preventing to ...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/14490

[SPARK-16877][BUILD] Add rules for preventing to use Java annotations (Deprecated and Override)

## What changes were proposed in this pull request?

This PR adds rules to prevent the use of both Java annotations, `@Deprecated` and `@Override`.

- Java's `@Override`

It seems the Scala compiler simply ignores this annotation. This can be problematic when traits or abstract classes are inherited. The Scala compiler seems to require the `override` modifier only for members "that override some other **concrete member definition** in a parent class", but not for an **incomplete member definition** (such as one from a trait or abstract class); see http://www.scala-lang.org/files/archive/spec/2.11/05-classes-and-objects.html#override

For a simple example:

- Normal class - needs `override` modifier

```bash
scala> class A { def say = {}}
defined class A

scala> class B extends A { def say = {}}
:8: error: overriding method say in class A of type => Unit;
 method say needs `override' modifier
       class B extends A { def say = {}}
                               ^
```

- Trait - does not need `override` modifier

```bash
scala> trait A { def say }
defined trait A

scala> class B extends A { def say = {}}
defined class B
```

In short, in the latter case we can write an `@Override` annotation (which means nothing), and this might confuse engineers into believing that Java's annotation is working fine. It would be great if we prevented this potential confusion.

- Java's `@Deprecated`

When `@Deprecated` is used, the Scala compiler seems to recognise it correctly, but we use the Scala annotation `@deprecated` across the codebase.

## How was this patch tested?

Manually tested, by inserting both `@Override` and `@Deprecated`. This shows the error messages below:

```bash
Scalastyle checks failed at following occurrences:
[error] ... : @deprecated should be used instead of @java.lang.
```

```bash
Scalastyle checks failed at following occurrences:
[error] ... : override modifier should be used instead of @java.lang.Override.
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-16877

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14490.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14490

commit 79eaef7f55779e949d7e8dc0b4e4749d76f99c9f
Author: hyukjinkwon
Date: 2016-08-04T02:02:25Z

    Add rules for preventing to use Java annotations (Deprecated and Override)

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
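[Editor's note] The override/trait behaviour discussed in the PR can be sketched in one self-contained Scala file. This is an illustration, not code from the PR; the trait and class names are hypothetical:

```scala
// Java's @Override (java.lang.Override) compiles on a Scala member but the
// Scala compiler does not check that the member actually overrides anything.
trait Greeter {
  def say: String // incomplete (abstract) member definition
}

class PoliteGreeter extends Greeter {
  // `override` is optional here because `say` is abstract in Greeter;
  // Java's @Override also compiles here, silently meaning nothing.
  @Override
  def say: String = "hello"
}

class LoudGreeter extends PoliteGreeter {
  // Here `say` overrides a *concrete* member of the parent class,
  // so the Scala `override` modifier is mandatory.
  override def say: String = "HELLO"
}
```

Dropping `override` in `LoudGreeter` is a compile error, while dropping (or keeping) `@Override` in `PoliteGreeter` changes nothing, which is exactly the confusion the proposed Scalastyle rule targets.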
[GitHub] spark issue #14488: [SPARK-16826][SQL] Switch to java.net.URI for parse_url(...
Github user sylvinus commented on the issue: https://github.com/apache/spark/pull/14488

@rxin is that better?
[GitHub] spark issue #14279: [SPARK-16216][SQL] Write Timestamp and Date in ISO 8601 ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14279

Yes, as shown in https://github.com/apache/spark/pull/14279#issuecomment-236469454 (but we should still manually give the schema, as inferring `DateType` and `TimestampType` is not supported in JSON, and `DateType` is not inferred in CSV).
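[Editor's note] "Manually give the schema" here means passing an explicit `StructType` to the reader instead of relying on inference. A rough sketch, assuming Spark 2.x on the classpath; the field names and file path are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder()
  .appName("explicit-schema-example")
  .master("local[*]")
  .getOrCreate()

// Declare date/timestamp columns explicitly, since JSON inference
// would otherwise read them back as plain strings.
val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("born", DateType),
  StructField("updated", TimestampType)
))

val df = spark.read.schema(schema).json("people.json")
df.printSchema()
```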
[GitHub] spark issue #13891: [SPARK-6685][MLLIB]Use DSYRK to compute AtA in ALS
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/13891

cc @mengxr @yanboliang Was this patch okay?
[GitHub] spark pull request #14479: [SPARK-16873] [Core] Fix SpillReader NPE when spi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14479
[GitHub] spark issue #14479: [SPARK-16873] [Core] Fix SpillReader NPE when spillFile ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14479

LGTM - merging in master/2.0/1.6.