[GitHub] spark issue #16210: [Core][SPARK-18778]Fix the scala classpath under some en...

2016-12-07 Thread djvulee
Github user djvulee commented on the issue:

https://github.com/apache/spark/pull/16210
  
@rxin Our JDK is jdk1.8.0_91, we do not have Scala installed, and the OS is Debian 4.6.4.





[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16209
  
**[Test build #69855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69855/consoleFull)** for PR 16209 at commit [`faa8172`](https://github.com/apache/spark/commit/faa8172751082cd532dd7f8292a318a81a3a53e9).





[GitHub] spark issue #16210: [Core][SPARK-18778]Fix the scala classpath under some en...

2016-12-07 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16210
  
Paging @jodersky
I'm not sure about this... isn't `-usejavacp` a legacy option? I don't know of any other reports of this, so it is possibly specific to your env.





[GitHub] spark issue #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers example cod...

2016-12-07 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16148
  
LGTM. Looks like we are locked down for 2.1. This is good to have with all the new examples, but it seems like a lot of code (example) changes?






[GitHub] spark issue #16210: [Core][SPARK-18778]Fix the scala classpath under some en...

2016-12-07 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16210
  
What are the environments?






[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...

2016-12-07 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/16158
  
@MLnick Does this match your thoughts?





[GitHub] spark pull request #16014: [SPARK-18590][SPARKR] build R source package when...

2016-12-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16014#discussion_r91459472
  
--- Diff: dev/create-release/release-build.sh ---
@@ -221,14 +235,13 @@ if [[ "$1" == "package" ]]; then

   # We increment the Zinc port each time to avoid OOM's and other craziness if multiple builds
   # share the same Zinc server.
-  # Make R source package only once. (--r)
   FLAGS="-Psparkr -Phive -Phive-thriftserver -Pyarn -Pmesos"
   make_binary_release "hadoop2.3" "-Phadoop-2.3 $FLAGS" "3033" &
   make_binary_release "hadoop2.4" "-Phadoop-2.4 $FLAGS" "3034" &
   make_binary_release "hadoop2.6" "-Phadoop-2.6 $FLAGS" "3035" &
   make_binary_release "hadoop2.7" "-Phadoop-2.7 $FLAGS" "3036" "withpip" &
   make_binary_release "hadoop2.4-without-hive" "-Psparkr -Phadoop-2.4 -Pyarn -Pmesos" "3037" &
-  make_binary_release "without-hadoop" "--r -Psparkr -Phadoop-provided -Pyarn -Pmesos" "3038" &
+  make_binary_release "without-hadoop" "-Psparkr -Phadoop-provided -Pyarn -Pmesos" "3038" "withr" &
--- End diff --

@shivaram 





[GitHub] spark issue #16150: [SPARK-18349][SparkR]:Update R API documentation on ml m...

2016-12-07 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16150
  
LGTM.
@rxin I know rc2 has been cut, but can this still go into branch-2.1? There are only 2 lines of code change, and the API doc improvements could really help usability in 2.1.
+ @shivaram





[GitHub] spark issue #16210: [Core][SPARK-18778]Fix the scala classpath under some en...

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16210
  
Can one of the admins verify this patch?





[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16204
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69843/





[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16204
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16204
  
**[Test build #69843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69843/consoleFull)** for PR 16204 at commit [`3199f8f`](https://github.com/apache/spark/commit/3199f8f9265e5d324c50998523a4c85a3590a39c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16210: [Core][SPARK-18778]Fix the scala classpath under ...

2016-12-07 Thread djvulee
GitHub user djvulee opened a pull request:

https://github.com/apache/spark/pull/16210

[Core][SPARK-18778]Fix the scala classpath under some environment

## What changes were proposed in this pull request?
Under some environments, the `-Dscala.usejavacp=true` option does not seem to work; passing `-usejavacp` directly to the REPL fixes this (see the sketch below).
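For background, a minimal, hedged sketch of the distinction, built on the Scala compiler's public settings API (an illustration, not this PR's actual diff): both mechanisms flip the same interpreter switch, but the flag does so explicitly rather than relying on the JVM property being visible at REPL startup.

```scala
// Illustration only: -Dscala.usejavacp=true is a JVM system property the REPL
// consults at startup, while -usejavacp is the interpreter's own flag that
// sets the same `usejavacp` setting directly.
import scala.tools.nsc.GenericRunnerSettings

val settings = new GenericRunnerSettings(msg => Console.err.println(msg))
// Passing the flag as a REPL argument flips the switch unconditionally:
settings.processArguments(List("-usejavacp"), processAll = true)
assert(settings.usejavacp.value)
```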

## How was this patch tested?
We tested this in our cluster environment.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/djvulee/spark sparkShell

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16210.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16210


commit ab81a7af165c7287c0356758097dfa5ded6adea3
Author: DjvuLee 
Date:   2016-12-08T07:15:59Z

[Core]Fix the scala classpath under some environments

Under some environments, the -Dscala.usejavacp=true option seems not to work;
passing -usejavacp directly to the REPL fixes this.







[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2016-12-07 Thread eyalfa
Github user eyalfa commented on the issue:

https://github.com/apache/spark/pull/16043
  
@HyukjinKwon, thanks for the quick response :-)
I'll tackle these later today (gotta work sometimes).





[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r91453918
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Cast, Coalesce, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+* push down operations into [[CreateNamedStructLike]].
+*/
+object SimplifyCreateStructOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field extraction
+      case GetStructField(createNamedStructLike : CreateNamedStructLike, ordinal, _) =>
+        createNamedStructLike.valExprs(ordinal)
+    }
+  }
+}
+
+/**
+* push down operations into [[CreateArray]].
+*/
+object SimplifyCreateArrayOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field selection (array of structs)
+      case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) =>
+        def getStructField(elem : Expression) = {
+          GetStructField(elem, ordinal, Some(field.name))
+        }
+        CreateArray(elems.map(getStructField))
--- End diff --

Could we do something like the below?

```scala
CreateArray(elems.map(elem => GetStructField(elem, ordinal, Some(field.name))))
```

It seems `getStructField(...)` is only used in this scope, and I think it is good to remove it.





[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r91454208
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Cast, Coalesce, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+* push down operations into [[CreateNamedStructLike]].
+*/
+object SimplifyCreateStructOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field extraction
+      case GetStructField(createNamedStructLike : CreateNamedStructLike, ordinal, _) =>
+        createNamedStructLike.valExprs(ordinal)
+    }
+  }
+}
+
+/**
+* push down operations into [[CreateArray]].
+*/
+object SimplifyCreateArrayOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field selection (array of structs)
+      case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) =>
+        def getStructField(elem : Expression) = {
+          GetStructField(elem, ordinal, Some(field.name))
+        }
+        CreateArray(elems.map(getStructField))
+      // push down item selection.
+      case ga @ GetArrayItem(CreateArray(elems), IntegerLiteral(idx)) =>
+        if (idx >= 0 && idx < elems.size) {
+          elems(idx)
+        } else {
+          Cast(Literal(null), ga.dataType)
+        }
+    }
+  }
+}
+
+/**
+* push down operations into [[CreateMap]].
+*/
+object SimplifyCreateMapOps extends Rule[LogicalPlan]{
+  object ComparisonResult extends Enumeration {
+    val PositiveMatch = Value
+    val NegativeMatch = Value
+    val UnDetermined = Value
+  }
+
+  def compareKeys(k1 : Expression, k2 : Expression) : ComparisonResult.Value = {
+    (k1, k2) match {
+      case (x, y) if x.semanticEquals(y) => ComparisonResult.PositiveMatch
+      // make sure this is null safe, especially when datatypes differ
+      // is this even possible?
+      case (_ : Literal, _ : Literal) => ComparisonResult.NegativeMatch
+      case _ => ComparisonResult.UnDetermined
+    }
+  }
+
+  case class ClassifiedEntries(undetermined : Seq[Expression],
+                               nullable : Boolean,
+                               firstPositive : Option[Expression]) {
--- End diff --

Oh @eyalfa, I believe we should use the following indentation when the parameters do not fit within the 100-character line length:

```scala
case class ClassifiedEntries(
    undetermined : Seq[Expression],
    nullable : Boolean,
    firstPositive : Option[Expression]) {
  ...
```





[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r91455499
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Cast, Coalesce, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+* push down operations into [[CreateNamedStructLike]].
+*/
+object SimplifyCreateStructOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field extraction
+      case GetStructField(createNamedStructLike : CreateNamedStructLike, ordinal, _) =>
+        createNamedStructLike.valExprs(ordinal)
+    }
+  }
+}
+
+/**
+* push down operations into [[CreateArray]].
+*/
+object SimplifyCreateArrayOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field selection (array of structs)
+      case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) =>
+        def getStructField(elem : Expression) = {
+          GetStructField(elem, ordinal, Some(field.name))
+        }
+        CreateArray(elems.map(getStructField))
+      // push down item selection.
+      case ga @ GetArrayItem(CreateArray(elems), IntegerLiteral(idx)) =>
+        if (idx >= 0 && idx < elems.size) {
+          elems(idx)
+        } else {
+          Cast(Literal(null), ga.dataType)
+        }
+    }
+  }
+}
+
+/**
+* push down operations into [[CreateMap]].
+*/
+object SimplifyCreateMapOps extends Rule[LogicalPlan]{
+  object ComparisonResult extends Enumeration {
+    val PositiveMatch = Value
+    val NegativeMatch = Value
+    val UnDetermined = Value
+  }
+
+  def compareKeys(k1 : Expression, k2 : Expression) : ComparisonResult.Value = {
+    (k1, k2) match {
+      case (x, y) if x.semanticEquals(y) => ComparisonResult.PositiveMatch
+      // make sure this is null safe, especially when datatypes differ
+      // is this even possible?
+      case (_ : Literal, _ : Literal) => ComparisonResult.NegativeMatch
+      case _ => ComparisonResult.UnDetermined
+    }
+  }
+
+  case class ClassifiedEntries(undetermined : Seq[Expression],
+                               nullable : Boolean,
+                               firstPositive : Option[Expression]) {
+    def normalize( k : Expression ) : ClassifiedEntries = this match {
+      /**
+      * when we have undetermined matches that might produce a null value,
+      * we can't separate a positive match and use [[Coalesce]] to choose the final result.
+      * so we 'hide' the positive match as an undetermined match.
+      */
+      case ClassifiedEntries( u, true, Some(p)) if u.nonEmpty =>
+        ClassifiedEntries(u ++ Seq(k, p), true, None)
+      case _ => this
+    }
+  }
+
+  def classifyEntries(mapEntries : Seq[(Expression, Expression)],
+                      requestedKey : Expression) : ClassifiedEntries = {
+    val res1 = mapEntries.foldLeft(ClassifiedEntries(Seq.empty, nullable = false, None)) {
+      case (prev @ ClassifiedEntries(_, _, Some(_)), _) => prev
+      case (ClassifiedEntries(prev, nullable, None), (k, v)) =>
+        compareKeys(k, requestedKey) match {
+          case ComparisonResult.UnDetermined =>
+            val vIsNullable = v.nullable
+            val nextNullbale = nullable || vIsNullable
+

[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r91455356
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Cast, Coalesce, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+* push down operations into [[CreateNamedStructLike]].
+*/
+object SimplifyCreateStructOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field extraction
+      case GetStructField(createNamedStructLike : CreateNamedStructLike, ordinal, _) =>
+        createNamedStructLike.valExprs(ordinal)
+    }
+  }
+}
+
+/**
+* push down operations into [[CreateArray]].
+*/
+object SimplifyCreateArrayOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field selection (array of structs)
+      case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) =>
+        def getStructField(elem : Expression) = {
+          GetStructField(elem, ordinal, Some(field.name))
+        }
+        CreateArray(elems.map(getStructField))
+      // push down item selection.
+      case ga @ GetArrayItem(CreateArray(elems), IntegerLiteral(idx)) =>
+        if (idx >= 0 && idx < elems.size) {
+          elems(idx)
+        } else {
+          Cast(Literal(null), ga.dataType)
+        }
+    }
+  }
+}
+
+/**
+* push down operations into [[CreateMap]].
+*/
+object SimplifyCreateMapOps extends Rule[LogicalPlan]{
+  object ComparisonResult extends Enumeration {
+    val PositiveMatch = Value
+    val NegativeMatch = Value
+    val UnDetermined = Value
+  }
+
+  def compareKeys(k1 : Expression, k2 : Expression) : ComparisonResult.Value = {
+    (k1, k2) match {
+      case (x, y) if x.semanticEquals(y) => ComparisonResult.PositiveMatch
+      // make sure this is null safe, especially when datatypes differ
+      // is this even possible?
+      case (_ : Literal, _ : Literal) => ComparisonResult.NegativeMatch
+      case _ => ComparisonResult.UnDetermined
+    }
+  }
+
+  case class ClassifiedEntries(undetermined : Seq[Expression],
+                               nullable : Boolean,
+                               firstPositive : Option[Expression]) {
+    def normalize( k : Expression ) : ClassifiedEntries = this match {
+      /**
+      * when we have undetermined matches that might produce a null value,
+      * we can't separate a positive match and use [[Coalesce]] to choose the final result.
+      * so we 'hide' the positive match as an undetermined match.
+      */
+      case ClassifiedEntries( u, true, Some(p)) if u.nonEmpty =>
+        ClassifiedEntries(u ++ Seq(k, p), true, None)
+      case _ => this
+    }
+  }
+
+  def classifyEntries(mapEntries : Seq[(Expression, Expression)],
+                      requestedKey : Expression) : ClassifiedEntries = {
+    val res1 = mapEntries.foldLeft(ClassifiedEntries(Seq.empty, nullable = false, None)) {
+      case (prev @ ClassifiedEntries(_, _, Some(_)), _) => prev
+      case (ClassifiedEntries(prev, nullable, None), (k, v)) =>
+        compareKeys(k, requestedKey) match {
+          case ComparisonResult.UnDetermined =>
+            val vIsNullable = v.nullable
+            val nextNullbale = nullable || vIsNullable
+

[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r91457070
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
@@ -0,0 +1,482 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.{Coalesce, CreateArray, CreateMap, CreateNamedStruct, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, Literal}
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.plans.logical.Range
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.types._
+
+/**
+* Created by eyalf on 11/4/2016.
+* SPARK-18601 discusses simplification of direct access to complex type creators,
+* i.e. {{{create_named_struct(square, `x` * `x`).square}}} can be simplified to {{{`x` * `x`}}}.
+* same applies to create_array and create_map
+*/
+class ComplexTypesSuite extends PlanTest with Matchers{
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches =
+      Batch("collapse projections", FixedPoint(10),
+        CollapseProject) ::
+      Batch("Constant Folding", FixedPoint(10),
+        NullPropagation,
+        ConstantFolding,
+        BooleanSimplification,
+        SimplifyConditionals,
+        SimplifyCreateStructOps,
+        SimplifyCreateArrayOps,
+        SimplifyCreateMapOps) :: Nil
+  }
+
+  val idAtt = ('id).long.notNull
+
+  lazy val baseOptimizedPlan = Range(1L, 1000L, 1, Some(2), idAtt :: Nil)
+
+  val idRef = baseOptimizedPlan.output.head
+
+
+//  val idRefColumn = Column("id")
+//  val struct1RefColumn = Column("struct1")
+
+  implicit class ComplexTypeDslSupport(e : Expression) {
+    def getStructField(f : String): GetStructField = {
+      e should be ('resolved)
+      e.dataType should be (a[StructType])
--- End diff --

I guess infix notation is discouraged according to http://spark.apache.org/contributing.html. I see `assert` being used much more commonly. This might be acceptable, but honestly I have not seen `should be` often, although I understand there are some usages of it across the codebase; the assert-style equivalents are sketched below.
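For comparison, a small sketch of the plain-`assert` equivalents of the matcher calls quoted above (the `Literal` value here is a stand-in for the expression `e` in the implicit class, for illustration only):

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.{LongType, StructField, StructType}

val e = Literal.create(null, StructType(StructField("att", LongType) :: Nil))
assert(e.resolved)                           // e should be ('resolved)
assert(e.dataType.isInstanceOf[StructType])  // e.dataType should be (a[StructType])
assert(e.dataType.asInstanceOf[StructType].fieldNames.indexOf("att") != -1)
```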





[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r91453489
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Cast, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+* push down operations into [[CreateNamedStructLike]].
+*/
+object SimplifyCreateStructOps extends Rule[LogicalPlan]{
--- End diff --

> `]{ -> ] {`

I believe most places have this spacing, and I think this is good to fix. There seem to be several instances of the same issue; a small sketch follows.
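A tiny self-contained sketch of the spacing being suggested (the trait here is a hypothetical stand-in for catalyst's `Rule[LogicalPlan]`, for illustration only):

```scala
// Stand-in trait, not the real catalyst Rule:
trait Rule[T] { def apply(plan: T): T }

// "] {" rather than "]{" before the body:
object SimplifyExample extends Rule[String] {
  override def apply(plan: String): String = plan
}
```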





[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r91453426
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Cast, Coalesce, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal}
--- End diff --

I believe it is nicer as a multiple-line import or a wildcard one, since it imports more than 6 entities. Both options are sketched below.
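The two alternatives being suggested, sketched on this exact import list:

```scala
// Either wrap the selector list across lines...
import org.apache.spark.sql.catalyst.expressions.{
  Cast, Coalesce, CreateArray, CreateMap, CreateNamedStructLike, Expression,
  GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField,
  IntegerLiteral, Literal}

// ...or use a wildcard, since more than six names are imported:
import org.apache.spark.sql.catalyst.expressions._
```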





[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r91453593
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Cast, Coalesce, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+* push down operations into [[CreateNamedStructLike]].
+*/
+object SimplifyCreateStructOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
--- End diff --

Here too, `plan.transformExpressionsUp {`. I think it is good to follow the existing code style. There seem to be several instances of the same issue.





[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r91457208
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
@@ -0,0 +1,482 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.{Coalesce, CreateArray, CreateMap, CreateNamedStruct, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, Literal}
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.plans.logical.Range
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.types._
+
+/**
+* Created by eyalf on 11/4/2016.
+* SPARK-18601 discusses simplification of direct access to complex type creators,
+* i.e. {{{create_named_struct(square, `x` * `x`).square}}} can be simplified to {{{`x` * `x`}}}.
+* same applies to create_array and create_map
+*/
+class ComplexTypesSuite extends PlanTest with Matchers{
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches =
+      Batch("collapse projections", FixedPoint(10),
+        CollapseProject) ::
+      Batch("Constant Folding", FixedPoint(10),
+        NullPropagation,
+        ConstantFolding,
+        BooleanSimplification,
+        SimplifyConditionals,
+        SimplifyCreateStructOps,
+        SimplifyCreateArrayOps,
+        SimplifyCreateMapOps) :: Nil
+  }
+
+  val idAtt = ('id).long.notNull
+
+  lazy val baseOptimizedPlan = Range(1L, 1000L, 1, Some(2), idAtt :: Nil)
+
+  val idRef = baseOptimizedPlan.output.head
+
+
+//  val idRefColumn = Column("id")
+//  val struct1RefColumn = Column("struct1")
+
+  implicit class ComplexTypeDslSupport(e : Expression) {
+    def getStructField(f : String): GetStructField = {
+      e should be ('resolved)
+      e.dataType should be (a[StructType])
+      val structType = e.dataType.asInstanceOf[StructType]
+      val ord = structType.fieldNames.indexOf(f)
+      ord shouldNot be (-1)
+      GetStructField(e, ord, Some(f))
+    }
+    def getArrayStructField(f : String) : Expression = {
--- End diff --

I believe we need a single blank line between consecutive methods, according to https://github.com/databricks/scala-style-guide#blank-lines-vertical-whitespace, and the same applies to the other instances here; see the sketch below.
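A minimal illustration of that rule (hypothetical method bodies, for spacing only):

```scala
object SpacingExample {
  // One blank line separates consecutive method definitions.
  def getStructField(name: String): String = s"field:$name"

  def getArrayStructField(name: String): String = s"array-field:$name"
}
```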





[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r9148
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
@@ -0,0 +1,482 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.{Coalesce, CreateArray, CreateMap, CreateNamedStruct, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, Literal}
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.plans.logical.Range
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.types._
+
+/**
+* Created by eyalf on 11/4/2016.
+* SPARK-18601 discusses simplification of direct access to complex type creators,
+* i.e. {{{create_named_struct(square, `x` * `x`).square}}} can be simplified to {{{`x` * `x`}}}.
+* same applies to create_array and create_map
+*/
+class ComplexTypesSuite extends PlanTest with Matchers{
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches =
+      Batch("collapse projections", FixedPoint(10),
+        CollapseProject) ::
+      Batch("Constant Folding", FixedPoint(10),
+        NullPropagation,
+        ConstantFolding,
+        BooleanSimplification,
+        SimplifyConditionals,
+        SimplifyCreateStructOps,
+        SimplifyCreateArrayOps,
+        SimplifyCreateMapOps) :: Nil
+  }
+
+  val idAtt = ('id).long.notNull
+
+  lazy val baseOptimizedPlan = Range(1L, 1000L, 1, Some(2), idAtt :: Nil)
+
+  val idRef = baseOptimizedPlan.output.head
+
+
+//  val idRefColumn = Column("id")
+//  val struct1RefColumn = Column("struct1")
--- End diff --

It seems removing those was missed.





[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r91454252
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Cast, Coalesce, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+* push down operations into [[CreateNamedStructLike]].
+*/
+object SimplifyCreateStructOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field extraction
+      case GetStructField(createNamedStructLike : CreateNamedStructLike, ordinal, _) =>
+        createNamedStructLike.valExprs(ordinal)
+    }
+  }
+}
+
+/**
+* push down operations into [[CreateArray]].
+*/
+object SimplifyCreateArrayOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field selection (array of structs)
+      case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) =>
+        def getStructField(elem : Expression) = {
+          GetStructField(elem, ordinal, Some(field.name))
+        }
+        CreateArray(elems.map(getStructField))
+      // push down item selection.
+      case ga @ GetArrayItem(CreateArray(elems), IntegerLiteral(idx)) =>
+        if (idx >= 0 && idx < elems.size) {
+          elems(idx)
+        } else {
+          Cast(Literal(null), ga.dataType)
+        }
+    }
+  }
+}
+
+/**
+* push down operations into [[CreateMap]].
+*/
+object SimplifyCreateMapOps extends Rule[LogicalPlan]{
+  object ComparisonResult extends Enumeration {
+    val PositiveMatch = Value
+    val NegativeMatch = Value
+    val UnDetermined = Value
+  }
+
+  def compareKeys(k1 : Expression, k2 : Expression) : ComparisonResult.Value = {
+    (k1, k2) match {
+      case (x, y) if x.semanticEquals(y) => ComparisonResult.PositiveMatch
+      // make sure this is null safe, especially when datatypes differ
+      // is this even possible?
+      case (_ : Literal, _ : Literal) => ComparisonResult.NegativeMatch
+      case _ => ComparisonResult.UnDetermined
+    }
+  }
+
+  case class ClassifiedEntries(undetermined : Seq[Expression],
+                               nullable : Boolean,
+                               firstPositive : Option[Expression]) {
+    def normalize( k : Expression ) : ClassifiedEntries = this match {
--- End diff --

We could remove the extra spaces inside the parentheses, as in `def normalize(k : Expression)`, and likewise for the other instances.





[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r91457315
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
@@ -0,0 +1,482 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.{Coalesce, CreateArray, CreateMap, CreateNamedStruct, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, Literal}
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.plans.logical.Range
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.types._
+
+/**
+* Created by eyalf on 11/4/2016.
+* SPARK-18601 discusses simplification of direct access to complex type creators,
+* i.e. {{{create_named_struct(square, `x` * `x`).square}}} can be simplified to {{{`x` * `x`}}}.
+* same applies to create_array and create_map
+*/
+class ComplexTypesSuite extends PlanTest with Matchers{
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches =
+      Batch("collapse projections", FixedPoint(10),
+        CollapseProject) ::
+      Batch("Constant Folding", FixedPoint(10),
+        NullPropagation,
+        ConstantFolding,
+        BooleanSimplification,
+        SimplifyConditionals,
+        SimplifyCreateStructOps,
+        SimplifyCreateArrayOps,
+        SimplifyCreateMapOps) :: Nil
+  }
+
+  val idAtt = ('id).long.notNull
+
+  lazy val baseOptimizedPlan = Range(1L, 1000L, 1, Some(2), idAtt :: Nil)
+
+  val idRef = baseOptimizedPlan.output.head
+
+
+//  val idRefColumn = Column("id")
+//  val struct1RefColumn = Column("struct1")
+
+  implicit class ComplexTypeDslSupport(e : Expression) {
+    def getStructField(f : String): GetStructField = {
+      e should be ('resolved)
+      e.dataType should be (a[StructType])
+      val structType = e.dataType.asInstanceOf[StructType]
+      val ord = structType.fieldNames.indexOf(f)
+      ord shouldNot be (-1)
+      GetStructField(e, ord, Some(f))
+    }
+    def getArrayStructField(f : String) : Expression = {
+      e should be ('resolved)
+      e.dataType should be (a[ArrayType])
+      val arrType = e.dataType.asInstanceOf[ArrayType]
+      arrType.elementType should be (a[StructType])
+      val structType = arrType.elementType.asInstanceOf[StructType]
+      val ord = structType.fieldNames.indexOf(f)
+      ord shouldNot be (-1)
+      GetArrayStructFields(e, structType(ord), ord, 1, arrType.containsNull)
+    }
+    def getArrayItem(i : Int) : GetArrayItem = {
+      e should be ('resolved)
+      e.dataType should be (a[ArrayType])
+      GetArrayItem(e, Literal(i))
+    }
+    def getMapValue(k : Expression) : Expression = {
+      e should be ('resolved)
+      e.dataType should be (a[MapType])
+      val mapType = e.dataType.asInstanceOf[MapType]
+      k.dataType shouldEqual mapType.keyType
+      GetMapValue(e, k)
+    }
+  }
+
+  test("explicit") {
+    val rel = baseOptimizedPlan.select(
+      CreateNamedStruct("att" :: idRef :: Nil).getStructField("att") as "outerAtt"
+    )
+
+    rel.schema shouldEqual
+      StructType(StructField("outerAtt", LongType, nullable = false) :: Nil)
+
+    val optimized = Optimize execute rel
+
+    val expected = baseOptimizedPlan.select(idRef as "outerAtt")
+
+    comparePlans(optimized, expected)
+  }
+
+  ignore("explicit - deduced att name") {
+    val rel =

[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16204
  
**[Test build #69854 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69854/consoleFull)** for PR 16204 at commit [`d2172d1`](https://github.com/apache/spark/commit/d2172d11c968cf30b989de3257faaaf6b17366ce).





[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16204
  
**[Test build #69853 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69853/consoleFull)** for PR 16204 at commit [`f77730f`](https://github.com/apache/spark/commit/f77730f6b5deba40e28d0b147ae11cb3ed4af37a).





[GitHub] spark issue #14282: [SPARK-16628][SQL] Don't convert Orc Metastore tables to...

2016-12-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14282
  
This can be closed now because we don't infer schema from Orc files when 
converting Hive Orc tables to data source tables anymore.





[GitHub] spark pull request #14282: [SPARK-16628][SQL] Don't convert Orc Metastore ta...

2016-12-07 Thread viirya
Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/14282





[GitHub] spark pull request #16135: [SPARK-18700][SQL] Add ReadWriteLock for each tab...

2016-12-07 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/16135#discussion_r91455760
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala ---
@@ -95,7 +95,7 @@ private[sql] class HiveSessionCatalog(
   }
 
   def invalidateCache(): Unit = {
-    metastoreCatalog.cachedDataSourceTables.invalidateAll()
+    metastoreCatalog.invalidateAllCache()
--- End diff --

Why this change?





[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16209
  
**[Test build #69851 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69851/consoleFull)**
 for PR 16209 at commit 
[`6eec6ca`](https://github.com/apache/spark/commit/6eec6ca63c5641d1c9958bdd300ac079d5cf).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16209
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69851/
Test FAILed.





[GitHub] spark pull request #16135: [SPARK-18700][SQL] Add ReadWriteLock for each tab...

2016-12-07 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/16135#discussion_r91455755
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala ---
@@ -105,7 +105,8 @@ private[sql] class HiveSessionCatalog(
   // For testing only
   private[hive] def getCachedDataSourceTable(table: TableIdentifier): LogicalPlan = {
     val key = metastoreCatalog.getQualifiedTableName(table)
-    metastoreCatalog.cachedDataSourceTables.getIfPresent(key)
+    metastoreCatalog.readLock(key,
+      metastoreCatalog.cachedDataSourceTables.getIfPresent(key))
--- End diff --

Why a read lock here?





[GitHub] spark issue #14365: [SPARK-16628][SQL] Translate file-based relation schema ...

2016-12-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14365
  
@cloud-fan @yhuai @dongjoon-hyun I've updated this as follows:

* Assume the metastore schema matches the physical Orc schema column by 
column, disregarding column names.

* Map the required schema to columns in the physical Orc schema by position.

* If the lengths or data types of the metastore schema and the physical 
schema do not match, throw an exception suggesting that users disable 
`spark.sql.hive.convertMetastoreOrc`.

Please let me know what you think about this approach. Thanks.
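
A minimal sketch of that positional mapping, with illustrative names (not the PR's actual code):

```Scala
import org.apache.spark.sql.types.StructType

// Map each required column to the physical Orc column at the same ordinal
// position in the metastore schema, keeping the metastore column name.
def mapToPhysicalSchema(
    metastoreSchema: StructType,
    physicalSchema: StructType,
    requiredSchema: StructType): StructType = {
  require(
    metastoreSchema.length == physicalSchema.length &&
      metastoreSchema.fields.map(_.dataType).sameElements(
        physicalSchema.fields.map(_.dataType)),
    "Metastore schema does not match the physical Orc schema; " +
      "consider disabling spark.sql.hive.convertMetastoreOrc")
  StructType(requiredSchema.map { field =>
    val ord = metastoreSchema.fieldNames.indexOf(field.name)
    physicalSchema(ord).copy(name = field.name)
  })
}
```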





[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16209
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16209
  
**[Test build #69851 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69851/consoleFull)**
 for PR 16209 at commit 
[`6eec6ca`](https://github.com/apache/spark/commit/6eec6ca63c5641d1c9958bdd300ac079d5cf).





[GitHub] spark issue #14365: [SPARK-16628][SQL] Translate file-based relation schema ...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14365
  
**[Test build #69852 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69852/consoleFull)**
 for PR 14365 at commit 
[`2bb8368`](https://github.com/apache/spark/commit/2bb836868dec18c5f214a2bef45664a22124885e).





[GitHub] spark pull request #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC d...

2016-12-07 Thread sureshthalamati
GitHub user sureshthalamati opened a pull request:

https://github.com/apache/spark/pull/16209

[WIP][SPARK-10849][SQL] Adds option to the JDBC data source  for user to 
specify database column type for the create table

## What changes were proposed in this pull request?
Currently the JDBC data source creates tables in the target database using the 
default type mapping and the JDBC dialect mechanism. If users want to specify 
a different database data type for only some of the columns, there is no 
option available. In scenarios where the default mapping does not work, users are 
forced to create tables on the target database before writing. This workaround 
is probably not acceptable from a usability point of view. This PR provides 
a user-defined type mapping for specific columns.

The solution is to allow users to specify the database column data type for 
the created table as a JDBC data source option (`createTableColumnTypes`) on 
write. Data type information can be specified as key (column name) / value 
(data type) pairs in JSON (e.g. `{"name":"varchar(128)", "comments":"clob(20k)"}`). 
Users can use org.apache.spark.sql.types.MetadataBuilder to build the metadata 
and generate the JSON string required for this option.

Example:
```Scala
val mdb = new MetadataBuilder()
mdb.putString("name", "VARCHAR(128)”)
mdb.putString("comments”, “CLOB(20K)”)
val createTableColTypes = mdb.build().json
df.write.option("createTableColumnTypes", createTableColTypes).jdbc(url, 
"TEST.DBCOLTYPETEST", properties)
```
An alternative approach is to add a new column metadata property to the JDBC 
data source for users to specify the database column type using the metadata.

TODO: Case-insensitive column name lookup based on the 
`spark.sql.caseSensitive` property value.

## How was this patch tested?
Added new test case to the JDBCWriteSuite

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sureshthalamati/spark 
jdbc_custom_dbtype_option_json-spark-10849

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16209


commit 6eec6ca63c5641d1c9958bdd300ac079d5cf
Author: sureshthalamati 
Date:   2016-12-02T23:22:17Z

Adding new option to the jdbc to allow users to specify create table column 
types when table is created on write







[GitHub] spark issue #16193: [SPARK-18766] [SQL] Push Down Filter Through BatchEvalPy...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16193
  
**[Test build #69850 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69850/consoleFull)**
 for PR 16193 at commit 
[`2c3b917`](https://github.com/apache/spark/commit/2c3b91738fae8286525cabb24c386503a570448b).





[GitHub] spark issue #16135: [SPARK-18700][SQL] Add ReadWriteLock for each table's re...

2016-12-07 Thread ericl
Github user ericl commented on the issue:

https://github.com/apache/spark/pull/16135
  
I guess the large number of lock sites is confusing me. We only want to 
prevent concurrent instantiation of a single table, so shouldn't you only need 
1 lock for that site?

Also, we should have a unit test that tries to concurrently read from a 
table from many threads, and verifies via the catalog metrics that it is only 
loaded once (see `TablePerfStatsSuite` for how to access the metrics).
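
A rough sketch of such a test, assuming a hypothetical load-counter metric (see `TablePerfStatsSuite` for the real accessors):

```Scala
test("table cache is loaded only once under concurrent reads") {
  HiveCatalogMetrics.reset()  // hypothetical reset of the catalog metrics
  val threads = (1 to 10).map { _ =>
    new Thread(new Runnable {
      override def run(): Unit = spark.table("test").count()
    })
  }
  threads.foreach(_.start())
  threads.foreach(_.join())
  // All ten readers should have shared a single cache load.
  assert(dataSourceTableLoadCount() === 1)  // hypothetical metric accessor
}
```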





[GitHub] spark issue #16208: [WIP][SPARK-10849][SQL] Adds a new column metadata prope...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16208
  
**[Test build #69849 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69849/consoleFull)**
 for PR 16208 at commit 
[`3834903`](https://github.com/apache/spark/commit/38349033a306a733e83975ca09b6cf8a8d69d397).





[GitHub] spark issue #16193: [SPARK-18766] [SQL] Push Down Filter Through BatchEvalPy...

2016-12-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/16193
  
Retest this please





[GitHub] spark pull request #16208: [WIP][SPARK-10849][SQL] Adds a new column metadat...

2016-12-07 Thread sureshthalamati
GitHub user sureshthalamati opened a pull request:

https://github.com/apache/spark/pull/16208

[WIP][SPARK-10849][SQL] Adds a new column metadata property to the jdbc 
data source for users to specify database column type using the metadata

## What changes were proposed in this pull request?
Currently the JDBC data source creates tables in the target database using the 
default type mapping and the JDBC dialect mechanism. If users want to specify 
a different database data type for only some of the columns, there is no 
option available. In scenarios where the default mapping does not work, users are 
forced to create tables on the target database before writing. This workaround 
is probably not acceptable from a usability point of view. This PR provides 
a user-defined type mapping for specific columns.

The solution is based on the existing Redshift connector 
(https://github.com/databricks/spark-redshift#setting-a-custom-column-type). We 
add a new column metadata property to the JDBC data source for users to specify 
the database column type using the metadata.

Example:
```Scala
val nvarcharMd = new MetadataBuilder()
  .putString("createTableColumnType", "NVARCHAR(123)")
  .build()
val newDf = df.withColumn("name", col("name").as("name", nvarcharMd))
newDf.write.mode(SaveMode.Overwrite).jdbc(url, "TEST.USERDBTYPETEST", properties)
```
One restriction with this approach is that metadata modification is unsupported 
in the Python, SQL, and R language APIs; users have to create a new data frame 
to specify the metadata with the _createTableColumnType_ property.

An alternative approach is to add a JDBC data source option for users to 
specify the database column type information as a JSON string.

TODO: Documentation for specifying the database column type

## How was this patch tested?
Added new test case to the JDBCWriteSuite

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sureshthalamati/spark 
jdbc_custom_dbtype-spark-10849

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16208.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16208


commit 38349033a306a733e83975ca09b6cf8a8d69d397
Author: sureshthalamati 
Date:   2016-12-02T23:22:17Z

[SPARK-10849][SQL] Add a new JDBC data source metadata property to allow users 
to specify the database column type when creating a table on write.







[GitHub] spark pull request #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial...

2016-12-07 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16149#discussion_r91453574
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 ---
@@ -215,6 +215,7 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val
* Sets the value of param [[weightCol]].
* If this is not set or empty, we treat all instance weights as 1.0.
* Default is not set, so all instances have weight one.
+   * In the Binomial model, weights correspond to number of trials.
--- End diff --

We should note that the weights should therefore be integers and that 
they'll be rounded if they are not. Also say "Binomial family" instead of 
"Binomial model."





[GitHub] spark pull request #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial...

2016-12-07 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16149#discussion_r91453386
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
 ---
@@ -715,7 +715,7 @@ class GeneralizedLinearRegressionSuite
 val datasetWithWeight = Seq(
   Instance(1.0, 1.0, Vectors.dense(0.0, 5.0).toSparse),
   Instance(0.5, 2.0, Vectors.dense(1.0, 2.0)),
-  Instance(1.0, 3.0, Vectors.dense(2.0, 1.0)),
+  Instance(1.0, 0.3, Vectors.dense(2.0, 1.0)),
--- End diff --

Hm, so I know this is a pain, but we have special handling implemented for 
the weight = 0 case, but we never test it. I think we should add a test.
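
For example, the test data could gain a zero-weight row along these lines (illustrative only):

```Scala
val datasetWithZeroWeight = Seq(
  Instance(1.0, 1.0, Vectors.dense(0.0, 5.0).toSparse),
  Instance(0.5, 2.0, Vectors.dense(1.0, 2.0)),
  // weight = 0 exercises the special handling that is currently untested
  Instance(1.0, 0.0, Vectors.dense(2.0, 1.0))
).toDF()
```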





[GitHub] spark pull request #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial...

2016-12-07 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16149#discussion_r91453428
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 ---
@@ -468,11 +469,7 @@ object GeneralizedLinearRegression extends DefaultParamsReadable[GeneralizedLine
 override def variance(mu: Double): Double = mu * (1.0 - mu)
 
 private def ylogy(y: Double, mu: Double): Double = {
-  if (y == 0) {
-    0.0
-  } else {
-    y * math.log(y / mu)
-  }
+  if (y == 0) 0.0 else y * math.log(y / mu)
--- End diff --

Well the entire thing can go on one line, but only change it if you make 
another commit since it's trivial.





[GitHub] spark pull request #16203: [SPARK-18774][Core][SQL]Ignore non-existing files...

2016-12-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16203





[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs

2016-12-07 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16207
  
Yeah, I'd say in general that `--allow-empty` should be there.






[GitHub] spark issue #16203: [SPARK-18774][Core][SQL]Ignore non-existing files when i...

2016-12-07 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16203
  
Actually this doesn't merge cleanly in branch-2.1. 





[GitHub] spark issue #16203: [SPARK-18774][Core][SQL]Ignore non-existing files when i...

2016-12-07 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16203
  
Merging in master/branch-2.1.






[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs

2016-12-07 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16207
  
Yes, I have to edit the merge script to `git commit --allow-empty ...`, but I 
don't know that we should always set it. It could prompt or something, but I was 
too lazy to implement that.

Anyway that sounds fine and I have my own list of PRs to close that I'll 
'flush' with a commit soon too.
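
For reference, such an empty "close" commit can be made with git's `--allow-empty` flag; using PR numbers from the batch above:

```
$ git commit --allow-empty -m "[BUILD] Closing some stale PRs

Closes #15689
Closes #16188"
```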





[GitHub] spark issue #16202: [SPARK-18662][hotfix] Add new resource-managers director...

2016-12-07 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16202
  
Was mesos never included?






[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12064
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69845/
Test PASSed.





[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12064
  
Merged build finished. Test PASSed.





[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12064
  
**[Test build #69845 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69845/consoleFull)**
 for PR 12064 at commit 
[`4458a5f`](https://github.com/apache/spark/commit/4458a5f0d16b14095360ec3e2afec6d1db912c7d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14640: [SPARK-17055] [MLLIB] add groupKFold to CrossVali...

2016-12-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14640





[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16187
  
**[Test build #69848 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69848/consoleFull)**
 for PR 16187 at commit 
[`767ff2f`](https://github.com/apache/spark/commit/767ff2f6c3d960a68417757f1f7110b5376ac01d).





[GitHub] spark pull request #15917: SPARK-18252: Using RoaringBitmap for bloom filter...

2016-12-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15917





[GitHub] spark pull request #15689: [SPARK-9487] Use the same num. worker threads in ...

2016-12-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15689





[GitHub] spark pull request #16188: Branch 1.6 decision tree

2016-12-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16188





[GitHub] spark pull request #16206: Branch 2.0

2016-12-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16206





[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16207
  
Sure, thanks!





[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16207
  
BTW, I believe we should add `--allow-empty` when it is merged.





[GitHub] spark pull request #16207: [BUILD] Closing some stale/inappropriate PRs

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at:

https://github.com/apache/spark/pull/16207





[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs

2016-12-07 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16207
  
For some reason I couldn't merge this one. I pushed a commit directly to 
master. Can you close this one now?






[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...

2016-12-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16187
  
retest this please





[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs

2016-12-07 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16207
  
Merging in master.






[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2016-12-07 Thread eyalfa
Github user eyalfa commented on the issue:

https://github.com/apache/spark/pull/16043
  
@hvanhovell, can you please comment on the latest changes?
@gatorsmile, @HyukjinKwon, I think I've sorted out most of the formatting 
issues you guys mentioned; please let me know if I missed anything or 
introduced new ones in the latest push.





[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16207
  
**[Test build #69847 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69847/consoleFull)**
 for PR 16207 at commit 
[`c51011c`](https://github.com/apache/spark/commit/c51011c6a3f1d060c0d767b1d9115c64dcfaa447).





[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16207
  
cc @srowen Could you take a look and see if they are reasonable please?





[GitHub] spark issue #16135: [SPARK-18700][SQL] Add ReadWriteLock for each table's re...

2016-12-07 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/16135
  
@ericl Thanks for your review.

> Is it sufficient to lock around the catalog.filterPartitions(Nil)?

Yes, this patch was ported from 1.6.2 and I missed the diff here. Fixed in the 
next patch.

> Why do we need reader locks?

Writing to or invalidating the table cache happens far less often than reading 
it. A reader waits only while the cache for the same table is being written.
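
For context, a per-table read/write lock with the `readLock(key, body)` shape quoted earlier in this thread could be sketched like this (illustrative, not the PR's exact code):

```Scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.locks.ReentrantReadWriteLock

// One lock per qualified table name, created lazily on first access.
private val tableLocks = new ConcurrentHashMap[String, ReentrantReadWriteLock]()

private def lockFor(key: String): ReentrantReadWriteLock =
  tableLocks.computeIfAbsent(key,
    new java.util.function.Function[String, ReentrantReadWriteLock] {
      override def apply(k: String): ReentrantReadWriteLock = new ReentrantReadWriteLock()
    })

// Many readers may hold the lock at once; a writer excludes everyone else,
// so readers wait only while the same table's cache is being (in)validated.
def readLock[A](key: String, body: => A): A = {
  val lock = lockFor(key).readLock()
  lock.lock()
  try body finally lock.unlock()
}

def writeLock[A](key: String, body: => A): A = {
  val lock = lockFor(key).writeLock()
  lock.lock()
  try body finally lock.unlock()
}
```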





[GitHub] spark pull request #16207: [BUILD] Closing some stale/inappropriate PRs

2016-12-07 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16207

[BUILD] Closing some stale/inappropriate PRs 

## What changes were proposed in this pull request?

This PR proposes to close some stale PRs, PRs suggested to be closed by 
committer(s), and obviously inappropriate PRs (e.g. branch to branch).

Closes #15689
Closes #14640
Closes #15917
Closes #16188
Closes #16206

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark closing-some-prs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16207.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16207


commit c51011c6a3f1d060c0d767b1d9115c64dcfaa447
Author: hyukjinkwon 
Date:   2016-12-08T06:16:16Z

Closing some PRs







[GitHub] spark issue #16205: [SPARK-18776][SS] Make Offset for FileStreamSource corre...

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16205
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16205: [SPARK-18776][SS] Make Offset for FileStreamSource corre...

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16205
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69841/
Test PASSed.





[GitHub] spark issue #16205: [SPARK-18776][SS] Make Offset for FileStreamSource corre...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16205
  
**[Test build #69841 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69841/consoleFull)**
 for PR 16205 at commit 
[`5dda0f3`](https://github.com/apache/spark/commit/5dda0f3b18ed52b2cd89b52d3427bae63cdc866b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class FileStreamSourceOffset(logOffset: Long) extends Offset `





[GitHub] spark pull request #16199: [SPARK-18772][SQL] NaN/Infinite float parsing in ...

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16199#discussion_r91450072
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -1764,4 +1764,37 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
 val df2 = spark.read.option("PREfersdecimaL", "true").json(records)
 assert(df2.schema == schema)
   }
+
+  test("SPARK-18772: Special floats") {
+val records = sparkContext
--- End diff --

I think it would be nicer if it has some roundtrip tests in reading and 
writing.
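
A roundtrip test might look roughly like this (a sketch; the exact write-side behavior for special floats would need checking):

```Scala
val path = "/tmp/special-floats-json"  // hypothetical location
val input = Seq(Double.NaN, Double.PositiveInfinity, Double.NegativeInfinity).toDF("d")
input.write.mode("overwrite").json(path)
val readBack = spark.read.schema(input.schema).json(path)
assert(readBack.count() === input.count())
// NaN != NaN, so compare values with java.lang.Double.compare rather than ===
```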





[GitHub] spark issue #16199: [SPARK-18772][SQL] NaN/Infinite float parsing in JSON is...

2016-12-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16199
  
@NathanHowell, while tracking down the history, I found a similar PR 
including this in 
https://github.com/apache/spark/pull/9759/files#diff-8affe5ec7d691943a88e43eb30af656e
 (it seems to have been reverted due to conflicts in `dev/deps/spark-deps-hadoop*`, 
which are not related to this PR).

Would it make sense to take out the valid changes from there? It seems 
safe to follow it, as the changes there were checked by several reviewers.





[GitHub] spark issue #16195: [Spark-18765] [CORE] Make values for spark.yarn.{am|driv...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16195
  
**[Test build #69846 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69846/consoleFull)**
 for PR 16195 at commit 
[`10e0c75`](https://github.com/apache/spark/commit/10e0c7522bc6e8ca1c2e45240374db61bf7e5138).





[GitHub] spark issue #16206: Branch 2.0

2016-12-07 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16206
  
@ming616 please close this





[GitHub] spark issue #16206: Branch 2.0

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16206
  
Can one of the admins verify this patch?





[GitHub] spark issue #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility - crea...

2016-12-07 Thread vijoshi
Github user vijoshi commented on the issue:

https://github.com/apache/spark/pull/16119
  
@holdenk test case added





[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12064
  
**[Test build #69845 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69845/consoleFull)**
 for PR 12064 at commit 
[`4458a5f`](https://github.com/apache/spark/commit/4458a5f0d16b14095360ec3e2afec6d1db912c7d).





[GitHub] spark pull request #16206: Branch 2.0

2016-12-07 Thread ming616
GitHub user ming616 opened a pull request:

https://github.com/apache/spark/pull/16206

Branch 2.0

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16206.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16206


commit fffcec90b65047c3031c2b96679401f8fbef6337
Author: Shixiong Zhu 
Date:   2016-09-14T20:33:51Z

[SPARK-17463][CORE] Make CollectionAccumulator and SetAccumulator's value 
can be read thread-safely

## What changes were proposed in this pull request?

Make the values of CollectionAccumulator and SetAccumulator readable in a 
thread-safe way, to fix the ConcurrentModificationException reported in 
[JIRA](https://issues.apache.org/jira/browse/SPARK-17463).
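
The usual shape of such a fix is to return a snapshot under a lock instead of the live collection; a hedged sketch (not necessarily the commit's exact code):

```Scala
import java.util.{ArrayList, Collections, List => JList}

class CollectionAccumulatorSketch[T] {
  private val _list: JList[T] = Collections.synchronizedList(new ArrayList[T]())

  def add(v: T): Unit = _list.add(v)

  // Readers get an immutable copy, so a concurrent add can no longer cause
  // a ConcurrentModificationException while the value is being iterated.
  def value: JList[T] = _list.synchronized {
    Collections.unmodifiableList(new ArrayList[T](_list))
  }
}
```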

## How was this patch tested?

Existing tests.

Author: Shixiong Zhu 

Closes #15063 from zsxwing/SPARK-17463.

(cherry picked from commit e33bfaed3b160fbc617c878067af17477a0044f5)
Signed-off-by: Josh Rosen 

commit bb2bdb44032d2e71832b3e0e771590fb2225e4f3
Author: Xing SHI 
Date:   2016-09-14T20:46:46Z

[SPARK-17465][SPARK CORE] Inappropriate memory management in 
`org.apache.spark.storage.MemoryStore` may lead to memory leak

The expression `if (memoryMap(taskAttemptId) == 0) memoryMap.remove(taskAttemptId)` 
in the methods `releaseUnrollMemoryForThisTask` and 
`releasePendingUnrollMemoryForThisTask` should be evaluated after the 
release-memory operation, whether `memoryToRelease` is > 0 or not.

If the memory of a task has been set to 0 when calling a 
`releaseUnrollMemoryForThisTask` or a `releasePendingUnrollMemoryForThisTask` 
method, the key in the memory map corresponding to that task will never be 
removed from the hash map.

See the details in 
[SPARK-17465](https://issues.apache.org/jira/browse/SPARK-17465).
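
In sketch form, the corrected ordering looks like this (illustrative; names follow the description above):

```Scala
val memoryToRelease = math.min(memory, memoryMap(taskAttemptId))
if (memoryToRelease > 0) {
  memoryMap(taskAttemptId) -= memoryToRelease
  memoryManager.releaseUnrollMemory(memoryToRelease, MemoryMode.ON_HEAP)
}
// The cleanup must run regardless of whether anything was released on this
// call; an entry that already reached 0 still has to be pruned from the map.
if (memoryMap(taskAttemptId) == 0) {
  memoryMap.remove(taskAttemptId)
}
```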

Author: Xing SHI 

Closes #15022 from saturday-shi/SPARK-17465.

commit 5c2bc8360019fb08e2e62e50bb261f7ce19b231e
Author: codlife <1004910...@qq.com>
Date:   2016-09-15T08:38:13Z

[SPARK-17521] Error when I use sparkContext.makeRDD(Seq())

## What changes were proposed in this pull request?

When I use sc.makeRDD as below:
```
val data3 = sc.makeRDD(Seq())
println(data3.partitions.length)
```
I got an error:
Exception in thread "main" java.lang.IllegalArgumentException: Positive 
number of slices required

We can fix this bug by just modifying the last line to do a check of seq.size:
```
  def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
    assertNotStopped()
    val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
    new ParallelCollectionRDD[T](this, seq.map(_._1),
      math.max(seq.size, defaultParallelism), indexToPrefs)
  }
```

## How was this patch tested?

 manual tests

(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Author: codlife <1004910...@qq.com>
Author: codlife 

Closes #15077 from codlife/master.

(cherry picked from commit 647ee05e5815bde361662a9286ac602c44b4d4e6)
Signed-off-by: Sean Owen 

commit a09c258c9a97e701fa7650cc0651e3c6a7a1cab9
Author: junyangq 
Date:   2016-09-15T17:00:36Z

[SPARK-17317][SPARKR] Add SparkR vignette to branch 2.0

## What changes were proposed in this pull request?

This PR adds SparkR vignette to branch 2.0, which works as a friendly 
guidance going through the functionality provided by SparkR.

## How was this patch tested?

R unit test.

Author: junyangq 
Author: Shivaram Venkataraman 
Author: Junyang Qian 

Closes #15100 from junyangq/SPARKR-vignette-2.0.

commit e77a437d292ecda66163a895427d62e4f72e2a25
Author: Josh Rosen 
Date:   2016-09-15T18:22:58Z

[SPARK-17547] Ensure temp shuffle data file is cleaned up after error

SPARK-8029 (#9610) modified shuffle writers to first stage their data 

[GitHub] spark issue #16195: [Spark-18765] [CORE] Make values for spark.yarn.{am|driv...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16195
  
**[Test build #69844 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69844/consoleFull)**
 for PR 16195 at commit 
[`6c21436`](https://github.com/apache/spark/commit/6c21436b3c60245a5c7a679a9bf4844c7092ea5d).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16195: [Spark-18765] [CORE] Make values for spark.yarn.{am|driv...

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16195
  
Merged build finished. Test FAILed.




[GitHub] spark issue #16195: [Spark-18765] [CORE] Make values for spark.yarn.{am|driv...

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16195
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69844/
Test FAILed.




[GitHub] spark issue #16195: [Spark-18765] [CORE] Make values for spark.yarn.{am|driv...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16195
  
**[Test build #69844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69844/consoleFull)** for PR 16195 at commit [`6c21436`](https://github.com/apache/spark/commit/6c21436b3c60245a5c7a679a9bf4844c7092ea5d).




[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16204
  
**[Test build #69843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69843/consoleFull)** for PR 16204 at commit [`3199f8f`](https://github.com/apache/spark/commit/3199f8f9265e5d324c50998523a4c85a3590a39c).




[GitHub] spark issue #16195: [Spark-18765] [CORE] Make values for spark.yarn.{am|driv...

2016-12-07 Thread daisukebe
Github user daisukebe commented on the issue:

https://github.com/apache/spark/pull/16195
  
Per @vanzin's suggestion,

- revised the code style,
- added a new default variable,
- and also fixed the warning.




[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...

2016-12-07 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16173
  
OK, can someone else say whether it is reasonable to re-add the conf now?




[GitHub] spark pull request #16157: [SPARK-18723][DOC] Expanded programming guide inf...

2016-12-07 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16157#discussion_r91445291
  
--- Diff: docs/programming-guide.md ---
@@ -347,7 +347,7 @@ Some notes on reading files with Spark:
 
 Apart from text files, Spark's Scala API also supports several other data 
formats:
 
-* `SparkContext.wholeTextFiles` lets you read a directory containing 
multiple small text files, and returns each of them as (filename, content) 
pairs. This is in contrast with `textFile`, which would return one record per 
line in each file.
+* `SparkContext.wholeTextFiles` lets you read a directory containing 
multiple small text files, and returns each of them as (filename, content) 
pairs. This is in contrast with `textFile`, which would return one record per 
line in each file. It takes an optional second argument for controlling the 
minimal number of partitions (by default this is 2). It uses 
[CombineFileInputFormat](https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.html)
 internally in order to process large numbers of small files effectively by 
grouping files on the same executor into a single partition. This can lead to 
sub-optimal partitioning when the file sets would benefit from residing in 
multiple partitions (e.g., larger partitions would not fit in memory, files are 
replicated but a large subset is locally reachable from a single executor, 
subsequent transformations would benefit from multi-core processing). In those 
cases, set the `minPartitions` argument to enforce splitting.
--- End diff --

Every element of the result is a file; it's fundamentally different from 
`textFile`, and it can't be that each file therefore ends up in its own 
partition. I don't think this merges files, right?
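
For context, a sketch of how the `minPartitions` hint under discussion is used (the path and count here are hypothetical):

```
// minPartitions is a hint passed to CombineFileInputFormat; the actual
// partition count may differ, but it forces more splitting than the
// default packing of many small files into few partitions.
val files = sc.wholeTextFiles("hdfs:///data/small-files", minPartitions = 16)
println(files.partitions.length)
```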




[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...

2016-12-07 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16173
  
Yeah those were Spark classes. This property doesn't seem to be used now.
It's possible to restore this but I don't know if it's intended now. Yes I 
suppose you could update the comment instead, but it doesn't seem like a big 
deal.




[GitHub] spark issue #14365: [SPARK-16628][SQL] Translate file-based relation schema ...

2016-12-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14365
  
We have two options. The first is to map the metastore schema to the physical 
Orc schema, as done here. But we don't infer the physical schema of the Orc 
file now, so I will update this to do that mapping in OrcFileFormat.

The other is like #14282. But since we don't infer the schema from the Orc 
file now, we can't disable the conversion when a mismatch is detected. One 
possibility is to throw an exception in OrcFileFormat when a mismatch is 
detected before reading, with a message asking the user to disable 
`spark.sql.hive.convertMetastoreOrc`.

@cloud-fan @yhuai @dongjoon-hyun What do you think?
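
A minimal sketch of the second option (a hypothetical helper and message, not actual Spark code):

```
import org.apache.spark.SparkException
import org.apache.spark.sql.types.StructType

// Hypothetical guard: fail fast when the Orc file's physical schema and
// the metastore schema disagree, instead of silently reading wrong columns.
def checkOrcSchemaMatches(metastore: StructType, physical: StructType): Unit = {
  val ms = metastore.fieldNames.map(_.toLowerCase).toSeq
  val ph = physical.fieldNames.map(_.toLowerCase).toSeq
  if (ms != ph) {
    throw new SparkException(
      s"Orc physical schema $ph does not match metastore schema $ms; " +
        "consider setting spark.sql.hive.convertMetastoreOrc=false")
  }
}
```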




[GitHub] spark issue #14365: [SPARK-16628][SQL] Translate file-based relation schema ...

2016-12-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14365
  
@dongjoon-hyun yeah, I see. Because we directly use the metastore schema of 
the converted Orc table, this issue happens when the physical schema in the 
Orc file and the metastore schema mismatch.




[GitHub] spark issue #16201: [SPARK-3359][DOCS] Fix greater-than symbols in Javadoc t...

2016-12-07 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16201
  
I see, so this is just another case where changes will keep breaking this. 
We do need a build that can run this check soon, to catch it. But yes, just 
keep fixing.

I would just write "Must be at least 1" instead of "Must be >= 1".




[GitHub] spark issue #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers example cod...

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69842/
Test PASSed.




[GitHub] spark issue #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers example cod...

2016-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16148
  
**[Test build #69842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69842/consoleFull)** for PR 16148 at commit [`ac89b1c`](https://github.com/apache/spark/commit/ac89b1c317eb0e1c090e61bb5c144b0481dd533b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.




[GitHub] spark issue #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers example cod...

2016-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16148
  
Merged build finished. Test PASSed.




[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM

2016-12-07 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/16149
  
@sethah @srowen I have added a comment to the `weightCol` doc for the 
Binomial case. 
I also updated the test to cover the case `weight < 0.5`, i.e., 
`round(weight) = 0`. All tests passed.



