[GitHub] spark issue #16210: [Core][SPARK-18778]Fix the scala classpath under some en...
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/16210 @rxin our JDK is jdk1.8.0_91, we do not have Scala installed, and the OS is Debian 4.6.4. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16209 **[Test build #69855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69855/consoleFull)** for PR 16209 at commit [`faa8172`](https://github.com/apache/spark/commit/faa8172751082cd532dd7f8292a318a81a3a53e9).
[GitHub] spark issue #16210: [Core][SPARK-18778]Fix the scala classpath under some en...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16210 Paging @jodersky I'm not sure about this ... isn't `-usejavacp` a legacy option? I don't know of any other reports of this so it is possibly specific to your env.
[GitHub] spark issue #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers example cod...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16148 LGTM. Looks like we are locked down for 2.1. Good to have with all the new examples but seems like a lot of code (example) changes?
[GitHub] spark issue #16210: [Core][SPARK-18778]Fix the scala classpath under some en...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16210 What are the environments?
[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/16158 @MLnick Does this match your thoughts?
[GitHub] spark pull request #16014: [SPARK-18590][SPARKR] build R source package when...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16014#discussion_r91459472

--- Diff: dev/create-release/release-build.sh ---
@@ -221,14 +235,13 @@ if [[ "$1" == "package" ]]; then
   # We increment the Zinc port each time to avoid OOM's and other craziness if multiple builds
   # share the same Zinc server.
-  # Make R source package only once. (--r)
   FLAGS="-Psparkr -Phive -Phive-thriftserver -Pyarn -Pmesos"
   make_binary_release "hadoop2.3" "-Phadoop-2.3 $FLAGS" "3033" &
   make_binary_release "hadoop2.4" "-Phadoop-2.4 $FLAGS" "3034" &
   make_binary_release "hadoop2.6" "-Phadoop-2.6 $FLAGS" "3035" &
   make_binary_release "hadoop2.7" "-Phadoop-2.7 $FLAGS" "3036" "withpip" &
   make_binary_release "hadoop2.4-without-hive" "-Psparkr -Phadoop-2.4 -Pyarn -Pmesos" "3037" &
-  make_binary_release "without-hadoop" "--r -Psparkr -Phadoop-provided -Pyarn -Pmesos" "3038" &
+  make_binary_release "without-hadoop" "-Psparkr -Phadoop-provided -Pyarn -Pmesos" "3038" "withr" &
--- End diff --

@shivaram
[GitHub] spark issue #16150: [SPARK-18349][SparkR]:Update R API documentation on ml m...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16150 LGTM. @rxin I know rc2 has been cut, but can this still go to branch-2.1? There are only 2 lines of code change, and the API doc improvements could really help usability in 2.1. + @shivaram
[GitHub] spark issue #16210: [Core][SPARK-18778]Fix the scala classpath under some en...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16210 Can one of the admins verify this patch?
[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16204 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69843/ Test PASSed.
[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16204 Merged build finished. Test PASSed.
[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16204 **[Test build #69843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69843/consoleFull)** for PR 16204 at commit [`3199f8f`](https://github.com/apache/spark/commit/3199f8f9265e5d324c50998523a4c85a3590a39c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #16210: [Core][SPARK-18778]Fix the scala classpath under ...
GitHub user djvulee opened a pull request: https://github.com/apache/spark/pull/16210 [Core][SPARK-18778]Fix the scala classpath under some environment

## What changes were proposed in this pull request?

Under some environments, the `-Dscala.usejavacp=true` option does not seem to work; passing `-usejavacp` directly to the REPL fixes this.

## How was this patch tested?

We tested this in our cluster environment.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/djvulee/spark sparkShell

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16210.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16210

commit ab81a7af165c7287c0356758097dfa5ded6adea3
Author: DjvuLee
Date: 2016-12-08T07:15:59Z

    [Core]Fix the scala classpath under some environment

    Under some environments, the -Dscala.usejavacp=true option does not seem to work; passing -usejavacp directly to the REPL fixes this.
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user eyalfa commented on the issue: https://github.com/apache/spark/pull/16043 @HyukjinKwon, thanks for the quick response :-) I'll tackle these later today (gotta work sometimes).
[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16043#discussion_r91453918

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Cast, Coalesce, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+* push down operations into [[CreateNamedStructLike]].
+*/
+object SimplifyCreateStructOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field extraction
+      case GetStructField(createNamedStructLike : CreateNamedStructLike, ordinal, _) =>
+        createNamedStructLike.valExprs(ordinal)
+    }
+  }
+}
+
+/**
+* push down operations into [[CreateArray]].
+*/
+object SimplifyCreateArrayOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field selection (array of structs)
+      case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) =>
+        def getStructField(elem : Expression) = {
+          GetStructField(elem, ordinal, Some(field.name))
+        }
+        CreateArray(elems.map(getStructField))
--- End diff --

Could we do this like something as below?:

```scala
CreateArray(elems.map(elem => GetStructField(elem, ordinal, Some(field.name))))
```

It seems `getStructField(...)` is only used in this scope and I think it is good to remove this.
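The rewrites quoted above collapse a field or item access over a struct/array creator directly into the child expression. A stand-alone sketch of the same idea on a toy expression tree may help; `Lit`, `MkStruct`, `GetField`, `MkArray` and `GetItem` are hypothetical stand-ins, not Spark's Catalyst classes:

```scala
// Toy expression tree (hypothetical, not Catalyst's API).
sealed trait Expr
case class Lit(v: Any) extends Expr
case class MkStruct(fields: Seq[(String, Expr)]) extends Expr // create_named_struct
case class GetField(child: Expr, ordinal: Int) extends Expr   // struct field access
case class MkArray(elems: Seq[Expr]) extends Expr             // create_array
case class GetItem(child: Expr, idx: Int) extends Expr        // array item access

// One bottom-up simplification pass mirroring the quoted rules: field access
// over a struct creator collapses to the field's value expression, and a
// constant index over an array creator collapses to the element, or a null
// literal when the index is out of bounds.
def simplify(e: Expr): Expr = e match {
  case GetField(MkStruct(fields), ord) =>
    simplify(fields(ord)._2)
  case GetItem(MkArray(elems), idx) if idx >= 0 && idx < elems.size =>
    simplify(elems(idx))
  case GetItem(MkArray(_), _) =>
    Lit(null)
  case other => other
}

// create_named_struct("square", 9).square  simplifies to  9
assert(simplify(GetField(MkStruct(Seq("square" -> Lit(9))), 0)) == Lit(9))
// create_array(1, 2)[5]  simplifies to  null
assert(simplify(GetItem(MkArray(Seq(Lit(1), Lit(2))), 5)) == Lit(null))
```

The real rules do the same collapse via `transformExpressionsUp`, which rewrites children before parents so nested creator/accessor pairs are eliminated in one pass.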
[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16043#discussion_r91454208

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
+/**
+* push down operations into [[CreateMap]].
+*/
+object SimplifyCreateMapOps extends Rule[LogicalPlan]{
+  object ComparisonResult extends Enumeration {
+    val PositiveMatch = Value
+    val NegativeMatch = Value
+    val UnDetermined = Value
+  }
+
+  def compareKeys(k1 : Expression, k2 : Expression) : ComparisonResult.Value = {
+    (k1, k2) match {
+      case (x, y) if x.semanticEquals(y) => ComparisonResult.PositiveMatch
+      // make sure this is null safe, especially when datatypes differ
+      // is this even possible?
+      case (_ : Literal, _ : Literal) => ComparisonResult.NegativeMatch
+      case _ => ComparisonResult.UnDetermined
+    }
+  }
+
+  case class ClassifiedEntries(undetermined : Seq[Expression],
+                               nullable : Boolean,
+                               firstPositive : Option[Expression]) {
--- End diff --

Oh @eyalfa, I believe we should make the indentation as below if it does not fit in 100 character length:

```scala
case class ClassifiedEntries(
    undetermined : Seq[Expression],
    nullable : Boolean,
    firstPositive : Option[Expression]) {
  ...
```
[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16043#discussion_r91455499

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
+    def normalize( k : Expression ) : ClassifiedEntries = this match {
+      /**
+       * when we have undetermined matches that might produce a null value,
+       * we can't separate a positive match and use [[Coalesce]] to choose the final result.
+       * so we 'hide' the positive match as an undetermined match.
+       */
+      case ClassifiedEntries( u, true, Some(p)) if u.nonEmpty =>
+        ClassifiedEntries(u ++ Seq(k, p), true, None)
+      case _ => this
+    }
+  }
+
+  def classifyEntries(mapEntries : Seq[(Expression, Expression)],
+                      requestedKey : Expression) : ClassifiedEntries = {
+    val res1 = mapEntries.foldLeft(ClassifiedEntries(Seq.empty, nullable = false, None)) {
+      case (prev @ ClassifiedEntries(_, _, Some(_)), _) => prev
+      case (ClassifiedEntries(prev, nullable, None), (k, v)) =>
+        compareKeys(k, requestedKey) match {
+          case ComparisonResult.UnDetermined =>
+            val vIsNullable = v.nullable
+            val nextNullbale = nullable || vIsNullable
+
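The quoted `classifyEntries` folds over the map's entries left to right, stopping at the first definite match. A simplified stand-alone sketch of the same idea may make the fold clearer; it uses plain values instead of Catalyst expressions, models a non-literal key as `None`, and omits the nullability tracking, so all names here are hypothetical:

```scala
// Outcome of comparing an entry's key against the requested key.
sealed trait Match
case object Positive extends Match      // provably the requested key
case object Negative extends Match      // provably a different key
case object Undetermined extends Match  // cannot be decided statically

// Stand-in for Catalyst's semanticEquals: a Some(v) key is a "literal" that
// compares decidably, a None key models an expression of unknown value.
def compareKeys(k: Option[Any], requested: Any): Match = k match {
  case Some(v) if v == requested => Positive
  case Some(_)                   => Negative
  case None                      => Undetermined
}

// Entries whose keys could not be ruled out, plus the first definite match.
case class ClassifiedEntries(undetermined: Seq[Any], firstPositive: Option[Any])

def classifyEntries(entries: Seq[(Option[Any], Any)],
                    requested: Any): ClassifiedEntries =
  entries.foldLeft(ClassifiedEntries(Seq.empty, None)) {
    // a positive match was already found: later entries cannot shadow it
    case (done @ ClassifiedEntries(_, Some(_)), _) => done
    case (ClassifiedEntries(und, None), (k, v)) =>
      compareKeys(k, requested) match {
        case Positive     => ClassifiedEntries(und, Some(v))
        case Negative     => ClassifiedEntries(und, None)   // drop this entry
        case Undetermined => ClassifiedEntries(und :+ v, None)
      }
  }
```

In the real rule the undetermined values feed a `Coalesce`-style fallback chain, which is why a positive match preceded by nullable undetermined matches has to be "hidden" by `normalize` above.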
[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16043#discussion_r91455356
[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16043#discussion_r91457070

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
@@ -0,0 +1,482 @@
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.{Coalesce, CreateArray, CreateMap, CreateNamedStruct, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, Literal}
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.plans.logical.Range
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.types._
+
+/**
+* Created by eyalf on 11/4/2016.
+* SPARK-18601 discusses simplification direct access to complex types creators.
+* i.e. {{{create_named_struct(square, `x` * `x`).square}}} can be simplified to {{{`x` * `x`}}}.
+* same applies to create_array and create_map
+*/
+class ComplexTypesSuite extends PlanTest with Matchers{
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches =
+      Batch("collapse projections", FixedPoint(10),
+        CollapseProject) ::
+      Batch("Constant Folding", FixedPoint(10),
+        NullPropagation,
+        ConstantFolding,
+        BooleanSimplification,
+        SimplifyConditionals,
+        SimplifyCreateStructOps,
+        SimplifyCreateArrayOps,
+        SimplifyCreateMapOps) :: Nil
+  }
+
+  val idAtt = ('id).long.notNull
+
+  lazy val baseOptimizedPlan = Range(1L, 1000L, 1, Some(2), idAtt :: Nil)
+
+  val idRef = baseOptimizedPlan.output.head
+
+//  val idRefColumn = Column("id")
+//  val struct1RefColumn = Column("struct1")
+
+  implicit class ComplexTypeDslSupport(e : Expression) {
+    def getStructField(f : String): GetStructField = {
+      e should be ('resolved)
+      e.dataType should be (a[StructType])
--- End diff --

I guess infix annotation is discouraged according to http://spark.apache.org/contributing.html. I see `assert` is used much more commonly. This might be acceptable, but honestly I have not seen `should be` often, although I understand there are some usages of it across the codebase.
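The style point above is about replacing ScalaTest's infix matchers with plain assertions. A minimal sketch of the two styles side by side, using hypothetical `Expr`/`DataType` stand-ins for the Catalyst types so it runs without Spark or ScalaTest on the classpath:

```scala
// Hypothetical stand-ins for Catalyst's DataType/Expression, kept minimal.
sealed trait DataType
case object IntType extends DataType
case class StructType(fieldNames: Seq[String]) extends DataType

case class Expr(isResolved: Boolean, dataType: DataType)

// ScalaTest matcher style (requires org.scalatest.Matchers), as quoted above:
//   e should be ('resolved)
//   e.dataType should be (a[StructType])
// Plain-assertion style, as the review suggests:
def checkStructInput(e: Expr): Unit = {
  assert(e.isResolved, s"expected a resolved expression: $e")
  assert(e.dataType.isInstanceOf[StructType], s"expected a struct, got ${e.dataType}")
}

// A resolved struct-typed expression passes both checks silently.
checkStructInput(Expr(isResolved = true, StructType(Seq("square"))))
```

The plain `assert` form needs no extra trait mixed in and fails with an `AssertionError` carrying the custom message, which is one reason style guides tend to prefer it over matcher DSLs.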
[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16043#discussion_r91453489 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala --- @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions.{Cast, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal} +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.rules.Rule + +/** +* push down operations into [[CreateNamedStructLike]]. +*/ +object SimplifyCreateStructOps extends Rule[LogicalPlan]{ --- End diff -- > `]{` -> `] {` I believe most of the codebase uses this spacing, and I think this is a good change to make. There are several more instances of the same.
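The `SimplifyCreateStructOps` rule quoted above rewrites field extraction from a just-built named struct into the struct's corresponding value expression. A minimal standalone model of that rewrite on plain data (hypothetical names, not the Catalyst classes):

```scala
// Model of the push-down: a named-struct creator keeps parallel name/value
// sequences, and extracting the field at `ordinal` is just the ordinal-th
// value expression -- no struct needs to be materialized.
object StructFieldPushDownSketch {
  // hypothetical stand-in for CreateNamedStructLike.valExprs
  def pushDown(valExprs: Seq[String], ordinal: Int): String = valExprs(ordinal)

  def main(args: Array[String]): Unit = {
    // create_named_struct("square", x * x).square  ==>  x * x
    assert(pushDown(Seq("x * x", "y + 1"), 0) == "x * x")
    println("ok")
  }
}
```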
[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16043#discussion_r91453426 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala --- @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions.{Cast, Coalesce, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal} --- End diff -- I believe a multi-line import or a wildcard import would be nicer here, since it imports more than six names.
[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16043#discussion_r91453593 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala --- @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions.{Cast, Coalesce, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal} +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.rules.Rule + +/** +* push down operations into [[CreateNamedStructLike]]. +*/ +object SimplifyCreateStructOps extends Rule[LogicalPlan]{ + override def apply(plan: LogicalPlan): LogicalPlan = { +plan.transformExpressionsUp{ --- End diff -- Here too, `plan.transformExpressionsUp {`. I think it is good to follow the existing code style. There are several more instances of the same.
[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16043#discussion_r91457208 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala --- @@ -0,0 +1,482 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.scalatest.Matchers + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.dsl.plans._ +import org.apache.spark.sql.catalyst.expressions.{Coalesce, CreateArray, CreateMap, CreateNamedStruct, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, Literal} +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.plans.logical.Range +import org.apache.spark.sql.catalyst.rules.RuleExecutor +import org.apache.spark.sql.types._ + +/** +* Created by eyalf on 11/4/2016. +* SPARK-18601 discusses simplification direct access to complex types creators. +* i.e. {{{create_named_struct(square, `x` * `x`).square}}} can be simplified to {{{`x` * `x`}}}. 
+* sam applies to create_array and create_map +*/ +class ComplexTypesSuite extends PlanTest with Matchers{ + + object Optimize extends RuleExecutor[LogicalPlan] { +val batches = + Batch("collapse projections", FixedPoint(10), + CollapseProject) :: + Batch("Constant Folding", FixedPoint(10), + NullPropagation, + ConstantFolding, + BooleanSimplification, + SimplifyConditionals, + SimplifyCreateStructOps, + SimplifyCreateArrayOps, + SimplifyCreateMapOps) :: Nil + } + + val idAtt = ('id).long.notNull + + lazy val baseOptimizedPlan = Range(1L, 1000L, 1, Some(2), idAtt :: Nil) + + val idRef = baseOptimizedPlan.output.head + + +// val idRefColumn = Column("id") +// val struct1RefColumn = Column("struct1") + + implicit class ComplexTypeDslSupport(e : Expression) { +def getStructField(f : String): GetStructField = { + e should be ('resolved) + e.dataType should be (a[StructType]) + val structType = e.dataType.asInstanceOf[StructType] + val ord = structType.fieldNames.indexOf(f) + ord shouldNot be (-1) + GetStructField(e, ord, Some(f)) +} +def getArrayStructField(f : String) : Expression = { --- End diff -- I believe we need a single blank line between consecutive methods, per https://github.com/databricks/scala-style-guide#blank-lines-vertical-whitespace; the same applies to the other instances here.
[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16043#discussion_r9148 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala --- @@ -0,0 +1,482 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.scalatest.Matchers + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.dsl.plans._ +import org.apache.spark.sql.catalyst.expressions.{Coalesce, CreateArray, CreateMap, CreateNamedStruct, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, Literal} +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.plans.logical.Range +import org.apache.spark.sql.catalyst.rules.RuleExecutor +import org.apache.spark.sql.types._ + +/** +* Created by eyalf on 11/4/2016. +* SPARK-18601 discusses simplification direct access to complex types creators. +* i.e. {{{create_named_struct(square, `x` * `x`).square}}} can be simplified to {{{`x` * `x`}}}. 
+* sam applies to create_array and create_map +*/ +class ComplexTypesSuite extends PlanTest with Matchers{ + + object Optimize extends RuleExecutor[LogicalPlan] { +val batches = + Batch("collapse projections", FixedPoint(10), + CollapseProject) :: + Batch("Constant Folding", FixedPoint(10), + NullPropagation, + ConstantFolding, + BooleanSimplification, + SimplifyConditionals, + SimplifyCreateStructOps, + SimplifyCreateArrayOps, + SimplifyCreateMapOps) :: Nil + } + + val idAtt = ('id).long.notNull + + lazy val baseOptimizedPlan = Range(1L, 1000L, 1, Some(2), idAtt :: Nil) + + val idRef = baseOptimizedPlan.output.head + + +// val idRefColumn = Column("id") +// val struct1RefColumn = Column("struct1") --- End diff -- It seems removing those was missed.
[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16043#discussion_r91454252 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala --- @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions.{Cast, Coalesce, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal} +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.rules.Rule + +/** +* push down operations into [[CreateNamedStructLike]]. +*/ +object SimplifyCreateStructOps extends Rule[LogicalPlan]{ + override def apply(plan: LogicalPlan): LogicalPlan = { +plan.transformExpressionsUp{ + // push down field extraction + case GetStructField(createNamedStructLike : CreateNamedStructLike, ordinal, _) => +createNamedStructLike.valExprs(ordinal) +} + } +} + +/** +* push down operations into [[CreateArray]]. 
+*/ +object SimplifyCreateArrayOps extends Rule[LogicalPlan]{ + override def apply(plan: LogicalPlan): LogicalPlan = { +plan.transformExpressionsUp{ + // push down field selection (array of structs) + case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) => +def getStructField(elem : Expression) = { + GetStructField(elem, ordinal, Some(field.name)) +} +CreateArray(elems.map(getStructField)) + // push down item selection. + case ga @ GetArrayItem(CreateArray(elems), IntegerLiteral(idx)) => +if (idx >= 0 && idx < elems.size) { + elems(idx) +} else { + Cast(Literal(null), ga.dataType) +} +} + } +} + +/** +* push down operations into [[CreateMap]]. +*/ +object SimplifyCreateMapOps extends Rule[LogicalPlan]{ + object ComparisonResult extends Enumeration { +val PositiveMatch = Value +val NegativeMatch = Value +val UnDetermined = Value + } + + def compareKeys(k1 : Expression, k2 : Expression) : ComparisonResult.Value = { +(k1, k2) match { + case (x, y) if x.semanticEquals(y) => ComparisonResult.PositiveMatch + // make surethis is null safe, especially when datatypes differ + // is this even possible? + case (_ : Literal, _ : Literal) => ComparisonResult.NegativeMatch + case _ => ComparisonResult.UnDetermined +} + } + + case class ClassifiedEntries(undetermined : Seq[Expression], + nullable : Boolean, + firstPositive : Option[Expression]) { +def normalize( k : Expression ) : ClassifiedEntries = this match { --- End diff -- We could remove the extra spaces inside the parentheses, as in `def normalize(k: Expression)`, and likewise in the other instances.
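The array-item branch in the `SimplifyCreateArrayOps` diff quoted above rewrites `GetArrayItem(CreateArray(elems), idx)` to the idx-th element expression, falling back to a null cast when the index is out of bounds. A minimal standalone sketch of that semantics (plain Scala, not the Catalyst classes; `None` models the null literal):

```scala
// Standalone model of the rewrite's semantics: selecting index `idx` from a
// freshly created array yields the idx-th element, or null when out of range.
object ArrayItemSketch {
  def simplify[A](elems: Seq[A], idx: Int): Option[A] =
    if (idx >= 0 && idx < elems.size) Some(elems(idx)) // in range: pick the element
    else None                                          // out of range: null result

  def main(args: Array[String]): Unit = {
    assert(simplify(Seq(10, 20, 30), 1).contains(20))
    assert(simplify(Seq(10, 20, 30), 5).isEmpty)
    println("ok")
  }
}
```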
[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16043#discussion_r91457315 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala --- @@ -0,0 +1,482 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.scalatest.Matchers + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.dsl.plans._ +import org.apache.spark.sql.catalyst.expressions.{Coalesce, CreateArray, CreateMap, CreateNamedStruct, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, Literal} +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.plans.logical.Range +import org.apache.spark.sql.catalyst.rules.RuleExecutor +import org.apache.spark.sql.types._ + +/** +* Created by eyalf on 11/4/2016. +* SPARK-18601 discusses simplification direct access to complex types creators. +* i.e. {{{create_named_struct(square, `x` * `x`).square}}} can be simplified to {{{`x` * `x`}}}. 
+* sam applies to create_array and create_map +*/ +class ComplexTypesSuite extends PlanTest with Matchers{ + + object Optimize extends RuleExecutor[LogicalPlan] { +val batches = + Batch("collapse projections", FixedPoint(10), + CollapseProject) :: + Batch("Constant Folding", FixedPoint(10), + NullPropagation, + ConstantFolding, + BooleanSimplification, + SimplifyConditionals, + SimplifyCreateStructOps, + SimplifyCreateArrayOps, + SimplifyCreateMapOps) :: Nil + } + + val idAtt = ('id).long.notNull + + lazy val baseOptimizedPlan = Range(1L, 1000L, 1, Some(2), idAtt :: Nil) + + val idRef = baseOptimizedPlan.output.head + + +// val idRefColumn = Column("id") +// val struct1RefColumn = Column("struct1") + + implicit class ComplexTypeDslSupport(e : Expression) { +def getStructField(f : String): GetStructField = { + e should be ('resolved) + e.dataType should be (a[StructType]) + val structType = e.dataType.asInstanceOf[StructType] + val ord = structType.fieldNames.indexOf(f) + ord shouldNot be (-1) + GetStructField(e, ord, Some(f)) +} +def getArrayStructField(f : String) : Expression = { + e should be ('resolved) + e.dataType should be (a[ArrayType]) + val arrType = e.dataType.asInstanceOf[ArrayType] + arrType.elementType should be (a[StructType]) + val structType = arrType.elementType.asInstanceOf[StructType] + val ord = structType.fieldNames.indexOf(f) + ord shouldNot be (-1) + GetArrayStructFields(e, structType(ord), ord, 1, arrType.containsNull) +} +def getArrayItem(i : Int) : GetArrayItem = { + e should be ('resolved) + e.dataType should be (a[ArrayType]) + GetArrayItem(e, Literal(i)) +} +def getMapValue(k : Expression) : Expression = { + e should be ('resolved) + e.dataType should be (a[MapType]) + val mapType = e.dataType.asInstanceOf[MapType] + k.dataType shouldEqual mapType.keyType + GetMapValue(e, k) +} + } + + test("explicit") { +val rel = baseOptimizedPlan.select( + CreateNamedStruct("att" :: idRef :: Nil).getStructField("att") as "outerAtt" + ) + +rel.schema 
shouldEqual + StructType(StructField("outerAtt", LongType, nullable = false) :: Nil) + +val optimized = Optimize execute rel + +val expected = baseOptimizedPlan.select(idRef as "outerAtt") + +comparePlans(optimized, expected) + } + + ignore("explicit - deduced att name") { +val rel =
[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16204 **[Test build #69854 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69854/consoleFull)** for PR 16204 at commit [`d2172d1`](https://github.com/apache/spark/commit/d2172d11c968cf30b989de3257faaaf6b17366ce).
[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16204 **[Test build #69853 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69853/consoleFull)** for PR 16204 at commit [`f77730f`](https://github.com/apache/spark/commit/f77730f6b5deba40e28d0b147ae11cb3ed4af37a).
[GitHub] spark issue #14282: [SPARK-16628][SQL] Don't convert Orc Metastore tables to...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14282 This can be closed now because we don't infer schema from Orc files when converting Hive Orc tables to data source tables anymore.
[GitHub] spark pull request #14282: [SPARK-16628][SQL] Don't convert Orc Metastore ta...
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/14282
[GitHub] spark pull request #16135: [SPARK-18700][SQL] Add ReadWriteLock for each tab...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/16135#discussion_r91455760 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala --- @@ -95,7 +95,7 @@ private[sql] class HiveSessionCatalog( } def invalidateCache(): Unit = { -metastoreCatalog.cachedDataSourceTables.invalidateAll() +metastoreCatalog.invalidateAllCache() --- End diff -- Why this change?
[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16209 **[Test build #69851 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69851/consoleFull)** for PR 16209 at commit [`6eec6ca`](https://github.com/apache/spark/commit/6eec6ca63c5641d1c9958bdd300ac079d5cf). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16209 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69851/ Test FAILed.
[GitHub] spark pull request #16135: [SPARK-18700][SQL] Add ReadWriteLock for each tab...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/16135#discussion_r91455755 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala --- @@ -105,7 +105,8 @@ private[sql] class HiveSessionCatalog( // For testing only private[hive] def getCachedDataSourceTable(table: TableIdentifier): LogicalPlan = { val key = metastoreCatalog.getQualifiedTableName(table) -metastoreCatalog.cachedDataSourceTables.getIfPresent(key) +metastoreCatalog.readLock(key, + metastoreCatalog.cachedDataSourceTables.getIfPresent(key)) --- End diff -- Why a read lock here?
[GitHub] spark issue #14365: [SPARK-16628][SQL] Translate file-based relation schema ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14365 @cloud-fan @yhuai @dongjoon-hyun I've updated this as follows: * Assume the metastore schema matches the physical Orc schema by column position, disregarding column names. * Map the required schema to the columns in the physical Orc schema. * If the lengths or data types of the metastore schema and the physical schema do not match, throw an exception suggesting users disable `spark.sql.hive.convertMetastoreOrc`. Please let me know what you think about this approach. Thanks.
[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16209 Merged build finished. Test FAILed.
[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16209 **[Test build #69851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69851/consoleFull)** for PR 16209 at commit [`6eec6ca`](https://github.com/apache/spark/commit/6eec6ca63c5641d1c9958bdd300ac079d5cf).
[GitHub] spark issue #14365: [SPARK-16628][SQL] Translate file-based relation schema ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14365 **[Test build #69852 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69852/consoleFull)** for PR 14365 at commit [`2bb8368`](https://github.com/apache/spark/commit/2bb836868dec18c5f214a2bef45664a22124885e).
[GitHub] spark pull request #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC d...
GitHub user sureshthalamati opened a pull request: https://github.com/apache/spark/pull/16209 [WIP][SPARK-10849][SQL] Adds option to the JDBC data source for user to specify database column type for the create table ## What changes were proposed in this pull request? Currently the JDBC data source creates tables in the target database using the default type mapping and the JDBC dialect mechanism. If users want to specify a different database data type for only some of the columns, there is no option available. In scenarios where the default mapping does not work, users are forced to create tables in the target database before writing. This workaround is probably not acceptable from a usability point of view. This PR provides a user-defined type mapping for specific columns. The solution is to allow users to specify the database column data type for the create table as a JDBC data source option (`createTableColumnTypes`) on write. Data type information can be specified as key (column name)-value (data type) pairs in JSON (e.g. {"name":"varchar(128)", "comments":"clob(20k)"}). Users can use org.apache.spark.sql.types.MetadataBuilder to build the metadata and generate the JSON string required for this option. Example: ```Scala val mdb = new MetadataBuilder() mdb.putString("name", "VARCHAR(128)") mdb.putString("comments", "CLOB(20K)") val createTableColTypes = mdb.build().json df.write.option("createTableColumnTypes", createTableColTypes).jdbc(url, "TEST.DBCOLTYPETEST", properties) ``` An alternative approach is to add a new column metadata property to the JDBC data source for users to specify the database column type using the metadata. TODO: Case-insensitive column name lookup based on the spark.sql.caseSensitive property value. ## How was this patch tested?
Added a new test case to the JDBCWriteSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sureshthalamati/spark jdbc_custom_dbtype_option_json-spark-10849

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16209.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16209

commit 6eec6ca63c5641d1c9958bdd300ac079d5cf
Author: sureshthalamati
Date: 2016-12-02T23:22:17Z

    Adding a new option to the JDBC data source to allow users to specify create table column types when the table is created on write
[GitHub] spark issue #16193: [SPARK-18766] [SQL] Push Down Filter Through BatchEvalPy...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16193 **[Test build #69850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69850/consoleFull)** for PR 16193 at commit [`2c3b917`](https://github.com/apache/spark/commit/2c3b91738fae8286525cabb24c386503a570448b).
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add ReadWriteLock for each table's re...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/16135 I guess the large number of lock sites is confusing me. We only want to prevent concurrent instantiation of a single table, so shouldn't you only need 1 lock for that site? Also, we should have a unit test that tries to concurrently read from a table from many threads, and verifies via the catalog metrics that it is only loaded once (see `TablePerfStatsSuite` for how to access the metrics).
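The "one lock per table, load at most once" behavior asked for above can be sketched with a single `ConcurrentHashMap.computeIfAbsent` per table. This is only an illustrative sketch, not the PR's code; `TableCache`, `getOrLoad`, and the string payload are hypothetical names standing in for the real catalog structures.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical cache: table name -> loaded partition metadata (a String here).
object TableCache {
  private val loadCount = new AtomicInteger(0)
  private val cache = new ConcurrentHashMap[String, String]()

  // computeIfAbsent runs the loader atomically, at most once per key,
  // even when many threads request the same table concurrently.
  def getOrLoad(table: String): String =
    cache.computeIfAbsent(table, _ => {
      loadCount.incrementAndGet()
      s"partitions-of-$table" // stands in for the expensive metadata load
    })

  def loads: Int = loadCount.get()
}

object Demo extends App {
  val threads = (1 to 8).map(_ => new Thread(() => TableCache.getOrLoad("t1")))
  threads.foreach(_.start()); threads.foreach(_.join())
  println(TableCache.loads) // 1: eight readers, one load
}
```

A unit test along the lines ericl suggests would spawn the reader threads and then assert on the load counter, which is essentially what `Demo` does.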
[GitHub] spark issue #16208: [WIP][SPARK-10849][SQL] Adds a new column metadata prope...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16208 **[Test build #69849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69849/consoleFull)** for PR 16208 at commit [`3834903`](https://github.com/apache/spark/commit/38349033a306a733e83975ca09b6cf8a8d69d397).
[GitHub] spark issue #16193: [SPARK-18766] [SQL] Push Down Filter Through BatchEvalPy...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16193 Retest this please
[GitHub] spark pull request #16208: [WIP][SPARK-10849][SQL] Adds a new column metadat...
GitHub user sureshthalamati opened a pull request: https://github.com/apache/spark/pull/16208 [WIP][SPARK-10849][SQL] Adds a new column metadata property to the jdbc data source for users to specify database column type using the metadata

## What changes were proposed in this pull request?

Currently the JDBC data source creates tables in the target database using the default type mapping and the JDBC dialect mechanism. If users want to specify a different database data type for only some of the columns, there is no option available. In scenarios where the default mapping does not work, users are forced to create tables on the target database before writing. This workaround is probably not acceptable from a usability point of view. This PR provides a user-defined type mapping for specific columns. The solution is based on the existing Redshift connector (https://github.com/databricks/spark-redshift#setting-a-custom-column-type). We add a new column metadata property to the JDBC data source for users to specify the database column type using the metadata. Example:

```Scala
val nvarcharMd = new MetadataBuilder().putString("createTableColumnType", "NVARCHAR(123)").build()
val newDf = df.withColumn("name", col("name"), nvarcharMd)
newDf.write.mode(SaveMode.Overwrite).jdbc(url, "TEST.USERDBTYPETEST", properties)
```

One restriction with this approach: metadata modification is unsupported in the Python, SQL, and R language APIs. Users have to create a new data frame to specify the metadata with the _createTableColumnType_ property. An alternative approach is to add a JDBC data source option for users to specify database column type information as a JSON string. TODO: Documentation for specifying the database column type

## How was this patch tested?
Added a new test case to the JDBCWriteSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sureshthalamati/spark jdbc_custom_dbtype-spark-10849

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16208.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16208

commit 38349033a306a733e83975ca09b6cf8a8d69d397
Author: sureshthalamati
Date: 2016-12-02T23:22:17Z

    [SPARK-10849][SQL] Add new jdbc datasource metadata property to allow users to specify database column type when creating table on write.
[GitHub] spark pull request #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16149#discussion_r91453574

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---

```
@@ -215,6 +215,7 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val
    * Sets the value of param [[weightCol]].
    * If this is not set or empty, we treat all instance weights as 1.0.
    * Default is not set, so all instances have weight one.
+   * In the Binomial model, weights correspond to number of trials.
```

--- End diff --

We should note that the weights should therefore be integers and that they'll be rounded if they are not. Also say "Binomial family" instead of "Binomial model."
[GitHub] spark pull request #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16149#discussion_r91453386

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---

```
@@ -715,7 +715,7 @@ class GeneralizedLinearRegressionSuite
     val datasetWithWeight = Seq(
       Instance(1.0, 1.0, Vectors.dense(0.0, 5.0).toSparse),
       Instance(0.5, 2.0, Vectors.dense(1.0, 2.0)),
-      Instance(1.0, 3.0, Vectors.dense(2.0, 1.0)),
+      Instance(1.0, 0.3, Vectors.dense(2.0, 1.0)),
```

--- End diff --

Hm, so I know this is a pain, but we have special handling implemented for the weight = 0 case, and we never test it. I think we should add a test.
[GitHub] spark pull request #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16149#discussion_r91453428

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---

```
@@ -468,11 +469,7 @@ object GeneralizedLinearRegression extends DefaultParamsReadable[GeneralizedLine
   override def variance(mu: Double): Double = mu * (1.0 - mu)

   private def ylogy(y: Double, mu: Double): Double = {
-    if (y == 0) {
-      0.0
-    } else {
-      y * math.log(y / mu)
-    }
+    if (y == 0) 0.0 else y * math.log(y / mu)
```

--- End diff --

Well the entire thing can go on one line, but only change it if you make another commit since it's trivial.
[GitHub] spark pull request #16203: [SPARK-18774][Core][SQL]Ignore non-existing files...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16203
[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16207 Yea i'd say in general that `--allow-empty` be there.
[GitHub] spark issue #16203: [SPARK-18774][Core][SQL]Ignore non-existing files when i...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16203 Actually this doesn't merge cleanly in branch-2.1.
[GitHub] spark issue #16203: [SPARK-18774][Core][SQL]Ignore non-existing files when i...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16203 Merging in master/branch-2.1.
[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16207 Yes I have to edit the merge script to `git commit --allow-empty ...` but I don't know that we should always set it. It could prompt or something but I was too lazy to implement that. Anyway that sounds fine and I have my own list of PRs to close that I'll 'flush' with a commit soon too.
[GitHub] spark issue #16202: [SPARK-18662][hotfix] Add new resource-managers director...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16202 Was mesos never included?
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12064 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69845/ Test PASSed.
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12064 Merged build finished. Test PASSed.
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12064 **[Test build #69845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69845/consoleFull)** for PR 12064 at commit [`4458a5f`](https://github.com/apache/spark/commit/4458a5f0d16b14095360ec3e2afec6d1db912c7d).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14640: [SPARK-17055] [MLLIB] add groupKFold to CrossVali...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14640
[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16187 **[Test build #69848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69848/consoleFull)** for PR 16187 at commit [`767ff2f`](https://github.com/apache/spark/commit/767ff2f6c3d960a68417757f1f7110b5376ac01d).
[GitHub] spark pull request #15917: SPARK-18252: Using RoaringBitmap for bloom filter...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15917
[GitHub] spark pull request #15689: [SPARK-9487] Use the same num. worker threads in ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15689
[GitHub] spark pull request #16188: Branch 1.6 decision tree
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16188
[GitHub] spark pull request #16206: Branch 2.0
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16206
[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16207 Sure, thanks!
[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16207 BTW, I believe we should add `--allow-empty` when it is merged.
[GitHub] spark pull request #16207: [BUILD] Closing some stale/inappropriate PRs
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/16207
[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16207 For some reason I couldn't merge this one. I pushed a commit directly to master. Can you close this one now?
[GitHub] spark issue #16187: [SPARK-18760][SQL] Consistent format specification for F...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16187 retest this please
[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16207 Merging in master.
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user eyalfa commented on the issue: https://github.com/apache/spark/pull/16043 @hvanhovell, can you please comment on the latest changes? @gatorsmile, @HyukjinKwon, I think I've sorted out most of the formatting issues you guys mentioned; please let me know if I missed anything or introduced new ones in the latest push.
[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16207 **[Test build #69847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69847/consoleFull)** for PR 16207 at commit [`c51011c`](https://github.com/apache/spark/commit/c51011c6a3f1d060c0d767b1d9115c64dcfaa447).
[GitHub] spark issue #16207: [BUILD] Closing some stale/inappropriate PRs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16207 cc @srowen Could you take a look and see if they are reasonable please?
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add ReadWriteLock for each table's re...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/16135 @ericl Thanks for your review.

> Is it sufficient to lock around the catalog.filterPartitions(Nil)?

Yes, this patch is ported from 1.6.2 and I missed the diff here. Fixed in the next patch.

> Why do we need reader locks?

Writing to or invalidating the table cache happens far less often than reading it. Readers wait while the same table's cache is being written.
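The reader/writer protocol described in this thread (many concurrent readers, writers exclusive while refreshing a table's cache) can be sketched with `java.util.concurrent.locks.ReentrantReadWriteLock`. This is a minimal illustration under assumed names (`TableLocks`, `withRead`, `withWrite`), not the PR's actual implementation:

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.locks.ReentrantReadWriteLock

// Hypothetical per-table read/write locks: readers share the lock, while a
// thread refreshing (writing) the cached relation excludes everyone else.
object TableLocks {
  private val locks = new ConcurrentHashMap[String, ReentrantReadWriteLock]()

  private def lockFor(table: String): ReentrantReadWriteLock =
    locks.computeIfAbsent(table, _ => new ReentrantReadWriteLock())

  def withRead[T](table: String)(body: => T): T = {
    val l = lockFor(table).readLock(); l.lock()
    try body finally l.unlock()
  }

  def withWrite[T](table: String)(body: => T): T = {
    val l = lockFor(table).writeLock(); l.lock()
    try body finally l.unlock()
  }
}
```

Usage would look like `TableLocks.withRead("db.t")(lookupCachedRelation())` on the read path and `TableLocks.withWrite("db.t")(refreshCache())` when invalidating, so reads of one table never block reads of another.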
[GitHub] spark pull request #16207: [BUILD] Closing some stale/inappropriate PRs
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/16207 [BUILD] Closing some stale/inappropriate PRs

## What changes were proposed in this pull request?

This PR proposes to close some stale PRs and ones suggested to be closed by committer(s) or obviously inappropriate PRs (e.g. branch to branch).

Closes #15689
Closes #14640
Closes #15917
Closes #16188
Closes #16206

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark closing-some-prs

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16207.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16207

commit c51011c6a3f1d060c0d767b1d9115c64dcfaa447
Author: hyukjinkwon
Date: 2016-12-08T06:16:16Z

    Closing some PRs
[GitHub] spark issue #16205: [SPARK-18776][SS] Make Offset for FileStreamSource corre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16205 Merged build finished. Test PASSed.
[GitHub] spark issue #16205: [SPARK-18776][SS] Make Offset for FileStreamSource corre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16205 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69841/ Test PASSed.
[GitHub] spark issue #16205: [SPARK-18776][SS] Make Offset for FileStreamSource corre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16205 **[Test build #69841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69841/consoleFull)** for PR 16205 at commit [`5dda0f3`](https://github.com/apache/spark/commit/5dda0f3b18ed52b2cd89b52d3427bae63cdc866b).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class FileStreamSourceOffset(logOffset: Long) extends Offset`
[GitHub] spark pull request #16199: [SPARK-18772][SQL] NaN/Infinite float parsing in ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16199#discussion_r91450072
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -1764,4 +1764,37 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
     val df2 = spark.read.option("PREfersdecimaL", "true").json(records)
     assert(df2.schema == schema)
   }
+
+  test("SPARK-18772: Special floats") {
+    val records = sparkContext
--- End diff --
I think it would be nicer if it had some roundtrip tests in reading and writing.
[GitHub] spark issue #16199: [SPARK-18772][SQL] NaN/Infinite float parsing in JSON is...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16199 @NathanHowell, while tracking down the history, I found a similar PR including this in https://github.com/apache/spark/pull/9759/files#diff-8affe5ec7d691943a88e43eb30af656e (this seems to have been reverted due to conflicts in `dev/deps/spark-deps-hadoop*`, which are not related to this PR). Would it make sense to take the valid changes out of there? It seems safe to follow it, as the changes there were checked by several reviewers.
[GitHub] spark issue #16195: [Spark-18765] [CORE] Make values for spark.yarn.{am|driv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16195 **[Test build #69846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69846/consoleFull)** for PR 16195 at commit [`10e0c75`](https://github.com/apache/spark/commit/10e0c7522bc6e8ca1c2e45240374db61bf7e5138).
[GitHub] spark issue #16206: Branch 2.0
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16206 @ming616 please close this
[GitHub] spark issue #16206: Branch 2.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16206 Can one of the admins verify this patch?
[GitHub] spark issue #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility - crea...
Github user vijoshi commented on the issue: https://github.com/apache/spark/pull/16119 @holdenk test case added
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12064 **[Test build #69845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69845/consoleFull)** for PR 12064 at commit [`4458a5f`](https://github.com/apache/spark/commit/4458a5f0d16b14095360ec3e2afec6d1db912c7d).
[GitHub] spark pull request #16206: Branch 2.0
GitHub user ming616 opened a pull request: https://github.com/apache/spark/pull/16206

Branch 2.0

## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)

## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16206.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16206

commit fffcec90b65047c3031c2b96679401f8fbef6337
Author: Shixiong Zhu
Date: 2016-09-14T20:33:51Z

[SPARK-17463][CORE] Make CollectionAccumulator and SetAccumulator's value can be read thread-safely

## What changes were proposed in this pull request?
Make CollectionAccumulator and SetAccumulator's value can be read thread-safely to fix the ConcurrentModificationException reported in [JIRA](https://issues.apache.org/jira/browse/SPARK-17463).

## How was this patch tested?
Existing tests.

Author: Shixiong Zhu
Closes #15063 from zsxwing/SPARK-17463.
(cherry picked from commit e33bfaed3b160fbc617c878067af17477a0044f5)
Signed-off-by: Josh Rosen

commit bb2bdb44032d2e71832b3e0e771590fb2225e4f3
Author: Xing SHI
Date: 2016-09-14T20:46:46Z

[SPARK-17465][SPARK CORE] Inappropriate memory management in `org.apache.spark.storage.MemoryStore` may lead to memory leak

The expression like `if (memoryMap(taskAttemptId) == 0) memoryMap.remove(taskAttemptId)` in method `releaseUnrollMemoryForThisTask` and `releasePendingUnrollMemoryForThisTask` should be called after release memory operation, whatever `memoryToRelease` is > 0 or not.
If the memory of a task has been set to 0 when calling a `releaseUnrollMemoryForThisTask` or a `releasePendingUnrollMemoryForThisTask` method, the key in the memory map corresponding to that task will never be removed from the hash map. See the details in [SPARK-17465](https://issues.apache.org/jira/browse/SPARK-17465).

Author: Xing SHI
Closes #15022 from saturday-shi/SPARK-17465.

commit 5c2bc8360019fb08e2e62e50bb261f7ce19b231e
Author: codlife <1004910...@qq.com>
Date: 2016-09-15T08:38:13Z

[SPARK-17521] Error when I use sparkContext.makeRDD(Seq())

## What changes were proposed in this pull request?
when i use sc.makeRDD below

```
val data3 = sc.makeRDD(Seq())
println(data3.partitions.length)
```

I got an error: Exception in thread "main" java.lang.IllegalArgumentException: Positive number of slices required

We can fix this bug just modify the last line, do a check of seq.size

```
def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
  assertNotStopped()
  val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
  new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, defaultParallelism), indexToPrefs)
}
```

## How was this patch tested?
manual tests (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Author: codlife <1004910...@qq.com>
Author: codlife
Closes #15077 from codlife/master.
(cherry picked from commit 647ee05e5815bde361662a9286ac602c44b4d4e6)
Signed-off-by: Sean Owen

commit a09c258c9a97e701fa7650cc0651e3c6a7a1cab9
Author: junyangq
Date: 2016-09-15T17:00:36Z

[SPARK-17317][SPARKR] Add SparkR vignette to branch 2.0

## What changes were proposed in this pull request?
This PR adds SparkR vignette to branch 2.0, which works as a friendly guidance going through the functionality provided by SparkR.

## How was this patch tested?
R unit test.

Author: junyangq
Author: Shivaram Venkataraman
Author: Junyang Qian
Closes #15100 from junyangq/SPARKR-vignette-2.0.
commit e77a437d292ecda66163a895427d62e4f72e2a25
Author: Josh Rosen
Date: 2016-09-15T18:22:58Z

[SPARK-17547] Ensure temp shuffle data file is cleaned up after error

SPARK-8029 (#9610) modified shuffle writers to first stage their data
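The SPARK-17521 fix quoted in the PR description above guards the slice count with `math.max(seq.size, defaultParallelism)`, so an empty `Seq` no longer hits "Positive number of slices required". A minimal stand-alone sketch of that guard (plain Python, not the actual Spark internals) could look like:

```python
def num_slices(seq_len, default_parallelism):
    # ParallelCollectionRDD rejects a non-positive slice count, so the fix
    # falls back to defaultParallelism when the input Seq is empty.
    slices = max(seq_len, default_parallelism)
    if slices < 1:
        raise ValueError("Positive number of slices required")
    return slices

assert num_slices(0, 8) == 8    # empty Seq: falls back to defaultParallelism
assert num_slices(16, 8) == 16  # non-empty Seq: driven by the element count
```

The function name and signature here are illustrative only; in Spark the guard lives inside the `makeRDD` overload shown in the commit message.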
[GitHub] spark issue #16195: [Spark-18765] [CORE] Make values for spark.yarn.{am|driv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16195 **[Test build #69844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69844/consoleFull)** for PR 16195 at commit [`6c21436`](https://github.com/apache/spark/commit/6c21436b3c60245a5c7a679a9bf4844c7092ea5d).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16195: [Spark-18765] [CORE] Make values for spark.yarn.{am|driv...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16195 Merged build finished. Test FAILed.
[GitHub] spark issue #16195: [Spark-18765] [CORE] Make values for spark.yarn.{am|driv...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16195 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69844/ Test FAILed.
[GitHub] spark issue #16195: [Spark-18765] [CORE] Make values for spark.yarn.{am|driv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16195 **[Test build #69844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69844/consoleFull)** for PR 16195 at commit [`6c21436`](https://github.com/apache/spark/commit/6c21436b3c60245a5c7a679a9bf4844c7092ea5d).
[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16204 **[Test build #69843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69843/consoleFull)** for PR 16204 at commit [`3199f8f`](https://github.com/apache/spark/commit/3199f8f9265e5d324c50998523a4c85a3590a39c).
[GitHub] spark issue #16195: [Spark-18765] [CORE] Make values for spark.yarn.{am|driv...
Github user daisukebe commented on the issue: https://github.com/apache/spark/pull/16195 Per @vanzin's suggestion:
- revised the code style,
- added a new default variable,
- and also fixed the warning.
[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16173 OK, can someone else tell whether it is reasonable to re-add the conf now?
[GitHub] spark pull request #16157: [SPARK-18723][DOC] Expanded programming guide inf...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16157#discussion_r91445291
--- Diff: docs/programming-guide.md ---
@@ -347,7 +347,7 @@ Some notes on reading files with Spark:
 Apart from text files, Spark's Scala API also supports several other data formats:
-* `SparkContext.wholeTextFiles` lets you read a directory containing multiple small text files, and returns each of them as (filename, content) pairs. This is in contrast with `textFile`, which would return one record per line in each file.
+* `SparkContext.wholeTextFiles` lets you read a directory containing multiple small text files, and returns each of them as (filename, content) pairs. This is in contrast with `textFile`, which would return one record per line in each file. It takes an optional second argument for controlling the minimal number of partitions (by default this is 2). It uses [CombineFileInputFormat](https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.html) internally in order to process large numbers of small files effectively by grouping files on the same executor into a single partition. This can lead to sub-optimal partitioning when the file sets would benefit from residing in multiple partitions (e.g., larger partitions would not fit in memory, files are replicated but a large subset is locally reachable from a single executor, subsequent transformations would benefit from multi-core processing). In those cases, set the `minPartitions` argument to enforce splitting.
--- End diff --
Every element of the result is a file; it's fundamentally different from `textFile`, and can't be that each file therefore ends up in a partition. I don't think this merges files, right?
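To make the contrast under discussion concrete: `wholeTextFiles` returns one `(filename, content)` pair per file, while `textFile` returns one record per line across files. A toy in-memory model of the two behaviors (plain Python, not the Spark API) behaves like:

```python
# Hypothetical stand-in for a directory of small text files.
files = {"part-0.txt": "line1\nline2", "part-1.txt": "line3"}

def text_file(files):
    # textFile-style: one record per line, across all files.
    return [line for content in files.values() for line in content.splitlines()]

def whole_text_files(files):
    # wholeTextFiles-style: one (filename, content) pair per file.
    return sorted(files.items())

assert len(text_file(files)) == 3         # three line records
assert len(whole_text_files(files)) == 2  # two (filename, content) pairs
```

This only models the record shape, not partitioning; how the pairs are grouped into partitions is exactly the `CombineFileInputFormat` question the review comment raises.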
[GitHub] spark issue #16173: [SPARK-18742][CORE]readd spark.broadcast.factory conf to...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16173 Yeah those were Spark classes. This property doesn't seem to be used now. It's possible to restore this but I don't know if it's intended now. Yes I suppose you could update the comment instead, but it doesn't seem like a big deal.
[GitHub] spark issue #14365: [SPARK-16628][SQL] Translate file-based relation schema ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14365 We have two options. The first is to map the metastore schema to the physical Orc schema like this. But we don't infer the physical schema of the Orc file now; I will update this to have this mapping in OrcFileFormat. The other is like #14282. But as we don't infer the schema from the Orc file now, we can't disable the conversion when the mismatch is detected. One possibility is to throw an exception in OrcFileFormat when detecting the mismatch before reading, and show a message asking the user to disable `spark.sql.hive.convertMetastoreOrc`. @cloud-fan @yhuai @dongjoon-hyun What do you think?
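For context, the conversion being discussed is governed by the `spark.sql.hive.convertMetastoreOrc` flag; setting it to `false` (for example in `spark-defaults.conf`) makes Spark fall back to the Hive SerDe for metastore Orc tables, which is the workaround the proposed error message would point users at:

```
spark.sql.hive.convertMetastoreOrc  false
```

The same flag can also be toggled per session, e.g. `spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")`.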
[GitHub] spark issue #14365: [SPARK-16628][SQL] Translate file-based relation schema ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14365 @dongjoon-hyun yeah, I see. Because we directly use the metastore schema of the converted Orc table, this issue happens when the physical schema in the Orc file and the metastore schema mismatch.
[GitHub] spark issue #16201: [SPARK-3359][DOCS] Fix greater-than symbols in Javadoc t...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16201 I see, so this is just another case where changes will keep breaking this. We do need a build that can run this at some point soon here to catch it. But yes, just keep fixing. I would just write "Must be at least 1" instead of "Must be >= 1".
[GitHub] spark issue #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers example cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69842/ Test PASSed.
[GitHub] spark issue #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers example cod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16148 **[Test build #69842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69842/consoleFull)** for PR 16148 at commit [`ac89b1c`](https://github.com/apache/spark/commit/ac89b1c317eb0e1c090e61bb5c144b0481dd533b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16148: [SPARK-18325][SparkR][ML] SparkR ML wrappers example cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16148 Merged build finished. Test PASSed.
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @sethah @srowen I have added a comment to the weightCol doc for the Binomial case. I also updated to test the case `weight < 0.5`, i.e., `round(weight) = 0`. All tests passed.