[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r232166370

--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
             column(jc)
           })
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+          function(x, ...) {
+            if (class(x) == "character") {
+              col <- callJStatic("org.apache.spark.sql.functions", "lit", x)
+            } else {
+              col <- x@jc
--- End diff --

hm.. why not just support string then? it's kinda very odd usage in R `schema_of_csv(lit("Amsterdam,2018"))`

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22987 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98636/ Test PASSed.
[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22987 Merged build finished. Test PASSed.
[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22987 **[Test build #98636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98636/testReport)** for PR 22987 at commit [`471092d`](https://github.com/apache/spark/commit/471092d417666f5cf8908318aed098d6f06c4900). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22955 **[Test build #98644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98644/testReport)** for PR 22955 at commit [`38b1555`](https://github.com/apache/spark/commit/38b15552995355d5e00186fb2b332928a83d248a).
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22955 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4882/ Test PASSed.
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22955 Merged build finished. Test PASSed.
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22955#discussion_r232163956

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers._
+
+import org.apache.spark.api.python.PythonEvalType
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.PythonUDF
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.internal.SQLConf._
+import org.apache.spark.sql.types.BooleanType
+
+class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches =
+      Batch("Extract PythonUDF From JoinCondition", Once,
+        PullOutPythonUDFInJoinCondition) ::
+      Batch("Check Cartesian Products", Once,
+        CheckCartesianProducts) :: Nil
+  }
+
+  val testRelationLeft = LocalRelation('a.int, 'b.int)
+  val testRelationRight = LocalRelation('c.int, 'd.int)
+
+  // Dummy python UDF for testing. Unable to execute.
+  val pythonUDF = PythonUDF("pythonUDF", null,
+    BooleanType,
+    Seq.empty,
+    PythonEvalType.SQL_BATCHED_UDF,
+    udfDeterministic = true)
+
+  val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
+
+  test("inner join condition with python udf only") {
+    val query = testRelationLeft.join(
+      testRelationRight,
+      joinType = Inner,
+      condition = Some(pythonUDF))
+    val expected = testRelationLeft.join(
+      testRelationRight,
+      joinType = Inner,
+      condition = None).where(pythonUDF).analyze
+
+    // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
+    val exception = the [AnalysisException] thrownBy {
+      Optimize.execute(query.analyze)
+    }
+    assert(exception.message.startsWith("Detected implicit cartesian product"))
+
+    // pull out the python udf while set spark.sql.crossJoin.enabled=true
+    withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {
+      val optimized = Optimize.execute(query.analyze)
+      comparePlans(optimized, expected)
+    }
+  }
+
+  test("left semi join condition with python udf only") {
+    val query = testRelationLeft.join(
+      testRelationRight,
+      joinType = LeftSemi,
+      condition = Some(pythonUDF))
+    val expected = testRelationLeft.join(
+      testRelationRight,
+      joinType = Inner,
+      condition = None).where(pythonUDF).select('a, 'b).analyze
+
+    // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
+    val exception = the [AnalysisException] thrownBy {
+      Optimize.execute(query.analyze)
+    }
+    assert(exception.message.startsWith("Detected implicit cartesian product"))
+
+    // pull out the python udf while set spark.sql.crossJoin.enabled=true
+    withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {
+      val optimized = Optimize.execute(query.analyze)
+      comparePlans(optimized, expected)
+    }
+  }
+
+  test("python udf with other common condition") {
--- End diff --

Thanks, add more cases in 38b1555.
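The "inner join condition with python udf only" case in the diff above expects a join whose only condition is the UDF to become a condition-less join followed by a filter on that UDF. A minimal sketch of why the two shapes should return the same rows, using plain Scala collections rather than Catalyst plans; `PullOutUdfSketch`, `udf`, `joinWithUdf` and `crossJoinThenFilter` are hypothetical names introduced here for illustration and are not part of the PR:

```scala
// Illustrative sketch only: Scala collections stand in for Catalyst logical plans.
object PullOutUdfSketch {
  type Row = (Int, Int)

  // Hypothetical stand-in for the dummy pythonUDF used in the test suite.
  def udf(l: Row, r: Row): Boolean = l._1 == r._1

  // Inner join whose only condition is the UDF.
  def joinWithUdf(left: Seq[Row], right: Seq[Row]): Seq[(Row, Row)] =
    for (l <- left; r <- right if udf(l, r)) yield (l, r)

  // Join with no condition (a cartesian product), then a filter on the UDF.
  def crossJoinThenFilter(left: Seq[Row], right: Seq[Row]): Seq[(Row, Row)] = {
    val cross = for (l <- left; r <- right) yield (l, r)
    cross.filter { case (l, r) => udf(l, r) }
  }

  def main(args: Array[String]): Unit = {
    val left = Seq((1, 2), (3, 4))
    val right = Seq((1, 5), (7, 8))
    // Both shapes select the same pairs of rows.
    assert(joinWithUdf(left, right) == crossJoinThenFilter(left, right))
    println(joinWithUdf(left, right))
  }
}
```

The second shape goes through a full cartesian product first, which is consistent with the test expecting an AnalysisException from CheckCartesianProducts unless spark.sql.crossJoin.enabled is set to true.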
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22955#discussion_r232163715

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers._
+
+import org.apache.spark.api.python.PythonEvalType
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.PythonUDF
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.internal.SQLConf._
+import org.apache.spark.sql.types.BooleanType
+
+class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches =
+      Batch("Extract PythonUDF From JoinCondition", Once,
+        PullOutPythonUDFInJoinCondition) ::
+      Batch("Check Cartesian Products", Once,
+        CheckCartesianProducts) :: Nil
+  }
+
+  val testRelationLeft = LocalRelation('a.int, 'b.int)
+  val testRelationRight = LocalRelation('c.int, 'd.int)
+
+  // Dummy python UDF for testing. Unable to execute.
+  val pythonUDF = PythonUDF("pythonUDF", null,
+    BooleanType,
+    Seq.empty,
+    PythonEvalType.SQL_BATCHED_UDF,
+    udfDeterministic = true)
+
+  val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
--- End diff --

Thanks, done in 38b1555.
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22955#discussion_r232163787

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers._
+
+import org.apache.spark.api.python.PythonEvalType
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.PythonUDF
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.internal.SQLConf._
+import org.apache.spark.sql.types.BooleanType
+
+class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches =
+      Batch("Extract PythonUDF From JoinCondition", Once,
+        PullOutPythonUDFInJoinCondition) ::
+      Batch("Check Cartesian Products", Once,
+        CheckCartesianProducts) :: Nil
+  }
+
+  val testRelationLeft = LocalRelation('a.int, 'b.int)
+  val testRelationRight = LocalRelation('c.int, 'd.int)
+
+  // Dummy python UDF for testing. Unable to execute.
+  val pythonUDF = PythonUDF("pythonUDF", null,
+    BooleanType,
+    Seq.empty,
+    PythonEvalType.SQL_BATCHED_UDF,
+    udfDeterministic = true)
+
+  val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
+
+  test("inner join condition with python udf only") {
--- End diff --

Sorry for this, done in 38b1555.
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22955#discussion_r232163738

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers._
+
+import org.apache.spark.api.python.PythonEvalType
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.PythonUDF
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.internal.SQLConf._
+import org.apache.spark.sql.types.BooleanType
+
+class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches =
+      Batch("Extract PythonUDF From JoinCondition", Once,
+        PullOutPythonUDFInJoinCondition) ::
+      Batch("Check Cartesian Products", Once,
+        CheckCartesianProducts) :: Nil
+  }
+
+  val testRelationLeft = LocalRelation('a.int, 'b.int)
+  val testRelationRight = LocalRelation('c.int, 'd.int)
+
+  // Dummy python UDF for testing. Unable to execute.
+  val pythonUDF = PythonUDF("pythonUDF", null,
+    BooleanType,
+    Seq.empty,
+    PythonEvalType.SQL_BATCHED_UDF,
+    udfDeterministic = true)
+
+  val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
+
+  test("inner join condition with python udf only") {
+    val query = testRelationLeft.join(
+      testRelationRight,
+      joinType = Inner,
+      condition = Some(pythonUDF))
+    val expected = testRelationLeft.join(
+      testRelationRight,
+      joinType = Inner,
+      condition = None).where(pythonUDF).analyze
+
+    // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
+    val exception = the [AnalysisException] thrownBy {
--- End diff --

Thanks, done in 38b1555.
[GitHub] spark issue #22990: [SPARK-25988] [SQL] Keep names unchanged when deduplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22990 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98638/ Test FAILed.
[GitHub] spark issue #22990: [SPARK-25988] [SQL] Keep names unchanged when deduplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22990 Merged build finished. Test FAILed.
[GitHub] spark issue #22990: [SPARK-25988] [SQL] Keep names unchanged when deduplicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22990 **[Test build #98638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98638/testReport)** for PR 22990 at commit [`17b725c`](https://github.com/apache/spark/commit/17b725c79ad602df20c44cacb92e7c6abd84cdda). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22974 **[Test build #98643 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98643/testReport)** for PR 22974 at commit [`2fc7247`](https://github.com/apache/spark/commit/2fc72471b1ce0c701bae20555c6b34126ec620bc).
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22974 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4881/ Test PASSed.
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22974 Merged build finished. Test PASSed.
[GitHub] spark issue #22990: [SPARK-25988] [SQL] Keep names unchanged when deduplicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22990 **[Test build #98642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98642/testReport)** for PR 22990 at commit [`52f2b1e`](https://github.com/apache/spark/commit/52f2b1e84596c8b877c3557c9821e6d0c9948397).
[GitHub] spark issue #22990: [SPARK-25988] [SQL] Keep names unchanged when deduplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22990 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4880/ Test PASSed.
[GitHub] spark issue #22990: [SPARK-25988] [SQL] Keep names unchanged when deduplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22990 Merged build finished. Test PASSed.
[GitHub] spark pull request #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22966#discussion_r232155608

--- Diff: external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroReadBenchmark.scala ---
@@ -0,0 +1,226 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.benchmark
+
+import java.io.File
+
+import scala.util.Random
+
+import org.apache.spark.SparkConf
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.sql.{DataFrame, SparkSession}
+import org.apache.spark.sql.catalyst.plans.SQLHelper
+import org.apache.spark.sql.types._
+
+/**
+ * Benchmark to measure Avro read performance.
+ * {{{
+ * To run this benchmark:
+ * 1. without sbt: bin/spark-submit --class
+ *--jars ,
+ * 2. build/sbt "avro/test:runMain "
+ * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "avro/test:runMain "
+ * Results will be written to "benchmarks/AvroReadBenchmark-results.txt".
+ * }}}
+ */
+object AvroReadBenchmark extends BenchmarkBase with SQLHelper {
+  val conf = new SparkConf()
+  conf.set("spark.sql.avro.compression.codec", "snappy")
--- End diff --

Since this is the default value, I think we can remove line 41 ~ 49.
[GitHub] spark pull request #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22966#discussion_r232155430

--- Diff: external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroReadBenchmark.scala ---
@@ -0,0 +1,226 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.benchmark
+
+import java.io.File
+
+import scala.util.Random
+
+import org.apache.spark.SparkConf
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.sql.{DataFrame, SparkSession}
+import org.apache.spark.sql.catalyst.plans.SQLHelper
+import org.apache.spark.sql.types._
+
+/**
+ * Benchmark to measure Avro read performance.
+ * {{{
+ * To run this benchmark:
+ * 1. without sbt: bin/spark-submit --class
+ *--jars ,
+ * 2. build/sbt "avro/test:runMain "
+ * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "avro/test:runMain "
+ * Results will be written to "benchmarks/AvroReadBenchmark-results.txt".
+ * }}}
+ */
+object AvroReadBenchmark extends BenchmarkBase with SQLHelper {
--- End diff --

@gengliangwang . Can we use `SqlBasedBenchmark` for consistency?
[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22973 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98639/ Test PASSed.
[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22973 Merged build finished. Test PASSed.
[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22973 **[Test build #98639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98639/testReport)** for PR 22973 at commit [`4ca71fc`](https://github.com/apache/spark/commit/4ca71fc75d0a25ced9803372b0594ae8342b5eb9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22975: [SPARK-20156][SQL][ML][FOLLOW-UP] Java String toLowerCas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22975 **[Test build #98641 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98641/testReport)** for PR 22975 at commit [`aa5aa8e`](https://github.com/apache/spark/commit/aa5aa8e2094ded81cf13e15bd3c59beac2886f7b).
[GitHub] spark issue #22975: [SPARK-20156][SQL][ML][FOLLOW-UP] Java String toLowerCas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22975 Merged build finished. Test PASSed.
[GitHub] spark issue #22975: [SPARK-20156][SQL][ML][FOLLOW-UP] Java String toLowerCas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22975 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4879/ Test PASSed.
[GitHub] spark issue #22975: [SPARK-20156][SQL][ML][FOLLOW-UP] Java String toLowerCas...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/22975 retest this please
[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22979 **[Test build #98640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98640/testReport)** for PR 22979 at commit [`64a97a2`](https://github.com/apache/spark/commit/64a97a27e4b22e605f3b2ddfebb7eaebdebc).
[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22973 **[Test build #98639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98639/testReport)** for PR 22973 at commit [`4ca71fc`](https://github.com/apache/spark/commit/4ca71fc75d0a25ced9803372b0594ae8342b5eb9).
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22974 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98635/ Test FAILed.
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22974 Merged build finished. Test FAILed.
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22974 **[Test build #98635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98635/testReport)** for PR 22974 at commit [`90a4d54`](https://github.com/apache/spark/commit/90a4d54387fcb110b01e34a5603a3fdbe2d35731). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22990: [SPARK-25988] [SQL] Keep names unchanged when deduplicat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22990 good catch! LGTM
[GitHub] spark pull request #22990: [SPARK-25988] [SQL] Keep names unchanged when ded...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22990#discussion_r232148751

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2856,6 +2856,59 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
       checkAnswer(sql("select 26393499451 / (1e6 * 1000)"), Row(BigDecimal("26.393499451")))
     }
   }
+
+  test("self join with aliases on partitioned tables #1") {
--- End diff --

let's put the JIRA ticket number in the test name
[GitHub] spark pull request #22990: [SPARK-25988] [SQL] Keep names unchanged when ded...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22990#discussion_r232148583

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2856,6 +2856,59 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
       checkAnswer(sql("select 26393499451 / (1e6 * 1000)"), Row(BigDecimal("26.393499451")))
     }
   }
+
+  test("self join with aliases on partitioned tables #1") {
+    withTempView("tmpView1", "tmpView2") {
+      withTable("tab1", "tab2") {
+        sql(
+          """
+            |CREATE TABLE `tab1` (`col1` INT, `TDATE` DATE)
+            |USING CSV
+            |PARTITIONED BY (TDATE)
+          """.stripMargin)
+        spark.table("tab1").where("TDATE >= '2017-08-15'").createOrReplaceTempView("tmpView1")
+        sql("CREATE TABLE `tab2` (`TDATE` DATE) USING parquet")
+        sql(
+          """
+            |CREATE OR REPLACE TEMPORARY VIEW tmpView2 AS
+            |SELECT N.tdate, col1 AS aliasCol1
+            |FROM tmpView1 N
+            |JOIN tab2 Z
+            |ON N.tdate = Z.tdate
+          """.stripMargin)
+        withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "0") {
+          sql("SELECT * FROM tmpView2 x JOIN tmpView2 y ON x.tdate = y.tdate").collect()
+        }
+      }
+    }
+  }
+
+  test("self join with aliases on partitioned tables #2") {
+    withTempView("tmp") {
+      withTable("tab1", "tab2") {
+        sql(
+          """
+            |CREATE TABLE `tab1` (`EX` STRING, `TDATE` DATE)
+            |USING parquet
+            |PARTITIONED BY (tdate)
+          """.stripMargin)
+        sql("CREATE TABLE `tab2` (`TDATE` DATE) USING parquet")
+        sql(
+          """
+            |CREATE OR REPLACE TEMPORARY VIEW TMP as
+            |SELECT N.tdate, EX AS new_ex
+            |FROM tab1 N
+            |JOIN tab2 Z
+            |ON N.tdate = Z.tdate
--- End diff --

nit: `ON N.tdate = Z.tdate`
[GitHub] spark issue #22975: [SPARK-20156][SQL][ML][FOLLOW-UP] Java String toLowerCas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22975 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22975: [SPARK-20156][SQL][ML][FOLLOW-UP] Java String toLowerCas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22975 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98634/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22975: [SPARK-20156][SQL][ML][FOLLOW-UP] Java String toLowerCas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22975 **[Test build #98634 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98634/testReport)** for PR 22975 at commit [`aa5aa8e`](https://github.com/apache/spark/commit/aa5aa8e2094ded81cf13e15bd3c59beac2886f7b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22990: [SPARK-25988] [SQL] Keep names unchanged when deduplicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22990 **[Test build #98638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98638/testReport)** for PR 22990 at commit [`17b725c`](https://github.com/apache/spark/commit/17b725c79ad602df20c44cacb92e7c6abd84cdda). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22990: [SPARK-25988] [SQL] Keep names unchanged when deduplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22990 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4878/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22990: [SPARK-25988] [SQL] Keep names unchanged when deduplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22990 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22990: [SPARK-25988] [SQL] Keep names unchanged when deduplicat...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22990 cc @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22990: [SPARK-25988] [SQL] Keep names unchanged when ded...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/22990

[SPARK-25988] [SQL] Keep names unchanged when deduplicating the column names in Analyzer

## What changes were proposed in this pull request?

When the queries do not use the column names with the same case, users might hit various errors. Below is a typical test failure they can hit.

```
Expected only partition pruning predicates: ArrayBuffer(isnotnull(tdate#237), (cast(tdate#237 as string) >= 2017-08-15));
org.apache.spark.sql.AnalysisException: Expected only partition pruning predicates: ArrayBuffer(isnotnull(tdate#237), (cast(tdate#237 as string) >= 2017-08-15));
    at org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.prunePartitionsByFilter(ExternalCatalogUtils.scala:146)
    at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.listPartitionsByFilter(InMemoryCatalog.scala:560)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:925)
```

## How was this patch tested?

Added two test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark fix1283

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22990.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22990

commit 5e9f6f345b93d3370906c7b2d73ede15f4089c29
Author: gatorsmile
Date: 2018-11-09T05:27:37Z

    fix

commit 17b725c79ad602df20c44cacb92e7c6abd84cdda
Author: gatorsmile
Date: 2018-11-09T05:33:58Z

    fix
[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22987 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98633/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22987 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22987 **[Test build #98633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98633/testReport)** for PR 22987 at commit [`2da6f99`](https://github.com/apache/spark/commit/2da6f998e4ee95d6cfbf2e8258c3a160220a366c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send u...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22275 ping @HyukjinKwon and @viirya to maybe take another look at the recent changes to make this cleaner, if you are able to. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/22275#discussion_r232145973

--- Diff: python/pyspark/sql/tests.py ---
@@ -4923,6 +4923,28 @@ def test_timestamp_dst(self):
         self.assertPandasEqual(pdf, df_from_python.toPandas())
         self.assertPandasEqual(pdf, df_from_pandas.toPandas())
 
+    def test_toPandas_batch_order(self):
+
+        # Collects Arrow RecordBatches out of order in driver JVM then re-orders in Python
+        def run_test(num_records, num_parts, max_records):
+            df = self.spark.range(num_records, numPartitions=num_parts).toDF("a")
+            with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": max_records}):
+                pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
+                self.assertPandasEqual(pdf, pdf_arrow)
+
+        cases = [
+            (1024, 512, 2),  # Try large num partitions for good chance of not collecting in order
+            (512, 64, 2),    # Try medium num partitions to test out of order collection
+            (64, 8, 2),      # Try small number of partitions to test out of order collection
+            (64, 64, 1),     # Test single batch per partition
+            (64, 1, 64),     # Test single partition, single batch
+            (64, 1, 8),      # Test single partition, multiple batches
+            (30, 7, 2),      # Test different sized partitions
+        ]
--- End diff --

@holdenk, I updated the tests, please take another look when you get a chance. Thanks!
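The comment in the diff above describes batches being collected out of order and then re-ordered in Python. A minimal pure-Python sketch of that re-ordering idea follows, assuming each batch arrives tagged with its original position. The `reorder_batches` helper and the `(index, batch)` tuple layout are hypothetical and are not PySpark's implementation.

```python
from typing import List, Sequence, Tuple


def reorder_batches(arrived: Sequence[Tuple[int, List[int]]]) -> List[List[int]]:
    """Return batches sorted by the original index they were tagged with.

    `arrived` holds (original_index, batch) pairs in arrival order; this
    layout is an assumption made for illustration only.
    """
    return [batch for _, batch in sorted(arrived, key=lambda pair: pair[0])]


# Four batches of range(8), two records per batch, arriving out of order.
arrived = [(2, [4, 5]), (0, [0, 1]), (3, [6, 7]), (1, [2, 3])]
ordered = reorder_batches(arrived)
flat = [value for batch in ordered for value in batch]
print(flat)
```

Sorting by the tagged index is enough here because every batch carries its position; the cases in the test vary partition counts and batch sizes to make out-of-order arrival likely.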
[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22880 Let me take a look this weekend.
[GitHub] spark issue #22989: [SPARK-25986][Build] Banning throw new OutOfMemoryErrors
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22989 **[Test build #98637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98637/testReport)** for PR 22989 at commit [`d678751`](https://github.com/apache/spark/commit/d67875115f622082519b1dbcb1c1e34c2184b34f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22989: [SPARK-25986][Build] Banning throw new OutOfMemoryErrors
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22989 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22989: [SPARK-25986][Build] Banning throw new OutOfMemoryErrors
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22989 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4877/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22989: [SPARK-25986][Build] Banning throw new OutOfMemor...
GitHub user xuanyuanking opened a pull request: https://github.com/apache/spark/pull/22989

[SPARK-25986][Build] Banning throw new OutOfMemoryErrors

## What changes were proposed in this pull request?

Add Scala and Java lint check rules to ban the usage of `throw new OutOfMemoryError`, because it will cause the whole executor to be killed. See more details in https://github.com/apache/spark/pull/22969.

## How was this patch tested?

Local test with lint-scala and lint-java.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuanyuanking/spark SPARK-25986

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22989.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22989

commit d67875115f622082519b1dbcb1c1e34c2184b34f
Author: Yuanjian Li
Date: 2018-11-09T05:23:01Z

    banning throw new OutOfMemoryErrors
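As a rough illustration of what a lint rule of this kind checks, here is a small Python sketch that scans source text for the banned pattern. It is a stand-in written for this note, not the lint rules added by the PR; the regular expression and the `find_banned_throws` helper are assumptions.

```python
import re
from typing import List, Tuple

# Matches `throw new OutOfMemoryError` with flexible whitespace. The pattern
# is an assumption for this sketch, not the rule added by the PR.
BANNED = re.compile(r"throw\s+new\s+OutOfMemoryError\b")


def find_banned_throws(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) for each line with the banned pattern."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if BANNED.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical Scala-like snippet used only as input to the check.
sample = """\
def allocate(size: Long): Unit = {
  if (size > limit) {
    throw new OutOfMemoryError("Unable to acquire " + size + " bytes")
  }
  throw new IllegalStateException("unreachable")
}
"""
print(find_banned_throws(sample))
```

A line-based regular expression is a simple choice that reports the offending line number, at the cost of missing a `throw` that is split across lines.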
[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22966 @dongjoon-hyun I think we can merge this one first. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22976 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98632/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22976 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22976 **[Test build #98632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98632/testReport)** for PR 22976 at commit [`5acf2a4`](https://github.com/apache/spark/commit/5acf2a44ef12b1af4457f07ff1bee6476c9b27d1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22976 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22976 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98631/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22976: [SPARK-25974][SQL]Optimizes Generates bytecode for order...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22976 **[Test build #98631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98631/testReport)** for PR 22976 at commit [`b07acdb`](https://github.com/apache/spark/commit/b07acdbb95b43f3cbfdf6c5c5e42dcab828937bc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22988: [SPARK-25984][CORE][SQL][STREAMING] Remove deprecated .n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22988 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22988: [SPARK-25984][CORE][SQL][STREAMING] Remove deprecated .n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22988 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98627/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22988: [SPARK-25984][CORE][SQL][STREAMING] Remove deprecated .n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22988 **[Test build #98627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98627/testReport)** for PR 22988 at commit [`55ac7c0`](https://github.com/apache/spark/commit/55ac7c09d251ecb0ca21eac3c2fcffafe53c2960). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22987 **[Test build #98636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98636/testReport)** for PR 22987 at commit [`471092d`](https://github.com/apache/spark/commit/471092d417666f5cf8908318aed098d6f06c4900). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22987 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22987 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4876/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/22974 Not all public serializable classes need to be registered. Only those that actually need ser-deser should be registered; one important group is transformers and prediction models.
[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22987 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22987: [SPARK-25979][SQL] Window function: allow parenth...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22987#discussion_r232136609

--- Diff: sql/core/src/test/resources/sql-tests/inputs/window.sql ---
@@ -109,3 +109,9 @@ last_value(false, false) OVER w AS last_value_contain_null
 FROM testData
 WINDOW w AS ()
 ORDER BY cate, val;
+
+-- parentheses around window reference
+SELECT cate, sum(val) OVER (w)
+FROM testData
+WHERE val is not null
+WINDOW w AS (PARTITION BY cate ORDER BY val);
--- End diff --

+1
[GitHub] spark pull request #22985: [SPARK-25510][SQL][TEST][FOLLOW-UP] Remove Benchm...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22985 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/22974 I am not sure, but maybe all serializable classes need to be registered. Since `MultivariateGaussian` is a public class, I think we need to add it. I also wonder whether a test is needed. If not, I can list all the other public ones in ML in this PR.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98629/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98629/testReport)** for PR 22275 at commit [`7dc92c8`](https://github.com/apache/spark/commit/7dc92c8d0dca69e254088fd6e1f3e15da1f90fbe). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22985: [SPARK-25510][SQL][TEST][FOLLOW-UP] Remove BenchmarkWith...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22985 Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22939: [SPARK-25446][R] Add schema_of_json() and schema_of_csv(...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22939 Hey @felixcheung, it should be ready for another look. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22987: [SPARK-25979][SQL] Window function: allow parenth...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22987#discussion_r232135028

--- Diff: sql/core/src/test/resources/sql-tests/inputs/window.sql ---
@@ -109,3 +109,9 @@ last_value(false, false) OVER w AS last_value_contain_null
 FROM testData
 WINDOW w AS ()
 ORDER BY cate, val;
+
+-- parentheses around window reference
+SELECT cate, sum(val) OVER (w)
+FROM testData
+WHERE val is not null
+WINDOW w AS (PARTITION BY cate ORDER BY val);
--- End diff --

need a new line at the end.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98630/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22954: [SPARK-25981][R] Enables Arrow optimization from R DataF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22954 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98628/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22954: [SPARK-25981][R] Enables Arrow optimization from R DataF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22954 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22954: [SPARK-25981][R] Enables Arrow optimization from R DataF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22954 **[Test build #98628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98628/testReport)** for PR 22954 at commit [`2ba6add`](https://github.com/apache/spark/commit/2ba6addbcd52940ef989880bff69fe126a4dd2e1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98630/testReport)** for PR 22275 at commit [`8045fac`](https://github.com/apache/spark/commit/8045facbe523c89b91b930203bb6874d82d08a4d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22974 OK, that's the issue, yeah. Registration is an optimization. I wonder, what other classes should we add if we're going to add this one? I don't know if it needs a test. But if there are 10 other somewhat commonly-used classes that are serialized during Spark ML operations, they should be registered. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/22974 Do you mean the failure in this PR? It was caused by a non-registered field, `BDM[Double]`. `MultivariateGaussian` is used in GMM, so kryo registration should help performance. The mllib-local dependency is a separate matter: the currently kryo-registered classes, like 'ml.linalg.Vector' and 'ml.linalg.Matrix', do not have kryo tests in their test suites.
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22974 You're requiring registration, which is what makes this fail, right? Why do that? I think I'm missing something.
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/22974

@srowen The existing kryo-registration test suites need to import spark-core:

```
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

val conf = new SparkConf(false)
conf.set("spark.kryo.registrationRequired", "true")
val ser = new KryoSerializer(conf).newInstance()
```

Since mllib-local does not depend on spark-core, the current classes in mllib-local do not test kryo serialization at all. E.g. `mllib.linalg.VectorsSuite` contains the test `test("kryo class register")`, while `ml.linalg.VectorsSuite` does not have it.
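To make the "registration required" behaviour discussed in this thread concrete, here is a toy Python sketch of a registry-based serializer that rejects unregistered classes in strict mode. It is not Kryo's or Spark's API: `ToySerializer`, `RegistrationRequiredError`, and the string tags are all hypothetical, and the placeholder class only borrows the name of the class under discussion.

```python
from typing import Any, Set, Type


class RegistrationRequiredError(Exception):
    """Raised when an unregistered class is serialized in strict mode."""


class ToySerializer:
    """Toy stand-in for a registry-based serializer; not Kryo's API."""

    def __init__(self, registration_required: bool = False) -> None:
        self.registration_required = registration_required
        self._registered: Set[Type[Any]] = set()

    def register(self, cls: Type[Any]) -> None:
        self._registered.add(cls)

    def serialize(self, obj: Any) -> str:
        cls = type(obj)
        if cls in self._registered:
            # Registered classes can be written with a short tag.
            return f"id:{cls.__name__}"
        if self.registration_required:
            raise RegistrationRequiredError(f"Class is not registered: {cls.__name__}")
        # Unregistered classes fall back to a longer, fully qualified name.
        return f"name:{cls.__module__}.{cls.__qualname__}"


class MultivariateGaussian:  # placeholder named after the class discussed
    pass


strict = ToySerializer(registration_required=True)
try:
    strict.serialize(MultivariateGaussian())
    failed = False
except RegistrationRequiredError:
    failed = True

strict.register(MultivariateGaussian)
print(failed, strict.serialize(MultivariateGaussian()))
```

In the sketch, strict mode turns a missing registration into a test failure, which is the role the thread gives to `spark.kryo.registrationRequired` in the existing suites; without strict mode the serializer still works, only with a longer encoding.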
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22974 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22974 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4875/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22974 **[Test build #98635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98635/testReport)** for PR 22974 at commit [`90a4d54`](https://github.com/apache/spark/commit/90a4d54387fcb110b01e34a5603a3fdbe2d35731). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22974 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4874/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22974: [SPARK-22450][Core][MLLib][FollowUp] Safely register Mul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22974 Merged build finished. Test FAILed.
[GitHub] spark issue #22975: [SPARK-20156][SQL][ML][FOLLOW-UP] Java String toLowerCas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22975 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4873/ Test PASSed.
[GitHub] spark issue #22975: [SPARK-20156][SQL][ML][FOLLOW-UP] Java String toLowerCas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22975 Merged build finished. Test PASSed.
[GitHub] spark issue #22975: [SPARK-20156][SQL][ML][FOLLOW-UP] Java String toLowerCas...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/22975 @srowen Yes, we should preserve user input data and column names as they are. Thanks for your explanation!
[GitHub] spark issue #22975: [SPARK-20156][SQL][ML][FOLLOW-UP] Java String toLowerCas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22975 **[Test build #98634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98634/testReport)** for PR 22975 at commit [`aa5aa8e`](https://github.com/apache/spark/commit/aa5aa8e2094ded81cf13e15bd3c59beac2886f7b).
[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22987 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4872/ Test PASSed.
[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22987 Merged build finished. Test PASSed.