[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17673 @shubhamchopra are you still working on this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/13599 I'm interested in us fixing this, especially after spending several hours yesterday on workaround hacks. But I want us to do something that is not YARN-specific and does not involve a large slowdown on worker creation.
[GitHub] spark issue #21049: [SPARK-23957][SQL] Remove redundant sort operators from ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21049 @henryr Could you update the PR based on the review? We can safely drop them in scalar subqueries and nested subqueries.
[GitHub] spark issue #20908: [WIP][SPARK-23672][PYTHON] Document support for nested r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20908 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92976/ Test FAILed.
[GitHub] spark issue #20908: [WIP][SPARK-23672][PYTHON] Document support for nested r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20908 Merged build finished. Test FAILed.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92977/ Test PASSed.
[GitHub] spark issue #18139: [SPARK-20787][PYTHON] PySpark can't handle datetimes bef...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/18139 @rberenguel is this still on your radar? Also jenkins ok to test.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Merged build finished. Test PASSed.
[GitHub] spark issue #20908: [WIP][SPARK-23672][PYTHON] Document support for nested r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20908 **[Test build #92976 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92976/testReport)** for PR 20908 at commit [`f5aeafc`](https://github.com/apache/spark/commit/f5aeafc5ee474ea41cd00acbf8660957d15d5c64). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20629: [SPARK-23451][ML] Deprecate KMeans.computeCost
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/20629#discussion_r202423675 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala --- @@ -37,7 +37,7 @@ import org.apache.spark.sql.{Row, SparkSession} */ @Since("0.8.0") class KMeansModel @Since("2.4.0") (@Since("1.0.0") val clusterCenters: Array[Vector], - @Since("2.4.0") val distanceMeasure: String) + @Since("2.4.0") val distanceMeasure: String, @Since("2.4.0") val trainingCost: Double) --- End diff -- Since we changed the constructor here, and since it is not private, we should provide a similar (and deprecated) constructor without training cost which calls this with the default value.
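The compatibility shim the review asks for can be sketched in plain Scala with a simplified stand-in class (not the real `KMeansModel`; the field types and the `0.0` default are assumptions for illustration): the old public constructor survives as a deprecated auxiliary constructor that forwards a default `trainingCost`.

```scala
// Hedged sketch: a stand-in for the pattern suggested in the review.
class Model(val clusterCenters: Array[Double],
            val distanceMeasure: String,
            val trainingCost: Double) {
  // Deprecated compatibility constructor: old two-argument call sites still
  // compile, and trainingCost falls back to an assumed default of 0.0.
  @deprecated("Use the constructor that also takes trainingCost", "2.4.0")
  def this(clusterCenters: Array[Double], distanceMeasure: String) =
    this(clusterCenters, distanceMeasure, 0.0)
}

// An old call site keeps working, only emitting a deprecation warning.
val legacy = new Model(Array(1.0, 2.0), "euclidean")
```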
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 **[Test build #92977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92977/testReport)** for PR 21748 at commit [`88a9d7f`](https://github.com/apache/spark/commit/88a9d7fa94e17e55f8e28d8922cff759625b1e42). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21583 **[Test build #92980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92980/testReport)** for PR 21583 at commit [`c0b5927`](https://github.com/apache/spark/commit/c0b5927ec80853403a129e15fded372a9170a0db).
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 **[Test build #92979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92979/testReport)** for PR 21748 at commit [`88a9d7f`](https://github.com/apache/spark/commit/88a9d7fa94e17e55f8e28d8922cff759625b1e42).
[GitHub] spark pull request #21741: [SPARK-24718][SQL] Timestamp support pushdown to ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21741#discussion_r202423258 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -378,6 +378,15 @@ object SQLConf { .booleanConf .createWithDefault(true) + val PARQUET_FILTER_PUSHDOWN_TIMESTAMP_ENABLED = +buildConf("spark.sql.parquet.filterPushdown.timestamp") + .doc("If true, enables Parquet filter push-down optimization for Timestamp. " + +"This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is " + +"enabled and Timestamp stored as TIMESTAMP_MICROS or TIMESTAMP_MILLIS type.") --- End diff -- You need to explain how to use `spark.sql.parquet.outputTimestampType` to control the Parquet timestamp type Spark uses to write parquet files.
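The interaction the reviewer wants documented can be sketched as a `spark-defaults.conf` fragment (the first two keys are from the diff; `spark.sql.parquet.outputTimestampType` is the config named in the comment, with `TIMESTAMP_MICROS` shown as one assumed example value):

```
# Timestamp pushdown only takes effect when general Parquet pushdown is on,
# the timestamp flag is on, and timestamps are written as micros/millis
# rather than INT96.
spark.sql.parquet.filterPushdown            true
spark.sql.parquet.filterPushdown.timestamp  true
spark.sql.parquet.outputTimestampType       TIMESTAMP_MICROS
```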
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21583 test this please
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21748 test this please
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/931/
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/931/ Test FAILed.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Merged build finished. Test FAILed.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/931/
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20611 **[Test build #92978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92978/testReport)** for PR 20611 at commit [`9ceeb30`](https://github.com/apache/spark/commit/9ceeb30ae0f0b04ac46980c499c9c286ba68e20a).
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 **[Test build #92977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92977/testReport)** for PR 21748 at commit [`88a9d7f`](https://github.com/apache/spark/commit/88a9d7fa94e17e55f8e28d8922cff759625b1e42).
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202418834 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -222,6 +225,14 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: // See SPARK-20364. def canMakeFilterOn(name: String): Boolean = nameToType.contains(name) && !name.contains(".") +// All DataTypes that support `makeEq` can provide better performance. +def shouldConvertInPredicate(name: String): Boolean = nameToType(name) match { --- End diff -- Also need to update the benchmark suite.
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202418683 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -222,6 +225,14 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: // See SPARK-20364. def canMakeFilterOn(name: String): Boolean = nameToType.contains(name) && !name.contains(".") +// All DataTypes that support `makeEq` can provide better performance. +def shouldConvertInPredicate(name: String): Boolean = nameToType(name) match { --- End diff -- It depends on which PR will be merged first. The corresponding PRs should update this.
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202418582 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -222,6 +225,14 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: // See SPARK-20364. def canMakeFilterOn(name: String): Boolean = nameToType.contains(name) && !name.contains(".") +// All DataTypes that support `makeEq` can provide better performance. +def shouldConvertInPredicate(name: String): Boolean = nameToType(name) match { --- End diff -- Let us keep it.
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202418387 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -376,7 +374,8 @@ class ParquetFileFormat // Collects all converted Parquet filter predicates. Notice that not all predicates can be // converted (`ParquetFilters.createFilter` returns an `Option`). That's why a `flatMap` // is used here. - .flatMap(new ParquetFilters(pushDownDate, pushDownStringStartWith) + .flatMap(new ParquetFilters(pushDownDate, pushDownStringStartWith, --- End diff -- let us create `val parquetFilters = new ParquetFilters(pushDownDate, pushDownStringStartWith, pushDownInFilterThreshold)`
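The suggested refactor can be sketched with a stand-in class (names taken from the diff; the class body and toy filter logic are assumptions): hoist the constructor call into a `val` once, instead of constructing a new `ParquetFilters` inside the `flatMap`.

```scala
// Stand-in for the real ParquetFilters: the three pushdown knobs are
// threaded through the constructor once.
class ParquetFilters(pushDownDate: Boolean,
                     pushDownStartWith: Boolean,
                     pushDownInFilterThreshold: Int) {
  // Toy stand-in for createFilter, which returns an Option in the real code.
  def createFilter(name: String): Option[String] =
    if (name.nonEmpty) Some(s"eq($name)") else None
}

val pushDownDate = true
val pushDownStringStartWith = false
val pushDownInFilterThreshold = 10

// Hoisted once, as the review suggests:
val parquetFilters =
  new ParquetFilters(pushDownDate, pushDownStringStartWith, pushDownInFilterThreshold)

// flatMap drops the predicates that could not be converted (the None cases).
val converted = Seq("a", "", "b").flatMap(parquetFilters.createFilter)
```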
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21748 test this please
[GitHub] spark pull request #21757: [SPARK-24797] [SQL] respect spark.sql.hive.conver...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/21757#discussion_r202417166 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -254,13 +254,15 @@ class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case i @ InsertIntoTable(UnresolvedCatalogRelation(tableMeta), _, _, _, _) -if DDLUtils.isDatasourceTable(tableMeta) => +if DDLUtils.isDatasourceTable(tableMeta) && + DDLUtils.convertSchema(tableMeta, sparkSession) => --- End diff -- ok
[GitHub] spark pull request #21757: [SPARK-24797] [SQL] respect spark.sql.hive.conver...
Github user CodingCat closed the pull request at: https://github.com/apache/spark/pull/21757
[GitHub] spark issue #20908: [WIP][SPARK-23672][PYTHON] Document support for nested r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20908 **[Test build #92976 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92976/testReport)** for PR 20908 at commit [`f5aeafc`](https://github.com/apache/spark/commit/f5aeafc5ee474ea41cd00acbf8660957d15d5c64).
[GitHub] spark issue #20908: [WIP][SPARK-23672][PYTHON] Document support for nested r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20908 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/930/ Test PASSed.
[GitHub] spark issue #20908: [WIP][SPARK-23672][PYTHON] Document support for nested r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20908 Merged build finished. Test PASSed.
[GitHub] spark pull request #21757: [SPARK-24797] [SQL] respect spark.sql.hive.conver...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21757#discussion_r202416374 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -254,13 +254,15 @@ class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case i @ InsertIntoTable(UnresolvedCatalogRelation(tableMeta), _, _, _, _) -if DDLUtils.isDatasourceTable(tableMeta) => +if DDLUtils.isDatasourceTable(tableMeta) && + DDLUtils.convertSchema(tableMeta, sparkSession) => --- End diff -- If you are using `format("parquet")` to create a new table, it will be a data source table. We always use the native reader/writer to read/write such a table.
[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21762 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92975/ Test PASSed.
[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21762 Merged build finished. Test PASSed.
[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21762 **[Test build #92975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92975/testReport)** for PR 21762 at commit [`bb7a43c`](https://github.com/apache/spark/commit/bb7a43c8f3e34c90ebe8f0e22019c096776b6da3). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) ` * ` sealed trait CatalystDataUpdater ` * ` final class RowUpdater(row: InternalRow) extends CatalystDataUpdater ` * ` final class ArrayDataUpdater(array: ArrayData) extends CatalystDataUpdater ` * `class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable: Boolean) ` * `class IncompatibleSchemaException(msg: String, ex: Throwable = null) extends Exception(msg, ex)` * `class SerializableSchema(@transient var value: Schema)`
[GitHub] spark issue #20908: [WIP][SPARK-23672][PYTHON] Document support for nested r...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/20908 Jenkins retest this please
[GitHub] spark issue #21763: Branch 2.1
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21763 Can one of the admins verify this patch?
[GitHub] spark pull request #21757: [SPARK-24797] [SQL] respect spark.sql.hive.conver...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/21757#discussion_r202414440 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -254,13 +254,15 @@ class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case i @ InsertIntoTable(UnresolvedCatalogRelation(tableMeta), _, _, _, _) -if DDLUtils.isDatasourceTable(tableMeta) => +if DDLUtils.isDatasourceTable(tableMeta) && + DDLUtils.convertSchema(tableMeta, sparkSession) => --- End diff -- Do you mean that any table built through df.write.format("..") should be taken as a data source table, regardless of whether we register it with HMS or not?
[GitHub] spark issue #21760: [SPARK-24776][SQL]Avro unit test: use SQLTestUtils and r...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21760 This breaks the build. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.6/7842/ I need to revert it.
[GitHub] spark pull request #21763: Branch 2.1
GitHub user rajesh7738 opened a pull request: https://github.com/apache/spark/pull/21763 Branch 2.1

## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)

## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.1

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21763.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21763

commit 664c9795c94d3536ff9fe54af06e0fb6c0012862
Author: Shixiong Zhu
Date: 2017-03-04T03:00:35Z

[SPARK-19816][SQL][TESTS] Fix an issue that DataFrameCallbackSuite doesn't recover the log level

## What changes were proposed in this pull request?
"DataFrameCallbackSuite.execute callback functions when a DataFrame action failed" sets the log level to "fatal" but doesn't recover it. Hence, tests running after it won't output any logs except fatal logs. This PR uses `testQuietly` instead to avoid changing the log level.

## How was this patch tested?
Jenkins

Author: Shixiong Zhu
Closes #17156 from zsxwing/SPARK-19816.
(cherry picked from commit fbc4058037cf5b0be9f14a7dd28105f7f8151bed)
Signed-off-by: Yin Huai

commit ca7a7e8a893a30d85e4315a4fa1ca1b1c56a703c
Author: uncleGen
Date: 2017-03-06T02:17:30Z

[SPARK-19822][TEST] CheckpointSuite.testCheckpointedOperation: should not filter checkpointFilesOfLatestTime with the PATH string.

## What changes were proposed in this pull request?
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73800/testReport/

```
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 617 times over 10.003740484 seconds. Last failure message: 8 did not equal 2.
  at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
  at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
  at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
  at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:336)
  at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
  at org.apache.spark.streaming.DStreamCheckpointTester$class.generateOutput(CheckpointSuite.scala:172)
  at org.apache.spark.streaming.CheckpointSuite.generateOutput(CheckpointSuite.scala:211)
```

The check condition is:

```
val checkpointFilesOfLatestTime = Checkpoint.getCheckpointFiles(checkpointDir).filter {
  _.toString.contains(clock.getTimeMillis.toString)
}
// Checkpoint files are written twice for every batch interval. So assert that both
// are written to make sure that both of them have been written.
assert(checkpointFilesOfLatestTime.size === 2)
```

The path string may contain the `clock.getTimeMillis.toString`, like `3500`:

```
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-500
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-1000
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-1500
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-2000
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-2500
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-3000
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-3500.bk
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-3500
▲▲▲▲
```

So we should only check the filename, but not the whole path.

## How was this patch tested?
Jenkins.

Author: uncleGen
Closes #17167 from uncleGen/flaky-CheckpointSuite.
(cherry picked from commit 207067ead6db6dc87b0d144a658e2564e3280a89)
Signed-off-by: Shixiong Zhu

commit fd6c6d5c363008a229759bf628edc0f6c5e00ade
Author: Tyson Condie
Date: 2017-03-07T00:39:05Z

[SPARK-19719][SS] Kafka writer
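The SPARK-19822 fix described above can be sketched in plain Scala (paths shortened, values assumed for illustration): match the checkpoint time against the file name only, never the full path, because the temp directory name can accidentally contain the timestamp.

```scala
// The directory name "spark-20035007-..." contains the substring "3500", so a
// whole-path `contains` check matches every file, not just the two files
// (checkpoint + its .bk backup) written for time 3500.
val files = Seq(
  "file:/tmp/CheckpointSuite/spark-20035007-9891/checkpoint-500",
  "file:/tmp/CheckpointSuite/spark-20035007-9891/checkpoint-1000",
  "file:/tmp/CheckpointSuite/spark-20035007-9891/checkpoint-3500.bk",
  "file:/tmp/CheckpointSuite/spark-20035007-9891/checkpoint-3500")
val time = 3500L

// Buggy filter: every path matches, because the directory name contains "3500".
val buggy = files.filter(_.contains(time.toString))

// Fixed filter: compare against the file name only.
val fixed = files.filter(f => f.split("/").last.contains(time.toString))
```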
[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21762 Merged build finished. Test PASSed.
[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21762 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/929/ Test PASSed.
[GitHub] spark issue #21762: [SPARK-24800][SQL] Refactor Avro Serializer and Deserial...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21762 **[Test build #92975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92975/testReport)** for PR 21762 at commit [`bb7a43c`](https://github.com/apache/spark/commit/bb7a43c8f3e34c90ebe8f0e22019c096776b6da3).
[GitHub] spark pull request #21762: [SPARK-24800][SQL] Refactor Avro Serializer and D...
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21762 [SPARK-24800][SQL] Refactor Avro Serializer and Deserializer ## What changes were proposed in this pull request? Currently the Avro Deserializer converts input Avro format data to `Row`, and then converts the `Row` to `InternalRow`. Similarly, the Avro Serializer converts `InternalRow` to `Row`, and then outputs Avro format data. This PR allows direct conversion between `InternalRow` and Avro format data. Credits to @cloud-fan . ## How was this patch tested? Unit test You can merge this pull request into a Git repository by running: $ git pull https://github.com/gengliangwang/spark avro_io Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21762.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21762 commit bb7a43c8f3e34c90ebe8f0e22019c096776b6da3 Author: Gengliang Wang Date: 2018-07-13T08:18:12Z refactor avro Serializer and Deserializer
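The shape of the refactor can be sketched with hypothetical stand-in types (the real classes live in Spark and Avro; everything below is an illustration, not the PR's code): the old path pays for an intermediate `Row` on every record, while the new path converts directly.

```scala
// Hypothetical stand-ins for the three representations involved.
case class AvroRecord(fields: Seq[(String, Any)])
case class Row(values: Seq[Any])
case class InternalRow(values: Seq[Any])

// Old path: Avro -> Row -> InternalRow, one extra allocation per record.
def avroToRow(r: AvroRecord): Row = Row(r.fields.map(_._2))
def rowToInternal(r: Row): InternalRow = InternalRow(r.values)
val oldDeserialize: AvroRecord => InternalRow =
  (avroToRow _).andThen(rowToInternal _)

// New path, conceptually what the PR does: Avro -> InternalRow directly,
// skipping the intermediate Row.
val newDeserialize: AvroRecord => InternalRow =
  r => InternalRow(r.fields.map(_._2))

val rec = AvroRecord(Seq("id" -> 1, "name" -> "a"))
```

Both paths must agree on the result; the win is the removed per-record hop, not a behavior change.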
[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21720 ping @maryannxue Could you resolve the conflicts? I will review it again after that.
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add read schema suite for file-...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Thank you so much, @gatorsmile . Sure. I'll make a PR to improve error handling for that.
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add read schema suite for file-...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Also, thank you, @HyukjinKwon .
[GitHub] spark pull request #21760: [SPARK-24776][SQL]Avro unit test: use SQLTestUtil...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21760
[GitHub] spark pull request #21745: [SPARK-24781][SQL] Using a reference from Dataset...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21745
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21761 **[Test build #92974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92974/testReport)** for PR 21761 at commit [`531be9a`](https://github.com/apache/spark/commit/531be9a84ff5f2c99d3c8b7b223d8dd2cbf596cf).
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21761 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/928/ Test PASSed.
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21761 Merged build finished. Test PASSed.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21745 Thanks! Merged to master/2.3
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21761 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92973/ Test FAILed.
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21761 Merged build finished. Test FAILed.
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21761 **[Test build #92973 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92973/testReport)** for PR 21761 at commit [`cd9d0e6`](https://github.com/apache/spark/commit/cd9d0e6b76241f4eaf609ed1b5721c96f4d149b0). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21761 **[Test build #92973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92973/testReport)** for PR 21761 at commit [`cd9d0e6`](https://github.com/apache/spark/commit/cd9d0e6b76241f4eaf609ed1b5721c96f4d149b0).
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21761 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/927/ Test PASSed.
[GitHub] spark issue #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21761 Merged build finished. Test PASSed.
[GitHub] spark pull request #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21761 [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2 ## What changes were proposed in this pull request? Upgrade Apache Avro from 1.7.7 to 1.8.2. The major new features: 1. More logical types. From the spec of 1.8.2 https://avro.apache.org/docs/1.8.2/spec.html#Logical+Types we can see that, compared to [1.7.7](https://avro.apache.org/docs/1.7.7/spec.html#Logical+Types), the new version supports: - Date - Time (millisecond precision) - Time (microsecond precision) - Timestamp (millisecond precision) - Timestamp (microsecond precision) - Duration 2. Single-object encoding: https://avro.apache.org/docs/1.8.2/spec.html#single_object_encoding This PR aims to update Apache Spark to support these new features. ## How was this patch tested? Unit test You can merge this pull request into a Git repository by running: $ git pull https://github.com/gengliangwang/spark upgrade_avro_1.8 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21761.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21761 commit cd9d0e6b76241f4eaf609ed1b5721c96f4d149b0 Author: Gengliang Wang Date: 2018-07-13T09:03:56Z upgrade Apache AVRO to 1.8.2
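For reference, a logical type in Avro 1.8.2 is declared by annotating an underlying primitive type in the schema. A minimal example per the 1.8.2 spec (record and field names are illustrative; `date` is backed by `int` days since the epoch, `timestamp-millis` by `long` milliseconds):

```json
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "event_date", "type": {"type": "int", "logicalType": "date"}},
    {"name": "created_at", "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
```

Readers unaware of a logical type simply fall back to the underlying primitive type, which is what makes the annotation backward-compatible.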
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20611 Looks worth going ahead to me.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user mrow4a commented on the issue: https://github.com/apache/spark/pull/21748 Can we remove this as a part of this PR? https://github.com/apache/spark/blob/master/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L87 - it seems to set client mode by default.
[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r202363783 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- [diff context garbled in this archive: for local paths, the wildcard/existence checks are replaced by `FileContext.getLocalFSFileContext().makeQualified(new Path(path))`; for non-local paths, the "fs.defaultFS" scheme/authority resolution is condensed] --- End diff -- Hm. I was trying to understand where this logic went. I see that's sort of in the call to `makeQualified`. I couldn't find the docs for that method overload though because it's actually "LimitedPrivate" in Hadoop. I think we shouldn't call this method? Can we instead just restore this logic?
[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r202360482 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- [diff context and review comment truncated in this archive]
[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r202360294 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- [diff context and review comment truncated in this archive]
[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20611#discussion_r202359876 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- [diff context garbled in this archive: adds a test "Support wildcard character in folderlevel for LOAD DATA LOCAL INPATH" exercising a folder-level wildcard and an invalid-directory error case] --- End diff -- Still need a space here.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Merged build finished. Test PASSed.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92970/ Test PASSed.
[GitHub] spark pull request #21690: [SPARK-24713]AppMaster of spark streaming kafka O...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21690
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21102 **[Test build #92970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92970/testReport)** for PR 21102 at commit [`fce9eb0`](https://github.com/apache/spark/commit/fce9eb09bf0666711dbb5584c56b2534e495dffc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21690: [SPARK-24713]AppMaster of spark streaming kafka OOM if t...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21690 LGTM, merging to master. Thanks!
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Merged build finished. Test FAILed.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/926/
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/926/ Test FAILed.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/926/
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92972/ Test PASSed.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Merged build finished. Test PASSed.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 **[Test build #92972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92972/testReport)** for PR 21652 at commit [`1bc3d07`](https://github.com/apache/spark/commit/1bc3d070f4a92d16c4a2a5bf2876a50d1a311ba3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/925/ Test FAILed.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21652 Merged build finished. Test FAILed.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/925/
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/925/
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21652 **[Test build #92972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92972/testReport)** for PR 21652 at commit [`1bc3d07`](https://github.com/apache/spark/commit/1bc3d070f4a92d16c4a2a5bf2876a50d1a311ba3).
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21652 jenkins test this please
[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix type coercions and nullabilities ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21704 Merged build finished. Test PASSed.
[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix type coercions and nullabilities ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21704 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92967/ Test PASSed.
[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix type coercions and nullabilities ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21704 **[Test build #92967 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92967/testReport)** for PR 21704 at commit [`5115961`](https://github.com/apache/spark/commit/5115961fb0503cabbdbdead7c29c1521ab4f76cb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21184: [WIP][SPARK-24051][SQL] Replace Aliases with the ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21184#discussion_r202334300 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -284,6 +288,80 @@ class Analyzer( } } + /** + * Replaces [[Alias]] with the same exprId but different references with [[Alias]] having + * different exprIds. This is a rare situation which can cause incorrect results. + */ + object DeduplicateAliases extends Rule[LogicalPlan] { --- End diff -- Yes, that is also true. But in many places in the codebase we just compare attributes using `semanticEquals` or in some other cases, even `equals`. Well, if we admit that different attributes can have the same `exprId`, all these places should be checked in order to be sure that the same problem cannot happen there too. Moreover (this is more a nit), the `semanticEquals` or `sameRef` method itself would be wrong according to its semantic, as it may return `true` even when two attributes don't have the same reference. This is the reason why I opted for this solution, which seems to me cleaner as it solves the root cause of the problem. What do you think?
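The hazard discussed above can be sketched abstractly. This is not Spark's actual `Attribute` class, just an invented stand-in showing why an id-based `semanticEquals` conflates two genuinely different attributes once exprIds are duplicated:

```scala
// Illustrative sketch (hypothetical types, not Spark internals): an equality
// check keyed only on exprId treats two distinct attributes as "the same"
// whenever an id is accidentally reused.
object ExprIdSketch {
  final case class Attribute(name: String, exprId: Long) {
    // Mimics an id-based semantic comparison.
    def semanticEquals(other: Attribute): Boolean = exprId == other.exprId
  }

  def main(args: Array[String]): Unit = {
    val a = Attribute("a", exprId = 1L)
    val b = Attribute("b", exprId = 1L) // a different attribute reusing the id
    assert(a.semanticEquals(b)) // id-based comparison conflates them...
    assert(a != b)              // ...even though they are not the same attribute
  }
}
```

Deduplicating the ids at the source, as the comment proposes, restores the invariant that an id-based comparison can rely on: one exprId, one attribute.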
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20611 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92968/ Test PASSed.
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20611 Merged build finished. Test PASSed.
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20611 **[Test build #92968 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92968/testReport)** for PR 20611 at commit [`bee161f`](https://github.com/apache/spark/commit/bee161f07ae4f76a0f090f64ac84c39f752652ce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21652 @liyinan926 I had the same issue locally, but I don't think it is because of the PR. "[ERROR] Failed to execute goal on project spark-kubernetes-integration-tests_2.11: Could not resolve dependencies for project spark-kubernetes-integration-tests:spark-kubernetes-integration-tests_2.11:jar:2.4.0-SNAPSHOT: Failed to collect dependencies at org.apache.spark:spark-core_2.11:jar:2.4.0-SNAPSHOT: Failed to read artifact descriptor for org.apache.spark:spark-core_2.11:jar:2.4.0-SNAPSHOT: Failure to find org.apache.spark:spark-parent_2.11:pom:2.4.0-20180712.095204-165 in https://repository.apache.org/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of apache.snapshots has elapsed or updates are forced" I fixed that locally (I have my own script to run them) by doing ./build/mvn install... since the integration tests suite is run in standalone mode, it expects the parent artifact to be available. @srowen thoughts?
[GitHub] spark issue #21751: [SPARK-24208][SQL][FOLLOWUP] Move test cases to proper l...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21751 Sure, I'll keep them in mind. Sorry for the mistakes, I'll be more careful. Thanks @gatorsmile.
[GitHub] spark issue #21760: [SPARK-24776][SQL]Avro unit test: use SQLTestUtils and r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21760 Merged build finished. Test PASSed.
[GitHub] spark issue #21760: [SPARK-24776][SQL]Avro unit test: use SQLTestUtils and r...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21760 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92971/ Test PASSed.
[GitHub] spark issue #21760: [SPARK-24776][SQL]Avro unit test: use SQLTestUtils and r...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21760 **[Test build #92971 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92971/testReport)** for PR 21760 at commit [`26b88ca`](https://github.com/apache/spark/commit/26b88ca201a70283528f289cdd2e1e216fce6e7a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils`
[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r202327362

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -37,41 +39,64 @@ import org.apache.spark.unsafe.types.UTF8String

 /**
  * Some utility function to convert Spark data source filters to Parquet filters.
  */
-private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: Boolean) {
+private[parquet] class ParquetFilters(
+    pushDownDate: Boolean,
+    pushDownDecimal: Boolean,
+    pushDownStartWith: Boolean) {

   private case class ParquetSchemaType(
       originalType: OriginalType,
       primitiveTypeName: PrimitiveTypeName,
-      decimalMetadata: DecimalMetadata)
-
-  private val ParquetBooleanType = ParquetSchemaType(null, BOOLEAN, null)
-  private val ParquetByteType = ParquetSchemaType(INT_8, INT32, null)
-  private val ParquetShortType = ParquetSchemaType(INT_16, INT32, null)
-  private val ParquetIntegerType = ParquetSchemaType(null, INT32, null)
-  private val ParquetLongType = ParquetSchemaType(null, INT64, null)
-  private val ParquetFloatType = ParquetSchemaType(null, FLOAT, null)
-  private val ParquetDoubleType = ParquetSchemaType(null, DOUBLE, null)
-  private val ParquetStringType = ParquetSchemaType(UTF8, BINARY, null)
-  private val ParquetBinaryType = ParquetSchemaType(null, BINARY, null)
-  private val ParquetDateType = ParquetSchemaType(DATE, INT32, null)
+      length: Int,
+      decimalMeta: DecimalMetadata)
+
+  private val ParquetBooleanType = ParquetSchemaType(null, BOOLEAN, 0, null)
+  private val ParquetByteType = ParquetSchemaType(INT_8, INT32, 0, null)
+  private val ParquetShortType = ParquetSchemaType(INT_16, INT32, 0, null)
+  private val ParquetIntegerType = ParquetSchemaType(null, INT32, 0, null)
+  private val ParquetLongType = ParquetSchemaType(null, INT64, 0, null)
+  private val ParquetFloatType = ParquetSchemaType(null, FLOAT, 0, null)
+  private val ParquetDoubleType = ParquetSchemaType(null, DOUBLE, 0, null)
+  private val ParquetStringType = ParquetSchemaType(UTF8, BINARY, 0, null)
+  private val ParquetBinaryType = ParquetSchemaType(null, BINARY, 0, null)
+  private val ParquetDateType = ParquetSchemaType(DATE, INT32, 0, null)

   private def dateToDays(date: Date): SQLDate = {
     DateTimeUtils.fromJavaDate(date)
   }

+  private def decimalToInt32(decimal: JBigDecimal): Integer = decimal.unscaledValue().intValue()
+
+  private def decimalToInt64(decimal: JBigDecimal): JLong = decimal.unscaledValue().longValue()
+
+  private def decimalToByteArray(decimal: JBigDecimal, numBytes: Int): Binary = {
+    val decimalBuffer = new Array[Byte](numBytes)
+    val bytes = decimal.unscaledValue().toByteArray
+
+    val fixedLengthBytes = if (bytes.length == numBytes) {
+      bytes
+    } else {
+      val signByte = if (bytes.head < 0) -1: Byte else 0: Byte
+      java.util.Arrays.fill(decimalBuffer, 0, numBytes - bytes.length, signByte)
+      System.arraycopy(bytes, 0, decimalBuffer, numBytes - bytes.length, bytes.length)
+      decimalBuffer
+    }
+    Binary.fromReusedByteArray(fixedLengthBytes, 0, numBytes)
+  }
+
   private val makeEq: PartialFunction[ParquetSchemaType, (String, Any) => FilterPredicate] = {
--- End diff --

`ParquetBooleanType`, `ParquetLongType`, `ParquetFloatType` and `ParquetDoubleType` do not need `Option`. Here is an example:

```scala
scala> import org.apache.parquet.io.api.Binary
import org.apache.parquet.io.api.Binary

scala> Option(null).map(s => Binary.fromString(s.asInstanceOf[String])).orNull
res7: org.apache.parquet.io.api.Binary = null

scala> Binary.fromString(null.asInstanceOf[String])
java.lang.NullPointerException
  at org.apache.parquet.io.api.Binary$FromStringBinary.encodeUTF8(Binary.java:224)
  at org.apache.parquet.io.api.Binary$FromStringBinary.<init>(Binary.java:214)
  at org.apache.parquet.io.api.Binary.fromString(Binary.java:554)
  ... 52 elided

scala> null.asInstanceOf[java.lang.Long]
res9: Long = null

scala> null.asInstanceOf[java.lang.Boolean]
res10: Boolean = null

scala> Option(null).map(_.asInstanceOf[Number].intValue.asInstanceOf[Integer]).orNull
res11: Integer = null

scala> null.asInstanceOf[Number].intValue.asInstanceOf[Integer]
java.lang.NullPointerException
  ... 52 elided
```
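The `decimalToByteArray` helper in the diff above sign-extends a decimal's unscaled two's-complement bytes to the fixed width that Parquet's `FIXED_LEN_BYTE_ARRAY` decimal encoding expects: negative values are left-padded with `0xFF`, non-negative values with `0x00`. A minimal standalone sketch of just that padding step (class and method names here are hypothetical, and this omits the `Binary` wrapping Spark does):

```java
import java.math.BigDecimal;
import java.util.Arrays;

public class DecimalBytes {
    // Sign-extend the unscaled two's-complement bytes of a decimal to a
    // fixed-length byte array. Assumes the unscaled value fits in numBytes,
    // as the real code does for a given decimal precision.
    static byte[] toFixedLength(BigDecimal decimal, int numBytes) {
        byte[] bytes = decimal.unscaledValue().toByteArray();
        if (bytes.length == numBytes) {
            return bytes;
        }
        byte[] buf = new byte[numBytes];
        // The high byte of a negative two's-complement number has its sign
        // bit set, so pad with 0xFF (-1); otherwise pad with 0x00.
        byte signByte = (byte) (bytes[0] < 0 ? -1 : 0);
        Arrays.fill(buf, 0, numBytes - bytes.length, signByte);
        System.arraycopy(bytes, 0, buf, numBytes - bytes.length, bytes.length);
        return buf;
    }

    public static void main(String[] args) {
        // 1.23 has unscaled value 123; -1.23 has unscaled value -123.
        System.out.println(Arrays.toString(toFixedLength(new BigDecimal("1.23"), 4)));
        System.out.println(Arrays.toString(toFixedLength(new BigDecimal("-1.23"), 4)));
    }
}
```

The padding must happen on the left because `BigInteger.toByteArray` returns big-endian bytes; copying the value into the tail of the buffer keeps the magnitude intact while the sign bytes extend it.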