spark git commit: [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class
Repository: spark Updated Branches: refs/heads/master 736a7911c -> 48b459ddd [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class There's an unused `classTag` val in the AtomicType base class which is causing unnecessary slowness in deserialization because it needs to grab ScalaReflectionLock and create a new runtime reflection mirror. Removing this unused code gives a small but measurable performance boost in SQL task deserialization. Author: Josh Rosen. Closes #14869 from JoshRosen/remove-unused-classtag. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/48b459dd Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/48b459dd Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/48b459dd Branch: refs/heads/master Commit: 48b459ddd58affd5519856cb6e204398b7739a2a Parents: 736a791 Author: Josh Rosen Authored: Tue Aug 30 09:58:00 2016 +0800 Committer: Reynold Xin Committed: Tue Aug 30 09:58:00 2016 +0800 -- .../org/apache/spark/sql/types/AbstractDataType.scala | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/48b459dd/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala index 65eae86..1981fd8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala @@ -17,13 +17,10 @@ package org.apache.spark.sql.types -import scala.reflect.ClassTag -import scala.reflect.runtime.universe.{runtimeMirror, TypeTag} +import scala.reflect.runtime.universe.TypeTag import org.apache.spark.annotation.DeveloperApi -import org.apache.spark.sql.catalyst.ScalaReflectionLock import org.apache.spark.sql.catalyst.expressions.Expression
-import org.apache.spark.util.Utils /** * A non-concrete data type, reserved for internal uses. @@ -130,11 +127,6 @@ protected[sql] abstract class AtomicType extends DataType { private[sql] type InternalType private[sql] val tag: TypeTag[InternalType] private[sql] val ordering: Ordering[InternalType] - - @transient private[sql] val classTag = ScalaReflectionLock.synchronized { -val mirror = runtimeMirror(Utils.getSparkClassLoader) -ClassTag[InternalType](mirror.runtimeClass(tag.tpe)) - } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
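The cost this commit removes (a `@transient` val that re-acquires ScalaReflectionLock and rebuilds a reflection mirror every time a task deserializes an AtomicType) can be illustrated with a small Python analogy. This is illustrative only: `EagerType` and `LazyType` are invented names, and Python's pickle stands in for JVM serialization.

```python
import functools
import pickle

class EagerType:
    """Analogue of the old AtomicType: an expensive field is rebuilt on every unpickle."""
    def __init__(self):
        self.class_tag = self._expensive_lookup()

    def _expensive_lookup(self):
        # Stand-in for grabbing ScalaReflectionLock and creating a runtime mirror.
        return sum(range(100_000))

    def __getstate__(self):
        return {"kind": "eager"}  # class_tag itself is "transient": never serialized...

    def __setstate__(self, state):
        # ...so it is recomputed eagerly every time a task deserializes the object.
        self.class_tag = self._expensive_lookup()

class LazyType:
    """Analogue of the fix: nothing expensive runs at deserialization time."""
    def __getstate__(self):
        return {}

    def __setstate__(self, state):
        pass

    @functools.cached_property
    def class_tag(self):
        # Computed only if someone actually asks for it (here: nobody did in Spark).
        return sum(range(100_000))

# Round-tripping LazyType does no reflection-like work at all.
restored = pickle.loads(pickle.dumps(LazyType()))
```

Since the `classTag` field was never read, the eager recomputation was pure overhead on every task deserialization, which is why removing it shows up as a measurable speedup.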
spark git commit: [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class
Repository: spark Updated Branches: refs/heads/branch-2.0 976a43dbf -> 59032570f [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class There's an unused `classTag` val in the AtomicType base class which is causing unnecessary slowness in deserialization because it needs to grab ScalaReflectionLock and create a new runtime reflection mirror. Removing this unused code gives a small but measurable performance boost in SQL task deserialization. Author: Josh Rosen. Closes #14869 from JoshRosen/remove-unused-classtag. (cherry picked from commit 48b459ddd58affd5519856cb6e204398b7739a2a) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/59032570 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/59032570 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/59032570 Branch: refs/heads/branch-2.0 Commit: 59032570fbd0985f758c27bdec5482221cc64af9 Parents: 976a43d Author: Josh Rosen Authored: Tue Aug 30 09:58:00 2016 +0800 Committer: Reynold Xin Committed: Tue Aug 30 09:58:11 2016 +0800 -- .../org/apache/spark/sql/types/AbstractDataType.scala | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/59032570/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala index 65eae86..1981fd8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala @@ -17,13 +17,10 @@ package org.apache.spark.sql.types -import scala.reflect.ClassTag -import scala.reflect.runtime.universe.{runtimeMirror, TypeTag} +import scala.reflect.runtime.universe.TypeTag import org.apache.spark.annotation.DeveloperApi -import
org.apache.spark.sql.catalyst.ScalaReflectionLock import org.apache.spark.sql.catalyst.expressions.Expression -import org.apache.spark.util.Utils /** * A non-concrete data type, reserved for internal uses. @@ -130,11 +127,6 @@ protected[sql] abstract class AtomicType extends DataType { private[sql] type InternalType private[sql] val tag: TypeTag[InternalType] private[sql] val ordering: Ordering[InternalType] - - @transient private[sql] val classTag = ScalaReflectionLock.synchronized { -val mirror = runtimeMirror(Utils.getSparkClassLoader) -ClassTag[InternalType](mirror.runtimeClass(tag.tpe)) - } }
spark git commit: [SPARK-16581][SPARKR] Make JVM backend calling functions public
Repository: spark Updated Branches: refs/heads/branch-2.0 3d283f6c9 -> 976a43dbf [SPARK-16581][SPARKR] Make JVM backend calling functions public ## What changes were proposed in this pull request? This change exposes a public API in SparkR to create objects and call methods on the Spark driver JVM ## How was this patch tested? Unit tests, CRAN checks Author: Shivaram Venkataraman. Closes #14775 from shivaram/sparkr-java-api. (cherry picked from commit 736a7911cb0335cdb2b2f6c87f9e3c32047b5bbb) Signed-off-by: Shivaram Venkataraman Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/976a43db Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/976a43db Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/976a43db Branch: refs/heads/branch-2.0 Commit: 976a43dbf9d97b30d81576799470532b81b882f0 Parents: 3d283f6 Author: Shivaram Venkataraman Authored: Mon Aug 29 12:55:32 2016 -0700 Committer: Shivaram Venkataraman Committed: Mon Aug 29 12:55:42 2016 -0700 -- R/pkg/DESCRIPTION| 5 +- R/pkg/NAMESPACE | 4 + R/pkg/R/jvm.R| 117 ++ R/pkg/inst/tests/testthat/test_jvm_api.R | 43 ++ 4 files changed, 167 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/976a43db/R/pkg/DESCRIPTION -- diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index e5afed2..5a83883 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -2,7 +2,7 @@ Package: SparkR Type: Package Title: R Frontend for Apache Spark Version: 2.0.0 -Date: 2016-07-07 +Date: 2016-08-27 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), email = "shiva...@cs.berkeley.edu"), person("Xiangrui", "Meng", role = "aut", @@ -11,7 +11,7 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), email = "felixche...@apache.org"), person(family = "The Apache Software Foundation", role = c("aut", "cph"))) URL:
http://www.apache.org/ http://spark.apache.org/ -BugReports: https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12315420=12325400=4 +BugReports: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingBugReports Depends: R (>= 3.0), methods @@ -39,6 +39,7 @@ Collate: 'deserialize.R' 'functions.R' 'install.R' +'jvm.R' 'mllib.R' 'serialize.R' 'sparkR.R' http://git-wip-us.apache.org/repos/asf/spark/blob/976a43db/R/pkg/NAMESPACE -- diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE index cdb8834..666e76a 100644 --- a/R/pkg/NAMESPACE +++ b/R/pkg/NAMESPACE @@ -357,4 +357,8 @@ S3method(structField, jobj) S3method(structType, jobj) S3method(structType, structField) +export("sparkR.newJObject") +export("sparkR.callJMethod") +export("sparkR.callJStatic") + export("install.spark") http://git-wip-us.apache.org/repos/asf/spark/blob/976a43db/R/pkg/R/jvm.R -- diff --git a/R/pkg/R/jvm.R b/R/pkg/R/jvm.R new file mode 100644 index 000..bb5c775 --- /dev/null +++ b/R/pkg/R/jvm.R @@ -0,0 +1,117 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# Methods to directly access the JVM running the SparkR backend. 
+ +#' Call Java Methods +#' +#' Call a Java method in the JVM running the Spark driver. The return +#' values are automatically converted to R objects for simple objects. Other +#' values are returned as "jobj" which are references to objects on JVM. +#' +#' @details +#' This is a low level function to access the JVM directly and should only be used +#'
spark git commit: [SPARK-16581][SPARKR] Make JVM backend calling functions public
Repository: spark Updated Branches: refs/heads/master 48caec251 -> 736a7911c [SPARK-16581][SPARKR] Make JVM backend calling functions public ## What changes were proposed in this pull request? This change exposes a public API in SparkR to create objects and call methods on the Spark driver JVM ## How was this patch tested? Unit tests, CRAN checks Author: Shivaram Venkataraman. Closes #14775 from shivaram/sparkr-java-api. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/736a7911 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/736a7911 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/736a7911 Branch: refs/heads/master Commit: 736a7911cb0335cdb2b2f6c87f9e3c32047b5bbb Parents: 48caec2 Author: Shivaram Venkataraman Authored: Mon Aug 29 12:55:32 2016 -0700 Committer: Shivaram Venkataraman Committed: Mon Aug 29 12:55:32 2016 -0700 -- R/pkg/DESCRIPTION| 5 +- R/pkg/NAMESPACE | 4 + R/pkg/R/jvm.R| 117 ++ R/pkg/inst/tests/testthat/test_jvm_api.R | 43 ++ 4 files changed, 167 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/736a7911/R/pkg/DESCRIPTION -- diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index e5afed2..5a83883 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -2,7 +2,7 @@ Package: SparkR Type: Package Title: R Frontend for Apache Spark Version: 2.0.0 -Date: 2016-07-07 +Date: 2016-08-27 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), email = "shiva...@cs.berkeley.edu"), person("Xiangrui", "Meng", role = "aut", @@ -11,7 +11,7 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), email = "felixche...@apache.org"), person(family = "The Apache Software Foundation", role = c("aut", "cph"))) URL: http://www.apache.org/ http://spark.apache.org/ -BugReports:
https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12315420=12325400=4 +BugReports: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingBugReports Depends: R (>= 3.0), methods @@ -39,6 +39,7 @@ Collate: 'deserialize.R' 'functions.R' 'install.R' +'jvm.R' 'mllib.R' 'serialize.R' 'sparkR.R' http://git-wip-us.apache.org/repos/asf/spark/blob/736a7911/R/pkg/NAMESPACE -- diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE index ad587a6..5e625b2 100644 --- a/R/pkg/NAMESPACE +++ b/R/pkg/NAMESPACE @@ -364,4 +364,8 @@ S3method(structField, jobj) S3method(structType, jobj) S3method(structType, structField) +export("sparkR.newJObject") +export("sparkR.callJMethod") +export("sparkR.callJStatic") + export("install.spark") http://git-wip-us.apache.org/repos/asf/spark/blob/736a7911/R/pkg/R/jvm.R -- diff --git a/R/pkg/R/jvm.R b/R/pkg/R/jvm.R new file mode 100644 index 000..bb5c775 --- /dev/null +++ b/R/pkg/R/jvm.R @@ -0,0 +1,117 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# Methods to directly access the JVM running the SparkR backend. + +#' Call Java Methods +#' +#' Call a Java method in the JVM running the Spark driver. 
The return +#' values are automatically converted to R objects for simple objects. Other +#' values are returned as "jobj" which are references to objects on JVM. +#' +#' @details +#' This is a low level function to access the JVM directly and should only be used +#' for advanced use cases. The arguments and return values that are primitive R +#' types (like integer, numeric, character, lists) are
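The conversion rule the new SparkR API documents above (primitive values cross the boundary as native R objects, everything else comes back as an opaque "jobj" reference) can be sketched with a toy in-process model. This is a hypothetical illustration of the handle pattern, not SparkR's actual backend protocol; `JObj`, `_convert`, and `StringBuilder` are invented names.

```python
class JObj:
    """Opaque reference to an object living in the (simulated) backend JVM."""
    def __init__(self, obj_id, registry):
        self.obj_id = obj_id
        self._registry = registry

    def call(self, method, *args):
        # Dispatch a method call to the backend object and convert the result.
        target = self._registry[self.obj_id]
        return _convert(getattr(target, method)(*args), self._registry)

def _convert(value, registry):
    """Simple values cross the boundary as-is; anything else becomes a handle."""
    if isinstance(value, (bool, int, float, str, list, type(None))):
        return value
    handle = len(registry)
    registry[handle] = value  # the real object stays behind in the "JVM"
    return JObj(handle, registry)

class StringBuilder:
    """Toy backend object standing in for an arbitrary JVM object."""
    def __init__(self):
        self._parts = []

    def append(self, s):
        self._parts.append(s)
        return self          # non-primitive result -> caller gets a JObj

    def to_string(self):
        return "".join(self._parts)  # primitive result -> caller gets a str

registry = {}
sb = _convert(StringBuilder(), registry)   # comes back as a JObj handle
sb2 = sb.call("append", "hello")           # non-primitive result -> another JObj
```

The design keeps serialization cheap: only primitives are marshalled, while arbitrary objects are referenced by id until a method on them produces something primitive.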
spark git commit: [SPARK-17063] [SQL] Improve performance of MSCK REPAIR TABLE with Hive metastore
Repository: spark Updated Branches: refs/heads/branch-2.0 eec03718d -> 3d283f6c9 [SPARK-17063] [SQL] Improve performance of MSCK REPAIR TABLE with Hive metastore This PR splits the single `createPartitions()` call into smaller batches, which could prevent the Hive metastore from OOM (caused by millions of partitions). It will also try to gather all the fast stats (number of files and total size of all files) in parallel to avoid the bottleneck of listing the files in the metastore sequentially, which is controlled by spark.sql.gatherFastStats (enabled by default). Tested locally with 1 partitions and 100 files with embedded metastore: without gathering fast stats in parallel, adding partitions took 153 seconds; after enabling that, gathering the fast stats took about 34 seconds and adding these partitions took 25 seconds (most of the time spent in the object store), 59 seconds in total, 2.5X faster (with a larger cluster, gathering will be much faster). Author: Davies Liu. Closes #14607 from davies/repair_batch.
(cherry picked from commit 48caec2516ef35bfa1a3de2dc0a80d0dc819e6bd) Signed-off-by: Davies Liu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3d283f6c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3d283f6c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3d283f6c Branch: refs/heads/branch-2.0 Commit: 3d283f6c9d9daef53fa4e90b0ead2a94710a37a7 Parents: eec0371 Author: Davies Liu Authored: Mon Aug 29 11:23:53 2016 -0700 Committer: Davies Liu Committed: Mon Aug 29 11:30:04 2016 -0700 -- .../spark/sql/catalyst/catalog/interface.scala | 4 +- .../spark/sql/execution/command/ddl.scala | 156 +++ .../org/apache/spark/sql/internal/SQLConf.scala | 10 ++ .../spark/sql/execution/command/DDLSuite.scala | 13 +- .../spark/sql/hive/client/HiveClientImpl.scala | 4 +- .../apache/spark/sql/hive/client/HiveShim.scala | 8 +- .../spark/sql/hive/execution/HiveDDLSuite.scala | 38 + 7 files changed, 200 insertions(+), 33 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3d283f6c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala index c083cf6..e7430b0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala @@ -103,10 +103,12 @@ case class CatalogColumn( * * @param spec partition spec values indexed by column name * @param storage storage format of the partition + * @param parameters some parameters for the partition, for example, stats. 
*/ case class CatalogTablePartition( spec: CatalogTypes.TablePartitionSpec, -storage: CatalogStorageFormat) +storage: CatalogStorageFormat, +parameters: Map[String, String] = Map.empty) /** http://git-wip-us.apache.org/repos/asf/spark/blob/3d283f6c/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala index aac70e9..50ffcd4 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala @@ -17,12 +17,13 @@ package org.apache.spark.sql.execution.command -import scala.collection.GenSeq +import scala.collection.{GenMap, GenSeq} import scala.collection.parallel.ForkJoinTaskSupport import scala.concurrent.forkjoin.ForkJoinPool import scala.util.control.NonFatal -import org.apache.hadoop.fs.{FileStatus, FileSystem, Path, PathFilter} +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs._ import org.apache.hadoop.mapred.{FileInputFormat, JobConf} import org.apache.spark.sql.{AnalysisException, Row, SparkSession} @@ -34,6 +35,7 @@ import org.apache.spark.sql.execution.command.CreateDataSourceTableUtils._ import org.apache.spark.sql.execution.datasources.BucketSpec import org.apache.spark.sql.execution.datasources.PartitioningUtils import org.apache.spark.sql.types._ +import org.apache.spark.util.SerializableConfiguration // Note: The definition of these commands are based on the ones described in // https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL @@ -429,6 +431,9 @@ case
spark git commit: [SPARK-17063] [SQL] Improve performance of MSCK REPAIR TABLE with Hive metastore
Repository: spark Updated Branches: refs/heads/master 6a0fda2c0 -> 48caec251 [SPARK-17063] [SQL] Improve performance of MSCK REPAIR TABLE with Hive metastore ## What changes were proposed in this pull request? This PR splits the single `createPartitions()` call into smaller batches, which could prevent the Hive metastore from OOM (caused by millions of partitions). It will also try to gather all the fast stats (number of files and total size of all files) in parallel to avoid the bottleneck of listing the files in the metastore sequentially, which is controlled by spark.sql.gatherFastStats (enabled by default). ## How was this patch tested? Tested locally with 1 partitions and 100 files with embedded metastore: without gathering fast stats in parallel, adding partitions took 153 seconds; after enabling that, gathering the fast stats took about 34 seconds and adding these partitions took 25 seconds (most of the time spent in the object store), 59 seconds in total, 2.5X faster (with a larger cluster, gathering will be much faster). Author: Davies Liu. Closes #14607 from davies/repair_batch.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/48caec25 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/48caec25 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/48caec25 Branch: refs/heads/master Commit: 48caec2516ef35bfa1a3de2dc0a80d0dc819e6bd Parents: 6a0fda2 Author: Davies Liu Authored: Mon Aug 29 11:23:53 2016 -0700 Committer: Davies Liu Committed: Mon Aug 29 11:23:53 2016 -0700 -- .../spark/sql/catalyst/catalog/interface.scala | 4 +- .../spark/sql/execution/command/ddl.scala | 156 +++ .../org/apache/spark/sql/internal/SQLConf.scala | 10 ++ .../spark/sql/execution/command/DDLSuite.scala | 13 +- .../spark/sql/hive/client/HiveClientImpl.scala | 4 +- .../apache/spark/sql/hive/client/HiveShim.scala | 8 +- .../spark/sql/hive/execution/HiveDDLSuite.scala | 38 + 7 files changed, 200 insertions(+), 33 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/48caec25/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala index 83e01f9..8408d76 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala @@ -81,10 +81,12 @@ object CatalogStorageFormat { * * @param spec partition spec values indexed by column name * @param storage storage format of the partition + * @param parameters some parameters for the partition, for example, stats. 
*/ case class CatalogTablePartition( spec: CatalogTypes.TablePartitionSpec, -storage: CatalogStorageFormat) +storage: CatalogStorageFormat, +parameters: Map[String, String] = Map.empty) /** http://git-wip-us.apache.org/repos/asf/spark/blob/48caec25/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala index 3817f91..53fb684 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala @@ -17,12 +17,13 @@ package org.apache.spark.sql.execution.command -import scala.collection.GenSeq +import scala.collection.{GenMap, GenSeq} import scala.collection.parallel.ForkJoinTaskSupport import scala.concurrent.forkjoin.ForkJoinPool import scala.util.control.NonFatal -import org.apache.hadoop.fs.{FileStatus, FileSystem, Path, PathFilter} +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs._ import org.apache.hadoop.mapred.{FileInputFormat, JobConf} import org.apache.spark.sql.{AnalysisException, Row, SparkSession} @@ -32,6 +33,7 @@ import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference} import org.apache.spark.sql.execution.datasources.PartitioningUtils import org.apache.spark.sql.types._ +import org.apache.spark.util.SerializableConfiguration // Note: The definition of these commands are based on the ones described in // https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL @@ -422,6 +424,9 @@ case class
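The two optimizations described in the commit message, batching the metastore calls and gathering per-partition fast stats in parallel, can be sketched as follows. This is a hedged Python sketch of the idea only, not Spark's Scala implementation; `BATCH_SIZE`, `list_files`, and `create_partitions` are illustrative stand-ins, not Spark or Hive APIs.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 100  # hypothetical; the real batch size is internal to Spark

def gather_fast_stats(partition_paths, list_files):
    """Collect (file count, total size) per partition in parallel, in the spirit
    of spark.sql.gatherFastStats, instead of letting the metastore list each
    partition's files sequentially."""
    def stats(path):
        files = list_files(path)  # expected to yield (name, size) pairs
        return path, (len(files), sum(size for _, size in files))
    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(pool.map(stats, partition_paths))

def add_partitions_in_batches(partitions, create_partitions):
    """Split one huge createPartitions() call into smaller ones so the metastore
    never has to absorb millions of partitions in a single request."""
    for start in range(0, len(partitions), BATCH_SIZE):
        create_partitions(partitions[start:start + BATCH_SIZE])
```

Batching bounds the metastore's per-request memory, and the parallel stats pass moves the file-listing bottleneck off the metastore's critical path, which is where the reported 2.5X speedup comes from.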
spark git commit: [SPARKR][MINOR] Fix LDA doc
Repository: spark Updated Branches: refs/heads/master 08913ce00 -> 6a0fda2c0 [SPARKR][MINOR] Fix LDA doc ## What changes were proposed in this pull request? This PR tries to fix the name of the `SparkDataFrame` used in the example. Also, it gives a reference URL of an example data file so that users can play with it. ## How was this patch tested? Manual test. Author: Junyang Qian. Closes #14853 from junyangq/SPARKR-FixLDADoc. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6a0fda2c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6a0fda2c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6a0fda2c Branch: refs/heads/master Commit: 6a0fda2c0590b455e8713da79cd5f2413e5d0f28 Parents: 08913ce Author: Junyang Qian Authored: Mon Aug 29 10:23:10 2016 -0700 Committer: Xiangrui Meng Committed: Mon Aug 29 10:23:10 2016 -0700 -- R/pkg/R/mllib.R | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6a0fda2c/R/pkg/R/mllib.R -- diff --git a/R/pkg/R/mllib.R b/R/pkg/R/mllib.R index 6808aae..64d19fa 100644 --- a/R/pkg/R/mllib.R +++ b/R/pkg/R/mllib.R @@ -994,18 +994,22 @@ setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula #' @export #' @examples #' \dontrun{ -#' text <- read.df("path/to/data", source = "libsvm") +#' # nolint start +#' # An example "path/to/file" can be +#' # paste0(Sys.getenv("SPARK_HOME"), "/data/mllib/sample_lda_libsvm_data.txt") +#' # nolint end +#' text <- read.df("path/to/file", source = "libsvm") #' model <- spark.lda(data = text, optimizer = "em") #' #' # get a summary of the model #' summary(model) #' #' # compute posterior probabilities -#' posterior <- spark.posterior(model, df) +#' posterior <- spark.posterior(model, text) #' showDF(posterior) #' #' # compute perplexity -#' perplexity <- spark.perplexity(model, df) +#' perplexity <- spark.perplexity(model, text) #' #' # save and load the
model #' path <- "path/to/model"
spark-website git commit: Add Abraham Zhan to 2.0.0 contribs; wrap and dedupe the list.
Repository: spark-website Updated Branches: refs/heads/asf-site 9700f2f4a -> d37a3afce Add Abraham Zhan to 2.0.0 contribs; wrap and dedupe the list. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/d37a3afc Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/d37a3afc Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/d37a3afc Branch: refs/heads/asf-site Commit: d37a3afce5f85ea4591c07921a75edc969abf954 Parents: 9700f2f Author: Sean Owen Authored: Mon Aug 29 10:43:14 2016 +0100 Committer: Sean Owen Committed: Mon Aug 29 10:43:14 2016 +0100 -- .../_posts/2016-07-26-spark-release-2-0-0.md| 47 +++- 1 file changed, 46 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark-website/blob/d37a3afc/releases/_posts/2016-07-26-spark-release-2-0-0.md -- diff --git a/releases/_posts/2016-07-26-spark-release-2-0-0.md b/releases/_posts/2016-07-26-spark-release-2-0-0.md index f50fd68..3be1e3a 100644 --- a/releases/_posts/2016-07-26-spark-release-2-0-0.md +++ b/releases/_posts/2016-07-26-spark-release-2-0-0.md @@ -168,4 +168,49 @@ The following features have been deprecated in Spark 2.0, and might be removed i ### Credits -Last but not least, this release would not have been possible without the following contributors: Aaron Tokhy, Abhinav Gupta, Abou Haydar Elias, Adam Budde, Adam Roberts, Ahmed Kamal, Ahmed Mahran, Alex Bozarth, Alexander Ulanov, Allen, Anatoliy Plastinin, Andrew, Andrew Ash, Andrew Or, Andrew Ray, Anthony Truchet, Anton Okolnychyi, Antonio Murgia, Antonio Murgia, Arun Allamsetty, Azeem Jiva, Ben McCann, BenFradet, Bertrand Bossy, Bill Chambers, Bjorn Jonsson, Bo Meng, Bo Meng, Brandon Bradley, Brian O'Neill, BrianLondon, Bryan Cutler, Burak Köse, Burak Yavuz, Carson Wang, Cazen, Cedar Pan, Charles Allen, Cheng Hao, Cheng Lian, Claes Redestad, CodingCat, Cody Koeninger, DB Tsai, DLucky, Daniel Jalova, Daoyuan Wang, Darek Blasiak, David
Tolpin, Davies Liu, Devaraj K, Dhruve Ashar, Dilip Biswal, Dmitry Erastov, Dominik Jastrzębski, Dongjoon Hyun, Earthson Lu, Egor Pakhomov, Ehsan M.Kermani, Ergin Seyfe, Eric Liang, Ernest, Felix Cheung, Felix Cheung, Feynman Liang, Fokko Driesprong, Fonso Li, Franklyn D'souza, François Garillot, Fred Reiss, Gabriele Nizzoli, Gary King, GayathriMurali, Gio Borje, Grace, Greg Michalopoulos, Grzegorz Chilkiewicz, Guillaume Poulin, Gábor Lipták, Hemant Bhanawat, Herman van Hovell, Herman van Hövell tot Westerflier, Hiroshi Inoue, Holden Karau, Hossein, Huaxin Gao, Hyukjin Kwon, Imran Rashid, Imran Younus, Ioana Delaney, Iulian Dragos, Jacek Laskowski, Jacek Lewandowski, Jakob Odersky, James Lohse, James Thomas, Jason Lee, Jason Moore, Jason White, Jean Lyn, Jean-Baptiste Onofré, Jeff L, Jeff Zhang, Jeremy Derr, JeremyNixon, Jia Li, Jo Voordeckers, Joan, Jon Maurer, Joseph K. Bradley, Josh Howes, Josh Rosen, Joshi, Juarez Bochi, Julien Baley, Junyang, Junyang Qian, Jurriaan Pruis, Kai Jiang, KaiXinXiaoLei, Kay Ousterhout, Kazuaki Ishizaki, Kevin Yu, Koert Kuipers, Kousuke Saruta, Koyo Yoshida, Krishna Kalyan, Krishna Kalyan, Lewuathe, Liang-Chi Hsieh, Lianhui Wang, Lin Zhao, Lining Sun, Liu Xiang, Liwei Lin, Liwei Lin, Liye, Luc Bourlier, Luciano Resende, Lukasz, Maciej Brynski, Malte, Maciej Szymkiewicz, Marcelo Vanzin, Marcin Tustin, Mark Grover, Mark Yang, Martin Menestret, Masayoshi TSUZUKI, Matei Zaharia, Mathieu Longtin, Matthew Wise, Miao Wang, Michael Allman, Michael Armbrust, Michael Gummelt, Michel Lemay, Mike Dusenberry, Mortada Mehyar, Nakul Jindal, Nam Pham, Narine Kokhlikyan, NarineK, Neelesh Srinivas Salian, Nezih Yigitbasi, Nicholas Chammas, Nicholas Tietz, Nick Pentreath, Nilanjan Raychaudhuri, Nirman Narang, Nishkam Ravi, Nong, Nong Li, Oleg Danilov, Oliver Pierson, Oscar D.
Lara Yejas, Parth Brahmbhatt, Patrick Wendell, Pete Robbins, Peter Ableda, Pierre Borckmans, Prajwal Tuladhar, Prashant Sharma, Pravin Gadakh, QiangCai, Qifan Pu, Raafat Akkad, Rahul Tanwani, Rajesh Balamohan, Rekha Joshi, Reynold Xin, Richard W. Eggert II, Robert Dodier, Robert Kruszewski, Robin East, Ruifeng Zheng, Ryan Blue, Sachin Aggarwal, Saisai Shao, Sameer Agarwal, Sandeep Singh, Sanket, Sasaki Toru, Sean Owen, Sean Zhong, Sebastien Rainville, Sebastián Ramírez, Sela, Sergiusz Urbaniak, Seth Hendrickson, Shally Sangal, Sheamus K. Parkes, Shi Jinkui, Shivaram Venkataraman, Shixiong Zhu, Shuai Lin, Shubhanshu Mishra, Sin Wu, Sital Kedia, Stavros Kontopoulos, Stephan Kessler, Steve Loughran, Subhobrata Dey, Subroto Sanyal, Sumedh Mungee, Sun Rui, Sunitha Kambhampati, Suresh Thalamati, Takahashi Hiroshi, Takeshi
spark git commit: fixed a typo
Repository: spark Updated Branches: refs/heads/master 1a48c0047 -> 08913ce00 fixed a typo idempotant -> idempotent Author: Seigneurin, Alexis (CONT). Closes #14833 from aseigneurin/fix-typo. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/08913ce0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/08913ce0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/08913ce0 Branch: refs/heads/master Commit: 08913ce0002a80a989489a31b7353f5ec4a5849f Parents: 1a48c00 Author: Seigneurin, Alexis (CONT) Authored: Mon Aug 29 13:12:10 2016 +0100 Committer: Sean Owen Committed: Mon Aug 29 13:12:10 2016 +0100 -- docs/structured-streaming-programming-guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/08913ce0/docs/structured-streaming-programming-guide.md -- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index 090b14f..8a88e06 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -406,7 +406,7 @@ Furthermore, this model naturally handles data that has arrived later than expec ## Fault Tolerance Semantics Delivering end-to-end exactly-once semantics was one of key goals behind the design of Structured Streaming. To achieve that, we have designed the Structured Streaming sources, the sinks and the execution engine to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing. Every streaming source is assumed to have offsets (similar to Kafka offsets, or Kinesis sequence numbers) -to track the read position in the stream. The engine uses checkpointing and write ahead logs to record the offset range of the data being processed in each trigger. The streaming sinks are designed to be idempotent for handling reprocessing.
Together, using replayable sources and idempotant sinks, Structured Streaming can ensure **end-to-end exactly-once semantics** under any failure. +to track the read position in the stream. The engine uses checkpointing and write ahead logs to record the offset range of the data being processed in each trigger. The streaming sinks are designed to be idempotent for handling reprocessing. Together, using replayable sources and idempotent sinks, Structured Streaming can ensure **end-to-end exactly-once semantics** under any failure. # API using Datasets and DataFrames Since Spark 2.0, DataFrames and Datasets can represent static, bounded data, as well as streaming, unbounded data. Similar to static Datasets/DataFrames, you can use the common entry point `SparkSession` ([Scala](api/scala/index.html#org.apache.spark.sql.SparkSession)/
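The guarantee the guide describes rests on two properties: sources can replay a recorded offset range, and sinks ignore a batch they have already committed. A minimal sketch of such an idempotent sink follows; this is illustrative Python, not the Structured Streaming `Sink` API, and the class and method names are invented.

```python
class IdempotentSink:
    """Toy sink keyed by batch id: replaying a batch after a failure is a no-op,
    so reprocessing never duplicates output."""
    def __init__(self):
        self._committed = {}  # batch_id -> rows written for that batch

    def add_batch(self, batch_id, rows):
        if batch_id in self._committed:
            return            # already committed: the replay is safely ignored
        self._committed[batch_id] = list(rows)

    def output(self):
        # All rows in batch order, each batch exactly once.
        return [row for _, rows in sorted(self._committed.items()) for row in rows]

sink = IdempotentSink()
sink.add_batch(0, ["a", "b"])
sink.add_batch(0, ["a", "b"])   # engine restarted and replayed batch 0: no duplicates
sink.add_batch(1, ["c"])
```

Because the engine's write-ahead log pins each trigger to a deterministic offset range, a replayed batch carries the same batch id and the same rows, which is exactly the situation this dedup-by-batch-id pattern makes harmless.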
spark git commit: [BUILD] Closes some stale PRs.
Repository: spark Updated Branches: refs/heads/master 095862a3c -> 1a48c0047 [BUILD] Closes some stale PRs. ## What changes were proposed in this pull request? Closes #10995 Closes #13658 Closes #14505 Closes #14536 Closes #12753 Closes #14449 Closes #12694 Closes #12695 Closes #14810 Closes #10572 ## How was this patch tested? N/A Author: Sean Owen. Closes #14849 from srowen/CloseStalePRs. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1a48c004 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1a48c004 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1a48c004 Branch: refs/heads/master Commit: 1a48c0047bbdb6328c3ac5ec617a5e35e244d66d Parents: 095862a Author: Sean Owen Authored: Mon Aug 29 10:46:26 2016 +0100 Committer: Sean Owen Committed: Mon Aug 29 10:46:26 2016 +0100 -- --