spark git commit: [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class

2016-08-29 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 736a7911c -> 48b459ddd


[SPARK-17301][SQL] Remove unused classTag field from AtomicType base class

There's an unused `classTag` val in the AtomicType base class which is causing 
unnecessary slowness in deserialization because it needs to grab 
ScalaReflectionLock and create a new runtime reflection mirror. Removing this 
unused code gives a small but measurable performance boost in SQL task 
deserialization.
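
For context, a minimal sketch of why the field was expensive (the helper name
`classTagOf` is hypothetical, not Spark's API; the body mirrors the removed code
shown in the diff below). A caller that ever needs the tag can derive it on
demand instead of paying eagerly at construction time:

```scala
import scala.reflect.ClassTag
import scala.reflect.runtime.universe.{runtimeMirror, TypeTag}

object ClassTagCost {
  // Deriving a ClassTag from a TypeTag forces runtime reflection: a mirror
  // must be created for a class loader and the Type resolved to a JVM class.
  // Spark additionally had to hold a global lock (ScalaReflectionLock) while
  // doing this, because scala-reflect was not thread-safe, so every
  // deserialized AtomicType paid this cost even though nothing read the field.
  def classTagOf[T](implicit tag: TypeTag[T]): ClassTag[T] = {
    val mirror = runtimeMirror(getClass.getClassLoader)  // the costly step
    ClassTag[T](mirror.runtimeClass(tag.tpe))
  }

  def main(args: Array[String]): Unit =
    println(classTagOf[Int])  // derive on demand; prints "Int"
}
```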

Author: Josh Rosen 

Closes #14869 from JoshRosen/remove-unused-classtag.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/48b459dd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/48b459dd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/48b459dd

Branch: refs/heads/master
Commit: 48b459ddd58affd5519856cb6e204398b7739a2a
Parents: 736a791
Author: Josh Rosen 
Authored: Tue Aug 30 09:58:00 2016 +0800
Committer: Reynold Xin 
Committed: Tue Aug 30 09:58:00 2016 +0800

--
 .../org/apache/spark/sql/types/AbstractDataType.scala | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/48b459dd/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala
index 65eae86..1981fd8 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala
@@ -17,13 +17,10 @@
 
 package org.apache.spark.sql.types
 
-import scala.reflect.ClassTag
-import scala.reflect.runtime.universe.{runtimeMirror, TypeTag}
+import scala.reflect.runtime.universe.TypeTag
 
 import org.apache.spark.annotation.DeveloperApi
-import org.apache.spark.sql.catalyst.ScalaReflectionLock
 import org.apache.spark.sql.catalyst.expressions.Expression
-import org.apache.spark.util.Utils
 
 /**
  * A non-concrete data type, reserved for internal uses.
@@ -130,11 +127,6 @@ protected[sql] abstract class AtomicType extends DataType {
   private[sql] type InternalType
   private[sql] val tag: TypeTag[InternalType]
   private[sql] val ordering: Ordering[InternalType]
-
-  @transient private[sql] val classTag = ScalaReflectionLock.synchronized {
-    val mirror = runtimeMirror(Utils.getSparkClassLoader)
-    ClassTag[InternalType](mirror.runtimeClass(tag.tpe))
-  }
 }
 
 





spark git commit: [SPARK-17301][SQL] Remove unused classTag field from AtomicType base class

2016-08-29 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 976a43dbf -> 59032570f


[SPARK-17301][SQL] Remove unused classTag field from AtomicType base class

There's an unused `classTag` val in the AtomicType base class which is causing 
unnecessary slowness in deserialization because it needs to grab 
ScalaReflectionLock and create a new runtime reflection mirror. Removing this 
unused code gives a small but measurable performance boost in SQL task 
deserialization.

Author: Josh Rosen 

Closes #14869 from JoshRosen/remove-unused-classtag.

(cherry picked from commit 48b459ddd58affd5519856cb6e204398b7739a2a)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/59032570
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/59032570
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/59032570

Branch: refs/heads/branch-2.0
Commit: 59032570fbd0985f758c27bdec5482221cc64af9
Parents: 976a43d
Author: Josh Rosen 
Authored: Tue Aug 30 09:58:00 2016 +0800
Committer: Reynold Xin 
Committed: Tue Aug 30 09:58:11 2016 +0800

--
 .../org/apache/spark/sql/types/AbstractDataType.scala | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/59032570/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala
index 65eae86..1981fd8 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala
@@ -17,13 +17,10 @@
 
 package org.apache.spark.sql.types
 
-import scala.reflect.ClassTag
-import scala.reflect.runtime.universe.{runtimeMirror, TypeTag}
+import scala.reflect.runtime.universe.TypeTag
 
 import org.apache.spark.annotation.DeveloperApi
-import org.apache.spark.sql.catalyst.ScalaReflectionLock
 import org.apache.spark.sql.catalyst.expressions.Expression
-import org.apache.spark.util.Utils
 
 /**
  * A non-concrete data type, reserved for internal uses.
@@ -130,11 +127,6 @@ protected[sql] abstract class AtomicType extends DataType {
   private[sql] type InternalType
   private[sql] val tag: TypeTag[InternalType]
   private[sql] val ordering: Ordering[InternalType]
-
-  @transient private[sql] val classTag = ScalaReflectionLock.synchronized {
-    val mirror = runtimeMirror(Utils.getSparkClassLoader)
-    ClassTag[InternalType](mirror.runtimeClass(tag.tpe))
-  }
 }
 
 





spark git commit: [SPARK-16581][SPARKR] Make JVM backend calling functions public

2016-08-29 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 3d283f6c9 -> 976a43dbf


[SPARK-16581][SPARKR] Make JVM backend calling functions public

## What changes were proposed in this pull request?

This change exposes a public API in SparkR to create objects and call methods
on the Spark driver JVM.
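
A rough Scala sketch of the reflective dispatch such a backend performs is
below; `callJMethod` here is a hypothetical stand-in, not SparkR's
implementation, which also serializes arguments and results over a socket
protocol:

```scala
object JvmBackendSketch {
  // What a call like sparkR.callJMethod(obj, "name", args...) implies on the
  // JVM side: resolve the method on the target object by name and arity via
  // reflection, invoke it, and hand the result back.
  def callJMethod(target: AnyRef, name: String, args: AnyRef*): AnyRef = {
    val method = target.getClass.getMethods
      .find(m => m.getName == name && m.getParameterCount == args.length)
      .getOrElse(throw new NoSuchMethodException(s"$name/${args.length}"))
    method.invoke(target, args: _*)
  }

  def main(args: Array[String]): Unit =
    println(callJMethod("Spark", "concat", "R"))  // prints "SparkR"
}
```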

## How was this patch tested?

Unit tests, CRAN checks

Author: Shivaram Venkataraman 

Closes #14775 from shivaram/sparkr-java-api.

(cherry picked from commit 736a7911cb0335cdb2b2f6c87f9e3c32047b5bbb)
Signed-off-by: Shivaram Venkataraman 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/976a43db
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/976a43db
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/976a43db

Branch: refs/heads/branch-2.0
Commit: 976a43dbf9d97b30d81576799470532b81b882f0
Parents: 3d283f6
Author: Shivaram Venkataraman 
Authored: Mon Aug 29 12:55:32 2016 -0700
Committer: Shivaram Venkataraman 
Committed: Mon Aug 29 12:55:42 2016 -0700

--
 R/pkg/DESCRIPTION|   5 +-
 R/pkg/NAMESPACE  |   4 +
 R/pkg/R/jvm.R| 117 ++
 R/pkg/inst/tests/testthat/test_jvm_api.R |  43 ++
 4 files changed, 167 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/976a43db/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index e5afed2..5a83883 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -2,7 +2,7 @@ Package: SparkR
 Type: Package
 Title: R Frontend for Apache Spark
 Version: 2.0.0
-Date: 2016-07-07
+Date: 2016-08-27
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
 email = "shiva...@cs.berkeley.edu"),
  person("Xiangrui", "Meng", role = "aut",
@@ -11,7 +11,7 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = 
c("aut", "cre"),
 email = "felixche...@apache.org"),
  person(family = "The Apache Software Foundation", role = c("aut", 
"cph")))
 URL: http://www.apache.org/ http://spark.apache.org/
-BugReports: 
https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12315420=12325400=4
+BugReports: 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingBugReports
 Depends:
 R (>= 3.0),
 methods
@@ -39,6 +39,7 @@ Collate:
 'deserialize.R'
 'functions.R'
 'install.R'
+'jvm.R'
 'mllib.R'
 'serialize.R'
 'sparkR.R'

http://git-wip-us.apache.org/repos/asf/spark/blob/976a43db/R/pkg/NAMESPACE
--
diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index cdb8834..666e76a 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -357,4 +357,8 @@ S3method(structField, jobj)
 S3method(structType, jobj)
 S3method(structType, structField)
 
+export("sparkR.newJObject")
+export("sparkR.callJMethod")
+export("sparkR.callJStatic")
+
 export("install.spark")

http://git-wip-us.apache.org/repos/asf/spark/blob/976a43db/R/pkg/R/jvm.R
--
diff --git a/R/pkg/R/jvm.R b/R/pkg/R/jvm.R
new file mode 100644
index 000..bb5c775
--- /dev/null
+++ b/R/pkg/R/jvm.R
@@ -0,0 +1,117 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Methods to directly access the JVM running the SparkR backend.
+
+#' Call Java Methods
+#'
+#' Call a Java method in the JVM running the Spark driver. The return
+#' values are automatically converted to R objects for simple objects. Other
+#' values are returned as "jobj" which are references to objects on JVM.
+#'
+#' @details
+#' This is a low level function to access the JVM directly and should only be used
+#'

spark git commit: [SPARK-16581][SPARKR] Make JVM backend calling functions public

2016-08-29 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/master 48caec251 -> 736a7911c


[SPARK-16581][SPARKR] Make JVM backend calling functions public

## What changes were proposed in this pull request?

This change exposes a public API in SparkR to create objects and call methods
on the Spark driver JVM.

## How was this patch tested?

Unit tests, CRAN checks

Author: Shivaram Venkataraman 

Closes #14775 from shivaram/sparkr-java-api.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/736a7911
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/736a7911
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/736a7911

Branch: refs/heads/master
Commit: 736a7911cb0335cdb2b2f6c87f9e3c32047b5bbb
Parents: 48caec2
Author: Shivaram Venkataraman 
Authored: Mon Aug 29 12:55:32 2016 -0700
Committer: Shivaram Venkataraman 
Committed: Mon Aug 29 12:55:32 2016 -0700

--
 R/pkg/DESCRIPTION|   5 +-
 R/pkg/NAMESPACE  |   4 +
 R/pkg/R/jvm.R| 117 ++
 R/pkg/inst/tests/testthat/test_jvm_api.R |  43 ++
 4 files changed, 167 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/736a7911/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index e5afed2..5a83883 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -2,7 +2,7 @@ Package: SparkR
 Type: Package
 Title: R Frontend for Apache Spark
 Version: 2.0.0
-Date: 2016-07-07
+Date: 2016-08-27
 Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
 email = "shiva...@cs.berkeley.edu"),
  person("Xiangrui", "Meng", role = "aut",
@@ -11,7 +11,7 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = 
c("aut", "cre"),
 email = "felixche...@apache.org"),
  person(family = "The Apache Software Foundation", role = c("aut", 
"cph")))
 URL: http://www.apache.org/ http://spark.apache.org/
-BugReports: 
https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12315420=12325400=4
+BugReports: 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingBugReports
 Depends:
 R (>= 3.0),
 methods
@@ -39,6 +39,7 @@ Collate:
 'deserialize.R'
 'functions.R'
 'install.R'
+'jvm.R'
 'mllib.R'
 'serialize.R'
 'sparkR.R'

http://git-wip-us.apache.org/repos/asf/spark/blob/736a7911/R/pkg/NAMESPACE
--
diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index ad587a6..5e625b2 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -364,4 +364,8 @@ S3method(structField, jobj)
 S3method(structType, jobj)
 S3method(structType, structField)
 
+export("sparkR.newJObject")
+export("sparkR.callJMethod")
+export("sparkR.callJStatic")
+
 export("install.spark")

http://git-wip-us.apache.org/repos/asf/spark/blob/736a7911/R/pkg/R/jvm.R
--
diff --git a/R/pkg/R/jvm.R b/R/pkg/R/jvm.R
new file mode 100644
index 000..bb5c775
--- /dev/null
+++ b/R/pkg/R/jvm.R
@@ -0,0 +1,117 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Methods to directly access the JVM running the SparkR backend.
+
+#' Call Java Methods
+#'
+#' Call a Java method in the JVM running the Spark driver. The return
+#' values are automatically converted to R objects for simple objects. Other
+#' values are returned as "jobj" which are references to objects on JVM.
+#'
+#' @details
+#' This is a low level function to access the JVM directly and should only be used
+#' for advanced use cases. The arguments and return values that are primitive R
+#' types (like integer, numeric, character, lists) are 

spark git commit: [SPARK-17063] [SQL] Improve performance of MSCK REPAIR TABLE with Hive metastore

2016-08-29 Thread davies
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 eec03718d -> 3d283f6c9


[SPARK-17063] [SQL] Improve performance of MSCK REPAIR TABLE with Hive metastore

This PR splits the single `createPartitions()` call into smaller batches, which
can prevent the Hive metastore from OOMing when there are millions of
partitions.
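
As a minimal, generic sketch of the batching idea (the names are placeholders;
the real logic lives in `ddl.scala` and the Hive client shim touched below):

```scala
// Send fixed-size chunks instead of one call carrying millions of partition
// specs, so the metastore never has to materialize the whole list at once.
// createPartitions stands in for the real metastore client call.
def addPartitionsInBatches[P](parts: Seq[P], batchSize: Int)
                             (createPartitions: Seq[P] => Unit): Unit =
  parts.grouped(batchSize).foreach(createPartitions)

// usage sketch: addPartitionsInBatches(allParts, 100)(metastoreCreate)
```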

It also tries to gather the fast stats (number of files and total size of all
files) in parallel, to avoid the bottleneck of listing files in the metastore
sequentially; this is controlled by spark.sql.gatherFastStats (enabled by
default).
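
And a sketch of the parallel stats gathering under the same caveat: the
`ForkJoinTaskSupport` plumbing matches the imports this diff adds, while
`listFileSizes` is a placeholder for the actual file-system listing call:

```scala
import scala.collection.parallel.ForkJoinTaskSupport
import scala.concurrent.forkjoin.ForkJoinPool

// Gather "fast stats" (file count, total bytes) for many partition locations
// in parallel instead of listing them one by one.
def gatherFastStats(paths: Seq[String], parallelism: Int)
                   (listFileSizes: String => Seq[Long]): Map[String, (Long, Long)] = {
  val par = paths.par
  par.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(parallelism))
  par.map { p =>
    val sizes = listFileSizes(p)
    (p, (sizes.length.toLong, sizes.sum))  // (numFiles, totalSize)
  }.seq.toMap
}
```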

Tested locally with 1 partitions and 100 files with the embedded metastore.
Without gathering fast stats in parallel, adding the partitions took 153
seconds; after enabling it, gathering the fast stats took about 34 seconds and
adding the partitions took 25 seconds (most of the time spent in the object
store), 59 seconds in total, i.e. 2.5x faster (with a larger cluster, gathering
will be much faster).

Author: Davies Liu 

Closes #14607 from davies/repair_batch.

(cherry picked from commit 48caec2516ef35bfa1a3de2dc0a80d0dc819e6bd)
Signed-off-by: Davies Liu 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3d283f6c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3d283f6c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3d283f6c

Branch: refs/heads/branch-2.0
Commit: 3d283f6c9d9daef53fa4e90b0ead2a94710a37a7
Parents: eec0371
Author: Davies Liu 
Authored: Mon Aug 29 11:23:53 2016 -0700
Committer: Davies Liu 
Committed: Mon Aug 29 11:30:04 2016 -0700

--
 .../spark/sql/catalyst/catalog/interface.scala  |   4 +-
 .../spark/sql/execution/command/ddl.scala   | 156 +++
 .../org/apache/spark/sql/internal/SQLConf.scala |  10 ++
 .../spark/sql/execution/command/DDLSuite.scala  |  13 +-
 .../spark/sql/hive/client/HiveClientImpl.scala  |   4 +-
 .../apache/spark/sql/hive/client/HiveShim.scala |   8 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala |  38 +
 7 files changed, 200 insertions(+), 33 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3d283f6c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
index c083cf6..e7430b0 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
@@ -103,10 +103,12 @@ case class CatalogColumn(
  *
  * @param spec partition spec values indexed by column name
  * @param storage storage format of the partition
+ * @param parameters some parameters for the partition, for example, stats.
  */
 case class CatalogTablePartition(
     spec: CatalogTypes.TablePartitionSpec,
-    storage: CatalogStorageFormat)
+    storage: CatalogStorageFormat,
+    parameters: Map[String, String] = Map.empty)
 
 
 /**

http://git-wip-us.apache.org/repos/asf/spark/blob/3d283f6c/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
index aac70e9..50ffcd4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
@@ -17,12 +17,13 @@
 
 package org.apache.spark.sql.execution.command
 
-import scala.collection.GenSeq
+import scala.collection.{GenMap, GenSeq}
 import scala.collection.parallel.ForkJoinTaskSupport
 import scala.concurrent.forkjoin.ForkJoinPool
 import scala.util.control.NonFatal
 
-import org.apache.hadoop.fs.{FileStatus, FileSystem, Path, PathFilter}
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs._
 import org.apache.hadoop.mapred.{FileInputFormat, JobConf}
 
 import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
@@ -34,6 +35,7 @@ import 
org.apache.spark.sql.execution.command.CreateDataSourceTableUtils._
 import org.apache.spark.sql.execution.datasources.BucketSpec
 import org.apache.spark.sql.execution.datasources.PartitioningUtils
 import org.apache.spark.sql.types._
+import org.apache.spark.util.SerializableConfiguration
 
 // Note: The definition of these commands are based on the ones described in
 // https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
@@ -429,6 +431,9 @@ case 

spark git commit: [SPARK-17063] [SQL] Improve performance of MSCK REPAIR TABLE with Hive metastore

2016-08-29 Thread davies
Repository: spark
Updated Branches:
  refs/heads/master 6a0fda2c0 -> 48caec251


[SPARK-17063] [SQL] Improve performance of MSCK REPAIR TABLE with Hive metastore

## What changes were proposed in this pull request?

This PR splits the single `createPartitions()` call into smaller batches, which
can prevent the Hive metastore from OOMing when there are millions of
partitions.

It also tries to gather the fast stats (number of files and total size of all
files) in parallel, to avoid the bottleneck of listing files in the metastore
sequentially; this is controlled by spark.sql.gatherFastStats (enabled by
default).

## How was this patch tested?

Tested locally with 1 partitions and 100 files with the embedded metastore.
Without gathering fast stats in parallel, adding the partitions took 153
seconds; after enabling it, gathering the fast stats took about 34 seconds and
adding the partitions took 25 seconds (most of the time spent in the object
store), 59 seconds in total, i.e. 2.5x faster (with a larger cluster, gathering
will be much faster).

Author: Davies Liu 

Closes #14607 from davies/repair_batch.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/48caec25
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/48caec25
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/48caec25

Branch: refs/heads/master
Commit: 48caec2516ef35bfa1a3de2dc0a80d0dc819e6bd
Parents: 6a0fda2
Author: Davies Liu 
Authored: Mon Aug 29 11:23:53 2016 -0700
Committer: Davies Liu 
Committed: Mon Aug 29 11:23:53 2016 -0700

--
 .../spark/sql/catalyst/catalog/interface.scala  |   4 +-
 .../spark/sql/execution/command/ddl.scala   | 156 +++
 .../org/apache/spark/sql/internal/SQLConf.scala |  10 ++
 .../spark/sql/execution/command/DDLSuite.scala  |  13 +-
 .../spark/sql/hive/client/HiveClientImpl.scala  |   4 +-
 .../apache/spark/sql/hive/client/HiveShim.scala |   8 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala |  38 +
 7 files changed, 200 insertions(+), 33 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/48caec25/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
index 83e01f9..8408d76 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
@@ -81,10 +81,12 @@ object CatalogStorageFormat {
  *
  * @param spec partition spec values indexed by column name
  * @param storage storage format of the partition
+ * @param parameters some parameters for the partition, for example, stats.
  */
 case class CatalogTablePartition(
     spec: CatalogTypes.TablePartitionSpec,
-    storage: CatalogStorageFormat)
+    storage: CatalogStorageFormat,
+    parameters: Map[String, String] = Map.empty)
 
 
 /**

http://git-wip-us.apache.org/repos/asf/spark/blob/48caec25/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
index 3817f91..53fb684 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
@@ -17,12 +17,13 @@
 
 package org.apache.spark.sql.execution.command
 
-import scala.collection.GenSeq
+import scala.collection.{GenMap, GenSeq}
 import scala.collection.parallel.ForkJoinTaskSupport
 import scala.concurrent.forkjoin.ForkJoinPool
 import scala.util.control.NonFatal
 
-import org.apache.hadoop.fs.{FileStatus, FileSystem, Path, PathFilter}
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs._
 import org.apache.hadoop.mapred.{FileInputFormat, JobConf}
 
 import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
@@ -32,6 +33,7 @@ import 
org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
 import org.apache.spark.sql.catalyst.expressions.{Attribute, 
AttributeReference}
 import org.apache.spark.sql.execution.datasources.PartitioningUtils
 import org.apache.spark.sql.types._
+import org.apache.spark.util.SerializableConfiguration
 
 // Note: The definition of these commands are based on the ones described in
 // https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
@@ -422,6 +424,9 @@ case class 

spark git commit: [SPARKR][MINOR] Fix LDA doc

2016-08-29 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master 08913ce00 -> 6a0fda2c0


[SPARKR][MINOR] Fix LDA doc

## What changes were proposed in this pull request?

This PR fixes the name of the `SparkDataFrame` used in the example. It also
gives a reference URL for an example data file that users can play with.

## How was this patch tested?

Manual test.

Author: Junyang Qian 

Closes #14853 from junyangq/SPARKR-FixLDADoc.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6a0fda2c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6a0fda2c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6a0fda2c

Branch: refs/heads/master
Commit: 6a0fda2c0590b455e8713da79cd5f2413e5d0f28
Parents: 08913ce
Author: Junyang Qian 
Authored: Mon Aug 29 10:23:10 2016 -0700
Committer: Xiangrui Meng 
Committed: Mon Aug 29 10:23:10 2016 -0700

--
 R/pkg/R/mllib.R | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6a0fda2c/R/pkg/R/mllib.R
--
diff --git a/R/pkg/R/mllib.R b/R/pkg/R/mllib.R
index 6808aae..64d19fa 100644
--- a/R/pkg/R/mllib.R
+++ b/R/pkg/R/mllib.R
@@ -994,18 +994,22 @@ setMethod("spark.survreg", signature(data = 
"SparkDataFrame", formula = "formula
 #' @export
 #' @examples
 #' \dontrun{
-#' text <- read.df("path/to/data", source = "libsvm")
+#' # nolint start
+#' # An example "path/to/file" can be
+#' # paste0(Sys.getenv("SPARK_HOME"), "/data/mllib/sample_lda_libsvm_data.txt")
+#' # nolint end
+#' text <- read.df("path/to/file", source = "libsvm")
 #' model <- spark.lda(data = text, optimizer = "em")
 #'
 #' # get a summary of the model
 #' summary(model)
 #'
 #' # compute posterior probabilities
-#' posterior <- spark.posterior(model, df)
+#' posterior <- spark.posterior(model, text)
 #' showDF(posterior)
 #'
 #' # compute perplexity
-#' perplexity <- spark.perplexity(model, df)
+#' perplexity <- spark.perplexity(model, text)
 #'
 #' # save and load the model
 #' path <- "path/to/model"





spark-website git commit: Add Abraham Zhan to 2.0.0 contribs; wrap and dedupe the list.

2016-08-29 Thread srowen
Repository: spark-website
Updated Branches:
  refs/heads/asf-site 9700f2f4a -> d37a3afce


Add Abraham Zhan to 2.0.0 contribs; wrap and dedupe the list.


Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/d37a3afc
Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/d37a3afc
Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/d37a3afc

Branch: refs/heads/asf-site
Commit: d37a3afce5f85ea4591c07921a75edc969abf954
Parents: 9700f2f
Author: Sean Owen 
Authored: Mon Aug 29 10:43:14 2016 +0100
Committer: Sean Owen 
Committed: Mon Aug 29 10:43:14 2016 +0100

--
 .../_posts/2016-07-26-spark-release-2-0-0.md| 47 +++-
 1 file changed, 46 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark-website/blob/d37a3afc/releases/_posts/2016-07-26-spark-release-2-0-0.md
--
diff --git a/releases/_posts/2016-07-26-spark-release-2-0-0.md 
b/releases/_posts/2016-07-26-spark-release-2-0-0.md
index f50fd68..3be1e3a 100644
--- a/releases/_posts/2016-07-26-spark-release-2-0-0.md
+++ b/releases/_posts/2016-07-26-spark-release-2-0-0.md
@@ -168,4 +168,49 @@ The following features have been deprecated in Spark 2.0, 
and might be removed i
 
 
 ### Credits
-Last but not least, this release would not have been possible without the 
following contributors: Aaron Tokhy, Abhinav Gupta, Abou Haydar Elias, Adam 
Budde, Adam Roberts, Ahmed Kamal, Ahmed Mahran, Alex Bozarth, Alexander Ulanov, 
Allen, Anatoliy Plastinin, Andrew, Andrew Ash, Andrew Or, Andrew Ray, Anthony 
Truchet, Anton Okolnychyi, Antonio Murgia, Antonio Murgia, Arun Allamsetty, 
Azeem Jiva, Ben McCann, BenFradet, Bertrand Bossy, Bill Chambers, Bjorn 
Jonsson, Bo Meng, Bo Meng, Brandon Bradley, Brian O'Neill, BrianLondon, Bryan 
Cutler, Burak Köse, Burak Yavuz, Carson Wang, Cazen, Cedar Pan, Charles Allen, 
Cheng Hao, Cheng Lian, Claes Redestad, CodingCat, Cody Koeninger, DB Tsai, 
DLucky, Daniel Jalova, Daoyuan Wang, Darek Blasiak, David Tolpin, Davies Liu, 
Devaraj K, Dhruve Ashar, Dilip Biswal, Dmitry Erastov, Dominik Jastrzębski, 
Dongjoon Hyun, Earthson Lu, Egor Pakhomov, Ehsan M.Kermani, Ergin Seyfe, Eric 
Liang, Ernest, Felix Cheung, Felix Cheung, Feynman Liang, Fokko Driesprong,
  Fonso Li, Franklyn D'souza, François Garillot, Fred Reiss, Gabriele Nizzoli, 
Gary King, GayathriMurali, Gio Borje, Grace, Greg Michalopoulos, Grzegorz 
Chilkiewicz, Guillaume Poulin, Gábor Lipták, Hemant Bhanawat, Herman van 
Hovell, Herman van Hövell tot Westerflier, Hiroshi Inoue, Holden Karau, 
Hossein, Huaxin Gao, Hyukjin Kwon, Imran Rashid, Imran Younus, Ioana Delaney, 
Iulian Dragos, Jacek Laskowski, Jacek Lewandowski, Jakob Odersky, James Lohse, 
James Thomas, Jason Lee, Jason Moore, Jason White, Jean Lyn, Jean-Baptiste 
Onofré, Jeff L, Jeff Zhang, Jeremy Derr, JeremyNixon, Jia Li, Jo Voordeckers, 
Joan, Jon Maurer, Joseph K. Bradley, Josh Howes, Josh Rosen, Joshi, Juarez 
Bochi, Julien Baley, Junyang, Junyang Qian, Jurriaan Pruis, Kai Jiang, 
KaiXinXiaoLei, Kay Ousterhout, Kazuaki Ishizaki, Kevin Yu, Koert Kuipers, 
Kousuke Saruta, Koyo Yoshida, Krishna Kalyan, Krishna Kalyan, Lewuathe, 
Liang-Chi Hsieh, Lianhui Wang, Lin Zhao, Lining Sun, Liu Xiang, Liwei Lin, 
Liwei Lin, Liye, Luc Bourlier, Luciano Resende, Lukasz, Maciej Brynski, Malte,
Maciej
Szymkiewicz, Marcelo Vanzin, Marcin Tustin, Mark Grover, Mark Yang, Martin 
Menestret, Masayoshi TSUZUKI, Matei Zaharia, Mathieu Longtin, Matthew Wise, 
Miao Wang, Michael Allman, Michael Armbrust, Michael Gummelt, Michel Lemay, 
Mike Dusenberry, Mortada Mehyar, Nakul Jindal, Nam Pham, Narine Kokhlikyan, 
NarineK, Neelesh Srinivas Salian, Nezih Yigitbasi, Nicholas Chammas, Nicholas 
Tietz, Nick Pentreath, Nilanjan Raychaudhuri, Nirman Narang, Nishkam Ravi, 
Nong, Nong Li, Oleg Danilov, Oliver Pierson, Oscar D. Lara Yejas, Parth 
Brahmbhatt, Patrick Wendell, Pete Robbins, Peter Ableda, Pierre Borckmans, 
Prajwal Tuladhar, Prashant Sharma, Pravin Gadakh, QiangCai, Qifan Pu, Raafat 
Akkad, Rahul Tanwani, Rajesh Balamohan, Rekha Joshi, Reynold Xin, Richard W. 
Eggert II, Robert Dodier, Robert Kruszewski, Robin East, Ruifeng Zheng, Ryan 
Blue, Sachin Aggarwal, Saisai Shao, Sameer Agarwal, Sandeep Singh, Sanket, 
Sasaki Toru, Sean Owen, Sean Zhong, Sebastien Rainville, Sebastián Ramírez,
Sela, Sergiusz
Urbaniak, Seth Hendrickson, Shally Sangal, Sheamus K. Parkes, Shi Jinkui, 
Shivaram Venkataraman, Shixiong Zhu, Shuai Lin, Shubhanshu Mishra, Sin Wu, 
Sital Kedia, Stavros Kontopoulos, Stephan Kessler, Steve Loughran, Subhobrata 
Dey, Subroto Sanyal, Sumedh Mungee, Sun Rui, Sunitha Kambhampati, Suresh 
Thalamati, Takahashi Hiroshi, Takeshi 

spark git commit: fixed a typo

2016-08-29 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 1a48c0047 -> 08913ce00


fixed a typo

idempotant -> idempotent

Author: Seigneurin, Alexis (CONT) 

Closes #14833 from aseigneurin/fix-typo.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/08913ce0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/08913ce0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/08913ce0

Branch: refs/heads/master
Commit: 08913ce0002a80a989489a31b7353f5ec4a5849f
Parents: 1a48c00
Author: Seigneurin, Alexis (CONT) 
Authored: Mon Aug 29 13:12:10 2016 +0100
Committer: Sean Owen 
Committed: Mon Aug 29 13:12:10 2016 +0100

--
 docs/structured-streaming-programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/08913ce0/docs/structured-streaming-programming-guide.md
--
diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index 090b14f..8a88e06 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -406,7 +406,7 @@ Furthermore, this model naturally handles data that has 
arrived later than expec
 
 ## Fault Tolerance Semantics
 Delivering end-to-end exactly-once semantics was one of key goals behind the 
design of Structured Streaming. To achieve that, we have designed the 
Structured Streaming sources, the sinks and the execution engine to reliably 
track the exact progress of the processing so that it can handle any kind of 
failure by restarting and/or reprocessing. Every streaming source is assumed to 
have offsets (similar to Kafka offsets, or Kinesis sequence numbers)
-to track the read position in the stream. The engine uses checkpointing and 
write ahead logs to record the offset range of the data being processed in each 
trigger. The streaming sinks are designed to be idempotent for handling 
reprocessing. Together, using replayable sources and idempotant sinks, 
Structured Streaming can ensure **end-to-end exactly-once semantics** under any 
failure.
+to track the read position in the stream. The engine uses checkpointing and 
write ahead logs to record the offset range of the data being processed in each 
trigger. The streaming sinks are designed to be idempotent for handling 
reprocessing. Together, using replayable sources and idempotent sinks, 
Structured Streaming can ensure **end-to-end exactly-once semantics** under any 
failure.
 
 # API using Datasets and DataFrames
 Since Spark 2.0, DataFrames and Datasets can represent static, bounded data, 
as well as streaming, unbounded data. Similar to static Datasets/DataFrames, 
you can use the common entry point `SparkSession` 
([Scala](api/scala/index.html#org.apache.spark.sql.SparkSession)/


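The fault-tolerance paragraph patched above carries the key reasoning: with
replayable sources and idempotent sinks, a reprocessed batch does no harm. A
minimal sketch of sink idempotence (illustrative only; Spark's real Sink
receives a DataFrame, not strings):

```scala
// If a batch is reprocessed after a failure, a write keyed by batchId is
// recognized and skipped the second time, so duplicates never reach the sink.
class IdempotentSink {
  private val committed = scala.collection.mutable.Set.empty[Long]
  def addBatch(batchId: Long, rows: Seq[String]): Unit = {
    if (committed.add(batchId))  // add() returns false if already present
      rows.foreach(println)      // stand-in for the real write
    // else: duplicate delivery from reprocessing; safely ignored
  }
}
```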



spark git commit: [BUILD] Closes some stale PRs.

2016-08-29 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 095862a3c -> 1a48c0047


[BUILD] Closes some stale PRs.

## What changes were proposed in this pull request?

Closes #10995
Closes #13658
Closes #14505
Closes #14536
Closes #12753
Closes #14449
Closes #12694
Closes #12695
Closes #14810
Closes #10572

## How was this patch tested?

N/A

Author: Sean Owen 

Closes #14849 from srowen/CloseStalePRs.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1a48c004
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1a48c004
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1a48c004

Branch: refs/heads/master
Commit: 1a48c0047bbdb6328c3ac5ec617a5e35e244d66d
Parents: 095862a
Author: Sean Owen 
Authored: Mon Aug 29 10:46:26 2016 +0100
Committer: Sean Owen 
Committed: Mon Aug 29 10:46:26 2016 +0100

--

--


