[GitHub] spark issue #14523: [SPARK-16936] [SQL] Case Sensitivity Support for Refresh...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14523
  
**[Test build #63319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63319/consoleFull)** for PR 14523 at commit [`fb0dd0b`](https://github.com/apache/spark/commit/fb0dd0b03640c9456313d8b7a63203607940e683).





[GitHub] spark pull request #14523: [SPARK-16936] [SQL] Case Sensitivity Support for ...

2016-08-06 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/14523

[SPARK-16936] [SQL] Case Sensitivity Support for Refresh Temp Table

### What changes were proposed in this pull request?
Currently, the `refreshTable` API is always case sensitive.

When users refer to the view name without an exact case match, the API silently 
ignores the call, so users may believe the command completed successfully. 
However, when they run subsequent SQL commands, they can still hit an 
exception like:
```
Job aborted due to stage failure: 
Task 1 in stage 4.0 failed 1 times, most recent failure: Lost task 1.0 in 
stage 4.0 (TID 7, localhost): 
java.io.FileNotFoundException: 
File 
file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-bd4b9ea6-9aec-49c5-8f05-01cff426211e/part-r-0-0c84b915-c032-4f2e-abf5-1d48fdbddf38.snappy.parquet
 does not exist
``` 

This PR is to fix the issue.
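For context, below is a minimal reproduction sketch (not taken from the PR; the path and view name are hypothetical) of how the case mismatch plays out:

```scala
// Register a temp view over parquet files, then rewrite the files so the
// cached file listing becomes stale.
val path = "/tmp/refresh-demo" // hypothetical path
spark.range(10).write.mode("overwrite").parquet(path)
spark.read.parquet(path).createOrReplaceTempView("tempTable")
spark.range(20).write.mode("overwrite").parquet(path)

// With a case-sensitive lookup, this call silently matches nothing...
spark.catalog.refreshTable("temptable")

// ...so a later scan can still fail with a FileNotFoundException on the
// old, now-deleted parquet part files.
spark.table("tempTable").count()
```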

### How was this patch tested?
Added a test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark refreshTempTable

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14523.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14523


commit ade173c2397613b2649d6f61e8fe27c2d659d088
Author: gatorsmile 
Date:   2016-08-07T04:27:41Z

fix

commit f62fb19791f590a8110d6a7be65987b348dc167a
Author: gatorsmile 
Date:   2016-08-07T05:16:21Z

fix2

commit fb0dd0b03640c9456313d8b7a63203607940e683
Author: gatorsmile 
Date:   2016-08-07T05:35:55Z

update the comment







[GitHub] spark issue #14491: [SPARK-16886] [EXAMPLES][SQL] structured streaming netwo...

2016-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14491
  
@ganeshchand Ah, I see, right. `Dataset` is a `DataFrame` there, so you didn't 
change it? I skimmed through the list I provided and it seems these are all of 
them; `structured_network_wordcount.py` loads a `DataFrame` (the Java and 
Scala versions convert it to a `Dataset` explicitly).

But I believe we still need to correct 
`structured-streaming-programming-guide.md`.






[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-08-06 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13680#discussion_r73795810
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -101,6 +101,8 @@ object ScalaReflection extends ScalaReflection {
   case t if t <:< definitions.ShortTpe => classOf[Array[Short]]
   case t if t <:< definitions.ByteTpe => classOf[Array[Byte]]
   case t if t <:< definitions.BooleanTpe => classOf[Array[Boolean]]
+  case t if t <:< localTypeOf[CalendarInterval] => classOf[Array[CalendarInterval]]
+  case t if t <:< localTypeOf[Decimal] => classOf[Array[Decimal]]
--- End diff --

When I added test cases for `CalendarInterval` and `Decimal`, I got the 
following cast exception without these changes. What is an appropriate way to 
fix this?

```java
org.apache.spark.sql.types.CalendarIntervalType$ cannot be cast to 
org.apache.spark.sql.types.ObjectType
java.lang.ClassCastException: 
org.apache.spark.sql.types.CalendarIntervalType$ cannot be cast to 
org.apache.spark.sql.types.ObjectType
at 
org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$arrayClassFor(ScalaReflection.scala:108)
at 
org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor(ScalaReflection.scala:82)
at 
org.apache.spark.sql.catalyst.ScalaReflection$.dataTypeFor(ScalaReflection.scala:63)
at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:53)
at 
org.apache.spark.sql.catalyst.util.UnsafeArraySuite$$anonfun$1.apply$mcV$sp(UnsafeArraySuite.scala:129)
at 
org.apache.spark.sql.catalyst.util.UnsafeArraySuite$$anonfun$1.apply(UnsafeArraySuite.scala:48)
at 
org.apache.spark.sql.catalyst.util.UnsafeArraySuite$$anonfun$1.apply(UnsafeArraySuite.scala:48)
at 
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
at 
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at 
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
at org.scalatest.Suite$class.run(Suite.scala:1424)
at 
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29)
at 
org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
at 
org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:29)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
at 
org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
at 
org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
```
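For reference, a minimal sketch (inferred from the `ExpressionEncoder$.apply` frame above, not the exact suite code) of the kind of encoder derivation that reaches `arrayClassFor`:

```scala
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.types.Decimal
import org.apache.spark.unsafe.types.CalendarInterval

// Deriving encoders for arrays of these types walks dataTypeFor/arrayClassFor,
// which throws the ClassCastException above without the two added cases.
val intervalEncoder = ExpressionEncoder[Array[CalendarInterval]]()
val decimalEncoder  = ExpressionEncoder[Array[Decimal]]()
```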

[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...

2016-08-06 Thread NarineK
Github user NarineK commented on the issue:

https://github.com/apache/spark/pull/14431
  
It seems that, currently, SparkR's `GroupedData`, which represents Scala's 
GroupedData object, doesn't carry any information about the grouping keys. 
`RelationalGroupedDataset` has a private attribute `groupingExprs` that 
contains the grouping columns, but it is not accessible from the R side. I was 
thinking that maybe we could pass the grouping columns to groups.R, e.g. 
`groupedData(sgd, cols)`.
Any thoughts @shivaram ?
Thanks!





[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13680
  
**[Test build #63318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63318/consoleFull)** for PR 13680 at commit [`87aca80`](https://github.com/apache/spark/commit/87aca805f0c0270b6b25ade05bc7904fc0b96a06).





[GitHub] spark issue #14509: [SPARK-16924][SQL] - Support option("inferSchema", true)...

2016-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14509
  
Actually, `inferSchema` in CSV would be a CSV-datasource-specific option: it 
lets you read the headers as column names while avoiding schema inference.
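As a minimal sketch of that option combination (the input path is hypothetical):

```scala
// Use the first line for column names but skip type inference; every
// column is then read as StringType unless an explicit schema is supplied.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "false")
  .csv("/path/to/data.csv")
```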





[GitHub] spark issue #14509: [SPARK-16924][SQL] - Support option("inferSchema", true)...

2016-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14509
  
If my understanding is correct, the JSON source does not have an 
`inferSchema` option.
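For comparison, a minimal sketch of the JSON path (hypothetical input path): the JSON reader derives the schema from the data by default, with no option to toggle.

```scala
// JSON reads infer a schema from the data automatically; to skip inference
// you pass an explicit schema rather than flipping an option.
val people = spark.read.json("/path/to/people.json")
people.printSchema()
```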





[GitHub] spark issue #14522: [Spark-16508][SparkR] Split docs for arrange and orderBy...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14522
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14522: [Spark-16508][SparkR] Split docs for arrange and orderBy...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14522
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63317/
Test PASSed.





[GitHub] spark issue #14522: [Spark-16508][SparkR] Split docs for arrange and orderBy...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14522
  
**[Test build #63317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63317/consoleFull)** for PR 14522 at commit [`0876b75`](https://github.com/apache/spark/commit/0876b7588cee1b2d39ffe8869e6b8320d8e27d1e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install.spark function

2016-08-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r73794933
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,230 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Apache Spark to a Local Directory
+#' 
+#' \code{install.spark} downloads and installs Spark to a local directory 
if
+#' it is not found. The Spark version we use is the same as the SparkR 
version.
+#' Users can specify a desired Hadoop version, the remote mirror site, and
+#' the directory where the package is installed locally.
+#'
+#' The full url of remote file is inferred from \code{mirrorUrl} and 
\code{hadoopVersion}.
+#' \code{mirrorUrl} specifies the remote path to a Spark folder. It is 
followed by a subfolder
+#' named after the Spark version (that corresponds to SparkR), and then 
the tar filename.
+#' The filename is composed of four parts, i.e. [Spark 
version]-bin-[Hadoop version].tgz.
+#' For example, the full path for a Spark 2.0.0 package for Hadoop 2.7 from
+#' \code{http://apache.osuosl.org} has path:
+#' 
\code{http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz}.
+#' For \code{hadoopVersion = "without"}, [Hadoop version] in the filename 
is then
+#' \code{without-hadoop}.
+#'
+#' @param hadoopVersion Version of Hadoop to install. Default is 
\code{"2.7"}. It can take other
+#'  version number in the format of "x.y" where x and 
y are integer.
+#'  If \code{hadoopVersion = "without"}, "Hadoop free" 
build is installed.
+#'  See
+#'  
\href{http://spark.apache.org/docs/latest/hadoop-provided.html}{
+#'  "Hadoop Free" Build} for more information.
+#'  Other patched version names can also be used, e.g. 
\code{"cdh4"}
+#' @param mirrorUrl base URL of the repositories to use. The directory 
layout should follow
+#'  
\href{http://www.apache.org/dyn/closer.lua/spark/}{Apache mirrors}.
+#' @param localDir a local directory where Spark is installed. The 
directory contains
+#' version-specific folders of Spark packages. Default is 
path to
+#' the cache directory:
+#' \itemize{
+#'   \item Mac OS X: \file{~/Library/Caches/spark}
+#'   \item Unix: \env{$XDG_CACHE_HOME} if defined, 
otherwise \file{~/.cache/spark}
+#'   \item Windows: 
\file{\%LOCALAPPDATA\%\\spark\\spark\\Cache}. See
+#' 
\href{https://www.microsoft.com/security/portal/mmpc/shared/variables.aspx}{
+#' Windows Common Folder Variables} about 
\%LOCALAPPDATA\%
+#' }
+#' @param overwrite If \code{TRUE}, download and overwrite the existing 
tar file in localDir
+#'  and force re-install Spark (in case the local 
directory or file is corrupted)
+#' @return \code{install.spark} returns the local directory where Spark is 
found or installed
+#' @rdname install.spark
+#' @name install.spark
+#' @export
+#' @examples
+#'\dontrun{
+#' install.spark()
+#'}
+#' @note install.spark since 2.1.0
+#' @seealso See available Hadoop versions:
+#'  \href{http://spark.apache.org/downloads.html}{Apache Spark}
+install.spark <- function(hadoopVersion = "2.7", mirrorUrl = NULL,
+  localDir = NULL, overwrite = FALSE) {
+  version <- paste0("spark-", packageVersion("SparkR"))
+  hadoopVersion <- tolower(hadoopVersion)
+  hadoopVersionName <- hadoop_version_name(hadoopVersion)
+  packageName <- paste(version, "bin", hadoopVersionName, sep = "-")
+  localDir <- ifelse(is.null(localDir), spark_cache_path(),
+ normalizePath(localDir, mustWork = FALSE))
+
+  if 

[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install.spark function

2016-08-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r73794861
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -365,6 +365,23 @@ sparkR.session <- function(
 }
 overrideEnvs(sparkConfigMap, paramMap)
   }
+  # do not download if it is run in the sparkR shell
+  if (!grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE)) {
+if (!nzchar(master) || is_master_local(master)) {
--- End diff --

To clarify, I mean this check isn't restricted to local only, right?





[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install.spark function

2016-08-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r73794831
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,230 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Apache Spark to a Local Directory
+#' 
+#' \code{install.spark} downloads and installs Spark to a local directory 
if
+#' it is not found. The Spark version we use is the same as the SparkR 
version.
+#' Users can specify a desired Hadoop version, the remote mirror site, and
+#' the directory where the package is installed locally.
+#'
+#' The full url of remote file is inferred from \code{mirrorUrl} and 
\code{hadoopVersion}.
+#' \code{mirrorUrl} specifies the remote path to a Spark folder. It is 
followed by a subfolder
+#' named after the Spark version (that corresponds to SparkR), and then 
the tar filename.
+#' The filename is composed of four parts, i.e. [Spark 
version]-bin-[Hadoop version].tgz.
+#' For example, the full path for a Spark 2.0.0 package for Hadoop 2.7 from
+#' \code{http://apache.osuosl.org} has path:
+#' 
\code{http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz}.
+#' For \code{hadoopVersion = "without"}, [Hadoop version] in the filename 
is then
+#' \code{without-hadoop}.
+#'
+#' @param hadoopVersion Version of Hadoop to install. Default is 
\code{"2.7"}. It can take other
+#'  version number in the format of "x.y" where x and 
y are integer.
+#'  If \code{hadoopVersion = "without"}, "Hadoop free" 
build is installed.
+#'  See
+#'  
\href{http://spark.apache.org/docs/latest/hadoop-provided.html}{
+#'  "Hadoop Free" Build} for more information.
+#'  Other patched version names can also be used, e.g. 
\code{"cdh4"}
+#' @param mirrorUrl base URL of the repositories to use. The directory 
layout should follow
+#'  
\href{http://www.apache.org/dyn/closer.lua/spark/}{Apache mirrors}.
+#' @param localDir a local directory where Spark is installed. The 
directory contains
+#' version-specific folders of Spark packages. Default is 
path to
+#' the cache directory:
+#' \itemize{
+#'   \item Mac OS X: \file{~/Library/Caches/spark}
+#'   \item Unix: \env{$XDG_CACHE_HOME} if defined, 
otherwise \file{~/.cache/spark}
+#'   \item Windows: 
\file{\%LOCALAPPDATA\%\\spark\\spark\\Cache}. See
+#' 
\href{https://www.microsoft.com/security/portal/mmpc/shared/variables.aspx}{
+#' Windows Common Folder Variables} about 
\%LOCALAPPDATA\%
+#' }
+#' @param overwrite If \code{TRUE}, download and overwrite the existing 
tar file in localDir
+#'  and force re-install Spark (in case the local 
directory or file is corrupted)
+#' @return \code{install.spark} returns the local directory where Spark is 
found or installed
+#' @rdname install.spark
+#' @name install.spark
--- End diff --

add @aliases 





[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install.spark function

2016-08-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r73794815
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -365,6 +365,23 @@ sparkR.session <- function(
 }
 overrideEnvs(sparkConfigMap, paramMap)
   }
+  # do not download if it is run in the sparkR shell
+  if (!grepl(".*shell\\.R$", Sys.getenv("R_PROFILE_USER"), perl = TRUE)) {
+if (!nzchar(master) || is_master_local(master)) {
--- End diff --

Shouldn't we also fail if master != local but SPARK_HOME is not defined, or 
the Spark jar is not in SPARK_HOME?





[GitHub] spark pull request #14510: [SPARK-16925] Master should call schedule() after...

2016-08-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14510





[GitHub] spark issue #14258: [Spark-16579][SparkR] add install.spark function

2016-08-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/14258
  
I think we should go ahead with this and get usage feedback from the community 
as early as possible.
LGTM - we can see later whether we can improve how we detect running from 
the shell.






[GitHub] spark issue #14510: [SPARK-16925] Master should call schedule() after all ex...

2016-08-06 Thread JoshRosen
Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/14510
  
I'm going to merge this to master, branch-2.0, and branch-1.6. I have a 
followup patch to add configuration options for controlling the "remove 
application that has experienced too many back-to-back executor failures" code 
path, which I'll submit tomorrow.





[GitHub] spark issue #14522: [Spark-16508][SparkR] Split docs for arrange and orderBy...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14522
  
**[Test build #63317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63317/consoleFull)** for PR 14522 at commit [`0876b75`](https://github.com/apache/spark/commit/0876b7588cee1b2d39ffe8869e6b8320d8e27d1e).





[GitHub] spark pull request #14522: [Spark-16508][SparkR] Split docs for arrange and ...

2016-08-06 Thread junyangq
GitHub user junyangq opened a pull request:

https://github.com/apache/spark/pull/14522

[Spark-16508][SparkR] Split docs for arrange and orderBy methods

## What changes were proposed in this pull request?

This PR splits the docs for the arrange and orderBy methods according to their 
functionality (the former for sorting a SparkDataFrame and the latter for a 
WindowSpec).

## How was this patch tested?

![screen shot 2016-08-06 at 6 39 19 pm](https://cloud.githubusercontent.com/assets/15318264/17459969/51eade28-5c05-11e6-8ca1-8d8a8e344bab.png)
![screen shot 2016-08-06 at 6 39 29 pm](https://cloud.githubusercontent.com/assets/15318264/17459966/51e3c246-5c05-11e6-8d35-3e905ca48676.png)
![screen shot 2016-08-06 at 6 40 02 pm](https://cloud.githubusercontent.com/assets/15318264/17459967/51e650ec-5c05-11e6-8698-0f037f5199ff.png)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/junyangq/spark SPARK-16508-0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14522.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14522


commit 0876b7588cee1b2d39ffe8869e6b8320d8e27d1e
Author: Junyang Qian 
Date:   2016-08-05T22:41:39Z

Separate docs for arrange and orderBy methods according to their 
functionality







[GitHub] spark issue #14521: [SPARK-16935] [SQL] Verification of Function-related Ext...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14521
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14521: [SPARK-16935] [SQL] Verification of Function-related Ext...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14521
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63316/
Test PASSed.





[GitHub] spark issue #14521: [SPARK-16935] [SQL] Verification of Function-related Ext...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14521
  
**[Test build #63316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63316/consoleFull)** for PR 14521 at commit [`58cba4b`](https://github.com/apache/spark/commit/58cba4ba3658d2b0c5bb7bf7b0bfe929bec1aafd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public class ShuffleIndexInformation `
   * `public class ShuffleIndexRecord `
   * `case class CreateTable(tableDesc: CatalogTable, mode: SaveMode, query: Option[LogicalPlan])`
   * `case class PreprocessDDL(conf: SQLConf) extends Rule[LogicalPlan] `





[GitHub] spark issue #12574: [SPARK-13857][ML][WIP] Add "recommend all" functionality...

2016-08-06 Thread debasish83
Github user debasish83 commented on the issue:

https://github.com/apache/spark/pull/12574
  
I will take a pass at the PR as well.





[GitHub] spark issue #12574: [SPARK-13857][ML][WIP] Add "recommend all" functionality...

2016-08-06 Thread debasish83
Github user debasish83 commented on the issue:

https://github.com/apache/spark/pull/12574
  
@MLnick I recently visited IBM STC but unfortunately missed you at the 
meeting... we discussed the ML/MLlib changes for matrix factorization...





[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...

2016-08-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14469#discussion_r73794005
  
--- Diff: python/pyspark/sql/session.py ---
@@ -384,17 +384,15 @@ def _createFromLocal(self, data, schema):
 
 if schema is None or isinstance(schema, (list, tuple)):
 struct = self._inferSchemaFromList(data)
+converter = _create_converter(struct)
--- End diff --

This [`_create_converter` method](https://github.com/davies/spark/blob/c0ad8668ba22e51b07ba08b8e19c312783cd1b87/python/pyspark/sql/types.py#L1054) is confusingly named: what it's actually doing here is converting `data` from a dict to a tuple when the schema is a StructType and the data is a Python dictionary.





[GitHub] spark issue #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame from dict...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14469
  
**[Test build #3205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3205/consoleFull)** for PR 14469 at commit [`c0ad866`](https://github.com/apache/spark/commit/c0ad8668ba22e51b07ba08b8e19c312783cd1b87).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame from dict...

2016-08-06 Thread JoshRosen
Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/14469
  
This looks pretty good to me overall but I have a couple of clarification 
questions regarding some of the doc changes.





[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...

2016-08-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14469#discussion_r73793929
  
--- Diff: python/pyspark/sql/types.py ---
@@ -582,6 +582,8 @@ def toInternal(self, obj):
 else:
 if isinstance(obj, dict):
 return tuple(obj.get(n) for n in self.names)
+elif isinstance(obj, Row) and getattr(obj, "__from_dict__", False):
--- End diff --

Nice.





[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...

2016-08-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14469#discussion_r73793910
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -411,6 +411,21 @@ def test_infer_schema_to_local(self):
 df3 = self.spark.createDataFrame(rdd, df.schema)
 self.assertEqual(10, df3.count())
 
+def test_apply_schema_to_dict_and_rows(self):
--- End diff --

Should we also add a test to exercise the `verifySchema=False` case?





[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...

2016-08-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14469#discussion_r73793890
  
--- Diff: python/pyspark/sql/session.py ---
@@ -432,14 +430,9 @@ def createDataFrame(self, data, schema=None, 
samplingRatio=None):
 ``byte`` instead of ``tinyint`` for 
:class:`pyspark.sql.types.ByteType`. We can also use
 ``int`` as a short name for ``IntegerType``.
 :param samplingRatio: the sample ratio of rows used for inferring
+:param verifySchema: verify data types of every row against schema.
 :return: :class:`DataFrame`
 
-.. versionchanged:: 2.0
--- End diff --

@davies, I'm also slightly confused by this documentation change since it 
looks like the new 2.x behavior of wrapping single-field datatypes into 
structtypes and values into tuples is preserved by this patch. Could you 
clarify?





[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...

2016-08-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14469#discussion_r73793841
  
--- Diff: python/pyspark/sql/session.py ---
@@ -432,14 +430,9 @@ def createDataFrame(self, data, schema=None, 
samplingRatio=None):
 ``byte`` instead of ``tinyint`` for 
:class:`pyspark.sql.types.ByteType`. We can also use
 ``int`` as a short name for ``IntegerType``.
 :param samplingRatio: the sample ratio of rows used for inferring
+:param verifySchema: verify data types of every row against schema.
--- End diff --

+1 on also adding a `versionchanged` directive for this.





[GitHub] spark pull request #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame fr...

2016-08-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14469#discussion_r73793829
  
--- Diff: python/pyspark/sql/context.py ---
@@ -253,6 +254,8 @@ def createDataFrame(self, data, schema=None, 
samplingRatio=None):
If it's not a :class:`pyspark.sql.types.StructType`, it will be 
wrapped into a
:class:`pyspark.sql.types.StructType` and each record will also 
be wrapped into a tuple.
 
+   Added verifySchema.
--- End diff --

+1. I wasn't aware of this, but it looks like it's possible to have 
multiple `versionchanged` directives in the same docstring.





[GitHub] spark issue #14469: [SPARK-16700] [PYSPARK] [SQL] create DataFrame from dict...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14469
  
**[Test build #3205 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3205/consoleFull)** for PR 14469 at commit [`c0ad866`](https://github.com/apache/spark/commit/c0ad8668ba22e51b07ba08b8e19c312783cd1b87).





[GitHub] spark issue #14521: [SPARK-16935] [SQL] Verification of Function-related Ext...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14521
  
**[Test build #63316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63316/consoleFull)** for PR 14521 at commit [`58cba4b`](https://github.com/apache/spark/commit/58cba4ba3658d2b0c5bb7bf7b0bfe929bec1aafd).





[GitHub] spark pull request #14521: [SPARK-16935] [SQL] Verification of Function-rela...

2016-08-06 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/14521

[SPARK-16935] [SQL] Verification of Function-related ExternalCatalog APIs

### What changes were proposed in this pull request?
Function-related `HiveExternalCatalog` APIs do not have enough verification 
logic. After this PR, `HiveExternalCatalog` and `InMemoryCatalog` become 
consistent in their error handling. 

For example, below is the exception we got when calling `renameFunction`. 
```
15:13:40.369 WARN org.apache.hadoop.hive.metastore.ObjectStore: Failed to 
get database db1, returning NoSuchObjectException
15:13:40.377 WARN org.apache.hadoop.hive.metastore.ObjectStore: Failed to 
get database db2, returning NoSuchObjectException
15:13:40.739 ERROR DataNucleus.Datastore.Persist: Update of object 
"org.apache.hadoop.hive.metastore.model.MFunction@205629e9" using statement 
"UPDATE FUNCS SET FUNC_NAME=? WHERE FUNC_ID=?" failed : 
org.apache.derby.shared.common.error.DerbySQLIntegrityConstraintViolationException:
 The statement was aborted because it would have caused a duplicate key value 
in a unique or primary key constraint or unique index identified by 
'UNIQUEFUNCTION' defined on 'FUNCS'.
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
Source)
```
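As an illustrative sketch only (the database and function names are hypothetical, and this is not the PR's test code), the failure mode is a rename onto an existing function name:

```scala
import org.apache.spark.sql.catalyst.FunctionIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogFunction

// Two functions already registered in db1 (externalCatalog is an assumed
// ExternalCatalog instance, e.g. HiveExternalCatalog or InMemoryCatalog).
val f1 = CatalogFunction(FunctionIdentifier("func1", Some("db1")), "com.example.F1", Seq.empty)
val f2 = CatalogFunction(FunctionIdentifier("func2", Some("db1")), "com.example.F2", Seq.empty)
externalCatalog.createFunction("db1", f1)
externalCatalog.createFunction("db1", f2)

// Renaming func2 onto an existing name used to surface the raw Derby
// constraint violation shown above; with verification the catalog can
// raise a clear "function already exists" error instead.
externalCatalog.renameFunction("db1", "func2", "func1")
```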

### How was this patch tested?
Improved the existing test cases to check whether the messages are right.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark functionChecking

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14521.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14521


commit e53809aeccb936ade39abbbaab408fccbe347b7f
Author: gatorsmile 
Date:   2016-08-04T22:45:09Z

fix

commit 58cba4ba3658d2b0c5bb7bf7b0bfe929bec1aafd
Author: gatorsmile 
Date:   2016-08-06T15:32:22Z

Merge remote-tracking branch 'upstream/master' into functionChecking







[GitHub] spark pull request #14489: [MINOR][SparkR] R API documentation for "coltypes...

2016-08-06 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14489#discussion_r73789927
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -41,7 +41,7 @@ setOldClass("structType")
 #'\dontrun{
 #' sparkR.session()
 #' df <- createDataFrame(faithful)
-#'}
+#' }
--- End diff --

Yeah, let's not do them in this PR. If we want to fix this, let's do it in a 
separate PR?





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-08-06 Thread zasdfgbnm
Github user zasdfgbnm closed the pull request at:

https://github.com/apache/spark/pull/14231





[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14520
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63315/
Test PASSed.





[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14520
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14520
  
**[Test build #63315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63315/consoleFull)** for PR 14520 at commit [`417aa1e`](https://github.com/apache/spark/commit/417aa1ea623b10d0d7b9f13b3d3f65fa8ac64ce8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...

2016-08-06 Thread eyalfa
Github user eyalfa commented on a diff in the pull request:

https://github.com/apache/spark/pull/14444#discussion_r73788331
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1101,7 +1101,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
    * Create a [[CreateStruct]] expression.
    */
   override def visitRowConstructor(ctx: RowConstructorContext): Expression = withOrigin(ctx) {
-    CreateStruct(ctx.expression.asScala.map(expression))
+    CreateStruct(ctx.expression.asScala.map(expression)).toCreateNamedStruct
--- End diff --

@cloud-fan this processes row-constructor expressions of the form `(1,"a") as 
col1(a,b)`.
I could basically leave this as a `CreateStruct`, but then I'd have to do 
something like a transformDown in `visitInlineTable`, which is basically the 
reason for this mess.
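
For context, a minimal sketch (hypothetical helper name; not the actual Spark 
source) of what such a conversion involves: `CreateNamedStruct` expects an 
alternating sequence of field-name literals and value expressions, while 
`CreateStruct` takes only the values, so default names `col1`, `col2`, ... are 
interleaved here by assumption.

```
import org.apache.spark.sql.catalyst.expressions.{CreateNamedStruct, Expression, Literal}

// Sketch: turn an unnamed struct constructor's children into the
// Seq(name1, value1, name2, value2, ...) shape CreateNamedStruct expects.
def toCreateNamedStructSketch(children: Seq[Expression]): CreateNamedStruct = {
  val nameValuePairs = children.zipWithIndex.flatMap { case (expr, i) =>
    Seq(Literal(s"col${i + 1}"), expr)
  }
  CreateNamedStruct(nameValuePairs)
}
```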





[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...

2016-08-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/14520
  
Oh... it's another algorithm and there are several different details, so in 
order to make it clear I created a separate PR to discuss it. Thanks!





[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...

2016-08-06 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14520
  
Let's put this into https://github.com/apache/spark/pull/14109





[GitHub] spark issue #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurvivalRegre...

2016-08-06 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14519
  
Let's put this into https://github.com/apache/spark/pull/14109





[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14520
  
**[Test build #63315 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63315/consoleFull)**
 for PR 14520 at commit 
[`417aa1e`](https://github.com/apache/spark/commit/417aa1ea623b10d0d7b9f13b3d3f65fa8ac64ce8).





[GitHub] spark pull request #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurviv...

2016-08-06 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14519#discussion_r73787877
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala ---
@@ -583,19 +591,22 @@ private class AFTAggregator(
 private class AFTCostFun(
     data: RDD[AFTPoint],
     fitIntercept: Boolean,
-    featuresStd: Array[Double]) extends DiffFunction[BDV[Double]] {
+    bcFeaturesStd: Broadcast[Array[Double]]) extends DiffFunction[BDV[Double]] {
 
   override def calculate(parameters: BDV[Double]): (Double, BDV[Double]) = {
 
+    val bcParameters = data.context.broadcast(parameters)
+
     val aftAggregator = data.treeAggregate(
-      new AFTAggregator(parameters, fitIntercept, featuresStd))(
+      new AFTAggregator(bcParameters, fitIntercept, bcFeaturesStd))(
       seqOp = (c, v) => (c, v) match {
         case (aggregator, instance) => aggregator.add(instance)
       },
       combOp = (c1, c2) => (c1, c2) match {
         case (aggregator1, aggregator2) => aggregator1.merge(aggregator2)
       })
 
--- End diff --

No need for the `(c, v) match {...}` pattern; directly use 
`(aggregator, instance) => aggregator.add(instance)`.
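
Applied to the hunk above, the suggested simplification would read roughly:

```
val aftAggregator = data.treeAggregate(
  new AFTAggregator(bcParameters, fitIntercept, bcFeaturesStd))(
  seqOp = (aggregator, instance) => aggregator.add(instance),
  combOp = (aggregator1, aggregator2) => aggregator1.merge(aggregator2))
```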





[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...

2016-08-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/14520
  
cc @sethah @yanboliang 





[GitHub] spark pull request #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun ...

2016-08-06 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request:

https://github.com/apache/spark/pull/14520

[SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoid redundant 
serialization

## What changes were proposed in this pull request?

Improve `LogisticCostFun`: replace the closure variable `localFeaturesStd` with 
a broadcast variable, so it avoids redundant serialization on each call.

It also makes several other modifications to match the patterns in 
https://github.com/apache/spark/pull/14109

## How was this patch tested?

Existing tests.
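
For context, a minimal self-contained sketch (hypothetical class and field 
names, not the PR's actual code) of the pattern being applied: broadcast large 
read-only arrays once per executor instead of capturing them in the task 
closure, where they would be serialized with every task.

```
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

// Hypothetical cost function illustrating the broadcast pattern.
class CostFunSketch(
    data: RDD[Array[Double]],
    bcFeaturesStd: Broadcast[Array[Double]]) {

  def calculate(parameters: Array[Double]): Double = {
    // Broadcast the per-iteration parameters so they ship once per executor,
    // not once per task via closure capture.
    val bcParameters = data.context.broadcast(parameters)
    val loss = data.map { features =>
      val std = bcFeaturesStd.value
      val w = bcParameters.value
      features.indices.map { i =>
        if (std(i) != 0.0) features(i) / std(i) * w(i) else 0.0
      }.sum
    }.sum()
    // Release the broadcast blocks once this iteration's pass is done.
    bcParameters.destroy()
    loss
  }
}
```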



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WeichenXu123/spark 
improve_logistic_regression_costfun

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14520.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14520


commit 417aa1ea623b10d0d7b9f13b3d3f65fa8ac64ce8
Author: WeichenXu 
Date:   2016-08-05T01:28:18Z

update







[GitHub] spark issue #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurvivalRegre...

2016-08-06 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/14519
  
cc @sethah @dbtsai 





[GitHub] spark issue #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurvivalRegre...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14519
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63314/
Test PASSed.





[GitHub] spark issue #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurvivalRegre...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14519
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurvivalRegre...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14519
  
**[Test build #63314 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63314/consoleFull)**
 for PR 14519 at commit 
[`d152a3a`](https://github.com/apache/spark/commit/d152a3a32ec08743b026ab5bd632b22909c6aa3f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurvivalRegre...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14519
  
**[Test build #63314 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63314/consoleFull)**
 for PR 14519 at commit 
[`d152a3a`](https://github.com/apache/spark/commit/d152a3a32ec08743b026ab5bd632b22909c6aa3f).





[GitHub] spark pull request #14519: [SPARK-16933] [ML] Fix AFTAggregator in AFTSurviv...

2016-08-06 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/14519

[SPARK-16933] [ML] Fix AFTAggregator in AFTSurvivalRegression serializes 
unnecessary data.

## What changes were proposed in this pull request?
Similar to ```LeastSquaresAggregator``` in #14109, the ```AFTAggregator``` used 
for ```AFTSurvivalRegression``` ends up serializing ```parameters``` and 
```featuresStd```, which is unnecessary and can cause performance issues for 
high-dimensional data. This patch removes that serialization. This PR is highly 
inspired by #14109.

## How was this patch tested?
Existing tests.
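
For contrast with the broadcast version sketched earlier in this thread, a 
minimal sketch (hypothetical names) of the anti-pattern being removed: 
referencing a constructor parameter inside an RDD closure captures it (and 
potentially the whole enclosing object), so it is serialized with every task.

```
import org.apache.spark.rdd.RDD

// Before the fix: `featuresStd` is closure-captured, so it is serialized
// and shipped with every task on every call to calculate().
class AFTCostFunBefore(
    data: RDD[Array[Double]],
    featuresStd: Array[Double]) extends Serializable {

  def calculate(): Double =
    data.map { features =>
      features.zip(featuresStd).map {
        case (v, s) => if (s != 0.0) v / s else 0.0
      }.sum
    }.sum()
}
```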



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-16933

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14519.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14519


commit d152a3a32ec08743b026ab5bd632b22909c6aa3f
Author: Yanbo Liang 
Date:   2016-08-06T13:37:37Z

Fix AFTAggregator in AFTSurvivalRegression serializes unnecessary data.







[GitHub] spark issue #14109: [SPARK-16404][ML] LeastSquaresAggregators serializes unn...

2016-08-06 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/14109
  
The current fix for destroying the broadcast variables is OK. LGTM. Thanks!





[GitHub] spark issue #14504: [SPARK-16409] [SQL] regexp_extract with optional groups ...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14504
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63313/
Test PASSed.





[GitHub] spark issue #14504: [SPARK-16409] [SQL] regexp_extract with optional groups ...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14504
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14504: [SPARK-16409] [SQL] regexp_extract with optional groups ...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14504
  
**[Test build #63313 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63313/consoleFull)**
 for PR 14504 at commit 
[`b835bd3`](https://github.com/apache/spark/commit/b835bd3d5fe4c736b4c92b3486f2344a83d09438).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14504: [SPARK-16409] [SQL] regexp_extract with optional groups ...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14504
  
**[Test build #63313 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63313/consoleFull)**
 for PR 14504 at commit 
[`b835bd3`](https://github.com/apache/spark/commit/b835bd3d5fe4c736b4c92b3486f2344a83d09438).





[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14175
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63312/
Test PASSed.





[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14175
  
Build finished. Test PASSed.





[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14175
  
**[Test build #63312 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63312/consoleFull)**
 for PR 14175 at commit 
[`1a8b8e6`](https://github.com/apache/spark/commit/1a8b8e606c051f7f9e3da78d51cd92b69e8f84d9).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...

2016-08-06 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/14392
  
@felixcheung @junyangq Any thoughts?





[GitHub] spark pull request #14477: [SPARK-16870][docs]Summary:add "spark.sql.broadca...

2016-08-06 Thread biglobster
Github user biglobster commented on a diff in the pull request:

https://github.com/apache/spark/pull/14477#discussion_r73782824
  
--- Diff: docs/sql-programming-guide.md ---
@@ -790,6 +790,15 @@ Configuration of Parquet can be done using the 
`setConf` method on `SparkSession
 
   
 
+
--- End diff --

Done, thanks :) for your suggestion.





[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14175
  
**[Test build #63312 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63312/consoleFull)**
 for PR 14175 at commit 
[`1a8b8e6`](https://github.com/apache/spark/commit/1a8b8e606c051f7f9e3da78d51cd92b69e8f84d9).





[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...

2016-08-06 Thread sun-rui
Github user sun-rui commented on the issue:

https://github.com/apache/spark/pull/14175
  
@mgummelt, regression test case added. Not sure whether it is the one you 
expected.





[GitHub] spark pull request #14502: [SPARK-16909][Spark Core] - Streaming for postgre...

2016-08-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14502#discussion_r73781961
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ---
@@ -79,14 +79,19 @@ class JdbcRDD[T: ClassTag](
    val conn = getConnection()
    val stmt = conn.prepareStatement(sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
 
-    // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force streaming results,
-    // rather than pulling entire resultset into memory.
-    // see http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html
-    if (conn.getMetaData.getURL.matches("jdbc:mysql:.*")) {
+    val url = conn.getMetaData.getURL
+    if (url.startsWith("jdbc:mysql:")) {
+      // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force streaming results,
--- End diff --

(This line is too long; it fails the style checks.)
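
For background, a rough self-contained sketch (the helper name is illustrative, 
and the driver behavior described is an assumption stated here, not taken from 
the PR) of driver-specific streaming setup: MySQL's Connector/J streams rows 
only when the fetch size is `Integer.MIN_VALUE`, while the PostgreSQL driver 
streams with any positive fetch size, but only when autocommit is disabled.

```
import java.sql.{Connection, PreparedStatement, ResultSet}

// Illustrative helper: configure a statement so the driver streams rows
// instead of buffering the whole result set in memory.
def prepareStreamingStatement(
    conn: Connection,
    sql: String,
    fetchSize: Int): PreparedStatement = {
  val stmt = conn.prepareStatement(
    sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
  val url = conn.getMetaData.getURL
  if (url.startsWith("jdbc:mysql:")) {
    // MySQL-specific signal: stream results row by row.
    stmt.setFetchSize(Integer.MIN_VALUE)
  } else if (url.startsWith("jdbc:postgresql:")) {
    // PostgreSQL honors fetchSize only outside autocommit.
    conn.setAutoCommit(false)
    stmt.setFetchSize(fetchSize)
  } else {
    stmt.setFetchSize(fetchSize)
  }
  stmt
}
```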





[GitHub] spark issue #14502: [SPARK-16909][Spark Core] - Streaming for postgreSQL JDB...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14502
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14502: [SPARK-16909][Spark Core] - Streaming for postgreSQL JDB...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14502
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63311/
Test FAILed.





[GitHub] spark issue #14502: [SPARK-16909][Spark Core] - Streaming for postgreSQL JDB...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14502
  
**[Test build #63311 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63311/consoleFull)**
 for PR 14502 at commit 
[`99528c2`](https://github.com/apache/spark/commit/99528c25888f2eae4ee7be738867234edcacaf64).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14502: [SPARK-16909][Spark Core] - Streaming for postgreSQL JDB...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14502
  
**[Test build #63311 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63311/consoleFull)**
 for PR 14502 at commit 
[`99528c2`](https://github.com/apache/spark/commit/99528c25888f2eae4ee7be738867234edcacaf64).





[GitHub] spark issue #14502: [SPARK-16909][Spark Core] - Streaming for postgreSQL JDB...

2016-08-06 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14502
  
Jenkins test this please





[GitHub] spark issue #14504: [SPARK-16409] [SQL] regexp_extract with optional groups ...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14504
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14504: [SPARK-16409] [SQL] regexp_extract with optional groups ...

2016-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14504
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63310/
Test FAILed.





[GitHub] spark issue #14504: [SPARK-16409] [SQL] regexp_extract with optional groups ...

2016-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14504
  
**[Test build #63310 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63310/consoleFull)**
 for PR 14504 at commit 
[`545c8de`](https://github.com/apache/spark/commit/545c8dec58a4273ab34d300c35302a3f3bd97c76).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.

