[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-21 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r226874142
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,52 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+  
+  df <- createDataFrame(faithful)
+  expect_is(df, "SparkDataFrame")
+  expected <- "eruptions:double, waiting:double"
+  expect_output(show(df), expected)
+  
+  # Stop Spark session
+  sparkR.session.stop()
+})
+
+test_that("eager execution is enabled", {
+  # Start Spark session with eager execution enabled
+  sparkConfig <- list(spark.sql.repl.eagerEval.enabled = "true",
+  spark.sql.repl.eagerEval.maxNumRows = as.integer(10))
--- End diff --

add a test with `spark.sql.repl.eagerEval.maxNumRows` not set.
change this test to add `spark.sql.repl.eagerEval.truncate`
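
a rough sketch of the two suggested cases, reusing the helpers already in this file (`sparkRTestMaster`, `createDataFrame(faithful)`); the expected-output patterns below are illustrative only, not the exact strings the final tests should assert:

```
# Case 1: eager execution on, maxNumRows left unset (falls back to the default)
test_that("eager execution is enabled, maxNumRows not set", {
  sparkConfig <- list(spark.sql.repl.eagerEval.enabled = "true")
  sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE,
                 sparkConfig = sparkConfig)
  df <- createDataFrame(faithful)
  # with eager evaluation, show() renders rows rather than just the schema
  expect_output(show(df), "eruptions")
  sparkR.session.stop()
})

# Case 2: eager execution on, with maxNumRows and truncate set
test_that("eager execution is enabled with truncate", {
  sparkConfig <- list(spark.sql.repl.eagerEval.enabled = "true",
                      spark.sql.repl.eagerEval.maxNumRows = as.integer(10),
                      spark.sql.repl.eagerEval.truncate = as.integer(20))
  sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE,
                 sparkConfig = sparkConfig)
  df <- createDataFrame(faithful)
  expect_output(show(df), "eruptions")
  sparkR.session.stop()
})
```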


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-21 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r226874309
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,52 @@
+#
--- End diff --

one thing to note: since tests run alphabetically, this test will run 
before test_sparkSQL.R - I think we should rename this to test_sparkSQL_eager.R to 
ensure it runs after


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-21 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r226874251
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -3731,6 +3731,7 @@ test_that("catalog APIs, listTables, listColumns, 
listFunctions", {
   dropTempView("cars")
 })
 
+
--- End diff --

remove empty line


---




[GitHub] spark issue #22668: [SPARK-25675] [Spark Job History] Job UI page does not s...

2018-10-21 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22668
  
@shivusondur 
what's your account name on https://issues.apache.org?


---




[GitHub] spark issue #22668: [SPARK-25675] [Spark Job History] Job UI page does not s...

2018-10-21 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22668
  
merged to master



---




spark git commit: [SPARK-25675][SPARK JOB HISTORY] Job UI page does not show pagination with one page

2018-10-21 Thread felixcheung
Repository: spark
Updated Branches:
  refs/heads/master ffe256ce1 -> 4c6c6711d


[SPARK-25675][SPARK JOB HISTORY] Job UI page does not show pagination with one 
page

## What changes were proposed in this pull request?
Currently, the pageNavigation() method in PagedTable.scala does not use 
pagination when there is only one page.
Now pagination is used even when there is only one page.

## How was this patch tested?
This was tested with the Spark web UI and the History page in a local Spark setup.
![pagination](https://user-images.githubusercontent.com/7912929/46592799-93bfaf00-cae3-11e8-881a-ca2e93f17818.png)

Author: shivusondur 

Closes #22668 from shivusondur/pagination.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4c6c6711
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4c6c6711
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4c6c6711

Branch: refs/heads/master
Commit: 4c6c6711d5e94378c7bb5f672314f4db751372ea
Parents: ffe256c
Author: shivusondur 
Authored: Sun Oct 21 11:44:48 2018 -0700
Committer: Felix Cheung 
Committed: Sun Oct 21 11:44:48 2018 -0700

--
 .../scala/org/apache/spark/ui/PagedTable.scala  | 210 +--
 1 file changed, 101 insertions(+), 109 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4c6c6711/core/src/main/scala/org/apache/spark/ui/PagedTable.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ui/PagedTable.scala 
b/core/src/main/scala/org/apache/spark/ui/PagedTable.scala
index 2fc0259..0bbb10a 100644
--- a/core/src/main/scala/org/apache/spark/ui/PagedTable.scala
+++ b/core/src/main/scala/org/apache/spark/ui/PagedTable.scala
@@ -122,13 +122,9 @@ private[spark] trait PagedTable[T] {
 
   /**
* Return a page navigation.
-   * 
-   *   If the totalPages is 1, the page navigation will be empty
-   *   
-   * If the totalPages is more than 1, it will create a page navigation 
including a group of
-   * page numbers and a form to submit the page number.
-   *   
-   * 
+   *
+   * It will create a page navigation including a group of page numbers and a 
form
+   * to submit the page number.
*
* Here are some examples of the page navigation:
* {{{
@@ -154,120 +150,116 @@ private[spark] trait PagedTable[T] {
* }}}
*/
   private[ui] def pageNavigation(page: Int, pageSize: Int, totalPages: Int): 
Seq[Node] = {
-if (totalPages == 1) {
-  Nil
-} else {
-  // A group includes all page numbers will be shown in the page 
navigation.
-  // The size of group is 10 means there are 10 page numbers will be shown.
-  // The first group is 1 to 10, the second is 2 to 20, and so on
-  val groupSize = 10
-  val firstGroup = 0
-  val lastGroup = (totalPages - 1) / groupSize
-  val currentGroup = (page - 1) / groupSize
-  val startPage = currentGroup * groupSize + 1
-  val endPage = totalPages.min(startPage + groupSize - 1)
-  val pageTags = (startPage to endPage).map { p =>
-if (p == page) {
-  // The current page should be disabled so that it cannot be clicked.
-  {p}
-} else {
-  {p}
-}
+// A group includes all page numbers will be shown in the page navigation.
+// The size of group is 10 means there are 10 page numbers will be shown.
+// The first group is 1 to 10, the second is 2 to 20, and so on
+val groupSize = 10
+val firstGroup = 0
+val lastGroup = (totalPages - 1) / groupSize
+val currentGroup = (page - 1) / groupSize
+val startPage = currentGroup * groupSize + 1
+val endPage = totalPages.min(startPage + groupSize - 1)
+val pageTags = (startPage to endPage).map { p =>
+  if (p == page) {
+// The current page should be disabled so that it cannot be clicked.
+{p}
+  } else {
+{p}
   }
+}
 
-  val hiddenFormFields = {
-if (goButtonFormPath.contains('?')) {
-  val queryString = goButtonFormPath.split("\\?", 2)(1)
-  val search = queryString.split("#")(0)
-  Splitter
-.on('&')
-.trimResults()
-.omitEmptyStrings()
-.withKeyValueSeparator("=")
-.split(search)
-.asScala
-.filterKeys(_ != pageSizeFormField)
-.filterKeys(_ != prevPageSizeFormField)
-.filterKeys(_ != pageNumberFormField)
-.mapValues(URLDecoder.decode(_, "UTF-8"))
-.map { case (k, v) =>
-  
-}
-} else {
-  Seq.empty
-}
+val hiddenFormFields = {
+  if (goButtonFormPath.contains('?')) {
+val queryString = goButtonFormPath.split("\\?", 2)(1)
+val search = 

[GitHub] spark issue #22720: [SPARK-25730][K8S] Delete executor pods from kubernetes ...

2018-10-21 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22720
  
merged to master


---




spark git commit: [SPARK-25730][K8S] Delete executor pods from kubernetes after figuring out why they died

2018-10-21 Thread felixcheung
Repository: spark
Updated Branches:
  refs/heads/master c77aa42f5 -> ffe256ce1


[SPARK-25730][K8S] Delete executor pods from kubernetes after figuring out why 
they died

## What changes were proposed in this pull request?

`removeExecutorFromSpark` tries to fetch the reason the executor exited from 
Kubernetes, which may be useful if the pod was OOMKilled. However, the code 
previously deleted the pod from Kubernetes first, which made retrieving this 
status impossible. This fixes the ordering.

On a separate but related note, it would be nice to wait some time before 
removing the pod - to let the operator examine logs and such.

## How was this patch tested?

Running on my local cluster.

Author: Mike Kaplinskiy 

Closes #22720 from mikekap/patch-1.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ffe256ce
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ffe256ce
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ffe256ce

Branch: refs/heads/master
Commit: ffe256ce161884f0a1304b4925d51d39a9bfa5df
Parents: c77aa42
Author: Mike Kaplinskiy 
Authored: Sun Oct 21 11:32:33 2018 -0700
Committer: Felix Cheung 
Committed: Sun Oct 21 11:32:33 2018 -0700

--
 .../spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ffe256ce/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala
--
diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala
index cc254b8..1a75ae0 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala
@@ -112,8 +112,8 @@ private[spark] class ExecutorPodsLifecycleManager(
   execId: Long,
   schedulerBackend: KubernetesClusterSchedulerBackend,
   execIdsRemovedInRound: mutable.Set[Long]): Unit = {
-removeExecutorFromK8s(podState.pod)
 removeExecutorFromSpark(schedulerBackend, podState, execId)
+removeExecutorFromK8s(podState.pod)
 execIdsRemovedInRound += execId
   }
 





[GitHub] spark pull request #22771: [SPARK-25773][Core]Cancel zombie tasks in a resul...

2018-10-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22771#discussion_r226818782
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1364,6 +1385,16 @@ private[spark] class DAGScheduler(
   if (job.numFinished == job.numPartitions) {
 markStageAsFinished(resultStage)
 cleanupStateForJobAndIndependentStages(job)
+try { // cancelTasks will fail if a SchedulerBackend 
does not implement killTask
+  logInfo(
+s"Job ${job.jobId} is finished. Killing 
speculative tasks for this job")
+  // ResultStage is only used by this job. It's safe 
to kill speculative or
+  // zombie tasks in this stage.
+  taskScheduler.cancelTasks(stageId, 
shouldInterruptTaskThread(job))
+} catch {
+  case e: UnsupportedOperationException =>
+logInfo(s"Could not cancel tasks for stage 
$stageId", e)
--- End diff --

logWarn? aren't we leaking tasks?


---




[GitHub] zeppelin issue #3206: [WIP][ZEPPELIN-3810] Support Spark 2.4

2018-10-20 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/zeppelin/pull/3206
  
is this going to break, say, Spark 2.3 with Scala 2.11.8?


---


[GitHub] spark pull request #22753: [SPARK-25754][DOC] Change CDN for MathJax

2018-10-17 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22753#discussion_r225792435
  
--- Diff: docs/_layouts/global.html ---
@@ -184,7 +184,8 @@ {{ page.title }}
 });
 };
 script.src = ('https:' == document.location.protocol ? 
'https://' : 'http://') +
-
'cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
+
'cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js' +
--- End diff --

should we avoid hardcoding the version inline?


---




[GitHub] zeppelin issue #3013: [ZEPPELIN-3511] remove old button "Download Data as CS...

2018-10-17 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/zeppelin/pull/3013
  
ok, let's go ahead then?


---


[GitHub] spark issue #22608: [SPARK-23257][K8S][TESTS] Kerberos Support Integration T...

2018-10-16 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22608
  
I think the suggestion is to go to master & 2.4


---




[GitHub] spark pull request #22608: [SPARK-23257][K8S][TESTS] Kerberos Support Integr...

2018-10-16 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22608#discussion_r225615864
  
--- Diff: bin/docker-image-tool.sh ---
@@ -71,18 +71,29 @@ function build {
 --build-arg
 base_img=$(image_ref spark)
   )
-  local BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/spark/Dockerfile"}
-  local 
PYDOCKERFILE=${PYDOCKERFILE:-"$IMG_PATH/spark/bindings/python/Dockerfile"}
-  local RDOCKERFILE=${RDOCKERFILE:-"$IMG_PATH/spark/bindings/R/Dockerfile"}
+  local 
BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/main/dockerfiles/spark/Dockerfile"}
+  local 
PYDOCKERFILE=${PYDOCKERFILE:-"$IMG_PATH/main/dockerfiles/spark/bindings/python/Dockerfile"}
+  local 
KDOCKERFILE=${KDOCKERFILE:-"$IMG_PATH/test/dockerfiles/spark/kerberos/Dockerfile"}
+  local 
RDOCKERFILE=${RDOCKERFILE:-"$IMG_PATH/main/dockerfiles/spark/bindings/R/Dockerfile"}
 
+  # Spark Base
   docker build $NOCACHEARG "${BUILD_ARGS[@]}" \
 -t $(image_ref spark) \
 -f "$BASEDOCKERFILE" .
 
+  # PySpark
   docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
 -t $(image_ref spark-py) \
 -f "$PYDOCKERFILE" .
 
+  # The following are optional docker builds for Kerberos Testing
+  docker pull ifilonenko/hadoop-base:latest
--- End diff --

I commented on this earlier... 
https://github.com/apache/spark/pull/22608#issuecomment-427617701


---




[GitHub] spark issue #22730: [SPARK-16775][CORE] Remove deprecated accumulator v1 API...

2018-10-16 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22730
  
no accumulator in R


---




[GitHub] spark pull request #22721: [SPARK-25403][SQL] Refreshes the table after inse...

2018-10-16 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22721#discussion_r225418234
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala
 ---
@@ -189,6 +189,7 @@ case class InsertIntoHadoopFsRelationCommand(
   sparkSession.catalog.refreshByPath(outputPath.toString)
 
   if (catalogTable.nonEmpty) {
+
sparkSession.catalog.refreshTable(catalogTable.get.identifier.quotedString)
--- End diff --

btw, I thought the cost of refreshing the table can be quite high?


---




[GitHub] spark issue #22720: [SPARK-25730][K8S] Delete executor pods from kubernetes ...

2018-10-16 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22720
  
cool, any more comments?


---




[GitHub] spark pull request #22727: [SPARK-25735][CORE][MINOR]Improve start-thriftser...

2018-10-16 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22727#discussion_r225417566
  
--- Diff: sbin/start-thriftserver.sh ---
@@ -39,6 +39,10 @@ function usage {
   pattern+="\|Spark Command: "
   pattern+="\|==="
   pattern+="\|--help"
+  pattern+="\|Using Spark's default log4j profile:"
+  pattern+="\|^log4j:"
+  pattern+="\|Started daemon with process name"
+  pattern+="\|Registered signal handler for"
 
   "${SPARK_HOME}"/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
--- End diff --

it's kinda weird that `--help` will try to create the context?
`ERROR SparkContext:91 - Error initializing SparkContext.`?


---




[GitHub] zeppelin issue #3013: [ZEPPELIN-3511] remove old button "Download Data as CS...

2018-10-16 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/zeppelin/pull/3013
  
possibly? any objection from reviewers of this change/PR?


---


[GitHub] spark issue #22720: [K8S] Delete executor pods from kubernetes after figurin...

2018-10-15 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22720
  
can you open a JIRA


---




[GitHub] spark issue #22720: [K8S] Delete executor pods from kubernetes after figurin...

2018-10-15 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22720
  
ok to test


---




[GitHub] spark issue #22714: [SPARK-25720][WEBUI] Support auto refresh page for the W...

2018-10-13 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22714
  
can this be passed in as a param in the url?


---




[GitHub] zeppelin issue #3013: [ZEPPELIN-3511] remove old button "Download Data as CS...

2018-10-13 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/zeppelin/pull/3013
  
any update?


---


[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22379#discussion_r224949068
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3854,6 +3854,38 @@ object functions {
   @scala.annotation.varargs
   def map_concat(cols: Column*): Column = withExpr { 
MapConcat(cols.map(_.expr)) }
 
+  /**
+   * Parses a column containing a CSV string into a `StructType` with the 
specified schema.
+   * Returns `null`, in the case of an unparseable string.
+   *
+   * @param e a string column containing CSV data.
+   * @param schema the schema to use when parsing the CSV string
+   * @param options options to control how the CSV is parsed. accepts the 
same options and the
+   *CSV data source.
+   *
+   * @group collection_funcs
+   * @since 3.0.0
+   */
+  def from_csv(e: Column, schema: StructType, options: Map[String, 
String]): Column = withExpr {
+CsvToStructs(schema, options, e.expr)
+  }
+
+  /**
+   * (Java-specific) Parses a column containing a CSV string into a 
`StructType`
+   * with the specified schema. Returns `null`, in the case of an 
unparseable string.
+   *
+   * @param e a string column containing CSV data.
+   * @param schema the schema to use when parsing the CSV string
+   * @param options options to control how the CSV is parsed. accepts the 
same options and the
+   *CSV data source.
+   *
+   * @group collection_funcs
+   * @since 3.0.0
+   */
+  def from_csv(e: Column, schema: String, options: java.util.Map[String, 
String]): Column = {
--- End diff --

you can create a new type characterOrstructOrColumn? (the letter casing is 
weird)

but I wasn't following - what's the use case for schema being a Column?
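
for reference, a rough sketch of what such a union type could look like on the R side, modeled on SparkR's existing `characterOrColumn` union; the class name and the `setOldClass` registration are assumptions here, not the final API:

```
library(methods)
library(SparkR)

# structType is an S3 class in SparkR, so register it for S4 dispatch first
# (assumption: SparkR has not already registered it elsewhere)
setOldClass("structType")

# hypothetical union so a schema argument can be a character DDL string,
# a structType, or a Column
setClassUnion("characterOrStructTypeOrColumn",
              c("character", "structType", "Column"))
```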


---




[GitHub] spark pull request #22668: [SPARK-25675] [Spark Job History] Job UI page doe...

2018-10-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22668#discussion_r224316034
  
--- Diff: core/src/main/scala/org/apache/spark/ui/PagedTable.scala ---
@@ -123,10 +123,9 @@ private[ui] trait PagedTable[T] {
   /**
* Return a page navigation.
* 
-   *   If the totalPages is 1, the page navigation will be empty
*   
-   * If the totalPages is more than 1, it will create a page 
navigation including a group of
-   * page numbers and a form to submit the page number.
+   * It will create a page navigation including a group of page 
numbers and a form
--- End diff --

true.


---




[GitHub] spark issue #22689: [SPARK-25697][CORE]When zstd compression enabled, InProg...

2018-10-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22689
  
ok to test


---




[GitHub] spark pull request #22681: [SPARK-25682][k8s] Package example jars in same t...

2018-10-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22681#discussion_r224314585
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile ---
@@ -18,6 +18,7 @@
 FROM openjdk:8-alpine
 
 ARG spark_jars=jars
+ARG example_jars=examples/jars
--- End diff --

could we make this optional? if someone wants to build a smaller image 
without the examples


---




[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-10-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21710#discussion_r223574367
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -154,3 +160,74 @@ setMethod("write.ml", signature(object = 
"FPGrowthModel", path = "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PrefixSpan
+#'
+#' A parallel PrefixSpan algorithm to mine frequent sequential patterns.
+#' \code{spark.prefixSpan} returns an instance of PrefixSpan.
+#' \code{spark.findFrequentSequentialPatterns} returns a complete set of 
frequent sequential
+#' patterns.
+#' For more details, see
+#' 
\href{https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html#prefixspan}{
+#' PrefixSpan}.
+#'
+#' @param minSupport Minimal support level.
+#' @param maxPatternLength Maximal pattern length.
+#' @param maxLocalProjDBSize Maximum number of items (including delimiters 
used in the internal
+#'   storage format) allowed in a projected 
database before local
+#'   processing.
+#' @param sequenceCol name of the sequence column in dataset.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.prefixSpan} returns an instance of PrefixSpan
+#' @rdname spark.prefixSpan
+#' @name spark.prefixSpan
+#' @aliases spark.prefixSpan,ANY-method
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(list(list(1L, 2L), list(3L))),
+#'   list(list(list(1L), list(3L, 2L), list(1L, 2L))),
+#'   list(list(list(1L, 2L), list(5L))),
+#'   list(list(list(6L, schema = c("sequence"))
+#' prefix_Span <- spark.prefixSpan(minSupport = 0.5, maxPatternLength = 5L,
+#' maxLocalProjDBSize = 3200L)
+#' frequency <- spark.findFrequentSequentialPatterns(prefix_Span, df)
+#' showDF(frequency)
+#' }
+#' @note spark.prefixSpan since 3.0.0
+setMethod("spark.prefixSpan", signature(),
+  function(minSupport=0.1, maxPatternLength=10L,
+   maxLocalProjDBSize=3200L, sequenceCol="sequence") {
+if (!is.numeric(minSupport) || minSupport < 0) {
+  stop("minSupport should be a number with value >= 0.")
+}
+if (!is.integer(maxPatternLength) || maxPatternLength <= 0) {
+  stop("maxPatternLength should be a number with value > 0.")
+}
+if (!is.numeric(maxLocalProjDBSize) || maxLocalProjDBSize <= 
0) {
+  stop("maxLocalProjDBSize should be a number with value > 0.")
+}
+
+jobj <- callJStatic("org.apache.spark.ml.r.PrefixSpanWrapper", 
"getPrefixSpan",
+as.numeric(minSupport), 
as.integer(maxPatternLength),
+as.numeric(maxLocalProjDBSize), 
as.character(sequenceCol))
+new("PrefixSpan", jobj = jobj)
+  })
+
+# Find frequent sequential patterns.
+
+#' @param object a PrefixSpan object.
+#' @param data A SparkDataFrame.
+#' @return A complete set of frequent sequential patterns in the input 
sequences of itemsets.
+#' The returned \code{SparkDataFrame} contains columns of sequence 
and corresponding
+#' frequency. The schema of it will be:
+#' \code{sequence: ArrayType(ArrayType(T))} (T is the item type)
+#' \code{freq: Long}
+#' @rdname spark.prefixSpan
+#' @aliases findFrequentSequentialPatterns,PrefixSpan,SparkDataFrame-method
+#' @note spark.findFrequentSequentialPatterns(PrefixSpan, SparkDataFrame) 
since 3.0.0
+
+setMethod("spark.findFrequentSequentialPatterns",
+  signature(object = "PrefixSpan", data = "SparkDataFrame"),
--- End diff --

nm, see the other comment instead


---




[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-10-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21710#discussion_r223574019
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -154,3 +160,74 @@ setMethod("write.ml", signature(object = 
"FPGrowthModel", path = "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PrefixSpan
+#'
+#' A parallel PrefixSpan algorithm to mine frequent sequential patterns.
+#' \code{spark.prefixSpan} returns an instance of PrefixSpan.
+#' \code{spark.findFrequentSequentialPatterns} returns a complete set of 
frequent sequential
+#' patterns.
+#' For more details, see
+#' 
\href{https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html#prefixspan}{
+#' PrefixSpan}.
+#'
+#' @param minSupport Minimal support level.
+#' @param maxPatternLength Maximal pattern length.
+#' @param maxLocalProjDBSize Maximum number of items (including delimiters 
used in the internal
+#'   storage format) allowed in a projected 
database before local
+#'   processing.
+#' @param sequenceCol name of the sequence column in dataset.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.prefixSpan} returns an instance of PrefixSpan
+#' @rdname spark.prefixSpan
+#' @name spark.prefixSpan
+#' @aliases spark.prefixSpan,ANY-method
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(list(list(1L, 2L), list(3L))),
+#'   list(list(list(1L), list(3L, 2L), list(1L, 2L))),
+#'   list(list(list(1L, 2L), list(5L))),
+#'   list(list(list(6L, schema = c("sequence"))
+#' prefix_Span <- spark.prefixSpan(minSupport = 0.5, maxPatternLength = 5L,
+#' maxLocalProjDBSize = 3200L)
+#' frequency <- spark.findFrequentSequentialPatterns(prefix_Span, df)
+#' showDF(frequency)
+#' }
+#' @note spark.prefixSpan since 3.0.0
+setMethod("spark.prefixSpan", signature(),
+  function(minSupport=0.1, maxPatternLength=10L,
+   maxLocalProjDBSize=3200L, sequenceCol="sequence") {
+if (!is.numeric(minSupport) || minSupport < 0) {
+  stop("minSupport should be a number with value >= 0.")
+}
+if (!is.integer(maxPatternLength) || maxPatternLength <= 0) {
+  stop("maxPatternLength should be a number with value > 0.")
+}
+if (!is.numeric(maxLocalProjDBSize) || maxLocalProjDBSize <= 
0) {
+  stop("maxLocalProjDBSize should be a number with value > 0.")
+}
+
+jobj <- callJStatic("org.apache.spark.ml.r.PrefixSpanWrapper", 
"getPrefixSpan",
+as.numeric(minSupport), 
as.integer(maxPatternLength),
+as.numeric(maxLocalProjDBSize), 
as.character(sequenceCol))
+new("PrefixSpan", jobj = jobj)
+  })
+
+# Find frequent sequential patterns.
+
+#' @param object a PrefixSpan object.
+#' @param data A SparkDataFrame.
+#' @return A complete set of frequent sequential patterns in the input 
sequences of itemsets.
+#' The returned \code{SparkDataFrame} contains columns of sequence 
and corresponding
+#' frequency. The schema of it will be:
+#' \code{sequence: ArrayType(ArrayType(T))} (T is the item type)
+#' \code{freq: Long}
+#' @rdname spark.prefixSpan
+#' @aliases findFrequentSequentialPatterns,PrefixSpan,SparkDataFrame-method
+#' @note spark.findFrequentSequentialPatterns(PrefixSpan, SparkDataFrame) 
since 3.0.0
+
+setMethod("spark.findFrequentSequentialPatterns",
+  signature(object = "PrefixSpan", data = "SparkDataFrame"),
--- End diff --

hmm, another one - why isn't `data` the first param? should also be called 
`x` (the convention in R API)


---




[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-10-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21710#discussion_r223573856
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -154,3 +160,74 @@ setMethod("write.ml", signature(object = 
"FPGrowthModel", path = "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PrefixSpan
+#'
+#' A parallel PrefixSpan algorithm to mine frequent sequential patterns.
+#' \code{spark.prefixSpan} returns an instance of PrefixSpan.
+#' \code{spark.findFrequentSequentialPatterns} returns a complete set of 
frequent sequential
+#' patterns.
+#' For more details, see
+#' 
\href{https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html#prefixspan}{
+#' PrefixSpan}.
+#'
+#' @param minSupport Minimal support level.
+#' @param maxPatternLength Maximal pattern length.
+#' @param maxLocalProjDBSize Maximum number of items (including delimiters 
used in the internal
+#'   storage format) allowed in a projected 
database before local
+#'   processing.
+#' @param sequenceCol name of the sequence column in dataset.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.prefixSpan} returns an instance of PrefixSpan
+#' @rdname spark.prefixSpan
+#' @name spark.prefixSpan
+#' @aliases spark.prefixSpan,ANY-method
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(list(list(1L, 2L), list(3L))),
+#'   list(list(list(1L), list(3L, 2L), list(1L, 2L))),
+#'   list(list(list(1L, 2L), list(5L))),
+#'   list(list(list(6L, schema = c("sequence"))
+#' prefix_Span <- spark.prefixSpan(minSupport = 0.5, maxPatternLength = 5L,
+#' maxLocalProjDBSize = 3200L)
+#' frequency <- spark.findFrequentSequentialPatterns(prefix_Span, df)
+#' showDF(frequency)
+#' }
+#' @note spark.prefixSpan since 3.0.0
+setMethod("spark.prefixSpan", signature(),
--- End diff --

sorry, I didn't see this before - an empty `signature()` is a bit unusual
should we either
1) put one of these params as required (if they are required?) and its type 
into the signature, eg. `signature(minSupport = "numeric")`
2) instead of having `spark.prefixSpan`, move all of these params into 
`spark.findFrequentSequentialPatterns`
ie. 
```
setMethod("spark.findFrequentSequentialPatterns",
  signature(object = "PrefixSpan", data = "SparkDataFrame"),
  function(object, data, minSupport = 0.1, maxPatternLength = 10L,
   maxLocalProjDBSize = 3200L, sequenceCol = 
"sequence") {
```
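
if option 2 were taken all the way (dropping the separate PrefixSpan handle), the user-facing call might look roughly like the sketch below - names and argument values are illustrative, not the final API:

```
df <- createDataFrame(list(list(list(list(1L, 2L), list(3L))),
                           list(list(list(1L), list(3L, 2L), list(1L, 2L))),
                           list(list(list(1L, 2L), list(5L))),
                           list(list(list(6L)))),
                      schema = c("sequence"))
patterns <- spark.findFrequentSequentialPatterns(df, minSupport = 0.5,
                                                 maxPatternLength = 5L,
                                                 maxLocalProjDBSize = 3200L,
                                                 sequenceCol = "sequence")
showDF(patterns)
```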


---




[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-10-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21710#discussion_r223572974
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -154,3 +160,74 @@ setMethod("write.ml", signature(object = 
"FPGrowthModel", path = "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PrefixSpan
+#'
+#' A parallel PrefixSpan algorithm to mine frequent sequential patterns.
+#' \code{spark.prefixSpan} returns an instance of PrefixSpan.
+#' \code{spark.findFrequentSequentialPatterns} returns a complete set of 
frequent sequential
+#' patterns.
+#' For more details, see
+#' 
\href{https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html#prefixspan}{
+#' PrefixSpan}.
+#'
+#' @param minSupport Minimal support level.
+#' @param maxPatternLength Maximal pattern length.
+#' @param maxLocalProjDBSize Maximum number of items (including delimiters 
used in the internal
+#'   storage format) allowed in a projected 
database before local
+#'   processing.
+#' @param sequenceCol name of the sequence column in dataset.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.prefixSpan} returns an instance of PrefixSpan
+#' @rdname spark.prefixSpan
+#' @name spark.prefixSpan
+#' @aliases spark.prefixSpan,ANY-method
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(list(list(1L, 2L), list(3L))),
+#'   list(list(list(1L), list(3L, 2L), list(1L, 2L))),
+#'   list(list(list(1L, 2L), list(5L))),
+#'   list(list(list(6L, schema = c("sequence"))
+#' prefix_Span <- spark.prefixSpan(minSupport = 0.5, maxPatternLength = 5L,
+#' maxLocalProjDBSize = 3200L)
+#' frequency <- spark.findFrequentSequentialPatterns(prefix_Span, df)
+#' showDF(frequency)
+#' }
+#' @note spark.prefixSpan since 3.0.0
+setMethod("spark.prefixSpan", signature(),
+  function(minSupport=0.1, maxPatternLength=10L,
+   maxLocalProjDBSize=3200L, sequenceCol="sequence") {
--- End diff --

nit: I think in R style we put spaces around `=`


---




[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-10-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21710#discussion_r223572694
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/r/PrefixSpanWrapper.scala ---
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.spark.ml.fpm.PrefixSpan
+
+private[r] object PrefixSpanWrapper {
+  def getPrefixSpan(
+  minSupport: Double,
+  maxPatternLength: Int,
+  maxLocalProjDBSize: Double,
--- End diff --

no native 64-bit int support in R. double is ok
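
a quick illustration of the 32-bit limit and why double is safe here (values are just examples):

```
.Machine$integer.max              # 2147483647 - R integers are 32-bit
as.integer(2^31)                  # NA, with a coercion warning
# doubles represent whole numbers exactly up to 2^53, so passing
# maxLocalProjDBSize through as.numeric() is fine for realistic sizes
as.numeric(32000000) == 32000000  # TRUE
```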


---




[GitHub] spark issue #22645: [SPARK-25566][SPARK-25567][WEBUI][SQL]Support pagination...

2018-10-09 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22645
  
@shahidki31 ^^


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22615: [SPARK-25016][BUILD][CORE] Remove support for Had...

2018-10-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22615#discussion_r223571584
  
--- Diff: docs/index.md ---
@@ -30,9 +30,6 @@ Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For 
the Scala API, Spark {{s
 uses Scala {{site.SCALA_BINARY_VERSION}}. You will need to use a 
compatible Scala version
 ({{site.SCALA_BINARY_VERSION}}.x).
 
-Note that support for Java 7, Python 2.6 and old Hadoop versions before 
2.6.5 were removed as of Spark 2.2.0.
-Support for Scala 2.10 was removed as of 2.3.0.
--- End diff --

so we are not going to mention supported hadoop version?


---




[GitHub] spark pull request #22615: [SPARK-25016][BUILD][CORE] Remove support for Had...

2018-10-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22615#discussion_r223571106
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -71,7 +71,7 @@ class HadoopTableReader(
 
   // Hadoop honors "mapreduce.job.maps" as hint,
   // but will ignore when mapreduce.jobtracker.address is "local".
-  // 
https://hadoop.apache.org/docs/r2.6.5/hadoop-mapreduce-client/hadoop-mapreduce-client-core/
+  // 
https://hadoop.apache.org/docs/r2.7.6/hadoop-mapreduce-client/hadoop-mapreduce-client-core/
--- End diff --

how come this is 2.7.6 and not 2.7.3 like the others?


---




[GitHub] spark pull request #22615: [SPARK-25016][BUILD][CORE] Remove support for Had...

2018-10-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22615#discussion_r223570946
  
--- Diff: dev/create-release/release-build.sh ---
@@ -191,9 +191,17 @@ if [[ "$1" == "package" ]]; then
   make_binary_release() {
 NAME=$1
 FLAGS="$MVN_EXTRA_OPTS -B $BASE_RELEASE_PROFILES $2"
+# BUILD_PACKAGE can be "withpip", "withr", or both as "withpip,withr"
 BUILD_PACKAGE=$3
 SCALA_VERSION=$4
 
+if [[ $BUILD_PACKAGE == *"withpip"* ]]; then
--- End diff --

one caveat is I'm not sure we have tested building both python and R in 
"one build".

this could be a good thing, but if I recall correctly the R build changes some of the 
binary files under R that get shipped in the "source release" (these are 
required R object files)



---




[GitHub] spark issue #22615: [SPARK-25016][BUILD][CORE] Remove support for Hadoop 2.6

2018-10-09 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22615
  
I think we all know enough not to make changes (merge changes) to these 
configs, so it should be safe.


---




[GitHub] spark issue #22645: [SPARK-25566][SPARK-25567][WEBUI][SQL]Support pagination...

2018-10-09 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22645
  
ok to test


---




[GitHub] spark issue #22675: [SPARK-25347][ML][DOC] Spark datasource for image/libsvm...

2018-10-09 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22675
  
we need this for 2.4?


---




[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-10-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21710#discussion_r223568243
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -154,3 +160,74 @@ setMethod("write.ml", signature(object = 
"FPGrowthModel", path = "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PrefixSpan
+#'
+#' A parallel PrefixSpan algorithm to mine frequent sequential patterns.
+#' \code{spark.prefixSpan} returns an instance of PrefixSpan.
+#' \code{spark.findFrequentSequentialPatterns} returns a complete set of 
frequent sequential
+#' patterns.
+#' For more details, see
+#' 
\href{https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html#prefixspan}{
+#' PrefixSpan}.
+#'
+#' @param minSupport Minimal support level.
+#' @param maxPatternLength Maximal pattern length.
+#' @param maxLocalProjDBSize Maximum number of items (including delimiters 
used in the internal
+#'   storage format) allowed in a projected 
database before local
+#'   processing.
+#' @param sequenceCol name of the sequence column in dataset.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.prefixSpan} returns an instance of PrefixSpan
+#' @rdname spark.prefixSpan
+#' @name spark.prefixSpan
+#' @aliases spark.prefixSpan,ANY-method
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(list(list(1L, 2L), list(3L))),
+#'   list(list(list(1L), list(3L, 2L), list(1L, 2L))),
+#'   list(list(list(1L, 2L), list(5L))),
+#'   list(list(list(6L, schema = c("sequence"))
+#' prefix_Span <- spark.prefixSpan(minSupport = 0.5, maxPatternLength = 5L,
+#' maxLocalProjDBSize = 3200L)
+#' frequency <- spark.findFrequentSequentialPatterns(prefix_Span, df)
+#' showDF(frequency)
+#' }
+#' @note spark.prefixSpan since 3.0.0
+setMethod("spark.prefixSpan", signature(),
+  function(minSupport=0.1, maxPatternLength=10L,
+   maxLocalProjDBSize=3200L, sequenceCol="sequence") {
+if (!is.numeric(minSupport) || minSupport < 0) {
+  stop("minSupport should be a number with value >= 0.")
+}
+if (!is.integer(maxPatternLength) || maxPatternLength <= 0) {
+  stop("maxPatternLength should be a number with value > 0.")
+}
+if (!is.numeric(maxLocalProjDBSize) || maxLocalProjDBSize <= 
0) {
+  stop("maxLocalProjDBSize should be a number with value > 0.")
+}
+
+jobj <- callJStatic("org.apache.spark.ml.r.PrefixSpanWrapper", 
"getPrefixSpan",
+as.numeric(minSupport), 
as.integer(maxPatternLength),
+as.numeric(maxLocalProjDBSize), 
as.character(sequenceCol))
+new("PrefixSpan", jobj = jobj)
+  })
+
+# Find frequent sequential patterns.
+
+#' @param object a PrefixSpan object.
+#' @param data A SparkDataFrame.
+#' @return A complete set of frequent sequential patterns in the input 
sequences of itemsets.
+#' The returned \code{SparkDataFrame} contains columns of sequence 
and corresponding
+#' frequency. The schema of it will be:
+#' \code{sequence: ArrayType(ArrayType(T))} (T is the item type)
+#' \code{freq: Long}
+#' @rdname spark.prefixSpan
+#' @aliases findFrequentSequentialPatterns,PrefixSpan,SparkDataFrame-method
+#' @note spark.findFrequentSequentialPatterns(PrefixSpan, SparkDataFrame) 
since 3.0.0
+
--- End diff --

remove the extra line - note this might break doc generation


---




[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-10-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21710#discussion_r223568592
  
--- Diff: docs/ml-frequent-pattern-mining.md ---
@@ -85,3 +85,56 @@ Refer to the [R API docs](api/R/spark.fpGrowth.html) for 
more details.
 
 
 
+
+## PrefixSpan
--- End diff --

is this not in the doc for branch-2.4? we might need a quick fix on the 
branch then before the next RC, I think


---




[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-10-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21710#discussion_r223568708
  
--- Diff: examples/src/main/python/ml/prefixspan_example.py ---
@@ -0,0 +1,48 @@
+#
--- End diff --

ditto for the java, scala, and python examples



---




[GitHub] spark pull request #22645: [SPARK-25566][SPARK-25567][WEBUI][SQL]Support pag...

2018-10-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22645#discussion_r223198079
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala
 ---
@@ -121,74 +122,257 @@ private[ui] class AllExecutionsPage(parent: SQLTab) 
extends WebUIPage("") with L
   {
 if (running.nonEmpty) {
   
-Running 
Queries:
+Running Queries:
 {running.size}
   
 }
   }
   {
 if (completed.nonEmpty) {
   
-Completed 
Queries:
+Completed 
Queries:
 {completed.size}
   
 }
   }
   {
 if (failed.nonEmpty) {
   
-Failed 
Queries:
+Failed Queries:
 {failed.size}
   
 }
   }
 
   
+
 UIUtils.headerSparkPage(request, "SQL", summary ++ content, parent, 
Some(5000))
   }
+
+  private def executionsTable(
+request: HttpServletRequest,
+executionTag: String,
+executionData: Seq[SQLExecutionUIData],
+currentTime: Long,
+showRunningJobs: Boolean,
+showSucceededJobs: Boolean,
+showFailedJobs: Boolean): Seq[Node] = {
+
+// stripXSS is called to remove suspicious characters used in XSS 
attacks
+val allParameters = request.getParameterMap.asScala.toMap.map { case 
(k, v) =>
+  UIUtils.stripXSS(k) -> v.map(UIUtils.stripXSS).toSeq
+}
+val parameterOtherTable = 
allParameters.filterNot(_._1.startsWith(executionTag))
+  .map(para => para._1 + "=" + para._2(0))
+
+val parameterExecutionPage = 
UIUtils.stripXSS(request.getParameter(s"$executionTag.page"))
+val parameterExecutionSortColumn = UIUtils.stripXSS(request.
+  getParameter(s"$executionTag.sort"))
+val parameterExecutionSortDesc = 
UIUtils.stripXSS(request.getParameter(s"$executionTag.desc"))
+val parameterExecutionPageSize = UIUtils.stripXSS(request.
--- End diff --

could you put the `.` either last on the line or first on the next line.
for example, L172 has it at the end and L166 in front - let's do this 
consistently with the other code in the file


---




[GitHub] spark issue #22608: [SPARK-23257][K8S][TESTS] Kerberos Support Integration T...

2018-10-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22608
  
> calling an external docker-image like: ifilonenko/hadoop-base:latest for 
now

for now it's probably ok, but is there a solution before the next release?


---




[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...

2018-10-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22275#discussion_r223197940
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -4434,6 +4434,12 @@ def test_timestamp_dst(self):
 self.assertPandasEqual(pdf, df_from_python.toPandas())
 self.assertPandasEqual(pdf, df_from_pandas.toPandas())
 
+def test_toPandas_batch_order(self):
+df = self.spark.range(64, numPartitions=8).toDF("a")
+with 
self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": 4}):
+pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
+self.assertPandasEqual(pdf, pdf_arrow)
--- End diff --

that sounds good


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r223197863
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -246,30 +248,38 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-allConf <- sparkR.conf()
-if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
-identical(allConf[["spark.sql.repl.eagerEval.enabled"]], 
"true")) {
-  argsList <- list()
-  argsList$x <- object
-  if 
(!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
-numRows <- 
as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
-if (numRows > 0) {
-  argsList$numRows <- numRows
+showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
--- End diff --

IMO pretty print should plug into something more R-standard like 
[printr](https://yihui.name/printr/) or

[lemon](https://cran.r-project.org/web/packages/lemon/vignettes/lemon_print.html)
or

[print.x](https://stat.ethz.ch/R-manual/R-devel/library/base/html/print.dataframe.html)


---




[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-10-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r223197875
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -246,30 +248,38 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-allConf <- sparkR.conf()
-if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
-identical(allConf[["spark.sql.repl.eagerEval.enabled"]], 
"true")) {
-  argsList <- list()
-  argsList$x <- object
-  if 
(!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
-numRows <- 
as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
-if (numRows > 0) {
-  argsList$numRows <- numRows
+showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
--- End diff --

could we consider leaving print/show option out? I'd like to get eager 
compute to work even in basic sparkR / R shell
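
for illustration, the conf-only path already covers the plain shell case, roughly like this (a sketch, not the final test):

```
# start sparkR with eager evaluation on; no extra print hooks needed
sparkR.session(sparkConfig = list(
  spark.sql.repl.eagerEval.enabled = "true",
  spark.sql.repl.eagerEval.maxNumRows = as.integer(10)))
df <- createDataFrame(faithful)
df   # auto-printing in the shell calls show(), which renders the first rows eagerly
sparkR.session.stop()
```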


---




[GitHub] spark issue #22145: [SPARK-25152][K8S] Enable SparkR Integration Tests for K...

2018-10-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22145
  
@shaneknapp could we do this soon?


---




[GitHub] spark pull request #22645: [SPARK-25566][SPARK-25567][WEBUI][SQL]Support pag...

2018-10-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22645#discussion_r223174258
  
--- Diff: core/src/main/scala/org/apache/spark/ui/PagedTable.scala ---
@@ -31,7 +31,7 @@ import org.apache.spark.util.Utils
  *
  * @param pageSize the number of rows in a page
  */
-private[ui] abstract class PagedDataSource[T](val pageSize: Int) {
+private[spark] abstract class PagedDataSource[T](val pageSize: Int) {
--- End diff --

is there a way to do this without opening this up to the spark level?


---




[GitHub] spark issue #22639: [SPARK-25647][k8s] Add spark streaming compatibility sui...

2018-10-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22639
  
@liyinan926 @mccheah 


---




[GitHub] spark pull request #22625: [SPARK-25637][CORE] SparkException: Could not fin...

2018-10-04 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22625#discussion_r222550195
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
 ---
@@ -98,6 +98,8 @@ class CoarseGrainedSchedulerBackend(scheduler: 
TaskSchedulerImpl, val rpcEnv: Rp
   private val reviveThread =
 
ThreadUtils.newDaemonSingleThreadScheduledExecutor("driver-revive-thread")
 
+  private var driverEndpointStopped = false
--- End diff --

shouldn't this be inside class DriverEndpoint?


---




[GitHub] spark pull request #22615: [SPARK-25016][BUILD][CORE] Remove support for Had...

2018-10-04 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22615#discussion_r222549104
  
--- Diff: dev/appveyor-install-dependencies.ps1 ---
@@ -95,7 +95,7 @@ $env:MAVEN_OPTS = "-Xmx2g -XX:ReservedCodeCacheSize=512m"
 Pop-Location
 
 # == Hadoop bin package
-$hadoopVer = "2.6.4"
+$hadoopVer = "2.7.3"
--- End diff --

this could be easy to forget - the next time we need to update this, we 
might be searching and replacing "2.7.3"; if this one were 2.7.1 it might be 
missed...


---




[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-10-02 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21710
  
actually, as of 2 hrs ago it seems like we are skipping 2.5 and going to 3.0. we could 
wait a bit until that is merged


---




[GitHub] zeppelin issue #3200: ZEPPELIN-3804. Separate log file for each interpreter ...

2018-10-02 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/zeppelin/pull/3200
  
how do we find out which log file belongs to which interpreter?


---


[GitHub] zeppelin issue #3132: [ZEPPELIN-3701].Missing first several '0' and losing d...

2018-10-02 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/zeppelin/pull/3132
  
this should be in 0.9. as for release date plan, please ask on 
dev@zeppelin.apache.org


---


[GitHub] spark issue #20405: [SPARK-23229][SQL] Dataset.hint should use planWithBarri...

2018-10-02 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20405
  
probably.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22527: [SPARK-17952][SQL] Nested Java beans support in createDa...

2018-09-30 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22527
  
Jenkins, ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] zeppelin issue #3197: docs: fix punctuation

2018-09-30 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/zeppelin/pull/3197
  
can you also check the other .md files? it's probably best to have them all in one PR


---


[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-09-29 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21669
  
like it, but we could also first support cluster mode and add client mode 
after.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



spark git commit: [SPARK-25572][SPARKR] test only if not cran

2018-09-29 Thread felixcheung
Repository: spark
Updated Branches:
  refs/heads/master f246813af -> f4b138082


[SPARK-25572][SPARKR] test only if not cran

## What changes were proposed in this pull request?

CRAN doesn't seem to respect the system requirements when running tests - we have 
seen cases where SparkR is run on Java 10, which unfortunately Spark does not 
start on. For 2.4, let's attempt skipping all tests

## How was this patch tested?

manual, jenkins, appveyor

Author: Felix Cheung 

Closes #22589 from felixcheung/ralltests.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f4b13808
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f4b13808
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f4b13808

Branch: refs/heads/master
Commit: f4b138082ff91be74b0f5bbe19cdb90dd9e5f131
Parents: f246813
Author: Felix Cheung 
Authored: Sat Sep 29 14:48:32 2018 -0700
Committer: Felix Cheung 
Committed: Sat Sep 29 14:48:32 2018 -0700

--
 R/pkg/tests/run-all.R | 83 --
 1 file changed, 44 insertions(+), 39 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f4b13808/R/pkg/tests/run-all.R
--
diff --git a/R/pkg/tests/run-all.R b/R/pkg/tests/run-all.R
index 94d7518..1e96418 100644
--- a/R/pkg/tests/run-all.R
+++ b/R/pkg/tests/run-all.R
@@ -18,50 +18,55 @@
 library(testthat)
 library(SparkR)
 
-# Turn all warnings into errors
-options("warn" = 2)
+# SPARK-25572
+if (identical(Sys.getenv("NOT_CRAN"), "true")) {
 
-if (.Platform$OS.type == "windows") {
-  Sys.setenv(TZ = "GMT")
-}
+  # Turn all warnings into errors
+  options("warn" = 2)
 
-# Setup global test environment
-# Install Spark first to set SPARK_HOME
+  if (.Platform$OS.type == "windows") {
+Sys.setenv(TZ = "GMT")
+  }
 
-# NOTE(shivaram): We set overwrite to handle any old tar.gz files or 
directories left behind on
-# CRAN machines. For Jenkins we should already have SPARK_HOME set.
-install.spark(overwrite = TRUE)
+  # Setup global test environment
+  # Install Spark first to set SPARK_HOME
 
-sparkRDir <- file.path(Sys.getenv("SPARK_HOME"), "R")
-sparkRWhitelistSQLDirs <- c("spark-warehouse", "metastore_db")
-invisible(lapply(sparkRWhitelistSQLDirs,
- function(x) { unlink(file.path(sparkRDir, x), recursive = 
TRUE, force = TRUE)}))
-sparkRFilesBefore <- list.files(path = sparkRDir, all.files = TRUE)
+  # NOTE(shivaram): We set overwrite to handle any old tar.gz files or 
directories left behind on
+  # CRAN machines. For Jenkins we should already have SPARK_HOME set.
+  install.spark(overwrite = TRUE)
 
-sparkRTestMaster <- "local[1]"
-sparkRTestConfig <- list()
-if (identical(Sys.getenv("NOT_CRAN"), "true")) {
-  sparkRTestMaster <- ""
-} else {
-  # Disable hsperfdata on CRAN
-  old_java_opt <- Sys.getenv("_JAVA_OPTIONS")
-  Sys.setenv("_JAVA_OPTIONS" = paste("-XX:-UsePerfData", old_java_opt))
-  tmpDir <- tempdir()
-  tmpArg <- paste0("-Djava.io.tmpdir=", tmpDir)
-  sparkRTestConfig <- list(spark.driver.extraJavaOptions = tmpArg,
-   spark.executor.extraJavaOptions = tmpArg)
-}
+  sparkRDir <- file.path(Sys.getenv("SPARK_HOME"), "R")
+  sparkRWhitelistSQLDirs <- c("spark-warehouse", "metastore_db")
+  invisible(lapply(sparkRWhitelistSQLDirs,
+   function(x) { unlink(file.path(sparkRDir, x), recursive = 
TRUE, force = TRUE)}))
+  sparkRFilesBefore <- list.files(path = sparkRDir, all.files = TRUE)
 
-test_package("SparkR")
+  sparkRTestMaster <- "local[1]"
+  sparkRTestConfig <- list()
+  if (identical(Sys.getenv("NOT_CRAN"), "true")) {
+sparkRTestMaster <- ""
+  } else {
+# Disable hsperfdata on CRAN
+old_java_opt <- Sys.getenv("_JAVA_OPTIONS")
+Sys.setenv("_JAVA_OPTIONS" = paste("-XX:-UsePerfData", old_java_opt))
+tmpDir <- tempdir()
+tmpArg <- paste0("-Djava.io.tmpdir=", tmpDir)
+sparkRTestConfig <- list(spark.driver.extraJavaOptions = tmpArg,
+ spark.executor.extraJavaOptions = tmpArg)
+  }
 
-if (identical(Sys.getenv("NOT_CRAN"), "true")) {
-  # set random seed for predictable results. mostly for base's sample() in 
tree and classification
-  set.seed(42)
-  # for testthat 1.0.2 later, change reporter from "summary" to 
default_reporter()
-  testthat:::run_tests("SparkR",
-  

[GitHub] spark issue #22589: [SPARK-25572][SPARKR] test only if not cran

2018-09-29 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22589
  
merged to master/2.4


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



spark git commit: [SPARK-25572][SPARKR] test only if not cran

2018-09-29 Thread felixcheung
Repository: spark
Updated Branches:
  refs/heads/branch-2.4 a14306b1d -> fef30271b


[SPARK-25572][SPARKR] test only if not cran

## What changes were proposed in this pull request?

CRAN doesn't seem to respect the system requirements when running tests - we have 
seen cases where SparkR is run on Java 10, which unfortunately Spark does not 
start on. For 2.4, let's attempt skipping all tests

## How was this patch tested?

manual, jenkins, appveyor

Author: Felix Cheung 

Closes #22589 from felixcheung/ralltests.

(cherry picked from commit f4b138082ff91be74b0f5bbe19cdb90dd9e5f131)
Signed-off-by: Felix Cheung 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fef30271
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fef30271
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fef30271

Branch: refs/heads/branch-2.4
Commit: fef30271bc4a77e7f94b240cd8d7fef71f47935e
Parents: a14306b
Author: Felix Cheung 
Authored: Sat Sep 29 14:48:32 2018 -0700
Committer: Felix Cheung 
Committed: Sat Sep 29 14:48:51 2018 -0700

--
 R/pkg/tests/run-all.R | 83 --
 1 file changed, 44 insertions(+), 39 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/fef30271/R/pkg/tests/run-all.R
--
diff --git a/R/pkg/tests/run-all.R b/R/pkg/tests/run-all.R
index 94d7518..1e96418 100644
--- a/R/pkg/tests/run-all.R
+++ b/R/pkg/tests/run-all.R
@@ -18,50 +18,55 @@
 library(testthat)
 library(SparkR)
 
-# Turn all warnings into errors
-options("warn" = 2)
+# SPARK-25572
+if (identical(Sys.getenv("NOT_CRAN"), "true")) {
 
-if (.Platform$OS.type == "windows") {
-  Sys.setenv(TZ = "GMT")
-}
+  # Turn all warnings into errors
+  options("warn" = 2)
 
-# Setup global test environment
-# Install Spark first to set SPARK_HOME
+  if (.Platform$OS.type == "windows") {
+Sys.setenv(TZ = "GMT")
+  }
 
-# NOTE(shivaram): We set overwrite to handle any old tar.gz files or 
directories left behind on
-# CRAN machines. For Jenkins we should already have SPARK_HOME set.
-install.spark(overwrite = TRUE)
+  # Setup global test environment
+  # Install Spark first to set SPARK_HOME
 
-sparkRDir <- file.path(Sys.getenv("SPARK_HOME"), "R")
-sparkRWhitelistSQLDirs <- c("spark-warehouse", "metastore_db")
-invisible(lapply(sparkRWhitelistSQLDirs,
- function(x) { unlink(file.path(sparkRDir, x), recursive = 
TRUE, force = TRUE)}))
-sparkRFilesBefore <- list.files(path = sparkRDir, all.files = TRUE)
+  # NOTE(shivaram): We set overwrite to handle any old tar.gz files or 
directories left behind on
+  # CRAN machines. For Jenkins we should already have SPARK_HOME set.
+  install.spark(overwrite = TRUE)
 
-sparkRTestMaster <- "local[1]"
-sparkRTestConfig <- list()
-if (identical(Sys.getenv("NOT_CRAN"), "true")) {
-  sparkRTestMaster <- ""
-} else {
-  # Disable hsperfdata on CRAN
-  old_java_opt <- Sys.getenv("_JAVA_OPTIONS")
-  Sys.setenv("_JAVA_OPTIONS" = paste("-XX:-UsePerfData", old_java_opt))
-  tmpDir <- tempdir()
-  tmpArg <- paste0("-Djava.io.tmpdir=", tmpDir)
-  sparkRTestConfig <- list(spark.driver.extraJavaOptions = tmpArg,
-   spark.executor.extraJavaOptions = tmpArg)
-}
+  sparkRDir <- file.path(Sys.getenv("SPARK_HOME"), "R")
+  sparkRWhitelistSQLDirs <- c("spark-warehouse", "metastore_db")
+  invisible(lapply(sparkRWhitelistSQLDirs,
+   function(x) { unlink(file.path(sparkRDir, x), recursive = 
TRUE, force = TRUE)}))
+  sparkRFilesBefore <- list.files(path = sparkRDir, all.files = TRUE)
 
-test_package("SparkR")
+  sparkRTestMaster <- "local[1]"
+  sparkRTestConfig <- list()
+  if (identical(Sys.getenv("NOT_CRAN"), "true")) {
+sparkRTestMaster <- ""
+  } else {
+# Disable hsperfdata on CRAN
+old_java_opt <- Sys.getenv("_JAVA_OPTIONS")
+Sys.setenv("_JAVA_OPTIONS" = paste("-XX:-UsePerfData", old_java_opt))
+tmpDir <- tempdir()
+tmpArg <- paste0("-Djava.io.tmpdir=", tmpDir)
+sparkRTestConfig <- list(spark.driver.extraJavaOptions = tmpArg,
+ spark.executor.extraJavaOptions = tmpArg)
+  }
 
-if (identical(Sys.getenv("NOT_CRAN"), "true")) {
-  # set random seed for predictable results. mostly for base's sample() in 
tree and classification
-  set.seed(42)
-  # for testthat 1.0.2 later, change reporter f

[GitHub] spark pull request #22589: [SPARK-25572][SPARKR] test only if not cran

2018-09-29 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22589#discussion_r221440760
  
--- Diff: R/pkg/tests/run-all.R ---
@@ -18,50 +18,55 @@
 library(testthat)
 library(SparkR)
 
-# Turn all warnings into errors
-options("warn" = 2)
+# SPARK-25572
+if (identical(Sys.getenv("NOT_CRAN"), "true")) {
 
-if (.Platform$OS.type == "windows") {
-  Sys.setenv(TZ = "GMT")
-}
+  # Turn all warnings into errors
+  options("warn" = 2)
 
-# Setup global test environment
-# Install Spark first to set SPARK_HOME
+  if (.Platform$OS.type == "windows") {
+Sys.setenv(TZ = "GMT")
+  }
 
-# NOTE(shivaram): We set overwrite to handle any old tar.gz files or 
directories left behind on
-# CRAN machines. For Jenkins we should already have SPARK_HOME set.
-install.spark(overwrite = TRUE)
+  # Setup global test environment
+  # Install Spark first to set SPARK_HOME
 
-sparkRDir <- file.path(Sys.getenv("SPARK_HOME"), "R")
-sparkRWhitelistSQLDirs <- c("spark-warehouse", "metastore_db")
-invisible(lapply(sparkRWhitelistSQLDirs,
- function(x) { unlink(file.path(sparkRDir, x), recursive = 
TRUE, force = TRUE)}))
-sparkRFilesBefore <- list.files(path = sparkRDir, all.files = TRUE)
+  # NOTE(shivaram): We set overwrite to handle any old tar.gz files or 
directories left behind on
+  # CRAN machines. For Jenkins we should already have SPARK_HOME set.
+  install.spark(overwrite = TRUE)
 
-sparkRTestMaster <- "local[1]"
-sparkRTestConfig <- list()
-if (identical(Sys.getenv("NOT_CRAN"), "true")) {
-  sparkRTestMaster <- ""
-} else {
-  # Disable hsperfdata on CRAN
-  old_java_opt <- Sys.getenv("_JAVA_OPTIONS")
-  Sys.setenv("_JAVA_OPTIONS" = paste("-XX:-UsePerfData", old_java_opt))
-  tmpDir <- tempdir()
-  tmpArg <- paste0("-Djava.io.tmpdir=", tmpDir)
-  sparkRTestConfig <- list(spark.driver.extraJavaOptions = tmpArg,
-   spark.executor.extraJavaOptions = tmpArg)
-}
+  sparkRDir <- file.path(Sys.getenv("SPARK_HOME"), "R")
+  sparkRWhitelistSQLDirs <- c("spark-warehouse", "metastore_db")
+  invisible(lapply(sparkRWhitelistSQLDirs,
+   function(x) { unlink(file.path(sparkRDir, x), recursive 
= TRUE, force = TRUE)}))
+  sparkRFilesBefore <- list.files(path = sparkRDir, all.files = TRUE)
 
-test_package("SparkR")
+  sparkRTestMaster <- "local[1]"
+  sparkRTestConfig <- list()
+  if (identical(Sys.getenv("NOT_CRAN"), "true")) {
+sparkRTestMaster <- ""
+  } else {
+# Disable hsperfdata on CRAN
+old_java_opt <- Sys.getenv("_JAVA_OPTIONS")
+Sys.setenv("_JAVA_OPTIONS" = paste("-XX:-UsePerfData", old_java_opt))
+tmpDir <- tempdir()
+tmpArg <- paste0("-Djava.io.tmpdir=", tmpDir)
+sparkRTestConfig <- list(spark.driver.extraJavaOptions = tmpArg,
+ spark.executor.extraJavaOptions = tmpArg)
+  }
 
-if (identical(Sys.getenv("NOT_CRAN"), "true")) {
-  # set random seed for predictable results. mostly for base's sample() in 
tree and classification
-  set.seed(42)
-  # for testthat 1.0.2 later, change reporter from "summary" to 
default_reporter()
-  testthat:::run_tests("SparkR",
-   file.path(sparkRDir, "pkg", "tests", "fulltests"),
-   NULL,
-   "summary")
-}
+  test_package("SparkR")
+
+  if (identical(Sys.getenv("NOT_CRAN"), "true")) {
--- End diff --

We are trying this now - we can clean it up if this works



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22589: [SPARK-25572][SPARKR] test only if not cran

2018-09-29 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22589
  
We are trying this now - we can clean it up if this works






---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan

2018-09-29 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21710
  
@huaxingao do you want to bring this up to 2.5.0?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22589: [SPARK-25572][SPARKR] test only if not cran

2018-09-29 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22589
  
Jenkins, retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22588: [SPARK-25262][DOC][FOLLOWUP] Fix link tags in html table

2018-09-29 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22588
  
@ifilonenko @liyinan926


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22589: [SPARK-25572][SPARKR] test only if not cran

2018-09-29 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22589
  
Jenkins, retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22589: [SPARK-25572][SPARKR] test only if not cran

2018-09-29 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22589
  
really odd error, not sure if it's related to an earlier change? 
@HyukjinKwon what do you think?
```
. Error: Test correct concurrency of RRDD.compute() (@test_rdd.R#800) 
-
org.apache.spark.SparkException: Job aborted due to stage failure: Task 87 
in stage 201.0 failed 1 times, most recent failure: Lost task 87.0 in stage 
201.0 (TID 439, localhost, executor driver): java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:156)
at org.apache.spark.api.r.SerDe$.readStringBytes(SerDe.scala:120)
at org.apache.spark.api.r.SerDe$.readString(SerDe.scala:127)
at org.apache.spark.api.r.RAuthHelper.readUtf8(RAuthHelper.scala:29)
at 
org.apache.spark.security.SocketAuthHelper.authClient(SocketAuthHelper.scala:57)
at org.apache.spark.api.r.RRunner.compute(RRunner.scala:79)
at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:54)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:325)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:289)
at 
org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:325)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:289)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:419)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:425)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22589: [SPARK-25572][SPARKR] test only if not cran

2018-09-29 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22589
  
Jenkins, retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22589: [SPARK-25572][SPARKR] test only if not cran

2018-09-28 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22589
  
diff w/o whitespace for actual change


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22589: [SPARK-25572][SPARKR] test only if not cran

2018-09-28 Thread felixcheung
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/22589

[SPARK-25572][SPARKR] test only if not cran

## What changes were proposed in this pull request?

CRAN doesn't seem to respect the system requirements when running tests - we 
have seen cases where SparkR is run on Java 10, which unfortunately Spark does 
not start on. For 2.4, let's attempt skipping all tests

## How was this patch tested?

manual, jenkins, appveyor


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark ralltests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22589.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22589


commit 3b7414d9adf55ef74ce2d81403e570e5d6951a05
Author: Felix Cheung 
Date:   2018-09-29T05:10:23Z

test only if not cran




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22379#discussion_r221415711
  
--- Diff: R/pkg/R/functions.R ---
@@ -2203,6 +2209,23 @@ setMethod("from_json", signature(x = "Column", 
schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{from_csv}: Parses a column containing a CSV string into a Column 
of \code{structType}
+#' with the specified \code{schema}.
+#' If the string is unparseable, the Column will contain the value NA.
+#'
+#' @rdname column_collection_functions
+#' @aliases from_csv from_csv,Column,character-method
+#' @note from_csv since 2.5.0
--- End diff --

consider adding an example, as in 
https://github.com/apache/spark/pull/22379/files#diff-d97f9adc2dcac0703568c799ff106987R2180?
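
for instance, something along these lines (just a sketch - the schema string 
here assumes `from_csv` accepts a DDL-formatted string the way `from_json` 
does, and the column is made up):

```r
df <- sql("SELECT '1,2,3' AS csv")
# parse the CSV string column into a struct column
head(select(df, from_csv(df$csv, "a INT, b INT, c INT")))
```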


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r221416198
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -21,11 +21,7 @@ context("Show SparkDataFrame when eager execution is 
enabled.")
 
 test_that("eager execution is not enabled", {
   # Start Spark session without eager execution enabled
-  sparkSession <- if (windows_with_hadoop()) {
-sparkR.session(master = sparkRTestMaster)
-  } else {
-sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
-  }
+  sparkR.session(master = sparkRTestMaster)
--- End diff --

as mentioned here 
https://github.com/apache/spark/pull/22455#discussion_r220030686



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r221416164
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -21,11 +21,7 @@ context("Show SparkDataFrame when eager execution is 
enabled.")
 
 test_that("eager execution is not enabled", {
   # Start Spark session without eager execution enabled
-  sparkSession <- if (windows_with_hadoop()) {
-sparkR.session(master = sparkRTestMaster)
-  } else {
-sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
-  }
+  sparkR.session(master = sparkRTestMaster)
--- End diff --

you should definitely set `enableHiveSupport = FALSE` - historically this 
hasn't worked well in other R tests when the hive catalog is enabled


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r221415826
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -246,30 +248,38 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-allConf <- sparkR.conf()
-if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
-identical(allConf[["spark.sql.repl.eagerEval.enabled"]], 
"true")) {
-  argsList <- list()
-  argsList$x <- object
-  if 
(!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
-numRows <- 
as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
-if (numRows > 0) {
-  argsList$numRows <- numRows
+showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
--- End diff --

hmm, about this naming convention - typically we don't mix `.` and `_`, and I 
don't think we have any existing option named after `SparkDataFrame`

seems like we have used `spark.sparkr.something` before:
https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L343
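
just to illustrate the kind of naming I mean - a hypothetical option in that 
existing `spark.sparkr.*` style (the name below is only an example, not a 
proposal for the final name):

```r
# hypothetical option name following the spark.sparkr.* convention;
# NULL means "not set", so show() would fall back to the default rendering
baseShowFunc <- getOption("spark.sparkr.sparkDataFrame.baseShowFunc")
if (is.null(baseShowFunc)) {
  baseShowFunc <- showDF
}
```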



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] zeppelin issue #3185: Copy-editing overview.md

2018-09-24 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/zeppelin/pull/3185
  
this was merged? is github not syncing?


---


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r220034035
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,25 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", 
"false")[[1]], "true")) {
+  argsList <- list()
+  argsList$x <- object
+  numRows <- 
as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]])
+  if (numRows > 0) {
+argsList$numRows <- numRows
+  }
+  truncate <- 
as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]])
--- End diff --

just asking since this variation is not covered in the unit tests
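
roughly the variation I have in mind (a sketch only - the assertion is a 
placeholder, since the exact output depends on how truncation is rendered):

```r
test_that("eager execution with only truncate set", {
  # only spark.sql.repl.eagerEval.truncate is set; maxNumRows is left unset
  sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE,
                 sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
                                    spark.sql.repl.eagerEval.truncate = as.integer(3)))
  df <- createDataFrame(faithful)
  # placeholder check - the real test should assert on the truncated rendering
  expect_output(show(df), "eruptions")
  sparkR.session.stop()
})
```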


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r220033577
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,31 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+allConf <- sparkR.conf()
+if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
--- End diff --

you can also save off `foo <- 
allConf[["spark.sql.repl.eagerEval.enabled"]]` and reuse it, ditto for other 
cases


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r220033775
  
--- Diff: docs/sparkr.md ---
@@ -450,6 +450,48 @@ print(model.summaries)
 {% endhighlight %}
 
 
+### Eager execution
+
+If eager execution is enabled, the data will be returned to R client 
immediately when the `SparkDataFrame` is created. By default, eager execution 
is not enabled and can be enabled by setting the configuration property 
`spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started 
up.
+
+Maximum number of rows and maximum number of characters per column of data 
to display can be controlled by `spark.sql.repl.eagerEval.maxNumRows` and 
`spark.sql.repl.eagerEval.truncate` configuration properties, respectively. 
These properties are only effective when eager execution is enabled. If these 
properties are not set explicitly, by default, data up to 20 rows and up to 20 
characters per column will be showed.
+
+
+{% highlight r %}
+
+# Start up spark session with eager execution enabled
+sparkR.session(master = "local[*]",
+   sparkConfig = list(spark.sql.repl.eagerEval.enabled = 
"true",
+  spark.sql.repl.eagerEval.maxNumRows = 
as.integer(10)))
+
+# Create a grouped and sorted SparkDataFrame
+df <- createDataFrame(faithful)
+df2 <- arrange(summarize(groupBy(df, df$waiting), count = n(df$waiting)), 
"waiting")
+
+# Similar to R data.frame, displays the data returned, instead of 
SparkDataFrame class string
+df2
+
+##+---+-+
+##|waiting|count|
+##+---+-+
+##|   43.0|1|
+##|   45.0|3|
+##|   46.0|5|
+##|   47.0|4|
+##|   48.0|3|
+##|   49.0|5|
+##|   50.0|5|
+##|   51.0|6|
+##|   52.0|5|
+##|   53.0|7|
+##+---+-+
+##only showing top 10 rows
+
+{% endhighlight %} 
+
+
+Note that to enable eager execution through `sparkR` command, add 
`spark.sql.repl.eagerEval.enabled=true` configuration property to the `--conf` 
option.
--- End diff --

`sparkR command` - I think this should be `sparkR shell`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r220033684
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1491,9 +1491,10 @@ object SQLConf {
   val REPL_EAGER_EVAL_ENABLED = 
buildConf("spark.sql.repl.eagerEval.enabled")
 .doc("Enables eager evaluation or not. When true, the top K rows of 
Dataset will be " +
   "displayed if and only if the REPL supports the eager evaluation. 
Currently, the " +
-  "eager evaluation is only supported in PySpark. For the notebooks 
like Jupyter, " +
-  "the HTML table (generated by _repr_html_) will be returned. For 
plain Python REPL, " +
-  "the returned outputs are formatted like dataframe.show().")
+  "eager evaluation is supported in PySpark and SparkR. In PySpark, 
for the notebooks like " +
+  "Jupyter, the HTML table (generated by _repr_html_) will be 
returned. For plain Python " +
+  "REPL, the returned outputs are formatted like dataframe.show(). In 
SparkR, the returned " +
+  "outputs are showed as R data.frame.")
--- End diff --

` are showed as R data.frame` - it's not actually the exact format, so how 
about we say ` are showed similar to R data.frame would`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r220030686
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,60 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster)
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
--- End diff --

actually you can use `, enableHiveSupport = FALSE` for this file in all 
cases - we only need the opposite for SQL tests


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22537: [SPARK-21291][R] add R partitionBy API in DataFrame

2018-09-24 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22537
  
could you kick off appveyor again by closing/re-opening this PR?
also, we might need to split the JIRA - since it's referring to both partition 
and bucket


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



zeppelin git commit: Copy-editing overview.md

2018-09-24 Thread felixcheung
Repository: zeppelin
Updated Branches:
  refs/heads/master 86ee813e4 -> 6c623d323


Copy-editing overview.md

Extensive grammar revisions for zeppelin/docs/usage/interpreter/overview.md

### What is this PR for?
The documentation at zeppelin/docs/usage/interpreter/overview.md contained 
numerous instances of bad grammar. This PR improves it.

### What type of PR is it?
Documentation

Author: Gerard de Melo 
Author: User 

Closes #3185 from gdemelo/patch-1 and squashes the following commits:

b16ddf2 [User] Minor changes to wording
8b7773b [Gerard de Melo] Copy-editing overview.md


Project: http://git-wip-us.apache.org/repos/asf/zeppelin/repo
Commit: http://git-wip-us.apache.org/repos/asf/zeppelin/commit/6c623d32
Tree: http://git-wip-us.apache.org/repos/asf/zeppelin/tree/6c623d32
Diff: http://git-wip-us.apache.org/repos/asf/zeppelin/diff/6c623d32

Branch: refs/heads/master
Commit: 6c623d323fa04350d89b076cd2b3a760433370fc
Parents: 86ee813
Author: Gerard de Melo 
Authored: Sun Sep 23 20:37:51 2018 -0400
Committer: Felix Cheung 
Committed: Mon Sep 24 17:59:52 2018 -0700

--
 docs/usage/interpreter/overview.md | 80 -
 1 file changed, 39 insertions(+), 41 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/zeppelin/blob/6c623d32/docs/usage/interpreter/overview.md
--
diff --git a/docs/usage/interpreter/overview.md 
b/docs/usage/interpreter/overview.md
index 5b567c7..6580a59 100644
--- a/docs/usage/interpreter/overview.md
+++ b/docs/usage/interpreter/overview.md
@@ -1,7 +1,7 @@
 ---
 layout: page
 title: "Interpreter in Apache Zeppelin"
-description: "This document explains about the role of interpreters, 
interpreters group and interpreter settings in Apache Zeppelin. The concept of 
Zeppelin interpreter allows any language/data-processing-backend to be plugged 
into Zeppelin."
+description: "This document explains the role of interpreters, interpreter 
groups and interpreter settings in Apache Zeppelin. The concept of Zeppelin 
interpreters allows any language or data-processing backend to be plugged into 
Zeppelin."
 group: usage/interpreter 
 ---
 

[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

2018-09-24 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22455
  
> Looks like the test was terminated in the middle, not likely related to 
the any code change. Could someone please ask for retest? Thanks.

appveyor one? you need to close and re-open this PR to trigger it


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] zeppelin issue #3185: Copy-editing overview.md

2018-09-23 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/zeppelin/pull/3185
  
LGTM
merging if no more comment


---


[GitHub] zeppelin issue #3185: Copy-editing overview.md

2018-09-23 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/zeppelin/pull/3185
  
great, thx!


---


[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-09-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22227#discussion_r219714851
  
--- Diff: R/pkg/R/functions.R ---
@@ -3404,19 +3404,27 @@ setMethod("collect_set",
 #' Equivalent to \code{split} SQL function.
 #'
 #' @rdname column_string_functions
+#' @param limit determines the length of the returned array.
+#'  \itemize{
+#'  \item \code{limit > 0}: length of the array will be at 
most \code{limit}
+#'  \item \code{limit <= 0}: the returned array can have any 
length
+#'  }
+#'
 #' @aliases split_string split_string,Column-method
 #' @examples
 #'
 #' \dontrun{
 #' head(select(df, split_string(df$Sex, "a")))
 #' head(select(df, split_string(df$Class, "\\d")))
+#' head(select(df, split_string(df$Class, "\\d", 2)))
 #' # This is equivalent to the following SQL expression
 #' head(selectExpr(df, "split(Class, 'd')"))}
--- End diff --

good point - also the example should run in the order documented.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219689581
  
--- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
@@ -0,0 +1,61 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(testthat)
+
+context("Show SparkDataFrame when eager execution is enabled.")
+
+test_that("eager execution is not enabled", {
+  # Start Spark session without eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster)
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
+  }
+  
+  df <- createDataFrame(faithful)
+  expect_is(df, "SparkDataFrame")
+  expected <- "eruptions:double, waiting:double"
+  expect_output(show(df), expected)
+  
+  # Stop Spark session
+  sparkR.session.stop()
+})
+
+test_that("eager execution is enabled", {
+  # Start Spark session with eager execution enabled
+  sparkSession <- if (windows_with_hadoop()) {
+sparkR.session(master = sparkRTestMaster,
+   sparkConfig = list(spark.sql.repl.eagerEval.enabled = 
"true",
+  spark.sql.repl.eagerEval.maxNumRows 
= as.integer(10)))
+  } else {
+sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, 
+   sparkConfig = list(spark.sql.repl.eagerEval.enabled = 
"true",
+  spark.sql.repl.eagerEval.maxNumRows 
= as.integer(10)))
--- End diff --

I'd set sparkConfig to a list before this and simply reuse it for both 
cases instead of having them duplicated
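
i.e. something along these lines (sketch only):

```r
# build the eager-eval config once and reuse it in both branches
eagerConf <- list(spark.sql.repl.eagerEval.enabled = "true",
                  spark.sql.repl.eagerEval.maxNumRows = as.integer(10))
sparkSession <- if (windows_with_hadoop()) {
  sparkR.session(master = sparkRTestMaster, sparkConfig = eagerConf)
} else {
  sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE,
                 sparkConfig = eagerConf)
}
```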


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219689697
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,25 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", 
"false")[[1]], "true")) {
+  argsList <- list()
+  argsList$x <- object
+  numRows <- 
as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]])
+  if (numRows > 0) {
+argsList$numRows <- numRows
+  }
+  truncate <- 
as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]])
--- End diff --

have you tested with the case when only `spark.sql.repl.eagerEval.truncate` 
is set, and `spark.sql.repl.eagerEval.maxNumRows` is not?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219689694
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +246,25 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", 
"false")[[1]], "true")) {
--- End diff --

instead of calling sparkR.conf three times, I'm wondering if it's 
cleaner/safer to get a copy like here:

https://github.com/apache/spark/blob/95b177c8f0862c6965a7c3cd76b3935c975adee9/R/pkg/tests/fulltests/test_sparkSQL.R#L3538
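
i.e. something like (sketch only):

```r
# fetch the whole conf once and index into the returned list,
# instead of three separate sparkR.conf() lookups
allConf <- sparkR.conf()
eagerEnabled <- allConf[["spark.sql.repl.eagerEval.enabled"]]
maxNumRows <- allConf[["spark.sql.repl.eagerEval.maxNumRows"]]
truncate <- allConf[["spark.sql.repl.eagerEval.truncate"]]
```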



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-09-21 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22305
  
@gatorsmile @cloud-fan 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...

2018-09-21 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22275#discussion_r219404072
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -4434,6 +4434,12 @@ def test_timestamp_dst(self):
 self.assertPandasEqual(pdf, df_from_python.toPandas())
 self.assertPandasEqual(pdf, df_from_pandas.toPandas())
 
+def test_toPandas_batch_order(self):
+df = self.spark.range(64, numPartitions=8).toDF("a")
+with 
self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": 4}):
+pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
+self.assertPandasEqual(pdf, pdf_arrow)
--- End diff --

hm, is this test case "enough" to trigger any possible problem just by 
chance? would increasing the number of batches, or the number of records per 
batch, increase the chance of hitting a streaming-order or concurrency issue?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...

2018-09-21 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22275
  
got it. so the size of each batch could grow. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-21 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219402870
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", 
"false")[[1]], "true")) {
--- End diff --

respecting `spark.sql.repl.eagerEval.maxNumRows` somewhat makes sense. but 
instead of changing `showDF`, which has other uses beyond eagerEval, we could 
change the place where `showDF` is called by `show` and pass the max rows there.

here 
https://github.com/apache/spark/pull/22455/files#diff-508641a8bd6c6b59f3e77c80cdcfa6a9R249
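
roughly like this in the eager branch of `show` (a sketch - `object` is the 
SparkDataFrame from the enclosing method, and the default of 20 when the 
property is unset is my assumption):

```r
# read the conf once and let showDF() do the printing, passing the limit through
allConf <- sparkR.conf()
maxNumRows <- allConf[["spark.sql.repl.eagerEval.maxNumRows"]]
numRows <- if (is.null(maxNumRows)) 20L else as.integer(maxNumRows)
showDF(object, numRows = numRows)
```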


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-21 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219402281
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", 
"false")[[1]], "true")) {
--- End diff --

^ that - it's two different "overloads" for two different "classes"


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-09-19 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22227
  
long thread, are we all good with this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21649: [SPARK-23648][R][SQL]Adds more types for hint in SparkR

2018-09-19 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21649
  
merged to master, thx


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



spark git commit: [SPARK-23648][R][SQL] Adds more types for hint in SparkR

2018-09-19 Thread felixcheung
Repository: spark
Updated Branches:
  refs/heads/master 76399d75e -> 95b177c8f


[SPARK-23648][R][SQL] Adds more types for hint in SparkR

## What changes were proposed in this pull request?

Addition of numeric and list hints for SparkR.

## How was this patch tested?
Add test in test_sparkSQL.R

Author: Huaxin Gao 

Closes #21649 from huaxingao/spark-23648.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/95b177c8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/95b177c8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/95b177c8

Branch: refs/heads/master
Commit: 95b177c8f0862c6965a7c3cd76b3935c975adee9
Parents: 76399d7
Author: Huaxin Gao 
Authored: Wed Sep 19 21:27:30 2018 -0700
Committer: Felix Cheung 
Committed: Wed Sep 19 21:27:30 2018 -0700

--
 R/pkg/R/DataFrame.R   | 12 +++-
 R/pkg/tests/fulltests/test_sparkSQL.R |  9 +
 2 files changed, 20 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/95b177c8/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 458deca..a1cb478 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -3985,7 +3985,17 @@ setMethod("hint",
   signature(x = "SparkDataFrame", name = "character"),
   function(x, name, ...) {
 parameters <- list(...)
-stopifnot(all(sapply(parameters, is.character)))
+if (!all(sapply(parameters, function(y) {
+  if (is.character(y) || is.numeric(y)) {
+TRUE
+  } else if (is.list(y)) {
+all(sapply(y, function(z) { is.character(z) || is.numeric(z) 
}))
+  } else {
+FALSE
+  }
+}))) {
+  stop("sql hint should be character, numeric, or list with 
character or numeric.")
+}
 jdf <- callJMethod(x@sdf, "hint", name, parameters)
 dataFrame(jdf)
   })

http://git-wip-us.apache.org/repos/asf/spark/blob/95b177c8/R/pkg/tests/fulltests/test_sparkSQL.R
--
diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R 
b/R/pkg/tests/fulltests/test_sparkSQL.R
index 0c4bdb3..40d8f80 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -2419,6 +2419,15 @@ test_that("join(), crossJoin() and merge() on a 
DataFrame", {
   expect_true(any(grepl("BroadcastHashJoin", execution_plan_broadcast)))
 })
 
+test_that("test hint", {
+  df <- sql("SELECT * FROM range(10e10)")
+  hintList <- list("hint2", "hint3", "hint4")
+  execution_plan_hint <- capture.output(
+explain(hint(df, "hint1", 1.23456, "aa", hintList), TRUE)
+  )
+  expect_true(any(grepl("1.23456, aa", execution_plan_hint)))
+})
+
 test_that("toJSON() on DataFrame", {
   df <- as.DataFrame(cars)
   df_json <- toJSON(df)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

2018-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219030775
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +244,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
   function(object) {
-cols <- lapply(dtypes(object), function(l) {
-  paste(l, collapse = ":")
-})
-s <- paste(cols, collapse = ", ")
-cat(paste(class(object), "[", s, "]\n", sep = ""))
+if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", 
"false")[[1]], "true")) {
--- End diff --

also not sure if it's done for python, consider adding to the doc above 
(L229) how it behaves with eagerEval


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


