[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/18320
  
Does it fail by running just gapply and nothing else?
From what you have found in your investigation and the code you pointed to, I suspect this isn't limited to gapply.

I think this PR only works around the problem. I am concerned that a user can also run into this issue.

A naive approach might be to change `spark.sparkr.use.daemon` inside gapply when it is called, but I suspect that only shifts the problem around, and it might then fail with other methods that shuffle or call UDFs. If a long-running daemon process is the problem, either we find and fix the leak (close the pipe, socket, etc.) or we put a count on the number of executions and recycle the daemon process periodically, before the leak becomes fatal.

Thoughts?
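To make the recycling idea concrete, here is a minimal, hypothetical Scala sketch (this is not SparkR's actual daemon management, and `RecyclingProcess`/`maxUses` are made-up names): restart the long-lived process after a fixed number of uses, so a slow leak dies with the process before it becomes fatal.

```scala
// Hypothetical sketch of the recycling idea; not Spark's actual daemon code.
// The process is restarted every `maxUses` executions, so leaked pipes or
// sockets are released with it before they can exhaust a system limit.
class RecyclingProcess(start: () => Process, maxUses: Int) {
  private var proc: Process = null
  private var uses = 0

  def withProcess[T](task: Process => T): T = synchronized {
    if (proc == null || !proc.isAlive || uses >= maxUses) {
      Option(proc).foreach(_.destroy()) // killing the process frees any leaked fds
      proc = start()
      uses = 0
    }
    uses += 1
    task(proc)
  }
}
```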
 





[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...

2017-06-15 Thread keypointt
Github user keypointt commented on the issue:

https://github.com/apache/spark/pull/17451
  
no worries Holden, totally understood

thank you for the input and I'll try it out 👍 





[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks ...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18231
  
**[Test build #78157 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78157/testReport)** for PR 18231 at commit [`5b0ce67`](https://github.com/apache/spark/commit/5b0ce674fb3070c6749f9caf8cbbbeabb702ce01).





[GitHub] spark pull request #18231: [SPARK-20994] Remove redundant characters in Open...

2017-06-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18231#discussion_r122367121
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
 ---
@@ -209,4 +190,51 @@ private ShuffleMetrics() {
 }
   }
 
+  private class ManagedBufferIterator implements Iterator<ManagedBuffer> {
+
+    private int index = 0;
+    private final String appId;
+    private final String execId;
+    private final int shuffleId;
+    // An array containing mapId and reduceId pairs.
+    private final int[] mapIdAndReduceIds;
+
+    ManagedBufferIterator(String appId, String execId, String[] blockIds) {
+      this.appId = appId;
+      this.execId = execId;
+      String[] blockId0Parts = blockIds[0].split("_");
+      if (blockId0Parts.length < 4 || !blockId0Parts[0].equals("shuffle")) {
--- End diff --

Sure.





[GitHub] spark pull request #18231: [SPARK-20994] Remove redundant characters in Open...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18231#discussion_r122366821
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
 ---
@@ -209,4 +190,51 @@ private ShuffleMetrics() {
 }
   }
 
+  private class ManagedBufferIterator implements Iterator<ManagedBuffer> {
+
+    private int index = 0;
+    private final String appId;
+    private final String execId;
+    private final int shuffleId;
+    // An array containing mapId and reduceId pairs.
+    private final int[] mapIdAndReduceIds;
+
+    ManagedBufferIterator(String appId, String execId, String[] blockIds) {
+      this.appId = appId;
+      this.execId = execId;
+      String[] blockId0Parts = blockIds[0].split("_");
+      if (blockId0Parts.length < 4 || !blockId0Parts[0].equals("shuffle")) {
--- End diff --

use `blockId0Parts.length != 4`?





[GitHub] spark issue #18284: [SPARK-21072][SQL] TreeNode.mapChildren should only appl...

2017-06-15 Thread ConeyLiu
Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/18284
  
thanks everyone for reviewing.





[GitHub] spark issue #17702: [SPARK-20408][SQL] Get the glob path in parallel to redu...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17702
  
**[Test build #78156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78156/testReport)** for PR 17702 at commit [`a3a3509`](https://github.com/apache/spark/commit/a3a3509ca72a57d9df97e6ce50c16c1b40acfbb9).





[GitHub] spark issue #18239: [SPARK-19462] fix bug in Exchange--pass in a tmp "newPar...

2017-06-15 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18239
  
@cloud-fan
Thanks a lot for the reply.
Yes, I'm also hesitant to backport to branch-1.6, but I think this bug is too obvious -- with `spark.sql.adaptive.enabled=true`, any rerun of a `ShuffleMapStage` will fail.





[GitHub] spark pull request #18321: [SPARK-12552][FOLLOWUP] Fix flaky test for "o.a.s...

2017-06-15 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18321#discussion_r122365295
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ---
@@ -214,7 +214,7 @@ class MasterSuite extends SparkFunSuite
   master.rpcEnv.setupEndpoint(Master.ENDPOINT_NAME, master)
   // Wait until Master recover from checkpoint data.
   eventually(timeout(5 seconds), interval(100 milliseconds)) {
-master.idToApp.size should be(1)
+master.workers.size should be(1)
--- End diff --

yes, that's right.





[GitHub] spark issue #18268: [SPARK-21054] [SQL] Reset Command support reset specific...

2017-06-15 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/18268
  
Hive supports resetting multiple keys, e.g. `reset config1 config2`; should we also support that?





[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks ...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18231
  
**[Test build #78155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78155/testReport)** for PR 18231 at commit [`2592ef4`](https://github.com/apache/spark/commit/2592ef40e16382e80072b4d51273120443aef3fa).





[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks ...

2017-06-15 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18231
  
@cloud-fan 
Thanks a lot for taking the time to review this. I refined it accordingly :)





[GitHub] spark pull request #17702: [SPARK-20408][SQL] Get the glob path in parallel ...

2017-06-15 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/17702#discussion_r122364493
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -389,6 +389,23 @@ case class DataSource(
   }
 
   /**
+   * Return all paths represented by the wildcard string.
+   */
+  private def getGlobbedPaths(qualified: Path): Seq[Path] = {
--- End diff --

You are right.
I'll fix this and also limit the max parallelism in the next patch, reusing the config from `InMemoryFileIndex.bulkListLeafFiles`.





[GitHub] spark pull request #18319: [SPARK-21114] [TEST] [2.1] Fix test failure in Sp...

2017-06-15 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/18319





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18320
  
@felixcheung, BTW, is this okay as a standalone PR, as is?





[GitHub] spark pull request #18318: [SPARK-21112] [SQL] ALTER TABLE SET TBLPROPERTIES...

2017-06-15 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18318#discussion_r122363898
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -235,7 +235,7 @@ case class AlterTableSetPropertiesCommand(
 // direct property.
 val newTable = table.copy(
   properties = table.properties ++ properties,
-  comment = properties.get("comment"))
+  comment = properties.get("comment").orElse(table.comment))
--- End diff --

alter table src set tblproperties ('foo' = 'bar', 'comment' = 'table_comment');
alter table src unset tblproperties ('foo');

We will lose the comment in this case.





[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18162#discussion_r122363701
  
--- Diff: 
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
@@ -528,7 +528,13 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
 new StageUIData
   })
   val taskData = stageData.taskData.get(taskId)
-  val metrics = TaskMetrics.fromAccumulatorInfos(accumUpdates)
+  val accumsFiltered = if (conf.get(TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES)) {
+    accumUpdates
+  } else {
+    accumUpdates.filter(info => info.name.isDefined && info.update.isDefined && info.name !=
--- End diff --

To be clearer, I think we should just do an assert here to make sure there are no UPDATED_BLOCK_STATUSES accumulator updates, instead of doing a filter.
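For illustration, a sketch of the suggested assert (names follow the surrounding diff; `InternalAccumulator.UPDATED_BLOCK_STATUSES` is the existing constant for that accumulator's name):

```scala
// Instead of silently filtering, fail fast if updated-block-status updates
// arrive while tracking is disabled.
if (!conf.get(TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES)) {
  assert(
    !accumUpdates.exists(_.name.contains(InternalAccumulator.UPDATED_BLOCK_STATUSES)),
    "Unexpected UPDATED_BLOCK_STATUSES accumulator update while tracking is disabled")
}
val metrics = TaskMetrics.fromAccumulatorInfos(accumUpdates)
```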





[GitHub] spark issue #18303: [SPARK-19824][Core] Update JsonProtocol to keep consiste...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18303
  
LGTM





[GitHub] spark issue #18285: [SPARK-20338][CORE]Spaces in spark.eventLog.dir are not ...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18285
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18285: [SPARK-20338][CORE]Spaces in spark.eventLog.dir are not ...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18285
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78144/
Test PASSed.





[GitHub] spark issue #18285: [SPARK-20338][CORE]Spaces in spark.eventLog.dir are not ...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18285
  
**[Test build #78144 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78144/testReport)** for PR 18285 at commit [`536a445`](https://github.com/apache/spark/commit/536a4456637cc3b1db0445c61f4192520d27a9ef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18303: [SPARK-19824][Core] Update JsonProtocol to keep consiste...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18303
  
**[Test build #78154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78154/testReport)** for PR 18303 at commit [`244bbae`](https://github.com/apache/spark/commit/244bbae71c2aa0b9f173ad7ac16ad0440eaab99c).





[GitHub] spark issue #18303: [SPARK-19824][Core] Update JsonProtocol to keep consiste...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18303
  
retest this please





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18320
  
I suspect this is an issue in R itself. I will raise it in the R community soon and share what I find.





[GitHub] spark issue #18239: [SPARK-19462] fix bug in Exchange--pass in a tmp "newPar...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18239
  
I can hardly remember the Spark 1.6 code, and I'm not sure when the next release of the 1.6 branch is. BTW, this bug can be worked around by turning off `spark.sql.adaptive.enabled`; do we really want to spend time on it?
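For reference, the workaround looks like this in a 1.6-era application (a sketch assuming the usual `SQLContext` handle; the config key itself is the real `spark.sql.adaptive.enabled` flag):

```scala
// Disable adaptive execution so a rerun ShuffleMapStage doesn't hit the bug.
sqlContext.setConf("spark.sql.adaptive.enabled", "false")
```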





[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-06-15 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17758
  
@wzhfy Applied. Could you check again?





[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

2017-06-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18025#discussion_r122362550
  
--- Diff: R/pkg/R/generics.R ---
@@ -919,10 +920,9 @@ setGeneric("array_contains", function(x, value) { 
standardGeneric("array_contain
 #' @export
 setGeneric("ascii", function(x) { standardGeneric("ascii") })
 
-#' @param x Column to compute on or a GroupedData object.
--- End diff --

yes, that's one of the code-gen methods that don't actually have documentation (which is a problem) but somehow inherit one from base::, so the CRAN check doesn't complain about it





[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks ...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18231
  
LGTM except some minor comments





[GitHub] spark pull request #18231: [SPARK-20994] Remove redundant characters in Open...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18231#discussion_r122362155
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
 ---
@@ -209,4 +190,51 @@ private ShuffleMetrics() {
 }
   }
 
+  private class ManagedBufferIterator implements Iterator<ManagedBuffer> {
+
+    private int index = 0;
+    private final String appId;
+    private final String execId;
+    private final int shuffleId;
+    // An array containing mapId and reduceId pairs.
+    private final int[] mapIdAndReduceIds;
+
+    ManagedBufferIterator(String appId, String execId, String[] blockIds) {
+      this.appId = appId;
+      this.execId = execId;
+      String[] blockId0Parts = blockIds[0].split("_");
+      if (blockId0Parts.length < 4) {
+        throw new IllegalArgumentException("Unexpected block id format: " + blockIds[0]);
+      }
+      if (!blockId0Parts[0].equals("shuffle")) {
+        throw new IllegalArgumentException("Expected shuffle block id, got: " + blockIds[0]);
+      }
+      this.shuffleId = Integer.parseInt(blockId0Parts[1]);
+      mapIdAndReduceIds = new int[2 * blockIds.length];
+      for (int i = 0; i < blockIds.length; i++) {
+        String[] blockIdParts = blockIds[i].split("_");
+        if (Integer.parseInt(blockIdParts[1]) != shuffleId) {
+          throw new IllegalArgumentException("Expected shuffleId=" + shuffleId +
+            ", got:" + blockIds[i]);
+        }
+        mapIdAndReduceIds[2 * i] = Integer.parseInt(blockIdParts[2]);
+        mapIdAndReduceIds[2 * i + 1] = Integer.parseInt(blockIdParts[3]);
+      }
+    }
+
+    @Override
+    public boolean hasNext() {
+      return index < mapIdAndReduceIds.length / 2;
--- End diff --

nit: we can keep a `pos`, and increase it by 2 in `next`, so here we can 
just write `pos < mapIdAndReduceIds.length` to save a division.
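For illustration, a minimal Scala sketch of that suggestion (the real `ManagedBufferIterator` is Java; the array name mirrors the diff):

```scala
// Sketch of the `pos` idea: advance by 2 per element, so hasNext is a plain
// comparison against the array length and needs no division.
class PairIterator(mapIdAndReduceIds: Array[Int]) extends Iterator[(Int, Int)] {
  private var pos = 0

  override def hasNext: Boolean = pos < mapIdAndReduceIds.length

  override def next(): (Int, Int) = {
    val pair = (mapIdAndReduceIds(pos), mapIdAndReduceIds(pos + 1))
    pos += 2
    pair
  }
}
```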





[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18025
  
**[Test build #78153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78153/testReport)** for PR 18025 at commit [`19d063c`](https://github.com/apache/spark/commit/19d063c6995fa6bd780830a941f6b1f7c45c1bac).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18025
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #18231: [SPARK-20994] Remove redundant characters in Open...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18231#discussion_r122361985
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
 ---
@@ -209,4 +190,51 @@ private ShuffleMetrics() {
 }
   }
 
+  private class ManagedBufferIterator implements Iterator<ManagedBuffer> {
+
+    private int index = 0;
+    private final String appId;
+    private final String execId;
+    private final int shuffleId;
+    // An array containing mapId and reduceId pairs.
+    private final int[] mapIdAndReduceIds;
+
+    ManagedBufferIterator(String appId, String execId, String[] blockIds) {
+      this.appId = appId;
+      this.execId = execId;
+      String[] blockId0Parts = blockIds[0].split("_");
+      if (blockId0Parts.length < 4) {
+        throw new IllegalArgumentException("Unexpected block id format: " + blockIds[0]);
+      }
+      if (!blockId0Parts[0].equals("shuffle")) {
+        throw new IllegalArgumentException("Expected shuffle block id, got: " + blockIds[0]);
+      }
+      this.shuffleId = Integer.parseInt(blockId0Parts[1]);
+      mapIdAndReduceIds = new int[2 * blockIds.length];
+      for (int i = 0; i < blockIds.length; i++) {
+        String[] blockIdParts = blockIds[i].split("_");
--- End diff --

shall we check `blockIdParts[0].equals("shuffle")` here too?





[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18025
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78153/
Test FAILed.





[GitHub] spark pull request #18231: [SPARK-20994] Remove redundant characters in Open...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18231#discussion_r122361955
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
 ---
@@ -209,4 +190,51 @@ private ShuffleMetrics() {
 }
   }
 
+  private class ManagedBufferIterator implements Iterator<ManagedBuffer> {
+
+    private int index = 0;
+    private final String appId;
+    private final String execId;
+    private final int shuffleId;
+    // An array containing mapId and reduceId pairs.
+    private final int[] mapIdAndReduceIds;
+
+    ManagedBufferIterator(String appId, String execId, String[] blockIds) {
+      this.appId = appId;
+      this.execId = execId;
+      String[] blockId0Parts = blockIds[0].split("_");
+      if (blockId0Parts.length < 4) {
--- End diff --

shall we be more strict and use `blockId0Parts.length != 4`?





[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18025
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18320
  
Yes, I guess it will pass if we reduce `spark.sql.shuffle.partitions` (I didn't look carefully or test this either). Just to make sure (and to share what I investigated), from my code read:

With `spark.sparkr.use.daemon` enabled, for each task execution,
1. JVM --- start (if not started) ---> R daemon
2. JVM --- send port ---> R daemon --- fork with the port ---> R worker

This looks to be exercised on all OSes except Windows.

With `spark.sparkr.use.daemon` disabled, for each task execution,
1. JVM --- fork process from Java (expensive) ---> R worker

This looks to be exercised only on Windows.

This PR proposes to switch the former case to the latter by avoiding calling the (already running from another execution) R daemon.

I am fine with giving reducing the number of partitions a shot, if you prefer that.
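For anyone reproducing this, the two knobs mentioned above can be set as follows (a sketch; both keys are existing Spark settings, the values here are arbitrary):

```scala
// Fewer shuffle partitions means fewer concurrent forks from the R daemon.
spark.conf.set("spark.sql.shuffle.partitions", "8")

// Alternatively, disabling the daemon makes each task fork a fresh R worker
// from the JVM (slower, but no long-lived daemon to leak). That one has to
// be set at startup, e.g. spark-submit --conf spark.sparkr.use.daemon=false
```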








[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18025
  
**[Test build #78152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78152/testReport)** for PR 18025 at commit [`0a7f5fc`](https://github.com/apache/spark/commit/0a7f5fcac2e0295d92b82d8909c4f1b11c82f016).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18025
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78152/
Test FAILed.





[GitHub] spark issue #18319: [SPARK-21114] [TEST] [2.1] Fix test failure in Spark 2.1...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18319
  
thanks, merging to 2.1





[GitHub] spark pull request #18268: [SPARK-21054] [SQL] Reset Command support reset s...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18268#discussion_r122361086
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
 ---
@@ -301,4 +301,10 @@ class SparkSqlParserSuite extends PlanTest {
   "SELECT a || b || c FROM t",
   Project(UnresolvedAlias(concat) :: Nil, 
UnresolvedRelation(TableIdentifier("t"
   }
+
+  test("reset") {
+assertEqual("reset", ResetCommand(None))
+assertEqual("reset spark.test.property", 
ResetCommand(Some("spark.test.property")))
+assertEqual("reset #$a!", ResetCommand(Some("#$a!")))
--- End diff --

Can we check Hive's behavior? I think special chars are not allowed in a config name, and the parser should throw an exception for this case.
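A hedged sketch of what the stricter test could look like, in the suite's style (the `intercept`/`ParseException` usage here is assumed, not taken from the PR):

```scala
// Hypothetical stricter expectation: plain config names still parse, but
// special characters make the parser throw instead of silently succeeding.
test("reset") {
  assertEqual("reset", ResetCommand(None))
  assertEqual("reset spark.test.property", ResetCommand(Some("spark.test.property")))
  intercept[ParseException] {
    parser.parsePlan("reset #$a!")
  }
}
```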





[GitHub] spark issue #18319: [SPARK-21114] [TEST] [2.1] Fix test failure in Spark 2.1...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18319
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78143/
Test PASSed.





[GitHub] spark issue #18319: [SPARK-21114] [TEST] [2.1] Fix test failure in Spark 2.1...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18319
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18319: [SPARK-21114] [TEST] [2.1] Fix test failure in Spark 2.1...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18319
  
**[Test build #78143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78143/testReport)** for PR 18319 at commit [`367e8e5`](https://github.com/apache/spark/commit/367e8e526e1f9b631765626b43767dcc16a037e6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18284: [SPARK-21072][SQL] TreeNode.mapChildren should on...

2017-06-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18284





[GitHub] spark issue #18284: [SPARK-21072][SQL] TreeNode.mapChildren should only appl...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18284
  
thanks, merging to master/2.2/2.1!





[GitHub] spark issue #18321: [SPARK-12552][FOLLOWUP] Fix flaky test for "o.a.s.deploy...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18321
  
LGTM





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/18320
  
That's very interesting. That code has been around for 2 years - to be honest, I'm not 100% sure what it is doing.
Perhaps this could also be fixed with a lower number of partitions?





[GitHub] spark issue #18318: [SPARK-21112] [SQL] ALTER TABLE SET TBLPROPERTIES should...

2017-06-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18318
  
Only the master branch has such an issue. Thanks!





[GitHub] spark pull request #18318: [SPARK-21112] [SQL] ALTER TABLE SET TBLPROPERTIES...

2017-06-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18318#discussion_r122359957
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -235,7 +235,7 @@ case class AlterTableSetPropertiesCommand(
 // direct property.
 val newTable = table.copy(
   properties = table.properties ++ properties,
-  comment = properties.get("comment"))
+  comment = properties.get("comment").orElse(table.comment))
--- End diff --

Could you show an example?





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-06-15 Thread leifwalsh
Github user leifwalsh commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r122359928
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1648,8 +1650,30 @@ def toPandas(self):
 0    2  Alice
 1    5    Bob
 """
-import pandas as pd
-return pd.DataFrame.from_records(self.collect(), columns=self.columns)
+if self.sql_ctx.getConf("spark.sql.execution.arrow.enable", "false").lower() == "true":
+    try:
+        import pyarrow
+        tables = self._collectAsArrow()
+        table = pyarrow.concat_tables(tables)
--- End diff --

If `tables` is an empty list (e.g. if you load a dataset, filter the whole thing away, and produce zero rows), `pyarrow.concat_tables` raises an exception rather than producing an empty table. This should probably be fixed in Arrow (cc @wesm), but we should be defensive here. We should probably try to produce a `DataFrame` with the right schema but no rows, if possible.





[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18025
  
**[Test build #78149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78149/testReport)** for PR 18025 at commit [`014b9f3`](https://github.com/apache/spark/commit/014b9f3069a6e2075cb8be307c5d74081dabe15a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18025
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78149/
Test PASSed.





[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18025
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15821
  
Yeah, I think it's fine to keep `ArrowPayload`.





[GitHub] spark issue #18283: [TEST][SPARKR][CORE] Fix broken SparkSubmitSuite

2017-06-15 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/18283
  
@shaneknapp right - this script (install-dev.sh) has been assuming it can find `jar` without checking for JAVA_HOME, so I was saying it could be improved that way; but yeah, this script hasn't been changed for years either..

Let me know if it recurs - I could just fix it by checking JAVA_HOME.






[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r122359743
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala
 ---
@@ -0,0 +1,423 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.arrow
+
+import java.io.ByteArrayOutputStream
+import java.nio.channels.Channels
+
+import scala.collection.JavaConverters._
+
+import io.netty.buffer.ArrowBuf
+import org.apache.arrow.memory.{BufferAllocator, RootAllocator}
+import org.apache.arrow.vector._
+import org.apache.arrow.vector.BaseValueVector.BaseMutator
+import org.apache.arrow.vector.file._
+import org.apache.arrow.vector.schema.{ArrowFieldNode, ArrowRecordBatch}
+import org.apache.arrow.vector.types.FloatingPointPrecision
+import org.apache.arrow.vector.types.pojo.{ArrowType, Field, FieldType, Schema}
+import org.apache.arrow.vector.util.ByteArrayReadableSeekableByteChannel
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+
+
+/**
+ * Store Arrow data in a form that can be serialized by Spark.
+ */
+private[sql] class ArrowPayload(payload: Array[Byte]) extends Serializable {
+
+  /**
+   * Create an ArrowPayload from an ArrowRecordBatch and Spark schema.
+   */
+  def this(batch: ArrowRecordBatch, schema: StructType, allocator: BufferAllocator) = {
+    this(ArrowConverters.batchToByteArray(batch, schema, allocator))
--- End diff --

sounds good





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r122359727
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala
 ---
@@ -0,0 +1,1218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.arrow
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+import java.sql.{Date, Timestamp}
+import java.text.SimpleDateFormat
+import java.util.Locale
+
+import com.google.common.io.Files
+import org.apache.arrow.memory.RootAllocator
+import org.apache.arrow.vector.{VectorLoader, VectorSchemaRoot}
+import org.apache.arrow.vector.file.json.JsonFileReader
+import org.apache.arrow.vector.util.Validator
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.SparkException
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types.{BinaryType, StructField, StructType}
+import org.apache.spark.util.Utils
+
+
+class ArrowConvertersSuite extends SharedSQLContext with BeforeAndAfterAll 
{
+  import testImplicits._
+
+  private var tempDataPath: String = _
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+tempDataPath = Utils.createTempDir(namePrefix = 
"arrow").getAbsolutePath
+  }
+
+  test("collect to arrow record batch") {
+val indexData = (1 to 6).toDF("i")
+val arrowPayloads = indexData.toArrowPayload.collect()
+assert(arrowPayloads.nonEmpty)
+assert(arrowPayloads.length == indexData.rdd.getNumPartitions)
+val allocator = new RootAllocator(Long.MaxValue)
+val arrowRecordBatches = arrowPayloads.map(_.loadBatch(allocator))
+val rowCount = arrowRecordBatches.map(_.getLength).sum
+assert(rowCount === indexData.count())
+arrowRecordBatches.foreach(batch => assert(batch.getNodes.size() > 0))
+arrowRecordBatches.foreach(_.close())
+allocator.close()
+  }
+
+  test("short conversion") {
+val json =
+  s"""
+ |{
+ |  "schema" : {
+ |"fields" : [ {
+ |  "name" : "a_s",
+ |  "type" : {
+ |"name" : "int",
+ |"isSigned" : true,
+ |"bitWidth" : 16
+ |  },
+ |  "nullable" : false,
+ |  "children" : [ ],
+ |  "typeLayout" : {
+ |"vectors" : [ {
+ |  "type" : "VALIDITY",
+ |  "typeBitWidth" : 1
+ |}, {
+ |  "type" : "DATA",
+ |  "typeBitWidth" : 16
+ |} ]
+ |  }
+ |}, {
+ |  "name" : "b_s",
+ |  "type" : {
+ |"name" : "int",
+ |"isSigned" : true,
+ |"bitWidth" : 16
+ |  },
+ |  "nullable" : true,
+ |  "children" : [ ],
+ |  "typeLayout" : {
+ |"vectors" : [ {
+ |  "type" : "VALIDITY",
+ |  "typeBitWidth" : 1
+ |}, {
+ |  "type" : "DATA",
+ |  "typeBitWidth" : 16
+ |} ]
+ |  }
+ |} ]
+ |  },
+ |  "batches" : [ {
+ |"count" : 6,
+ |"columns" : [ {
+ |  "name" : "a_s",
+ |  "count" : 6,
+ |  "VALIDITY" : [ 1, 1, 1, 1, 1, 1 ],
+ |  "DATA" : [ 1, -1, 2, -2, 32767, -32768 ]
+ |}, {
+ |  "name" : "b_s",
+ |  "count" : 6,
+ |  "VALIDITY" : [ 1, 0, 0, 1, 0, 1 ],
+ |  "DATA" : [ 1, 0, 0, -2, 0, -32768 ]
+ |} ]
+   

[GitHub] spark issue #18249: [SPARK-19937] Collect metrics for remote bytes read to d...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18249
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78140/
Test PASSed.





[GitHub] spark issue #18249: [SPARK-19937] Collect metrics for remote bytes read to d...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18249
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15821#discussion_r122359492
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala
 ---
@@ -0,0 +1,423 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.arrow
+
+import java.io.ByteArrayOutputStream
+import java.nio.channels.Channels
+
+import scala.collection.JavaConverters._
+
+import io.netty.buffer.ArrowBuf
+import org.apache.arrow.memory.{BufferAllocator, RootAllocator}
+import org.apache.arrow.vector._
+import org.apache.arrow.vector.BaseValueVector.BaseMutator
+import org.apache.arrow.vector.file._
+import org.apache.arrow.vector.schema.{ArrowFieldNode, ArrowRecordBatch}
+import org.apache.arrow.vector.types.FloatingPointPrecision
+import org.apache.arrow.vector.types.pojo.{ArrowType, Field, FieldType, 
Schema}
+import org.apache.arrow.vector.util.ByteArrayReadableSeekableByteChannel
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+
+
+/**
+ * Store Arrow data in a form that can be serialized by Spark.
+ */
+private[sql] class ArrowPayload(payload: Array[Byte]) extends Serializable 
{
+
+  /**
+   * Create an ArrowPayload from an ArrowRecordBatch and Spark schema.
+   */
+  def this(batch: ArrowRecordBatch, schema: StructType, allocator: 
BufferAllocator) = {
+this(ArrowConverters.batchToByteArray(batch, schema, allocator))
+  }
+
+  /**
+   * Convert the ArrowPayload to an ArrowRecordBatch.
+   */
+  def loadBatch(allocator: BufferAllocator): ArrowRecordBatch = {
+ArrowConverters.byteArrayToBatch(payload, allocator)
+  }
+
+  /**
+   * Get the ArrowPayload as an Array[Byte].
+   */
+  def toByteArray: Array[Byte] = payload
+}
+
+private[sql] object ArrowConverters {
+
+  /**
+   * Map a Spark DataType to ArrowType.
+   */
+  private[arrow] def sparkTypeToArrowType(dataType: DataType): ArrowType = 
{
+dataType match {
+  case BooleanType => ArrowType.Bool.INSTANCE
+  case ShortType => new ArrowType.Int(8 * ShortType.defaultSize, true)
+  case IntegerType => new ArrowType.Int(8 * IntegerType.defaultSize, 
true)
+  case LongType => new ArrowType.Int(8 * LongType.defaultSize, true)
+  case FloatType => new 
ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
+  case DoubleType => new 
ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
+  case ByteType => new ArrowType.Int(8, true)
+  case StringType => ArrowType.Utf8.INSTANCE
+  case BinaryType => ArrowType.Binary.INSTANCE
+  case _ => throw new UnsupportedOperationException(s"Unsupported data 
type: $dataType")
+}
+  }
+
+  /**
+   * Convert a Spark Dataset schema to Arrow schema.
+   */
+  private[arrow] def schemaToArrowSchema(schema: StructType): Schema = {
+val arrowFields = schema.fields.map { f =>
+  new Field(f.name, f.nullable, sparkTypeToArrowType(f.dataType), 
List.empty[Field].asJava)
+}
+new Schema(arrowFields.toList.asJava)
+  }
+
+  /**
+   * Maps Iterator from InternalRow to ArrowPayload. Limit 
ArrowRecordBatch size in ArrowPayload
+   * by setting maxRecordsPerBatch or use 0 to fully consume rowIter.
+   */
+  private[sql] def toPayloadIterator(
+  rowIter: Iterator[InternalRow],
+  schema: StructType,
+  maxRecordsPerBatch: Int): Iterator[ArrowPayload] = {
+new Iterator[ArrowPayload] {
+  private val _allocator = new RootAllocator(Long.MaxValue)
+  private var _nextPayload = if (rowIter.nonEmpty) convert() else null
+
+  override def hasNext: Boolean = _nextPayload != null
+
+  override def next(): ArrowPayload = {
+val obj = _nextPayload
+if (hasNext) {
+ 

[GitHub] spark issue #18249: [SPARK-19937] Collect metrics for remote bytes read to d...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18249
  
**[Test build #78140 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78140/testReport)**
 for PR 18249 at commit 
[`9768860`](https://github.com/apache/spark/commit/9768860046f69530926215ef1ec5162213a20616).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18025
  
**[Test build #78153 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78153/testReport)**
 for PR 18025 at commit 
[`19d063c`](https://github.com/apache/spark/commit/19d063c6995fa6bd780830a941f6b1f7c45c1bac).





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18320
  
For normal use cases, I carefully suspect it might be fine: I ran 200 * ~10 tasks quickly on a single machine without hitting it. I don't know whether it happens when tasks run slowly in a cluster in a distributed manner, though.

At least, this was not reproduced when the number of fork executions was small. Practically it might be fine, but it needs more investigation if this issue is important enough to prioritize.





[GitHub] spark pull request #18075: [SPARK-18016][SQL][CATALYST] Code Generation: Con...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18075#discussion_r122359345
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -233,10 +222,124 @@ class CodegenContext {
   // The collection of sub-expression result resetting methods that need 
to be called on each row.
   val subexprFunctions = mutable.ArrayBuffer.empty[String]
 
-  def declareAddedFunctions(): String = {
-addedFunctions.map { case (funcName, funcCode) => funcCode 
}.mkString("\n")
+  /**
+   * Holds the class and instance names to be generated. `OuterClass` is a 
placeholder standing for
+   * whichever class is generated as the outermost class and which will 
contain any nested
+   * sub-classes. All other classes and instance names in this list will 
represent private, nested
+   * sub-classes.
+   */
+  private val classes: mutable.ListBuffer[(String, String)] =
+mutable.ListBuffer[(String, String)]("OuterClass" -> null)
+
+  // A map holding the current size in bytes of each class to be generated.
+  private val classSize: mutable.Map[String, Int] =
+mutable.Map[String, Int]("OuterClass" -> 0)
+
+  // Nested maps holding function names and their code belonging to each 
class.
+  private val classFunctions: mutable.Map[String, mutable.Map[String, 
String]] =
+mutable.Map("OuterClass" -> mutable.Map.empty[String, String])
+
+  // Returns the size of the most recently added class.
+  private def currClassSize(): Int = classSize(classes.head._1)
+
+  // Returns the class name and instance name for the most recently added 
class.
+  private def currClass(): (String, String) = classes.head
+
+  // Adds a new class. Requires the class' name, and its instance name.
+  private def addClass(className: String, classInstance: String): Unit = {
+classes.prepend(className -> classInstance)
+classSize += className -> 0
+classFunctions += className -> mutable.Map.empty[String, String]
   }
 
+  /**
+   * Adds a function to the generated class. If the code for the 
`OuterClass` grows too large, the
+   * function will be inlined into a new private, nested class, and a 
class-qualified name for the
+   * function will be returned. Otherwise, the function will be inined to 
the `OuterClass` the
+   * simple `funcName` will be returned.
+   *
+   * @param funcName the class-unqualified name of the function
+   * @param funcCode the body of the function
+   * @param inlineToOuterClass whether the given code must be inlined to 
the `OuterClass`. This
--- End diff --

yup, whole stage codegen is really tricky...





[GitHub] spark pull request #18321: [SPARK-12552][FOLLOWUP] Fix flaky test for "o.a.s...

2017-06-15 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18321#discussion_r122359325
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ---
@@ -214,7 +214,7 @@ class MasterSuite extends SparkFunSuite
   master.rpcEnv.setupEndpoint(Master.ENDPOINT_NAME, master)
   // Wait until Master recover from checkpoint data.
   eventually(timeout(5 seconds), interval(100 milliseconds)) {
-master.idToApp.size should be(1)
+master.workers.size should be(1)
--- End diff --

I think the reason is workers are recovered later than applications.





[GitHub] spark pull request #18321: [SPARK-12552][FOLLOWUP] Fix flaky test for "o.a.s...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18321#discussion_r122359276
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala ---
@@ -214,7 +214,7 @@ class MasterSuite extends SparkFunSuite
   master.rpcEnv.setupEndpoint(Master.ENDPOINT_NAME, master)
   // Wait until Master recover from checkpoint data.
   eventually(timeout(5 seconds), interval(100 milliseconds)) {
-master.idToApp.size should be(1)
+master.workers.size should be(1)
--- End diff --

can you explain more about why it may fail?





[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18301#discussion_r122359244
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala 
---
@@ -74,6 +80,19 @@ object SQLMetrics {
   private val TIMING_METRIC = "timing"
   private val AVERAGE_METRIC = "average"
 
+  private val baseForAvgMetric: Int = 10
--- End diff --

Yeah, that's why I record the ceiling of the average number at the beginning. cc @rxin What do you think?





[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-15 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/18025
  
@felixcheung Your comments are all addressed now. Please let me know if 
there is anything else needed. 





[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17758
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78142/
Test PASSed.





[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17758
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18301#discussion_r122359108
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala 
---
@@ -74,6 +80,19 @@ object SQLMetrics {
   private val TIMING_METRIC = "timing"
   private val AVERAGE_METRIC = "average"
 
+  private val baseForAvgMetric: Int = 10
--- End diff --

I'm not quite sure this hack is worth it. For a small number of probes, we don't care about the values, I think. For a large number of probes, having one more digit in the fraction part is not very useful.
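For reference, the hack being debated is a fixed-point encoding, roughly like this (illustrative names, not the actual `SQLMetrics` API):

```scala
// SQL metric values are Longs, so to keep one digit after the decimal point
// of an average, the value is scaled by a base of 10 when recorded and
// divided back when displayed.
val baseForAvgMetric = 10

def store(avgProbes: Double): Long =
  math.ceil(avgProbes * baseForAvgMetric).toLong // e.g. avg 1.23 -> stored 13

def display(stored: Long): Double =
  stored.toDouble / baseForAvgMetric             // stored 13 -> shown as 1.3
```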





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18320
  
Yes, the underlying issue is still there and this only avoids the test failure. I believe running the code should reproduce the issue on both Mac and CentOS.

What I don't get is that, when the number of fork executions is small, this is not reproduced (sometimes the growing pipe count was not even observed). The number of pipes decreases under certain conditions (it did not look related to time, but to some events).

The issue is exposed and found via `gapply` now because it invokes many forks through `daemon.R`, but I guess it might still exist for all other APIs executing R native functions with this daemon. I took several shots at resolving the root cause within `daemon.R` but could not make it work.

Root cause is:

In one terminal, watch the open-file count with `watch -n 0.01 "lsof -c R | wc -l"`.

In another terminal, run:

```r
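# Fork 200 children back to back; each child immediately signals itself with
# SIGUSR1 and exits via mcexit, mimicking what daemon.R does per worker.
# Watch the lsof count in the other terminal grow while this loop runs.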
for(i in 0:200) {
  p <- parallel:::mcfork()
  if (inherits(p, "masterProcess")) {
tools::pskill(Sys.getpid(), tools::SIGUSR1)
parallel:::mcexit(0L)
  }
}
```

The number of open pipes just keeps increasing. I double-checked via `netstat` and `ps` that the processes and sockets are closed.

We need to resolve this one.






[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17758
  
**[Test build #78142 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78142/testReport)**
 for PR 17758 at commit 
[`44f3d35`](https://github.com/apache/spark/commit/44f3d35fc947b845b60a527723a0c5aabf991145).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18025
  
**[Test build #78152 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78152/testReport)**
 for PR 18025 at commit 
[`0a7f5fc`](https://github.com/apache/spark/commit/0a7f5fcac2e0295d92b82d8909c4f1b11c82f016).





[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18092
  
**[Test build #78151 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78151/testReport)**
 for PR 18092 at commit 
[`d01134e`](https://github.com/apache/spark/commit/d01134ef92401a5275c7388c8e6d65c82785acfa).





[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

2017-06-15 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/18025#discussion_r122358689
  
--- Diff: R/pkg/R/generics.R ---
@@ -919,10 +920,9 @@ setGeneric("array_contains", function(x, value) { 
standardGeneric("array_contain
 #' @export
 setGeneric("ascii", function(x) { standardGeneric("ascii") })
 
-#' @param x Column to compute on or a GroupedData object.
--- End diff --

In this case, we will have to document `avg` on its own, like `count`, `first` and `last`. I cannot document the `x` param here since it would show up in the doc for the Column class. Interestingly, there is not even a doc entry for the `avg` method from the `GroupedData` class.





[GitHub] spark issue #18308: [SPARK-21099][Spark Core] INFO Log Message Using Incorre...

2017-06-15 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/18308
  
> I wonder if whether executor is completely gone or whether executor is 
still there but has no cached RDD, if both scenarios return false. 

Yes, that's the case; we cannot differentiate these two scenarios. But I think it is fine, since it is just a logging issue and it is hard for us to differentiate them in the current code.





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/18320
  
thx - I think more importantly, does the issue manifest when someone manually calls gapply in a similar way on RHEL/CentOS? We could work around the test failure, but if a user can run into this in normal use then we need to address it within gapply





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18320
  
**[Test build #78148 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78148/testReport)**
 for PR 18320 at commit 
[`52c8abf`](https://github.com/apache/spark/commit/52c8abf9551e126f75ef0aa0a042f1ebd13e8d47).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18320
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78148/
Test PASSed.





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18320
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18321: [SPARK-12552][FOLLOWUP] Fix flaky test for "o.a.s.deploy...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18321
  
**[Test build #78150 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78150/testReport)**
 for PR 18321 at commit 
[`55c5d12`](https://github.com/apache/spark/commit/55c5d12023dec1cbf2e8aa6b4507c49c3df5b322).





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18320
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18320
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78147/
Test PASSed.





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18320
  
**[Test build #78147 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78147/testReport)**
 for PR 18320 at commit 
[`505d75f`](https://github.com/apache/spark/commit/505d75f0e9a90481f96d0f1fefd4f9baaa38ee7d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18321: [SPARK-12552][FOLLOWUP] Fix flaky test for "o.a.s...

2017-06-15 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/18321

[SPARK-12552][FOLLOWUP] Fix flaky test for 
"o.a.s.deploy.master.MasterSuite.master correctly recover the application"

## What changes were proposed in this pull request?

Due to asynchronous RPC event processing, the test "correctly recover the application" can occasionally fail. An example failure can be seen here: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78126/testReport/org.apache.spark.deploy.master/MasterSuite/master_correctly_recover_the_application/.

This PR fixes that flaky test.

## How was this patch tested?

Existing UT.

CC @cloud-fan @jiangxb1987 , please help to review, thanks!


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-12552-followup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18321.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18321


commit 55c5d12023dec1cbf2e8aa6b4507c49c3df5b322
Author: jerryshao 
Date:   2017-06-16T03:10:48Z

Fix flaky test

Change-Id: I20f1a68b682cbfda05be319b365495c80fb4cda4







[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18025
  
**[Test build #78149 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78149/testReport)**
 for PR 18025 at commit 
[`014b9f3`](https://github.com/apache/spark/commit/014b9f3069a6e2075cb8be307c5d74081dabe15a).





[GitHub] spark pull request #18075: [SPARK-18016][SQL][CATALYST] Code Generation: Con...

2017-06-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18075#discussion_r122356960
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -233,10 +222,118 @@ class CodegenContext {
   // The collection of sub-expression result resetting methods that need 
to be called on each row.
   val subexprFunctions = mutable.ArrayBuffer.empty[String]
 
-  def declareAddedFunctions(): String = {
-addedFunctions.map { case (funcName, funcCode) => funcCode 
}.mkString("\n")
+  val outerClassName = "OuterClass"
+
+  /**
+   * Holds the class and instance names to be generated, where 
`OuterClass` is a placeholder
+   * standing for whichever class is generated as the outermost class and 
which will contain any
+   * nested sub-classes. All other classes and instance names in this list 
will represent private,
+   * nested sub-classes.
+   */
+  private val classes: mutable.ListBuffer[(String, String)] =
+mutable.ListBuffer[(String, String)](outerClassName -> null)
+
+  // A map holding the current size in bytes of each class to be generated.
+  private val classSize: mutable.Map[String, Int] =
+mutable.Map[String, Int](outerClassName -> 0)
+
+  // Nested maps holding function names and their code belonging to each 
class.
+  private val classFunctions: mutable.Map[String, mutable.Map[String, 
String]] =
+mutable.Map(outerClassName -> mutable.Map.empty[String, String])
+
+  // Returns the size of the most recently added class.
+  private def currClassSize(): Int = classSize(classes.head._1)
+
+  // Returns the class name and instance name for the most recently added 
class.
+  private def currClass(): (String, String) = classes.head
+
+  // Adds a new class. Requires the class' name, and its instance name.
+  private def addClass(className: String, classInstance: String): Unit = {
+classes.prepend(className -> classInstance)
+classSize += className -> 0
+classFunctions += className -> mutable.Map.empty[String, String]
+  }
+
+  /**
+   * Adds a function to the generated class. If the code for the 
`OuterClass` grows too large, the
+   * function will be inlined into a new private, nested class, and a 
class-qualified name for the
--- End diff --

nit: class instance-qualified name





[GitHub] spark pull request #18075: [SPARK-18016][SQL][CATALYST] Code Generation: Con...

2017-06-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18075#discussion_r122356982
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -233,10 +222,118 @@ class CodegenContext {
   // The collection of sub-expression result resetting methods that need 
to be called on each row.
   val subexprFunctions = mutable.ArrayBuffer.empty[String]
 
-  def declareAddedFunctions(): String = {
-addedFunctions.map { case (funcName, funcCode) => funcCode 
}.mkString("\n")
+  val outerClassName = "OuterClass"
+
+  /**
+   * Holds the class and instance names to be generated, where 
`OuterClass` is a placeholder
+   * standing for whichever class is generated as the outermost class and 
which will contain any
+   * nested sub-classes. All other classes and instance names in this list 
will represent private,
+   * nested sub-classes.
+   */
+  private val classes: mutable.ListBuffer[(String, String)] =
+mutable.ListBuffer[(String, String)](outerClassName -> null)
+
+  // A map holding the current size in bytes of each class to be generated.
+  private val classSize: mutable.Map[String, Int] =
+mutable.Map[String, Int](outerClassName -> 0)
+
+  // Nested maps holding function names and their code belonging to each 
class.
+  private val classFunctions: mutable.Map[String, mutable.Map[String, 
String]] =
+mutable.Map(outerClassName -> mutable.Map.empty[String, String])
+
+  // Returns the size of the most recently added class.
+  private def currClassSize(): Int = classSize(classes.head._1)
+
+  // Returns the class name and instance name for the most recently added 
class.
+  private def currClass(): (String, String) = classes.head
+
+  // Adds a new class. Requires the class' name, and its instance name.
+  private def addClass(className: String, classInstance: String): Unit = {
+classes.prepend(className -> classInstance)
+classSize += className -> 0
+classFunctions += className -> mutable.Map.empty[String, String]
+  }
+
+  /**
+   * Adds a function to the generated class. If the code for the 
`OuterClass` grows too large, the
+   * function will be inlined into a new private, nested class, and a 
class-qualified name for the
+   * function will be returned. Otherwise, the function will be inined to 
the `OuterClass` the
+   * simple `funcName` will be returned.
+   *
+   * @param funcName the class-unqualified name of the function
+   * @param funcCode the body of the function
+   * @param inlineToOuterClass whether the given code must be inlined to 
the `OuterClass`. This
+   *   can be necessary when a function is 
declared outside of the context
+   *   it is eventually referenced and a returned 
qualified function name
+   *   cannot otherwise be accessed.
+   * @return the name of the function, qualified by class if it will be 
inlined to a private,
--- End diff --

ditto.





[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

2017-06-15 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/18025#discussion_r122356625
  
--- Diff: R/pkg/R/generics.R ---
@@ -1403,20 +1416,25 @@ setGeneric("unix_timestamp", function(x, format) { 
standardGeneric("unix_timesta
 #' @export
 setGeneric("upper", function(x) { standardGeneric("upper") })
 
-#' @rdname var
+#' @rdname column_aggregate_functions
+#' @param y,na.rm,use currently not used.
--- End diff --

Good point. Moved to `column_aggregate_functions`.





[GitHub] spark pull request #18075: [SPARK-18016][SQL][CATALYST] Code Generation: Con...

2017-06-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18075#discussion_r122356214
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -233,10 +222,124 @@ class CodegenContext {
   // The collection of sub-expression result resetting methods that need 
to be called on each row.
   val subexprFunctions = mutable.ArrayBuffer.empty[String]
 
-  def declareAddedFunctions(): String = {
-addedFunctions.map { case (funcName, funcCode) => funcCode 
}.mkString("\n")
+  /**
+   * Holds the class and instance names to be generated. `OuterClass` is a 
placeholder standing for
+   * whichever class is generated as the outermost class and which will 
contain any nested
+   * sub-classes. All other classes and instance names in this list will 
represent private, nested
+   * sub-classes.
+   */
+  private val classes: mutable.ListBuffer[(String, String)] =
+mutable.ListBuffer[(String, String)]("OuterClass" -> null)
+
+  // A map holding the current size in bytes of each class to be generated.
+  private val classSize: mutable.Map[String, Int] =
+mutable.Map[String, Int]("OuterClass" -> 0)
+
+  // Nested maps holding function names and their code belonging to each 
class.
+  private val classFunctions: mutable.Map[String, mutable.Map[String, 
String]] =
+mutable.Map("OuterClass" -> mutable.Map.empty[String, String])
+
+  // Returns the size of the most recently added class.
+  private def currClassSize(): Int = classSize(classes.head._1)
+
+  // Returns the class name and instance name for the most recently added 
class.
+  private def currClass(): (String, String) = classes.head
+
+  // Adds a new class. Requires the class' name, and its instance name.
+  private def addClass(className: String, classInstance: String): Unit = {
+classes.prepend(className -> classInstance)
+classSize += className -> 0
+classFunctions += className -> mutable.Map.empty[String, String]
   }
 
+  /**
+   * Adds a function to the generated class. If the code for the 
`OuterClass` grows too large, the
+   * function will be inlined into a new private, nested class, and a 
class-qualified name for the
+   * function will be returned. Otherwise, the function will be inined to 
the `OuterClass` the
+   * simple `funcName` will be returned.
+   *
+   * @param funcName the class-unqualified name of the function
+   * @param funcCode the body of the function
+   * @param inlineToOuterClass whether the given code must be inlined to 
the `OuterClass`. This
--- End diff --

It seems to me that, since the `stopEarly` in `Limit` is going to override the `stopEarly` in `BufferedRowIterator`, we can only put it in the outer class.
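A toy illustration of the constraint (hypothetical names, not Spark's generated code): an override only takes effect when the method is defined in the subclass itself, so a `stopEarly` generated into a nested helper class would never be called by the base iterator's loop.

```scala
abstract class BufferedRowIteratorLike {
  protected def stopEarly(): Boolean = false
  def processNext(): Unit = { if (!stopEarly()) { /* produce rows */ } }
}

class GeneratedOuter(limit: Int) extends BufferedRowIteratorLike {
  private var count = 0
  // Defined in the subclass, so dynamic dispatch picks it up.
  override protected def stopEarly(): Boolean = count >= limit

  private class NestedHelper {
    // A stopEarly defined here would be an unrelated method; processNext()
    // would never invoke it.
  }
}
```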





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18320
  
**[Test build #78148 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78148/testReport)**
 for PR 18320 at commit 
[`52c8abf`](https://github.com/apache/spark/commit/52c8abf9551e126f75ef0aa0a042f1ebd13e8d47).





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18320
  
Yes, I believe you are correct: the daemon is already running, but as far as I know this avoids using the problematic daemon - 
https://github.com/apache/spark/blob/478fbc866fbfdb4439788583281863ecea14e8af/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L363-L392
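Roughly, the linked branch behaves like this (a paraphrased sketch, not the verbatim source):

```scala
// When spark.sparkr.use.daemon is false, RRunner launches a standalone
// worker.R process per task instead of asking the long-lived daemon.R
// process to mcfork a child, which sidesteps the leaking pipes.
val useDaemon = SparkEnv.get.conf.getBoolean("spark.sparkr.use.daemon", true)
if (useDaemon) {
  // connect to the shared daemon, which forks a worker per task
} else {
  // spawn a fresh R process running worker.R for this task
}
```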








[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/18320
  
Hmm, I'm not sure - I'm pretty sure the session / Spark context is already initialized when this test runs; does changing the setting here affect the daemon process that is already running?







[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18320
  
**[Test build #78147 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78147/testReport)**
 for PR 18320 at commit 
[`505d75f`](https://github.com/apache/spark/commit/505d75f0e9a90481f96d0f1fefd4f9baaa38ee7d).





[GitHub] spark pull request #17702: [SPARK-20408][SQL] Get the glob path in parallel ...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17702#discussion_r122354359
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -389,6 +389,23 @@ case class DataSource(
   }
 
   /**
+   * Return all paths represented by the wildcard string.
+   */
+  private def getGlobbedPaths(qualified: Path): Seq[Path] = {
--- End diff --

at least we should follow `InMemoryFileIndex.bulkListLeafFiles`, which "picks the listing strategy adaptively depending on the number of paths to list"
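Something along these lines, as a sketch (hypothetical helper name and threshold; the real threshold would mirror the parallelism threshold used by `InMemoryFileIndex.bulkListLeafFiles`):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Glob sequentially when there are few paths, and in parallel once the path
// count crosses a threshold, to avoid paying parallelism overhead for the
// common single-path case.
def globPathsAdaptively(fs: FileSystem, paths: Seq[Path], threshold: Int = 32): Seq[Path] = {
  def glob(p: Path): Seq[Path] =
    Option(fs.globStatus(p)).toSeq.flatten.map(_.getPath)

  if (paths.length <= threshold) paths.flatMap(glob)
  else paths.par.flatMap(glob).seq
}
```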





[GitHub] spark issue #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in gapply/ga...

2017-06-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18320
  
cc @felixcheung, @shivaram and @MLnick.





[GitHub] spark pull request #18320: [SPARK-21093][R] Avoid mcfork in R's daemon in ga...

2017-06-15 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/18320

[SPARK-21093][R] Avoid mcfork in R's daemon in gapply/gapplyCollect tests

## What changes were proposed in this pull request?

`mcfork` in R appears to open a pipe ahead of time, but the existing logic does not properly close it when forks happen in quick succession. This eventually makes further forking fail because the limit on the number of open files is reached.

This hot fork path is hit particularly by `gapply`/`gapplyCollect`. For an unknown reason, this happens more easily on CentOS, but it could be reproduced on Mac too.

All the details are described in 
https://issues.apache.org/jira/browse/SPARK-21093

This PR proposes simply not reusing that daemon in these tests; instead, each worker process is launched from the JVM, and those all appear to terminate correctly.

## How was this patch tested?

I ran the code below on both CentOS and Mac.

```r
df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
collect(gapply(df, "a", function(key, x) { x }, schema(df)))
collect(gapply(df, "a", function(key, x) { x }, schema(df)))
...  # 30 times
```

Also, now it passes R tests on CentOS as below:

```
SparkSQL functions: Spark package found in SPARK_HOME: .../spark

..

..

..

..

..


```


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-21093

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18320.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18320


commit 505d75f0e9a90481f96d0f1fefd4f9baaa38ee7d
Author: hyukjinkwon 
Date:   2017-06-16T02:37:53Z

Avoid mcfork in R's daemon in gapply tests







[GitHub] spark issue #17702: [SPARK-20408][SQL] Get the glob path in parallel to redu...

2017-06-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17702
  
Merged build finished. Test PASSed.




