[GitHub] spark issue #17732: Branch 2.0

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17732
  
Can one of the admins verify this patch?





[GitHub] spark pull request #17732: Branch 2.0

2017-04-22 Thread tangchun
GitHub user tangchun opened a pull request:

https://github.com/apache/spark/pull/17732

Branch 2.0

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17732.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17732


commit 0cdd7370a61618d042417ee387a3c32ee5c924e6
Author: Bjarne Fruergaard 
Date:   2016-09-29T22:39:57Z

[SPARK-17721][MLLIB][ML] Fix for multiplying transposed SparseMatrix with 
SparseVector

## What changes were proposed in this pull request?

* changes the implementation of gemv with transposed SparseMatrix and 
SparseVector both in mllib-local and mllib (identical)
* adds a test that was failing before this change, but succeeds with these 
changes.

The problem in the previous implementation was that `i`, which enumerates the 
column indices of a row in the SparseMatrix, was only incremented when the row 
index of the vector matched the column index of the SparseMatrix. When a 
particular row of the SparseMatrix has non-zero values at column indices lower 
than the corresponding non-zero row indices of the SparseVector, the non-zero 
values of the SparseVector are exhausted without ever matching the column index 
at position `i`, and the remaining column indices `i+1, ..., indEnd-1` are never 
visited. The test cases in this PR illustrate this issue.
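
To make the failure mode concrete, here is a minimal Python sketch of the 
corrected two-pointer walk over the sorted index arrays (plain lists stand in 
for the mllib CSC arrays; all names are illustrative, not Spark's):

```python
def sparse_dot(row_idx, row_val, vec_idx, vec_val):
    # Dot product of one sparse matrix row with a sparse vector.
    # Both index arrays are sorted ascending; the fix advances whichever
    # pointer holds the smaller index, so no matching pair is skipped.
    i, j, acc = 0, 0, 0.0
    while i < len(row_idx) and j < len(vec_idx):
        if row_idx[i] == vec_idx[j]:
            acc += row_val[i] * vec_val[j]
            i += 1
            j += 1
        elif row_idx[i] < vec_idx[j]:
            i += 1  # the buggy version failed to advance here in some cases
        else:
            j += 1
    return acc

# Row has non-zeros at columns {0, 2}; vector at rows {1, 2}.
# Only column 2 matches: 2.0 * 3.0 = 6.0.
print(sparse_dot([0, 2], [1.0, 2.0], [1, 2], [5.0, 3.0]))  # 6.0
```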

## How was this patch tested?

I have run the specific `gemv` tests in both mllib-local and mllib. I am 
currently still running `./dev/run-tests`.

## ___
As per instructions, I hereby state that this is my original work and that 
I license the work to the project (Apache Spark) under the project's open 
source license.

Mentioning dbtsai, viirya, and brkyvz, who I can see have worked on or authored 
these parts before.

Author: Bjarne Fruergaard 

Closes #15296 from bwahlgreen/bugfix-spark-17721.

(cherry picked from commit 29396e7d1483d027960b9a1bed47008775c4253e)
Signed-off-by: Joseph K. Bradley 

commit a99ea4c9e0e2f91e4b524987788f0acee88e564d
Author: Bryan Cutler 
Date:   2016-09-29T23:31:30Z

Updated the following PR with minor changes to allow cherry-pick to 
branch-2.0

[SPARK-17697][ML] Fixed bug in summary calculations that pattern match 
against label without casting

When calling LogisticRegression.evaluate or GeneralizedLinearRegression.evaluate 
with a Dataset whose label column is not of double type, the summary calculations 
pattern match against a double and throw a MatchError. This fix casts the label 
column to DoubleType to ensure there is no MatchError.

Added unit tests to call evaluate with a dataset that has Label as other 
numeric types.
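
As a hedged illustration of the same idea from the Python API (a workaround 
sketch, not the patch itself): casting the label column to double before 
calling evaluate avoids the type mismatch:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Label stored as int; summary code that pattern matches on Double would
# hit a MatchError before this fix.
df = spark.createDataFrame([(0, 1.0), (1, 3.0)], ["label", "feature"])
df_cast = df.withColumn("label", col("label").cast("double"))
```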

Author: Bryan Cutler 

Closes #15288 from BryanCutler/binaryLOR-numericCheck-SPARK-17697.

(cherry picked from commit 2f739567080d804a942cfcca0e22f91ab7cbea36)
Signed-off-by: Joseph K. Bradley 

commit 744aac8e6ff04d7a3f1e8ccad335605ac8fe2f29
Author: Dongjoon Hyun 
Date:   2016-10-01T05:05:59Z

[MINOR][DOC] Add an up-to-date description for default serialization during 
shuffling

## What changes were proposed in this pull request?

This PR aims to bring the doc up to date. The documentation is generally 
correct, but after https://issues.apache.org/jira/browse/SPARK-13926, Spark 
chooses Kryo as the default serialization library when shuffling RDDs of 
simple types, arrays of simple types, or string type.
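
For readers who want to pin the serializer explicitly rather than rely on the 
automatic choice, a minimal pyspark sketch (configuration key as documented; 
app name illustrative):

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("kryo-example")
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .getOrCreate())
```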

## How was this patch tested?

This is a documentation update.

Author: Dongjoon Hyun 

Closes #15315 from dongjoon-hyun/SPARK-DOC-SERIALIZER.

(cherry picked from commit 15e9bbb49e00b3982c428d39776725d0dea2cdfa)
Signed-off-by: Reynold Xin 

commit b57e2acb134d94dafc81686da875c5dd3ea35c74
Author: Jagadeesan 
Date:   2016-10-03T09:46:38Z

[SPARK-17736][DOCUMENTATION][SPARKR] Update R README for rmarkdown,…

## What changes were proposed in this pull request?

To build R docs (which are built when R tests are run), users need 

[GitHub] spark issue #17730: [SPARK-20439] [SQL] Fix Catalog API listTables and getTa...

2017-04-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17730
  
cc @cloud-fan @sameeragarwal 





[GitHub] spark issue #17480: [SPARK-20079][Core][yarn] Re registration of AM hangs sp...

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17480
  
**[Test build #76076 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76076/testReport)** for PR 17480 at commit [`17a7757`](https://github.com/apache/spark/commit/17a7757c3ba76f083fa198519580a2146cb6c8af).





[GitHub] spark pull request #17480: [SPARK-20079][Core][yarn] Re registration of AM h...

2017-04-22 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/17480#discussion_r112825043
  
--- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -249,7 +249,6 @@ private[spark] class ExecutorAllocationManager(
* yarn-client mode when AM re-registers after a failure.
*/
   def reset(): Unit = synchronized {
-initializing = true
--- End diff --

@jerryshao  @vanzin
 I think that deleting the `initializing = true` is a good idea.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-22 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17556
  
Hi, I have checked R GBM's code and found that R's gbm uses the plain mean 
$(x + y) / 2$, not the weighted mean $(c_x * x + c_y * y) / (c_x + c_y)$ 
described in [JIRA SPARK-16957](https://issues.apache.org/jira/browse/SPARK-16957), 
for the split point.

1. code snippet:
[gbm-developers/gbm](https://github.com/gbm-developers/gbm)
commit a1defa382a629f8b97bf9f552dcd821ee7ac9dac
src/node_search.cpp:145:
```c++
  else if(cCurrentVarClasses == 0)   // variable is continuous
  {
// Evaluate the current split
dCurrentSplitValue = 0.5*(dLastXValue + dX);
  }
```

2. test
To verify this, I created a toy dataset and ran a test in R.
```R
> f = c(0.0, 0.0, 1.0, 1.0, 1.0, 1.0)
> l = c(0,   0,   1,   1,   1,   1)
> df = data.frame(l, f)
> sapply(df, class)
l f
"numeric" "numeric"
> mod <- gbm(l~f, data=df, n.trees=1, bag.fraction=1, n.minobsinnode=1, 
distribution = "bernoulli")
> pretty.gbm.tree(mod)
  SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight   Prediction
0        0      5.00e-01        1         2           3           1.33      6 1.480297e-19
1       -1     -3.00e-03       -1        -1          -1           0.00      2    -3.00e-03
2       -1      1.50e-03       -1        -1          -1           0.00      4     1.50e-03
3       -1  1.480297e-19       -1        -1          -1           0.00      6 1.480297e-19
```
As expected, the root's split point is 5.00e-01, i.e. the plain mean 
`0.5 = (0 + 1) / 2`, not the weighted mean `0.67 ≈ (0 * 2 + 1 * 4) / 6`.

3. conclusion
I prefer using the weighted mean for the split point in this PR, rather than the 
plain mean used by R's gbm package (see the sketch below). How about you? @sethah @srowen
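
For reference, a minimal Python sketch of the two candidate split-point 
formulas applied to the toy data above (function names are illustrative):

```python
def plain_midpoint(x, y):
    # What R's gbm does: 0.5 * (dLastXValue + dX)
    return 0.5 * (x + y)

def weighted_midpoint(x, y, c_x, c_y):
    # What SPARK-16957 proposes: sample counts weight each side
    return (c_x * x + c_y * y) / (c_x + c_y)

# Toy data: feature value 0.0 occurs 2 times, 1.0 occurs 4 times.
print(plain_midpoint(0.0, 1.0))           # 0.5, matching the gbm tree above
print(weighted_midpoint(0.0, 1.0, 2, 4))  # 0.666..., the weighted alternative
```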





[GitHub] spark pull request #17649: [SPARK-20380][SQL] Output table comment for DESC ...

2017-04-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17649#discussion_r112824647
  
--- Diff: sql/core/src/test/resources/sql-tests/results/describe-table-after-alter-table.sql.out ---
@@ -0,0 +1,162 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 12
+
+
+-- !query 0
+CREATE TABLE table_with_comment (a STRING, b INT, c STRING, d STRING) USING parquet COMMENT 'table_comment'
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+DESC formatted table_with_comment
+-- !query 1 schema
+struct<col_name:string,data_type:string,comment:string>
+-- !query 1 output
+# col_name data_type   comment 
+a  string  
+b  int 
+c  string  
+d  string  
+   
+# Detailed Table Information   

+Database   default 
+Table  table_with_comment  
+Created [not included in comparison]
+Last Access [not included in comparison]
+Type   MANAGED 
+Provider   parquet 
+Commenttable_comment   
Location [not included in comparison]sql/core/spark-warehouse/table_with_comment
+
+
+-- !query 2
+ALTER TABLE table_with_comment set tblproperties(comment = "modified comment")
+-- !query 2 schema
+struct<>
+-- !query 2 output
+
+
+
+-- !query 3
+DESC formatted table_with_comment
+-- !query 3 schema
+struct<col_name:string,data_type:string,comment:string>
+-- !query 3 output
+# col_name data_type   comment 
+a  string  
+b  int 
+c  string  
+d  string  
+   
+# Detailed Table Information   

+Database   default 
+Table  table_with_comment  
+Created [not included in comparison]
+Last Access [not included in comparison]
+Type   MANAGED 
+Provider   parquet 
+Commentmodified comment
+Properties [comment=modified comment]  

--- End diff --

We should remove `comment` from `Properties`.





[GitHub] spark issue #17649: [SPARK-20380][SQL] Output table comment for DESC FORMATT...

2017-04-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17649
  
@wzhfy Could you check the behavior of Hive?





[GitHub] spark pull request #17649: [SPARK-20380][SQL] Output table comment for DESC ...

2017-04-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17649#discussion_r112824524
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -267,8 +271,15 @@ case class AlterTableUnsetPropertiesCommand(
 }
   }
 }
+    // if the 'comment' key is present in the seq of keys to be unset, then reset the
+    // table-level comment to None.
+    val tableComment = if (propKeys.contains("comment")) {
+      None
+    } else {
+      table.properties.get("comment")
+    }
--- End diff --

Nit:
```Scala
val comment = if (propKeys.contains("comment")) None else table.properties.get("comment")
```





[GitHub] spark issue #17708: [SPARK-20413] Add new query hint NO_COLLAPSE.

2017-04-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17708
  
It sounds like we should not simply merge two Projects, so that the same UDF is 
not called multiple times (a sketch of the problem follows).
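
To make the concern concrete, a hedged pyspark sketch (illustrative only, not 
the proposed hint API): once the optimizer collapses the two Projects, every 
reference to `y` inlines the UDF call:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()
expensive = udf(lambda x: x * 2.0, DoubleType())  # stand-in for a costly UDF

df = spark.range(3).withColumn("y", expensive(col("id")))
# z1 and z2 both reference y; after CollapseProject the merged projection
# contains two copies of expensive(id), so the UDF runs twice per row.
out = df.select((col("y") + 1).alias("z1"), (col("y") * 2).alias("z2"))
out.explain()
```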





[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17469





[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...

2017-04-22 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17469
  
LGTM, thanks for your work on this @map222 & thanks for your work reviewing 
this @HyukjinKwon. 
Merged to master.





[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...

2017-04-22 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17688
  
LGTM, thanks @HyukjinKwon for noticing the lack of bool in the scala code.





[GitHub] spark pull request #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type...

2017-04-22 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17688#discussion_r112823550
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1238,7 +1238,7 @@ def fillna(self, value, subset=None):
 Value to replace null values with.
 If the value is a dict, then `subset` is ignored and `value` 
must be a mapping
 from column name (string) to replacement value. The 
replacement value must be
-an int, long, float, or string.
+an int, long, float, boolean, or string.
--- End diff --

That makes sense. I'd say the eventual improvement would be offering `fill` for 
bool, for symmetry with the rest of the types, rather than type-checking for 
bool on the input, but it's not necessary here.





[GitHub] spark pull request #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type...

2017-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17688#discussion_r112823211
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1238,7 +1238,7 @@ def fillna(self, value, subset=None):
 Value to replace null values with.
 If the value is a dict, then `subset` is ignored and `value` 
must be a mapping
 from column name (string) to replacement value. The 
replacement value must be
-an int, long, float, or string.
+an int, long, float, boolean, or string.
--- End diff --

I think this describes the replacement value in the `If the value is a dict` 
case, whereas the `value` param itself can't be a bool, as below:

```python
>>> from pyspark.sql import Row
>>> spark.createDataFrame([Row(a=None), Row(a=True)]).fillna({"a": True}).first()
Row(a=True)
>>> spark.createDataFrame([Row(a=None), Row(a=True)]).fillna(True).first()
Row(a=None)
```

I can't find a `def fill(value: Boolean)` in `functions.scala`; a bool will 
therefore call the `int` overload. So,

```python
>>> spark.createDataFrame([Row(a=None), Row(a=0)]).fillna(True).first()
Row(a=1)
>>> spark.createDataFrame([Row(a=None), Row(a=0)]).fillna(False).first()
Row(a=0)
```

So, the current status looks correct to me.

BTW, ideally, we should throw an exception in 

```python
if not isinstance(value, (float, int, long, basestring, dict)):
    raise ValueError("value should be a float, int, long, string, or dict")
```

However, in Python a boolean is an int - 
https://www.python.org/dev/peps/pep-0285/

>   6) Should bool inherit from int?
>
>=> Yes.


```python
>>> isinstance(True, int)
True
```
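
A hedged sketch (Python 3 names; the snippet above uses Python 2's 
`long`/`basestring`) of how the check could special-case bool, which must be 
tested before `int`; the function name is hypothetical:

```python
def check_fillna_value(value):
    # bool is a subclass of int (PEP 285), so reject it explicitly first.
    if isinstance(value, bool):
        raise ValueError("value cannot be a bool; fill is not defined for booleans")
    if not isinstance(value, (float, int, str, dict)):
        raise ValueError("value should be a float, int, string, or dict")
```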

However, this looks like just a documentation fix, and I guess there are many 
instances of it, so I think it is fine not to fix it here.





[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17731
  
so essentially the first `get` is still being evaluated when the 2nd `get` is 
hit from the delayed binding (really, the error is a way to prevent going into 
an infinite loop)

what if you have this instead to break the loop?
```
delayedAssign(".sparkRsession",
  { rm(".sparkRsession", envir = SparkR:::.sparkREnv); sparkR.session(..) },
  assign.env = SparkR:::.sparkREnv)
```






[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17731
  
so both `sparkSession` and `sparkRjsc` are valid even after the call to 
`get` has failed?





[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread vijoshi
Github user vijoshi commented on the issue:

https://github.com/apache/spark/pull/17731
  
"I understand these 2 cases, can you explain how your change connect to 
these two?"

Say, I do this:

```
delayedAssign(".sparkRsession", { sparkR.session(..) }, assign.env = SparkR:::.sparkREnv)
```

Now, when the user code such as this runs:

```
a <- createDataFrame(iris)
```

this sequence occurs: 

```
createDataFrame()
  -> getSparkSession()
    -> get(".sparkRsession", envir = .sparkREnv)
      -> delayed evaluation of sparkR.session(...)
        -> if (exists(".sparkRsession", envir = .sparkREnv))
             sparkSession <- get(".sparkRsession", envir = .sparkREnv)  # error occurs here
               -> Error "Promise already under evaluation"
```

The change is to ignore the "Promise under evaluation" error. At the line 
where the error occurs, there doesn't seem to be any other possible cause of 
failure, since the previous line of code has already checked that 
`.sparkRsession` exists in the environment. So if we take it that this happens 
only when `.sparkRsession` is bound lazily, and ignore it (which is what my 
change does), the code proceeds with the regular computation of sparkSession.

The case is similar with `.sparkRjsc`. The SparkR code inside 
`spark.sparkContext(..)` does this:

```
if (exists(".sparkRjsc", envir = .sparkREnv)) {
  sparkRjsc <- get(".sparkRjsc", envir = .sparkREnv) # "Promise under evaluation" error occurs here
}
```
When `.sparkRjsc` is lazily bound, the `exists(..)` condition succeeds and the 
"Promise under evaluation" error occurs. If the error is ignored, on the 
grounds that there can't be any other cause of failure, the lazy 
initialization works.








[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...

2017-04-22 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17469
  
@holdenk 





[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...

2017-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17469
  
LGTM, if committers are okay with merging a fix for only some of the 
documentation (not all), considering this is his very first contribution.





[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17731
  
I understand these 2 cases; can you explain how your change connects to them?
if you delay-bind to `".sparkRjsc", envir = .sparkREnv`, doesn't it just 
work?





[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17469
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76075/
Test PASSed.





[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17469
  
**[Test build #76075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76075/testReport)** for PR 17469 at commit [`b52765f`](https://github.com/apache/spark/commit/b52765f5ef156862bd3cc4793a0d3fbd4d334449).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17469
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread vijoshi
Github user vijoshi commented on the issue:

https://github.com/apache/spark/pull/17731
  
@felixcheung yes. We need to support these two possibilities:

```
#do not call sparkR.session() - followed by implicit reference to sparkSession
a <- createDataFrame(iris)
```

or

```
#do not call sparkR.session() - followed by implicit reference to sparkContext
doubled <- spark.lapply(1:10, function(x){2 * x})
```

Internal implementations of APIs like `spark.lapply` look directly for the 
sparkContext, so to account for these, the sparkContext needs to be friendly 
to lazy initialization.





[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17731
  
also, what if a user wants to explicitly create a spark session with 
specific parameters? the delayed-binding model doesn't seem to support that 
properly?





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17728#discussion_r112822291
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -308,6 +308,21 @@ numCyl <- summarize(groupBy(carsDF, carsDF$cyl), count = n(carsDF$cyl))
 head(numCyl)
 ```
 
+`groupBy` can be replaced with `cube` or `rollup` to compute subtotals 
across multiple dimensions.
--- End diff --

do you think the programming guide can use updates too?





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17728#discussion_r112821786
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3642,3 +3642,58 @@ setMethod("checkpoint",
 df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
 dataFrame(df)
   })
+
+
+#' cube
+#'
+#' Create a multi-dimensional cube for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @return A GroupedData.
+#' @family SparkDataFrame functions
+#' @aliases cube,SparkDataFrame-method
+#' @rdname cube
+#' @name cube
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' mean(cube(df, "cyl", "gear", "am"), "mpg")
+#' }
+#' @note cube since 2.3.0
+setMethod("cube",
+  signature(x = "SparkDataFrame"),
+  function(x, ...) {
+cols <- list(...)
+jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
+sgd <- callJMethod(x@sdf, "cube", jcol)
+groupedData(sgd)
+  })
+
+#' rollup
+#'
+#' Create a multi-dimensional rollup for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @return A GroupedData.
+#' @family SparkDataFrame functions
+#' @aliases rollup,SparkDataFrame-method
+#' @rdname rollup
+#' @name rollup
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' mean(rollup(df, "cyl", "gear", "am"), "mpg")
+#' }
+#' @note rollup since 2.3.0
+setMethod("rollup",
+  signature(x = "SparkDataFrame"),
+  function(x, ...) {
+cols <- list(...)
+jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
+sgd <- callJMethod(x@sdf, "rollup", jcol)
+groupedData(sgd)
+  })
--- End diff --

please add extra newline at end of file





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17728#discussion_r112821792
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3642,3 +3642,58 @@ setMethod("checkpoint",
 df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
 dataFrame(df)
   })
+
+
+#' cube
+#'
+#' Create a multi-dimensional cube for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @return A GroupedData.
+#' @family SparkDataFrame functions
+#' @aliases cube,SparkDataFrame-method
+#' @rdname cube
+#' @name cube
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' mean(cube(df, "cyl", "gear", "am"), "mpg")
+#' }
+#' @note cube since 2.3.0
+setMethod("cube",
+  signature(x = "SparkDataFrame"),
+  function(x, ...) {
+cols <- list(...)
+jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
+sgd <- callJMethod(x@sdf, "cube", jcol)
+groupedData(sgd)
+  })
+
+#' rollup
+#'
+#' Create a multi-dimensional rollup for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
--- End diff --

`names(s)` -> `name(s)`





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17728#discussion_r112822250
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3642,3 +3642,58 @@ setMethod("checkpoint",
 df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
 dataFrame(df)
   })
+
+
+#' cube
+#'
+#' Create a multi-dimensional cube for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @return A GroupedData.
+#' @family SparkDataFrame functions
+#' @aliases cube,SparkDataFrame-method
+#' @rdname cube
+#' @name cube
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' mean(cube(df, "cyl", "gear", "am"), "mpg")
+#' }
+#' @note cube since 2.3.0
+setMethod("cube",
+  signature(x = "SparkDataFrame"),
+  function(x, ...) {
+cols <- list(...)
+jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
+sgd <- callJMethod(x@sdf, "cube", jcol)
+groupedData(sgd)
+  })
+
+#' rollup
+#'
+#' Create a multi-dimensional rollup for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @return A GroupedData.
+#' @family SparkDataFrame functions
+#' @aliases rollup,SparkDataFrame-method
+#' @rdname rollup
+#' @name rollup
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' mean(rollup(df, "cyl", "gear", "am"), "mpg")
+#' }
+#' @note rollup since 2.3.0
+setMethod("rollup",
+  signature(x = "SparkDataFrame"),
+  function(x, ...) {
+cols <- list(...)
--- End diff --

check length of cols





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17728#discussion_r112822273
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3642,3 +3642,58 @@ setMethod("checkpoint",
 df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
 dataFrame(df)
   })
+
+
+#' cube
+#'
+#' Create a multi-dimensional cube for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @return A GroupedData.
+#' @family SparkDataFrame functions
+#' @aliases cube,SparkDataFrame-method
+#' @rdname cube
+#' @name cube
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' mean(cube(df, "cyl", "gear", "am"), "mpg")
+#' }
+#' @note cube since 2.3.0
+setMethod("cube",
+  signature(x = "SparkDataFrame"),
+  function(x, ...) {
+cols <- list(...)
+jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
+sgd <- callJMethod(x@sdf, "cube", jcol)
+groupedData(sgd)
+  })
+
+#' rollup
+#'
+#' Create a multi-dimensional rollup for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @return A GroupedData.
+#' @family SparkDataFrame functions
+#' @aliases rollup,SparkDataFrame-method
+#' @rdname rollup
+#' @name rollup
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' mean(rollup(df, "cyl", "gear", "am"), "mpg")
+#' }
+#' @note rollup since 2.3.0
+setMethod("rollup",
+  signature(x = "SparkDataFrame"),
+  function(x, ...) {
+cols <- list(...)
+jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
--- End diff --

ditto `if (class(x) == "Column") x@jc else column(x)@jc`





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17728#discussion_r112822277
  
--- Diff: R/pkg/R/generics.R ---
@@ -631,6 +635,11 @@ setGeneric("sample",
  standardGeneric("sample")
})
 
+#' @rdname rollup
+#' @export
+setGeneric("rollup",
+   function(x, ...) { standardGeneric("rollup") })
--- End diff --

could you keep this in one line please





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17728#discussion_r112821835
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3642,3 +3642,58 @@ setMethod("checkpoint",
 df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
 dataFrame(df)
   })
+
+
+#' cube
+#'
+#' Create a multi-dimensional cube for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
--- End diff --

ditto below





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17728#discussion_r112821831
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3642,3 +3642,58 @@ setMethod("checkpoint",
 df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
 dataFrame(df)
   })
+
+
+#' cube
+#'
+#' Create a multi-dimensional cube for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @return A GroupedData.
+#' @family SparkDataFrame functions
+#' @aliases cube,SparkDataFrame-method
+#' @rdname cube
+#' @name cube
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' mean(cube(df, "cyl", "gear", "am"), "mpg")
+#' }
+#' @note cube since 2.3.0
+setMethod("cube",
+  signature(x = "SparkDataFrame"),
+  function(x, ...) {
+cols <- list(...)
+jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
+sgd <- callJMethod(x@sdf, "cube", jcol)
+groupedData(sgd)
+  })
+
+#' rollup
+#'
+#' Create a multi-dimensional rollup for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
--- End diff --

perhaps `variable(s)` is misleading and just `character name(s) or 
Column(s) to group on.` is sufficient?





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17728#discussion_r112822261
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3642,3 +3642,58 @@ setMethod("checkpoint",
 df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
 dataFrame(df)
   })
+
+
+#' cube
+#'
+#' Create a multi-dimensional cube for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @return A GroupedData.
+#' @family SparkDataFrame functions
+#' @aliases cube,SparkDataFrame-method
+#' @rdname cube
+#' @name cube
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' mean(cube(df, "cyl", "gear", "am"), "mpg")
+#' }
+#' @note cube since 2.3.0
+setMethod("cube",
+  signature(x = "SparkDataFrame"),
+  function(x, ...) {
+cols <- list(...)
+jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc)
--- End diff --

nit: I'd flip this, since Column is the stronger type, and this way there is a 
nicer error message:
instead of `if (is.character(x)) column(x)@jc else x@jc`,
do `if (class(x) == "Column") x@jc else column(x)@jc`





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17728#discussion_r112822286
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -308,6 +308,21 @@ numCyl <- summarize(groupBy(carsDF, carsDF$cyl), count = n(carsDF$cyl))
 head(numCyl)
 ```
 
+`groupBy` can be replaced with `cube` or `rollup` to compute subtotals 
across multiple dimensions.
--- End diff --

minor: I wouldn't say "replaced", because they are not functionally the same?
how about `use cube or rollup to compute subtotals across multiple 
dimensions.` (a short sketch follows below)
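
For context, a hedged pyspark sketch of what `cube` adds over `groupBy` (toy 
data; column names follow the mtcars example in the vignette):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(6, 4, 21.0), (6, 4, 19.7), (8, 3, 15.2)], ["cyl", "gear", "mpg"])

# groupBy yields one row per (cyl, gear) pair; cube additionally emits
# subtotal rows where cyl and/or gear are NULL (aggregated away).
df.cube("cyl", "gear").agg(avg("mpg")).show()
```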





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17728#discussion_r112822246
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3642,3 +3642,58 @@ setMethod("checkpoint",
 df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
 dataFrame(df)
   })
+
+
+#' cube
+#'
+#' Create a multi-dimensional cube for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @return A GroupedData.
+#' @family SparkDataFrame functions
+#' @aliases cube,SparkDataFrame-method
+#' @rdname cube
+#' @name cube
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' mean(cube(df, "cyl", "gear", "am"), "mpg")
+#' }
+#' @note cube since 2.3.0
+setMethod("cube",
+  signature(x = "SparkDataFrame"),
+  function(x, ...) {
+cols <- list(...)
--- End diff --

check length of cols is > 0?





[GitHub] spark issue #17730: [SPARK-20439] [SQL] Fix Catalog API listTables and getTa...

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17730
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17730: [SPARK-20439] [SQL] Fix Catalog API listTables and getTa...

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17730
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76072/
Test PASSed.





[GitHub] spark issue #17730: [SPARK-20439] [SQL] Fix Catalog API listTables and getTa...

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17730
  
**[Test build #76072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76072/testReport)** for PR 17730 at commit [`1a5e24d`](https://github.com/apache/spark/commit/1a5e24dc5d6d538e975200b4eb95583db36d5f9f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...

2017-04-22 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17688
  
good catch - instead of duplicating it, perhaps just say `supported data 
types` or `supported data types above`





[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17469
  
**[Test build #76075 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76075/testReport)** for PR 17469 at commit [`b52765f`](https://github.com/apache/spark/commit/b52765f5ef156862bd3cc4793a0d3fbd4d334449).





[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...

2017-04-22 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17469
  
I don't know why Jenkins doesn't pick up the changes automatically...





[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...

2017-04-22 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17469
  
Jenkins, retest this please





[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17729#discussion_r112822059
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -1546,6 +1546,40 @@ test_that("string operators", {
  expect_equal(collect(select(df3, substring_index(df3$a, ".", 2)))[1, 1], "a.b")
  expect_equal(collect(select(df3, substring_index(df3$a, ".", -3)))[1, 1], "b.c.d")
  expect_equal(collect(select(df3, translate(df3$a, "bc", "12")))[1, 1], "a.1.2.d")
+
+  l4 <- list(list(a = "a.b@c.d   1\\b"))
+  df4 <- createDataFrame(l4)
+  expect_equal(
+collect(select(df4, split_string(df4$a, "\\s+")))[1, 1],
+list(list("a.b@c.d", "1\\b"))
+  )
+  expect_equal(
+collect(select(df4, split_string(df4$a, "\\.")))[1, 1],
+list(list("a", "b@c", "d   1\\b"))
+  )
+  expect_equal(
+collect(select(df4, split_string(df4$a, "@")))[1, 1],
+list(list("a.b", "c.d   1\\b"))
+  )
+  expect_equal(
+collect(select(df4, split_string(df4$a, "")))[1, 1],
+list(list("a.b@c.d   1", "b"))
+  )
+
+  l5 <- list(list(a = "abc"))
+  df5 <- createDataFrame(l5)
+  expect_equal(
+collect(select(df5, repeat_string(df5$a, 1L)))[1, 1],
+"abc"
+  )
+  expect_equal(
+collect(select(df5, repeat_string(df5$a, 3)))[1, 1],
+"abcabcabc"
+  )
+  expect_equal(
+collect(select(df5, repeat_string(df5$a, -1)))[1, 1],
--- End diff --

:) ahh, `-1` works?!
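
For context, a minimal sketch of what the `-1` case exercises (an illustration assuming a running SparkR session, not code from this PR; Spark SQL's `repeat` follows Hive-style semantics, where a non-positive count presumably yields the empty string):

```r
df5 <- createDataFrame(list(list(a = "abc")))
# repeat_string delegates to Spark SQL's repeat; a count of -1 is treated
# as zero repetitions, so this should collect to the empty string "".
collect(select(df5, repeat_string(df5$a, -1)))[1, 1]
```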





[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17729#discussion_r112821860
  
--- Diff: R/pkg/NAMESPACE ---
@@ -300,6 +300,7 @@ exportMethods("%in%",
   "rank",
   "regexp_extract",
   "regexp_replace",
+  "repeat_string",
--- End diff --

good call on these names!





[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17729#discussion_r112822015
  
--- Diff: R/pkg/R/functions.R ---
@@ -3745,3 +3745,55 @@ setMethod("collect_set",
             jc <- callJStatic("org.apache.spark.sql.functions", "collect_set", x@jc)
             column(jc)
           })
+
+#' split_string
+#'
+#' Splits string on regular expression.
+#'
+#' @param x Column to compute on
+#' @param pattern Java regular expression
+#'
+#' @rdname split_string
+#' @family string_funcs
+#' @aliases split_string,Column-method
+#' @export
+#' @examples \dontrun{
+#' df <- read.text("README.md")
+#'
+#' head(select(df, split_string(df$value, "\\s+")))
+#' }
+#' @note split_string 2.3.0
+#' @note equivalent to \code{split} SQL function
+setMethod("split_string",
+          signature(x = "Column", pattern = "character"),
+          function(x, pattern) {
+            jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern)
+            column(jc)
+          })
+
+#' repeat_string
+#'
+#' Repeats string n times.
+#'
+#' @param x Column to compute on
+#' @param n Number of repetitions
+#'
+#' @rdname repeat_string
+#' @family string_funcs
+#' @aliases repeat_string,Column-method
+#' @export
+#' @examples \dontrun{
+#' df <- createDataFrame(data.frame(
+#'   text = c("foo", "bar")
+#' ))
+#'
+#' head(select(df, repeat_string(df$text, 3)))
+#' }
+#' @note repeat_string 2.3.0
+#' @note equivalent to \code{repeat} SQL function
--- End diff --

ditto above





[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17729#discussion_r112822000
  
--- Diff: R/pkg/R/functions.R ---
@@ -3745,3 +3745,55 @@ setMethod("collect_set",
             jc <- callJStatic("org.apache.spark.sql.functions", "collect_set", x@jc)
             column(jc)
           })
+
+#' split_string
+#'
+#' Splits string on regular expression.
+#'
+#' @param x Column to compute on
+#' @param pattern Java regular expression
+#'
+#' @rdname split_string
+#' @family string_funcs
+#' @aliases split_string,Column-method
+#' @export
+#' @examples \dontrun{
+#' df <- read.text("README.md")
+#'
+#' head(select(df, split_string(df$value, "\\s+")))
+#' }
+#' @note split_string 2.3.0
+#' @note equivalent to \code{split} SQL function
--- End diff --

The note is somewhat hard to discover on the generated doc page. If you want this, you could put it as the 2nd content paragraph, like below, and it will show up as the details section, as here: http://spark.apache.org/docs/latest/api/R/read.jdbc.html

```
#' split_string
#'
#' Splits string on regular expression.
#'
#' This is equivalent to \code{split} SQL function
```

(yes, through the magic of roxygen2)

Also, instead of `\code{split}` you might want to link to the Spark Scala doc too.





[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17729#discussion_r112822029
  
--- Diff: R/pkg/R/functions.R ---
@@ -3745,3 +3745,55 @@ setMethod("collect_set",
             jc <- callJStatic("org.apache.spark.sql.functions", "collect_set", x@jc)
             column(jc)
           })
+
+#' split_string
+#'
+#' Splits string on regular expression.
+#'
+#' @param x Column to compute on
+#' @param pattern Java regular expression
+#'
+#' @rdname split_string
+#' @family string_funcs
+#' @aliases split_string,Column-method
+#' @export
+#' @examples \dontrun{
+#' df <- read.text("README.md")
+#'
+#' head(select(df, split_string(df$value, "\\s+")))
+#' }
+#' @note split_string 2.3.0
+#' @note equivalent to \code{split} SQL function
+setMethod("split_string",
+          signature(x = "Column", pattern = "character"),
+          function(x, pattern) {
+            jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern)
+            column(jc)
+          })
+
+#' repeat_string
+#'
+#' Repeats string n times.
+#'
+#' @param x Column to compute on
+#' @param n Number of repetitions
+#'
+#' @rdname repeat_string
+#' @family string_funcs
+#' @aliases repeat_string,Column-method
+#' @export
+#' @examples \dontrun{
+#' df <- createDataFrame(data.frame(
+#'   text = c("foo", "bar")
+#' ))
--- End diff --

I'm ok with this, though would it be better with the read.text example than a fake 1-row example like this?
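
Concretely, the suggested alternative would look something like this (a sketch reusing the `read.text` example from the `split_string` docs; `README.md` stands in for any text file):

```r
# Read each line of README.md into a string column named "value",
# then repeat every line three times.
df <- read.text("README.md")
head(select(df, repeat_string(df$value, 3)))
```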





[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17729#discussion_r112822065
  
--- Diff: R/pkg/R/functions.R ---
@@ -3745,3 +3745,55 @@ setMethod("collect_set",
             jc <- callJStatic("org.apache.spark.sql.functions", "collect_set", x@jc)
             column(jc)
           })
+
+#' split_string
+#'
+#' Splits string on regular expression.
+#'
+#' @param x Column to compute on
+#' @param pattern Java regular expression
+#'
+#' @rdname split_string
+#' @family string_funcs
+#' @aliases split_string,Column-method
+#' @export
+#' @examples \dontrun{
+#' df <- read.text("README.md")
+#'
+#' head(select(df, split_string(df$value, "\\s+")))
+#' }
+#' @note split_string 2.3.0
+#' @note equivalent to \code{split} SQL function
+setMethod("split_string",
+          signature(x = "Column", pattern = "character"),
+          function(x, pattern) {
+            jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern)
+            column(jc)
+          })
+
+#' repeat_string
+#'
+#' Repeats string n times.
+#'
+#' @param x Column to compute on
+#' @param n Number of repetitions
+#'
+#' @rdname repeat_string
+#' @family string_funcs
+#' @aliases repeat_string,Column-method
+#' @export
+#' @examples \dontrun{
+#' df <- createDataFrame(data.frame(
+#'   text = c("foo", "bar")
+#' ))
+#'
+#' head(select(df, repeat_string(df$text, 3)))
+#' }
+#' @note repeat_string 2.3.0
+#' @note equivalent to \code{repeat} SQL function
+setMethod("repeat_string",
+          signature(x = "Column", n = "numeric"),
+          function(x, n) {
+            jc <- callJStatic("org.apache.spark.sql.functions", "repeat", x@jc, as.integer(n))
--- End diff --

This is good, actually. May I introduce you to `numToInt`, an internal util?
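
For readers following along, a rough sketch of what an internal helper like `numToInt` plausibly looks like (an assumption for illustration; the actual SparkR util may differ in details):

```r
# Coerce a numeric scalar to integer, warning when the value is not
# integral, rather than silently truncating via as.integer() inline.
numToInt <- function(num) {
  if (as.integer(num) != num) {
    warning("coercing a non-integral numeric to integer; the fraction is dropped")
  }
  as.integer(num)
}
```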





[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17731
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17731
  
**[Test build #76074 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76074/testReport)** for PR 17731 at commit [`c06da49`](https://github.com/apache/spark/commit/c06da49214f3591602cdc3220ac606a6adb24ac8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17731
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76074/
Test PASSed.





[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...

2017-04-22 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17467
  
@brkyvz are you ok with this PR at a high level? If yes, I could help with review and shepherd this.





[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17731
  
this **might** be reasonable, but `sparkR.sparkContext` is only called when `sparkR.session()` is called, so I'm not sure I follow what happens if someone does this in a brand-new R session:
```
# do not call sparkR.session()
a <- createDataFrame(iris)
```

...which is what I understand from the email exchange on user@. Could you elaborate on whether this is what you are trying to support?
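
For reference, the kind of delayed binding under discussion could be set up along these lines (a hypothetical sketch of the hosting-environment scenario, not code from this PR):

```r
# Pre-install a promise so the Spark session is created lazily the first
# time SparkR touches .sparkRsession; with such a binding in place, the
# createDataFrame(iris) call above forces the promise instead of finding
# a missing session.
delayedAssign(".sparkRsession",
              sparkR.session(),
              assign.env = SparkR:::.sparkREnv)
```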





[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17731
  
**[Test build #76074 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76074/testReport)** for PR 17731 at commit [`c06da49`](https://github.com/apache/spark/commit/c06da49214f3591602cdc3220ac606a6adb24ac8).





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-04-22 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r112821540
  
--- Diff: docs/graphx-programming-guide.md ---
@@ -708,9 +708,8 @@ messages remaining.
 > messaging function.  These constraints allow additional optimization within GraphX.
 
 The following is the type signature of the [Pregel operator][GraphOps.pregel] as well as a *sketch*
-of its implementation (note: to avoid stackOverflowError due to long lineage chains, graph and
-messages are periodically checkpoint and the checkpoint interval is set by
-"spark.graphx.pregel.checkpointInterval", it can be disable by set as -1):
+of its implementation (note: to avoid stackOverflowError due to long lineage chains, Pregel supports periodically
+checkpointing the graph and messages by setting "spark.graphx.pregel.checkpointInterval"):
--- End diff --

I think we can recommend a good value to set (say 10, which was the earlier default), since it now defaults to off.
Also, good point about the checkpoint dir: it would be good to mention that it is required to be set as well, and to link to any doc we have on that.





[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread vijoshi
Github user vijoshi commented on the issue:

https://github.com/apache/spark/pull/17731
  
@felixcheung 





[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17731
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17731
  
**[Test build #76073 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76073/testReport)** for PR 17731 at commit [`4423f5c`](https://github.com/apache/spark/commit/4423f5cb6f47d01d00064fc4886f0fa0eec2e9ed).
 * This patch **fails R style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17731
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76073/
Test FAILed.





[GitHub] spark issue #17731: [SPARK-20440][SparkR] Allow SparkR session and context t...

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17731
  
**[Test build #76073 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76073/testReport)** for PR 17731 at commit [`4423f5c`](https://github.com/apache/spark/commit/4423f5cb6f47d01d00064fc4886f0fa0eec2e9ed).





[GitHub] spark pull request #17731: [SPARK-20440][SparkR] Allow SparkR session and co...

2017-04-22 Thread vijoshi
GitHub user vijoshi opened a pull request:

https://github.com/apache/spark/pull/17731

[SPARK-20440][SparkR] Allow SparkR session and context to have delayed bindings

## What changes were proposed in this pull request?

Allow SparkR to ignore the "promise already under evaluation" error in case the user has created a delayed binding for the `.sparkRsession` / `.sparkRjsc` names in `SparkR:::.sparkREnv`.

## How was this patch tested?

Ran all unit tests - run-tests.sh


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vijoshi/spark lazysparkr_master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17731.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17731


commit 4423f5cb6f47d01d00064fc4886f0fa0eec2e9ed
Author: Vinayak 
Date:   2017-04-21T15:24:13Z

Allow SparkR session and context to have delayed/active binding







[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/17728
  
cc @felixcheung 





[GitHub] spark issue #17729: [SPARK-20438][R] SparkR wrappers for split and repeat

2017-04-22 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/17729
  
cc @felixcheung 





[GitHub] spark issue #17729: [SPARK-20438][R] SparkR wrappers for split and repeat

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17729
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76071/
Test PASSed.





[GitHub] spark issue #17729: [SPARK-20438][R] SparkR wrappers for split and repeat

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17729
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17729: [SPARK-20438][R] SparkR wrappers for split and repeat

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17729
  
**[Test build #76071 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76071/testReport)** for PR 17729 at commit [`255863a`](https://github.com/apache/spark/commit/255863acfbe6f91ae533c7fee6b190a350b2f880).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17730: [SPARK-20439] [SQL] Fix Catalog API listTables and getTa...

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17730
  
**[Test build #76072 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76072/testReport)** for PR 17730 at commit [`1a5e24d`](https://github.com/apache/spark/commit/1a5e24dc5d6d538e975200b4eb95583db36d5f9f).





[GitHub] spark pull request #17730: [SPARK-20439] [SQL] Fix Catalog API listTables an...

2017-04-22 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/17730

[SPARK-20439] [SQL] Fix Catalog API listTables and getTable when failed to fetch table metadata

### What changes were proposed in this pull request?

`spark.catalog.listTables` and `spark.catalog.getTable` do not work if we are unable to retrieve table metadata for any reason (e.g., the table serde class is not accessible or the table type is not accepted by Spark SQL). After this PR, the APIs still return the corresponding Table, just without the description and tableType.

### How was this patch tested?
Added a test case

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark listTables

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17730.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17730


commit fded331a6ddc002e0476056cae0ccee095fc75e5
Author: Xiao Li 
Date:   2017-04-22T22:21:13Z

fix.

commit ee2df36d580ed729a38a01b4cd81a41639af6143
Author: Xiao Li 
Date:   2017-04-22T22:30:15Z

clean test case

commit 1a5e24dc5d6d538e975200b4eb95583db36d5f9f
Author: Xiao Li 
Date:   2017-04-22T22:31:16Z

clean test case







[GitHub] spark issue #17729: [SPARK-20438][R] SparkR wrappers for split and repeat

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17729
  
**[Test build #76071 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76071/testReport)** for PR 17729 at commit [`255863a`](https://github.com/apache/spark/commit/255863acfbe6f91ae533c7fee6b190a350b2f880).





[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...

2017-04-22 Thread zero323
GitHub user zero323 opened a pull request:

https://github.com/apache/spark/pull/17729

[SPARK-20438][R] SparkR wrappers for split and repeat

## What changes were proposed in this pull request?

Add wrappers for `o.a.s.sql.functions`:

- `split` as `split_string`
- `repeat` as `repeat_string`

## How was this patch tested?

Existing tests, additional unit tests, `check-cran.sh`
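
For a quick feel of the two wrappers, usage looks roughly like this (a sketch assuming an active SparkR session; the expected results follow the unit tests in this PR):

```r
df <- createDataFrame(list(list(text = "a b c")))
# split on whitespace -> list("a", "b", "c")
head(select(df, split_string(df$text, "\\s+")))
# repeat twice -> "a b ca b c"
head(select(df, repeat_string(df$text, 2)))
```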

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zero323/spark SPARK-20438

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17729.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17729


commit 255863acfbe6f91ae533c7fee6b190a350b2f880
Author: zero323 
Date:   2017-04-22T22:01:22Z

Add split_string and repeat_string







[GitHub] spark pull request #17713: [SPARK-20417][SQL] Move subquery error handling t...

2017-04-22 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/17713#discussion_r112819365
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -414,4 +352,269 @@ trait CheckAnalysis extends PredicateHelper {
 
     plan.foreach(_.setAnalyzed())
   }
+
--- End diff --

note to reviewers: This function basically refactors the validation logic for subquery expressions out of checkAnalysis. It is the entry point for all subquery validation and is called from checkAnalysis().





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15125
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76069/
Test PASSed.





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15125
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15125
  
**[Test build #76069 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76069/testReport)** for PR 15125 at commit [`24d4ad6`](https://github.com/apache/spark/commit/24d4ad6fd5b05e1d024a42ee656058e77237ffb9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17728
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76070/
Test PASSed.





[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17728
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17728
  
**[Test build #76070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76070/testReport)** for PR 17728 at commit [`132099c`](https://github.com/apache/spark/commit/132099cc668baa240a0a417950f78fea4be961ec).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-22 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112816898
  
--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala ---
@@ -295,6 +306,23 @@ class KinesisSequenceRangeIterator(
 
 private[streaming]
 object KinesisSequenceRangeIterator {
-  val MAX_RETRIES = 3
-  val MIN_RETRY_WAIT_TIME_MS = 100
+  /**
+   * The maximum number of attempts to be made to kinesis. Defaults to 3.
+   */
+  val MAX_RETRIES = "3"
+
+  /**
+   * The interval between consequent kinesis retries. Defaults to 100ms.
+   */
+  val MIN_RETRY_WAIT_TIME_MS = "100ms"
+
+  /**
+   * Key for configuring the retry wait time for kinesis. The values can be passed to SparkConf.
--- End diff --

*nit:* I'd make the following tweaks here:

```scala
/**
 * SparkConf key for configuring the wait time to use before retrying a Kinesis attempt.
 */
```





[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-22 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112816922
  
--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala ---
@@ -295,6 +306,23 @@ class KinesisSequenceRangeIterator(
 
 private[streaming]
 object KinesisSequenceRangeIterator {
-  val MAX_RETRIES = 3
-  val MIN_RETRY_WAIT_TIME_MS = 100
+  /**
+   * The maximum number of attempts to be made to kinesis. Defaults to 3.
+   */
+  val MAX_RETRIES = "3"
+
+  /**
+   * The interval between consequent kinesis retries. Defaults to 100ms.
+   */
+  val MIN_RETRY_WAIT_TIME_MS = "100ms"
+
+  /**
+   * Key for configuring the retry wait time for kinesis. The values can be passed to SparkConf.
+   */
+  val RETRY_WAIT_TIME_KEY = "spark.streaming.kinesis.retry.waitTime"
+
+  /**
+   * Key for configuring the number of retries for kinesis. The values can be passed to SparkConf.
--- End diff --

*nit:* I'd make the following tweaks here:

```scala
/**
 * SparkConf key for configuring the maximum number of retries used when attempting a Kinesis request.
 */
```





[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-22 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112817123
  
--- Diff: docs/streaming-kinesis-integration.md ---
@@ -216,3 +216,7 @@ de-aggregate records during consumption.
 - If no Kinesis checkpoint info exists when the input DStream starts, it will start either from the oldest record available (`InitialPositionInStream.TRIM_HORIZON`) or from the latest tip (`InitialPositionInStream.LATEST`).  This is configurable.
   - `InitialPositionInStream.LATEST` could lead to missed records if data is added to the stream while no input DStreams are running (and no checkpoint info is being stored).
   - `InitialPositionInStream.TRIM_HORIZON` may lead to duplicate processing of records where the impact is dependent on checkpoint frequency and processing idempotency.
+
+- Kinesis retry configurations
--- End diff --

@brkyvz or another Spark committer might have better suggestions here, but I would suggest making this section a new heading (rather than part of **Kinesis Checkpointing**) and adding a brief explanatory sentence, e.g.:

```
 Kinesis retry configuration
- A Kinesis DStream will retry any failed request to the Kinesis API. The following SparkConf properties can be set in order to customize the behavior of the retry logic:
```

followed by the rest of your changes here.

This also reminds me that I owe @brkyvz a change to add docs for the stream builder interface here :)





[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-22 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112816822
  
--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala ---
@@ -295,6 +306,23 @@ class KinesisSequenceRangeIterator(
 
 private[streaming]
 object KinesisSequenceRangeIterator {
-  val MAX_RETRIES = 3
-  val MIN_RETRY_WAIT_TIME_MS = 100
+  /**
+   * The maximum number of attempts to be made to kinesis. Defaults to 3.
+   */
+  val MAX_RETRIES = "3"
+
+  /**
+   * The interval between consequent kinesis retries. Defaults to 100ms.
--- End diff --

*nit:* **K**inesis





[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-22 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112816810
  
--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala ---
@@ -295,6 +306,23 @@ class KinesisSequenceRangeIterator(
 
 private[streaming]
 object KinesisSequenceRangeIterator {
-  val MAX_RETRIES = 3
-  val MIN_RETRY_WAIT_TIME_MS = 100
+  /**
+   * The maximum number of attempts to be made to kinesis. Defaults to 3.
--- End diff --

*nit:* **K**inesis





[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17728
  
**[Test build #76070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76070/testReport)** for PR 17728 at commit [`132099c`](https://github.com/apache/spark/commit/132099cc668baa240a0a417950f78fea4be961ec).





[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...

2017-04-22 Thread budde
Github user budde commented on the issue:

https://github.com/apache/spark/pull/17467
  
@yssharma Fair enough. I'll try to get your update reviewed later today.





[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17693
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17693
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76068/
Test PASSed.





[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17693
  
**[Test build #76068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76068/testReport)** for PR 17693 at commit [`f706ce3`](https://github.com/apache/spark/commit/f706ce3bfbc1f22ed32d2785fd7674fd7d03e874).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-22 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17712
  
cc @gatorsmile 

Is this related to the deterministic thing you want to do?






[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15125
  
**[Test build #76069 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76069/testReport)** for PR 15125 at commit [`24d4ad6`](https://github.com/apache/spark/commit/24d4ad6fd5b05e1d024a42ee656058e77237ffb9).





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-22 Thread dding3
Github user dding3 commented on the issue:

https://github.com/apache/spark/pull/15125
  
OK, agreed. If the user didn't set a checkpoint directory while we turn on checkpointing in Pregel by default, there may be an exception. I will change spark.graphx.pregel.checkpointInterval to default to -1.





[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17693
  
**[Test build #76068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76068/testReport)** for PR 17693 at commit [`f706ce3`](https://github.com/apache/spark/commit/f706ce3bfbc1f22ed32d2785fd7674fd7d03e874).





[GitHub] spark issue #17688: [MINOR][DOCS][PYTHON] Adding missing boolean type for re...

2017-04-22 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17688
  
@vundela L1237





[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...

2017-04-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17693
  
ok to test





[GitHub] spark issue #17720: [SPARK-20407][TESTS][BACKPORT-2.1] ParquetQuerySuite 'En...

2017-04-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17720
  
Thanks! Merging to 2.1

Could you close it?





[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17728
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76067/
Test PASSed.





[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17728
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17728
  
**[Test build #76067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76067/testReport)** for PR 17728 at commit [`3104eb1`](https://github.com/apache/spark/commit/3104eb131ac2a9a6cb334c3043fe37d2f98c3ecb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17719: [SPARK-20431][SQL] Specify a schema by using a DD...

2017-04-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17719#discussion_r112813484
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -68,6 +68,18 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
   }
 
   /**
+   * Specifies the schema by using the input DDL-formatted string. Some data sources (e.g. JSON) can
+   * infer the input schema automatically from data. By specifying the schema here, the underlying
+   * data source can skip the schema inference step, and thus speed up data loading.
+   *
+   * @since 2.3.0
+   */
+  def schema(schemaString: String): DataFrameReader = {
+    this.userSpecifiedSchema = Option(StructType.fromDDL(schemaString))
--- End diff --

Sorry, I misread the Python code.




