[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21060
  
@steveloughran Agreed. We can always make an exception if most people need it.

- For any external behavior change (even in experimental APIs), we need to 
document it and also mention it in the release notes. This can simplify the 
version upgrade and let our users trust us.
- We need more discussion regarding the backport policy (for the 
maintenance branches and release candidate branches). This should be discussed 
and finalized on the dev list at least. When the policy is completed, we should 
also send it to the user list, and the committers are responsible for enforcing 
it.

I am trying to summarize what we did in the Spark 2.3 release. It took 
almost 2 months to release it. I will send the postmortem to the community with 
some proposals about the backport policy.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-18 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/21060
  
* From the ASF process-police perspective, something like a 
versioning/backport policy is something which should be done on the ASF dev 
list. Consider asking in user@ to see what people's preferences are. 
It is worth mentioning in the project report too.
* From a personal perspective: it's good to have a policy, but really good 
to leave a little bit of wiggle room, even if it's something like "a vote on the 
developer list can override any policy on a case-by-case basis". That is: you 
can do more than just fixes, but it's something where the decision is opened up. 
This makes clear the cost *and* avoids the "why did you cherry-pick this 
without asking me" conversations.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21060
  
I am fine to accept different opinions for this specific PR. Reverting this 
PR is not my goal here. This is a public community. It sounds like the commit 
message clearly delivers what this PR does to you: `This PR proposes to add 
collect to a query executor as an action.`, although I still have different 
opinions. We need to collect and accept different opinions always. 

I am also glad you agree on the backport policy I proposed above. 
Hopefully, everyone is on the same page for avoiding unnecessary overhead. 

> The minor bug fixes/improvements that have external behavior changes

I personally thought this PR fits this category. No matter whether the 
behavior changes are correct or not, we should still not backport it if the 
issue is neither critical nor a regression. That is what I emphasized in the 
above argument multiple times. The API inconsistency is not rare between our 
APIs. We did not backport these PRs. Now, I am fine to backport it because it 
is an experimental API. Thus, we can say we do not guarantee the backport 
compatibility. If it were a public API, I would insist on my original opinion.

I am also glad many community members have a lot of experience with mission 
critical software development. This can help improve documentation, code 
quality and test coverage. Development of application/mobile software is 
completely different from development of system software. We are in the right 
direction. We need to enforce it with stricter discipline. 


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-16 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21060
  
I do not see a problem with the commit message here. Is that really the 
issue? It accurately describes _what_ changes. The _why_ has always been 
documented in discussion, and it is here already. Sometimes the _why_ is 
documented in comments too; I don't see a particular need for that here, but, 
if that's the issue, why isn't that what we're talking about?

You continue to portray this as a behavior change, and I think you mean "a 
change in what is considered correct behavior". However all the other comments 
suggest otherwise; the argument from consistency seems much stronger.

Your proposed criteria for backports sort of align with accepted practice, 
which is to follow semver.org semantics. I think semver is reasonably clear, in 
general and in this case. I see broad agreement for this backport, and people 
simply disagree with your interpretation. It is not a failure to understand 
criteria.

Believe me, people here have plenty of experience with software, versioning, 
and the impact of changes. I'd put more faith in the judgment of your peers. 
Your anecdotes are of a type that's familiar to many people, but, I also fail 
to see how they're relevant here.

You are adopting a 'conservative' position and I think in this case it's 
out of line with normal practice. I think you should accept that people 
disagree and move on.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21060
  
I might not have explained it well. Sorry for the misunderstanding. Thank you 
@rxin for helping me clarify my points. It sounds like many of you think this 
backport is fine. I am not against this specific PR. We do not need to revert 
the PR but just improve the documentation. That should be fine, although I 
still personally prefer adding the configuration.

As I said in the original PR 
https://github.com/apache/spark/pull/21007 that was merged to master, let me 
raise two points here too.

- PR descriptions will be part of the commit log. We need to be very 
careful before merging the PR. In the past, I also missed a few when I did the 
merge. To be honest, I am not sure how native English speakers read it. The 
first paragraph scared me when I was reading the PR commit log. @srowen WDYT?

```
This PR proposes to add collect to a query executor as an action.
```

- Document the behavior changes that are visible to external 
users/developers. In Spark 2.3, we started to enforce this in every merged PR. I 
believe many of you got multiple similar comments in previous PRs. This PR 
should also update the migration guide. @HyukjinKwon Do you agree?

Before we finalize the backport policy, below is my input on the 
whitelist of what we can backport:
- The critical/important bug fixes and security fixes.
- The regression fixes.
- The PRs that do not touch the production code, like test-only patches, 
documentation fixes, and the log message fixes.

Avoid backporting PRs that contain:
- The new features
- The minor bug fixes/improvements that have external behavior changes
- The code refactoring
- The code changes with the high/mid risk
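
The two lists above could be encoded roughly as follows. This is only a sketch in plain Python: the `Change` class, its field names, the category strings, and the "discuss" outcome for cases the lists do not settle are all assumptions made for illustration, not an agreed Spark policy.

```python
from dataclasses import dataclass

# Categories taken from the whitelist above (the string labels are hypothetical).
ALLOW_KINDS = {
    "critical_fix", "security_fix", "regression_fix",
    "test_only", "docs", "log_message",
}
# Categories taken from the "avoid" list above.
AVOID_KINDS = {"feature", "refactoring"}


@dataclass
class Change:
    """Hypothetical description of a PR proposed for backport."""
    kind: str
    changes_external_behavior: bool = False
    risk: str = "low"  # "low", "mid" or "high"


def backport_verdict(change):
    """Return "allow", "avoid" or "discuss" for a proposed backport."""
    allowed = change.kind in ALLOW_KINDS
    avoided = (
        change.kind in AVOID_KINDS
        or change.risk in ("mid", "high")
        or (change.kind == "minor_fix" and change.changes_external_behavior)
    )
    # Assumption: when the two lists conflict, or neither applies,
    # the decision goes back to discussion instead of being automated.
    if allowed and avoided:
        return "discuss"
    if avoided:
        return "avoid"
    if allowed:
        return "allow"
    return "discuss"
```

For example, `backport_verdict(Change("minor_fix", changes_external_behavior=True))` yields `"avoid"`, which is the category this PR is argued to fall into elsewhere in the thread.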

In the OSS community, I believe no committer will be fired just because we 
merged/introduced a bug, right? If a user's application fails due to an 
upgrade, normally we blame our users or say the bug was just accidentally 
introduced. However, this was not acceptable in my first team. Let me share what 
I experienced: various customer incidents in my related product teams.

- One director got demoted (almost fired) due to a bad release. She is a 
very nice lady. We really like her. That release had many cool features but the 
quality was not controlled well. Many customers were not willing to upgrade. 
- There was a famous system upgrade failure a few years ago. The whole 
system became very slow after the upgrade. It took tens of hours to recover the 
system. After a few days, the GM went to the customer site and was blamed 
the whole day. Multiple architects and VPs were forced to write apology 
letters. Customers planned to sue us. On the customer side, the CTO got fired 
later and the upgrade accident was also on the national TV news because it 
affected many people. 
- A few directors were on call with me for 10+ nights to resolve one Japanese 
customer's data corruption issue. The client teams ran multiple systems at the 
same time to reproduce the issue. After a few weeks, it was finally resolved 
after reading the memory dump. The root cause was a code merge from one branch 
to another branch many years earlier.

If all of the people above believe Spark is the best product in Big Data, we 
need to be more conservative. Our decisions could affect many people. This is 
not the first time I have argued with other committers/contributors about PR 
quality. In one previous PR, I left almost 100 comments just because the 
documents were not accurate.

If my above comments offend anyone, I apologize. Everyone has a different 
understanding of software development because we have different work 
experience. The whole community has already done a wonderful job compared with 
other open source projects. I still believe we can do a better job, right? Let 
us formalize the backport policy and enforce it in each release.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-16 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21060
  
Adding a flag just in 2.3 is, at least, an unusual thing to do. By this 
logic lots of backports should be flag-protected, but they aren't. Why is this 
special?

I still don't see much argument against this backport. I count about 3-4 
committers in favor and 1 against. Let's leave it.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
I am okay if there's a specific reason. I think this is the point - if 
there's a specific reason, that should be mentioned and explained ahead. 
Actually, I (and @srowen did as well IIUC) asked this many times, see above.

I would have investigated or would have just said that I am okay with 
reverting. I don't usually get in the way if there's a specific reason. It 
would be great if we can have more open talks next time. 

> for the 2.3.x backport, add a config so that it is possible to turn this 
off in production, if somebody actually has their job fail because of this? 
It's a small delta from what this PR already does, and that should alleviate 
the concerns @gatorsmile has.

I am personally fine with reverting or adding a configuration if that's 
what you guys feel strongly about; however, I should say it sounds unusual to have a 
config to control this behaviour in branch-2.3 alone, and it seems less worthwhile. 
The case you mention sounds really unlikely and I wonder if that makes sense, 
though. It's also experimental, as you all said.

Also, I should note that I have been confused about the backporting policy 
and the bunch of configurations to control each behaviour. If those are just 
concerns to be addressed, that's fine, but so far they sound like rules people must 
follow. If this is true, I feel sure this should be documented. I feel sure we 
shouldn't have such overhead next time. I am pretty sure this isn't the first 
time.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-16 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21060
  
It looks to me like this is a bug fix that can merit backporting, as 
`QueryExecutionListener` is also marked as experimental.

In this case, I think @gatorsmile is worried one might have written a 
listener that enumerates the possible function names, and that listener will 
fail now with a new action name. I feel this is quite unlikely, but I also 
appreciate @gatorsmile's concern for backward compatibility, and I've certainly 
been wrong before when our fixes break existing workloads.
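
To make that concern concrete, here is a minimal sketch in plain Python. It does not use Spark: the two classes, the `on_success` method, and the action names are hypothetical stand-ins for a listener whose callback receives a function name. The first listener hard-codes the names it expects and fails when a new one appears; the second tolerates unknown names.

```python
# Hypothetical stand-ins for illustration only; this is not the Spark API.
KNOWN_ACTIONS = {"foreach", "show", "count"}


class StrictListener:
    """Fragile: assumes the set of function names never grows."""

    def __init__(self):
        self.seen = []

    def on_success(self, func_name, duration_ns):
        if func_name not in KNOWN_ACTIONS:
            # A newly reported action name breaks this listener.
            raise ValueError("unexpected action: " + func_name)
        self.seen.append(func_name)


class TolerantListener:
    """Robust: records unknown names instead of failing."""

    def __init__(self):
        self.seen = []
        self.unknown = []

    def on_success(self, func_name, duration_ns):
        if func_name in KNOWN_ACTIONS:
            self.seen.append(func_name)
        else:
            self.unknown.append(func_name)
```

Calling `StrictListener().on_success("some_new_action", 10)` raises, while the tolerant version simply files the name under `unknown`.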

(On the spectrum from extremely conservative to extremely liberal, I 
think I'm in general more in the middle, whereas @gatorsmile probably leans 
more to the conservative side. There isn't really anything wrong with this, and 
it's good to have balancing forces in a project.)

How about this, @HyukjinKwon -- for the 2.3.x backport, add a config 
so that it is possible to turn this off in production, if somebody actually has 
their job fail because of this? It's a small delta from what this PR already 
does, and that should alleviate the concerns @gatorsmile has. I'd also change 
the function doc for onSuccess/onFailure to make it clear that we will add new 
function names in the future, and users shouldn't expect a fixed list of 
function names.
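
A minimal sketch of the config idea, in plain Python rather than Spark: the `ToySession` class, the `notify_on_python_collect` flag, and `collect_from_python` are hypothetical names invented for illustration, and no real Spark configuration key is implied. The flag defaults to the new behaviour and lets an operator switch the callback off.

```python
class ToySession:
    """Hypothetical stand-in for a session that holds listeners and a flag."""

    def __init__(self, notify_on_python_collect=True):
        # Hypothetical switch modelling the proposed config; defaults to on.
        self.notify_on_python_collect = notify_on_python_collect
        self.listeners = []

    def notify(self, func_name):
        # Call every registered listener with the name of the action.
        for listener in self.listeners:
            listener(func_name)


def collect_from_python(session, rows):
    """Toy version of a Python-side collect that reports to listeners."""
    result = list(rows)
    if session.notify_on_python_collect:
        session.notify("collect")
    return result
```

With the flag left at its default, a registered listener sees one `"collect"` event per call; with `notify_on_python_collect=False` the old silent behaviour is kept.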






---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-16 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/21060
  
I agree with what @srowen said:


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-16 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21060
  
This was a bug fix from my perspective and looked to be low risk.  I don't 
think this changes any behavior for the user, except if you do a `collect` from 
pyspark and have a `QueryExecutionListener`, then it will now get the expected 
callback instead of nothing.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-16 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/21060
  
This is one of those great problems in software engineering: no good 
answer. I think case-by-case is generally the best tactic, with a bias against 
feature backport, though my track record is a bit mixed. 

Patches which fix security issues at the expense of compatibility are real 
problems here: they need to go in even knowing stuff will break, especially 
when you quietly push it out with an innocuous JIRA title until you actually do 
the releases. People start complaining that XML entity expansion has 
stopped working, or that REST APIs are failing if unauthed, when that is the 
exact outcome intended.

Talk to @templedf for a good policy here



---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
cc @rdblue and @steveloughran too who I guess should be interested in 
setting up a backporting policy.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
I am not saying we shouldn't be careful. I am trying to be careful when I 
backport. So, your reasons are:

- any behaviour changes shouldn't be backported and it's the basic backport 
rule

  I disagree unless it's clearly documented as a rule. Even if so, I would 
like to make this an exception because it's less invasive, looks like a bug, 
affects an actual user group and fixes the case so that it makes sense. That's what I 
have been used to so far.

- the query execution listener is not clearly defined

  I am seeing `collect` is included in the original commit - 
https://github.com/apache/spark/commit/15ff85b3163acbe8052d4489a00bcf1d2332fcf0.
 I don't see a reason to specifically exclude PySpark's case since Scala and R 
also work. I don't think we would exclude this on purpose.

- It's not a critical issue nor a regression.

  I don't think we should only make a backport for a critical issue or a 
regression. That's a strong reason to backport but there are still other cases 
that can be backported based on my understanding and observations. If it's a 
bug quite clearly and it affects an actual user group, I would guess it can be 
valuable for a backport. The fix is straightforward, less invasive and small.



---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21060
  
As I said above, we need to be very careful when backporting the PR 
with the behavior changes, especially when this is **neither a critical issue 
nor a regression**. Even if this is a bug based on your understanding, we 
should still not backport such PRs. 


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
This is not just about inconsistency but a bug. The previous behaviour 
doesn't make sense.

Sure, no need to rush.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21060
  
Fixing API inconsistency should not be treated as a bug fix. 

Please give me a few days. I need to summarize the Spark 2.3 release and 
list all the PRs that were backported to the release candidate branches. Thanks!


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
This is not a new feature addition .. this fixes an existing functionality 
to work as expected and consistently .. 
Sure, that'd be great. Will join in the discussion.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21060
  
> In this case specifically, collect alone doesn't work in PySpark, whereas 
other actions like foreach and show, and the same cases in other languages, work 
in all other APIs. Also, that's what a query execution listener describes. Do you 
believe you would make this exception for PySpark specifically in any case?

To improve usability, we should change it in the master branch. My 
point is that we should not backport this PR to the 2.3 release.

> Let's open a discussion in the mailing list and see if we can see the 
agreement. I think this was not the first time we talked about this and think 
it's better to open a proper discussion and make a decision.

Sure, let me lead the discussion in the dev channel, and you are welcome to add 
your input there. Next, we should also discuss the rule for which PRs can be 
backported to RC branches when we do the release. In the Spark 2.3 release, we 
backported many PRs that should not have been merged to the release candidate 
branches.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
> The behavior consistency among Python/Scala/R/JAVA does not mean a bug, 
right?

In this case specifically, `collect` alone doesn't work in PySpark, whereas 
other actions like `foreach` and `show`, and the same cases in other languages, work 
in all other APIs. Also, that's what a query execution listener describes. Do 
you believe you would make this exception for PySpark specifically in any case?

I see that `foreach` etc. were fixed in 
https://github.com/apache/spark/commit/154351e6dbd24c4254094477e3f7defcba979b1a 
and also see that `collect` is included in the original commit - 
https://github.com/apache/spark/commit/15ff85b3163acbe8052d4489a00bcf1d2332fcf0

> I am not against this specific PR. All the committers need to be really 
careful when they make a decision to backport a behavior change. If any 
committer does it, we should jump in and stop the backport. This is what we 
should do.

Let's open a discussion in the mailing list and see if we can see the 
agreement. I think this was not the first time we talked about this and think 
it's better to open a proper discussion and make a decision. 


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21060
  
> The callback works for collect in R and Scala but not in Python. I think 
we should at least match the behaviour. I wonder why it's hard to call it a bug 
when collect is detected in some APIs but not in others.

The behavior consistency among Python/Scala/R/JAVA does not mean a bug, 
right? 

> That's because the change was big and invasive. I wouldn't backport it 
either; however, this fix is relatively small.

This is the reason why we did not backport that PR. We still can backport 
the minimal changes to the previous releases.

> I think we usually use the committer's judgement when we make an exception. I 
have already seen many backports that actually cause behaviour changes, 
and I did this because such changes seem to be backported in general. This is the reason 
why we should formally document it if this is actually the rule.

I am not against this specific PR. All the committers need to be really 
careful when they make a decision to backport a behavior change. If any 
committer does it, we should jump in and stop the backport. This is what we 
should do.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
> We need to be very careful when backporting the PR with the behavior 
changes, especially when this is neither a critical issue nor a regression. 
Thus, I do not think we should backport this PR.

I am not saying we shouldn't be careful, but this affects an actual user group 
and actual scenarios.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
> withCallback was added in Spark 1.6 release 
https://issues.apache.org/jira/browse/SPARK-11068 Since then, my understanding 
is we never clearly defined what should be part of withCallback. Thus, it is 
hard to say this is a bug fix.

The callback works for `collect` in R and Scala but not in Python. I think 
we should at least match the behaviour. I wonder why it's hard to call it a bug 
when `collect` is detected in some APIs but not in others.

> We hit a similar issue in #18064. At that time, we did not backport the 
PR to the previous releases either.

That's because the change was big and invasive. I wouldn't backport it either; 
however, this fix is relatively small.

> Thus, I do not think we should make an exception for this PR just because 
the customers of @HyukjinKwon hit this issue

It's not because of my customers; I am saying it fixes an actual use case 
and it affects actual users.

> If we make an exception, it becomes harder to decide which PRs are 
qualified for a backport.

I think we usually use the committer's judgement when we make an exception. I 
have already seen many backports that actually cause behaviour changes, 
and I did this because such changes seem to be backported in general. This is the reason 
why we should formally document it if this is actually the rule.

What I am less sure about is why this one specifically prompted you.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21060
  
`withCallback` was added in Spark 1.6 release 
https://issues.apache.org/jira/browse/SPARK-11068 Since then, my understanding 
is we never clearly defined what should be part of `withCallback`. Thus, it is 
hard to say this is a bug fix. 

We hit a similar issue in https://github.com/apache/spark/pull/18064. At 
that time, we did not backport the PR to the previous releases either. Thus, I do 
not think we should make an exception for this PR just because the customers of 
@HyukjinKwon hit this issue. If we make an exception, it becomes harder to 
decide which PRs are qualified for a backport.

We need to be very careful when backporting the PR with the behavior 
changes, especially when this is **neither a critical issue nor a regression**. 
Thus, I do not think we should backport this PR.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-15 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21060
  
This certainly looks like a bug fix. I don't know this area well, but I 
don't see an argument here that the current behavior is correct. Right?

When we say we don't back-port behavior changes, we mean "changes in what 
is meant to be correct behavior". All bug fixes change behavior, but to restore 
correct behavior. So I don't see an argument against back-porting because it's 
a behavior change. 

Of course, sometimes practical concerns override that. If we thought 
programs were relying on the 'wrong' behavior then we'd have to think twice 
about correcting it. I don't see that argument being made here, but, I'm not 
sure? There is evidence the 'wrong' behavior is impacting users though?

@gatorsmile I must say I don't understand your position here, can you 
clarify? So far standard practice here says this is a reasonable backport. 
What's different here?


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
Yup, that should reduce some overhead like this. I would like to hear 
what you guys think, cc @srowen, @vanzin, @felixcheung, @holdenk too.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21060
  
I do think we should clearly document the rule for what we can backport.

I do not think we should make an exception for this PR. cc @rxin @marmbrus 
@yhuai @cloud-fan @ueshin 


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
How about we formally document that in the guide?

I have always put more importance on practice and I personally 
think we are fine to make a backport if it's a bug and the fix is 
straightforward. IMHO, principle is a base but we should put more importance on 
practice. 

Even if I take your word for it, I would then like to make this an exception 
since this fixes actual use cases from our customers.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21060
  
If this can be treated as a bug to backport, we have many behavior change 
PRs that can be backported. We are building the system software. We have to be 
more principled.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
I agree that it's better to avoid a behaviour change, but this one is 
clearly a bug and the fix is straightforward. I am puzzled why this 
specifically prompted you. I wouldn't revert if there's no specific worry 
about this patch.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21060
  
This is just the basic backport rule we follow for each PR. We should not 
make an exception for this PR. 





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
I am a bit puzzled, because `QueryExecutionListener` should call the 
callback for actions, and `collect` triggers it in Scala and R but not in 
PySpark specifically. It sounds like a bug, and this fix is relatively 
straightforward. The previous behaviour was that the callback was not called, 
which didn't make sense.
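
To make the inconsistency concrete, here is a toy, pure-Python model of it. It is only a sketch and does not use Spark: `ToySession`, `ToyDataFrame`, `collect_notifies` and `observed_actions` are hypothetical names invented for illustration, and the real change lives in PySpark's `collect` path, which is not reproduced here.

```python
from typing import Callable, List


class ToySession:
    """Hypothetical stand-in for a session that tracks success callbacks."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[str], None]] = []

    def register(self, on_success: Callable[[str], None]) -> None:
        self._listeners.append(on_success)

    def notify(self, action_name: str) -> None:
        # Every registered listener is told which action just completed.
        for listener in self._listeners:
            listener(action_name)


class ToyDataFrame:
    """Hypothetical stand-in for a DataFrame with two actions."""

    def __init__(self, session: ToySession, rows: list, collect_notifies: bool) -> None:
        self._session = session
        self._rows = rows
        # Assumption for illustration: False models the behaviour described
        # as a bug, True models the behaviour after the fix.
        self._collect_notifies = collect_notifies

    def show(self) -> None:
        # `show` always reaches the listeners in this model.
        self._session.notify("show")

    def collect(self) -> list:
        if self._collect_notifies:
            self._session.notify("collect")
        return list(self._rows)


def observed_actions(collect_notifies: bool) -> List[str]:
    """Run `show` then `collect` and return the actions a listener saw."""
    seen: List[str] = []
    session = ToySession()
    session.register(seen.append)
    df = ToyDataFrame(session, [1, 2, 3], collect_notifies)
    df.show()
    df.collect()
    return seen


if __name__ == "__main__":
    print("before:", observed_actions(collect_notifies=False))
    print("after: ", observed_actions(collect_notifies=True))
```

In this model the only externally visible difference is that a listener starts seeing `collect` in addition to `show`, which is the behaviour change being debated in this thread.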

I agree that it's discouraged to make a behaviour change in a maintenance 
release, sure. However, I was thinking it makes sense to backport if the fix is 
not complicated and it quite clearly looks like a bug. I think we shouldn't 
call it an improvement in this case.

Were actual apps or test cases broken somewhere?





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21060
  
Users' apps should not be blamed in this case. If they want this change, 
they should upgrade to a newer release. Basically, we should not introduce 
any external behavior change in a maintenance release if possible. 





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
I guess the behaviour change here is that a custom query execution 
listener can now recognise the action `collect` in PySpark, which the other 
APIs already detected. Mind explaining how it breaks external apps? If an app 
relies on the callback not being called specifically for `collect`, while it is 
called for other actions like `show` in PySpark, I would say the app is to 
blame.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21060
  
This will introduce a behavior change, and it is not a regression fix. The 
changes we made in this PR could break external apps. We should not do this in 
a maintenance release. 





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
Hm, I would say it's a bug, since an action that is supposed to trigger the 
callback is not detected. The test is a bit complicated, but the fix is 
relatively straightforward.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21060
  
Since this is not a bug fix, I plan to revert this PR. WDYT? @HyukjinKwon 
@BryanCutler 





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
Merged to branch-2.3.

Thanks for reviewing this @BryanCutler.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89363/
Test PASSed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21060
  
**[Test build #89363 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89363/testReport)**
 for PR 21060 at commit 
[`4656724`](https://github.com/apache/spark/commit/4656724d27c208d794f99691cfbf93b4bb118d93).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2328/
Test PASSed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21060
  
**[Test build #89363 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89363/testReport)**
 for PR 21060 at commit 
[`4656724`](https://github.com/apache/spark/commit/4656724d27c208d794f99691cfbf93b4bb118d93).





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21060
  
retest this please





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89352/
Test FAILed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Merged build finished. Test FAILed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21060
  
**[Test build #89352 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89352/testReport)**
 for PR 21060 at commit 
[`4656724`](https://github.com/apache/spark/commit/4656724d27c208d794f99691cfbf93b4bb118d93).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2323/
Test PASSed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21060
  
**[Test build #89352 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89352/testReport)**
 for PR 21060 at commit 
[`4656724`](https://github.com/apache/spark/commit/4656724d27c208d794f99691cfbf93b4bb118d93).





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21060
  
retest this please





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Merged build finished. Test FAILed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89327/
Test FAILed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21060
  
**[Test build #89327 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89327/testReport)**
 for PR 21060 at commit 
[`4656724`](https://github.com/apache/spark/commit/4656724d27c208d794f99691cfbf93b4bb118d93).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2306/
Test PASSed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21060
  
**[Test build #89327 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89327/testReport)**
 for PR 21060 at commit 
[`4656724`](https://github.com/apache/spark/commit/4656724d27c208d794f99691cfbf93b4bb118d93).





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
retest this please





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89312/
Test FAILed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Merged build finished. Test FAILed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21060
  
**[Test build #89312 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89312/testReport)**
 for PR 21060 at commit 
[`4656724`](https://github.com/apache/spark/commit/4656724d27c208d794f99691cfbf93b4bb118d93).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2294/
Test PASSed.





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21060
  
**[Test build #89312 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89312/testReport)**
 for PR 21060 at commit 
[`4656724`](https://github.com/apache/spark/commit/4656724d27c208d794f99691cfbf93b4bb118d93).





[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
cc @BryanCutler 

