GitHub user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21060
I might not have explained it well. Sorry for the misunderstanding. Thank you
@rxin for helping me clarify my points. It sounds like many of you think this
backport is fine. I am not against this specific PR. We do not need to revert
it; improving the documentation should be fine, although I still personally
prefer adding the configuration.
As I said in the original PR
https://github.com/apache/spark/pull/21007 that was merged to master, let me
make two points here too.
- PR descriptions become part of the commit log, so we need to be very
careful before merging a PR. In the past, I also missed a few when I did
merges. To be honest, I am not sure how native English speakers read it, but
the first paragraph scared me when I read the PR commit log. @srowen WDYT?
```
This PR proposes to add collect to a query executor as an action.
```
- Document behavior changes that are visible to external
users/developers. In Spark 2.3, we started to enforce this in every merged PR,
and I believe many of you have received similar comments in previous PRs. This
PR should also update the migration guide. @HyukjinKwon Do you agree?
Before we finalize the backport policy, here is my input on the
whitelist of changes we can backport:
- Critical/important bug fixes and security fixes.
- Regression fixes.
- PRs that do not touch production code, such as test-only patches,
documentation fixes, and log-message fixes.
Avoid backporting PRs that contain:
- New features
- Minor bug fixes/improvements that change externally visible behavior
- Code refactoring
- High- or medium-risk code changes
In the OSS community, I believe no committer will be fired just because we
merged/introduced a bug, right? If a user's application fails after an
upgrade, we normally blame the user or say the bug was introduced by accident.
However, that was not acceptable on my first team. Let me share what
I experienced: several customer incidents from my related product teams.
- One director got demoted (and was almost fired) because of a bad release.
She is a very nice lady, and we really like her. That release had many cool
features, but its quality was not well controlled, and many customers were
unwilling to upgrade.
- There was a famous system upgrade failure a few years ago. The whole
system became very slow after the upgrade, and it took tens of hours to
recover. A few days later, the GM went to the customer site and got blamed
the whole day. Multiple architects and VPs were forced to write apology
letters, and the customer planned to sue us. On the customer's side, the CTO
was fired later, and the upgrade accident even made the national TV news
because it affected so many people.
- A few directors were on call with me for 10+ nights to resolve a data
corruption issue for one Japanese customer. The client teams ran multiple
systems at the same time to reproduce the issue. After a few weeks, it was
finally resolved by reading the memory dump. The root cause was a code merge
from one branch to another many years earlier.
If all the people above believe Spark is the best product in Big Data, we
need to be more conservative. Our decisions could affect many people. This is
not the first time I have argued with other committers/contributors about PR
quality. In one previous PR, I left almost 100 comments just because the
documentation was not accurate.
If my comments above offend anyone, I apologize. Everyone has a different
understanding of software development because we have different work
experience. The whole community has already done a wonderful job compared
with other open source projects, but I still believe we can do better, right?
Let us formalize the backport policy and enforce it in each release.