GitHub user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21060
I might not have explained it well. Sorry for the misunderstanding. Thank you
@rxin for helping me clarify my points. It sounds like many of you think this
backport is fine. I am not against this specific PR. We do not need to revert
it; improving the documentation should be fine, although I still personally
prefer adding the configuration.
As I said in the original PR
https://github.com/apache/spark/pull/21007 that was merged to master, let me
make two points here too.
- PR descriptions become part of the commit log, so we need to be very
careful before merging a PR. In the past, I also missed a few when I did
merges. To be honest, I am not sure how native English speakers read it, but
the first paragraph scared me when I read the PR commit log. @srowen WDYT?
```
This PR proposes to add collect to a query executor as an action.
```
- Document behavior changes that are visible to external
users/developers. In Spark 2.3, we started to enforce this in every merged PR,
and I believe many of you have received similar comments in previous PRs. This
PR should also update the migration guide. @HyukjinKwon Do you agree?
Before we finalize the backport policy, here is my input on the
whitelist of changes we can backport:
- Critical/important bug fixes and security fixes.
- Regression fixes.
- PRs that do not touch production code, such as test-only patches,
documentation fixes, and log-message fixes.
Avoid backporting PRs that contain:
- New features
- Minor bug fixes/improvements that change externally visible behavior
- Code refactoring
- High- or medium-risk code changes
In the OSS community, I believe no committer will be fired just because we
merged/introduced a bug, right? If a user's application fails after an
upgrade, we normally blame the user or say the bug was introduced by accident.
However, that was not acceptable on my first team. Let me share what
I experienced: several customer incidents from my related product teams.
- One director got demoted (and was almost fired) because of a bad release.
She is a very nice lady, and we really like her. That release had many cool
features, but its quality was not well controlled, and many customers were
unwilling to upgrade.
- There was a famous system upgrade failure a few years ago. The whole
system became very slow after the upgrade, and it took tens of hours to
recover. A few days later, the GM went to the customer site and got blamed
the whole day. Multiple architects and VPs were forced to write apology
letters, and the customer planned to sue us. On the customer's side, the CTO
was fired later, and the upgrade accident even made the national TV news
because it affected so many people.
- A few directors were on call with me for 10+ nights to resolve a data
corruption issue for one Japanese customer. The client teams ran multiple
systems at the same time to reproduce the issue. After a few weeks, it was
finally resolved by reading the memory dump. The root cause was a code merge
from one branch to another many years earlier.
If all the people above believe Spark is the best product in Big Data, we
need to be more conservative. Our decisions could affect many people. This is
not the first time I have argued with other committers/contributors about PR
quality. In one previous PR, I left almost 100 comments just because the
documentation was not accurate.
If my comments above offend anyone, I apologize. Everyone has a different
understanding of software development because we have different work
experience. The whole community has already done a wonderful job compared
with other open source projects, but I still believe we can do better, right?
Let us formalize the backport policy and enforce it in each release.