Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/6864#issuecomment-113729847
  
    With help from @yhuai, I finally found the root cause of the 
`OrcSourceSuite` failures shown in previous Jenkins builds. [SPARK-8501][1] 
has been opened to track that issue.
    
    The reason this failure shows up in this PR but couldn't be reproduced 
locally on my laptop is that I changed the thread count of the local 
`SparkContext` used by `TestHiveContext` to `*`, which uses all 32 cores on 
Jenkins but only 8 cores on my laptop. Meanwhile, the test data used in 
`OrcSourceSuite` consists of only 10 rows. As a result, the ORC table written 
on my laptop consists of 8 part-files, each containing at least one row, while 
the one written on Jenkins consists of 32 part-files, some of which contain 
zero rows. It turned out that those empty ORC files triggered the failures. 
Please refer to [SPARK-8501][1] for details.
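    To make the 8-vs-32 arithmetic concrete, here is a minimal sketch. It assumes (this is not taken from the suite itself) that the 10 test rows are split evenly into one partition per local thread and that each partition becomes one ORC part-file. `slice_sizes` and `empty_slices` are hypothetical helpers; the split formula is modeled on the even-slicing arithmetic of Spark's `ParallelCollectionRDD`, where slice `i` covers positions `[i * n / k, (i + 1) * n / k)`.

```python
from typing import List


def slice_sizes(num_rows: int, num_slices: int) -> List[int]:
    """Rows per slice when num_rows are split evenly into num_slices.

    Slice i covers positions [i * n // k, (i + 1) * n // k), so whenever
    num_rows < num_slices some slices are necessarily empty.
    """
    return [
        (i + 1) * num_rows // num_slices - i * num_rows // num_slices
        for i in range(num_slices)
    ]


def empty_slices(num_rows: int, num_slices: int) -> int:
    """Number of slices (assumed: ORC part-files) that receive zero rows."""
    return sum(1 for size in slice_sizes(num_rows, num_slices) if size == 0)


# 10 test rows on an 8-core laptop: every part-file gets at least one row.
print(slice_sizes(10, 8))    # [1, 1, 1, 2, 1, 1, 1, 2]
# 10 test rows on a 32-core Jenkins machine: most part-files are empty.
print(empty_slices(10, 32))  # 22
```

Under this assumption, only 10 of the 32 Jenkins part-files receive a row, which matches the observation that the Jenkins run produced empty ORC files while the laptop run did not.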
    
    For this reason, I made two more updates:
    
    1. Changed `local[*]` to `local[32]` for more determinism. 32 was chosen 
because Jenkins has 32 cores, which should be enough to detect concurrency 
issues.
    2. Increased the number of rows in the test data used in `OrcSourceSuite` to 
100 to temporarily work around the build failure. SPARK-8501 will be fixed in a 
separate PR.
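    A minimal sketch of why both updates help, again assuming the test data is split evenly into one partition per local thread. `local_thread_count` and `has_empty_partitions` are hypothetical helpers, not Spark APIs. The first mimics how a `local`, `local[N]`, or `local[*]` master string maps to a thread count (other master forms are not handled). The second relies on the fact that an even split of n rows over k partitions leaves some partition empty exactly when n < k.

```python
import os
import re
from typing import Optional


def local_thread_count(master: str, available_cores: Optional[int] = None) -> int:
    """Resolve the thread count of a `local`, `local[N]` or `local[*]` master.

    `local[*]` depends on the host (one thread per available core), so the
    same suite can run with 8 threads on a laptop and 32 on Jenkins.
    `local[N]` is fixed regardless of the host.
    """
    if master == "local":
        return 1
    match = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if match is None:
        raise ValueError(f"unsupported master string: {master!r}")
    if match.group(1) == "*":
        return available_cores if available_cores is not None else (os.cpu_count() or 1)
    return int(match.group(1))


def has_empty_partitions(num_rows: int, num_partitions: int) -> bool:
    """Under an even split, some partition is empty exactly when rows < partitions."""
    return num_rows < num_partitions


threads = local_thread_count("local[32]")                  # 32 on any host
print(has_empty_partitions(10, threads))                   # True: original 10 rows
print(has_empty_partitions(100, threads))                  # False: 100-row workaround
print(local_thread_count("local[*]", available_cores=8))   # 8, as on the laptop
```

Pinning the master to `local[32]` makes the partition count independent of the machine, and 100 rows is comfortably above 32, so no part-file should end up empty until SPARK-8501 itself is fixed.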
    
    [1]: https://issues.apache.org/jira/browse/SPARK-8501

