GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/18719

    [SPARK-21512][SQL][TEST] DatasetCacheSuite needs to execute unpersistent after executing persistent

    ## What changes were proposed in this pull request?
    
    This PR avoids reusing a persisted dataset across test cases by
    unpersisting the dataset at the end of each test case.
    
    In `DatasetCacheSuite`, the test case `"get storage level"` persists a
    dataset but never unpersists it. The same dataset is then persisted by the test case
    `"persist and then rebind right encoder when join 2 datasets"`.
    Thus, when both test cases run in the same suite, the second one does not
    actually persist the dataset. This is because in
    
    When only the second test case is run, it does persist the dataset.
    The behavior of the second test case should not depend on whether the
    first one has run, so the first test case should unpersist its dataset
    when it finishes.
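
    The pattern described above can be sketched as follows. This is a minimal,
    standalone illustration and not the actual change in this PR: the object name
    `UnpersistSketch`, the sample data, and the local `SparkSession` setup are
    assumptions made for the example. It relies on the assumption that Spark's
    cache is shared within a session, so a dataset left persisted by one test can
    still be found cached by a later test that builds an equivalent dataset.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    // Hypothetical standalone sketch; the real suite uses Spark's shared test session.
    object UnpersistSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[2]")
          .appName("unpersist-sketch")
          .getOrCreate()
        import spark.implicits._

        val ds = Seq("1", "2").toDS()
        try {
          // Body of a test that needs the dataset to be cached.
          ds.persist(StorageLevel.MEMORY_ONLY)
          assert(ds.storageLevel == StorageLevel.MEMORY_ONLY)
        } finally {
          // Always release the cache entry so later tests start from a clean state.
          ds.unpersist(blocking = true)
        }
        assert(ds.storageLevel == StorageLevel.NONE)

        // An equivalent dataset created afterwards is not reported as cached.
        val again = Seq("1", "2").toDS()
        assert(again.storageLevel == StorageLevel.NONE)

        spark.stop()
      }
    }
    ```

    Putting `unpersist` in a `finally` block means the cache entry is released
    even when an assertion in the test body fails.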
    
    ```
    Testing started at 17:52 ...
    01:52:15.053 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    01:52:48.595 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache already cached data.
    01:52:48.692 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache already cached data.
    01:52:50.864 WARN org.apache.spark.storage.RandomBlockReplicationPolicy: Expecting 1 replicas with only 0 peer/s.
    01:52:50.864 WARN org.apache.spark.storage.RandomBlockReplicationPolicy: Expecting 1 replicas with only 0 peer/s.
    01:52:50.868 WARN org.apache.spark.storage.BlockManager: Block rdd_8_1 replicated to only 0 peer(s) instead of 1 peers
    01:52:50.868 WARN org.apache.spark.storage.BlockManager: Block rdd_8_0 replicated to only 0 peer(s) instead of 1 peers
    ```
    
    After this PR, these messages no longer appear:
    ```
    Testing started at 18:14 ...
    02:15:05.329 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    
    Process finished with exit code 0
    ```
    
    ## How was this patch tested?
    
    Used the existing test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-21512

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18719.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18719
    
----
commit 3e4438de4fafa4795c2225a8bba2e4e1172c1948
Author: Kazuaki Ishizaki <[email protected]>
Date:   2017-07-23T09:33:05Z

    initial commit

----