GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/22429

    [SPARK-25440][SQL] Dump query execution info to a file

    ## What changes were proposed in this pull request?
    
    In this PR, I propose a new method for debugging queries by dumping info about 
their execution to a file. It saves the logical, optimized and physical plans, 
similar to the `explain()` method, plus the generated code. One advantage of 
the method over `explain` is that it doesn't truncate the output and doesn't 
materialize the full output in memory, which can cause OOMs.
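
To illustrate why writing to a file avoids the memory problem, here is a minimal, self-contained sketch of the general idea: each node of a plan tree is written straight to a `java.io.Writer` instead of being appended to one large string. This is not the code in this patch. `PlanNode`, `PlanDump`, `writeTree` and `dumpToFile` are hypothetical names made up for the example, and the real plan classes in Spark are considerably richer.

```scala
import java.io.Writer
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical stand-in for a plan node; NOT Spark's TreeNode.
final case class PlanNode(name: String, children: Seq[PlanNode] = Nil)

object PlanDump {
  // Writes one line per node directly to the writer, indenting by depth.
  // The full text is never assembled as a single String in memory.
  def writeTree(node: PlanNode, out: Writer, depth: Int = 0): Unit = {
    out.write("  " * depth)
    out.write(node.name)
    out.write("\n")
    node.children.foreach(child => writeTree(child, out, depth + 1))
  }

  // Dumps the tree to a file under a section header (assumed layout).
  def dumpToFile(path: String, header: String, root: PlanNode): Unit = {
    val writer = Files.newBufferedWriter(Paths.get(path), StandardCharsets.UTF_8)
    try {
      writer.write(s"== $header ==\n")
      writeTree(root, writer)
    } finally writer.close()
  }
}
```

Because the writer is buffered and flushed to disk as it fills, peak memory in this sketch depends on the buffer size and the tree depth rather than on the total size of the output.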
    
    ## How was this patch tested?
    
    Added a test which checks that the new method dumps the correct info about a query.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 plan-to-file

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22429.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22429
    
----
commit 19b9a684b6d0985cf563257d6321fd6f14458d36
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-15T12:22:42Z

    Stub implementation and a test

commit 90832f9571e8cafde622069dec4f837141b07c30
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-15T12:57:24Z

    Saving all plans to file

commit 673ae565e42acc00df9acfba670c0491172ffb19
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-15T13:02:49Z

    Output attributes

commit fbde8120122c83eecbad2d060f959e467ecb4ff0
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-15T13:14:50Z

    Output whole stage codegen

commit dca19d33ed516bea8e3d113c5e81e3e1e2f2d77d
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-15T13:25:43Z

    Reusing codegenToOutputStream

commit 66351a09f60f0c9b21abe12bdb559a36761e8a8c
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-15T14:16:22Z

    Code de-duplication

commit 2ee75bcd4d495eb6031581ad7a38e757c254175b
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-09-15T14:41:26Z

    Do not truncate fields

----

