GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/22429
[SPARK-25440][SQL] Dump query execution info to a file
## What changes were proposed in this pull request?
In the PR, I propose a new method for debugging queries by dumping information
about their execution to a file. It saves the logical, optimized and physical
plans, similar to the `explain()` method, plus the generated code. One advantage
of the method over `explain` is that it doesn't truncate the output and doesn't
materialize the full output in memory, which can cause OOMs.
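The PR text does not show the new method's signature, so the Scala snippet below is only a minimal sketch of the design idea described here. Each plan section is streamed directly into a file `Writer` instead of first being concatenated into one large `String`. `Section`, `PlanDump.writeSections` and `PlanDump.toFile` are hypothetical names used for illustration and are not Spark API. The section titles follow the `== ... ==` style that `explain()` prints.

```scala
import java.io.Writer
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical: a section is a title plus a function that streams its body
// into a Writer, so no section has to be materialized as one big String.
final case class Section(title: String, writeBody: Writer => Unit)

object PlanDump {
  // Writes every section, in order, as "== title ==", then the body, then a newline.
  def writeSections(sections: Seq[Section], out: Writer): Unit =
    sections.foreach { s =>
      out.write(s"== ${s.title} ==\n")
      s.writeBody(out)
      out.write("\n")
    }

  // Streams all sections into a UTF-8 file. The writer is closed even if a section fails.
  def toFile(path: Path, sections: Seq[Section]): Unit = {
    val out = Files.newBufferedWriter(path, StandardCharsets.UTF_8)
    try writeSections(sections, out)
    finally out.close()
  }
}
```

In a real implementation each `writeBody` would print a tree node by node, or emit generated code subtree by subtree. Peak memory would then be bounded by the writer's buffer rather than by the size of the plan text, and nothing forces the output to be truncated.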
## How was this patch tested?
Added a test which checks that the new method dumps correct info about a query.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 plan-to-file
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22429.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22429
----
commit 19b9a684b6d0985cf563257d6321fd6f14458d36
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-15T12:22:42Z
Stub implementation and a test
commit 90832f9571e8cafde622069dec4f837141b07c30
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-15T12:57:24Z
Saving all plans to file
commit 673ae565e42acc00df9acfba670c0491172ffb19
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-15T13:02:49Z
Output attributes
commit fbde8120122c83eecbad2d060f959e467ecb4ff0
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-15T13:14:50Z
Output whole stage codegen
commit dca19d33ed516bea8e3d113c5e81e3e1e2f2d77d
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-15T13:25:43Z
Reusing codegenToOutputStream
commit 66351a09f60f0c9b21abe12bdb559a36761e8a8c
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-15T14:16:22Z
Code de-duplication
commit 2ee75bcd4d495eb6031581ad7a38e757c254175b
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-15T14:41:26Z
Do not truncate fields
----