dilipbiswal opened a new pull request #28493:
URL: https://github.com/apache/spark/pull/28493


   ### What changes were proposed in this pull request?
   Currently `QueryExecution.debug.toFile` dumps the query plan information in a fixed format. This PR adds an explain mode parameter so that the debug information is written in the user-supplied format, for example:
   ```
   df.queryExecution.debug.toFile("/tmp/plan.txt", explainMode = ExplainMode.fromString("formatted"))
   ```
   ```
   == Physical Plan ==
   * Filter (2)
   +- Scan hive default.s1 (1)
   
   
   (1) Scan hive default.s1
   Output [2]: [c1#15, c2#16]
   Arguments: [c1#15, c2#16], HiveTableRelation `default`.`s1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#15, c2#16]
   
   (2) Filter [codegen id : 1]
   Input [2]: [c1#15, c2#16]
   Condition : (isnotnull(c1#15) AND (c1#15 > 0))
   
   
   == Whole Stage Codegen ==
   Found 1 WholeStageCodegen subtrees.
   == Subtree 1 / 1 (maxMethodCodeSize:220; maxConstantPoolSize:105(0.16% used); numInnerClasses:0) ==
   *(1) Filter (isnotnull(c1#15) AND (c1#15 > 0))
   +- Scan hive default.s1 [c1#15, c2#16], HiveTableRelation `default`.`s1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#15, c2#16]
   
   Generated code:
   /* 001 */ public Object generate(Object[] references) {
   /* 002 */   return new GeneratedIteratorForCodegenStage1(references);
   /* 003 */ }
   /* 004 */
   /* 005 */ // codegenStageId=1
   /* 006 */ final class GeneratedIteratorForCodegenStage1 extends org.apache.spark.sql.execution.BufferedRowIterator {
   /* 007 */   private Object[] references;
   /* 008 */   private scala.collection.Iterator[] inputs;
   /* 009 */   private scala.collection.Iterator inputadapter_input_0;
   /* 010 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] filter_mutableStateArray_0 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[1];
   /* 011 */
   /* 012 */   public GeneratedIteratorForCodegenStage1(Object[] references) {
   /* 013 */     this.references = references;
   /* 014 */   }
   /* 015 */
   /* 016 */   public void init(int index, scala.collection.Iterator[] inputs) {
   /* 017 */     partitionIndex = index;
   /* 018 */     this.inputs = inputs;
   /* 019 */     inputadapter_input_0 = inputs[0];
   /* 020 */     filter_mutableStateArray_0[0] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(2, 0);
   /* 021 */
   /* 022 */   }
   /* 023 */
   /* 024 */   protected void processNext() throws java.io.IOException {
   /* 025 */     while ( inputadapter_input_0.hasNext()) {
   /* 026 */       InternalRow inputadapter_row_0 = (InternalRow) inputadapter_input_0.next();
   /* 027 */
   /* 028 */       do {
   /* 029 */         boolean inputadapter_isNull_0 = inputadapter_row_0.isNullAt(0);
   /* 030 */         int inputadapter_value_0 = inputadapter_isNull_0 ?
   /* 031 */         -1 : (inputadapter_row_0.getInt(0));
   /* 032 */
   /* 033 */         boolean filter_value_2 = !inputadapter_isNull_0;
   /* 034 */         if (!filter_value_2) continue;
   /* 035 */
   /* 036 */         boolean filter_value_3 = false;
   /* 037 */         filter_value_3 = inputadapter_value_0 > 0;
   /* 038 */         if (!filter_value_3) continue;
   /* 039 */
   /* 040 */         ((org.apache.spark.sql.execution.metric.SQLMetric) references[0] /* numOutputRows */).add(1);
   /* 041 */
   /* 042 */         boolean inputadapter_isNull_1 = inputadapter_row_0.isNullAt(1);
   /* 043 */         int inputadapter_value_1 = inputadapter_isNull_1 ?
   /* 044 */         -1 : (inputadapter_row_0.getInt(1));
   /* 045 */         filter_mutableStateArray_0[0].reset();
   /* 046 */
   /* 047 */         filter_mutableStateArray_0[0].zeroOutNullBytes();
   /* 048 */
   /* 049 */         filter_mutableStateArray_0[0].write(0, inputadapter_value_0);
   /* 050 */
   /* 051 */         if (inputadapter_isNull_1) {
   /* 052 */           filter_mutableStateArray_0[0].setNullAt(1);
   /* 053 */         } else {
   /* 054 */           filter_mutableStateArray_0[0].write(1, inputadapter_value_1);
   /* 055 */         }
   /* 056 */         append((filter_mutableStateArray_0[0].getRow()));
   /* 057 */
   /* 058 */       } while(false);
   /* 059 */       if (shouldStop()) return;
   /* 060 */     }
   /* 061 */   }
   /* 062 */
   /* 063 */ }
   ```
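   For context, the mode strings accepted by `ExplainMode.fromString` are "simple", "extended", "codegen", "cost", and "formatted". The following is only an illustrative usage sketch (file paths are made up, not part of this patch) that dumps the plan once per mode:
   ```
   import org.apache.spark.sql.execution.ExplainMode

   // Write the plan of `df` once per supported explain mode.
   // Output paths here are illustrative only.
   Seq("simple", "extended", "codegen", "cost", "formatted").foreach { mode =>
     df.queryExecution.debug.toFile(
       s"/tmp/plan_$mode.txt",
       explainMode = ExplainMode.fromString(mode))
   }
   ```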
   ### Why are the changes needed?
   It enhances the usability of `debug.toFile(...)` by letting users choose the explain output format.
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   Added a test in `QueryExecutionSuite`.
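   A sketch of the kind of check such a test might make (hypothetical test name and assertions, not the actual test in the patch):
   ```
   // Hypothetical sketch of a QueryExecutionSuite-style test; assumes the
   // withTempDir helper and a SparkSession `spark` from the test harness.
   test("debug.toFile honors the explainMode argument") {
     withTempDir { dir =>
       val path = s"${dir.getCanonicalPath}/plan.txt"
       val df = spark.range(0, 10).filter("id > 0")
       df.queryExecution.debug.toFile(
         path, explainMode = ExplainMode.fromString("formatted"))
       // The "formatted" mode output should start with a physical plan header.
       val contents = scala.io.Source.fromFile(path).getLines().toSeq
       assert(contents.exists(_.startsWith("== Physical Plan ==")))
     }
   }
   ```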
   

