Re: Review Request 71811: Extract Compiler from Driver

2019-11-30 Thread Miklos Gergely


> On Nov. 28, 2019, 12:39 p.m., Zoltan Haindrich wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
> > Lines 165 (patched)
> > 
> >
> > I think DriverState should live at the "Driver" level; and not get 
> > mixed into other classes (followup?)

Unfortunately the failure of some tests showed me that driverState must be in 
Compiler too. Compiler is a step of the driver, and during that step some state 
modifications are happening, thus it should be sent to the compiler to mark 
them.


- Miklos


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71811/#review218846
---


On Dec. 1, 2019, 7:12 a.m., Miklos Gergely wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71811/
> ---
> 
> (Updated Dec. 1, 2019, 7:12 a.m.)
> 
> 
> Review request for hive and Zoltan Haindrich.
> 
> 
> Bugs: HIVE-22526
> https://issues.apache.org/jira/browse/HIVE-22526
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The Driver class contains ~600 lines of code responsible for compiling the 
> command. That means that from the command String a Plan needs to be created, 
> and also a transaction needs to be started (in most of the cases). This is a 
> thing done by the compile function, which has a lot of sub functions to help 
> this task, while itself is also really big. All these codes should be put 
> into a separate class, where it can do it's job without getting mixed with 
> the other codes in the Driver.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Compiler.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 54d12baa96 
>   ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java 1afcfc8969 
>   ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java 26e904af0b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 4d79ebc933 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java fa6c9d03ec 
> 
> 
> Diff: https://reviews.apache.org/r/71811/diff/4/
> 
> 
> Testing
> ---
> 
> All the tests are still running fine.
> 
> 
> Thanks,
> 
> Miklos Gergely
> 
>



Re: Review Request 71811: Extract Compiler from Driver

2019-11-30 Thread Miklos Gergely

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71811/
---

(Updated Dec. 1, 2019, 7:12 a.m.)


Review request for hive and Zoltan Haindrich.


Bugs: HIVE-22526
https://issues.apache.org/jira/browse/HIVE-22526


Repository: hive-git


Description
---

The Driver class contains ~600 lines of code responsible for compiling the 
command. That means that from the command String a Plan needs to be created, 
and also a transaction needs to be started (in most of the cases). This is a 
thing done by the compile function, which has a lot of sub functions to help 
this task, while itself is also really big. All these codes should be put into 
a separate class, where it can do it's job without getting mixed with the other 
codes in the Driver.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/Compiler.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 54d12baa96 
  ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java 1afcfc8969 
  ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java 26e904af0b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 4d79ebc933 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java fa6c9d03ec 


Diff: https://reviews.apache.org/r/71811/diff/4/

Changes: https://reviews.apache.org/r/71811/diff/3-4/


Testing
---

All the tests are still running fine.


Thanks,

Miklos Gergely



Re: Review Request 71811: Extract Compiler from Driver

2019-11-30 Thread Miklos Gergely

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71811/
---

(Updated Nov. 30, 2019, 10:54 a.m.)


Review request for hive and Zoltan Haindrich.


Bugs: HIVE-22526
https://issues.apache.org/jira/browse/HIVE-22526


Repository: hive-git


Description
---

The Driver class contains ~600 lines of code responsible for compiling the 
command. That means that from the command String a Plan needs to be created, 
and also a transaction needs to be started (in most of the cases). This is a 
thing done by the compile function, which has a lot of sub functions to help 
this task, while itself is also really big. All these codes should be put into 
a separate class, where it can do it's job without getting mixed with the other 
codes in the Driver.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/Compiler.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 54d12baa96 
  ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java 1afcfc8969 
  ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java 26e904af0b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 4d79ebc933 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java fa6c9d03ec 


Diff: https://reviews.apache.org/r/71811/diff/3/

Changes: https://reviews.apache.org/r/71811/diff/2-3/


Testing
---

All the tests are still running fine.


Thanks,

Miklos Gergely



Re: Review Request 71811: Extract Compiler from Driver

2019-11-29 Thread Miklos Gergely

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71811/
---

(Updated Nov. 29, 2019, 9:47 p.m.)


Review request for hive and Zoltan Haindrich.


Bugs: HIVE-22526
https://issues.apache.org/jira/browse/HIVE-22526


Repository: hive-git


Description
---

The Driver class contains ~600 lines of code responsible for compiling the 
command. That means that from the command String a Plan needs to be created, 
and also a transaction needs to be started (in most of the cases). This is a 
thing done by the compile function, which has a lot of sub functions to help 
this task, while itself is also really big. All these codes should be put into 
a separate class, where it can do it's job without getting mixed with the other 
codes in the Driver.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/Compiler.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java bb41c15bb4 
  ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java 1afcfc8969 
  ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java 26e904af0b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 4d79ebc933 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java fa6c9d03ec 


Diff: https://reviews.apache.org/r/71811/diff/2/

Changes: https://reviews.apache.org/r/71811/diff/1-2/


Testing
---

All the tests are still running fine.


Thanks,

Miklos Gergely



Re: Review Request 71811: Extract Compiler from Driver

2019-11-29 Thread Miklos Gergely


> On Nov. 28, 2019, 12:39 p.m., Zoltan Haindrich wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
> > Lines 167 (patched)
> > 
> >
> > I don't feel this closely connected to compilation; queryId could be 
> > assigned in the driver (followup?)

Moved queryId related stuff back to Driver.


- Miklos


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71811/#review218846
---


On Nov. 29, 2019, 9:47 p.m., Miklos Gergely wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71811/
> ---
> 
> (Updated Nov. 29, 2019, 9:47 p.m.)
> 
> 
> Review request for hive and Zoltan Haindrich.
> 
> 
> Bugs: HIVE-22526
> https://issues.apache.org/jira/browse/HIVE-22526
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The Driver class contains ~600 lines of code responsible for compiling the 
> command. That means that from the command String a Plan needs to be created, 
> and also a transaction needs to be started (in most of the cases). This is a 
> thing done by the compile function, which has a lot of sub functions to help 
> this task, while itself is also really big. All these codes should be put 
> into a separate class, where it can do it's job without getting mixed with 
> the other codes in the Driver.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Compiler.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java bb41c15bb4 
>   ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java 1afcfc8969 
>   ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java 26e904af0b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 4d79ebc933 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java fa6c9d03ec 
> 
> 
> Diff: https://reviews.apache.org/r/71811/diff/2/
> 
> 
> Testing
> ---
> 
> All the tests are still running fine.
> 
> 
> Thanks,
> 
> Miklos Gergely
> 
>



Re: Review Request 71811: Extract Compiler from Driver

2019-11-29 Thread Miklos Gergely

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71811/#review218856
---




ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 101 (patched)


Good point, let's put it into this patch. I've moved some functions into 
the DriverUtils, so now both Compiler and Driver can access them. Some other 
function call are put back into Driver.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 106 (patched)


sure, done



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 125 (patched)


That was what I had in mind at first. Later I realized that the AST tree 
can not be the return of the parse function, as it may be modified later in the 
analyze function. Still I agree with you, let's have the semantic analyzer and 
the query plan at least as local variables, result of the functions analyze and 
createPlan. I also agree on returning the plan, it is making it cleaner indeed.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 132 (patched)


Got rid of parseError and compileError as global variables, now they are 
local variables in compile. In cleanUp compileError was and still not always 
true, as cleanUp is also invoked in the finally block if there was no error at 
all.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 161 (patched)


Moved back to Driver.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 165 (patched)


Moved back to Driver.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 167 (patched)


Moved back to Driver.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 188 (patched)


I've moved everything back to the driver except the processing of the query 
string itself, I believe it may still should be here in the compiler. Let me 
know what you think.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 193 (patched)


Ensuring of the context moved to Driver.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 498 (patched)


Agree, moved it there.



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 565 (patched)


Moved it to Hive. Nice catch!


- Miklos Gergely


On Nov. 29, 2019, 9:47 p.m., Miklos Gergely wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71811/
> ---
> 
> (Updated Nov. 29, 2019, 9:47 p.m.)
> 
> 
> Review request for hive and Zoltan Haindrich.
> 
> 
> Bugs: HIVE-22526
> https://issues.apache.org/jira/browse/HIVE-22526
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The Driver class contains ~600 lines of code responsible for compiling the 
> command. That means that from the command String a Plan needs to be created, 
> and also a transaction needs to be started (in most of the cases). This is a 
> thing done by the compile function, which has a lot of sub functions to help 
> this task, while itself is also really big. All these codes should be put 
> into a separate class, where it can do it's job without getting mixed with 
> the other codes in the Driver.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Compiler.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java bb41c15bb4 
>   ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java 1afcfc8969 
>   ql/src/java/org/apache/hadoop/hive/ql/DriverUtils.java 26e904af0b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 4d79ebc933 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java fa6c9d03ec 
> 
> 
> Diff: https://reviews.apache.org/r/71811/diff/2/
> 
> 
> Testing
> ---
> 
> All the tests are still running fine.
> 
> 
> Thanks,
> 
> Miklos Gergely
> 
>



Re: Review Request 71811: Extract Compiler from Driver

2019-11-28 Thread Zoltan Haindrich

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71811/#review218846
---




ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 101 (patched)


it would be great to reduce the class tangle between Driver and this 
Compiler class - not sure if that's possible in this patch or not...



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 106 (patched)


nit: can we have this as an apidoc instead of some comments?



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 125 (patched)


I think `plan` should be a local variable; and this method could return it; 
so that compile produces something :)
...and similarily for other fields could probably be made local variables - 
local variables make it more clear what's happening (to me at least); and they 
also help in making the code "drop-to-frame" friendly - which is very handy 
during debugging sessions...



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 132 (patched)


I haven't checked it; this might be something which was even there before:

* parse() throws ParseError (which is not cpe)
* before returning with the exception it sets parseError boolean
* handleException sets compileError unconditionally
* cleanUp does some conditional on parseError; and uses compileError - but 
that is true in that case unconditionally

these parseError/compileError booleans seem to be bad to me...



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 161 (patched)


this doesn't belong here; I think it is safeto leave this reset thing in 
the Driver; and not pollute this new class with the "resetTaskIds" thing; or 
there is something I'm not considering?



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 165 (patched)


I think DriverState should live at the "Driver" level; and not get mixed 
into other classes (followup?)



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 167 (patched)


I don't feel this closely connected to compilation; queryId could be 
assigned in the driver (followup?)



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 188 (patched)


okay...after I've made a few comments in this method right now I kinda feel 
that we should try to leave this "initialize" outside in the Driver somewhere; 
it kind works only on the "driverContext" anyway...



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 193 (patched)


I don't think this should be here; can't we have a context when we enter 
the compiler?



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 498 (patched)


I think it might make sense to relocate this method into the `ExplainTask` 
itself; instead of keeping it here...



ql/src/java/org/apache/hadoop/hive/ql/Compiler.java
Lines 565 (patched)


the method `Driver.dumpMetaCallTimingWithoutEx` can be made static; however 
even with that modifier I'm not sure where to put itright now it doesn't 
seem to belong to neither of Driver/Compiler class


- Zoltan Haindrich


On Nov. 25, 2019, 10:24 a.m., Miklos Gergely wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71811/
> ---
> 
> (Updated Nov. 25, 2019, 10:24 a.m.)
> 
> 
> Review request for hive and Zoltan Haindrich.
> 
> 
> Bugs: HIVE-22526
> https://issues.apache.org/jira/browse/HIVE-22526
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The Driver class contains ~600 lines of code responsible for compiling the 
> command. That means that from the command String a Plan needs to be created, 
> and also a transaction needs to be started (in most of the cases). This is a 
> thing done by the compile function, which has a lot of sub functions to help 
> this task, while itself is also really big. All these codes should be put 
> into a separate class, where it can do it's job without getting mixed with 
> the other codes in the Driver.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Compiler.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java bb41c15bb4 
>   

Review Request 71811: Extract Compiler from Driver

2019-11-25 Thread Miklos Gergely

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71811/
---

Review request for hive and Zoltan Haindrich.


Bugs: HIVE-22526
https://issues.apache.org/jira/browse/HIVE-22526


Repository: hive-git


Description
---

The Driver class contains ~600 lines of code responsible for compiling the 
command. That means that from the command String a Plan needs to be created, 
and also a transaction needs to be started (in most of the cases). This is a 
thing done by the compile function, which has a lot of sub functions to help 
this task, while itself is also really big. All these codes should be put into 
a separate class, where it can do it's job without getting mixed with the other 
codes in the Driver.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/Compiler.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java bb41c15bb4 
  ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java 1afcfc8969 


Diff: https://reviews.apache.org/r/71811/diff/1/


Testing
---

All the tests are still running fine.


Thanks,

Miklos Gergely