[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741980#comment-13741980 ]

Jaideep Dhok commented on HIVE-4569:

I have put up a patch with the work done so far. In this patch, ExecuteStatement and ExecuteStatementAsync are two separate calls. The patch also includes GetQueryPlan.

GetQueryPlan api in Hive Server2
Key: HIVE-4569
URL: https://issues.apache.org/jira/browse/HIVE-4569
Project: Hive
Issue Type: Bug
Components: HiveServer2
Reporter: Amareshwari Sriramadasu
Assignee: Jaideep Dhok
Attachments: git-4569.patch, HIVE-4569.D10887.1.patch, HIVE-4569.D11469.1.patch, HIVE-4569.D12231.1.patch, HIVE-4569.D12237.1.patch, HIVE-4569.D12333.1.patch

It would be nice to have GetQueryPlan as a thrift api. I do not see a GetQueryPlan api in HiveServer2, although the wiki https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API lists it; I am not sure why it was not added.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742335#comment-13742335 ]

Henry Robinson commented on HIVE-4569:

As an alternative suggestion, what about considering a {{WaitUntilComplete(TOperationStatus)}} call? The benefit is that there would immediately be a way to block on the result of every operation (rather than adding {{*Async}} APIs to the interface and doubling its size). Then {{executeStatement}} doesn't need to change its documented semantics, and Hive can immediately be compatible by making {{WaitUntilComplete}} a no-op until asynchronous support is completely ready.

I also agree that it might be worth splitting this discussion into a separate JIRA.
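To make the suggestion concrete, here is a minimal sketch of a single blocking call that works for every operation. It is not taken from the patch: the `Operation` class, `execute_statement`, and the state names are hypothetical stand-ins, and the server is simulated in memory rather than through Thrift.

```python
import threading


class Operation:
    """Hypothetical stand-in for a server-side operation (not an HS2 class)."""

    def __init__(self):
        self._done = threading.Event()
        self.state = "RUNNING"

    def finish(self, state="FINISHED"):
        # Called by the worker when execution ends.
        self.state = state
        self._done.set()

    def wait_until_complete(self, timeout=None):
        # Blocks until the operation is finished or the timeout expires,
        # then returns whatever state is visible at that point.
        self._done.wait(timeout)
        return self.state


def execute_statement(run, background):
    """Start `run` in the background if requested, otherwise inline."""
    op = Operation()

    def work():
        run()
        op.finish()

    if background:
        threading.Thread(target=work, daemon=True).start()
    else:
        work()  # synchronous server: the operation is finished on return
    return op
```

With a synchronous server the operation is already finished when `execute_statement` returns, so `wait_until_complete` returns at once, which is the "no-op" behaviour described above. Client code written against the blocking call keeps working once the server starts running statements in the background.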
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742735#comment-13742735 ]

Vaibhav Gumashta commented on HIVE-4569:

[~ashutoshc][~henryr][~jaid...@research.iiit.ac.in] [~thejas] I definitely think async execute is ready, and it would be a good idea to get it in while we discuss the concerns about GetQueryPlan/TaskStatus. Without splitting, it might be hard to focus on each. While reviewing this patch, I was grouping the changes into two sets, and I have a document that summarizes the changes in each group (1. ExecuteAsync 2. GetQueryPlan + TaskStatus). I can upload it if you find it useful (if we decide on splitting, we can use it to see what we want in each JIRA).
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740760#comment-13740760 ]

Thejas M Nair commented on HIVE-4569:

bq. I will put up a new request, and keep updating it if there are further comments?
Sounds good. Looking forward to it. And thanks for working on this!

Regarding [~vaibhavgumashta]'s comment about GetQueryPlan backward compatibility: we need to examine what guarantees can be given about the backward compatibility of the json string query plan. Is the thrift json structure stable if used by generic json parsers? I think we should at least state that the operator types and stage types can change across versions.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740787#comment-13740787 ]

Thejas M Nair commented on HIVE-4569:

Here are some comments about the async execution api changes from a jdbc/odbc driver implementation perspective.

h3. jdbc/odbc requirements:
I think the asynchronous execution api is going to be very useful for jdbc/odbc as well. For a long-running query there is a higher chance of interruptions in the network connection to HS2. This is especially true for HS2 over http (HIVE-4763), where the connection might pass through http proxy servers.

The downside of the async call is that the *dbc client moves to a pull model instead of what was effectively a push model. It will have to poll, sleeping between poll requests to avoid putting too much load on the server. But this sleep can delay the notification that execution has finished. So it will be useful to support long polling in such a case to simulate a push (http://en.wikipedia.org/wiki/Push_technology#Long_polling). For clients to tell the server that they want to do a long poll, we need support for it in the HS2 api.

Another difference between the jdbc/odbc requirement and the GetOperationStatus api is that jdbc/odbc won't make use of the status of each task. Only the completion of the query execution matters from the jdbc/odbc perspective. So for odbc/jdbc the long poll should return before a 'long poll timeout' only if the query has completed.

h3. Question about api:
While the actual implementation of long poll can be in a separate follow-up jira, I thought it would be useful to discuss whether this should have an impact on the async api changes. How should we meet this odbc/jdbc need? If we follow the pattern we have followed with async execute, this would result in a new GetOperationStatusLongPoll call.

It doesn't look like this requirement will have an impact on the changes planned in this jira, but I wanted to put my thoughts out in case there are other opinions.
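The long-poll behaviour described in this comment could look roughly like the sketch below. The `OperationStatus` class, the `long_poll_timeout` parameter, and the state names are assumptions made for illustration; nothing here is an existing HS2 call. The point it shows is that the status request returns early only when the query leaves the running state, and otherwise returns when the timeout expires.

```python
import threading


class OperationStatus:
    """Hypothetical status holder for one operation; not part of the HS2 API."""

    def __init__(self):
        self._cond = threading.Condition()
        self._state = "RUNNING"

    def set_state(self, state):
        # Called by the execution thread; wakes any long-polling callers.
        with self._cond:
            self._state = state
            self._cond.notify_all()

    def get_status(self, long_poll_timeout=0.0):
        """Return the state, waiting up to long_poll_timeout seconds.

        The wait ends early only when the operation is no longer RUNNING;
        intermediate task progress does not wake the caller.
        """
        with self._cond:
            self._cond.wait_for(lambda: self._state != "RUNNING",
                                timeout=long_poll_timeout)
            return self._state
```

A timeout of zero degenerates into an ordinary status poll, so a single call could serve both kinds of client. Whether that is preferable to a separate GetOperationStatusLongPoll call is the open question raised above.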
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740809#comment-13740809 ]

Carl Steinbach commented on HIVE-4569:

[~jaideepdhok] If I call GetQueryPlan for a statement x, and then subsequently call ExecuteStatement on the same statement, is it guaranteed that ExecuteStatement will always use the same plan that was returned earlier by GetQueryPlan? The names of the functions seem to imply this, but the comments in TCLIService.thrift don't stipulate that ExecuteStatement will use the plan generated by the previous GetQueryPlan call instead of recompiling the statement and possibly creating a different plan. Adding a PrepareStatement call (e.g. PrepareStatement[, GetQueryPlan], ExecuteStatement) is one way of resolving this ambiguity, and at the same time it will help to maintain the close alignment between the HS2 API and ODBC/JDBC.
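A rough sketch of how a PrepareStatement step could remove the ambiguity: the statement is compiled once, the plan is pinned to a handle, and both GetQueryPlan and ExecuteStatement work from that handle. The `Session` class and the `compile_fn` callback are hypothetical and only model the call sequence; they are not Hive code.

```python
import itertools


class Session:
    """Hypothetical session; compile_fn stands in for the query compiler."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._plans = {}
        self._ids = itertools.count(1)

    def prepare_statement(self, statement):
        # Compile exactly once and pin the resulting plan to a handle.
        handle = next(self._ids)
        self._plans[handle] = self._compile(statement)
        return handle

    def get_query_plan(self, handle):
        # Returns the pinned plan without touching the compiler.
        return self._plans[handle]

    def execute_statement(self, handle):
        # Runs the pinned plan; no recompilation happens here.
        plan = self._plans[handle]
        return f"executed {plan}"
```

Because execution takes a handle rather than the statement text, the plan a client inspected is by construction the plan that runs.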
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740808#comment-13740808 ]

Carl Steinbach commented on HIVE-4569:

bq. I think we should at least state that the operator types and stage types can change across versions.
Good luck with that. As soon as you have a couple third-party applications that depend on this serialization format you will be locked in regardless of how many warnings you place in the code.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740811#comment-13740811 ]

Carl Steinbach commented on HIVE-4569:

bq. Thejas M Nair I think making executeStatement async by default may break users' expectations since it's a blocking call. Carl Steinbach had suggested earlier to create two separate calls executeStatement and executeStatementAsync so that the API is easier to understand. I agree with that approach. If we have two different calls, then users can pick one based on their need.

It's possible to overload ExecuteStatement to support both synchronous and asynchronous modes without breaking backward compatibility by adding an optional boolean isAsync flag to the request message and setting the default value to false. Whether or not this makes more sense than the current approach hinges largely on how many more optional variables we expect to add to the ExecuteStatement[Async] request messages in the future. If we have two functions then we'll need to make the same changes in two different places.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741259#comment-13741259 ]

Thejas M Nair commented on HIVE-4569:

bq. Good luck with that. As soon as you have a couple third-party applications that depend on this serialization format you will be locked in regardless of how many warnings you place in the code.
Yes, I agree that the risk is very real. Do we want to put these commitments on the still-young hive? Trying to keep this api backward compatible can be a big burden for hive. Should we go for something more minimal instead? Just a compile() function instead of getQueryPlan(), like what was put forward in HIVE-4321?

bq. It's possible to overload ExecuteStatement to support both synchronous and asynchronous modes without breaking backward compatibility by adding an optional boolean isAsync flag to the request message and setting the default value to false.
I am ok with having different functions for this. But I think function overloading is a more natural way of doing it. Deciding whether the call should be async based on a parameter seems a more natural way of programming than using different functions. We can either have one function with a default value or have two with the same name. That is, instead of ExecuteStatementAsync, I think having an ExecuteStatement with an additional isAsync parameter is cleaner.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741298#comment-13741298 ]

Vaibhav Gumashta commented on HIVE-4569:

[~cwsteinbach] [~thejas] With respect to overloading ExecuteStatement, I think the previous patch by [~jaid...@research.iiit.ac.in] was probably doing that. But there was a suggestion in the rb that overloading ExecuteStatement in the thrift API may not correspond to overloading in CLIService/ICLIService. Are you suggesting that the thrift api has just ExecuteStatement, and based on whether the async flag is set to true/false in the corresponding TExecuteStatementReq, we branch off to using ICLIService#executeStatementAsync or ICLIService#executeStatement?
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741324#comment-13741324 ]

Thejas M Nair commented on HIVE-4569:

bq. Whether or not this makes more sense than the current approach hinges largely on how many more optional variables we expect to add to the ExecuteStatement[Async] request messages in the future.
I think it is reasonable to expect the unexpected, i.e. expect more optional parameters to come up in the future.

This is what I am thinking. [~cwsteinbach] Please let me know if you think this is reasonable.
1. In TCLIService.thrift, as in the original patch, add an optional bool runAsync to TExecuteStatementReq.
2. In ICLIService (and its implementation CLIService), introduce an executeAsync function that gets called if runAsync==true.
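The two steps above can be sketched as follows. This is a Python model of the idea, not the Thrift IDL or the Java service code: `ExecuteStatementReq`, `CLIService`, and `handle_execute_statement` are hypothetical mirrors of the names in the thread, and the flag is assumed to default to false when a client does not set it, as proposed earlier in the discussion.

```python
from dataclasses import dataclass


@dataclass
class ExecuteStatementReq:
    """Hypothetical mirror of a request message with an optional flag."""
    statement: str
    run_async: bool = False  # an unset field behaves like the old blocking call


class CLIService:
    """Hypothetical service layer with separate sync and async entry points."""

    def execute_statement(self, statement):
        return ("sync", statement)

    def execute_statement_async(self, statement):
        return ("async", statement)


def handle_execute_statement(service, req):
    # The wire-level handler exposes a single call and branches on the flag.
    if req.run_async:
        return service.execute_statement_async(req.statement)
    return service.execute_statement(req.statement)
```

A request built by an older client never sets the flag and therefore takes the blocking path, which is what keeps the single-call design backward compatible.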
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741335#comment-13741335 ]

Vaibhav Gumashta commented on HIVE-4569:

[~thejas] The only addition I would make is setting runAsync to false by default in TExecuteStatementReq.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741823#comment-13741823 ]

Jaideep Dhok commented on HIVE-4569:

bq. [~vgumashta] What could be the use case for returning the query plan? And how will it be consumed by the client? Making it public means that any change to the query plan in future will break the consumer code.
It was outlined in the HS2 spec, but not implemented. Having a query plan is useful for tracking query progress. We have another use case where we want to access the query plan through code, but currently there's no way to do that. If you want to guard against changes to the query plan code, then the plan object needs to be declared at the thrift layer, and the implementation has to convert between the internal query plan (ql layer) and the thrift query plan (and vice versa), as is already done for data types and operation states.

bq. [~thejas] Is the thrift json structure stable if used by generic json parsers ? I think we should at least state that the operator types and stage types can change across versions.
You need the Thrift JSON parsers to encode/decode the JSON query plan into the corresponding Java object.

bq. [~cwsteinbach] If I call GetQueryPlan for a statement x, and then subsequently call ExecuteStatement on the same statement, is it guaranteed that ExecuteStatement will always use the same plan that was returned earlier by GetQueryPlan?
Yes, unless configuration was altered between the two calls through SET operations, or the conf overlay is different.
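As an illustration of the conversion layer mentioned in this comment, the sketch below keeps an internal plan type and a separate API-level plan type, and copies across only the fields the API promises to keep. All class and field names (`InternalStage`, `WireStage`, `WirePlan`, `operator_tree`) are invented for this example and do not correspond to Hive's actual plan classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InternalStage:
    """Hypothetical internal (ql-layer) stage; free to change between releases."""
    stage_id: str
    stage_type: str
    operator_tree: object = None  # internal detail, never exposed


@dataclass
class WireStage:
    """Hypothetical API-layer stage; only these fields are a public contract."""
    stage_id: str
    stage_type: str


@dataclass
class WirePlan:
    """Hypothetical API-layer plan returned to clients."""
    query: str
    stages: List[WireStage] = field(default_factory=list)


def to_wire_plan(query, internal_stages):
    # Copy only the fields the public API promises to keep stable.
    return WirePlan(query, [WireStage(s.stage_id, s.stage_type)
                            for s in internal_stages])
```

The internal type can then gain or lose fields without changing what clients receive. The values carried in the public fields (stage and operator type names, for example) remain a compatibility commitment, which is the concern raised earlier in the thread.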
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741828#comment-13741828 ]

Jaideep Dhok commented on HIVE-4569:

[~thejas] I think having two calls with different names, ExecuteStatement and ExecuteStatementAsync, will be less confusing for the user.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741952#comment-13741952 ]

Ashutosh Chauhan commented on HIVE-4569:

It seems there is a general consensus that async execute statement is a good idea. So let's unblock it and get that part of the patch in. In the meantime we can continue to discuss how to add getQueryPlan. [~jaideepdhok] I understand we have gone back and forth on doing these two issues in one patch vs. multiple, but splitting looks like a good way to make progress. If you agree, can you put up a patch containing only async execute statement?
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739276#comment-13739276 ]

Jaideep Dhok commented on HIVE-4569:

[~vgumashta] Initially it was split into three JIRAs, but other people suggested that it would be easier to track progress in a single JIRA. I've completed most of the changes, and have updated the patch based on the last review by [~cwsteinbach].
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739279#comment-13739279 ]

Jaideep Dhok commented on HIVE-4569:

Sorry for the duplicate review request. Please refer to the last one.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739334#comment-13739334 ]

Thejas M Nair commented on HIVE-4569:

[~jaideepdhok] The patches in the phabricator links look incomplete; for example, service/if/TCLIService.thrift is missing. Can you update the patch in the phabricator link that has the original review comments (https://reviews.facebook.net/D11469)? That way it is easier to track changes across patches. Having a new phabricator link for each patch iteration makes it difficult to follow the changes between patches.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739363#comment-13739363 ]

Thejas M Nair commented on HIVE-4569:

[~jaideepdhok] [~cwsteinbach] Should we keep the api simple (small) by just making the current execute function asynchronous instead of adding an additional execute function to the api? I think [~henryr] has a good point that it was always documented to be asynchronous (it just happened that the call always returned so late that the operation had already finished :) ).

Also, I think it makes sense to make the GetResultSetMetadata and FetchResults apis block until the operation finishes, instead of throwing an error if the status is not FINISHED. This will also help to prevent breakage of any user code that was written with the assumption that execute is blocking.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739385#comment-13739385 ] Jaideep Dhok commented on HIVE-4569:
bq. Having a new phabricator link for each patch iteration makes it difficult to follow the changes between patches.
[~thejas] It looks like the changes got split into two requests. Unfortunately I am unable to update the previous revision, as I have lost the previous arc commit. I will put up a new request and keep updating it if there are further comments.
bq. Should we keep the api simple (small) by just making the current execute function asynchronous instead of adding an additional execute function in the api ?
[~thejas] I think making executeStatement async by default may break users' expectations, since it is currently a blocking call. [~cwsteinbach] had suggested earlier that we create two separate calls, executeStatement and executeStatementAsync, so that the API is easier to understand. I agree with that approach. If we have two different calls, users can pick one based on their needs. To get the result set in the async case, the flow would be: ExecuteStatementAsync, then GetOperationStatus (until the query completes), then fetch the result set.
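The async flow described above (submit, poll the status until the query completes, then fetch) can be sketched in Java as follows. This is only an illustration under stated assumptions: `AsyncQueryClient`, `OpState`, `FakeClient`, and `AsyncFlow.runAndFetch` are hypothetical names, the string handle stands in for an operation handle, and the in-memory fake replaces a real Thrift client.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical operation states; the real API has its own state enum.
enum OpState { RUNNING, FINISHED, ERROR }

// Hypothetical client interface mirroring the three steps of the flow.
interface AsyncQueryClient {
    String executeStatementAsync(String statement); // returns an operation handle
    OpState getOperationStatus(String handle);
    List<String> fetchResults(String handle);
}

// In-memory stand-in that reports RUNNING for a fixed number of polls.
class FakeClient implements AsyncQueryClient {
    private int pollsUntilDone;

    FakeClient(int pollsUntilDone) { this.pollsUntilDone = pollsUntilDone; }

    public String executeStatementAsync(String statement) { return "op-1"; }

    public OpState getOperationStatus(String handle) {
        return pollsUntilDone-- > 0 ? OpState.RUNNING : OpState.FINISHED;
    }

    public List<String> fetchResults(String handle) { return Arrays.asList("row1", "row2"); }
}

class AsyncFlow {
    // Submits the statement, polls until it leaves RUNNING, then fetches the results.
    static List<String> runAndFetch(AsyncQueryClient client, String sql, long pollMillis) {
        String handle = client.executeStatementAsync(sql);
        OpState state = client.getOperationStatus(handle);
        while (state == OpState.RUNNING) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while polling", e);
            }
            state = client.getOperationStatus(handle);
        }
        if (state != OpState.FINISHED) {
            throw new IllegalStateException("operation ended in state " + state);
        }
        return client.fetchResults(handle);
    }
}
```

A caller that wants the old blocking behaviour could wrap the three steps exactly like `runAndFetch` does, which is one argument for keeping both calls in the API.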
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739420#comment-13739420 ] Vaibhav Gumashta commented on HIVE-4569:
[~thejas] I think you mean that by making the GetResultSetMetadata and FetchResults APIs blocking, we can change executeStatement to async by default and at the same time not break any user code?
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739435#comment-13739435 ] Amareshwari Sriramadasu commented on HIVE-4569:
I think it makes sense to have two APIs: JDBC drivers can call the sync one, and other users interested in async execution can call the async one. The documentation of execute() would then have to be changed to say that it executes synchronously.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740420#comment-13740420 ] Vaibhav Gumashta commented on HIVE-4569:
[~jaid...@research.iiit.ac.in] [~amareshwari] I have a backward-compatibility concern about the GetQueryPlan api that we are exposing. What is the use case for returning the query plan, and how will the client consume it? Making it public means that any future change to the query plan will break consumer code.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738575#comment-13738575 ] Vaibhav Gumashta commented on HIVE-4569:
It seems that this JIRA is handling two different use cases:
1. Implement ExecuteStatement asynchronously (and the related GetOperationStatus api)
2. Implement GetQueryPlan api.
These look like fairly independent features. How about we split this into 2 JIRAs so that each gets an independent, focused discussion? Also, I can volunteer to continue the work. Thanks.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736885#comment-13736885 ] Henry Robinson commented on HIVE-4569:
Although {{executeStatement}} is implemented synchronously in Hive, was it meant to be synchronous from the outset? The comment in the Thrift definition suggests otherwise:
{code}
// ExecuteStatement()
//
// Execute a statement.
// The returned OperationHandle can be used to check on the
// status of the statement, and to fetch results once the
// statement has finished executing.
{code}
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699773#comment-13699773 ] Jaideep Dhok commented on HIVE-4569:
{quote}
Thrift makes it easy to add additional optional parameters without breaking backward compatibility, but not Java. I'd recommend creating a new executeStatementAsync call to ICLIService (and here) instead of modifying the method signature. Also, that probably indicates that we should add a new complementary RPC to the HS2 Thrift IDL instead of adding an optional parameter to ExecuteStatement just to keep these things in sync.
{quote}
Do we need explicit new request and response objects for both the executeStatement and executeStatementAsync calls? I think the same request call should do?
Also, I found in the code that the conf overlay is not actually being applied before executing an operation. I suppose there should be another JIRA for that.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699775#comment-13699775 ] Jaideep Dhok commented on HIVE-4569:
bq. I think the same request call should do?
Sorry, I meant the same request and response objects used in ExecuteStatement at the moment.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13695920#comment-13695920 ] Phabricator commented on HIVE-4569:
cwsteinbach has commented on the revision HIVE-4569 [jira] GetQueryPlan api in Hive Server2.

INLINE COMMENTS

ql/src/java/org/apache/hadoop/hive/ql/TaskStatus.java:1
Missing ASF license header.

ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java:1019
This looks like a debug statement. Should it be removed?

ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java:95
Can you add some comments here explaining what each one of these states actually means? Also, do we need an UNKNOWN state? I included one in the Thrift IDL OperationState, but in retrospect that was probably a mistake.

service/if/TCLIService.thrift:34
As discussed earlier we shouldn't add this dependency to the HS2 API. Please remove it and return the Task information in JSON or XML.

service/if/TCLIService.thrift:41
We need to bump the version number since this patch extends the HS2 API with new functionality. Can you also please add a comment here briefly summarizing what was added in the new version?

service/if/TCLIService.thrift:594
Thrift allows you to specify default values for optional fields. I think we should set this value to 'false' by default.

service/if/TCLIService.thrift:866
Just want to double-check that TTaskState and TTaskStatus will be removed since the plan state will be serialized as JSON or XML, right?

service/if/TCLIService.thrift:1003
Where is TGetQueryPlanReq? The comments at the top stipulate that every RPC has its own req/resp message pair.

service/if/TCLIService.thrift:1006
Just double-checking that this will be changed to a string.

service/if/TCLIService.thrift:1043
Please don't overload TExecuteStatementReq.

service/src/java/org/apache/hive/service/cli/CLIService.java:149
Thrift makes it easy to add additional optional parameters without breaking backward compatibility, but not Java. I'd recommend creating a new executeStatementAsync call to ICLIService (and here) instead of modifying the method signature. Also, that probably indicates that we should add a new complementary RPC to the HS2 Thrift IDL instead of adding an optional parameter to ExecuteStatement just to keep these things in sync.

service/src/java/org/apache/hive/service/cli/CLIService.java:318
s/:getQueryPlan/: getQueryPlan/

ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java:367
I don't think this method is thread-safe. I recommend replacing the four boolean state variables (started, initialized, isdone, queued, wth??) with the single TaskState enum you added and making sure that all access to this state variable is synchronized.

REVISION DETAIL
https://reviews.facebook.net/D11469

To: JIRA, jaideepdhok
Cc: cwsteinbach
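The last inline comment (one enum-valued state variable with synchronized access, in place of several booleans) can be sketched as below. This is a hypothetical illustration, not the code in Task.java: the class name `TaskStateHolder`, the `transition` helper, and the exact set of enum values are assumptions made for the example.

```java
// Hypothetical task states; the actual TaskState enum in the patch may differ.
enum TaskState { INITIALIZED, QUEUED, RUNNING, FINISHED }

// Holds a single state variable; every read and write goes through a synchronized method.
class TaskStateHolder {
    private TaskState state = TaskState.INITIALIZED;

    synchronized TaskState getState() {
        return state;
    }

    // Moves to 'next' only if the current state is 'expected'.
    // Returns true when the transition happened, false otherwise.
    synchronized boolean transition(TaskState expected, TaskState next) {
        if (state != expected) {
            return false;
        }
        state = next;
        return true;
    }
}
```

Because the check and the update happen under one lock, two threads cannot both move the same task out of the same state, which is the kind of race that independent boolean flags allow.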
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13695923#comment-13695923 ] Carl Steinbach commented on HIVE-4569:
[~jaideepdhok] I made it through half the patch and left comments on phabricator. I'll aim to get through the rest sometime this weekend. Sorry for the delay. Also, I just wanted to say thanks for tackling this problem. Support for async execution was a big hole in the API and I'm excited that it's going to be fixed soon.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694605#comment-13694605 ] Jaideep Dhok commented on HIVE-4569:
[~cwsteinbach] Do you have any comments on the rest of the patch?
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13691935#comment-13691935 ] Carl Steinbach commented on HIVE-4569:
@Jaideep: Thanks for posting an updated patch. I plan to spend some more time tonight looking this over closely, but in the meantime I wanted to raise one high-level concern. I think the HS2 Thrift API should be as self-contained as possible. In particular I don't think it's a good idea to inherit functionality from any of the quasi-public Thrift APIs that already exist (e.g. queryplan.thrift) for the following reasons:
* We version the HS2 Thrift API in order to maintain backward compatibility with older clients, and I'm worried that people will forget to bump the version number in TCLIService.thrift when they make a change in queryplan.thrift.
* One of the original design goals of HS2 was to decouple the network serialization layer from the service layer in the interest of eventually being able to easily support multiple different serialization formats (e.g. Protobufs, Avro, Thrift, etc). I think depending on queryplan.thrift will make it harder to do this.
* At the moment TCLIService.thrift doesn't expose anything that ties it directly to Hive, and I'd like to keep it that way. For example, there's no reason why we couldn't also embed the Pig language runtime in HS2 and expose it through the HS2 API (see the [AccessServer proposal|https://cwiki.apache.org/confluence/display/Hive/AccessServer+Design+Proposal] for more details). Tying the new QueryPlan RPC to queryplan.thrift will make this harder to do.
Instead of depending on queryplan.thrift I'd like to propose that TGetQueryPlanResp return a JSON or XML encoded version of the queryplan.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692731#comment-13692731 ] Jaideep Dhok commented on HIVE-4569:
[~cwsteinbach] Thanks for the reply. I was not aware of AccessServer. If we don't want a dependency on queryplan.thrift, then I guess it would make sense to use XML encoding, since there is already code to serialize/deserialize the query plan to/from XML.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692748#comment-13692748 ] Carl Steinbach commented on HIVE-4569:
[~jaideepdhok] Thrift also supports two different types of JSON serialization: TJSONProtocol and TSimpleJSONProtocol. I have no preference either way, but I've noticed that JSON seems to be more popular than XML these days.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681858#comment-13681858 ] Thejas M Nair commented on HIVE-4569:
bq. Right now I have done this by passing a boolean flag while calling executeStatement
[~jaideepdhok] I assume this is going to be an optional field, and that adding optional fields to a Thrift argument struct keeps the API backward compatible with old clients that don't set the field. Can you please confirm?
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681905#comment-13681905 ] Jaideep Dhok commented on HIVE-4569:
[~thejas] It's an optional field in the Thrift request object for execute statement.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13675698#comment-13675698 ] Jaideep Dhok commented on HIVE-4569:
Update on the work done so far -
# h5. Added getQueryPlan API with Thrift
# h5. Added support for non-blocking queries.
## Right now I have done this by passing a boolean flag while calling executeStatement
## If the flag is set to true, the query runs in non-blocking mode. The flag defaults to false.
## I've implemented this by adding a fixed-size thread pool in the OperationManager for running non-blocking operations. A reference to the future is kept in the operation, so that it can be cancelled.
## Once the query is running in the background, users can poll its status using GetOperationStatus.
## Users can cancel the query by calling CancelOperation.
# h5. Additions in GetOperationStatus
## OperationManager calls operation.getTaskStatuses(). Each operation can override this method to customize reporting.
## SQLOperation returns the task statuses by calling getTaskStatuses() on the current driver.
## Driver reports task statuses by iterating through all tasks in the plan.
## Changes in HS2 thrift API -
{code}
// GetOperationStatus()
//
// Get the status of an operation running on the server.
struct TGetOperationStatusReq {
  // Session to run this request against
  1: required TOperationHandle operationHandle
}

// State of a sub task in an operation
enum TTaskState {
  // The task has been initialized
  INITIALIZED_STATE,
  // Driver is currently running the task
  RUNNING_STATE,
  // Task is completed
  FINISHED_STATE,
  // Task is queued in the driver
  QUEUED_STATE,
  // State is unknown
  UNKOWN_STATE
}

// Status of a sub task in an operation
struct TTaskStatus {
  // Task ID
  1: required string taskId
  // External ID for this task, For example MapRedTask can return job ID of the Hadoop job
  2: optional string externalHandle
  // Current state of the task as seen by driver
  3: required TTaskState state
}

struct TGetOperationStatusResp {
  1: required TStatus status
  // State of the whole operation
  2: optional TOperationState operationState
  // List of statuses of sub tasks
  3: optional list<TTaskStatus> taskStatuses
}
{code}
h5. Things pending as of now
# If the Task runs in a sub-process, then the external handle (job ID) is returned as null.
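The non-blocking design described above (a fixed-size thread pool, with each operation keeping a reference to its future so it can be cancelled) can be sketched with the standard java.util.concurrent classes. This is an assumption-laden illustration, not the patch itself: `BackgroundOperation`, `BackgroundOperationManager`, `await`, and the `String` result type are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical operation that remembers the future of its background work.
class BackgroundOperation {
    private volatile Future<String> future;

    void setFuture(Future<String> f) { this.future = f; }

    // Attempts to cancel the background work, interrupting it if already running.
    boolean cancel() {
        Future<String> f = future;
        return f != null && f.cancel(true);
    }

    boolean isCancelled() {
        Future<String> f = future;
        return f != null && f.isCancelled();
    }

    // Waits for the background work and returns its result.
    String await() {
        try {
            return future.get();
        } catch (Exception e) {
            throw new IllegalStateException("operation did not complete normally", e);
        }
    }
}

// Hypothetical manager that runs operations on a fixed-size pool.
class BackgroundOperationManager {
    private final ExecutorService pool;

    BackgroundOperationManager(int poolSize) {
        pool = Executors.newFixedThreadPool(poolSize);
    }

    BackgroundOperation submit(Callable<String> work) {
        BackgroundOperation op = new BackgroundOperation();
        op.setFuture(pool.submit(work));
        return op;
    }

    void shutdown() { pool.shutdownNow(); }
}
```

A fixed pool bounds how many statements run concurrently; submissions beyond the pool size wait in the executor's queue, and `cancel` works for both queued and running work.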
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668072#comment-13668072 ] Jaideep Dhok commented on HIVE-4569:
Work from HIVE-4570 and HIVE-4617 has been moved to this issue. To restate the scope of the issue, here are the proposed changes:
# Add a GetQueryPlan Thrift API. This will return a plan object containing Stage and Task information for the query. This call will not run the query.
# Add a way to run a query asynchronously, so that query progress can be monitored without waiting for it to complete.
# Extend the OperationState struct returned by GetOperationState to include more information, such as the job IDs launched for sub-tasks and a query progress indicator.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13664636#comment-13664636 ] Carl Steinbach commented on HIVE-4569:
bq. I do not see GetQueryPlan api available in HiveServer2, though the wiki https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API contains, not sure why it was not added.
It was not added because it became clear during implementation of HiveServer2 that it was a bad idea to extend (i.e. depend on) any of the existing legacy Hive Thrift APIs. We were also narrowly focused on supporting JDBC/ODBC, and neither of these APIs provides explicit support for retrieving the execution plan.
@Jaideep: I think it would be a good idea to post some notes about how you plan to modify the HS2 Thrift API and get feedback before spending time doing the implementation work.
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13664836#comment-13664836 ] Jaideep Dhok commented on HIVE-4569:
@Carl: This change will not affect JDBC/ODBC clients. Currently clients using Thrift have no way to get the query plan, which is why we wanted to add this. Here are the changes proposed:
# Add GetQueryPlan with the same arguments as ExecuteStatement -
{code}TGetQueryPlanResp GetQueryPlan(1:TExecuteStatementReq req);{code}
# Run a SQLOperation for the request, calling Driver.compile with the statement, and return the plan object. Throw HiveSQLException with the return code of compile if it fails.
# New response type for the above call -
{code}
struct TGetQueryPlanResp {
  1: required TStatus status
  // Queryplan
  2: required queryplan.Query plan
}
{code}
We'll have to include queryplan.thrift in TCLIService.thrift for the return type.