Hm, I need to check whether statistics are enabled for this table and up to date. I'll look into it.
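For reference, that check might look roughly like the HiveQL below. This is only a sketch: it assumes the table is literally named my_table and that table-level statistics are the ones that matter here. The statements are standard Hive commands, but which parameters show up in the output can vary with the Hive version.

```sql
-- Print the current value of automatic statistics gathering
SET hive.stats.autogather;

-- The "Table Parameters" section of the output lists the stored statistics
-- (for example numRows, rawDataSize, totalSize, COLUMN_STATS_ACCURATE)
DESCRIBE FORMATTED my_table;

-- Recompute table-level statistics if they are missing or stale
ANALYZE TABLE my_table COMPUTE STATISTICS;

-- Optionally compute column-level statistics as well
ANALYZE TABLE my_table COMPUTE STATISTICS FOR COLUMNS;
```

Note that the ANALYZE statements may themselves launch a job on the cluster.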
I don't know if I was clear in my previous message, but I am surprised that a job is launched just by running select * from my_table. I thought a select * from my_table did not run any MR jobs.

Best regards.

Tale.

On Mon, Mar 21, 2016 at 4:48 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Well, I use Spark as the engine.
>
> Now the question is: have you updated statistics on the ORC table?
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 21 March 2016 at 15:32, Tale Firefly <tale.h...@gmail.com> wrote:
>
>> Re.
>>
>> Ty ty for your answer.
>>
>> I'm using Tez as the execution engine for this query.
>> And it launches a job on YARN.
>>
>> Do you know why it launches a job just for a select when I use Tez as the execution engine?
>>
>> BR.
>>
>> Tale
>>
>> On Mon, Mar 21, 2016 at 4:17 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Your query is a table-level query that covers all rows in the table.
>>>
>>> Using ODBC you are connecting to HiveServer2, which runs on a given port.
>>>
>>> Depending on the version of Hive you are running, Hive under the bonnet is most likely using Map-Reduce as the execution engine.
>>>
>>> Data has to be collected from all blocks that hold data for this table. The underlying ORC stats can only act at table level, as there is no predicate push down, and data has to be sent to the ODBC driver through the network.
>>>
>>> The ODBC driver can only communicate with HiveServer2, so there is no connectivity to individual nodes from your client.
>>>
>>> So in summary, HiveServer2 collects data from all blocks and forwards it to the client. The actual collection and filtering of the result set in a SQL query will depend on many factors.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>> On 21 March 2016 at 14:26, Tale Firefly <tale.h...@gmail.com> wrote:
>>>
>>>> Hello guys!
>>>>
>>>> I'm trying to understand the mechanism for a simple query, select * from my_table, when using HiveServer2.
>>>>
>>>> I'm using the Hortonworks ODBC Driver for HiveServer2.
>>>> I just do a select * from my_table.
>>>> my_table is an ORC table based on files divided into blocks located on all my datanodes.
>>>> I have 50 datanodes.
>>>>
>>>> My question is the following:
>>>> Does all the data go from the datanodes to the node hosting HiveServer2 before coming back to my client?
>>>> Or does all the data go directly from the datanodes to my client?
>>>>
>>>> Hope you can help me o/
>>>>
>>>> Thank you
>>>>
>>>> Tale