Oh, my bad: even with the execution engine set to MR, my query turns into an MR job.
I'm going to run more tests with the Hive CLI, beeline, and Excel to check whether this behaviour is linked to the ODBC driver.

BR.

Tale.

On Mon, Mar 21, 2016 at 4:56 PM, Tale Firefly <tale.h...@gmail.com> wrote:

> Hm, I need to check if statistics are enabled for this table and
> up-to-date.
> I'm going to check this.
>
> I don't know if I was clear in my previous statement, but I am surprised
> that a job is launched just by doing a select * from my_table.
> I thought a select * from my_table was not running any MR jobs.
>
> Best regards.
>
> Tale.
>
> On Mon, Mar 21, 2016 at 4:48 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Well, I use Spark as the engine.
>>
>> Now the question is: have you updated statistics on the ORC table?
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>> On 21 March 2016 at 15:32, Tale Firefly <tale.h...@gmail.com> wrote:
>>
>>> Re.
>>>
>>> Ty ty for your answer.
>>>
>>> I'm using Tez as execution engine for this query.
>>> And it launches a job to yarn.
>>>
>>> Do you know why it launches a job just for a select when I use Tez as
>>> execution engine?
>>>
>>> BR.
>>>
>>> Tale
>>>
>>> On Mon, Mar 21, 2016 at 4:17 PM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Your query is a table level query that covers all rows in the table.
>>>>
>>>> Using ODBC you are connecting to HiveServer2, which runs on a given
>>>> port.
>>>>
>>>> Depending on the version of Hive you are running, Hive under the
>>>> bonnet is most likely using Map-Reduce as the execution engine.
>>>>
>>>> Data has to be collected from all blocks that hold data for this table.
>>>> The underlying ORC stats can only act at table level as there is no
>>>> predicate push down and data has to be sent to the ODBC driver through the
>>>> network.
>>>>
>>>> The ODBC driver can only communicate with HiveServer2, so there is no
>>>> connectivity to individual nodes from your client.
>>>>
>>>> So in summary HiveServer2 collects data from all blocks and forwards
>>>> it to the client. The actual collection and filtering of the result set in
>>>> the SQL query will depend on many factors.
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> LinkedIn
>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>> On 21 March 2016 at 14:26, Tale Firefly <tale.h...@gmail.com> wrote:
>>>>
>>>>> Hello guys!
>>>>>
>>>>> I'm trying to understand the mechanism for a simple query select *
>>>>> from my_table when using HiveServer2.
>>>>>
>>>>> I'm using the Hortonworks ODBC Driver for HiveServer2.
>>>>> I just do a select * from my_table.
>>>>> my_table is an ORC table based on files divided into blocks located on
>>>>> all my datanodes.
>>>>> I have 50 datanodes.
>>>>>
>>>>> My question is the following:
>>>>> Does all the data go from the datanodes to the node hosting the
>>>>> HiveServer2 before coming back to my client?
>>>>> Or does all the data go directly from the datanodes to my client?
>>>>>
>>>>> Hope you can help me o/
>>>>>
>>>>> Thank you
>>>>>
>>>>> Tale
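
As a footnote to the question in this thread about why a plain select * from my_table launches a job: here is a minimal sketch of what could be checked from beeline or the Hive CLI. It assumes my_table is not partitioned. Whether fetch task conversion actually explains the behaviour seen here is an assumption; nothing in the thread confirms it.

```sql
-- Show which execution engine the session is using (mr, tez or spark).
SET hive.execution.engine;

-- Show whether simple queries may be served by a fetch task instead of a job
-- (none, minimal or more), and the input-size threshold in bytes above which
-- Hive launches a job anyway.
SET hive.fetch.task.conversion;
SET hive.fetch.task.conversion.threshold;

-- Inspect the plan. A plan containing only a Fetch Operator stage typically
-- means no job is submitted to YARN; a Map Reduce or Tez stage means one is.
EXPLAIN SELECT * FROM my_table;

-- Refresh table-level statistics, as suggested earlier in the thread
-- (assumption: my_table is not partitioned).
ANALYZE TABLE my_table COMPUTE STATISTICS;
```

Running the same statements through the ODBC driver and through beeline would show whether the two clients end up with different session settings.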