Hello everyone. Thanks for your answers.
I'm gonna test this. Best regards. Tale On Mon, Mar 21, 2016 at 10:06 PM, Prasanth Jayachandran < pjayachand...@hortonworks.com> wrote: > Hi > > Simple select * query launches a job when the input size is >1Gb by > default. Two configs that determines if a job has to be launched > > hive.fetch.task.conversion > hive.fetch.task.conversion.threshold > > Is your table size >1GB (hive.fetch.task.conversion.threshold)? You can > see that from “describe formatted tablename”. > > Thanks > Prasanth > > On Mar 21, 2016, at 11:16 AM, Mich Talebzadeh <mich.talebza...@gmail.com> > wrote: > > You are correct. it should not. There is nothing to optimise here. > > 0: jdbc:hive2://rhes564:10010/default> > *select * from countries; *OK > INFO : Compiling > command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318): > select * from countries > INFO : Semantic Analysis Completed > INFO : Returning Hive schema: > Schema(fieldSchemas:[FieldSchema(name:countries.country_id, type:double, > comment:null), FieldSchema(name:countries.country_iso_code, type:string, > comment:null), FieldSchema(name:countries.country_name, type:string, > comment:null), FieldSchema(name:countries.country_subregion, type:string, > comment:null), FieldSchema(name:countries.country_subregion_id, > type:double, comment:null), FieldSchema(name:countries.country_region, > type:string, comment:null), FieldSchema(name:countries.country_region_id, > type:double, comment:null), FieldSchema(name:countries.country_total, > type:string, comment:null), FieldSchema(name:countries.country_total_id, > type:double, comment:null), FieldSchema(name:countries.country_name_hist, > type:string, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318); > Time taken: 0.047 seconds > INFO : Executing > command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318): > select * from countries > INFO : Completed executing > command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318); > Time taken: 0.001 seconds > INFO : OK > > Dr Mich Talebzadeh > > > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > http://talebzadehmich.wordpress.com > > > > On 21 March 2016 at 15:56, Tale Firefly <tale.h...@gmail.com> wrote: > >> Hm, I need to check if statistics are enabled for this table and >> up-to-date. >> I'm going to check this. >> >> I don't know if I was clear in my previous statement, but I am surprised >> that a job is launched just by doing a select * from my_table. >> I thought a select * from my_table was not running any MR jobs. >> >> Best regards. >> >> Tale. >> >> On Mon, Mar 21, 2016 at 4:48 PM, Mich Talebzadeh < >> mich.talebza...@gmail.com> wrote: >> >>> Well I use Spark as engine. >>> >>> Now the question is have you updated statistics on ORC table? >>> >>> HTH >>> >>> >>> >>> Dr Mich Talebzadeh >>> >>> >>> LinkedIn * >>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >>> >>> >>> http://talebzadehmich.wordpress.com >>> >>> >>> >>> On 21 March 2016 at 15:32, Tale Firefly <tale.h...@gmail.com> wrote: >>> >>>> Re. >>>> >>>> Ty ty for your answer. >>>> >>>> I'm using Tez as execution engine for this query. >>>> And it launches a job to yarn. >>>> >>>> Do you know why it launches a job just for a select when I use Tez as >>>> execution engine ? >>>> >>>> BR. >>>> >>>> Tale >>>> >>>> >>>> On Mon, Mar 21, 2016 at 4:17 PM, Mich Talebzadeh < >>>> mich.talebza...@gmail.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> Your query is a table level query that covers all rows in the table. >>>>> >>>>> Using ODBC you are connecting to Hive server 2 that runs on a given >>>>> port. >>>>> >>>>> Depending on the version of Hive you are running Hive under the >>>>> bonnet is most likely using Map-Reduce as the execution engine. >>>>> >>>>> Data has to be collected from all blocks that hold data for this >>>>> table. The underlying ORC stats can only act at table level as there is no >>>>> predicate push down and data has to be sent to ODBC driver through the >>>>> network. >>>>> >>>>> The ODBC driver can only communicate with Hive server 2 so there is no >>>>> connectivity to individual nodes from your client. >>>>> >>>>> So in summary Hive server 2 collects data from all blocks and forwards >>>>> it to the client. The actual collection and filtering of result set in SQL >>>>> query will depend on many factors. >>>>> >>>>> HTH >>>>> >>>>> Dr Mich Talebzadeh >>>>> >>>>> >>>>> LinkedIn * >>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >>>>> >>>>> >>>>> http://talebzadehmich.wordpress.com >>>>> >>>>> >>>>> >>>>> On 21 March 2016 at 14:26, Tale Firefly <tale.h...@gmail.com> wrote: >>>>> >>>>>> Hello guys ! >>>>>> >>>>>> I'm trying to understand the mechanism for a simple query select * >>>>>> from my_table when using HiveServer2. >>>>>> >>>>>> I'm using the hortonworks ODBC Driver for HiveServer2. >>>>>> I just do a select * from my_table. >>>>>> my_table is an ORC table based on files divised into blocks located >>>>>> on all my datanodes. >>>>>> I have 50 datanodes. >>>>>> >>>>>> My question is the following : >>>>>> Does all the data go from the datanodes to the node hosting the >>>>>> hiveserver2 before coming back to my client ? >>>>>> Or does all the data go directly from the datanodes to my client ? >>>>>> >>>>>> Hope you can help me o/ >>>>>> >>>>>> Thank you >>>>>> >>>>>> Tale >>>>>> >>>>> >>>>> >>>> >>> >> > >