Re: Mechanism when doing a select *

Tale Firefly Tue, 22 Mar 2016 02:22:34 -0700

Hello everyone.

Thanks for your answers.


I'm gonna test this.

Best regards.

Tale



On Mon, Mar 21, 2016 at 10:06 PM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:

> Hi
>
> A simple select * query launches a job when the input size is >1 GB by
> default. Two configs determine whether a job has to be launched:
>
> hive.fetch.task.conversion
> hive.fetch.task.conversion.threshold
>
> Is your table size >1GB (hive.fetch.task.conversion.threshold)? You can
> see that from “describe formatted tablename”.
>
> Thanks
> Prasanth
>
> On Mar 21, 2016, at 11:16 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> You are correct. It should not. There is nothing to optimise here.
>
> 0: jdbc:hive2://rhes564:10010/default> select * from countries;
> OK
> INFO  : Compiling
> command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318):
> select * from countries
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema:
> Schema(fieldSchemas:[FieldSchema(name:countries.country_id, type:double,
> comment:null), FieldSchema(name:countries.country_iso_code, type:string,
> comment:null), FieldSchema(name:countries.country_name, type:string,
> comment:null), FieldSchema(name:countries.country_subregion, type:string,
> comment:null), FieldSchema(name:countries.country_subregion_id,
> type:double, comment:null), FieldSchema(name:countries.country_region,
> type:string, comment:null), FieldSchema(name:countries.country_region_id,
> type:double, comment:null), FieldSchema(name:countries.country_total,
> type:string, comment:null), FieldSchema(name:countries.country_total_id,
> type:double, comment:null), FieldSchema(name:countries.country_name_hist,
> type:string, comment:null)], properties:null)
> INFO  : Completed compiling
> command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318);
> Time taken: 0.047 seconds
> INFO  : Executing
> command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318):
> select * from countries
> INFO  : Completed executing
> command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318);
> Time taken: 0.001 seconds
> INFO  : OK
>
> Dr Mich Talebzadeh
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 21 March 2016 at 15:56, Tale Firefly <tale.h...@gmail.com> wrote:
>
>> Hm, I need to check if statistics are enabled for this table and
>> up-to-date.
>> I'm going to check this.
>>
>> I don't know if I was clear in my previous statement, but I am surprised
>> that a job is launched just by doing a select * from my_table.
>> I thought a select * from my_table was not running any MR jobs.
>>
>> Best regards.
>>
>> Tale.
>>
>> On Mon, Mar 21, 2016 at 4:48 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Well, I use Spark as the engine.
>>>
>>> Now the question is: have you updated statistics on the ORC table?
>>>
>>> HTH
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>> On 21 March 2016 at 15:32, Tale Firefly <tale.h...@gmail.com> wrote:
>>>
>>>> Re.
>>>>
>>>> Thank you for your answer.
>>>>
>>>> I'm using Tez as the execution engine for this query,
>>>> and it launches a job on YARN.
>>>>
>>>> Do you know why it launches a job just for a select when I use Tez as
>>>> the execution engine?
>>>>
>>>> BR.
>>>>
>>>> Tale
>>>>
>>>>
>>>> On Mon, Mar 21, 2016 at 4:17 PM, Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Your query is a table-level query that covers all rows in the table.
>>>>>
>>>>> Using ODBC, you are connecting to HiveServer2, which runs on a given
>>>>> port.
>>>>>
>>>>> Depending on the version of Hive you are running, Hive under the
>>>>> bonnet is most likely using MapReduce as the execution engine.
>>>>>
>>>>> Data has to be collected from all blocks that hold data for this
>>>>> table. The underlying ORC stats can only act at table level, as there
>>>>> is no predicate pushdown, and the data has to be sent to the ODBC
>>>>> driver over the network.
>>>>>
>>>>> The ODBC driver can only communicate with HiveServer2, so there is no
>>>>> connectivity to individual nodes from your client.
>>>>>
>>>>> So, in summary, HiveServer2 collects data from all blocks and forwards
>>>>> it to the client. The actual collection and filtering of the result
>>>>> set in a SQL query will depend on many factors.
>>>>>
>>>>> HTH
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> On 21 March 2016 at 14:26, Tale Firefly <tale.h...@gmail.com> wrote:
>>>>>
>>>>>> Hello guys!
>>>>>>
>>>>>> I'm trying to understand the mechanism for a simple query select *
>>>>>> from my_table when using HiveServer2.
>>>>>>
>>>>>> I'm using the Hortonworks ODBC Driver for HiveServer2.
>>>>>> I just do a select * from my_table.
>>>>>> my_table is an ORC table based on files divided into blocks located
>>>>>> on all my datanodes.
>>>>>> I have 50 datanodes.
>>>>>>
>>>>>> My question is the following:
>>>>>> Does all the data go from the datanodes to the node hosting
>>>>>> HiveServer2 before coming back to my client?
>>>>>> Or does all the data go directly from the datanodes to my client?
>>>>>>
>>>>>> Hope you can help me o/
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>> Tale
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
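
The two settings Prasanth names can be inspected from the same kind of
Beeline or ODBC session used in the thread. What follows is only a minimal
HiveQL sketch, not a recommended configuration: my_table stands for the
table from the original question, and the 2 GB value is an arbitrary number
chosen for illustration. Defaults differ between Hive versions, so check the
printed values on your own cluster.

```sql
-- Print the current values (a bare "set <name>;" shows the setting).
set hive.fetch.task.conversion;
set hive.fetch.task.conversion.threshold;

-- Refresh table-level statistics, then look for totalSize (in bytes)
-- under "Table Parameters" in the describe output.
analyze table my_table compute statistics;
describe formatted my_table;

-- If the plan contains only a Fetch Operator stage, the query is served
-- without launching a Tez/MapReduce job.
explain select * from my_table;

-- Assumption for illustration only: raise the threshold to 2 GB
-- (the value is in bytes) for this session, then compare the plan again.
set hive.fetch.task.conversion=more;
set hive.fetch.task.conversion.threshold=2147483648;
explain select * from my_table;
```

When the query is converted to a fetch task, the rows are read by
HiveServer2 itself instead of by tasks on the cluster, so a very large
threshold may shift load onto the HiveServer2 host.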
