Hm, I need to check whether statistics are enabled for this table and up to date.
I'll look into it.
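
For the record, here is a rough sketch of what I plan to run. It assumes the table is really called my_table (a placeholder in this thread) and that it is not partitioned; a partitioned table would need a PARTITION clause.

```sql
-- Show the table metadata. If statistics have been gathered, the
-- "Table Parameters" section lists values such as numRows and totalSize.
DESCRIBE FORMATTED my_table;

-- Recompute basic table-level statistics (this may itself launch a job).
ANALYZE TABLE my_table COMPUTE STATISTICS;

-- Optionally gather column-level statistics as well.
ANALYZE TABLE my_table COMPUTE STATISTICS FOR COLUMNS;
```

Comparing the DESCRIBE FORMATTED output before and after the ANALYZE statements should show whether the stored statistics were stale.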

I don't know if I was clear in my previous message, but I am surprised
that a job is launched just by running select * from my_table.
I thought a plain select * from my_table did not run any MapReduce (MR) jobs.
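
What I had in mind is Hive's fetch task conversion, where a simple select is served by a fetch task instead of a cluster job. Below is a sketch of how I could check this on my side. hive.fetch.task.conversion and hive.fetch.task.conversion.threshold are existing Hive properties, but their defaults depend on the Hive version, so the comments are my assumptions rather than confirmed behaviour for my cluster.

```sql
-- Print the current values of the two properties.
SET hive.fetch.task.conversion;
SET hive.fetch.task.conversion.threshold;

-- Inspect the plan: a fetch-only plan should show just a Fetch Operator
-- stage, whereas a plan that launches a job also shows a Tez (or MR) stage.
EXPLAIN SELECT * FROM my_table;

-- Allow more query shapes to be converted to a fetch task
-- (accepted values are none, minimal and more).
SET hive.fetch.task.conversion=more;
```

My guess is that the table is larger than the configured threshold, in which case Hive would not convert the query and would launch a job even for a plain select *, but I have not verified this yet.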

Best regards.

Tale.

On Mon, Mar 21, 2016 at 4:48 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Well I use Spark as engine.
>
> Now the question is: have you updated the statistics on the ORC table?
>
> HTH
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 21 March 2016 at 15:32, Tale Firefly <tale.h...@gmail.com> wrote:
>
>> Hello again,
>>
>> Thank you for your answer.
>>
>> I'm using Tez as the execution engine for this query,
>> and it launches a job on YARN.
>>
>> Do you know why it launches a job just for a select when I use Tez as the
>> execution engine?
>>
>> BR.
>>
>> Tale
>>
>>
>> On Mon, Mar 21, 2016 at 4:17 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Your query is a table-level query that covers all rows in the table.
>>>
>>> Using ODBC, you are connecting to HiveServer2, which runs on a given port.
>>>
>>> Depending on the version of Hive you are running, Hive under the
>>> bonnet is most likely using MapReduce as the execution engine.
>>>
>>> Data has to be collected from all blocks that hold data for this table.
>>> The underlying ORC stats can only act at table level, as there is no
>>> predicate pushdown, and the data has to be sent to the ODBC driver over
>>> the network.
>>>
>>> The ODBC driver can only communicate with HiveServer2, so there is no
>>> connectivity to individual nodes from your client.
>>>
>>> So, in summary, HiveServer2 collects data from all blocks and forwards
>>> it to the client. The actual collection and filtering of the result set
>>> for a SQL query will depend on many factors.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>> On 21 March 2016 at 14:26, Tale Firefly <tale.h...@gmail.com> wrote:
>>>
>>>> Hello guys !
>>>>
>>>> I'm trying to understand the mechanism behind a simple select * from
>>>> my_table query when using HiveServer2.
>>>>
>>>> I'm using the Hortonworks ODBC Driver for HiveServer2.
>>>> I just do a select * from my_table.
>>>> my_table is an ORC table backed by files split into blocks located on
>>>> all my datanodes.
>>>> I have 50 datanodes.
>>>>
>>>> My question is the following:
>>>> Does all the data go from the datanodes to the node hosting
>>>> HiveServer2 before coming back to my client?
>>>> Or does all the data go directly from the datanodes to my client?
>>>>
>>>> Hope you can help me o/
>>>>
>>>> Thank you
>>>>
>>>> Tale
>>>>
>>>
>>>
>>
>
