Oh, my bad: even with the execution engine set to MR, my query turns into an
MR job.

I'm going to run more tests with the Hive CLI, Beeline, and Excel to check
whether this behaviour is linked to the ODBC driver.
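
A minimal sketch of such a check, assuming the same my_table and a Beeline
or Hive CLI session. The statements below only display the session's engine
and the query plan; they do not change anything:

```sql
-- Show which engine the current session is using (mr, tez or spark).
SET hive.execution.engine;

-- Print the plan without running the query. If the only stage is a
-- Fetch Operator, no MR/Tez job is submitted for this statement.
EXPLAIN SELECT * FROM my_table;
```

Comparing this output between Beeline, the Hive CLI and the ODBC connection
should show whether the client makes any difference.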

BR.

Tale.

On Mon, Mar 21, 2016 at 4:56 PM, Tale Firefly <tale.h...@gmail.com> wrote:

> Hm, I need to check if statistics are enabled for this table and
> up-to-date.
> I'm going to check this.
>
> I don't know if I was clear in my previous message, but I am surprised
> that a job is launched just by doing a select * from my_table.
> I thought a select * from my_table did not run any MR jobs.
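>
> One thing that may explain this is the fetch task conversion settings,
> which decide whether a simple select is served directly or compiled into a
> job. A minimal sketch, assuming a Hive version that supports these
> properties; the threshold value is only an illustration:
>
> ```sql
> -- Allow simple SELECT / filter / LIMIT queries to run as a fetch task
> -- (accepted values are none, minimal and more).
> SET hive.fetch.task.conversion=more;
>
> -- The conversion only applies when the input is below this size in bytes
> -- (1 GB here is an illustrative value, not a recommendation).
> SET hive.fetch.task.conversion.threshold=1073741824;
>
> -- Run the query again and see whether a job is still submitted.
> SELECT * FROM my_table;
> ```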
>
> Best regards.
>
> Tale.
>
> On Mon, Mar 21, 2016 at 4:48 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Well, I use Spark as the engine.
>>
>> Now the question is: have you updated the statistics on the ORC table?
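>>
>> A minimal sketch of how that could be checked and refreshed, assuming the
>> table is called my_table as in the original question. Note that the
>> ANALYZE statements may themselves launch a job:
>>
>> ```sql
>> -- The table parameters in the output include numRows, totalSize and
>> -- COLUMN_STATS_ACCURATE, which indicate whether statistics are present.
>> DESCRIBE FORMATTED my_table;
>>
>> -- Recompute table-level statistics, then column-level statistics.
>> ANALYZE TABLE my_table COMPUTE STATISTICS;
>> ANALYZE TABLE my_table COMPUTE STATISTICS FOR COLUMNS;
>> ```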
>>
>> HTH
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 21 March 2016 at 15:32, Tale Firefly <tale.h...@gmail.com> wrote:
>>
>>> Re.
>>>
>>> Thank you for your answer.
>>>
>>> I'm using Tez as the execution engine for this query,
>>> and it submits a job to YARN.
>>>
>>> Do you know why it launches a job just for a select when I use Tez as
>>> the execution engine?
>>>
>>> BR.
>>>
>>> Tale
>>>
>>>
>>> On Mon, Mar 21, 2016 at 4:17 PM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Your query is a table-level query that covers all rows in the table.
>>>>
>>>> Using ODBC, you are connecting to HiveServer2, which runs on a given
>>>> port.
>>>>
>>>> Depending on the version of Hive you are running, Hive under the
>>>> bonnet is most likely using MapReduce as the execution engine.
>>>>
>>>> Data has to be collected from all blocks that hold data for this table.
>>>> The underlying ORC stats can only act at table level, as there is no
>>>> predicate pushdown, and the data has to be sent to the ODBC driver over
>>>> the network.
>>>>
>>>> The ODBC driver can only communicate with HiveServer2, so there is no
>>>> connectivity to individual nodes from your client.
>>>>
>>>> So, in summary, HiveServer2 collects data from all blocks and forwards
>>>> it to the client. The actual collection and filtering of the result set
>>>> for a SQL query will depend on many factors.
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn:
>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 21 March 2016 at 14:26, Tale Firefly <tale.h...@gmail.com> wrote:
>>>>
>>>>> Hello guys!
>>>>>
>>>>> I'm trying to understand what happens for a simple query, select *
>>>>> from my_table, when using HiveServer2.
>>>>>
>>>>> I'm using the Hortonworks ODBC Driver for HiveServer2.
>>>>> I just do a select * from my_table.
>>>>> my_table is an ORC table backed by files divided into blocks located on
>>>>> all my datanodes.
>>>>> I have 50 datanodes.
>>>>>
>>>>> My question is the following:
>>>>> Does all the data go from the datanodes to the node hosting
>>>>> HiveServer2 before coming back to my client?
>>>>> Or does all the data go directly from the datanodes to my client?
>>>>>
>>>>> Hope you can help me o/
>>>>>
>>>>> Thank you
>>>>>
>>>>> Tale
>>>>>
>>>>
>>>>
>>>
>>
>
