Hi Anup

It would help if you could share the query profile (the *.sys.drill /
*.json files). I don't think the user mailing list allows attachments, so
you could use an online document sharing service (e.g. Google Drive) to
share them.

Coming back to your description, it sounds like you are reading from a
source and writing to a destination with partitioning (or with a
HashJoin/HashAgg prior to the write). If that is the case, the records are
most likely all landing in one fragment because of skew in the values of
the column you are partitioning on.

Is the data highly skewed on such a column?
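As a quick check, a simple GROUP BY on that column will show whether a
handful of values dominate the data. This is only a sketch; the table path
and column name below are placeholders for your own:

```sql
-- Hypothetical table path and column name; substitute your own.
-- If the top value accounts for most of the rows, the exchange that
-- hash-partitions on this column will funnel nearly all records into
-- a single fragment.
SELECT part_col, COUNT(*) AS cnt
FROM dfs.`/data/my_table`
GROUP BY part_col
ORDER BY cnt DESC
LIMIT 10;
```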



On Wed, Mar 14, 2018 at 1:16 AM, Anup Tiwari <anup.tiw...@games24x7.com>
wrote:

> Also, I have observed one thing: the query that is taking time creates
> ~30-40 fragments, and 99.99999% of the records are written into only one
> fragment.
>
>
>
>
> On Wed, Mar 14, 2018 1:37 PM, Anup Tiwari anup.tiw...@games24x7.com
> wrote:
> Hi Padma,
> Please find my answers to your questions below:
>
> Q) Connection loss can happen when ZooKeeper thinks that a node is dead
> because it did not get a heartbeat from the node. It can be because the
> node is busy or you have network problems. Did anything change in your
> network?
> A) No. We also cross-verified intra-node communication and it is working
> fine.
>
> Q) Is the data static or are you adding new data?
> A) The data is static.
>
> Q) Do you have metadata caching enabled?
> A) No.
>
> Q) PARQUET_WRITER seems to indicate you are doing some kind of CTAS.
> A) That is correct; we are doing CTAS.
>
> Q) The block missing exception could possibly mean a problem with the
> name node or bad disks on one of the nodes.
> A) There are no bad disks. Also, when I checked that file with the hadoop
> ls command it is present, so can you tell me why Drill is showing the
> block as missing? You mentioned it could possibly mean a problem with the
> name node; I have checked and the namenode is running fine. We are also
> executing some Hive queries on the same cluster and those run fine, so if
> it were a namenode issue I think it would affect all queries.
>
>
>
>
> On Mon, Mar 12, 2018 11:24 PM, Padma Penumarthy ppenumar...@mapr.com
> wrote:
> There can be a lot of issues here.
>
> The connection loss error can happen when ZooKeeper thinks that a node is
> dead because it did not get a heartbeat from the node. It can be because
> the node is busy or you have network problems. Did anything change in
> your network?
>
> Is the data static or are you adding new data? Do you have metadata
> caching enabled?
>
> PARQUET_WRITER seems to indicate you are doing some kind of CTAS.
>
> The block missing exception could possibly mean a problem with the name
> node or bad disks on one of the nodes.
>
> Thanks
> Padma
>
>
>
>
>> On Mar 12, 2018, at 1:27 AM, Anup Tiwari <anup.tiw...@games24x7.com>
>> wrote:
>>
>> Hi All,
>>
>> For the last couple of days I have been stuck on a problem. I have a
>> query that left joins 3 Drill tables (parquet). It used to take around
>> 15-20 mins every day, but for the last couple of days it has been taking
>> more than 45 mins, and when I drilled down I can see in the operator
>> profile that 40% of the query time goes to PARQUET_WRITER and 28% to
>> PARQUET_ROW_GROUP_SCAN. I am not sure whether the stats were the same
>> before this issue, as earlier it executed in 15-20 min at most.
>>
>> Also, on top of this table, we used to create a table which is now
>> showing the error below:
>>
>> SYSTEM ERROR: BlockMissingException: Could not obtain block:
>> BP-1083556055-10.51.2.101-1481111327179:blk_1094763477_21022752
>>
>> Also, in the last few days I am getting frequent "one or more nodes lost
>> connectivity" errors.
>>
>> I just upgraded to Drill 1.12.0 from 1.10.0, but the above issues are
>> still there.
>>
>> Any help will be appreciated.
>>
>> Regards,
>> Anup Tiwari
> Regards,
> Anup Tiwari