setting an administrator

2017-05-04 Thread Knapp, Michael
Hi,

I am trying to set Drill administrators, but it’s just not working.  I have 
set up a custom authenticator that uses a backend database for authentication, 
and that is working.  The only problem is that I am a “user”, not an 
administrator, which leaves me essentially powerless and Drill useless.

First, I think the instructions are not clear: it is not clear to me whether I 
should be executing the SET statement from the web console or from something 
else.  I have tried this:

I updated my drill-override.conf and attempted setting 
“drill.exec.security.admin.users” and “security.admin.users”.  I set them to 
single values and also tried putting the values in brackets like a list.  None 
of these combinations worked.

It was unclear to me how I was supposed to run your SQL statements when I am 
not an administrator in the first place.  Then I guessed I should try it from 
sqlline, but that is not working either.

sqlline> ALTER SYSTEM SET `security.admin.users` = "my_id";
No current connection

Why is it saying that I have no current connection?  What am I missing here?

Michael Knapp




Re: Drill query are stuck in ENQUEUED mode

2017-05-04 Thread Paul Rogers
Hi All,

Looks like queues are enabled. The numbers used for the queues are costs, which 
are only very loosely related to your row count. Cost is in what I call 
“planning units”: units that make sense for producing good plans, but which 
don’t directly map to real-world units such as rows, memory or time.

When queuing is enabled, queries start in the Foreman in the Enqueued state. 
During that time, we do query planning and parallelization.

Once planning is complete, the Foreman waits on ZooKeeper to get a run slot. 
ZooKeeper implements a distributed semaphore. I like to think of it as a “take 
a number” scheme like in old-time delis: each query takes a number and starts 
to run when its number comes up.
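
As a rough illustration of the "take a number" idea, here is a minimal Python sketch. It is not Drill's implementation: local semaphores stand in for the ZooKeeper-backed distributed semaphore, the names `choose_queue`, `try_admit` and `release` are hypothetical, and the rule that a cost above the threshold selects the large queue is an assumption. The slot counts and threshold are example values only.

```python
import threading

# Example values only; tune them for your own cluster.
QUEUE_THRESHOLD = 3000   # cost, in planning units, separating small from large
SMALL_SLOTS = 5          # concurrent small queries allowed
LARGE_SLOTS = 2          # concurrent large queries allowed

# Local semaphores stand in for the ZooKeeper-backed distributed semaphore.
queues = {
    "small": threading.BoundedSemaphore(SMALL_SLOTS),
    "large": threading.BoundedSemaphore(LARGE_SLOTS),
}

def choose_queue(plan_cost, threshold=QUEUE_THRESHOLD):
    """Pick a queue from the planner's cost estimate, not from the row count."""
    return "large" if plan_cost > threshold else "small"

def try_admit(plan_cost, timeout_s):
    """Wait up to timeout_s seconds for a run slot; return the queue name or None."""
    name = choose_queue(plan_cost)
    if queues[name].acquire(timeout=timeout_s):
        return name
    return None

def release(name):
    """Hand the slot back so the next waiting query can start."""
    queues[name].release()
```

A query that cannot get a slot before the timeout is simply not admitted in this sketch; from the outside, a query waiting here looks the same as one that is still being planned.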

If queries are stuck in the Enqueued state, it could therefore be because 
either 1) planning is taking a long time, or 2) the query is waiting for its 
slot from ZK. A jstack will tell us which is the case.

BTW: The default queue sizes are probably too large for all but the largest 
clusters. You will want to adjust the numbers for your load to best make use of 
the cores and memory you have available.

Given your number of cores and memory, running 110 concurrent queries is 
probably asking too much of your hardware. Try a much smaller number: maybe 2 
and 5. Remember that all queries will be parallelized to 70% of your CPU count 
(20 in your case) and each will launch multiple threads per CPU. Running even 7 
queries, with a couple of fragments each, will cause Drill to run 7 * 2 * 20 = 
280 threads on your 24 cores which may still be too large.
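
The arithmetic above can be reproduced with a few lines of Python to try other queue sizes. This is only a back-of-the-envelope sketch: the function name `estimated_threads` is hypothetical, and the width of 20 and the two fragments per query are the figures quoted in this message, not measured values.

```python
def estimated_threads(concurrent_queries, fragments_per_query, width_per_node):
    """Rough thread count: every fragment of every query runs width_per_node times."""
    return concurrent_queries * fragments_per_query * width_per_node

CORES = 24   # from the cluster description in this thread
WIDTH = 20   # per-node parallelization width quoted above

for queries in (7, 5, 2):
    threads = estimated_threads(queries, 2, WIDTH)
    print(f"{queries} queries -> {threads} threads, about {threads / CORES:.1f} per core")
```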

So, it may be that the reason queries are stuck in Enqueued state is that your 
cluster is overloaded: queries are competing for resources, running slowly and 
blocking incoming queries.

- Paul


Re: Drill query are stuck in ENQUEUED mode

2017-05-04 Thread Kunal Khatua
What does the jstack command (located in the same dir as the java executable) 
say when a query is hung?


Also, what is the JStack when the Drillbit is running with the 
exec.queue.enable=FALSE ?
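
One way to narrow a jstack dump down to a single query is to keep only the thread blocks whose names mention the query ID. The Python sketch below assumes the dump was saved to text first (for example with `jstack <drillbit-pid> > dump.txt`) and that thread blocks are separated by blank lines. The query ID, the thread names and the sample dump are made up for illustration, and the helper names are hypothetical.

```python
# Hypothetical query ID and thread dump, for illustration only.
QUERY_ID = "26f0a1b2-0000-0000-0000-000000000001"
SAMPLE_DUMP = f'''"{QUERY_ID}:foreman" #52 daemon prio=10 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)

"main" #1 prio=5 runnable
   java.lang.Thread.State: RUNNABLE
'''

def threads_for_query(jstack_output, query_id):
    """Return the thread-dump blocks whose header line mentions query_id."""
    blocks = [b.strip() for b in jstack_output.split("\n\n") if b.strip()]
    return [b for b in blocks if query_id in b.splitlines()[0]]

def thread_state(block):
    """Extract the java.lang.Thread.State value from one block, if present."""
    for line in block.splitlines():
        line = line.strip()
        if line.startswith("java.lang.Thread.State:"):
            return line.split(":", 1)[1].strip()
    return None

for block in threads_for_query(SAMPLE_DUMP, QUERY_ID):
    print(block.splitlines()[0], "->", thread_state(block))
```

Running the same filter on dumps taken a few seconds apart shows whether the matching thread is making progress or sitting in the same frame.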



RE: Drill query are stuck in ENQUEUED mode

2017-05-04 Thread jasbir.sing
Hi,

Thanks for the updates.

Just to update you: we are already using these parameters. Their values are - 
1. exec.queue.enable = TRUE
2. exec.queue.large = 10 (Default)
3. exec.queue.small = 100 (Default)
4. exec.queue.threshold = 3000 (Default)
5. exec.queue.timeout_millis = 30 (Default)

Even with these settings, queries are getting stuck.  Since the Parquet record 
count in our case is less than 1000, do I still need to set this threshold, or 
do I need to change its value?
It doesn't tell me at what record count a query is placed in the SMALL queue. 
What is that record count?

Description of my cluster - 
1. Nodes = 1 - Local mode
2. Memory = 64 GB
3. Cores = 24
4. It's a Linux box 

Please let me know how to resolve the stuck-query issue, as it is hampering 
our product's performance. 

Regards,
Jasbir Singh

-Original Message-
From: Kunal Khatua [mailto:kkha...@mapr.com] 
Sent: Wednesday, May 03, 2017 10:43 PM
To: user@drill.apache.org
Cc: Kothari, Maneesh ; Kumar, H. P. 

Subject: Re: Drill query are stuck in ENQUEUED mode

You could try experimenting with these parameters:

https://drill.apache.org/docs/enabling-query-queuing/
when you have queuing enabled.


However, in a false state, there should be no queuing (i.e. no ENQUEUED mode).


Can you provide a description of your cluster? (e.g. nodes, cores, memory... is 
it a VM, etc)


One way to understand why a query is 'stuck' with queuing disabled would be to 
use JStack and identify the state of the foreman and fragment threads executing 
that query.


For this, you first need to know the query ID (the part that follows 
"http://hostname:8047/profiles/" in the URL of the query's web UI page). Use 
that as a reference to identify the corresponding threads.


Usually, a large number of threads will carry this as a part of their thread 
names. However, if it is stuck, my hunch is that only 1 fragment thread is 
stuck, while the others completed. So you should be able to find that easily.


Kunal Khatua

Engineering


www.mapr.com


From: jasbir.s...@accenture.com 
Sent: Wednesday, May 3, 2017 9:37:44 AM
To: user@drill.apache.org
Cc: maneesh.koth...@accenture.com; h.p.ku...@accenture.com
Subject: RE: Drill query are stuck in ENQUEUED mode

A few things to update -

My Parquet files each have fewer than 1000 records. I create around 250 Parquet 
files from my application and fetch data from them using Apache Drill.
When I restart the Drill instance, all 250 queries on the Parquet files 
execute, but after some time 1 query out of the 250 gets stuck. This repeats 
after every execution.

Can someone also let me know the number of records that determines whether a 
query goes into exec.queue.small?


Regards,
Jasbir Singh

-Original Message-
From: Sing, Jasbir
Sent: Wednesday, May 03, 2017 6:40 PM
To: user@drill.apache.org
Cc: Kothari, Maneesh 
Subject: RE: Drill query are stuck in ENQUEUED mode

With the property set to false, queries were getting stuck in the ENQUEUED 
state over time, which is why I turned it to TRUE. Now they are getting stuck 
even with this property set to TRUE.

-Original Message-
From: Khurram Faraaz [mailto:kfar...@mapr.com]
Sent: Wednesday, May 03, 2017 6:35 PM
To: user@drill.apache.org
Cc: Kothari, Maneesh 
Subject: Re: Drill query are stuck in ENQUEUED mode

Does your query execute and complete when you set exec.queue.enable = false ?

The default is to set exec.queue.enable to false.


Thanks,

Khurram


From: jasbir.s...@accenture.com 
Sent: Wednesday, May 3, 2017 5:58:48 PM
To: user@drill.apache.org
Cc: maneesh.koth...@accenture.com
Subject: Drill query are stuck in ENQUEUED mode

Hi,

I have queries that fetch just 1 row from a Parquet file using LIMIT 1, and 
even these queries are stuck in the ENQUEUED state in Drill.

I am using exec.queue.enable = true and have default settings for the rest.

Can you help me out with this?

Regards,
Jasbir Singh


Re: Parquet, Arrow, and Drill Roadmap

2017-05-04 Thread John Omernik
I've created a JIRA on this request. The idea here being some higher level
descriptions of these projects (I included Calcite in the JIRA too), what
they do for the project, what the current state of integration is, what
options we have for future states, and what benefits those future states
bring.   For Parquet, I think we could go deeper into some of the
settings/tweaks with real world examples to help folks do data better.

Thanks!


https://issues.apache.org/jira/browse/DRILL-5471

On Tue, May 2, 2017 at 1:46 PM, Padma Penumarthy 
wrote:

> One thing I want to add is that use_new_reader uses the reader from the
> parquet-mr library, whereas the default one is Drill’s native reader, which
> is supposed to be better performance-wise. But it does not support complex
> types, and we automatically switch to the reader from the parquet library
> when we have to read complex types.
>
> Thanks,
> Padma
>
>
> On May 2, 2017, at 11:09 AM, Jinfeng Ni wrote:
>
>
> - What the two readers are (is one a special drill thing, is the other  a
> standard reader from the parquet project?)
> - What is the eventual goal here... to be able to use and switch between
> both? To provide the option? To have code parity with another project?
>
> Both readers are for reading Parquet data into Drill's value vectors.
> The default one (when store.parquet.use_new_reader is false) was
> faster (based on measurements done by people who worked on the two
> readers), but it could not support complex types like map/array.  The
> new reader is used by Drill either if you change the option to
> true, or when the Parquet data you are querying contains complex types
> (even with the default option being false). Therefore, both readers
> might be used by Drill code.
>
> There was a Parquet hackathon some time ago, which aimed to make
> people in different projects using parquet work together to
> standardize a vectorized reader. I did not keep track of that effort.
> People with better knowledge of that may share their inputs.
>
>
> - Do either of the readers work with Arrow?
>
> For now, neither works with Arrow, since Drill has not integrated with
> Arrow yet. See DRILL-4455 for the latest discussion
> (https://issues.apache.org/jira/browse/DRILL-4455).  I would expect
> Drill's Parquet reader to work with Arrow once the integration is
> done.
>
>
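
The reader-selection rule described above (the option is set to true, or the data contains complex types) can be written as a small decision function. This Python sketch only restates that rule; it is not Drill code, and the function name, the type labels and the returned strings are made up for illustration.

```python
COMPLEX_TYPES = {"map", "array"}   # the complex types named in this thread

def select_parquet_reader(use_new_reader, column_types):
    """Return which reader the rule described above would pick.

    use_new_reader mirrors the store.parquet.use_new_reader option;
    column_types is an iterable of type labels for the columns being read.
    """
    has_complex = any(t in COMPLEX_TYPES for t in column_types)
    if use_new_reader or has_complex:
        return "parquet-mr reader"
    return "native reader"

print(select_parquet_reader(False, ["int", "varchar"]))
print(select_parquet_reader(False, ["int", "map"]))
print(select_parquet_reader(True, ["int"]))
```

The point of the sketch is that the option alone does not decide the outcome: with the default of false, a query can still end up on the parquet-mr reader because of the data it touches.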