Re: Connecting to Drill ODBC DSN takes exceptionally long time

2016-07-25 Thread Neeraja Rentachintala
Andries did a great job putting together the material below on this
topic. It should help you optimize the metadata access experience from
Tableau.
Additionally, make sure you are using the Tableau TDC file that ships with
the Drill ODBC driver.

https://community.mapr.com/community/answers/blog/2016/07/20/drill-best-practices-for-bi-and-analytical-tools

-Neeraja

On Mon, Jul 25, 2016 at 9:17 PM, Santosh Kulkarni <
santoshskulkarn...@gmail.com> wrote:

> Connecting Tableau to a Drill ODBC DSN takes almost 5 minutes. I created
> two DSNs, one for the ZooKeeper quorum and one for a direct connection to
> a drillbit. Both take a very long time to connect to Drill.
>
> Also, once connected, just opening the schema and the tables within it
> takes another few minutes. The underlying data source is Hive.
>
> Any thoughts on what causes this issue?
>
> Thanks,
>
> Santosh
>


Connecting to Drill ODBC DSN takes exceptionally long time

2016-07-25 Thread Santosh Kulkarni
Connecting Tableau to a Drill ODBC DSN takes almost 5 minutes. I created
two DSNs, one for the ZooKeeper quorum and one for a direct connection to
a drillbit. Both take a very long time to connect to Drill.

Also, once connected, just opening the schema and the tables within it
takes another few minutes. The underlying data source is Hive.

Any thoughts on what causes this issue?

Thanks,

Santosh


Re: Tableau Web Data Connector

2016-07-25 Thread Aman Sinha
Steve,
As far as I know, this has not been written (or maybe someone has written
one but not yet contributed it).
I agree that it would certainly be useful functionality.

-Aman

On Mon, Jul 25, 2016 at 11:17 AM, Steve Warren wrote:

> Has anyone written a Tableau Web Data Connector for Drill? I noticed
> prestodb.io has one and it really opens up the ability to interface with
> Drill over the internet.
>
> --
> Confidentiality Notice and Disclaimer:  The information contained in this
> e-mail and any attachments, is not transmitted by secure means and may also
> be legally privileged and confidential.  If you are not an intended
> recipient, you are hereby notified that any dissemination, distribution, or
> copying of this e-mail is strictly prohibited.  If you have received this
> e-mail in error, please notify the sender and permanently delete the e-mail
> and any attachments immediately. You should not retain, copy or use this
> e-mail or any attachment for any purpose, nor disclose all or any part of
> the contents to any other person. MyVest Corporation, MyVest Advisors and
> their affiliates accept no responsibility for any unauthorized access
> and/or alteration or dissemination of this communication nor for any
> consequence based on or arising out of the use of information that may have
> been illegitimately accessed or altered.
>


Overflow detection in Drill

2016-07-25 Thread Khurram Faraaz
Hi All,

As of today, Drill does not detect overflow and does not report to users
that an overflow occurred; instead it just returns incorrect results.
This issue has been discussed (but not in detail) in the past.

It would be great if Drill detected overflow in integer types (INT, BIGINT,
etc.) like other existing DBMSs do. Users do not want to see incorrect
results; an error informing them that there was an overflow would make
more sense.

Here is an example of one such query that returns incorrect results
compared to Postgres. The differences come from the missing overflow
detection; col1 is of type BIGINT.

{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT col1, AVG(SUM(col1)) OVER ( PARTITION
BY col7 ORDER BY col0 ) FROM `allTypsUniq.parquet` GROUP BY col0,col1,col7;
+---------------------+-------------------------+
| col1                | EXPR$1                  |
+---------------------+-------------------------+
| 5000                | 5000.0                  |
| 9223372036854775807 | -4.6116860184273853E18  |
| 65534               | -3.0744573456182349E18  |
| -1                  | -2.30584300921367629E18 |
| 1                   | -1.84467440737094093E18 |
| 17                  | -1.53722867280911744E18 |
| 1000                | -1.31762457669352909E18 |
| 200                 | -1.15292150460683802E18 |
| 4611686018427387903 | -5.1240955760303514E17  |
| 1001                | -4.6116860184273152E17  |
| 30                  | -4.1924418349339232E17  |
| -65535              | -65535.0                |
| 1000                | 4967232.5               |
| 0                   | 3311488.35              |
| 13                  | 2483619.5               |
| 23                  | 1986900.2               |
| 999                 | 3322416.65              |
| 197                 | 2847813.8571428573      |
| 9223372036854775806 | -1.1529215046043552E18  |
| 92233720385475807   | -1.01457092404992947E18 |
| 25                  | -9.1311383164493645E17  |
| 3000                | -8.3010348331357837E17  |
+---------------------+-------------------------+
22 rows selected (0.46 seconds)
{noformat}

Results from Postgres

{noformat}
postgres=# SELECT col1, AVG(SUM(col1)) OVER ( PARTITION BY col7 ORDER BY
col0 ) FROM fewrwspqq_101 GROUP BY col0,col1,col7;
        col1         |         avg
---------------------+----------------------
                5000 |                5000.
 9223372036854775807 |  4611686018427390404
               65534 |  3074457345618282114
                  -1 |  2305843009213711585
                   1 |  1844674407370969268
                  17 |  1537228672809141060
                1000 |  1317624576693549623
                 200 |  1152921504606855945
 4611686018427387903 |  1537228672809137273
                1001 |  1383505805528223646
                  30 |  1257732550480203317
              -65535 |              -65535.
                1000 |         4967232.5000
                   0 |             3311488.
                  13 |         2483619.5000
                  23 |         1986900.2000
                 999 |         3322416.6667
                 197 | 2847813.857142857143
 9223372036854775806 |  1152921504609338813
   92233720385475807 |  1035067306362242923
                  25 |   931560575726018634
                3000 |   846873250660017212
(22 rows)
{noformat}
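
The gap between the two result sets can be illustrated with a small sketch. This is not Drill's code; it is a Python simulation, under the assumption that the BIGINT sum wraps around in 64-bit two's complement, of what silent wrapping looks like next to an addition that reports overflow. The helper names (wrap_int64, unchecked_add, checked_add) are made up for the illustration.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def wrap_int64(value: int) -> int:
    """Reduce an arbitrary Python int to a signed 64-bit value (two's complement)."""
    return (value - INT64_MIN) % 2**64 + INT64_MIN


def unchecked_add(a: int, b: int) -> int:
    """Add like a fixed-width BIGINT that silently wraps on overflow."""
    return wrap_int64(a + b)


def checked_add(a: int, b: int) -> int:
    """Add like a BIGINT that reports overflow instead of wrapping."""
    total = a + b
    if not INT64_MIN <= total <= INT64_MAX:
        raise OverflowError(f"BIGINT overflow adding {a} and {b}")
    return total


# First two col1 values from the result sets above.
first, second = 5000, 9223372036854775807

# Wrapped running sum, then the average over two rows.
wrapped_sum = unchecked_add(first, second)
print(wrapped_sum / 2)

# The checked variant raises instead of returning a wrong number.
try:
    checked_add(first, second)
except OverflowError as err:
    print(err)
```

Under that assumption the wrapped average is negative and of the same magnitude as the second row of the Drill output, while the unwrapped sum divided by two agrees with the Postgres row. The checked variant corresponds to the behavior requested here: an error rather than a wrong number.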

Thanks,
Khurram


Tableau Web Data Connector

2016-07-25 Thread Steve Warren
Has anyone written a Tableau Web Data Connector for Drill? I noticed
prestodb.io has one and it really opens up the ability to interface with
Drill over the internet.



Re: deploy dockerized drill cluster

2016-07-25 Thread Scott Kinney
It's still not picking up the store.json.* config changes.

The only way I can see to set these is to run an ALTER SYSTEM query once the
Drill API is up.
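
A minimal sketch of that workaround, assuming the drillbit's REST endpoint is reachable at http://localhost:8047 (a placeholder; substitute the container's address) and accepts SQL posted to /query.json. The helper names are hypothetical, and the actual send is left commented out so the script can be dry-run first.

```python
import json
import urllib.request


def alter_system_statements(options):
    """Turn {option_name: value} into ALTER SYSTEM SET statements."""
    statements = []
    for name, value in options.items():
        if isinstance(value, bool):
            literal = "true" if value else "false"
        elif isinstance(value, (int, float)):
            literal = str(value)
        else:
            # Quote strings, doubling any embedded single quotes.
            escaped = str(value).replace("'", "''")
            literal = f"'{escaped}'"
        statements.append(f"ALTER SYSTEM SET `{name}` = {literal}")
    return statements


def build_query_request(base_url, sql):
    """Build (but do not send) a POST request carrying one SQL statement."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The two options from the drill-override.conf discussed in this thread.
JSON_OPTIONS = {
    "store.json.all_text_mode": True,
    "store.json.read_numbers_as_double": True,
}

if __name__ == "__main__":
    for statement in alter_system_statements(JSON_OPTIONS):
        request = build_query_request("http://localhost:8047", statement)
        print(request.get_method(), request.full_url, statement)
        # urllib.request.urlopen(request)  # uncomment once the drillbit is up
```

Something like this could run as a post-start step of the container, after the web port starts answering.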



Scott Kinney | DevOps
stem   |   m  510.282.1299
100 Rollins Road, Millbrae, California 94030

This e-mail and/or any attachments contain Stem, Inc. confidential and 
proprietary information and material for the sole use of the intended 
recipient(s). Any review, use or distribution that has not been expressly 
authorized by Stem, Inc. is strictly prohibited. If you are not the intended 
recipient, please contact the sender and delete all copies. Thank you.


From: John Omernik 
Sent: Monday, July 25, 2016 8:21 AM
To: user
Subject: Re: deploy dockerized drill cluster

Try (for the sake of the conversation here) using host networking, and see
if it changes how successful your setup is.  (I know bridged is preferred,
but try the host side and see what happens)

John

On Mon, Jul 25, 2016 at 10:06 AM, Scott Kinney 
wrote:

> I'm running the docker in bridged network mode.
>
> 
> From: John Omernik 
> Sent: Sunday, July 24, 2016 8:28 AM
> To: user
> Subject: Re: deploy dockerized drill cluster
>
> Are you running Drill in host networking or bridged networking?
>
> On Sat, Jul 23, 2016 at 1:21 PM, Scott Kinney 
> wrote:
>
> > Hm, I must have set those another way in embedded mode. I can't see where.
> > Those settings persist between Drill restarts.
> >
> > 
> > From: Abhishek Girish 
> > Sent: Friday, July 22, 2016 1:57 PM
> > To: Drill User List
> > Subject: Re: deploy dockerized drill cluster
> >
> > You can set boot level start-up options in drill-override.conf [1]. But I
> > don't think we can do the same with the system options. Someone else can
> > comment if there is a workaround.
> >
> > On why it works for you with drill-embedded, is something I'm trying to
> > understand. I attempted this and couldn't manage to get those options to
> > show up in embedded mode.
> >
> > [1] https://drill.apache.org/docs/start-up-options/
> >
> > On Fri, Jul 22, 2016 at 1:20 PM, Scott Kinney 
> > wrote:
> >
> > > I have built a Drill Docker image very much like
> > > https://github.com/bigstepinc/apache-drill/blob/master/Dockerfile
> > >
> > >
> > > I volume mount a drill-override.conf file that looks like:
> > >
> > > drill.exec:{
> > >   cluster-id: drill1,
> > >   zk.connect: "192.1.1.1:2181",
> > >   sys.store.provider.local.path: "/drill-storage",
> > >   store.json.all_text_mode: True,
> > >   store.json.read_numbers_as_double: True
> > > }
> > >
> > > It seems to be connecting to ZooKeeper (otherwise it would fail, right?
> > > I should actually confirm this).
> > >
> > > I know it is picking up my S3 plugin that I volume mount inside
> > > /drill-storage, but it's not setting the JSON all_text_mode and
> > > read_numbers_as_double options.
> > >
> > > I have a drill-embedded instance that seems to pick up these JSON
> > > settings from the drill-override file just fine.
> > >
> > > Can you see what I'm doing wrong?
> >
>


Re: tmp noexec

2016-07-25 Thread Leon Clayton

I move /tmp off the local disk and into the distributed file system, onto
a node-local volume on MapR. Other file systems can be substituted.

Open drill-override.conf on all of the nodes and insert this:

sort: {
  purge.threshold : 100,
  external: {
    batch.size : 4000,
    spill: {
      batch.size : 4000,
      group.size : 100,
      threshold : 200,
      directories : [ "/var/mapr/local/Hostname/drillspill" ],
      fs : "maprfs:///"
    }
  }
}

> On 25 Jul 2016, at 16:44, scott wrote:
> 
> Hello,
> I've run into an issue where Drill will not start if mount permissions are
> set on /tmp to noexec. The permissions were set to noexec due to security
> concerns. I'm using Drill version 1.7. The error I get when starting Drill
> is:
> 
> Exception in thread "main" java.lang.UnsatisfiedLinkError:
> /tmp/libnetty-transport-native-epoll5743269078378802025.so:
> /tmp/libnetty-transport-native-epoll5743269078378802025.so: failed to map
> segment from shared object: Operation not permitted
> 
> Does anyone know of a way to configure Drill to use a different tmp
> location?
> 
> Thanks,
> Scott



tmp noexec

2016-07-25 Thread scott
Hello,
I've run into an issue where Drill will not start if mount permissions are
set on /tmp to noexec. The permissions were set to noexec due to security
concerns. I'm using Drill version 1.7. The error I get when starting Drill
is:

Exception in thread "main" java.lang.UnsatisfiedLinkError:
/tmp/libnetty-transport-native-epoll5743269078378802025.so:
/tmp/libnetty-transport-native-epoll5743269078378802025.so: failed to map
segment from shared object: Operation not permitted

Does anyone know of a way to configure Drill to use a different tmp
location?

Thanks,
Scott


Re: deploy dockerized drill cluster

2016-07-25 Thread John Omernik
Try (for the sake of the conversation here) using host networking, and see
if it changes how successful your setup is.  (I know bridged is preferred,
but try the host side and see what happens)

John



Re: Partition prunning using CURRENT_DATE?

2016-07-25 Thread Oscar Morante
Great, thanks!  I'm gonna try swapping the view file and see how it 
goes.


On Thu, Jul 21, 2016 at 11:09:45AM -0500, John Omernik wrote:

Yes, I have a view that has the hard coded date in it. It wasn't difficult,
and using the REST API was actually fairly neat/clean.  I agree with you,
it would be nice, but this worked pretty well for me too.  (I also wonder
if you could just change the raw view def file and update the date)
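
The hard-coded-date view described here can be pictured as a small script that regenerates the view definition each day with a literal date, which is the form the original question in this thread reports pruning on. A minimal sketch in Python, assuming the directory names are ISO dates (YYYY-MM-DD); the view name dfs.tmp.`today` and the helper name are hypothetical.

```python
import datetime


def dated_view_sql(view_name, table, day):
    """Build a view definition whose dir0 filter is a literal date string."""
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS "
        f"SELECT * FROM {table} WHERE dir0 = '{day.isoformat()}'"
    )


# Regenerate the statement for today's date; run this once per day.
sql = dated_view_sql("dfs.tmp.`today`", "dfs.`json/by-date`", datetime.date.today())
print(sql)
```

The resulting statement could then be submitted through the REST API or any other client on a schedule; only the date literal changes between runs.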


John


On Thu, Jul 21, 2016 at 2:26 AM, Oscar Morante wrote:


Hi John,
I've been following your trail of emails :)  Thanks for sharing all that
info, it's very useful.

I think I'm trying to do something very similar to what you did.  I have
data flowing from Storm into S3 and I wanted to be able to periodically
preprocess/repartition into a new folder and then have views to merge recent
data from the raw Storm files and old data from the
preprocessed/repartitioned folders.  These views are intended to be used
from Tableau.

I guess I can create a small process that checks when a new folder with
preprocessed data is available and replaces the appropriate view files with
new versions that have the proper date string.  But it would be a lot nicer
to just do it in the view and have a dumb process executing the periodic
queries.

How did you "solve" it in the end?  If I can ask.

Thanks,
Oscar


On Wed, Jul 20, 2016 at 07:26:20AM -0500, John Omernik wrote:


I think I ran into that issue before and (someone will correct me if I am
wrong) the issue is that current_date is only materialized AFTER planning.
Thus the pruning, which occurs during planning, doesn't happen.  Is this a
programmatic query or just something that is being done for users? I know
my issue was that I wanted a view that showed only the current date, and I
struggled to come up with a good solution to that.

John


On Wed, Jul 20, 2016 at 6:06 AM, Oscar Morante 
wrote:

I'm trying to trigger partition pruning like this:


   select *
   from dfs.`json/by-date`
   where dir0 = cast(current_date as varchar);

But apparently, it only works when passing a literal.  Am I missing
something?

Thanks,







--
Oscar Morante
"Self-education is, I firmly believe, the only kind of education there is."
 -- Isaac Asimov.




Table value functions in Drill

2016-07-25 Thread Tushar Pathare
For example, suppose I need the equivalent of the following SQL function in
Drill:

create function abc ()
returns @xyz table (id int,name varchar(100))
as
begin
insert into @xyz
select 1 id,'aaa' name
union
select 2 id,'bbb' name
return
end

Tushar B Pathare
High Performance Computing (HPC) Administrator
General Parallel File System
Scientific Computing
Bioinformatics Division
Research

Sidra Medical and Research Centre
Sidra OPC Building
PO Box 26999  |  Doha, Qatar
Near QNCC,5th Floor
Office 4003  ext 37443 | M +974 74793547
tpath...@sidra.org | 
www.sidra.org


Disclaimer: This email and its attachments may be confidential and are intended 
solely for the use of the individual to whom it is addressed. If you are not 
the intended recipient, any reading, printing, storage, disclosure, copying or 
any other action taken in respect of this e-mail is prohibited and may be 
unlawful. If you are not the intended recipient, please notify the sender 
immediately by using the reply function and then permanently delete what you 
have received. Any views or opinions expressed are solely those of the author 
and do not necessarily represent those of Sidra Medical and Research Center.


Sql functions help

2016-07-25 Thread Tushar Pathare
Hello Team,
Can we write a Java function in Drill that invokes a shell script or SQL
functions and returns its result in table-valued form?

Thanks




Re: How drill works internally

2016-07-25 Thread rahul challapalli
You can start with the high level architecture [1]. Then the community
might help you if you have any specific questions.

[1] https://drill.apache.org/architecture/

On Sun, Jul 24, 2016 at 11:36 PM, Sanjiv Kumar wrote:

> How does Drill run queries internally? I want to know how Drill executes
> queries against different data sources, and to understand its internal process.
>
>
>
>  ..
>   Thanks & Regards
>   *Sanjiv Kumar*
>


How drill works internally

2016-07-25 Thread Sanjiv Kumar
How does Drill run queries internally? I want to know how Drill executes
queries against different data sources, and to understand its internal process.



 ..
  Thanks & Regards
  *Sanjiv Kumar*