Re: Custom Schema Support for Delimited Files

2016-02-15 Thread Hsuan Yi Chu
Hi, How about defining a view on top? For example, create view XXX as select cast(columns[0] as int) col1, cast(columns[1] as int) col2 from `xxx.csv`.
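Spelled out, a view of that kind might look like the following minimal sketch (the workspace, view name, column names, and file path are illustrative placeholders, and it assumes a writable dfs.tmp workspace):

    create or replace view dfs.tmp.`typed_csv` as
    select cast(columns[0] as int) as col1,
           cast(columns[1] as int) as col2
    from dfs.`/data/xxx.csv`;

Queries can then select col1 and col2 from the view with the INT types already applied.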

Custom Schema Support for Delimited Files

2016-02-15 Thread Usman Ali
Hi, Is there any support for giving a custom schema for schema-less delimited files in the storage plug-in or somewhere else? If not, is it going to be part of any future release? Usman Ali

Re: Drill Doc Question: Multi Tenant Clusters

2016-02-15 Thread Abdel Hakim Deneche
Someone may want to confirm this, but I think Drill will properly set the default value (num cores x .7), and it will be specific to every node, but when you query the option from sys.options, it will show you the value on the "foreman" node for that specific query. Once you set it manually using
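For reference, the value visible to your session can be checked directly from sys.options; a minimal sketch, assuming the Drill 1.x sys.options layout with a num_val column:

    select name, num_val
    from sys.options
    where name = 'planner.width.max_per_node';

Per the note above, the number returned reflects the foreman node for that query rather than every node in the cluster.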

Re: Drill Doc Question: Multi Tenant Clusters

2016-02-15 Thread John Omernik
Drill did not automatically set that, it set it to 12, which is likely .7 or close to it on a 16-core machine, but I have 7 nodes with different cores, so is this setting per drillbit or is it a cluster-wide setting? Is it possible to set this in the drill-override based on the node itself, or

Re: Drill Doc Question: Multi Tenant Clusters

2016-02-15 Thread Abdel Hakim Deneche
So yes, you are correct, you should set it to 1 x 32 x 0.7. Btw, Drill should already have set this option to 32 x 0.7.
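If you do want to set it explicitly, the statement would look something like this sketch (the value 23 simply follows the 1 x 32 x 0.7 figure used in this thread):

    alter system set `planner.width.max_per_node` = 23;

An ALTER SESSION variant scopes the change to the current session instead of the whole cluster.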

Re: Drill Doc Question: Multi Tenant Clusters

2016-02-15 Thread Abdel Hakim Deneche
Don't be, it took me quite some time to figure out this one either =P. The "number of active drillbits" refers to the number of Drillbits running on each node of the cluster. Generally, you have 1 active Drillbit per node.

Re: REFRESH TABLE METADATA - Access Denied

2016-02-15 Thread John Omernik
https://issues.apache.org/jira/browse/DRILL-4143

Re: REFRESH TABLE METADATA - Access Denied

2016-02-15 Thread Neeraja Rentachintala
John, what is the JIRA# where you are adding more info? -thanks

Re: Drill Doc Question: Multi Tenant Clusters

2016-02-15 Thread John Omernik
I am really sorry for being dense here, but based on your comment and the docs, if you had sixteen 32-core machines but only one drillbit running per node, you'd still use 1 (one drillbit per node) * 32 (the number of cores) * 0.7 (the modifier in the docs) to get 23 as the number to

Re: REFRESH TABLE METADATA - Access Denied

2016-02-15 Thread John Omernik
Arg, this problem is crazy. (I'll put this in the JIRA too.) So after waiting a while and loading more data, I tried to refresh table metadata on the table, using the dataadm user (basically the user who owns the data). Note that all directories and files are owned by dataadm:dataadm and the
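For context, the command being run takes roughly this form (a sketch; the workspace and table path are placeholders for the actual location):

    refresh table metadata dfs.`/data/mytable`;

The .parquet_metadata cache files mentioned later in this thread are written into the table directories, which is why the ownership of those directories matters here.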

Re: Drill Doc Question: Multi Tenant Clusters

2016-02-15 Thread Abdel Hakim Deneche
No, it's the maximum number of threads each drillbit will be able to spawn for every major fragment of a query. If you run a query on a cluster of 32-core machines, and the query plan contains multiple major fragments, each major fragment will have "at most" 32 x 0.7 = 23 minor fragments (or

Re: querying json with arrays of varying dimensionality fails

2016-02-15 Thread Jinfeng Ni
I think union type support is still in beta stage; that's why it's turned off by default. Could you please file a JIRA for the problem you encountered? That way, we will not lose track of those issues. Thanks!
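For anyone reproducing this, the union type support referred to here is gated behind an option; a minimal sketch of turning it on, assuming the exec.enable_union_type option name used by Drill 1.x:

    alter session set `exec.enable_union_type` = true;

Since the feature is still in beta, expect rough edges such as the flatten issue reported in this thread.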

Re: REFRESH TABLE METADATA - Access Denied

2016-02-15 Thread John Omernik
So I am not sure what's happened here. The JIRA isn't filled out, but I can't seem to reproduce the problem. Was this stealth fixed? Based on some testing, even when the data directory is owned by a different user than the drillbit, the .parquet_metadata files are created as mapr:mapr with 755

Drill Doc Question: Multi Tenant Clusters

2016-02-15 Thread John Omernik
https://drill.apache.org/docs/configuring-resources-for-a-shared-drillbit/#configuring-query-queuing On this page, on the setting planner.width.max_per_node, it says the below. In the

mass user realtime, high concurrency, hardware resource

2016-02-15 Thread john lee
Hi, I want to build an app which needs to support several hundred million users with real-time queries over about ten billion row records. Does Apache Drill fit this requirement? Does it support high concurrency? Does it need massive hardware resources to achieve low-latency performance?

Re: querying json with arrays of varying dimensionality fails

2016-02-15 Thread Karol Potocki
Ok, now the query executes successfully. But now the problem has moved to the flatten step. Trying: select flatten(feature) from dfs.`tmp/1.json`; causes: Error: SYSTEM ERROR: SchemaChangeRuntimeException: Inner vector type mismatch. Requested type: [minor_type: BIGINT mode: OPTIONAL ], actual type: