Controlling number of fragments with Apache Drill CREATE TABLE AS (CTAS) and parquet

2020-03-25 Thread Avner Levy
I'm running Drill 1.17.0 in embedded mode and reading/writing files to S3. When I create new tables with CTAS, Drill writes 6 small Parquet fragment files (1.5 MB each). Is there a way to control the number of fragments, for example to limit it to a single fragment? I already tried the following, but it didn't help:

REST basic authentication

2020-04-06 Thread Avner Levy
I've noticed that REST basic authentication will be supported in the 1.18 release. Any idea what the planned release date is?

Rest API and SQL injection

2020-05-10 Thread Avner Levy
Hi, I'm trying to use Apache Drill as a database for providing SQL over S3 parquet files. Drill is used for serving multi-tenant data for multiple customers. Since I need to build the SQL string using the REST API I'm vulnerable to SQL injection attacks. I do test all user input and close it
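The concern in this message (a SQL string assembled by a middle-tier server and sent over the REST API) can be illustrated with a minimal Python sketch. It is not taken from the thread. It assumes Drill's REST endpoint `POST /query.json` on the default web port 8047, and it assumes standard SQL quoting, where a single quote inside a string literal is escaped by doubling it. The column allow-list, the `tenant_id` column, the tenant id pattern, and the table path are hypothetical names chosen for illustration.

```python
import json
import re
import urllib.request

# Assumption: Drill's web server on its default port; adjust for your deployment.
DRILL_URL = "http://localhost:8047/query.json"

# Hypothetical allow-list of columns that callers may select.
ALLOWED_COLUMNS = frozenset({"name", "created_at"})

# Hypothetical tenant id format: short alphanumeric tokens only.
TENANT_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")

# Hypothetical table path, fixed on the server and never taken from user input.
TABLE = "`parquet/data/data.parquet`"


def quote_literal(value):
    """Wrap a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_query(column, tenant_id):
    """Build the SQL text from validated pieces instead of raw user input."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError("column not allowed: %r" % (column,))
    if not TENANT_ID.fullmatch(tenant_id):
        raise ValueError("malformed tenant id: %r" % (tenant_id,))
    return "SELECT `{}` FROM {} WHERE `tenant_id` = {} LIMIT 100".format(
        column, TABLE, quote_literal(tenant_id)
    )


def run_query(sql):
    """Submit the query to Drill's REST API and return the decoded JSON reply."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    request = urllib.request.Request(
        DRILL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Identifiers are checked against an allow-list rather than escaped, and the tenant id is both pattern-checked and quoted, so a value such as `x' OR '1'='1` is rejected before any SQL is built. Whether this is sufficient for a given deployment is a judgment call; it is a sketch of the validation idea, not a complete defense.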

Re: Rest API and SQL injection

2020-05-11 Thread Avner Levy
.
> --C
>
> > On May 11, 2020, at 10:58 AM, Avner Levy wrote:
> >
> > Thanks Paul,
> > It seems I wasn't clear enough in my previous email.
> > I have a server in between the end users and drill (exactly as you
> > suggested), my concern is SQL attack

Re: Rest API and SQL injection

2020-05-11 Thread Avner Levy
e to build your SQL statement correctly, as you are
> doing. Don't just append web text to a SQL statement.
>
> I don't think this is unique to Drill. I'd be surprised if most people
> allow, say, public access to their HBase, Cassandra or MySQL DBs.
>
> Thanks,
> -

Re: Planning times

2020-06-15 Thread Avner Levy
rally speaking, for a small file size like that, querying a
> parquet file should be nearly instantaneous, with or without the schema or
> metastore.
>
> Good luck!
>
> -- C
>
> > On Jun 7, 2020, at 11:08 AM, Avner Levy wrote:
> >

exec.queue.enable in drill-embedded

2020-06-28 Thread Avner Levy
Hi, I'm using Drill 1.18 (master) docker and trying to configure its memory after getting out of heap memory errors: "RESOURCE ERROR: There is not enough heap memory to run this query using the web interface." The docker is serving remote clients through the REST API. The queries are simple

Re: exec.queue.enable in drill-embedded

2020-06-29 Thread Avner Levy
ry you are using
> should be plenty - once the REST problem is fixed.
>
> Thanks,
>
> - Paul
>
> On Sun, Jun 28, 2020 at 3:17 PM Avner Levy wrote:
>
> > Hi,
> > I'm using Drill 1.18 (master) docker and trying to configure its memory
> > after getting out

Re: Planning times

2020-06-06 Thread Avner Levy
e enabled?
> --C
>
> > On Jun 4, 2020, at 9:02 PM, Avner Levy wrote:
> >
> > Thanks Rafael for your answer.
> > As I wrote in the previous email these planning times occur even when
> > selecting one field from one tiny file (60k) that I pass directly

Planning times

2020-06-04 Thread Avner Levy
I'm running Apache Drill (1.18 master branch) in a docker with data stored in Parquet files on S3. When I run queries, even the most simple ones such as:
    select name from `parquet/data/data.parquet` limit 1
The "Planning" time is 0.7-1.5 sec while the "Execution" is only 0.112 sec. These

Re: Planning times

2020-06-04 Thread Avner Levy
fael
>
> On Thu, Jun 4, 2020 at 2:43 PM Avner Levy wrote:
> >
> > I'm running Apache Drill (1.18 master branch) in a docker with data stored
> > in Parquet files on S3.
> > When I run queries, even the most simple ones such as:
> >
> > select name from

Re: Planning times

2020-06-07 Thread Avner Levy
683 sec 0.000 sec 0.090 sec
> > 0.773 sec
> > Options Overview
> > Session Options
> > Name Value
> > metastore.enabled true*
> >
> > On Thu, Jun 4, 2020 at 9:09 PM Charles Givre wrote:
> > >
> > > Hi Avner,
> > > Maybe you said this already but what version of Dr