Hi Charles, Paul, Thanks for your answers. I'm interested in a case, where there is a Rest service which authenticate the service's users, get a request with the user parameters and build from it the SQL sent to Drill. The customers are identified by some account ID and they send for example the name of an entity they are looking for as a parameter in the service's REST request. Then the service can build the select (just an example): SELECT x FROM S3.db.`data/[CUSTOMER_ID]/data.parquet` where name='CUSTOMER_USER_INPUT]'
In such case, they can still send in CUSTOMER_USER_INPUT the following: "x' union SELECT x FROM S3.db.`data/[OTHER_CUSTOMER_ID]/data.parquet`". Usually such stuff are solved with prepared statements, but I believe this isn't supported over REST. I would prefer not having to authenticate my end users to Drill since this creates more work and complexity. Is there a way to have prepared statements in Drill? Is it supported in other protocols? (JDBC/ODBC) Limiting the query folder outside the SQL would do the job as well. Any feedback is appreciated, Thanks, Avner On Sun, May 10, 2020 at 5:57 PM Paul Rogers <[email protected]> wrote: > Hi Charles, > > One of the changes I was looking at was allowing multiple SQL statements > per REST request to get around the lack of session. The idea would be to > issue a number of ALTER SESSION, CTTAS, USE and similar statements followed > by a single query that returns data. > > > A better solution is to enable session support for the REST API. We > discussed the challenges involved due the disconnected nature of HTTP > requests. > > Another good improvement would be a SQL command way to create configs, not > just JSON editing. That way it would be easier to automate creation of a > config. Also, it would be handy to be able to externalize configs so they > can be stored in locations other than ZK (or local disk, in embedded mode.) > For this use case, a query for user "X" would work against the "s3-X" > config would could be retrieved from an external system that knows the > mapping from user X to the S3 files visible to X, and the security tokens > to use for that user. > > The question for now, however, is how to do this with the code that exists > in Drill 1.17. I'm hoping someone has worked out a solution. > > > Thanks, > - Paul > > > > On Sunday, May 10, 2020, 1:05:50 PM PDT, Charles Givre < > [email protected]> wrote: > > Hi Avner, Paul, > I was reading this and wondering: > > 1. Is it in fact true (I think it is) that Drill does not allow multiple > queries to be submitted in one REST request? I seem to remember running > into that issue when I was trying to do some of the Superset work. > 2. If a user is required to be authenticated to execute a query, would > that not prevent the possibility of a non-authenticated user executing > arbitrary queries against someone else's data? > 3. I would definitely create separate data sources for each tenant, but I > don't know that it is necessary (or helpful) to create one for each query. > > I'd agree with Paul, that Drill's access model needs improvement and that > would be a good addition to the project. We might be able to assist with > that if there's interest. > Best, > -- C > > > > On May 10, 2020, at 3:55 PM, Paul Rogers <[email protected]> > wrote: > > > > Hi Avner, > > > > Drill was designed for a system in which the user name maps to a > certificate on the underlying file system, and the file system provides > complete security. This model has not been extended to the cloud world. > > > > What you want is a way to authenticate your user, map the user to a > storage plugin config for only that client's files, then restrict that user > to only that config. Further, you'd want the config to obtain S3 keys from > a vault of some sort. If you have that, you'd not have to worry about SQL > injection since only an authorized user could muck with the SQL, and they > could only access their own data -- which they can presumably access anyway. > > > > > > At present, Drill has no out-of-the-box security model for this use > case; there is no mechanism to associate users with configs, or to > externalize S3 security keys. Such a system would be a worthwhile addition > to the project. > > > > I wonder, has anyone else found a workaround for this use case? Maybe > via Kerberos or some such? > > > > > > Thanks, > > - Paul > > > > > > > > On Sunday, May 10, 2020, 12:04:16 PM PDT, Avner Levy < > [email protected]> wrote: > > > > Hi, > > I'm trying to use Apache Drill as a database for providing SQL over S3 > > parquet files. > > Drill is used for serving multi-tenant data for multiple customers. > > Since I need to build the SQL string using the REST API I'm vulnerable to > > SQL injection attacks. > > I do test all user input and close it between apostrophes and > > escape apostrophe in the user input by doubling it but I'm still > concerned > > about optional SQL attacks. > > Will adding a different data source (which points to a different folder > on > > S3) per tenant is something that will have impact on performance? (I > might > > have thousands of those) > > Does it make sense to create the data source on the fly before query? > > Is there another way to limit the sent SQL to a specific folder? > > Thanks, > > Avner >
