Hi James,

You can try running the query as a CTAS (CREATE TABLE AS SELECT) statement that writes the results back to S3, and then query the data from the resulting table.
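
A rough sketch of what that could look like for your query. It assumes the s3 storage plugin has a writable workspace; the workspace name `tmp` and the table name `distinct_column_name` are placeholders I made up, so adjust them to your setup:

```sql
-- Assumption: `s3.tmp` is a workspace marked "writable": true in the s3
-- storage plugin configuration. Workspace and table names are illustrative.
-- The output format follows the `store.format` option (parquet by default).
CREATE TABLE s3.tmp.`distinct_column_name` AS
SELECT DISTINCT column_name
FROM s3.`/path/to/files/year/month/day/hour/`;

-- Later, read the saved results back from the new table.
SELECT column_name
FROM s3.tmp.`distinct_column_name`;
```

Since the query already finishes in the background, the table should still be written even if the UI connection drops, and the second query only has to read the small result set.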

On Sat, Sep 15, 2018 at 11:01 PM Vitalii Diravka <[email protected]> wrote:

> Hi James,
>
> This is the mail for user mailing list.
> There is no attachment, please upload it to Google Drive, for instance, and
> give us the link.
>
> Did you try to use Drill SqlLine?
>
> Kind regards
> Vitalii
>
> On Sat, Sep 15, 2018 at 7:45 PM James Barney <[email protected]> wrote:
>
> > Hey,
> > I've had pretty great success using drill on top of S3 but I'm hitting one
> > big issue: a "long running" query (more than 4.5 minutes) will succeed
> > after submitting but the UI times out with 'network error (tcp error):
> > ""'. See attachment.
> >
> > Basics:
> > Running Drill 1.14 on Amazon Linux. Only modification I made is this
> > parameter at runtime to drill-env.sh for reading encrypted files from S3:
> > export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS
> > -Dcom.amazonaws.services.s3.enableV4"
> >
> > To simplify things I'm just on one drill node with this query:
> > select distinct(column_name) from s3.`/path/to/files/year/month/day/hour/`
> >
> > All the files are well-formed parquet files and querying any single file
> > returns fine in a few seconds. When I scale the cluster up to 50+ nodes,
> > the query obviously returns much faster and no time out occurs. However,
> > more complicated/higher data volume queries (ie, querying a whole days
> > worth of data instead of one hour) suffer the same timeout.
> >
> > Are there settings I can tweak to prevent this timeout from occurring? Can
> > I save the results of the query somewhere since it's succeeding in the
> > background?
> >
> > Drill demolishes our current solution with its performance and we really
> > want to use it but this bug is making it tricky to sell.
> >
> > Thanks,
> > James

--
Nitin Pawar
