Hi James,

You can try running the query as a CTAS (CREATE TABLE AS SELECT) statement
that writes the results back to S3, and then query the data from the
resulting table.
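
A minimal sketch of that approach, reusing the query from your mail. It
assumes the s3 storage plugin has a workspace named `tmp` configured with
"writable": true; both the workspace name and the table name
`distinct_column_name` are placeholders, not something taken from your setup.

```sql
-- Write CTAS output as Parquet (this is Drill's default store.format).
ALTER SESSION SET `store.format` = 'parquet';

-- Assumption: s3.tmp is a writable workspace defined in the s3 storage plugin.
-- The result files are written under that workspace's location in S3.
CREATE TABLE s3.tmp.`distinct_column_name` AS
SELECT DISTINCT column_name
FROM s3.`/path/to/files/year/month/day/hour/`;

-- Read the stored result later, independent of the original long-running query.
SELECT * FROM s3.tmp.`distinct_column_name`;
```

Since the CTAS statement only returns a short summary of what was written,
the result set itself no longer has to travel back through the UI.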


On Sat, Sep 15, 2018 at 11:01 PM Vitalii Diravka <[email protected]>
wrote:

> Hi James,
>
> This is the mail for user mailing list.
> There is no attachment, please upload it to Google Drive, for instance, and
> give us the link.
>
> Did you try to use Drill SqlLine?
>
>
> Kind regards
> Vitalii
>
>
> On Sat, Sep 15, 2018 at 7:45 PM James Barney <[email protected]>
> wrote:
>
> > Hey,
> > I've had pretty great success using drill on top of S3 but I'm hitting one
> > big issue: a "long running" query (more than 4.5 minutes) will succeed
> > after submitting but the UI times out with 'network error (tcp error):
> > ""'. See attachment.
> >
> > Basics:
> > Running Drill 1.14 on Amazon Linux. Only modification I made is this
> > parameter at runtime to drill-env.sh for reading encrypted files from S3:
> > export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS
> > -Dcom.amazonaws.services.s3.enableV4"
> >
> > To simplify things I'm just on one drill node with this query:
> > select distinct(column_name) from s3.`/path/to/files/year/month/day/hour/`
> >
> > All the files are well-formed parquet files and querying any single file
> > returns fine in a few seconds. When I scale the cluster up to 50+ nodes,
> > the query obviously returns much faster and no time out occurs. However,
> > more complicated/higher data volume queries (i.e., querying a whole day's
> > worth of data instead of one hour) suffer the same timeout.
> >
> > Are there settings I can tweak to prevent this timeout from occurring? Can
> > I save the results of the query somewhere since it's succeeding in the
> > background?
> >
> > Drill demolishes our current solution with its performance and we really
> > want to use it but this bug is making it tricky to sell.
> >
> > Thanks,
> > James
> >
>


-- 
Nitin Pawar
