I think what Charles meant was "WHERE (dir2 = 15 AND dir3 < 20) OR (dir2 = 14
AND dir3 > 4)", and of course you also need to add conditions on dir0 and dir1
for the year and month.
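
To avoid writing the per-day conditions by hand, here is a rough Python sketch
that builds such a predicate for one local calendar day. The function name
local_day_predicate is made up for illustration. It assumes the layout from the
original question (dir0 = year, dir1 = month, dir2 = day, dir3 = hour), that
directory names are zero-padded and compared as strings, and that the UTC
offset is a whole number of hours with no DST change during the day. Please
check these against your own data.

```python
from datetime import datetime, timedelta, timezone


def local_day_predicate(year, month, day, utc_offset_hours):
    """Build a WHERE-clause body covering one local day of hourly UTC directories.

    Hypothetical helper: assumes dir0/dir1/dir2/dir3 hold zero-padded
    year/month/day/hour strings and a whole-hour UTC offset.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    # Local midnight expressed in UTC, and the same instant one day later.
    start = datetime(year, month, day, tzinfo=tz).astimezone(timezone.utc)
    end = start + timedelta(days=1)

    # Collect the UTC hours touched by the local day, grouped by UTC date.
    hours_by_date = {}
    t = start
    while t < end:
        hours_by_date.setdefault((t.year, t.month, t.day), []).append(t.hour)
        t += timedelta(hours=1)

    # One parenthesised condition per UTC date, joined with OR (not AND).
    clauses = []
    for (y, m, d), hours in hours_by_date.items():
        clauses.append(
            f"(dir0 = '{y:04d}' AND dir1 = '{m:02d}' AND dir2 = '{d:02d}' "
            f"AND dir3 BETWEEN '{hours[0]:02d}' AND '{hours[-1]:02d}')"
        )
    return " OR ".join(clauses)


if __name__ == "__main__":
    # Local day 2017-05-14 at UTC-4.
    print("WHERE " + local_day_predicate(2017, 5, 14, -4))
```

For 2017-05-14 at UTC-4 this produces one condition for dir2 = '14' with hours
'04' through '23' and one for dir2 = '15' with hours '00' through '03', joined
with OR.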


And what do you mean by "scan all the files on every query"? If you mean
scanning all the files for one day of data, I thought that was your purpose.

________________________________
From: Wesley Chow <[email protected]>
Sent: Wednesday, May 10, 2017 9:04:12 AM
To: [email protected]
Subject: Re: querying from multiple directories in S3

I don't think so, because doesn't AND commute, which would mean dir2 = 15
AND dir2=14 would always be false?

Even if there is some comparison that works, isn't there still an issue
that the S3 file source has to scan all the files on every query?

Wes

On Wed, May 10, 2017 at 8:15 AM, Charles Givre <[email protected]> wrote:

> Hi Wes,
> Are you putting the dirX fields in the WHERE clause?
> I.e., couldn't you do something like:
>
> SELECT  <fields>
> FROM s3.data
> WHERE (dir2 = 15 AND dir3 < 20) AND (dir2 = 14 AND dir3 > 4)
>
> In theory this could work for UTC -4.  It’s ugly… but I think it would
> work.
> — C
>
>
>
> > On May 9, 2017, at 10:06, Wesley Chow <[email protected]> wrote:
> >
> > What is the recommended way to issue a query against a large number of
> > tables in S3? At the moment I'm aliasing the table as a giant UNION ALL,
> > but is there a better way to do this?
> >
> > Our data is stored as a time hierarchy, like YYYY/MM/DD/HH/MM in UTC, but
> > unfortunately I can't simply run the query recursively on an entire day of
> > data. I usually need a day of data in a non-UTC time zone. Is there some
> > elegant way to grab that data using the dir0, dir1 magic columns?
> >
> > Thanks,
> > Wes
>
>
