Yeah… 
Thanks Chunhui!

> On May 10, 2017, at 12:15, Chunhui Shi <[email protected]> wrote:
> 
> I think what Charles meant was "WHERE (dir2 = 15 AND dir3 < 20) OR (dir2 = 14 
> AND dir3 > 4)",  and of course you need to add dir0 and dir1 for year and 
> month.
> 
> 
> And what do you mean by "scan all the files on every query", scan all the 
> files of one day data, I thought this was your purpose?
> 
> ________________________________
> From: Wesley Chow <[email protected]>
> Sent: Wednesday, May 10, 2017 9:04:12 AM
> To: [email protected]
> Subject: Re: querying from multiple directories in S3
> 
> I don't think so, because doesn't AND commute, which would mean dir2 = 15
> AND dir2=14 would always be false?
> 
> Even if there is some comparison that works, isn't there still an issue
> that the S3 file source has to scan all the files on every query?
> 
> Wes
> 
> On Wed, May 10, 2017 at 8:15 AM, Charles Givre <[email protected]> wrote:
> 
>> Hi Wes,
>> Are you putting the dirX fields in the WHERE clause?
>> IE  Couldn't you do soemthing like:
>> 
>> SELECT  <fields>
>> FROM s3.data
>> WHERE (dir2 = 15 AND dir3 < 20) AND (dir2 = 14 AND dir3 > 4)
>> 
>> In theory this could work for UTC -4.  It’s ugly… but I think it would
>> work.
>> — C
>> 
>> 
>> 
>>> On May 9, 2017, at 10:06, Wesley Chow <[email protected]> wrote:
>>> 
>>> What is the recommended way to issue a query against a large number of
>>> tables in S3? At the moment I'm aliasing the table as a giant UNION ALL,
>>> but is there a better way to do this?
>>> 
>>> Our data is stored as a time hierarchy, like YYYY/MM/DD/HH/MM in UTC, but
>>> unfortunately I can't simply run the query recursively on an entire day
>> of
>>> data. I usually need a day of data in a non-UTC time zone. Is there some
>>> elegant way to grab that data using the dir0, dir1 magic columns?
>>> 
>>> Thanks,
>>> Wes
>> 
>> 

Reply via email to