Re: Time-travel reads via SQL query

Andrew Wong Tue, 28 Nov 2017 14:10:31 -0800

Ah, I see. It looks like this is an open, tracked issue: see KUDU-1702
<https://issues.apache.org/jira/browse/KUDU-1702?jql=project%20%3D%20KUDU%20AND%20resolution%20%3D%20Unresolved%20AND%20text%20~%20%22read_at_snapshot%22%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>,
although I'm not sure what its status is.
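
Until something like that lands on the SQL side, the client-side route discussed in the quoted messages below comes down to two things: turning the instant you want to read at into a microsecond timestamp, and making sure that instant is still inside the history the tablet servers retain. Here is a minimal sketch of just that arithmetic. It does not talk to Kudu at all; the function names are hypothetical, and the 300-second window is only an assumption taken from the "5 mins" figure mentioned below, so substitute whatever --tablet_history_max_age_sec is set to on your cluster.

```python
from datetime import datetime, timedelta, timezone

# Assumption: mirrors the server-side --tablet_history_max_age_sec flag.
# 300 is illustrative only; it is not read from any cluster.
HISTORY_MAX_AGE_SEC = 300

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def snapshot_micros(ts: datetime) -> int:
    """Convert a timezone-aware datetime to microseconds since the Unix epoch."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Integer division of timedeltas avoids float rounding on large values.
    return (ts - EPOCH) // timedelta(microseconds=1)


def is_within_history(snapshot_us: int, now_us: int,
                      history_max_age_sec: int = HISTORY_MAX_AGE_SEC) -> bool:
    """Return True if the snapshot is not in the future and not older than
    the assumed history retention window."""
    age_us = now_us - snapshot_us
    return 0 <= age_us <= history_max_age_sec * 1_000_000


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    wanted = now - timedelta(seconds=30)  # e.g. the end of the previous batch
    ts_us = snapshot_micros(wanted)
    print(ts_us, is_within_history(ts_us, snapshot_micros(now)))
```

The resulting integer is the kind of value you would hand to a READ_AT_SNAPSHOT scan; as far as I know, the Java client takes it through snapshotTimestampMicros() on the scanner builder. A scan at a timestamp older than the retention window may not be servable, which is why the check is worth doing before issuing it.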


On Tue, Nov 28, 2017 at 12:17 PM, Mauricio Aristizabal <
[email protected]> wrote:

> Thanks very much, David and Andrew.  Yes, I'm aware this functionality is
> available via the Java and C++ clients, but what I'm actually asking is
> whether it could be made available via SQL/Impala.  Something like "select X
> from Y where snapshot_micros = 2343242423" (where snapshot_micros is a
> virtual column that would need a better name), or perhaps as part of the
> table name, like "select X from Y@2343242423".  -m
>
> On Tue, Nov 28, 2017 at 12:05 PM, David Alves <[email protected]>
> wrote:
>
>> Hi Mauricio
>>
>>   Andrew is right. That feature already exists in some form. With
>> READ_AT_SNAPSHOT you can provide a timestamp, which is the point in time
>> at which all the scans are performed.
>>   Note that, while this is generally supported and functionally tested, we
>> haven't put a lot of resources into testing it, so your performance
>> mileage may vary.
>>   To enable this for points in time more than 5 mins in the past,
>> you need to increase the "--tablet_history_max_age_sec" flag so that the
>> history won't get garbage collected.
>>
>> HTH
>> -david
>>
>> On Mon, Nov 27, 2017 at 9:42 PM, Andrew Wong <[email protected]> wrote:
>>
>>> Hi Mauricio,
>>>
>>> If you haven't already, take a look at the READ_AT_SNAPSHOT read mode
>>> (more info here
>>> <https://kudu.apache.org/docs/transaction_semantics.html#_read_operations_scans>).
>>> IIUC, it seems similar to, if not exactly what you're looking for!
>>>
>>>
>>> Andrew
>>>
>>> On Mon, Nov 27, 2017 at 5:02 PM, Mauricio Aristizabal <
>>> [email protected]> wrote:
>>>
>>>> Hi all, has there been any talk of supporting this any time soon?
>>>>
>>>> Time travel reads are such a cool feature, but even more than in ETL
>>>> jobs (via Java/Scala), they would be most useful via SQL to ensure
>>>> consistency when reading.
>>>>
>>>> For example, our Spark Streaming job updates dozens of
>>>> aggregation tables every 30 seconds.  To make the data fully consistent, we
>>>> would love to have views over these aggs tagged with the exact timestamp we
>>>> want to expose.  When each batch is done and all tables updated, we would
>>>> update all the views forward, effectively hiding the updates we're doing
>>>> until they're all ready.
>>>>
>>>> -m
>>>>
>>>
>>>
>>>
>>> --
>>> Andrew Wong
>>>
>>
>>
>
>
> --
> *MAURICIO ARISTIZABAL*
> Architect - Business Intelligence + Data Science
> [email protected] (m) +1 323 309 4260
> 223 E. De La Guerra St. | Santa Barbara, CA 93101
>
> Overview <http://www.impactradius.com/?src=slsap> | Twitter
> <https://twitter.com/impactradius> | Facebook
> <https://www.facebook.com/pages/Impact-Radius/153376411365183> | LinkedIn
> <https://www.linkedin.com/company/impact-radius-inc->
>



-- 
Andrew Wong
