Yes, you can use Windowing in batch mode by applying your own
timestamps. For example,

from apache_beam.transforms.window import TimestampedValue

grouped_by_window = (p
 | Read(...)
 | beam.Map(lambda x: TimestampedValue(x, timestamp=extract_timestamp(x))
 | beam.WindowInto(FixedWindows(5))
 | beam.Map(lambda x: (key(x), value(x)))  # If the data wasn't already keyed...
 | beam.GroupByKey())

Now grouped_by_windows will contain a (k, vs) tuple for each key and
five-second window. When running in batch mode, everything up to the
group-by-key will be executed in its entirety before anything after,
but the data will still be appropriately partitioned.

- Robert



On Sat, Jun 24, 2017 at 5:08 AM, Morand, Sebastien
<[email protected]> wrote:
> Hi Ahmet,
>
> Yes I know, it's exactly what I said in message: streaming is NOT supported
> in python yet.
>
> My question is: is there any way to timestamp the bounded sources and use
> the Windowing in BATCH mode?
>
> Regards,
>
>
> Sébastien MORAND
> Team Lead Solution Architect
> Technology & Operations / Digital Factory
> Veolia - Group Information Systems & Technology (IS&T)
> Cell.: +33 7 52 66 20 81 / Direct: +33 1 85 57 71 08
> Bureau 0144C (Ouest)
> 30, rue Madeleine-Vionnet - 93300 Aubervilliers, France
> www.veolia.com
>
>
>
> On 23 June 2017 at 21:32, Ahmet Altay <[email protected]> wrote:
>>
>> Hi Sebastien,
>>
>> Streaming is not supported in the Python SDK yet. There is a steady
>> progress on that and soon we will be able to offer it. It is already
>> supported in Java.
>>
>> Thank you,
>> Ahmet
>>
>> On Fri, Jun 23, 2017 at 12:26 PM, Morand, Sebastien
>> <[email protected]> wrote:
>>>
>>> Ok I think I understand these concepts, so as streamline is not supported
>>> in python yet, there is no way to timestamp the bounded sources, isn't
>>> there?
>>>
>>> Is it possible in java?
>>>
>>> Sébastien MORAND
>>> Team Lead Solution Architect
>>> Technology & Operations / Digital Factory
>>> Veolia - Group Information Systems & Technology (IS&T)
>>> Cell.: +33 7 52 66 20 81 / Direct: +33 1 85 57 71 08
>>> Bureau 0144C (Ouest)
>>> 30, rue Madeleine-Vionnet - 93300 Aubervilliers, France
>>> www.veolia.com
>>>
>>>
>>>
>>> On 23 June 2017 at 17:04, Eugene Kirpichov <[email protected]> wrote:
>>>>
>>>> To elaborate on what Kenn said: the main difference between batch and
>>>> streaming execution modes of a typical Beam runner is how they handle
>>>> GroupByKey.
>>>>
>>>> The next stage can start processing the KV<K, Iterable<V>> in a
>>>> particular window (assuming Java) when the GBK is sure it's seen all
>>>> elements with this key in this window.
>>>>
>>>> A typical runner in streaming mode assumes that data within a key is
>>>> more or less ordered by time, with the watermark giving a bound on how out
>>>> of order it's likely to be. Because of that, runners can eg declare a 
>>>> window
>>>> of a key "complete" when the watermark reaches the end of window. More
>>>> precisely, this is controlled by triggers.
>>>>
>>>> However, typical data sources used in batch mode (eg files, key value
>>>> tables etc) are not ordered by time at all and hence data is assumed to be
>>>> completely out of order and no useful watermark can be provided. So runners
>>>> have only one option: process all data, and then declare all keys and all
>>>> windows complete, and then start the next stage.
>>>>
>>>> Note that this is not a fundamental limitation of the Beam model: it's
>>>> conceivable to have a runner that reads a bounded dataset (eg a set of log
>>>> files marked with date in the filename) with some knowledge of their time
>>>> ordering and could pipeline the stages. Just current runners treat all
>>>> bounded datasets as completely out of order, and the current API for custom
>>>> bounded sources doesn't even have a function for reporting a watermark.
>>>>
>>>> Hope this helps.
>>>>
>>>>
>>>> On Fri, Jun 23, 2017, 7:02 AM Kenneth Knowles <[email protected]> wrote:
>>>>>
>>>>> Hello!
>>>>>
>>>>> The behavior you are seeing is what makes something batch mode
>>>>> processing. The essential definition of streaming mode processing is that
>>>>> you get output before you have processed all the data.
>>>>>
>>>>> Event time windowing does not control when computations occur - when
>>>>> you window into FixedWindows of five seconds, this means your data will be
>>>>> grouped according to the window that contains the timestamp on the event. 
>>>>> In
>>>>> streaming mode, this grouping will generally be output soon after the
>>>>> watermark exceeds the end of the window, but this can be customized using
>>>>> triggers.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Thu, Jun 22, 2017 at 10:13 AM, Morand, Sebastien
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to window a batch, but whatever I try, the timestamp is not
>>>>>> working, it's blocked by the group by :
>>>>>>
>>>>>>
>>>>>> I want the transform-comine (the last on my screenshot) starts before
>>>>>> the group_source has read all the files in input, and it's never 
>>>>>> happening.
>>>>>> First it's reading all my files in the group by, and when it's over, it
>>>>>> starts the transform-combine  ...
>>>>>>
>>>>>> I put a FixedWindow with 5 seconds, doesn't change anything, any way
>>>>>> to do so?
>>>>>>
>>>>>> NB, JOB ID : 2017-06-22_10_03_47-4479358064592021427
>>>>>>
>>>>>> thanks by advance
>>>>>>
>>>>>> Sébastien MORAND
>>>>>> Team Lead Solution Architect
>>>>>> Technology & Operations / Digital Factory
>>>>>> Veolia - Group Information Systems & Technology (IS&T)
>>>>>> Cell.: +33 7 52 66 20 81 / Direct: +33 1 85 57 71 08
>>>>>> Bureau 0144C (Ouest)
>>>>>> 30, rue Madeleine-Vionnet - 93300 Aubervilliers, France
>>>>>> www.veolia.com
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------------------------
>>>>>> This e-mail transmission (message and any attached files) may contain
>>>>>> information that is proprietary, privileged and/or confidential to Veolia
>>>>>> Environnement and/or its affiliates and is intended exclusively for the
>>>>>> person(s) to whom it is addressed. If you are not the intended recipient,
>>>>>> please notify the sender by return e-mail and delete all copies of this
>>>>>> e-mail, including all attachments. Unless expressly authorized, any use,
>>>>>> disclosure, publication, retransmission or dissemination of this e-mail
>>>>>> and/or of its attachments is strictly prohibited.
>>>>>>
>>>>>> Ce message electronique et ses fichiers attaches sont strictement
>>>>>> confidentiels et peuvent contenir des elements dont Veolia Environnement
>>>>>> et/ou l'une de ses entites affiliees sont proprietaires. Ils sont donc
>>>>>> destines a l'usage de leurs seuls destinataires. Si vous avez recu ce
>>>>>> message par erreur, merci de le retourner a son emetteur et de le 
>>>>>> detruire
>>>>>> ainsi que toutes les pieces attachees. L'utilisation, la divulgation, la
>>>>>> publication, la distribution, ou la reproduction non expressement 
>>>>>> autorisees
>>>>>> de ce message et de ses pieces attachees sont interdites.
>>>>>>
>>>>>> --------------------------------------------------------------------------------------------
>>>>>
>>>>>
>>>
>>>
>>>
>>>
>>> --------------------------------------------------------------------------------------------
>>> This e-mail transmission (message and any attached files) may contain
>>> information that is proprietary, privileged and/or confidential to Veolia
>>> Environnement and/or its affiliates and is intended exclusively for the
>>> person(s) to whom it is addressed. If you are not the intended recipient,
>>> please notify the sender by return e-mail and delete all copies of this
>>> e-mail, including all attachments. Unless expressly authorized, any use,
>>> disclosure, publication, retransmission or dissemination of this e-mail
>>> and/or of its attachments is strictly prohibited.
>>>
>>> Ce message electronique et ses fichiers attaches sont strictement
>>> confidentiels et peuvent contenir des elements dont Veolia Environnement
>>> et/ou l'une de ses entites affiliees sont proprietaires. Ils sont donc
>>> destines a l'usage de leurs seuls destinataires. Si vous avez recu ce
>>> message par erreur, merci de le retourner a son emetteur et de le detruire
>>> ainsi que toutes les pieces attachees. L'utilisation, la divulgation, la
>>> publication, la distribution, ou la reproduction non expressement autorisees
>>> de ce message et de ses pieces attachees sont interdites.
>>>
>>> --------------------------------------------------------------------------------------------
>>
>>
>
>
>
> --------------------------------------------------------------------------------------------
> This e-mail transmission (message and any attached files) may contain
> information that is proprietary, privileged and/or confidential to Veolia
> Environnement and/or its affiliates and is intended exclusively for the
> person(s) to whom it is addressed. If you are not the intended recipient,
> please notify the sender by return e-mail and delete all copies of this
> e-mail, including all attachments. Unless expressly authorized, any use,
> disclosure, publication, retransmission or dissemination of this e-mail
> and/or of its attachments is strictly prohibited.
>
> Ce message electronique et ses fichiers attaches sont strictement
> confidentiels et peuvent contenir des elements dont Veolia Environnement
> et/ou l'une de ses entites affiliees sont proprietaires. Ils sont donc
> destines a l'usage de leurs seuls destinataires. Si vous avez recu ce
> message par erreur, merci de le retourner a son emetteur et de le detruire
> ainsi que toutes les pieces attachees. L'utilisation, la divulgation, la
> publication, la distribution, ou la reproduction non expressement autorisees
> de ce message et de ses pieces attachees sont interdites.
> --------------------------------------------------------------------------------------------

Reply via email to