Re: Reading and writing to external services in DoFns

2017-07-31 Thread Lars BK
Hi Vilhelm,

Thanks, that helps. Yes, it should work in batch. I suppose it could work in
streaming too, if one gets the windowing straight, but I haven't given much
thought to that situation yet! I'm not sure when records written at the end
of the streaming pipeline would become available for lookup at the beginning
of the pipeline, but perhaps one could leverage stateful processing to avoid
having to worry about that.
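
For the record, the kind of thing I was picturing with stateful processing is
roughly the sketch below. It is only a sketch: FirstSeenFn is my own name, the
elements are assumed to be KV<String, String> keyed by id, and the state is
scoped per key and window, so it only covers ids seen by the job itself, not
ids recorded externally before the job started.

    import org.apache.beam.sdk.coders.BooleanCoder;
    import org.apache.beam.sdk.state.StateSpec;
    import org.apache.beam.sdk.state.StateSpecs;
    import org.apache.beam.sdk.state.ValueState;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;

    // Emits each (id, payload) element only the first time its id is seen.
    class FirstSeenFn extends DoFn<KV<String, String>, KV<String, String>> {

      @StateId("seen")
      private final StateSpec<ValueState<Boolean>> seenSpec =
          StateSpecs.value(BooleanCoder.of());

      @ProcessElement
      public void processElement(ProcessContext c,
          @StateId("seen") ValueState<Boolean> seen) {
        if (seen.read() == null) {   // first time this id appears
          seen.write(true);
          c.output(c.element());
        }
        // Duplicates are simply dropped.
      }
    }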

Regards,
Lars

On Sat, Jul 29, 2017 at 10:53 AM Vilhelm von Ehrenheim <
vonehrenh...@gmail.com> wrote:

> Hi!
> You will get a slow pipeline if you do the DB lookups from inside a DoFn.
>
> I have done similar things to what you describe using the "read all data
> and join" approach, but only in batch settings so far. I think it should
> work fine in the streaming setting as well, as long as you keep adding to
> the set of previous records. However, if the index of previous records can
> fit into memory on the nodes, I would recommend using a side input instead
> and doing the check against it in the DoFn. That should both be fast and
> work well in streaming.
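>
> A rough sketch of what I mean (untested; "p", "records", the element types
> and the id source path are placeholders you would swap for your own):
>
>     PCollectionView<Map<String, Boolean>> seenIds = p
>         .apply("ReadSeenIds", TextIO.read().from("gs://my-bucket/seen-ids/*"))
>         .apply("ToKV", MapElements.via(
>             new SimpleFunction<String, KV<String, Boolean>>() {
>               @Override
>               public KV<String, Boolean> apply(String id) {
>                 return KV.of(id, true);
>               }
>             }))
>         .apply(View.asMap());
>
>     records.apply("FilterSeen", ParDo.of(
>         new DoFn<KV<String, String>, KV<String, String>>() {
>           @ProcessElement
>           public void processElement(ProcessContext c) {
>             // The side input is materialized once per worker and kept in memory.
>             Map<String, Boolean> seen = c.sideInput(seenIds);
>             if (!seen.containsKey(c.element().getKey())) {
>               c.output(c.element());
>             }
>           }
>         }).withSideInputs(seenIds));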
>
> Hope it helps.
>
> Br,
> Vilhelm von Ehrenheim
>
>
>
> On 28 Jul 2017 14:19, "Lars BK" <larsbkrog...@gmail.com> wrote:
>
>> Hi everyone,
>>
>>
>> I'm researching how to handle a particular use case in Beam that I
>> imagine is common, but for which I haven't been able to find an
>> agreed-upon best practice yet.
>>
>> *Use case: *I'm processing a stream or batch of records with ids, and
>> for each record I want to check whether I've ever seen its id before
>> (beyond the scope of the job execution). In particular, I'm going to be
>> using Google Dataflow, and I plan to store and look up ids in Google
>> Datastore.
>>
>> *Question*: Is it advisable to look up the record id in Datastore per
>> element in a DoFn? I am most worried about latency, and I am wary of the
>> recommendation in the documentation for ParDo
>> <https://beam.apache.org/documentation/sdks/javadoc/2.0.0/org/apache/beam/sdk/transforms/ParDo.html>
>> that says I'd have to be careful when I write to Datastore:
>>
>> > "..if a DoFn's
>> <https://beam.apache.org/documentation/sdks/javadoc/2.0.0/org/apache/beam/sdk/transforms/DoFn.html>
>> execution has external side-effects, such as performing updates to external
>> HTTP services, then the DoFn's
>> <https://beam.apache.org/documentation/sdks/javadoc/2.0.0/org/apache/beam/sdk/transforms/DoFn.html>
>> code needs to take care to ensure that those updates are idempotent and
>> that concurrent updates are acceptable."
>>
>> I found a relevant question on StackOverflow
>> <https://stackoverflow.com/questions/40049621/datastore-queries-in-dataflow-dofn-slow-down-pipeline-when-run-in-the-cloud>
>> where a user is doing something very similar to what I had in mind, and
>> another user says that:
>>
>> > "For each partition of your PCollection the calls to Datastore are
>> going to be single-threaded, hence incur a lot of latency."
>>
>> Is this something I should be worried about, and if so, does anyone know
>> of a better way? The second suggestion of the same user is to read all the
>> ids from Datastore and use a CoGroupByKey, but I don't think that approach
>> would support streaming mode.
>>
>>
>> I hope somebody here has experience with similar patterns, and I'd
>> greatly appreciate any tips you could share!
>>
>> Regards, Lars
>>
>


Reading and writing to external services in DoFns

2017-07-28 Thread Lars BK
Hi everyone,


I'm researching how to handle a particular use case in Beam that I imagine
is common, but for which I haven't been able to find an agreed-upon best
practice yet.

*Use case: *I'm processing a stream or batch of records with ids, and for
each record I want to check whether I've ever seen its id before (beyond
the scope of the job execution). In particular, I'm going to be using
Google Dataflow, and I plan to store and look up ids in Google Datastore.
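
To make the question concrete, the naive version I have in mind looks roughly
like the sketch below. SeenStore is purely a stand-in for whatever thin
wrapper I would put around the Datastore client, not a real API:

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;

    // Naive per-element check against an external store. SeenStore is a
    // hypothetical wrapper around the Datastore client.
    class CheckSeenFn extends DoFn<KV<String, String>, KV<String, String>> {

      private transient SeenStore store;

      @Setup
      public void setup() {
        store = SeenStore.connect();   // one client per DoFn instance
      }

      @ProcessElement
      public void processElement(ProcessContext c) {
        String id = c.element().getKey();
        if (!store.contains(id)) {     // blocking lookup per element
          store.markSeen(id);          // this write must be idempotent
          c.output(c.element());
        }
      }
    }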

*Question*: Is it advisable to look up the record id in Datastore per
element in a DoFn? I am most worried about latency, and I am wary of the
recommendation in the documentation for ParDo

that says I'd have to be careful when I write to Datastore:

> "..if a DoFn's

execution has external side-effects, such as performing updates to external
HTTP services, then the DoFn's

code needs to take care to ensure that those updates are idempotent and
that concurrent updates are acceptable."

I found a relevant question on StackOverflow

where a user is doing something very similar to what I had in mind, and
another user says that:

> "For each partition of your PCollection the calls to Datastore are going
to be single-threaded, hence incur a lot of latency."

Is this something I should be worried about, and if so, does anyone know of
a better way? The second suggestion of the same user is to read all the ids
from Datastore and use a CoGroupByKey, but I don't think that approach would
support streaming mode.
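
For completeness, my understanding of that suggestion is roughly the
following (a sketch only; the method name and element types are mine, and
storedIds would come from a Datastore read plus whatever parsing extracts
the ids):

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.join.CoGbkResult;
    import org.apache.beam.sdk.transforms.join.CoGroupByKey;
    import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.TupleTag;

    class FilterSeen {
      // Keeps only records whose id does not appear among the stored ids.
      static PCollection<KV<String, String>> keepUnseen(
          PCollection<KV<String, String>> records,       // (id, payload)
          PCollection<KV<String, Boolean>> storedIds) {  // previously seen ids

        final TupleTag<String> recordTag = new TupleTag<>();
        final TupleTag<Boolean> seenTag = new TupleTag<>();

        return KeyedPCollectionTuple.of(recordTag, records)
            .and(seenTag, storedIds)
            .apply(CoGroupByKey.<String>create())
            .apply(ParDo.of(
                new DoFn<KV<String, CoGbkResult>, KV<String, String>>() {
                  @ProcessElement
                  public void processElement(ProcessContext c) {
                    CoGbkResult groups = c.element().getValue();
                    boolean seenBefore =
                        groups.getAll(seenTag).iterator().hasNext();
                    if (!seenBefore) {
                      for (String payload : groups.getAll(recordTag)) {
                        c.output(KV.of(c.element().getKey(), payload));
                      }
                    }
                  }
                }));
      }
    }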


I hope somebody here has experience with similar patterns, and I'd greatly
appreciate any tips you could share!

Regards, Lars


Re: Reprocessing historic data with streaming jobs

2017-05-03 Thread Lars BK
Thanks for your input and sorry for the late reply.

Lukasz, you may be right that running the reprocessing as a batch job will
be better and faster. I'm still experimenting with approach 3 where I
publish all messages and then start the job to let the watermark progress
through the data. It seems to be working fairly well right now, but I'm not
sure that the "somewhat ordered" data I send is "ordered enough". (I can
send data ordered by date, but within each date I can give no guarantees.)

Thomas, I had not thought of that, thanks. I like the idea; it sounds like
it will handle the merge between archive and live data automatically, which
would be very nice.

And Ankur, your case sounds similar. I'm starting to lean towards doing
batch jobs for reprocessing too.

I am going to keep experimenting with different approaches (until I have to
move on), and I'll do my best to update here with my findings later.


Lars

On Mon, May 1, 2017 at 6:51 PM Ankur Chauhan <an...@malloc64.com> wrote:

> I have sort of a similar use case when dealing with failed / cancelled /
> broken streaming pipelines.
> We have an operator that continuously monitors the min-watermark of the
> pipeline, and when it detects that the watermark has not advanced for more
> than some threshold, we start a new pipeline and initiate a "patcher" batch
> dataflow that reads the event backups over the possibly broken time range
> (+/- 1 hour).
> It works out well but has the overhead of having to build out an external
> operator process that can detect when to run the batch dataflow.
>
> Sent from my iPhone
>
> On May 1, 2017, at 09:37, Thomas Groh <tg...@google.com> wrote:
>
> You should also be able to simply add a Bounded Read from the backup data
> source to your pipeline and flatten it with your Pubsub topic. Because all
> of the elements produced by both the bounded and unbounded sources will
> have consistent timestamps, when you run the pipeline the watermark will be
> held until all of the data is read from the bounded sources. Once this is
> done, your pipeline can continue processing only elements from the PubSub
> source. If you don't want the backlog and the current processing to occur
> in the same pipeline, running the same pipeline but just reading from the
> archival data should be sufficient (all of the processing would be
> identical, just the source would need to change).
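>
> For example, something along these lines (a sketch only; the archive path,
> topic, timestamp attribute, and the extractEventTime parser are
> placeholders):
>
>     PCollection<String> archive = p
>         .apply("ReadArchive", TextIO.read().from("gs://my-bucket/archive/*"))
>         .apply("EventTimestamps", WithTimestamps.of(
>             // extractEventTime: your own event-time parser (placeholder)
>             (String record) -> extractEventTime(record)));
>
>     PCollection<String> live = p
>         .apply("ReadPubsub", PubsubIO.readStrings()
>             .fromTopic("projects/my-project/topics/events")
>             .withTimestampAttribute("eventTime"));
>
>     PCollection<String> all = PCollectionList.of(archive).and(live)
>         .apply(Flatten.pCollections());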
>
> If you read from both the "live" and "archival" sources within the same
> pipeline, you will need to use additional machines so the backlog can be
> processed promptly if you use a watermark based trigger; watermarks will be
> held until the bounded source is fully processed.
>
> On Mon, May 1, 2017 at 9:29 AM, Lars BK <larsbkrog...@gmail.com> wrote:
>
>> I did not see Lukasz's reply before I posted, and I will have to read it a
>> bit later!
>>
>> On Mon, May 1, 2017 at 6:28 PM Lars BK <larsbkrog...@gmail.com> wrote:
>>
>>> Yes, precisely.
>>>
>>> I think that could work, yes. What you are suggesting sounds like idea
>>> 2) in my original question.
>>>
>>> My main concern is that I would have to allow a great deal of lateness
>>> and that old windows would consume too much memory. Whether it works in my
>>> case or not I don't know yet as I haven't tested it.
>>>
>>> What if I had to process even older data? Could I handle any "oldness"
>>> of data by increasing the allowed lateness and throwing machines at the
>>> problem to hold all the old windows in memory while the backlog is
>>> processed? If so, great! But I would have to dial the allowed lateness back
>>> down when the processing has caught up with the present.
>>>
>>> Is there some intended way of handling reprocessing like this? Maybe
>>> not? Perhaps it is more of a Pubsub and Dataflow question than a Beam
>>> question when it comes down to it.
>>>
>>>
>>> On Mon, May 1, 2017 at 5:25 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>
>>>> OK, so the messages are "re-published" on the topic, with the same
>>>> timestamp as the original, and consumed again by the pipeline.
>>>>
>>>> Maybe, you can play with the allowed lateness and late firings ?
>>>>
>>>> Something like:
>>>>
>>>>Window.into(FixedWindows.of(Duration.minutes(xx)))
>>>>.triggering(AfterWatermark.pastEndOfWindow()
>>>>
>>>>  .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
>>>>.plusDelayOf(FIVE_MINUT

Re: Reprocessing historic data with streaming jobs

2017-05-01 Thread Lars BK
Yes, precisely.

I think that could work, yes. What you are suggesting sounds like idea 2)
in my original question.

My main concern is that I would have to allow a great deal of lateness and
that old windows would consume too much memory. Whether it works in my case
or not I don't know yet as I haven't tested it.

What if I had to process even older data? Could I handle any "oldness" of
data by increasing the allowed lateness and throwing machines at the
problem to hold all the old windows in memory while the backlog is
processed? If so, great! But I would have to dial the allowed lateness back
down when the processing has caught up with the present.

Is there some intended way of handling reprocessing like this? Maybe not?
Perhaps it is more of a Pubsub and Dataflow question than a Beam question
when it comes down to it.


On Mon, May 1, 2017 at 5:25 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> OK, so the messages are "re-published" on the topic, with the same
> timestamp as the original, and consumed again by the pipeline.
>
> Maybe, you can play with the allowed lateness and late firings ?
>
> Something like:
>
>    Window.into(FixedWindows.of(Duration.standardMinutes(xx)))
>        .triggering(AfterWatermark.pastEndOfWindow()
>            .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
>                .plusDelayOf(FIVE_MINUTES))
>            .withLateFirings(AfterProcessingTime.pastFirstElementInPane()
>                .plusDelayOf(TEN_MINUTES)))
>        .withAllowedLateness(Duration.standardMinutes(xx))
>        .accumulatingFiredPanes()
>
> Thoughts ?
>
> Regards
> JB
>
> On 05/01/2017 05:12 PM, Lars BK wrote:
> > Hi Jean-Baptiste,
> >
> > I think the key point in my case is that I have to process or reprocess
> > "old" messages. That is, messages that are late because they are streamed
> > from an archive file and are older than the allowed lateness in the
> > pipeline.
> >
> > In the case I described, the messages had already been processed once and
> > were no longer in the topic, so they had to be sent and processed again.
> > But it might as well have been that I had received a backfill of data that
> > absolutely needs to be processed regardless of it being later than the
> > allowed lateness with respect to present time.
> >
> > So when I write this now, it really sounds like I either need to allow
> > more lateness or somehow rewind the watermark!
> >
> > Lars
> >
> > On Mon, May 1, 2017 at 4:34 PM Jean-Baptiste Onofré <j...@nanthrax.net
> > <mailto:j...@nanthrax.net>> wrote:
> >
> > Hi Lars,
> >
> > interesting use case indeed ;)
> >
> > Just to understand: if possible, you don't want to re-consume the
> > messages from the PubSub topic, right? So, you want to "hold" the
> > PCollections for late data processing?
> >
> > Regards
> > JB
> >
> > On 05/01/2017 04:15 PM, Lars BK wrote:
> > > Hi,
> > >
> > > Is there a preferred way of approaching reprocessing historic data
> with
> > > streaming jobs?
> > >
> > > I want to pose this as a general question, but I'm working with
> Pubsub and
> > > Dataflow specifically. I am a fan of the idea of replaying/fast
> forwarding
> > > through historic data to reproduce results (as you perhaps would
> with Kafka),
> > > but I'm having a hard time unifying this way of thinking with the
> concepts of
> > > watermarks and late data in Beam. I'm not sure how to best mimic
> this with the
> > > tools I'm using, or if there is a better way.
> > >
> > > If there is a previous discussion about this I might have missed
> (and I'm
> > > guessing there is), please direct me to it!
> > >
> > >
> > > The use case:
> > >
> > > Suppose I discover a bug in a streaming job with event time
> windows and an
> > > allowed lateness of 7 days, and that I subsequently have to
> reprocess all the
> > > data for the past month. Let us also assume that I have an archive
> of my
> > source
> > > data (in my case in Google cloud storage) and that I can republish
> it all
> > to the
> > > message queue I'm using.
> > >
> > > Some ideas that may or may not work I would love to get your
> thoughts on:
> > >
> > > 1) Start a new instance of the job that reads from a sep

Re: Reprocessing historic data with streaming jobs

2017-05-01 Thread Lars BK
Hi Jean-Baptiste,

I think the key point in my case is that I have to process or reprocess
"old" messages. That is, messages that are late because they are streamed
from an archive file and are older than the allowed lateness in the
pipeline.

In the case I described, the messages had already been processed once and
were no longer in the topic, so they had to be sent and processed again. But it
might as well have been that I had received a backfill of data that
absolutely needs to be processed regardless of it being later than the
allowed lateness with respect to present time.

So when I write this now it really sounds like I either need to allow more
lateness or somehow rewind the watermark!

Lars

On Mon, May 1, 2017 at 4:34 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi Lars,
>
> interesting use case indeed ;)
>
> Just to understand: if possible, you don't want to re-consume the messages
> from the PubSub topic, right? So, you want to "hold" the PCollections for
> late data processing?
>
> Regards
> JB
>
> On 05/01/2017 04:15 PM, Lars BK wrote:
> > Hi,
> >
> > Is there a preferred way of approaching reprocessing historic data with
> > streaming jobs?
> >
> > I want to pose this as a general question, but I'm working with Pubsub
> and
> > Dataflow specifically. I am a fan of the idea of replaying/fast
> forwarding
> > through historic data to reproduce results (as you perhaps would with
> Kafka),
> > but I'm having a hard time unifying this way of thinking with the
> concepts of
> > watermarks and late data in Beam. I'm not sure how to best mimic this
> with the
> > tools I'm using, or if there is a better way.
> >
> > If there is a previous discussion about this I might have missed (and I'm
> > guessing there is), please direct me to it!
> >
> >
> > The use case:
> >
> > Suppose I discover a bug in a streaming job with event time windows and
> an
> > allowed lateness of 7 days, and that I subsequently have to reprocess
> all the
> > data for the past month. Let us also assume that I have an archive of my
> source
> > data (in my case in Google cloud storage) and that I can republish it
> all to the
> > message queue I'm using.
> >
> > Some ideas that may or may not work I would love to get your thoughts on:
> >
> > 1) Start a new instance of the job that reads from a separate source to
> which I
> > republish all messages. This shouldn't work because 14 days of my data is
> > later than the allowed limit, but the remaining 7 days should be
> > reprocessed as intended.
> >
> > 2) The same as 1), but with allowed lateness of one month. When the job
> is
> > caught up, the lateness can be adjusted back to 7 days. I am afraid this
> > approach may consume too much memory since I'm letting a whole month of
> windows
> > remain in memory. Also I wouldn't get the same triggering behaviour as
> in the
> > original job since most or all of the data is late with respect to the
> > watermark, which I assume is near real time when the historic data
> enters the
> > pipeline.
> >
> > 3) The same as 1), but with the republishing first and only starting the
> new job
> > when all messages are already waiting in the queue. The watermark should
> then
> > start one month back in time and only catch up with the present once all
> the
> > data is reprocessed, yielding no late data. (Experiments I've done with
> this
> > approach produce somewhat unexpected results where early panes that are
> older
> > than 7 days appear to be both the first and the last firing from their
> > respective windows.) Early firings triggered by processing time would
> > probably differ, but the results should be the same? This approach also
> > feels a bit awkward as it requires more orchestration.
> >
> > 4) Batch process the archived data instead and start a streaming job in
> > parallel. Would this in a sense be a more honest approach since I'm
> actually
> > reprocessing batches of archived data? The triggering behaviour in the
> streaming
> > version of the job would not apply in batch, and I would want to avoid
> stitching
> > together results from two jobs if I can.
> >
> >
> > These are the approaches I've thought of currently, and any input is much
> > appreciated.  Have any of you faced similar situations, and how did you
> solve them?
> >
> >
> > Regards,
> > Lars
> >
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Reprocessing historic data with streaming jobs

2017-05-01 Thread Lars BK
Hi,

Is there a preferred way of approaching reprocessing historic data with
streaming jobs?

I want to pose this as a general question, but I'm working with Pubsub and
Dataflow specifically. I am a fan of the idea of replaying/fast forwarding
through historic data to reproduce results (as you perhaps would with
Kafka), but I'm having a hard time unifying this way of thinking with the
concepts of watermarks and late data in Beam. I'm not sure how to best
mimic this with the tools I'm using, or if there is a better way.

If there is a previous discussion about this I might have missed (and I'm
guessing there is), please direct me to it!


The use case:

Suppose I discover a bug in a streaming job with event time windows and an
allowed lateness of 7 days, and that I subsequently have to reprocess all
the data for the past month. Let us also assume that I have an archive of
my source data (in my case in Google cloud storage) and that I can
republish it all to the message queue I'm using.

Some ideas that may or may not work I would love to get your thoughts on:

1) Start a new instance of the job that reads from a separate source to
which I republish all messages. This shouldn't work because 14 days of my
data is later than the allowed limit, but the remaining 7 days should be
reprocessed as intended.

2) The same as 1), but with allowed lateness of one month. When the job is
caught up, the lateness can be adjusted back to 7 days. I am afraid this
approach may consume too much memory since I'm letting a whole month of
windows remain in memory. Also I wouldn't get the same triggering behaviour
as in the original job since most or all of the data is late with respect
to the watermark, which I assume is near real time when the historic data
enters the pipeline.

3) The same as 1), but with the republishing first and only starting the
new job when all messages are already waiting in the queue. The watermark
should then start one month back in time and only catch up with the present
once all the data is reprocessed, yielding no late data. (Experiments I've
done with this approach produce somewhat unexpected results where early
panes that are older than 7 days appear to be both the first and the last
firing from their respective windows.) Early firings triggered by
processing time would probably differ, but the results should be the same?
This approach also feels a bit awkward as it requires more orchestration.

4) Batch process the archived data instead and start a streaming job in
parallel. Would this in a sense be a more honest approach since I'm
actually reprocessing batches of archived data? The triggering behaviour in
the streaming version of the job would not apply in batch, and I would want
to avoid stitching together results from two jobs if I can.


These are the approaches I've thought of currently, and any input is much
appreciated.  Have any of you faced similar situations, and how did you
solve them?


Regards,
Lars


Re: Apache Beam Slack channel

2017-04-30 Thread Lars BK
I got no email at first, but since the invite had been registered with Slack
I was able to sign up by requesting a password reset. Once I had signed up, I
finally got the invite email. Strange, but case closed! Thanks.

Lars.


On Sun, Apr 30, 2017 at 9:43 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Aviem already sent the invite to your gmail address. You should have it in
> your mbox.
>
> Can you check please ?
>
> Thanks,
> Regards
> JB
>
> On 04/30/2017 05:31 PM, Lars BK wrote:
> > Yes please.
> >
> > Thanks,
> > Lars
> >
> > On Sun, Apr 30, 2017 at 5:22 PM Jean-Baptiste Onofré <j...@nanthrax.net
> > <mailto:j...@nanthrax.net>> wrote:
> >
> > Hi
> >
> > Should I use your Gmail address ?
> >
> > Regards
> > JB
> > On Apr 30, 2017, at 16:58, Lars BK <larsbkrog...@gmail.com
> > <mailto:larsbkrog...@gmail.com>> wrote:
> >
> > Invitation not received, did something go wrong?
> >
> > Lars
> >
> > On Sun, Apr 30, 2017 at 3:57 PM Aviem Zur <aviem...@gmail.com
> > <mailto:aviem...@gmail.com>> wrote:
> >
> > Invitation sent.
> >
> > On Sun, Apr 30, 2017 at 4:41 PM Lars BK <
> larsbkrog...@gmail.com
> > <mailto:larsbkrog...@gmail.com>> wrote:
> >
> > Hi,
> >
> > I would like to request an invite to the Slack team too.
> >
> > Regards,
> > Lars
> >
> >
> > On 2017-04-28 15:40 (+0200), Ismaël Mejía <
> ieme...@gmail.com
> > <mailto:ieme...@gmail.com>> wrote:
> > > Done.
> > >
> > > On Fri, Apr 28, 2017 at 3:32 PM, Andrew Psaltis
> > <psaltis.and...@gmail.com <mailto:psaltis.and...@gmail.com>>
> > > wrote:
> > >
> > > > Please add me as well. Thanks,
> > > >
> > > > On Fri, Apr 28, 2017 at 7:59 AM, Anuj Kumar
> > <anujs...@gmail.com <mailto:anujs...@gmail.com>> wrote:
> > > >
> > > >> Thanks
> > > >>
> > > >> On Fri, Apr 28, 2017 at 3:56 PM, Aviem Zur
> > <aviem...@gmail.com <mailto:aviem...@gmail.com>> wrote:
> > > >>
> > > >>> Invitation sent.
> > > >>>
> > > >>> On Fri, Apr 28, 2017 at 1:24 PM Anuj Kumar
> > <anujs...@gmail.com <mailto:anujs...@gmail.com>> wrote:
> > > >>>
> > > >>>> Please add me. Thanks.
> > > >>>>
> > > >>>> On Fri, Apr 28, 2017 at 9:20 AM, Tom Pollard <
> > > >>>> tpoll...@flashpoint-intel.com
> > <mailto:tpoll...@flashpoint-intel.com>> wrote:
> > > >>>>
> > > >>>>> Done
> > > >>>>>
> > > >>>>>
> > > >>>>> On Apr 27, 2017, at 11:48 PM, Sai Boorlagadda <
> > > >>>>> sai.boorlaga...@gmail.com
> > <mailto:sai.boorlaga...@gmail.com>> wrote:
> > > >>>>>
> > > >>>>> Please include me as well.
> > > >>>>>
> > > >>>>> Sai
> > > >>>>>
> > > >>>>> On Thu, Apr 27, 2017 at 5:59 PM, Davor Bonaci
> > <da...@apache.org <mailto:da...@apache.org>>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> (There were already done by someone.)
> > > >>>>>>
> > > >>>>>> On Thu, Apr 27, 2017 at 1:53 PM, Tony Moulton <
> > > >>>>>> tmoul...@flashpoint-intel.com
> > <mailto:tmoul...@flashpoint-intel.com>>

Re: Apache Beam Slack channel

2017-04-30 Thread Lars BK
Yes please.

Thanks,
Lars

On Sun, Apr 30, 2017 at 5:22 PM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi
>
> Should I use your Gmail address ?
>
> Regards
> JB
> On Apr 30, 2017, at 16:58, Lars BK <larsbkrog...@gmail.com> wrote:
>>
>> Invitation not received, did something go wrong?
>>
>> Lars
>>
>> On Sun, Apr 30, 2017 at 3:57 PM Aviem Zur <aviem...@gmail.com> wrote:
>>
>>> Invitation sent.
>>>
>>> On Sun, Apr 30, 2017 at 4:41 PM Lars BK <larsbkrog...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I would like to request an invite to the Slack team too.
>>>>
>>>> Regards,
>>>> Lars
>>>>
>>>>
>>>> On 2017-04-28 15:40 (+0200), Ismaël Mejía <ieme...@gmail.com> wrote:
>>>> > Done.
>>>> >
>>>> > On Fri, Apr 28, 2017 at 3:32 PM, Andrew Psaltis <
>>>> psaltis.and...@gmail.com>
>>>> > wrote:
>>>> >
>>>> > > Please add me as well. Thanks,
>>>> > >
>>>> > > On Fri, Apr 28, 2017 at 7:59 AM, Anuj Kumar <anujs...@gmail.com>
>>>> wrote:
>>>> > >
>>>> > >> Thanks
>>>> > >>
>>>> > >> On Fri, Apr 28, 2017 at 3:56 PM, Aviem Zur <aviem...@gmail.com>
>>>> wrote:
>>>> > >>
>>>> > >>> Invitation sent.
>>>> > >>>
>>>> > >>> On Fri, Apr 28, 2017 at 1:24 PM Anuj Kumar <anujs...@gmail.com>
>>>> wrote:
>>>> > >>>
>>>> > >>>> Please add me. Thanks.
>>>> > >>>>
>>>> > >>>> On Fri, Apr 28, 2017 at 9:20 AM, Tom Pollard <
>>>> > >>>> tpoll...@flashpoint-intel.com> wrote:
>>>> > >>>>
>>>> > >>>>> Done
>>>> > >>>>>
>>>> > >>>>>
>>>> > >>>>> On Apr 27, 2017, at 11:48 PM, Sai Boorlagadda <
>>>> > >>>>> sai.boorlaga...@gmail.com> wrote:
>>>> > >>>>>
>>>> > >>>>> Please include me as well.
>>>> > >>>>>
>>>> > >>>>> Sai
>>>> > >>>>>
>>>> > >>>>> On Thu, Apr 27, 2017 at 5:59 PM, Davor Bonaci <da...@apache.org
>>>> >
>>>> > >>>>> wrote:
>>>> > >>>>>
>>>> > >>>>>> (There were already done by someone.)
>>>> > >>>>>>
>>>> > >>>>>> On Thu, Apr 27, 2017 at 1:53 PM, Tony Moulton <
>>>> > >>>>>> tmoul...@flashpoint-intel.com> wrote:
>>>> > >>>>>>
>>>> > >>>>>>> Please include me as well during the next batch of Slack
>>>> additions.
>>>> > >>>>>>> Thanks!
>>>> > >>>>>>>
>>>> > >>>>>>> —
>>>> > >>>>>>> Tony
>>>> > >>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>> On Apr 27, 2017, at 4:51 PM, <oscar.b.rodrig...@accenture.com>
>>>> <
>>>> > >>>>>>> oscar.b.rodrig...@accenture.com> wrote:
>>>> > >>>>>>>
>>>> > >>>>>>> Hi there,
>>>> > >>>>>>>
>>>> > >>>>>>> Can you please add me to the Apache Beam Slack channel?
>>>> > >>>>>>>
>>>> > >>>>>>> Thanks
>>>> > >>>>>>> -Oscar
>>>> > >>>>>>>
>>>> > >>>>>>> Oscar Rodriguez
>>>> > >>>>>>> Solution Architect
>>>> > >>>>>>> Google CoE | Accenture Cloud
>>>> > >>>>>>> M +1 718-440-0881 | W +1 917-452-3923
>>>> > >>>>>>> email: oscar.b.rodrig...@accenture.com
>>>> > >>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>> --
>>>> > >>>>>>>
>>>> > >>>>>>> This message is for the designated recipient only and may
>>>> contain
>>>> > >>>>>>> privileged, proprietary, or otherwise confidential
>>>> information. If you have
>>>> > >>>>>>> received it in error, please notify the sender immediately
>>>> and delete the
>>>> > >>>>>>> original. Any other use of the e-mail by you is prohibited.
>>>> Where allowed
>>>> > >>>>>>> by local law, electronic communications with Accenture and
>>>> its affiliates,
>>>> > >>>>>>> including e-mail and instant messaging (including content),
>>>> may be scanned
>>>> > >>>>>>> by our systems for the purposes of information security and
>>>> assessment of
>>>> > >>>>>>> internal compliance with Accenture policy.
>>>> > >>>>>>> 
>>>> > >>>>>>> __
>>>> > >>>>>>>
>>>> > >>>>>>> www.accenture.com
>>>> > >>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>
>>>> > >>>>>
>>>> > >>>>>
>>>> > >>>>
>>>> > >>
>>>> > >
>>>> > >
>>>> > > --
>>>> > > Thanks,
>>>> > > Andrew
>>>> > >
>>>> > > Subscribe to my book: Streaming Data <http://manning.com/psaltis>
>>>> > > <https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
>>>> > > twitter: @itmdata <
>>>> http://twitter.com/intent/user?screen_name=itmdata>
>>>> > >
>>>> >
>>>>
>>>


Re: Apache Beam Slack channel

2017-04-30 Thread Lars BK
Hi,

I would like to request an invite to the Slack team too.

Regards,
Lars


On 2017-04-28 15:40 (+0200), Ismaël Mejía  wrote: 
> Done.
> 
> On Fri, Apr 28, 2017 at 3:32 PM, Andrew Psaltis 
> wrote:
> 
> > Please add me as well. Thanks,
> >
> > On Fri, Apr 28, 2017 at 7:59 AM, Anuj Kumar  wrote:
> >
> >> Thanks
> >>
> >> On Fri, Apr 28, 2017 at 3:56 PM, Aviem Zur  wrote:
> >>
> >>> Invitation sent.
> >>>
> >>> On Fri, Apr 28, 2017 at 1:24 PM Anuj Kumar  wrote:
> >>>
>  Please add me. Thanks.
> 
>  On Fri, Apr 28, 2017 at 9:20 AM, Tom Pollard <
>  tpoll...@flashpoint-intel.com> wrote:
> 
> > Done
> >
> >
> > On Apr 27, 2017, at 11:48 PM, Sai Boorlagadda <
> > sai.boorlaga...@gmail.com> wrote:
> >
> > Please include me as well.
> >
> > Sai
> >
> > On Thu, Apr 27, 2017 at 5:59 PM, Davor Bonaci 
> > wrote:
> >
> >> (There were already done by someone.)
> >>
> >> On Thu, Apr 27, 2017 at 1:53 PM, Tony Moulton <
> >> tmoul...@flashpoint-intel.com> wrote:
> >>
> >>> Please include me as well during the next batch of Slack additions.
> >>> Thanks!
> >>>
> >>> —
> >>> Tony
> >>>
> >>>
> >>>
> >>> On Apr 27, 2017, at 4:51 PM,  <
> >>> oscar.b.rodrig...@accenture.com> wrote:
> >>>
> >>> Hi there,
> >>>
> >>> Can you please add me to the Apache Beam Slack channel?
> >>>
> >>> Thanks
> >>> -Oscar
> >>>
> >>> Oscar Rodriguez
> >>> Solution Architect
> >>> Google CoE | Accenture Cloud
> >>> M +1 718-440-0881 | W +1 917-452-3923
> >>> email: oscar.b.rodrig...@accenture.com
> >>>
> >>>
> >>> --
> >>>
> >>> This message is for the designated recipient only and may contain
> >>> privileged, proprietary, or otherwise confidential information. If 
> >>> you have
> >>> received it in error, please notify the sender immediately and delete 
> >>> the
> >>> original. Any other use of the e-mail by you is prohibited. Where 
> >>> allowed
> >>> by local law, electronic communications with Accenture and its 
> >>> affiliates,
> >>> including e-mail and instant messaging (including content), may be 
> >>> scanned
> >>> by our systems for the purposes of information security and 
> >>> assessment of
> >>> internal compliance with Accenture policy.
> >>> 
> >>> __
> >>>
> >>> www.accenture.com
> >>>
> >>>
> >>>
> >>
> >
> >
> 
> >>
> >
> >
> > --
> > Thanks,
> > Andrew
> >
> > Subscribe to my book: Streaming Data 
> > 
> > twitter: @itmdata 
> >
>