Re: Spring with Apache Beam

2019-10-09 Thread Luke Cwik
Option 1 won't work because the context is created at pipeline construction
time, not at pipeline execution time.
Option 2 only works if your application context is scoped to the DoFn
instance and holds nothing you might want to share across DoFn instances.

You could also try making it a PipelineOption that is tagged
with @JsonIgnore and has a @Default.InstanceFactory, like this [1]. That
way it is initialized the first time your DoFn accesses it and is then
shared within your process. Making it a PipelineOption would also
allow you to pass in preinitialized versions for testing.

1:
https://github.com/apache/beam/blob/8267c223425bc201be700babbe596d133b79686e/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L127
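
A minimal sketch of the behaviour described above: the value is created on
first access, shared within the process afterwards, and can be replaced by a
preinitialized instance in tests. This is plain Java and deliberately uses
neither Beam's PipelineOptions nor Spring; LazyShared and FakeContext are
hypothetical names standing in for the option getter and the application
context. In a real pipeline that role would be played by the
@Default.InstanceFactory on the option.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for an expensive object such as an application context. */
final class FakeContext {
  /** Counts constructions so callers can see how often the factory ran. */
  static final AtomicInteger CREATED = new AtomicInteger();
  final String name;

  FakeContext(String name) {
    this.name = name;
    CREATED.incrementAndGet();
  }
}

/** Hypothetical holder: builds the value on first access and then shares it. */
final class LazyShared<T> {
  private final Supplier<T> factory;
  private volatile T value;

  LazyShared(Supplier<T> factory) {
    this.factory = factory;
  }

  /** Lets a test pass in a preinitialized instance before first use. */
  void set(T preinitialized) {
    value = preinitialized;
  }

  /** Returns the shared value, creating it at most once (double-checked locking). */
  T get() {
    T local = value;
    if (local == null) {
      synchronized (this) {
        local = value;
        if (local == null) {
          local = factory.get();
          value = local;
        }
      }
    }
    return local;
  }
}
```

Every caller of get() on the same holder sees the same instance, and a test
can call set(...) first so that the default factory never runs.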

On Wed, Oct 9, 2019 at 1:10 PM Jitendra kumavat 
wrote:

> Hi Luke,
>
> Thanks a lot for your reply.
> I tried a couple of options, as follows.
>
> 1. Initialise the context in the main method only and use it. Creating the
> context:
> new AnnotationConfigApplicationContext(AppConfig.class);
> 2. Create the context in the DoFn.Startup method.
>
> Unfortunately neither of them worked perfectly; the latter works but has an
> issue with @ComponentScan.
> Please let me know your comments on this.
>
> I will also try this JvmInitializer for context initialisation.
>
> Thanks,
> Jitendra
>
> On Wed, Oct 9, 2019 at 12:48 PM Luke Cwik  wrote:
>
>> -d...@beam.apache.org, +user@beam.apache.org
>>
>> How are you trying to inject your application context?
>> Have you looked at the JvmInitializer.beforeProcessing[1] to create your
>> application context?
>>
>> 1:
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/harness/JvmInitializer.java
>>
>>
>>
>>
>> On Fri, Oct 4, 2019 at 12:32 PM Jitendra kumavat 
>> wrote:
>>
>>> Hi,
>>>
>>> I want to add the Spring framework to my Apache Beam project. Somehow I am
>>> unable to inject the Spring application context into the executing ParDo
>>> functions, and I couldn't find a way to do so. Can you please let me know
>>> how to integrate a Spring runtime application context with an Apache Beam
>>> pipeline?
>>>
>>> Thanks,
>>> Jitendra
>>>
>>


Re: Spring with Apache Beam

2019-10-09 Thread Luke Cwik
-d...@beam.apache.org, +user@beam.apache.org

How are you trying to inject your application context?
Have you looked at the JvmInitializer.beforeProcessing[1] to create your
application context?

1:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/harness/JvmInitializer.java




On Fri, Oct 4, 2019 at 12:32 PM Jitendra kumavat 
wrote:

> Hi,
>
> I want to add the Spring framework to my Apache Beam project. Somehow I am
> unable to inject the Spring application context into the executing ParDo
> functions, and I couldn't find a way to do so. Can you please let me know
> how to integrate a Spring runtime application context with an Apache Beam
> pipeline?
>
> Thanks,
> Jitendra
>


Re: Feedback on how we use Apache Beam in my company

2019-10-09 Thread Etienne Chauchot

Very nice!

Thanks

CCing the dev list

Etienne

On 09/10/2019 16:55, Pierre Vanacker wrote:


Hi Apache Beam community,

We’ve been working with Apache Beam in production for a few years now
at my company (Dailymotion).

If you’re interested in how we use Apache Beam in combination
with Google Dataflow, we shared our experience in the following
article:
https://medium.com/dailymotion/realtime-data-processing-with-apache-beam-and-google-dataflow-at-dailymotion-7d1b994dc816

Thanks to the developers for their great work!

Regards,

Pierre



Feedback on how we use Apache Beam in my company

2019-10-09 Thread Pierre Vanacker
Hi Apache Beam community,

We’ve been working with Apache Beam in production for a few years now at my
company (Dailymotion).

If you’re interested in how we use Apache Beam in combination with Google
Dataflow, we shared our experience in the following article:
https://medium.com/dailymotion/realtime-data-processing-with-apache-beam-and-google-dataflow-at-dailymotion-7d1b994dc816

Thanks to the developers for their great work!

Regards,

Pierre


Re: Beam discarding massive amount of events due to Window object or inner processing

2019-10-09 Thread Reza Rokni
Hi,

When publishing to PubSub, can you set message metadata (an attribute) with
the timestamp from the event? If so, you can make use of:

https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.Read.html#withTimestampAttribute-java.lang.String-
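
A rough sketch of the publisher side of this suggestion: copying the event's
own timestamp into a message attribute. As I read the linked Javadoc, the
attribute value may be either milliseconds since the Unix epoch or an RFC 3339
string; please check that page rather than relying on this summary. The sketch
is plain Java without the Pub/Sub client or Beam, and the attribute name "ts"
and the class TimestampAttribute are hypothetical. The same attribute name
would then be passed to withTimestampAttribute on the reading side.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of Beam or the Pub/Sub client library. */
final class TimestampAttribute {
  /** Assumed attribute name; use whatever name the reader is configured with. */
  static final String ATTRIBUTE = "ts";

  /** Builds the attribute map for a message, carrying the event time as an RFC 3339 string. */
  static Map<String, String> forEvent(Instant eventTime) {
    Map<String, String> attributes = new HashMap<>();
    attributes.put(ATTRIBUTE, eventTime.toString());
    return attributes;
  }

  /** Parses either form: epoch milliseconds or an RFC 3339 / ISO-8601 instant. */
  static Instant parse(String value) {
    try {
      return Instant.ofEpochMilli(Long.parseLong(value));
    } catch (NumberFormatException e) {
      return Instant.parse(value);
    }
  }
}
```

The event time would come from the timestamp field of the JSON payload, so
that the element timestamp is set from the event when the message is read.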

Cheers

Reza

On Wed, 9 Oct 2019 at 16:31, Eddy G  wrote:

> Thanks a lot for the quick response!
>
> I recall having already tried this when I first deployed this consumer,
> and I couldn't get around the following issue, which I'm now getting
> again:
>
> java.lang.IllegalArgumentException: Cannot output with timestamp
> 2019-10-09T03:12:04.250Z. Output timestamps must be no earlier than the
> timestamp of the current input (2019-10-09T03:12:04.292Z) minus the allowed
> skew (0 milliseconds). See the DoFn#getAllowedTimestampSkew() Javadoc for
> details on changing the allowed skew.
>
> How can I manage skew? Wouldn't it keep increasing, as is happening with
> the current version, which uses processing time?
>
> The timestamp that I'm extracting comes straight from the JSON object
> (which is the one I want to use), not from PubSub itself.
>


Re: Beam discarding massive amount of events due to Window object or inner processing

2019-10-09 Thread Eddy G
Thanks a lot for the quick response!

I recall having already tried this when I first deployed this consumer,
and I couldn't get around the following issue, which I'm now getting
again:

java.lang.IllegalArgumentException: Cannot output with timestamp 
2019-10-09T03:12:04.250Z. Output timestamps must be no earlier than the 
timestamp of the current input (2019-10-09T03:12:04.292Z) minus the allowed 
skew (0 milliseconds). See the DoFn#getAllowedTimestampSkew() Javadoc for 
details on changing the allowed skew.

How can I manage skew? Wouldn't it keep increasing, as is happening with the
current version, which uses processing time?

The timestamp that I'm extracting comes straight from the JSON object (which is
the one I want to use), not from PubSub itself.
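
The rule quoted in the error message can be worked through by hand: an output
timestamp is accepted only if it is no earlier than the input timestamp minus
the allowed skew. The sketch below restates that rule in plain Java. It uses
java.time to stay dependency-free (Beam's Java SDK itself uses Joda-Time
types), and SkewCheck with its methods is a hypothetical name, not Beam API.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical restatement of the rule from the error message; not Beam code. */
final class SkewCheck {
  /** True when outputTs is no earlier than inputTs minus allowedSkew. */
  static boolean isAllowed(Instant inputTs, Instant outputTs, Duration allowedSkew) {
    return !outputTs.isBefore(inputTs.minus(allowedSkew));
  }

  /** How far the output lags the input; zero if it is not earlier than the input. */
  static Duration requiredSkew(Instant inputTs, Instant outputTs) {
    Duration behind = Duration.between(outputTs, inputTs);
    return behind.isNegative() ? Duration.ZERO : behind;
  }
}
```

With the two timestamps from the error (input 03:12:04.292Z, output
03:12:04.250Z), the output is 42 ms earlier than the input, so it is rejected
with a skew of 0 and would satisfy the rule with a skew of at least 42 ms.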