Re: Regarding Beam Slack Channel

2017-12-01 Thread Wesley Tanaka

both sent


On 12/01/2017 06:43 AM, Ziyad Muhammed wrote:

me too, thanks in advance

Best
Ziyad

On Fri, Dec 1, 2017 at 10:54 AM, <linr...@itri.org.tw 
<mailto:linr...@itri.org.tw>> wrote:


Hi
Can I receive this invitation, too?

Thanks
Rick

-Original Message-
From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net
<mailto:j...@nanthrax.net>]
Sent: Friday, December 01, 2017 12:53 PM
To: user@beam.apache.org <mailto:user@beam.apache.org>
Subject: Re: Regarding Beam Slack Channel

Invite sent as well.

Regards
JB

On 11/30/2017 07:19 PM, Yanael Barbier wrote:
> Hello
> Can I get an invite too?
>
> Thanks,
> Yanael
    >
    > Le jeu. 30 nov. 2017 à 19:15, Wesley Tanaka
<wtanaka+b...@wtanaka.com <mailto:wtanaka%2bb...@wtanaka.com>
> <mailto:wtanaka%2bb...@wtanaka.com
<mailto:wtanaka%252bb...@wtanaka.com>>> a écrit :
>
>     Invite sent
>
>
>     On 11/30/2017 08:11 AM, Nalseez Duke wrote:
>>     Hello
>>
>>     Can someone please add me to the Beam slack channel?
>>
>>     Thanks.
>
>
>     --
>     Wesley Tanaka
> https://wtanaka.com/
>

--
Jean-Baptiste Onofré
jbono...@apache.org <mailto:jbono...@apache.org>
http://blog.nanthrax.net
Talend - http://www.talend.com


--
本信件可能包含工研院機密資訊,非指定之收件者,請勿使用或揭露本信件內容,並請銷毀此信件。
This email may contain confidential information. Please do not use
or disclose it in any way and delete it if you are not the
intended recipient.




--
Wesley Tanaka
https://wtanaka.com/



Re: Regarding Beam Slack Channel

2017-11-30 Thread Wesley Tanaka

Invite sent

On 11/30/2017 08:11 AM, Nalseez Duke wrote:

Hello

Can someone please add me to the Beam slack channel?

Thanks.



--
Wesley Tanaka
https://wtanaka.com/



Re: Can we transfrom PCollection to ArrayList?

2017-10-31 Thread Wesley Tanaka

Hi Rick,

In the sample code you pasted, one thing to keep in mind is that 
different instances of your DoFn may be processing different input 
elements on different machines.  There's a thread here:


https://lists.apache.org/thread.html/cb1d1fa4db613d885e3fc5eae42d6f97da28615fd3a11c58a2462d17@%3Cuser.beam.apache.org%3E

which lists two possible approaches that you might take.

On 10/30/2017 10:19 PM, linr...@itri.org.tw wrote:


Hi all,

I am Rick

I would like to transform datatype from PCollection to ArrayList, and 
I am not sure if it is right?


My run env is following Java: 1.8 and Beam: 2.1.0

My java code is as:

ArrayList _myList_= *new*ArrayList();

PipelineOptions options= PipelineOptionsFactory./create/();

Pipeline p= Pipeline./create/(options);

PCollection data=p.apply("data",Create./of/(1,2,3,4,5));

PCollection 
_newdata_=data.apply(ParDo./of/(*new*_DoFn<Integer,Integer>()_ {


@ProcessElement

*public**void*processElement(ProcessContext c)

{

*int*datavalue=c.element()+1;

    System.*/out/*.println("data="+datavalue);

c.output(datavalue);

}

}));

p.run();

If any idea could be shared with me, I highly appreciate it.

Thanks

Rick



--
本信件可能包含工研院機密資訊,非指定之收件者,請勿使用或揭露本信件內容,並請銷毀此信件。 
This email may contain confidential information. Please do not use or 
disclose it in any way and delete it if you are not the intended 
recipient. 



--
Wesley Tanaka
https://wtanaka.com/



Triggers in 0.6.0 buffering behavior aligns with multiples of 10

2017-04-23 Thread Wesley Tanaka
I have a Transform that contains, in order:
* [an unbounded source which eventually moves its watermark to +Infinity when 
it's out of values]* 
Window.triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1))).accumulatingFiredPanes()*
 Combine.globally(myCombineFn)* [a few element-wise type conversions]* A ParDo 
that produces some logging output in processElement
Separately, I have a second ParDo directly attached to the unbounded source to 
also produce logging output.
I noticed that, when I run this pipeline with DirectRunner:
* With 0 input values, I get one NO_FIRING firing* With 1-10 input values, I 
get 1 EARLY firing and one ON_TIME firing* With 11-20 input values, I get 2 
EARLY firings and one ON_TIME firing* With 21-30 input values, I get 3 EARLY 
firings and one ON_TIME firing
Increasing to elementCountAtLeast(10):
* With 0 input values, I get one NO_FIRING firing* With 1-9 input values, I get 
one ON_TIME firing* With 10-19 input values, I get one EARLY firing and one 
ON_TIME firing
Increasing to elementCountAtLeast(12):
* With 0 input values, I get a NO_FIRING firing* With 1-11 input values, I get 
one ON_TIME firing* With 12-19 input values, I get 1 EARLY firing (at 12, 13, 
14, etc) and one ON_TIME firing* With 20-31 input values, I get 1 EARLY firing 
(at 20) and one ON_TIME firing* With 32(!)-40 values, I get 2 EARLY firings 
(1st at 20 and 2nd at 32, 33, etc) and one ON_TIME firing
* With 40-51, I get 2 EARLY firings (1st at 20, 2nd at 40) and one ON_TIME 
firing* With 52..., I get 3 EARLY firings (1st at 20, 2nd at 40, 3rd at 52, 53, 
etc) and one ON_TIME firing...
I realize this satisfies the technical design of triggers (I understand to be: 
"If you specify a trigger, you're not guaranteed it will fire, but you are 
guaranteed that it won't fire more often or earlier than you specified").  I 
also understand it's a good property for DirectRunner to simulate things like 
delays in trigger firing and other behaviors that you might see on a "real" 
runner, but this behavior might also be undesirable since a pipeline author may 
wish to quickly use DirectRunner to confirm their own understanding of 
WindowingStrategy settings, and get confused and think the above behavior is 
due to their error instead of DirectRunner behavior, especially with small 
exploratory PCollections.  It may also be undesirable because getting a 
stronger or more-well-defined guarantee about when panes actually fire (at the 
expense of performance, presumably) might be valuable for something like 
automatically integration testing a pipeline's logic.
A few questions:
1. Is my understanding of trigger semantics correct?
2. Is all this behavior actually a symptom of my UnboundedSource being 
implemented incorrectly, somehow?3. Is the above behavior exactly as intended? 
(including the 0 element case giving NO_FIRING instead of ON_TIME pane?)4. Is 
this a DirectRunner behavior or is it common across runners?  (common across 
SDKS?)5. Is the buffer (which seems to be 10 right now) that's causing this 
behavior configurable, or is it possible to disable it?

---
Wesley Tanaka
https://wtanaka.com/