Re: "processing lull"

2017-10-29 Thread Jacob Marble
- Upstream steps seem to be slowed down by this PTransform (system lag up
and elements/sec down)
- Unbounded source is PubSubIO

Jacob

On Sun, Oct 29, 2017 at 9:22 PM, Jacob Marble  wrote:

> - This does not happen when I don't use the reshuffle hack
> - HTTP QPS seems to be improved with reshuffling, but also seems to
> burst-and-pause
> - The "processing lull" log entry occurs 4 times every 5 minutes
> - Right now, I'm guessing that "processing lull" means that a map
> operation is taking too long
>
> Jacob
>
> On Sun, Oct 29, 2017 at 9:15 PM, Jacob Marble  wrote:
>
>> Good evening-
>>
>> What should I make of the log warning "processing lull for [instant] in
>> state windmill-read" ?
>>
>> - This happens in a streaming pipeline, in Dataflow.
>> - The DoFn that emits the log entry makes HTTP requests to a third-party.
>> - This only happens when I added a side input to the PTransform, to
>> prevent fusing.
>> - The side input is a SingletonView, just an empty string value.
>>
>> Thanks as usual,
>>
>> Jacob
>>
>> Processing lull for PT300.124S in state windmill-read of [step name]
>>   at sun.misc.Unsafe.park(Native Method)
>>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>>   at com.google.cloud.dataflow.worker.repackaged.com.google.commo
>> n.util.concurrent.AbstractFuture.get(AbstractFuture.java:469)
>>   at com.google.cloud.dataflow.worker.repackaged.com.google.commo
>> n.util.concurrent.AbstractFuture$TrustedFuture.get(
>> AbstractFuture.java:76)
>>   at com.google.cloud.dataflow.worker.MetricTrackingWindmillServe
>> rStub.getStateData(MetricTrackingWindmillServerStub.java:188)
>>   at com.google.cloud.dataflow.worker.WindmillStateReader.startBa
>> tchAndBlock(WindmillStateReader.java:405)
>>   at com.google.cloud.dataflow.worker.WindmillStateReader$Wrapped
>> Future.get(WindmillStateReader.java:306)
>>   at com.google.cloud.dataflow.worker.WindmillStateInternals$Wind
>> millValue.read(WindmillStateInternals.java:384)
>>   at com.google.cloud.dataflow.worker.StreamingSideInputFetcher.b
>> lockedMap(StreamingSideInputFetcher.java:249)
>>   at com.google.cloud.dataflow.worker.StreamingSideInputFetcher.s
>> toreIfBlocked(StreamingSideInputFetcher.java:186)
>>   at com.google.cloud.dataflow.worker.StreamingSideInputDoFnRunne
>> r.processElement(StreamingSideInputDoFnRunner.java:70)
>>   at com.google.cloud.dataflow.worker.SimpleParDoFn.processElemen
>> t(SimpleParDoFn.java:233)
>>   at com.google.cloud.dataflow.worker.util.common.worker.ParDoOpe
>> ration.process(ParDoOperation.java:48)
>>   at com.google.cloud.dataflow.worker.util.common.worker.OutputRe
>> ceiver.process(OutputReceiver.java:52)
>>   at com.google.cloud.dataflow.worker.SimpleParDoFn$1.output(Simp
>> leParDoFn.java:183)
>>
>
>


Re: "processing lull"

2017-10-29 Thread Jacob Marble
- This does not happen when I don't use the reshuffle hack
- HTTP QPS seems to be improved with reshuffling, but also seems to
burst-and-pause
- The "processing lull" log entry occurs 4 times every 5 minutes
- Right now, I'm guessing that "processing lull" means that a map operation
is taking too long

Jacob

On Sun, Oct 29, 2017 at 9:15 PM, Jacob Marble  wrote:

> Good evening-
>
> What should I make of the log warning "processing lull for [instant] in
> state windmill-read" ?
>
> - This happens in a streaming pipeline, in Dataflow.
> - The DoFn that emits the log entry makes HTTP requests to a third-party.
> - This only happens when I added a side input to the PTransform, to
> prevent fusing.
> - The side input is a SingletonView, just an empty string value.
>
> Thanks as usual,
>
> Jacob
>
> Processing lull for PT300.124S in state windmill-read of [step name]
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at com.google.cloud.dataflow.worker.repackaged.com.google.
> common.util.concurrent.AbstractFuture.get(AbstractFuture.java:469)
>   at com.google.cloud.dataflow.worker.repackaged.com.google.
> common.util.concurrent.AbstractFuture$TrustedFuture.
> get(AbstractFuture.java:76)
>   at com.google.cloud.dataflow.worker.MetricTrackingWindmillServerSt
> ub.getStateData(MetricTrackingWindmillServerStub.java:188)
>   at com.google.cloud.dataflow.worker.WindmillStateReader.
> startBatchAndBlock(WindmillStateReader.java:405)
>   at com.google.cloud.dataflow.worker.WindmillStateReader$
> WrappedFuture.get(WindmillStateReader.java:306)
>   at com.google.cloud.dataflow.worker.WindmillStateInternals$
> WindmillValue.read(WindmillStateInternals.java:384)
>   at com.google.cloud.dataflow.worker.StreamingSideInputFetcher.
> blockedMap(StreamingSideInputFetcher.java:249)
>   at com.google.cloud.dataflow.worker.StreamingSideInputFetcher.
> storeIfBlocked(StreamingSideInputFetcher.java:186)
>   at com.google.cloud.dataflow.worker.StreamingSideInputDoFnRunner.
> processElement(StreamingSideInputDoFnRunner.java:70)
>   at com.google.cloud.dataflow.worker.SimpleParDoFn.
> processElement(SimpleParDoFn.java:233)
>   at com.google.cloud.dataflow.worker.util.common.worker.
> ParDoOperation.process(ParDoOperation.java:48)
>   at com.google.cloud.dataflow.worker.util.common.worker.
> OutputReceiver.process(OutputReceiver.java:52)
>   at com.google.cloud.dataflow.worker.SimpleParDoFn$1.output(
> SimpleParDoFn.java:183)
>


"processing lull"

2017-10-29 Thread Jacob Marble
Good evening-

What should I make of the log warning "processing lull for [instant] in
state windmill-read" ?

- This happens in a streaming pipeline, in Dataflow.
- The DoFn that emits the log entry makes HTTP requests to a third-party.
- This only happens when I added a side input to the PTransform, to prevent
fusing.
- The side input is a SingletonView, just an empty string value.

Thanks as usual,

Jacob

Processing lull for PT300.124S in state windmill-read of [step name]
  at sun.misc.Unsafe.park(Native Method)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  at
com.google.cloud.dataflow.worker.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:469)
  at
com.google.cloud.dataflow.worker.repackaged.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:76)
  at
com.google.cloud.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:188)
  at
com.google.cloud.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:405)
  at
com.google.cloud.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:306)
  at
com.google.cloud.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:384)
  at
com.google.cloud.dataflow.worker.StreamingSideInputFetcher.blockedMap(StreamingSideInputFetcher.java:249)
  at
com.google.cloud.dataflow.worker.StreamingSideInputFetcher.storeIfBlocked(StreamingSideInputFetcher.java:186)
  at
com.google.cloud.dataflow.worker.StreamingSideInputDoFnRunner.processElement(StreamingSideInputDoFnRunner.java:70)
  at
com.google.cloud.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:233)
  at
com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:48)
  at
com.google.cloud.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
  at
com.google.cloud.dataflow.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:183)


Re: How does Beam set up the bundle size in streaming mode (like Pub/Sub)?

2017-10-29 Thread Derek Hao Hu
Hi Kenneth,

Sure! I've created https://issues.apache.org/jira/browse/BEAM-3117 to track
this.

Derek

On Thu, Oct 26, 2017 at 9:24 PM, Kenneth Knowles  wrote:

> Hi Derek,
>
> I agree, that phrasing is simply incorrect and mightily confusing. Would
> you be up for filing a JIRA with some hyperlinks to the pages that say that?
>
> Kenn
>
> On Sun, Oct 22, 2017 at 9:54 AM, Derek Hao Hu 
> wrote:
>
>> Thanks Kenneth! I sort of feel the notions of bundles and windows are a
>> bit confusing in Beam.
>>
>> For example, here is what the Beam Programming Guide says:
>>
>> "When performing an operation that groups elements in an unbounded
>> PCollection, Beam requires a concept called *windowing* to divide a
>> continuously updating data set into logical windows of finite size. Beam
>> processes each window as a bundle, and processing continues as the data set
>> is generated."
>>
>> So then I would assume "bundles" and "windows" are terms that can be used
>> almost interchangeably.
>>
>> Do you know if there's any good posts / documentations about bundles?
>>
>> Cheers,
>>
>> Derek
>>
>> On Wed, Oct 18, 2017 at 6:59 AM, Kenneth Knowles  wrote:
>>
>>> Bundles are decidedly not windows, so let's keep the two terms separate.
>>> It sounds like you are asking about bundles.
>>>
>>> The bundle size is a performance tuning parameter and is arbitrarily
>>> chosen arbitrarily and dynamically chosen by a runner. The runner chooses
>>> based on its best effort to amortize @StartBundle/@FinishBundle operations
>>> across multiple @ProcessElement/@OnTimer calls. Your code must yield
>>> correct results for for any bundling - you should be implementing
>>> per-element logic, where @StartBundle/@FinishBundle are implementation
>>> details.
>>>
>>> Kenn
>>>
>>> On Tue, Oct 17, 2017 at 5:37 PM, Derek Hao Hu 
>>> wrote:
>>>
 Hi,

 Is there any more detailed explanation on how Beam chooses the window
 size (bundle size) in streaming mode? It seems there is no clear answer in
 the [Beam Programming Guide](https://beam.apache.org
 /documentation/programming-guide/) and I can't find how PubsubIO
 implements this windowing strategy as well. :(

 Could someone kindly provide some pointers? Thanks!
 --
 Derek Hao Hu

 Software Engineer | Snapchat
 Snap Inc.

>>>
>>>
>>
>>
>> --
>> Derek Hao Hu
>>
>> Software Engineer | Snapchat
>> Snap Inc.
>>
>
>


-- 
Derek Hao Hu

Software Engineer | Snapchat
Snap Inc.