Re: "processing lull"
- Upstream steps seem to be slowed down by this PTransform (system lag up and elements/sec down) - Unbounded source is PubSubIO Jacob On Sun, Oct 29, 2017 at 9:22 PM, Jacob Marble wrote: > - This does not happen when I don't use the reshuffle hack > - HTTP QPS seems to be improved with reshuffling, but also seems to > burst-and-pause > - The "processing lull" log entry occurs 4 times every 5 minutes > - Right now, I'm guessing that "processing lull" means that a map > operation is taking too long > > Jacob > > On Sun, Oct 29, 2017 at 9:15 PM, Jacob Marble wrote: > >> Good evening- >> >> What should I make of the log warning "processing lull for [instant] in >> state windmill-read" ? >> >> - This happens in a streaming pipeline, in Dataflow. >> - The DoFn that emits the log entry makes HTTP requests to a third-party. >> - This only happens when I added a side input to the PTransform, to >> prevent fusing. >> - The side input is a SingletonView, just an empty string value. >> >> Thanks as usual, >> >> Jacob >> >> Processing lull for PT300.124S in state windmill-read of [step name] >> at sun.misc.Unsafe.park(Native Method) >> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) >> at com.google.cloud.dataflow.worker.repackaged.com.google.commo >> n.util.concurrent.AbstractFuture.get(AbstractFuture.java:469) >> at com.google.cloud.dataflow.worker.repackaged.com.google.commo >> n.util.concurrent.AbstractFuture$TrustedFuture.get( >> AbstractFuture.java:76) >> at com.google.cloud.dataflow.worker.MetricTrackingWindmillServe >> rStub.getStateData(MetricTrackingWindmillServerStub.java:188) >> at com.google.cloud.dataflow.worker.WindmillStateReader.startBa >> tchAndBlock(WindmillStateReader.java:405) >> at com.google.cloud.dataflow.worker.WindmillStateReader$Wrapped >> Future.get(WindmillStateReader.java:306) >> at com.google.cloud.dataflow.worker.WindmillStateInternals$Wind >> millValue.read(WindmillStateInternals.java:384) >> at com.google.cloud.dataflow.worker.StreamingSideInputFetcher.b >> lockedMap(StreamingSideInputFetcher.java:249) >> at com.google.cloud.dataflow.worker.StreamingSideInputFetcher.s >> toreIfBlocked(StreamingSideInputFetcher.java:186) >> at com.google.cloud.dataflow.worker.StreamingSideInputDoFnRunne >> r.processElement(StreamingSideInputDoFnRunner.java:70) >> at com.google.cloud.dataflow.worker.SimpleParDoFn.processElemen >> t(SimpleParDoFn.java:233) >> at com.google.cloud.dataflow.worker.util.common.worker.ParDoOpe >> ration.process(ParDoOperation.java:48) >> at com.google.cloud.dataflow.worker.util.common.worker.OutputRe >> ceiver.process(OutputReceiver.java:52) >> at com.google.cloud.dataflow.worker.SimpleParDoFn$1.output(Simp >> leParDoFn.java:183) >> > >
Re: "processing lull"
- This does not happen when I don't use the reshuffle hack - HTTP QPS seems to be improved with reshuffling, but also seems to burst-and-pause - The "processing lull" log entry occurs 4 times every 5 minutes - Right now, I'm guessing that "processing lull" means that a map operation is taking too long Jacob On Sun, Oct 29, 2017 at 9:15 PM, Jacob Marble wrote: > Good evening- > > What should I make of the log warning "processing lull for [instant] in > state windmill-read" ? > > - This happens in a streaming pipeline, in Dataflow. > - The DoFn that emits the log entry makes HTTP requests to a third-party. > - This only happens when I added a side input to the PTransform, to > prevent fusing. > - The side input is a SingletonView, just an empty string value. > > Thanks as usual, > > Jacob > > Processing lull for PT300.124S in state windmill-read of [step name] > at sun.misc.Unsafe.park(Native Method) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at com.google.cloud.dataflow.worker.repackaged.com.google. > common.util.concurrent.AbstractFuture.get(AbstractFuture.java:469) > at com.google.cloud.dataflow.worker.repackaged.com.google. > common.util.concurrent.AbstractFuture$TrustedFuture. > get(AbstractFuture.java:76) > at com.google.cloud.dataflow.worker.MetricTrackingWindmillServerSt > ub.getStateData(MetricTrackingWindmillServerStub.java:188) > at com.google.cloud.dataflow.worker.WindmillStateReader. > startBatchAndBlock(WindmillStateReader.java:405) > at com.google.cloud.dataflow.worker.WindmillStateReader$ > WrappedFuture.get(WindmillStateReader.java:306) > at com.google.cloud.dataflow.worker.WindmillStateInternals$ > WindmillValue.read(WindmillStateInternals.java:384) > at com.google.cloud.dataflow.worker.StreamingSideInputFetcher. > blockedMap(StreamingSideInputFetcher.java:249) > at com.google.cloud.dataflow.worker.StreamingSideInputFetcher. > storeIfBlocked(StreamingSideInputFetcher.java:186) > at com.google.cloud.dataflow.worker.StreamingSideInputDoFnRunner. > processElement(StreamingSideInputDoFnRunner.java:70) > at com.google.cloud.dataflow.worker.SimpleParDoFn. > processElement(SimpleParDoFn.java:233) > at com.google.cloud.dataflow.worker.util.common.worker. > ParDoOperation.process(ParDoOperation.java:48) > at com.google.cloud.dataflow.worker.util.common.worker. > OutputReceiver.process(OutputReceiver.java:52) > at com.google.cloud.dataflow.worker.SimpleParDoFn$1.output( > SimpleParDoFn.java:183) >
"processing lull"
Good evening- What should I make of the log warning "processing lull for [instant] in state windmill-read" ? - This happens in a streaming pipeline, in Dataflow. - The DoFn that emits the log entry makes HTTP requests to a third-party. - This only happens when I added a side input to the PTransform, to prevent fusing. - The side input is a SingletonView, just an empty string value. Thanks as usual, Jacob Processing lull for PT300.124S in state windmill-read of [step name] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at com.google.cloud.dataflow.worker.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:469) at com.google.cloud.dataflow.worker.repackaged.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:76) at com.google.cloud.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:188) at com.google.cloud.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:405) at com.google.cloud.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:306) at com.google.cloud.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:384) at com.google.cloud.dataflow.worker.StreamingSideInputFetcher.blockedMap(StreamingSideInputFetcher.java:249) at com.google.cloud.dataflow.worker.StreamingSideInputFetcher.storeIfBlocked(StreamingSideInputFetcher.java:186) at com.google.cloud.dataflow.worker.StreamingSideInputDoFnRunner.processElement(StreamingSideInputDoFnRunner.java:70) at com.google.cloud.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:233) at com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:48) at com.google.cloud.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:52) at com.google.cloud.dataflow.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:183)
Re: How does Beam set up the bundle size in streaming mode (like Pub/Sub)?
Hi Kenneth, Sure! I've created https://issues.apache.org/jira/browse/BEAM-3117 to track this. Derek On Thu, Oct 26, 2017 at 9:24 PM, Kenneth Knowles wrote: > Hi Derek, > > I agree, that phrasing is simply incorrect and mightily confusing. Would > you be up for filing a JIRA with some hyperlinks to the pages that say that? > > Kenn > > On Sun, Oct 22, 2017 at 9:54 AM, Derek Hao Hu > wrote: > >> Thanks Kenneth! I sort of feel the notions of bundles and windows are a >> bit confusing in Beam. >> >> For example, here is what the Beam Programming Guide says: >> >> "When performing an operation that groups elements in an unbounded >> PCollection, Beam requires a concept called *windowing* to divide a >> continuously updating data set into logical windows of finite size. Beam >> processes each window as a bundle, and processing continues as the data set >> is generated." >> >> So then I would assume "bundles" and "windows" are terms that can be used >> almost interchangeably. >> >> Do you know if there's any good posts / documentations about bundles? >> >> Cheers, >> >> Derek >> >> On Wed, Oct 18, 2017 at 6:59 AM, Kenneth Knowles wrote: >> >>> Bundles are decidedly not windows, so let's keep the two terms separate. >>> It sounds like you are asking about bundles. >>> >>> The bundle size is a performance tuning parameter and is arbitrarily >>> chosen arbitrarily and dynamically chosen by a runner. The runner chooses >>> based on its best effort to amortize @StartBundle/@FinishBundle operations >>> across multiple @ProcessElement/@OnTimer calls. Your code must yield >>> correct results for for any bundling - you should be implementing >>> per-element logic, where @StartBundle/@FinishBundle are implementation >>> details. >>> >>> Kenn >>> >>> On Tue, Oct 17, 2017 at 5:37 PM, Derek Hao Hu >>> wrote: >>> Hi, Is there any more detailed explanation on how Beam chooses the window size (bundle size) in streaming mode? It seems there is no clear answer in the [Beam Programming Guide](https://beam.apache.org /documentation/programming-guide/) and I can't find how PubsubIO implements this windowing strategy as well. :( Could someone kindly provide some pointers? Thanks! -- Derek Hao Hu Software Engineer | Snapchat Snap Inc. >>> >>> >> >> >> -- >> Derek Hao Hu >> >> Software Engineer | Snapchat >> Snap Inc. >> > > -- Derek Hao Hu Software Engineer | Snapchat Snap Inc.