[jira] [Work logged] (BEAM-7303) Move Portable Runner and other out of reference runner.
[ https://issues.apache.org/jira/browse/BEAM-7303?focusedWorklogId=338008=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338008 ] ASF GitHub Bot logged work on BEAM-7303: Author: ASF GitHub Bot Created on: 04/Nov/19 07:31 Start Date: 04/Nov/19 07:31 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #9936: [BEAM-7303] Move PortableRunner from runners.reference to java.sdks.portability package URL: https://github.com/apache/beam/pull/9936#issuecomment-549246286 @angoenka, @mxm - thanks for the review, you both asked the same question :) My reasoning was that since, technically, `PortableRunner` isn't a runner, it would be better to have it in the SDK package. It's also analogous to how the Python classes are organized - the Python PortableRunner is in `sdks/python`, and there are no non-Java runners in the `runners` package. I can move the runner into `runners.portability` if you think it makes more sense. I'm ok with either. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338008) Time Spent: 40m (was: 0.5h) > Move Portable Runner and other out of reference runner. > --- > > Key: BEAM-7303 > URL: https://issues.apache.org/jira/browse/BEAM-7303 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > PortableRunner is used by all of Flink, Spark ... . > We should move it out of the Reference Runner package to streamline the > dependencies. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=337934=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337934 ] ASF GitHub Bot logged work on BEAM-7886: Author: ASF GitHub Bot Created on: 04/Nov/19 03:49 Start Date: 04/Nov/19 03:49 Worklog Time Spent: 10m Work Description: tweise commented on issue #9188: [BEAM-7886] Make row coder a standard coder and implement in Python URL: https://github.com/apache/beam/pull/9188#issuecomment-549217718 Excited to see this happening. Perhaps a good time to squash the fixup commits? Issue Time Tracking --- Worklog Id: (was: 337934) Time Spent: 14h 50m (was: 14h 40m) > Make row coder a standard coder and implement in python > --- > > Key: BEAM-7886 > URL: https://issues.apache.org/jira/browse/BEAM-7886 > Project: Beam > Issue Type: Improvement > Components: beam-model, sdk-java-core, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 14h 50m > Remaining Estimate: 0h >
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=337933=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337933 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 04/Nov/19 03:40 Start Date: 04/Nov/19 03:40 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#issuecomment-549216682 This is pretty much the same PR as before, except it checks the runner via string matching. It LGTM. Issue Time Tracking --- Worklog Id: (was: 337933) Time Spent: 5.5h (was: 5h 20m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supplying a runner and an options > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from the notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()).
[jira] [Work logged] (BEAM-8374) PublishResult returned by SnsIO is missing sdkResponseMetadata and sdkHttpMetadata
[ https://issues.apache.org/jira/browse/BEAM-8374?focusedWorklogId=337923=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337923 ] ASF GitHub Bot logged work on BEAM-8374: Author: ASF GitHub Bot Created on: 04/Nov/19 02:49 Start Date: 04/Nov/19 02:49 Worklog Time Spent: 10m Work Description: jfarr commented on issue #9758: [BEAM-8374] Fixes bug in SnsIO PublishResultCoder URL: https://github.com/apache/beam/pull/9758#issuecomment-549210224 @lukecwik @iemejia I reverted the original coder and added a new "full" (all fields) coder and a new "minimal" (no HTTP headers) coder, and also factored out new coders for the common SDK objects. If this looks more like it, I'll flesh out the unit tests; please let me know. Issue Time Tracking --- Worklog Id: (was: 337923) Time Spent: 3h (was: 2h 50m) > PublishResult returned by SnsIO is missing sdkResponseMetadata and > sdkHttpMetadata > -- > > Key: BEAM-8374 > URL: https://issues.apache.org/jira/browse/BEAM-8374 > Project: Beam > Issue Type: Bug > Components: io-java-aws >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Minor > Time Spent: 3h > Remaining Estimate: 0h > > Currently the PublishResultCoder in SnsIO only serializes the messageId field, > so the PublishResult returned by Beam returns null for > getSdkResponseMetadata() and getSdkHttpMetadata(). This makes it impossible > to check the HTTP status for errors, which is necessary since this is not > handled in SnsIO.
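The "minimal" coder idea described above - serialize only the fields callers need to check for errors, dropping the bulky HTTP headers - can be illustrated with a small self-contained sketch. The class and field choices below are hypothetical, not the actual SnsIO PublishResultCoder:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Hypothetical sketch of a "minimal" result coder: encode only the message id
 * and the HTTP status code, so callers can still detect failed publishes
 * without paying to serialize every response header.
 */
class MinimalPublishResultSketch {
  static byte[] encode(String messageId, int httpStatusCode) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    out.writeUTF(messageId);
    out.writeInt(httpStatusCode); // kept so callers can detect non-2xx responses
    return bos.toByteArray();
  }

  static Object[] decode(byte[] bytes) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
    return new Object[] {in.readUTF(), in.readInt()};
  }
}
```

A "full" variant would additionally write the headers map; the trade-off is purely encoded size versus fidelity of the restored response metadata.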
[jira] [Commented] (BEAM-7870) Externally configured KafkaIO / PubsubIO consumer causes coder problems
[ https://issues.apache.org/jira/browse/BEAM-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966373#comment-16966373 ] Chad Dombrova commented on BEAM-7870: - Note that there is not yet support in Python for registering a native type to replace the row, as there is in Java. In the example above we'd end up with the {{TypedDict}} subclass instead of {{gcp.pubsub.PubsubMessage}}. This could be a follow-up task to row coder support. [~bhulette], do we have a Jira for registering native types for row converters yet? > Externally configured KafkaIO / PubsubIO consumer causes coder problems > --- > > Key: BEAM-7870 > URL: https://issues.apache.org/jira/browse/BEAM-7870 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-java-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > > There are limitations for the consumer to work correctly. The biggest issue > is the structure of KafkaIO itself, which uses a combination of the source > interface and DoFns to generate the desired output. The problem is that the > source interface is natively translated by the Flink Runner to support > unbounded sources in portability, while the DoFn runs in a Java environment. > To transfer data between the two a coder needs to be involved. It happens to > be that the initial read does not immediately drop the KafkaRecord structure, > which does not work well with our current assumption of only > supporting "standard coders" present in all SDKs. Only the subsequent DoFn > converts the KafkaRecord structure into a raw KV[byte, byte], but the DoFn > won't have the coder available in its environment. > There are several possible solutions: > 1. Make the DoFn which drops the KafkaRecordCoder a native Java transform in > the Flink Runner > 2. Modify KafkaIO to immediately drop the KafkaRecord structure > 3. Add the KafkaRecordCoder to all SDKs > 4. Add a generic coder, e.g. AvroCoder, to all SDKs > For a workaround which uses (3), please see this patch, which is not a proper > fix but adds KafkaRecordCoder to the SDK such that it can be used to > encode/decode records: > [https://github.com/mxm/beam/commit/b31cf99c75b3972018180d8ccc7e73d311f4cfed] > > See also > https://github.com/apache/beam/pull/8251
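The appeal of solution 2 (immediately dropping to a raw KV[byte, byte]) is that a pair of byte arrays needs nothing beyond length-prefixed encoding, which every SDK language can decode without Java-specific classes. A self-contained illustration of that idea - an assumed sketch, not Beam's actual KvCoder wire format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Illustrative sketch only: once the KafkaRecord wrapper is gone, a raw
 * key/value pair of byte arrays can be written as two length-prefixed
 * byte blocks - a format any SDK can implement.
 */
class RawKvSketch {
  static byte[] encode(byte[] key, byte[] value) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    out.writeInt(key.length);   // length prefix for the key
    out.write(key);
    out.writeInt(value.length); // length prefix for the value
    out.write(value);
    return bos.toByteArray();
  }

  static byte[][] decode(byte[] encoded) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
    byte[] key = new byte[in.readInt()];
    in.readFully(key);
    byte[] value = new byte[in.readInt()];
    in.readFully(value);
    return new byte[][] {key, value};
  }
}
```

By contrast, KafkaRecord carries topic, partition, offset, and timestamp fields, so its coder would have to exist in every SDK (solution 3/4) or be confined to the Java side (solution 1).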
[jira] [Comment Edited] (BEAM-7870) Externally configured KafkaIO / PubsubIO consumer causes coder problems
[ https://issues.apache.org/jira/browse/BEAM-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966364#comment-16966364 ] Thomas Weise edited comment on BEAM-7870 at 11/4/19 2:25 AM: - Thanks for the update. I would also prefer the specific record type (PubsubMessage/KafkaRecord) over generic Row and this approach is a nice compromise. was (Author: thw): Thanks for the update. I would also prefer the specific record type (PubsubMessage/KafkaRecord) over generic Row.
[jira] [Commented] (BEAM-7870) Externally configured KafkaIO / PubsubIO consumer causes coder problems
[ https://issues.apache.org/jira/browse/BEAM-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966364#comment-16966364 ] Thomas Weise commented on BEAM-7870: Thanks for the update. I would also prefer the specific record type (PubsubMessage/KafkaRecord) over generic Row.
[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
[ https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337883=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337883 ] ASF GitHub Bot logged work on BEAM-8347: Author: ASF GitHub Bot Created on: 03/Nov/19 23:56 Start Date: 03/Nov/19 23:56 Worklog Time Spent: 10m Work Description: drobert commented on issue #9820: [BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark URL: https://github.com/apache/beam/pull/9820#issuecomment-549192723 @jkff This has been rebased and should be ready to merge. @jbonofre you have requested changes that I believe @jkff and I disagree with. Can you comment or remove the request? I did make one notable change since the last review, which is that I now explicitly set `current`, `currentTimestamp` and `currentRecordId` to `null` per this javadoc in `Source` for each corresponding getter: ```java @throws NoSuchElementException ... if the last {@link #start} or {@link #advance} returned {@code false}. ``` Issue Time Tracking --- Worklog Id: (was: 337883) Time Spent: 3.5h (was: 3h 20m) > UnboundedRabbitMqReader can fail to advance watermark if no new data comes in > - > > Key: BEAM-8347 > URL: https://issues.apache.org/jira/browse/BEAM-8347 > Project: Beam > Issue Type: Bug > Components: io-java-rabbitmq >Affects Versions: 2.15.0 > Environment: testing has been done using the DirectRunner. I also > have DataflowRunner available >Reporter: Daniel Robert >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > I stumbled upon this and then saw a similar StackOverflow post: > [https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance] > When calling `advance()`, if there are no messages, no state changes, > including no changes to the CheckpointMark or Watermark. If there is a > relatively constant rate of new messages coming in, this is not a problem. If > data is bursty, and there are periods of no new messages coming in, the > watermark will never advance. > Contrast this with some of the logic in PubsubIO which will make provisions > for periods of inactivity to advance the watermark (although it, too, is > imperfect: https://issues.apache.org/jira/browse/BEAM-7322 ) > The example given in the StackOverflow post is something like this: > > {code:java} > pipeline > .apply(RabbitMqIO.read() > .withUri("amqp://guest:guest@localhost:5672") > .withQueue("test")) > .apply("Windowing", > Window.into( > FixedWindows.of(Duration.standardSeconds(10))) > .triggering(AfterWatermark.pastEndOfWindow()) > .withAllowedLateness(Duration.ZERO) > .accumulatingFiredPanes()){code} > If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a > window that never performs an on-time trigger. >
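The direction of the fix described in the issue - keep the watermark moving during idle periods, as PubsubIO does - amounts to falling back to processing time when no messages arrive. The class below is a hypothetical standalone model of that idea, not the actual UnboundedRabbitMqReader code; the two-second slack is an arbitrary assumption:

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical sketch: even when advance() finds no message, the watermark
 * keeps moving by falling back to processing time minus a small slack, so
 * downstream windows can still fire on time during bursty/quiet traffic.
 */
class IdleAwareWatermarkSketch {
  private static final Duration SLACK = Duration.ofSeconds(2);
  private Instant lastMessageTimestamp = Instant.EPOCH;

  /** Record the event time of each delivered message. */
  void onMessage(Instant eventTime) {
    if (eventTime.isAfter(lastMessageTimestamp)) {
      lastMessageTimestamp = eventTime;
    }
  }

  /** Returns max(last event time, now - slack): advances even while idle. */
  Instant currentWatermark(Instant now) {
    Instant idleWatermark = now.minus(SLACK);
    return lastMessageTimestamp.isAfter(idleWatermark) ? lastMessageTimestamp : idleWatermark;
  }
}
```

With this shape, the two-message example above would still see its 10-second window close: once processing time passes the window end plus slack, the watermark crosses it regardless of new input.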
[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
[ https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337882=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337882 ] ASF GitHub Bot logged work on BEAM-8347: Author: ASF GitHub Bot Created on: 03/Nov/19 23:54 Start Date: 03/Nov/19 23:54 Worklog Time Spent: 10m Work Description: drobert commented on issue #9820: [BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark URL: https://github.com/apache/beam/pull/9820#issuecomment-549192723 @jkff This has been rebased and should be ready to merge. I did make one notable change since the last review, which is that I now explicitly set `current`, `currentTimestamp` and `currentRecordId` to `null` per this javadoc in `Source` for each corresponding getter: ```java @throws NoSuchElementException ... if the last {@link #start} or {@link #advance} returned {@code false}. ``` Issue Time Tracking --- Worklog Id: (was: 337882) Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
[ https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337881=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337881 ] ASF GitHub Bot logged work on BEAM-8347: Author: ASF GitHub Bot Created on: 03/Nov/19 23:49 Start Date: 03/Nov/19 23:49 Worklog Time Spent: 10m Work Description: drobert commented on pull request #9820: [BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark URL: https://github.com/apache/beam/pull/9820#discussion_r341881187 ## File path: sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java ## @@ -530,6 +543,10 @@ public boolean advance() throws IOException { // we consume message without autoAck (we want to do the ack ourselves) GetResponse delivery = channel.basicGet(queueName, false); if (delivery == null) { + current = null; Review comment: Notable change since the last round of reviews: The documentation for `Source`'s `getCurrent`, `getCurrentRecordId` and `getCurrentTimestamp` includes the following: > * @throws NoSuchElementException if the reader is at the beginning of the input and {@link > * #start} or {@link #advance} wasn't called, or if the last {@link #start} or {@link > * #advance} returned {@code false}. This states that if `advance()` returns `false`, subsequent calls to these should throw `NoSuchElementException`. As such, I'm explicitly setting these to `null`. Issue Time Tracking --- Worklog Id: (was: 337881) Time Spent: 3h 10m (was: 3h)
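The getter contract quoted in the review comment can be modeled in a few lines: clearing the current element when `advance()` comes up empty is exactly what makes the getters throw. This is a hypothetical standalone sketch, not the actual reader:

```java
import java.util.NoSuchElementException;
import java.util.Queue;

/**
 * Hypothetical sketch of the Source getter contract: when advance() returns
 * false, the current element is cleared so getCurrent() throws
 * NoSuchElementException instead of returning a stale record.
 */
class ContractReaderSketch {
  private String current;

  boolean advance(Queue<String> queue) {
    current = queue.poll(); // null when no message is available
    return current != null;
  }

  String getCurrent() {
    if (current == null) {
      throw new NoSuchElementException("last start()/advance() returned false");
    }
    return current;
  }
}
```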
[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
[ https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337875=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337875 ] ASF GitHub Bot logged work on BEAM-8347: Author: ASF GitHub Bot Created on: 03/Nov/19 23:34 Start Date: 03/Nov/19 23:34 Worklog Time Spent: 10m Work Description: drobert commented on issue #9820: [BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark URL: https://github.com/apache/beam/pull/9820#issuecomment-549189280 Update: after running some tests, the migration to the pull-based API in https://github.com/apache/beam/pull/9900 proves to be unsafe for this use case. I misread or misunderstood the semantics, but it seems `channel.basicGet` blocks indefinitely, as opposed to returning immediately if there are no messages. This means that if no new messages come in for a long period of time, the Runner will detect that the reader is 'stuck' since it's been on a single blocking call for so many minutes. Basically, we need to revert to a push-based-but-blocking API for this. *However* that is a separate issue from the watermarks advancing. I don't think that broken merge should block this PR once rebased. PR for that issue is here: https://github.com/apache/beam/pull/9977 Issue Time Tracking --- Worklog Id: (was: 337875) Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (BEAM-7434) RabbitMqIO uses a deprecated API
[ https://issues.apache.org/jira/browse/BEAM-7434?focusedWorklogId=337873=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337873 ] ASF GitHub Bot logged work on BEAM-7434: Author: ASF GitHub Bot Created on: 03/Nov/19 23:32 Start Date: 03/Nov/19 23:32 Worklog Time Spent: 10m Work Description: drobert commented on pull request #9977: [BEAM-7434] [BEAM-5895] and [BEAM-5894] Fix upgrade to rabbit amqp-client 5.x URL: https://github.com/apache/beam/pull/9977#discussion_r341880189 ## File path: sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/SingleQueueingConsumer.java ## @@ -0,0 +1,156 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.rabbitmq; + +import com.rabbitmq.client.AMQP; +import com.rabbitmq.client.Channel; +import com.rabbitmq.client.ConsumerCancelledException; +import com.rabbitmq.client.DefaultConsumer; +import com.rabbitmq.client.Delivery; +import com.rabbitmq.client.Envelope; +import com.rabbitmq.client.ShutdownSignalException; +import java.io.IOException; +import java.util.concurrent.BlockingDeque; +import java.util.concurrent.LinkedBlockingDeque; +import java.util.concurrent.TimeUnit; +import javax.annotation.Nullable; + +/** + * An implementation of a Consumer (push-based api for rabbit) that accepts at most one message at a + * time from the server and allows the delivered message to be polled by a caller with a timeout. + * + * Because AMQP client 5.x removed the deprecated-in-4.x QueueingConsumer this is effectively a + * re-implementation of the 4.x line's, with a few notable changes: + * + * + * The original was designed to solve a threading concern which no longer applies. The poison + * message and some other workarounds should not be relevant. + * The original used an unbounded LinkedBlockingQueue. Delivery of messages among multiple + * Beam shards will be more 'fair' if each has only a single unprocessed message. This + * implementation limits enqueued messages to a single one. + * Channel thread will block for up to a configurable length of time (default: 10 minutes) for + * the queue to have an open slot. Previously it would nominally fail immediately, which would + * never happen as the queue was unbounded. + * Exceptions have been simplified to only expose IOException to callers. + * + * + * Note: setting amqp prefetch to 1 does not accomplish the same thing. Prefetch limits the + * number of messages that will be sent to the consumer (good) *which have not been acknowledged + * yet* (bad for Beam). 
Because acknowledgements happen during {@code finalizeCheckpoint}, prefetch + * would have to be set to the maximum number of messages Beam *could* process before {@code + * finalizeCheckpoint} is called, which is unknowable. + * + * In effect, this version will use less memory than the original but result in lower throughput + * among any single Beam shard. + * + * @see + * "https://github.com/rabbitmq/rabbitmq-java-client/blob/4.x.x-stable/src/main/java/com/rabbitmq/client/QueueingConsumer.java" + * for the original QueueingConsumer + */ +public class SingleQueueingConsumer extends DefaultConsumer { + private final BlockingDeque<Delivery> queue = new LinkedBlockingDeque<>(1); + private final long offerTimeoutMillis; + + private static final int DEFAULT_DELIVER_TIMEOUT_MINUTES = 10; + + private volatile @Nullable ShutdownSignalException shutdown = null; + private volatile boolean cancelled = false; + + /** + * Default implementation which will block delivering messages for up to 10 minutes if the queue + * is full. + */ + public SingleQueueingConsumer(Channel channel) { +this(channel, DEFAULT_DELIVER_TIMEOUT_MINUTES, TimeUnit.MINUTES); + } + + /** + * Blocks deliveries for the supplied length of time when the queue is full. + * + * @param channel + * @param deliverTimeout + * @param deliverTimeoutUnits + */ + public SingleQueueingConsumer( + Channel channel, long deliverTimeout, TimeUnit deliverTimeoutUnits) { +
[jira] [Work logged] (BEAM-7434) RabbitMqIO uses a deprecated API
[ https://issues.apache.org/jira/browse/BEAM-7434?focusedWorklogId=337871=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337871 ] ASF GitHub Bot logged work on BEAM-7434: Author: ASF GitHub Bot Created on: 03/Nov/19 23:28 Start Date: 03/Nov/19 23:28 Worklog Time Spent: 10m Work Description: drobert commented on pull request #9977: [BEAM-7434] [BEAM-5895] and [BEAM-5894] Fix upgrade to rabbit amqp-client 5.x URL: https://github.com/apache/beam/pull/9977#discussion_r341879939 ## File path: sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqMessage.java ## @@ -19,8 +19,8 @@ import com.rabbitmq.client.AMQP; import com.rabbitmq.client.AMQP.BasicProperties; +import com.rabbitmq.client.Delivery; Review comment: In the 4.x client this was `QueueingConsumer.Delivery`. In 5.x this is a stand-alone class. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337871) Time Spent: 1h 20m (was: 1h 10m) > RabbitMqIO uses a deprecated API > > > Key: BEAM-7434 > URL: https://issues.apache.org/jira/browse/BEAM-7434 > Project: Beam > Issue Type: Bug > Components: io-java-rabbitmq >Reporter: Nicolas Delsaux >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > The RabbitMqIo class reader (UnboundedRabbitMqReader) uses the > QueueingConsumer, which is denoted as deprecated on RabbitMq side. RabbitMqIo > should replace this consumer with the DefaultConsumer provided by RabbitMq. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7434) RabbitMqIO uses a deprecated API
[ https://issues.apache.org/jira/browse/BEAM-7434?focusedWorklogId=337869=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337869 ] ASF GitHub Bot logged work on BEAM-7434: Author: ASF GitHub Bot Created on: 03/Nov/19 23:26 Start Date: 03/Nov/19 23:26 Worklog Time Spent: 10m Work Description: drobert commented on pull request #9977: [BEAM-7434] [BEAM-5895] and [BEAM-5894] Fix upgrade to rabbit amqp-client 5.x URL: https://github.com/apache/beam/pull/9977#discussion_r341879842 ## File path: sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java ## @@ -528,12 +533,13 @@ public boolean advance() throws IOException { try { Channel channel = connectionHandler.getChannel(); // we consume message without autoAck (we want to do the ack ourselves) -GetResponse delivery = channel.basicGet(queueName, false); +Delivery delivery = consumer.poll(1, TimeUnit.SECONDS); Review comment: Restoring the effective functionality prior to https://github.com/apache/beam/pull/9900 where this will wait a maximum of 1 second. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337869) Time Spent: 1h 10m (was: 1h) > RabbitMqIO uses a deprecated API > > > Key: BEAM-7434 > URL: https://issues.apache.org/jira/browse/BEAM-7434 > Project: Beam > Issue Type: Bug > Components: io-java-rabbitmq >Reporter: Nicolas Delsaux >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > The RabbitMqIo class reader (UnboundedRabbitMqReader) uses the > QueueingConsumer, which is denoted as deprecated on RabbitMq side. RabbitMqIo > should replace this consumer with the DefaultConsumer provided by RabbitMq. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7434) RabbitMqIO uses a deprecated API
[ https://issues.apache.org/jira/browse/BEAM-7434?focusedWorklogId=337865=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337865 ] ASF GitHub Bot logged work on BEAM-7434: Author: ASF GitHub Bot Created on: 03/Nov/19 23:25 Start Date: 03/Nov/19 23:25 Worklog Time Spent: 10m Work Description: drobert commented on pull request #9977: [BEAM-7434] [BEAM-5895] and [BEAM-5894] Fix upgrade to rabbit amqp-client 5.x URL: https://github.com/apache/beam/pull/9977 In https://github.com/apache/beam/pull/9900 I migrated the rabbitmq amqp client from the 4.x line to the 5.x line. This upgrade removed the deprecated `QueueingConsumer`, so I switched to a pull-based rather than push-based API. It turns out this change had a semantic difference. `channel.basicGet` (pull) is a blocking call. If no messages are delivered to the caller for a long period of time (for example, if no messages have been published to rabbit for some time), the Runner may detect that the reader is stuck and fail with: > java.io.IOException: Failed to advance source: ... So it seems it's imperative to use a non-blocking read. In this PR, I implement a slightly simpler `QueueingConsumer` replacement. This version uses a single-element `LinkedBlockingQueue` for fairness; that is, if N consumers are used (from `split`), then each will have at most one unprocessed message. It exposes a `poll` API to the caller, effectively providing pull-style get-with-timeout semantics on top of the push-based API. Note that amqp `prefetch` cannot be used to simulate this. While prefetch *will* cap the number of messages delivered to the client at once, it will refuse to deliver any more until the delivered messages have been `ack`d, which we cannot guarantee since we don't `ack` until `finalizeCheckpoint`. I've chosen to allow the Consumer to block for up to 10 minutes, which means that if more than 10 minutes elapse between successive calls to `advance`, this will fail.
I'm not entirely certain on the upstream impact of this, although it is something that will be handled by the configured channel `ExceptionHandler`, the [default implementation](https://github.com/rabbitmq/rabbitmq-java-client/blob/master/src/main/java/com/rabbitmq/client/impl/StrictExceptionHandler.java#L61) of which will close the Connection. It's unclear to me if we want to modify this, extend the time, make the time configurable, or leave it as is.
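The reader-side behavior this PR describes — wait a bounded time for a delivery, then report "no data yet" instead of blocking indefinitely — can be sketched as follows. The class and field names here are invented for illustration and are not the actual RabbitMqIO code:

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

/** Sketch of an advance() loop that waits at most one second for a delivery,
 *  so a quiet queue produces a quick false return rather than a call the
 *  runner would flag as a stuck reader. */
public class BoundedWaitReader {
  // Stand-in for the consumer's single-slot delivery queue.
  final LinkedBlockingDeque<String> incoming = new LinkedBlockingDeque<>(1);
  String current;

  boolean advance() throws InterruptedException {
    String delivery = incoming.poll(1, TimeUnit.SECONDS);
    if (delivery == null) {
      return false; // nothing available yet; the runner simply calls advance() again
    }
    current = delivery;
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    BoundedWaitReader reader = new BoundedWaitReader();
    System.out.println(reader.advance()); // false: empty queue, returns after ~1s
    reader.incoming.offer("msg");
    System.out.println(reader.advance()); // true: a delivery was waiting
    System.out.println(reader.current);   // msg
  }
}
```

The contrast with the pull-based `basicGet` approach is that the timeout lives on the reader side, so the worst-case latency of a single `advance()` call is bounded regardless of broker activity.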
[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
[ https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337862=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337862 ] ASF GitHub Bot logged work on BEAM-8347: Author: ASF GitHub Bot Created on: 03/Nov/19 23:10 Start Date: 03/Nov/19 23:10 Worklog Time Spent: 10m Work Description: drobert commented on issue #9820: [BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark URL: https://github.com/apache/beam/pull/9820#issuecomment-549189280 Update: after running some tests, the migration to the pull-based API in https://github.com/apache/beam/pull/9900 proves to be unsafe for this use case. I misread or misunderstood the semantics but it seems `channel.basicGet` blocks indefinitely, as opposed to returning immediately if there are no messages. This means if no new messages come in for a long period of time, the Runner will detect that the reader is 'stuck' since it's been on a single blocking call for so many minutes. Basically, we need to revert to a push-based-but-blocking API for this. *However* that is a separate issue from the watermarks advancing. I don't think that broken merge should block this PR once rebased. I'll have a separate PR to fix the other up shortly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337862) Time Spent: 2h 50m (was: 2h 40m) > UnboundedRabbitMqReader can fail to advance watermark if no new data comes in > - > > Key: BEAM-8347 > URL: https://issues.apache.org/jira/browse/BEAM-8347 > Project: Beam > Issue Type: Bug > Components: io-java-rabbitmq >Affects Versions: 2.15.0 > Environment: testing has been done using the DirectRunner. 
I also have DataflowRunner available. >Reporter: Daniel Robert >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > I stumbled upon this and then saw a similar StackOverflow post: > [https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance] > When calling `advance()`, if there are no messages, no state changes occur, including no changes to the CheckpointMark or Watermark. If there is a relatively constant rate of new messages coming in, this is not a problem. If data is bursty, and there are periods of no new messages coming in, the watermark will never advance. > Contrast this with some of the logic in PubsubIO, which makes provisions for periods of inactivity to advance the watermark (although it, too, is imperfect: https://issues.apache.org/jira/browse/BEAM-7322 ) > The example given in the StackOverflow post is something like this:
> {code:java}
> pipeline
>     .apply(RabbitMqIO.read()
>         .withUri("amqp://guest:guest@localhost:5672")
>         .withQueue("test"))
>     .apply("Windowing",
>         Window.into(FixedWindows.of(Duration.standardSeconds(10)))
>             .triggering(AfterWatermark.pastEndOfWindow())
>             .withAllowedLateness(Duration.ZERO)
>             .accumulatingFiredPanes());
> {code}
> If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a window that never performs an on-time trigger. -- This message was sent by Atlassian Jira (v8.3.4#803005)
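The PubsubIO-style provision mentioned in the ticket — advancing the watermark during idle periods — can be modeled in a few lines. This is an illustrative sketch, not the actual Beam watermark code; the method and parameter names are invented:

```java
import java.time.Instant;

/** Sketch: if the watermark only moves when a message arrives, a bursty
 *  source stalls downstream windows. One common provision is to advance the
 *  watermark toward "now" on empty polls, bounded by the oldest pending
 *  (unacknowledged) message so it never overruns unfinalized work. */
public class IdleWatermark {
  private Instant watermark = Instant.EPOCH;

  void onAdvance(Instant messageTimestamp, Instant oldestPending, Instant now) {
    if (messageTimestamp != null) {
      watermark = messageTimestamp; // normal case: track message time
    } else {
      // Idle case: move the watermark up, but never past unacknowledged work.
      Instant candidate =
          (oldestPending != null && oldestPending.isBefore(now)) ? oldestPending : now;
      if (candidate.isAfter(watermark)) {
        watermark = candidate;
      }
    }
  }

  public static void main(String[] args) {
    IdleWatermark w = new IdleWatermark();
    Instant now = Instant.parse("2019-11-03T23:10:00Z");
    w.onAdvance(null, null, now); // idle, nothing pending: watermark jumps to now
    System.out.println(w.watermark.equals(now)); // true
  }
}
```

With a rule like this, the two messages in the StackOverflow example would be followed by idle `advance()` calls that still push the watermark past the end of the 10-second window, letting the on-time pane fire.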
[jira] [Assigned] (BEAM-7278) Upgrade some Beam dependencies
[ https://issues.apache.org/jira/browse/BEAM-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mujuzi Moses reassigned BEAM-7278: -- Assignee: Mujuzi Moses > Upgrade some Beam dependencies > -- > > Key: BEAM-7278 > URL: https://issues.apache.org/jira/browse/BEAM-7278 > Project: Beam > Issue Type: Task > Components: dependencies >Reporter: Etienne Chauchot >Assignee: Mujuzi Moses >Priority: Critical > > Some dependencies need to be upgraded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=337816=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337816 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 03/Nov/19 11:04 Start Date: 03/Nov/19 11:04 Worklog Time Spent: 10m Work Description: jfarr commented on issue #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#issuecomment-549125609 Hi @aromanenko-dev, I replaced the polling interval with a rate limit policy configured with the following methods:

`withBackoffRateLimitPolicy(FluentBackoff fluentBackoff)` - specifies the rate limit policy as BackoffRateLimiter.
`withFixedDelayRateLimitPolicy()` - specifies the rate limit policy as FixedDelayRateLimiter with the default delay of 1 second.
`withFixedDelayRateLimitPolicy(Duration delay)` - specifies the rate limit policy as FixedDelayRateLimiter with the given delay.
`withCustomRateLimitPolicy(RateLimitPolicyFactory rateLimitPolicyFactory)` - specifies the RateLimitPolicyFactory for a custom rate limiter.

The default is a no-limit policy that preserves the behavior of the current implementation. wdyt? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337816) Time Spent: 5.5h (was: 5h 20m) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. > {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
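The fixed-delay variant discussed in this thread can be sketched in a few lines. The class and hook names below are invented for illustration and do not reflect the actual KinesisIO implementation:

```java
import java.time.Duration;

/** Sketch of a fixed-delay rate limit policy: after each (possibly empty)
 *  getRecords attempt, the read loop pauses for a fixed delay, keeping each
 *  shard at or under the documented five read transactions per second. */
public class FixedDelayPolicy {
  private final Duration delay;

  FixedDelayPolicy(Duration delay) {
    this.delay = delay;
  }

  /** Hook the read loop would call after every getRecords attempt. */
  void onSuccess() throws InterruptedException {
    Thread.sleep(delay.toMillis());
  }

  public static void main(String[] args) throws InterruptedException {
    FixedDelayPolicy policy = new FixedDelayPolicy(Duration.ofMillis(50));
    long start = System.nanoTime();
    policy.onSuccess(); // the loop pauses here between polls
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    // Roughly the configured delay elapsed (allowing a little timer slack).
    System.out.println(elapsedMs >= 45);
  }
}
```

A 1-second default delay matches the AWS guidance quoted in the ticket of sleeping at least 1,000 milliseconds between getRecords calls; the backoff variant would instead grow the pause after each throttled call.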
[jira] [Work logged] (BEAM-8491) Add ability for multiple output PCollections from composites
[ https://issues.apache.org/jira/browse/BEAM-8491?focusedWorklogId=337779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337779 ] ASF GitHub Bot logged work on BEAM-8491: Author: ASF GitHub Bot Created on: 03/Nov/19 07:19 Start Date: 03/Nov/19 07:19 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9912: [BEAM-8491] Add ability for replacing transforms with multiple outputs URL: https://github.com/apache/beam/pull/9912 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337779) Time Spent: 1.5h (was: 1h 20m) > Add ability for multiple output PCollections from composites > > > Key: BEAM-8491 > URL: https://issues.apache.org/jira/browse/BEAM-8491 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > The Python SDK has DoOutputTuples which allows for a single transform to have > multiple outputs. However, this does not include the ability for a composite > transform to have multiple outputs PCollections from different transforms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8467) Enable reading compressed files with Python fileio
[ https://issues.apache.org/jira/browse/BEAM-8467?focusedWorklogId=337780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337780 ] ASF GitHub Bot logged work on BEAM-8467: Author: ASF GitHub Bot Created on: 03/Nov/19 07:19 Start Date: 03/Nov/19 07:19 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9861: [BEAM-8467] Enabling reading compressed files URL: https://github.com/apache/beam/pull/9861 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337780) Time Spent: 1h 10m (was: 1h) > Enable reading compressed files with Python fileio > -- > > Key: BEAM-8467 > URL: https://issues.apache.org/jira/browse/BEAM-8467 > Project: Beam > Issue Type: Improvement > Components: io-py-files >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8544) Install Beam SDK with ccache for faster re-install.
[ https://issues.apache.org/jira/browse/BEAM-8544?focusedWorklogId=337778=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337778 ] ASF GitHub Bot logged work on BEAM-8544: Author: ASF GitHub Bot Created on: 03/Nov/19 06:59 Start Date: 03/Nov/19 06:59 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9966: [BEAM-8544] Use ccache for compiling the Beam Python SDK. URL: https://github.com/apache/beam/pull/9966#issuecomment-549110720 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337778) Time Spent: 1h (was: 50m) > Install Beam SDK with ccache for faster re-install. > --- > > Key: BEAM-8544 > URL: https://issues.apache.org/jira/browse/BEAM-8544 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Re-compiling the C modules of the SDK takes 2-3 minutes. This adds to worker > startup time whenever a custom SDK is being used (in particular, during > development and testing). We can use ccache to re-use the old compile results > when the Cython files have not changed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8435) Allow access to PaneInfo from Python DoFns
[ https://issues.apache.org/jira/browse/BEAM-8435?focusedWorklogId=33=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-33 ] ASF GitHub Bot logged work on BEAM-8435: Author: ASF GitHub Bot Created on: 03/Nov/19 06:59 Start Date: 03/Nov/19 06:59 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9836: [BEAM-8435] Implement PaneInfo computation for Python. URL: https://github.com/apache/beam/pull/9836#issuecomment-549110698 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 33) Time Spent: 1h 50m (was: 1h 40m) > Allow access to PaneInfo from Python DoFns > -- > > Key: BEAM-8435 > URL: https://issues.apache.org/jira/browse/BEAM-8435 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > PaneInfoParam exists, but the plumbing to actually populate it at runtime was > never added. (Nor, clearly, were any tests...) -- This message was sent by Atlassian Jira (v8.3.4#803005)