[jira] [Work logged] (BEAM-7303) Move Portable Runner and other out of reference runner.
[ https://issues.apache.org/jira/browse/BEAM-7303?focusedWorklogId=338008=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338008 ] ASF GitHub Bot logged work on BEAM-7303: Author: ASF GitHub Bot Created on: 04/Nov/19 07:31 Start Date: 04/Nov/19 07:31 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #9936: [BEAM-7303] Move PortableRunner from runners.reference to java.sdks.portability package URL: https://github.com/apache/beam/pull/9936#issuecomment-549246286 @angoenka, @mxm - thanks for the review, you both asked the same question :) My reasoning was that since, technically, `PortableRunner` isn't a runner, it would be better to have it in the SDK package. It's also analogous to how the Python classes are organized - the Python PortableRunner is in `sdks/python`, and there are no non-Java runners in the `runners` package. I can move the runner into `runners.portability` if you think it makes more sense. I'm ok with either. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338008) Time Spent: 40m (was: 0.5h) > Move Portable Runner and other out of reference runner. > --- > > Key: BEAM-7303 > URL: https://issues.apache.org/jira/browse/BEAM-7303 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > PortableRunner is used by all of Flink, Spark ... . > We should move it out of the Reference Runner package to streamline the > dependencies. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=337934=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337934 ] ASF GitHub Bot logged work on BEAM-7886: Author: ASF GitHub Bot Created on: 04/Nov/19 03:49 Start Date: 04/Nov/19 03:49 Worklog Time Spent: 10m Work Description: tweise commented on issue #9188: [BEAM-7886] Make row coder a standard coder and implement in Python URL: https://github.com/apache/beam/pull/9188#issuecomment-549217718 Excited to see this happening. Perhaps a good time to squash the fixup commits? Issue Time Tracking --- Worklog Id: (was: 337934) Time Spent: 14h 50m (was: 14h 40m) > Make row coder a standard coder and implement in python > --- > > Key: BEAM-7886 > URL: https://issues.apache.org/jira/browse/BEAM-7886 > Project: Beam > Issue Type: Improvement > Components: beam-model, sdk-java-core, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 14h 50m > Remaining Estimate: 0h >
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=337933=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337933 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 04/Nov/19 03:40 Start Date: 04/Nov/19 03:40 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#issuecomment-549216682 This is pretty much the same PR as before, except it checks the runner via string matching. It LGTM. Issue Time Tracking --- Worklog Id: (was: 337933) Time Spent: 5.5h (was: 5h 20m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supplying a runner and an options > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from the notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()).
[jira] [Work logged] (BEAM-8374) PublishResult returned by SnsIO is missing sdkResponseMetadata and sdkHttpMetadata
[ https://issues.apache.org/jira/browse/BEAM-8374?focusedWorklogId=337923=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337923 ] ASF GitHub Bot logged work on BEAM-8374: Author: ASF GitHub Bot Created on: 04/Nov/19 02:49 Start Date: 04/Nov/19 02:49 Worklog Time Spent: 10m Work Description: jfarr commented on issue #9758: [BEAM-8374] Fixes bug in SnsIO PublishResultCoder URL: https://github.com/apache/beam/pull/9758#issuecomment-549210224 @lukecwik @iemejia I reverted the original coder and added a new "full" (all fields) coder and a new "minimal" (no HTTP headers) coder, and also factored out new coders for the common SDK objects. If this looks more like it, I'll flesh out the unit tests; please let me know. Issue Time Tracking --- Worklog Id: (was: 337923) Time Spent: 3h (was: 2h 50m) > PublishResult returned by SnsIO is missing sdkResponseMetadata and > sdkHttpMetadata > -- > > Key: BEAM-8374 > URL: https://issues.apache.org/jira/browse/BEAM-8374 > Project: Beam > Issue Type: Bug > Components: io-java-aws >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Minor > Time Spent: 3h > Remaining Estimate: 0h > > Currently the PublishResultCoder in SnsIO only serializes the messageId field, > so the PublishResult returned by Beam returns null for > getSdkResponseMetadata() and getSdkHttpMetadata(). This makes it impossible > to check the HTTP status for errors, which is necessary since this is not > handled in SnsIO.
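The "minimal" coder idea described above - serialize only the fields callers need to check for errors, dropping the bulky HTTP headers - can be illustrated with a small self-contained sketch. The class and field choices below are hypothetical, not the actual SnsIO PublishResultCoder:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Hypothetical sketch of a "minimal" result coder: encode only the message id
 * and the HTTP status code, so callers can still detect failed publishes
 * without paying to serialize every response header.
 */
class MinimalPublishResultSketch {
  static byte[] encode(String messageId, int httpStatusCode) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    out.writeUTF(messageId);
    out.writeInt(httpStatusCode); // kept so callers can detect non-2xx responses
    return bos.toByteArray();
  }

  static Object[] decode(byte[] bytes) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
    return new Object[] {in.readUTF(), in.readInt()};
  }
}
```

A "full" variant would additionally write the headers map; the trade-off is purely encoded size versus fidelity of the restored response metadata.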
[jira] [Commented] (BEAM-7870) Externally configured KafkaIO / PubsubIO consumer causes coder problems
[ https://issues.apache.org/jira/browse/BEAM-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966373#comment-16966373 ] Chad Dombrova commented on BEAM-7870: - Note that there is not yet support in Python for registering a native type to replace the row, as there is in Java. In the example above we'd end up with the {{TypedDict}} subclass instead of {{gcp.pubsub.PubsubMessage}}. This could be a follow-up task to row coder support. [~bhulette], do we have a Jira for registering native types for row converters yet? > Externally configured KafkaIO / PubsubIO consumer causes coder problems > --- > > Key: BEAM-7870 > URL: https://issues.apache.org/jira/browse/BEAM-7870 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-java-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > > There are limitations for the consumer to work correctly. The biggest issue > is the structure of KafkaIO itself, which uses a combination of the source > interface and DoFns to generate the desired output. The problem is that the > source interface is natively translated by the Flink Runner to support > unbounded sources in portability, while the DoFn runs in a Java environment. > To transfer data between the two a coder needs to be involved. It happens to > be that the initial read does not immediately drop the KafkaRecord structure, > which does not work well with our current assumption of only > supporting "standard coders" present in all SDKs. Only the subsequent DoFn > converts the KafkaRecord structure into a raw KV[byte, byte], but the DoFn > won't have the coder available in its environment. > There are several possible solutions: > 1. Make the DoFn which drops the KafkaRecordCoder a native Java transform in > the Flink Runner > 2. Modify KafkaIO to immediately drop the KafkaRecord structure > 3. Add the KafkaRecordCoder to all SDKs > 4. Add a generic coder, e.g. AvroCoder, to all SDKs > For a workaround which uses (3), please see this patch, which is not a proper > fix but adds KafkaRecordCoder to the SDK such that it can be used to > encode/decode records: > [https://github.com/mxm/beam/commit/b31cf99c75b3972018180d8ccc7e73d311f4cfed] > > See also > https://github.com/apache/beam/pull/8251
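The appeal of solution 2 (immediately dropping to a raw KV[byte, byte]) is that a pair of byte arrays needs nothing beyond length-prefixed encoding, which every SDK language can decode without Java-specific classes. A self-contained illustration of that idea - an assumed sketch, not Beam's actual KvCoder wire format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Illustrative sketch only: once the KafkaRecord wrapper is gone, a raw
 * key/value pair of byte arrays can be written as two length-prefixed
 * byte blocks - a format any SDK can implement.
 */
class RawKvSketch {
  static byte[] encode(byte[] key, byte[] value) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    out.writeInt(key.length);   // length prefix for the key
    out.write(key);
    out.writeInt(value.length); // length prefix for the value
    out.write(value);
    return bos.toByteArray();
  }

  static byte[][] decode(byte[] encoded) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
    byte[] key = new byte[in.readInt()];
    in.readFully(key);
    byte[] value = new byte[in.readInt()];
    in.readFully(value);
    return new byte[][] {key, value};
  }
}
```

By contrast, KafkaRecord carries topic, partition, offset, and timestamp fields, so its coder would have to exist in every SDK (solution 3/4) or be confined to the Java side (solution 1).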
[jira] [Comment Edited] (BEAM-7870) Externally configured KafkaIO / PubsubIO consumer causes coder problems
[ https://issues.apache.org/jira/browse/BEAM-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966364#comment-16966364 ] Thomas Weise edited comment on BEAM-7870 at 11/4/19 2:25 AM: - Thanks for the update. I would also prefer the specific record type (PubsubMessage/KafkaRecord) over generic Row and this approach is a nice compromise. was (Author: thw): Thanks for the update. I would also prefer the specific record type (PubsubMessage/KafkaRecord) over generic Row.
[jira] [Commented] (BEAM-7870) Externally configured KafkaIO / PubsubIO consumer causes coder problems
[ https://issues.apache.org/jira/browse/BEAM-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966364#comment-16966364 ] Thomas Weise commented on BEAM-7870: Thanks for the update. I would also prefer the specific record type (PubsubMessage/KafkaRecord) over generic Row.
[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
[ https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337883=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337883 ] ASF GitHub Bot logged work on BEAM-8347: Author: ASF GitHub Bot Created on: 03/Nov/19 23:56 Start Date: 03/Nov/19 23:56 Worklog Time Spent: 10m Work Description: drobert commented on issue #9820: [BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark URL: https://github.com/apache/beam/pull/9820#issuecomment-549192723 @jkff This has been rebased and should be ready to merge. @jbonofre you have requested changes that I believe @jkff and I disagree with. Can you comment or remove the request? I did make one notable change since the last review, which is that I now explicitly set `current`, `currentTimestamp` and `currentRecordId` to `null` per this javadoc in `Source` for each corresponding getter: ```java @throws NoSuchElementException ... if the last {@link #start} or {@link #advance} returned {@code false}. ``` Issue Time Tracking --- Worklog Id: (was: 337883) Time Spent: 3.5h (was: 3h 20m) > UnboundedRabbitMqReader can fail to advance watermark if no new data comes in > - > > Key: BEAM-8347 > URL: https://issues.apache.org/jira/browse/BEAM-8347 > Project: Beam > Issue Type: Bug > Components: io-java-rabbitmq >Affects Versions: 2.15.0 > Environment: testing has been done using the DirectRunner. I also > have DataflowRunner available >Reporter: Daniel Robert >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > I stumbled upon this and then saw a similar StackOverflow post: > [https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance] > When calling `advance()`, if there are no messages, no state changes, > including no changes to the CheckpointMark or Watermark. If there is a > relatively constant rate of new messages coming in, this is not a problem. If > data is bursty, and there are periods of no new messages coming in, the > watermark will never advance. > Contrast this with some of the logic in PubsubIO which will make provisions > for periods of inactivity to advance the watermark (although it, too, is > imperfect: https://issues.apache.org/jira/browse/BEAM-7322 ) > The example given in the StackOverflow post is something like this: > > {code:java} > pipeline > .apply(RabbitMqIO.read() > .withUri("amqp://guest:guest@localhost:5672") > .withQueue("test")) > .apply("Windowing", > Window.into( > FixedWindows.of(Duration.standardSeconds(10))) > .triggering(AfterWatermark.pastEndOfWindow()) > .withAllowedLateness(Duration.ZERO) > .accumulatingFiredPanes()){code} > If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a > window that never performs an on-time trigger. >
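The direction of the fix described in the issue - keep the watermark moving during idle periods, as PubsubIO does - amounts to falling back to processing time when no messages arrive. The class below is a hypothetical standalone model of that idea, not the actual UnboundedRabbitMqReader code; the two-second slack is an arbitrary assumption:

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical sketch: even when advance() finds no message, the watermark
 * keeps moving by falling back to processing time minus a small slack, so
 * downstream windows can still fire on time during bursty/quiet traffic.
 */
class IdleAwareWatermarkSketch {
  private static final Duration SLACK = Duration.ofSeconds(2);
  private Instant lastMessageTimestamp = Instant.EPOCH;

  /** Record the event time of each delivered message. */
  void onMessage(Instant eventTime) {
    if (eventTime.isAfter(lastMessageTimestamp)) {
      lastMessageTimestamp = eventTime;
    }
  }

  /** Returns max(last event time, now - slack): advances even while idle. */
  Instant currentWatermark(Instant now) {
    Instant idleWatermark = now.minus(SLACK);
    return lastMessageTimestamp.isAfter(idleWatermark) ? lastMessageTimestamp : idleWatermark;
  }
}
```

With this shape, the two-message example above would still see its 10-second window close: once processing time passes the window end plus slack, the watermark crosses it regardless of new input.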
[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
[ https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337882=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337882 ] ASF GitHub Bot logged work on BEAM-8347: Author: ASF GitHub Bot Created on: 03/Nov/19 23:54 Start Date: 03/Nov/19 23:54 Worklog Time Spent: 10m Work Description: drobert commented on issue #9820: [BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark URL: https://github.com/apache/beam/pull/9820#issuecomment-549192723 @jkff This has been rebased and should be ready to merge. I did make one notable change since the last review, which is that I now explicitly set `current`, `currentTimestamp` and `currentRecordId` to `null` per this javadoc in `Source` for each corresponding getter: ```java @throws NoSuchElementException ... if the last {@link #start} or {@link #advance} returned {@code false}. ``` Issue Time Tracking --- Worklog Id: (was: 337882) Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
[ https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337881=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337881 ] ASF GitHub Bot logged work on BEAM-8347: Author: ASF GitHub Bot Created on: 03/Nov/19 23:49 Start Date: 03/Nov/19 23:49 Worklog Time Spent: 10m Work Description: drobert commented on pull request #9820: [BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark URL: https://github.com/apache/beam/pull/9820#discussion_r341881187 ## File path: sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java ## @@ -530,6 +543,10 @@ public boolean advance() throws IOException { // we consume message without autoAck (we want to do the ack ourselves) GetResponse delivery = channel.basicGet(queueName, false); if (delivery == null) { + current = null; Review comment: Notable change since the last round of reviews: The documentation for `Source`'s `getCurrent`, `getCurrentRecordId` and `getCurrentTimestamp` includes the following: > * @throws NoSuchElementException if the reader is at the beginning of the input and {@link > * #start} or {@link #advance} wasn't called, or if the last {@link #start} or {@link > * #advance} returned {@code false}. This states that if `advance()` returns `false`, subsequent calls to these should throw `NoSuchElementException`. As such, I'm explicitly setting these to `null`. Issue Time Tracking --- Worklog Id: (was: 337881) Time Spent: 3h 10m (was: 3h)
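The getter contract quoted in the review comment can be modeled in a few lines: clearing the current element when `advance()` comes up empty is exactly what makes the getters throw. This is a hypothetical standalone sketch, not the actual reader:

```java
import java.util.NoSuchElementException;
import java.util.Queue;

/**
 * Hypothetical sketch of the Source getter contract: when advance() returns
 * false, the current element is cleared so getCurrent() throws
 * NoSuchElementException instead of returning a stale record.
 */
class ContractReaderSketch {
  private String current;

  boolean advance(Queue<String> queue) {
    current = queue.poll(); // null when no message is available
    return current != null;
  }

  String getCurrent() {
    if (current == null) {
      throw new NoSuchElementException("last start()/advance() returned false");
    }
    return current;
  }
}
```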
[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
[ https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337875=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337875 ] ASF GitHub Bot logged work on BEAM-8347: Author: ASF GitHub Bot Created on: 03/Nov/19 23:34 Start Date: 03/Nov/19 23:34 Worklog Time Spent: 10m Work Description: drobert commented on issue #9820: [BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark URL: https://github.com/apache/beam/pull/9820#issuecomment-549189280 Update: after running some tests, the migration to the pull-based API in https://github.com/apache/beam/pull/9900 proves to be unsafe for this use case. I misread or misunderstood the semantics, but it seems `channel.basicGet` blocks indefinitely, as opposed to returning immediately if there are no messages. This means that if no new messages come in for a long period of time, the Runner will detect that the reader is 'stuck' since it's been on a single blocking call for so many minutes. Basically, we need to revert to a push-based-but-blocking API for this. *However* that is a separate issue from the watermarks advancing. I don't think that broken merge should block this PR once rebased. PR for that issue is here: https://github.com/apache/beam/pull/9977 Issue Time Tracking --- Worklog Id: (was: 337875) Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (BEAM-7434) RabbitMqIO uses a deprecated API
[ https://issues.apache.org/jira/browse/BEAM-7434?focusedWorklogId=337873=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337873 ] ASF GitHub Bot logged work on BEAM-7434: Author: ASF GitHub Bot Created on: 03/Nov/19 23:32 Start Date: 03/Nov/19 23:32 Worklog Time Spent: 10m Work Description: drobert commented on pull request #9977: [BEAM-7434] [BEAM-5895] and [BEAM-5894] Fix upgrade to rabbit amqp-client 5.x URL: https://github.com/apache/beam/pull/9977#discussion_r341880189 ## File path: sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/SingleQueueingConsumer.java ## @@ -0,0 +1,156 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.rabbitmq; + +import com.rabbitmq.client.AMQP; +import com.rabbitmq.client.Channel; +import com.rabbitmq.client.ConsumerCancelledException; +import com.rabbitmq.client.DefaultConsumer; +import com.rabbitmq.client.Delivery; +import com.rabbitmq.client.Envelope; +import com.rabbitmq.client.ShutdownSignalException; +import java.io.IOException; +import java.util.concurrent.BlockingDeque; +import java.util.concurrent.LinkedBlockingDeque; +import java.util.concurrent.TimeUnit; +import javax.annotation.Nullable; + +/** + * An implementation of a Consumer (push-based api for rabbit) that accepts at most one message at a + * time from the server and allows the delivered message to be polled by a caller with a timeout. + * + * Because AMQP client 5.x removed the deprecated-in-4.x QueueingConsumer this is effectively a + * re-implementation of the 4.x line's, with a few notable changes: + * + * + * The original was designed to solve a threading concern which no longer applies. The poison + * message and some other workarounds should not be relevant. + * The original used an unbounded LinkedBlockingQueue. Delivery of messages among multiple + * Beam shards will be more 'fair' if each has only a single unprocessed message. This + * implementation limits enqueued messages to a single one. + * Channel thread will block for up to a configurable length of time (default: 10 minutes) for + * the queue to have an open slot. Previously it would nominally fail immediately, which would + * never happen as the queue was unbounded. + * Exceptions have been simplified to only expose IOException to callers. + * + * + * Note: setting amqp prefetch to 1 does not accomplish the same thing. Prefetch limits the + * number of messages that will be sent to the consumer (good) *which have not been acknowledged + * yet* (bad for Beam). 
Because acknowledgements happen during {@code finalizeCheckpoint}, prefetch + * would have to be set to the maximum number of messages Beam *could* process before {@code + * finalizeCheckpoint} is called, which is unknowable. + * + * In effect, this version will use less memory than the original but result in lower throughput + * among any single Beam shard. + * + * @see + * "https://github.com/rabbitmq/rabbitmq-java-client/blob/4.x.x-stable/src/main/java/com/rabbitmq/client/QueueingConsumer.java" + * for the original QueueingConsumer + */ +public class SingleQueueingConsumer extends DefaultConsumer { + private final BlockingDeque<Delivery> queue = new LinkedBlockingDeque<>(1); + private final long offerTimeoutMillis; + + private static final int DEFAULT_DELIVER_TIMEOUT_MINUTES = 10; + + private volatile @Nullable ShutdownSignalException shutdown = null; + private volatile boolean cancelled = false; + + /** + * Default implementation which will block delivering messages for up to 10 minutes if the queue + * is full. + */ + public SingleQueueingConsumer(Channel channel) { +this(channel, DEFAULT_DELIVER_TIMEOUT_MINUTES, TimeUnit.MINUTES); + } + + /** + * Blocks deliveries for the supplied length of time when the queue is full. + * + * @param channel + * @param deliverTimeout + * @param deliverTimeoutUnits + */ + public SingleQueueingConsumer( + Channel channel, long deliverTimeout, TimeUnit deliverTimeoutUnits) { +
[jira] [Work logged] (BEAM-7434) RabbitMqIO uses a deprecated API
[ https://issues.apache.org/jira/browse/BEAM-7434?focusedWorklogId=337871=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337871 ] ASF GitHub Bot logged work on BEAM-7434: Author: ASF GitHub Bot Created on: 03/Nov/19 23:28 Start Date: 03/Nov/19 23:28 Worklog Time Spent: 10m Work Description: drobert commented on pull request #9977: [BEAM-7434] [BEAM-5895] and [BEAM-5894] Fix upgrade to rabbit amqp-client 5.x URL: https://github.com/apache/beam/pull/9977#discussion_r341879939 ## File path: sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqMessage.java ## @@ -19,8 +19,8 @@ import com.rabbitmq.client.AMQP; import com.rabbitmq.client.AMQP.BasicProperties; +import com.rabbitmq.client.Delivery; Review comment: In the 4.x client this was `QueueingConsumer.Delivery`. In 5.x this is a stand-alone class. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337871) Time Spent: 1h 20m (was: 1h 10m) > RabbitMqIO uses a deprecated API > > > Key: BEAM-7434 > URL: https://issues.apache.org/jira/browse/BEAM-7434 > Project: Beam > Issue Type: Bug > Components: io-java-rabbitmq >Reporter: Nicolas Delsaux >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > The RabbitMqIo class reader (UnboundedRabbitMqReader) uses the > QueueingConsumer, which is denoted as deprecated on RabbitMq side. RabbitMqIo > should replace this consumer with the DefaultConsumer provided by RabbitMq. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7434) RabbitMqIO uses a deprecated API
[ https://issues.apache.org/jira/browse/BEAM-7434?focusedWorklogId=337869=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337869 ] ASF GitHub Bot logged work on BEAM-7434: Author: ASF GitHub Bot Created on: 03/Nov/19 23:26 Start Date: 03/Nov/19 23:26 Worklog Time Spent: 10m Work Description: drobert commented on pull request #9977: [BEAM-7434] [BEAM-5895] and [BEAM-5894] Fix upgrade to rabbit amqp-client 5.x URL: https://github.com/apache/beam/pull/9977#discussion_r341879842 ## File path: sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java ## @@ -528,12 +533,13 @@ public boolean advance() throws IOException { try { Channel channel = connectionHandler.getChannel(); // we consume message without autoAck (we want to do the ack ourselves) -GetResponse delivery = channel.basicGet(queueName, false); +Delivery delivery = consumer.poll(1, TimeUnit.SECONDS); Review comment: Restoring the effective functionality prior to https://github.com/apache/beam/pull/9900 where this will wait a maximum of 1 second. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337869) Time Spent: 1h 10m (was: 1h) > RabbitMqIO uses a deprecated API > > > Key: BEAM-7434 > URL: https://issues.apache.org/jira/browse/BEAM-7434 > Project: Beam > Issue Type: Bug > Components: io-java-rabbitmq >Reporter: Nicolas Delsaux >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > The RabbitMqIo class reader (UnboundedRabbitMqReader) uses the > QueueingConsumer, which is denoted as deprecated on RabbitMq side. RabbitMqIo > should replace this consumer with the DefaultConsumer provided by RabbitMq. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7434) RabbitMqIO uses a deprecated API
[ https://issues.apache.org/jira/browse/BEAM-7434?focusedWorklogId=337865=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337865 ] ASF GitHub Bot logged work on BEAM-7434: Author: ASF GitHub Bot Created on: 03/Nov/19 23:25 Start Date: 03/Nov/19 23:25 Worklog Time Spent: 10m Work Description: drobert commented on pull request #9977: [BEAM-7434] [BEAM-5895] and [BEAM-5894] Fix upgrade to rabbit amqp-client 5.x URL: https://github.com/apache/beam/pull/9977 In https://github.com/apache/beam/pull/9900 I migrated the rabbitmq amqp client from the 4.x line to the 5.x line. This upgrade removed the deprecated `QueueingConsumer`, so I switched to a pull-based rather than push-based API. It turns out this change had a semantic difference. `channel.basicGet` (pull) is a blocking call. If no messages are delivered to the caller for a long period of time (for example, if no messages have been published to rabbit for some time), the Runner may detect that the reader is stuck and fail with: > java.io.IOException: Failed to advance source: ... So it seems it's imperative to use a non-blocking read. In this PR, I implement a slightly simpler `QueueingConsumer` replacement. This version uses a single-element `LinkedBlockingQueue` for fairness; that is, if N consumers are used (from `split`), then each will have at most one unprocessed message. It exposes a `poll` API to the caller, effectively providing pull-style get-with-timeout semantics on top of the push-based API. Note that amqp `prefetch` cannot be used to simulate this. While prefetch *will* cap the number of messages delivered to the client at once, it will refuse to deliver any more until the delivered messages have been `ack`d, which we cannot guarantee since we don't `ack` until `finalizeCheckpoint`. I've chosen to allow the Consumer to block for up to 10 minutes, which means that if more than 10 minutes elapse between successive calls to `advance`, this will fail.
I'm not entirely certain on the upstream impact of this, although it is something that will be handled by the configured channel `ExceptionHandler`, the [default implementation](https://github.com/rabbitmq/rabbitmq-java-client/blob/master/src/main/java/com/rabbitmq/client/impl/StrictExceptionHandler.java#L61) of which will close the Connection. It's unclear to me if we want to modify this, extend the time, make the time configurable, or leave it as is.
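The reader-side behavior this PR describes — wait a bounded time for a delivery, then report "no data yet" instead of blocking indefinitely — can be sketched as follows. The class and field names here are invented for illustration and are not the actual RabbitMqIO code:

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

/** Sketch of an advance() loop that waits at most one second for a delivery,
 *  so a quiet queue produces a quick false return rather than a call the
 *  runner would flag as a stuck reader. */
public class BoundedWaitReader {
  // Stand-in for the consumer's single-slot delivery queue.
  final LinkedBlockingDeque<String> incoming = new LinkedBlockingDeque<>(1);
  String current;

  boolean advance() throws InterruptedException {
    String delivery = incoming.poll(1, TimeUnit.SECONDS);
    if (delivery == null) {
      return false; // nothing available yet; the runner simply calls advance() again
    }
    current = delivery;
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    BoundedWaitReader reader = new BoundedWaitReader();
    System.out.println(reader.advance()); // false: empty queue, returns after ~1s
    reader.incoming.offer("msg");
    System.out.println(reader.advance()); // true: a delivery was waiting
    System.out.println(reader.current);   // msg
  }
}
```

The contrast with the pull-based `basicGet` approach is that the timeout lives on the reader side, so the worst-case latency of a single `advance()` call is bounded regardless of broker activity.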
[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
[ https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337862=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337862 ] ASF GitHub Bot logged work on BEAM-8347: Author: ASF GitHub Bot Created on: 03/Nov/19 23:10 Start Date: 03/Nov/19 23:10 Worklog Time Spent: 10m Work Description: drobert commented on issue #9820: [BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark URL: https://github.com/apache/beam/pull/9820#issuecomment-549189280 Update: after running some tests, the migration to the pull-based API in https://github.com/apache/beam/pull/9900 proves to be unsafe for this use case. I misread or misunderstood the semantics but it seems `channel.basicGet` blocks indefinitely, as opposed to returning immediately if there are no messages. This means if no new messages come in for a long period of time, the Runner will detect that the reader is 'stuck' since it's been on a single blocking call for so many minutes. Basically, we need to revert to a push-based-but-blocking API for this. *However* that is a separate issue from the watermarks advancing. I don't think that broken merge should block this PR once rebased. I'll have a separate PR to fix the other up shortly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337862) Time Spent: 2h 50m (was: 2h 40m) > UnboundedRabbitMqReader can fail to advance watermark if no new data comes in > - > > Key: BEAM-8347 > URL: https://issues.apache.org/jira/browse/BEAM-8347 > Project: Beam > Issue Type: Bug > Components: io-java-rabbitmq >Affects Versions: 2.15.0 > Environment: testing has been done using the DirectRunner. 
I also have DataflowRunner available. >Reporter: Daniel Robert >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > I stumbled upon this and then saw a similar StackOverflow post: > [https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance] > When calling `advance()`, if there are no messages, no state changes occur, including no changes to the CheckpointMark or Watermark. If there is a relatively constant rate of new messages coming in, this is not a problem. If data is bursty, and there are periods of no new messages coming in, the watermark will never advance. > Contrast this with some of the logic in PubsubIO, which makes provisions for periods of inactivity to advance the watermark (although it, too, is imperfect: https://issues.apache.org/jira/browse/BEAM-7322 ) > The example given in the StackOverflow post is something like this:
> {code:java}
> pipeline
>     .apply(RabbitMqIO.read()
>         .withUri("amqp://guest:guest@localhost:5672")
>         .withQueue("test"))
>     .apply("Windowing",
>         Window.into(FixedWindows.of(Duration.standardSeconds(10)))
>             .triggering(AfterWatermark.pastEndOfWindow())
>             .withAllowedLateness(Duration.ZERO)
>             .accumulatingFiredPanes());
> {code}
> If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a window that never performs an on-time trigger. -- This message was sent by Atlassian Jira (v8.3.4#803005)
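The PubsubIO-style provision mentioned in the ticket — advancing the watermark during idle periods — can be modeled in a few lines. This is an illustrative sketch, not the actual Beam watermark code; the method and parameter names are invented:

```java
import java.time.Instant;

/** Sketch: if the watermark only moves when a message arrives, a bursty
 *  source stalls downstream windows. One common provision is to advance the
 *  watermark toward "now" on empty polls, bounded by the oldest pending
 *  (unacknowledged) message so it never overruns unfinalized work. */
public class IdleWatermark {
  private Instant watermark = Instant.EPOCH;

  void onAdvance(Instant messageTimestamp, Instant oldestPending, Instant now) {
    if (messageTimestamp != null) {
      watermark = messageTimestamp; // normal case: track message time
    } else {
      // Idle case: move the watermark up, but never past unacknowledged work.
      Instant candidate =
          (oldestPending != null && oldestPending.isBefore(now)) ? oldestPending : now;
      if (candidate.isAfter(watermark)) {
        watermark = candidate;
      }
    }
  }

  public static void main(String[] args) {
    IdleWatermark w = new IdleWatermark();
    Instant now = Instant.parse("2019-11-03T23:10:00Z");
    w.onAdvance(null, null, now); // idle, nothing pending: watermark jumps to now
    System.out.println(w.watermark.equals(now)); // true
  }
}
```

With a rule like this, the two messages in the StackOverflow example would be followed by idle `advance()` calls that still push the watermark past the end of the 10-second window, letting the on-time pane fire.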
[jira] [Assigned] (BEAM-7278) Upgrade some Beam dependencies
[ https://issues.apache.org/jira/browse/BEAM-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mujuzi Moses reassigned BEAM-7278: -- Assignee: Mujuzi Moses > Upgrade some Beam dependencies > -- > > Key: BEAM-7278 > URL: https://issues.apache.org/jira/browse/BEAM-7278 > Project: Beam > Issue Type: Task > Components: dependencies >Reporter: Etienne Chauchot >Assignee: Mujuzi Moses >Priority: Critical > > Some dependencies need to be upgraded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=337816=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337816 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 03/Nov/19 11:04 Start Date: 03/Nov/19 11:04 Worklog Time Spent: 10m Work Description: jfarr commented on issue #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#issuecomment-549125609 Hi @aromanenko-dev, I replaced the polling interval with a rate limit policy configured with the following methods:

`withBackoffRateLimitPolicy(FluentBackoff fluentBackoff)` - specifies the rate limit policy as BackoffRateLimiter.
`withFixedDelayRateLimitPolicy()` - specifies the rate limit policy as FixedDelayRateLimiter with the default delay of 1 second.
`withFixedDelayRateLimitPolicy(Duration delay)` - specifies the rate limit policy as FixedDelayRateLimiter with the given delay.
`withCustomRateLimitPolicy(RateLimitPolicyFactory rateLimitPolicyFactory)` - specifies the RateLimitPolicyFactory for a custom rate limiter.

The default is a no-limit policy that preserves the behavior of the current implementation. wdyt? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337816) Time Spent: 5.5h (was: 5h 20m) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. > {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
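The fixed-delay variant discussed in this thread can be sketched in a few lines. The class and hook names below are invented for illustration and do not reflect the actual KinesisIO implementation:

```java
import java.time.Duration;

/** Sketch of a fixed-delay rate limit policy: after each (possibly empty)
 *  getRecords attempt, the read loop pauses for a fixed delay, keeping each
 *  shard at or under the documented five read transactions per second. */
public class FixedDelayPolicy {
  private final Duration delay;

  FixedDelayPolicy(Duration delay) {
    this.delay = delay;
  }

  /** Hook the read loop would call after every getRecords attempt. */
  void onSuccess() throws InterruptedException {
    Thread.sleep(delay.toMillis());
  }

  public static void main(String[] args) throws InterruptedException {
    FixedDelayPolicy policy = new FixedDelayPolicy(Duration.ofMillis(50));
    long start = System.nanoTime();
    policy.onSuccess(); // the loop pauses here between polls
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    // Roughly the configured delay elapsed (allowing a little timer slack).
    System.out.println(elapsedMs >= 45);
  }
}
```

A 1-second default delay matches the AWS guidance quoted in the ticket of sleeping at least 1,000 milliseconds between getRecords calls; the backoff variant would instead grow the pause after each throttled call.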
[jira] [Work logged] (BEAM-8491) Add ability for multiple output PCollections from composites
[ https://issues.apache.org/jira/browse/BEAM-8491?focusedWorklogId=337779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337779 ] ASF GitHub Bot logged work on BEAM-8491: Author: ASF GitHub Bot Created on: 03/Nov/19 07:19 Start Date: 03/Nov/19 07:19 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9912: [BEAM-8491] Add ability for replacing transforms with multiple outputs URL: https://github.com/apache/beam/pull/9912 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337779) Time Spent: 1.5h (was: 1h 20m) > Add ability for multiple output PCollections from composites > > > Key: BEAM-8491 > URL: https://issues.apache.org/jira/browse/BEAM-8491 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > The Python SDK has DoOutputTuples which allows for a single transform to have > multiple outputs. However, this does not include the ability for a composite > transform to have multiple outputs PCollections from different transforms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8467) Enable reading compressed files with Python fileio
[ https://issues.apache.org/jira/browse/BEAM-8467?focusedWorklogId=337780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337780 ] ASF GitHub Bot logged work on BEAM-8467: Author: ASF GitHub Bot Created on: 03/Nov/19 07:19 Start Date: 03/Nov/19 07:19 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9861: [BEAM-8467] Enabling reading compressed files URL: https://github.com/apache/beam/pull/9861 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337780) Time Spent: 1h 10m (was: 1h) > Enable reading compressed files with Python fileio > -- > > Key: BEAM-8467 > URL: https://issues.apache.org/jira/browse/BEAM-8467 > Project: Beam > Issue Type: Improvement > Components: io-py-files >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8544) Install Beam SDK with ccache for faster re-install.
[ https://issues.apache.org/jira/browse/BEAM-8544?focusedWorklogId=337778=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337778 ] ASF GitHub Bot logged work on BEAM-8544: Author: ASF GitHub Bot Created on: 03/Nov/19 06:59 Start Date: 03/Nov/19 06:59 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9966: [BEAM-8544] Use ccache for compiling the Beam Python SDK. URL: https://github.com/apache/beam/pull/9966#issuecomment-549110720 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337778) Time Spent: 1h (was: 50m) > Install Beam SDK with ccache for faster re-install. > --- > > Key: BEAM-8544 > URL: https://issues.apache.org/jira/browse/BEAM-8544 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Re-compiling the C modules of the SDK takes 2-3 minutes. This adds to worker > startup time whenever a custom SDK is being used (in particular, during > development and testing). We can use ccache to re-use the old compile results > when the Cython files have not changed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8435) Allow access to PaneInfo from Python DoFns
[ https://issues.apache.org/jira/browse/BEAM-8435?focusedWorklogId=33=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-33 ] ASF GitHub Bot logged work on BEAM-8435: Author: ASF GitHub Bot Created on: 03/Nov/19 06:59 Start Date: 03/Nov/19 06:59 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9836: [BEAM-8435] Implement PaneInfo computation for Python. URL: https://github.com/apache/beam/pull/9836#issuecomment-549110698 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 33) Time Spent: 1h 50m (was: 1h 40m) > Allow access to PaneInfo from Python DoFns > -- > > Key: BEAM-8435 > URL: https://issues.apache.org/jira/browse/BEAM-8435 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > PaneInfoParam exists, but the plumbing to actually populate it at runtime was > never added. (Nor, clearly, were any tests...) -- This message was sent by Atlassian Jira (v8.3.4#803005)