Re: Flink Runner logging FAILED_TO_UNCOMPRESS

2019-09-19 Thread Maximilian Michels
That's even better. On 19.09.19 16:35, Robert Bradshaw wrote: On Thu, Sep 19, 2019 at 4:33 PM Maximilian Michels wrote: This is obviously less than ideal for the user... Should we "fix" the Java SDK? Of is the long-terms solution here to have runners do this rewrite? I think ideal would

Re: Flink Runner logging FAILED_TO_UNCOMPRESS

2019-09-19 Thread Robert Bradshaw
On Thu, Sep 19, 2019 at 4:33 PM Maximilian Michels wrote: > > > This is obviously less than ideal for the user... Should we "fix" the > > Java SDK? Of is the long-terms solution here to have runners do this > > rewrite? > > I think ideal would be that the Runner adds the Impulse override. That >

Re: Flink Runner logging FAILED_TO_UNCOMPRESS

2019-09-19 Thread Maximilian Michels
This is obviously less than ideal for the user... Should we "fix" the Java SDK? Of is the long-terms solution here to have runners do this rewrite? I think ideal would be that the Runner adds the Impulse override. That way also the Python SDK would not have to have separate code paths for

Re: Publish message to pubsub topic after processing current input in beam streaming pipeline

2019-09-19 Thread Robert Bradshaw
OK, as I read it get_responses yields several elements for a single input message. It sounds like you want to defer writing to PubSub until after all of them are processed. The easiest way to do this would be for get_responses to return a list of messages and send_to_output to process the whole

KinesisIO.write throws NPE during producer.flush();

2019-09-19 Thread Ankit Jhalaria
Hey beam devs, I am using beam 2.15 and while doing KinesisIO.write() getting a NPE. This is how I am using it: KinesisIO.write() .withStreamName(“streamName") .withPartitionKey("DEFAULT_PARTITION") .withAWSClientsProvider( “A”, “B”, Regions.US_WEST_2) More specifically

RE: Publish message to pubsub topic after processing current input in beam streaming pipeline

2019-09-19 Thread Anjana Pydi
Hi Robert, Thanks for the reply. 'get_responses' method return a list of dictionaries and 'send_to_output' method takes each element of the list and simultaneously posts it to an API end point. My question is I just need to send a signal(write a message 'process completed' ) to pubsub topic2

Re: Publish message to pubsub topic after processing current input in beam streaming pipeline

2019-09-19 Thread Robert Bradshaw
On Thu, Sep 19, 2019 at 11:05 AM Anjana Pydi wrote: > > Hi , > > I have a beam streaming pipeline which reads data from pubsub topic, use it > to call an API and get responses, apply some transformations on the obtained > responses and writes to output sinks. > > Now, I need to add logic to

No filesystem found for scheme s3 using FileIO

2019-09-19 Thread Koprivica,Preston Blake
Hello everyone. I’m getting the following error when attempting to use the FileIO apis (beam-2.15.0) and integrating with a 3rd party filesystem, in this case AWS S3: java.lang.IllegalArgumentException: No filesystem found for scheme s3 at

unsubscribe

2019-09-19 Thread airwaywong

Re: # of Readers MaxNumWorkers for UnboundedSource. Causing deadlocks

2019-09-19 Thread Jan Lukavský
You can ssh to the dataflow worker and investigate jstack of the harness. The principal source of blocking would be if you wait for any data. The logic should be implemented so that if there is no data available at the moment then just return false and don't wait for anything. Another

Re: Word-count example

2019-09-19 Thread Matthew Patterson
Re Kyle: Just so you may believe me, see below(virtualenv installed, error occurs nonetheless): currently I am trying to figure out if it is a difference between .bash_profile and .bashrc. Thanks, Matt “”” Caused by: net.rubygrapefruit.platform.NativeException: Could not start 'virtualenv'

Re: Flink Runner logging FAILED_TO_UNCOMPRESS

2019-09-19 Thread Robert Bradshaw
On Thu, Sep 19, 2019 at 11:22 AM Maximilian Michels wrote: > > The flag is insofar relevant to the PortableRunner because it affects > the translation of the pipeline. Without the flag we will generate > primitive Reads which are unsupported in portability. The workaround we > have used so far is

Re: Word-count example

2019-09-19 Thread Matthew Patterson
Have done: hard to get gradle to respect env (conda? Bash? Fish? Zsh?): From: Kyle Weaver Reply-To: "user@beam.apache.org" Date: Thursday, September 19, 2019 at 2:29 PM To: "user@beam.apache.org" Subject: Re: Word-count example CAUTION: This email originated from outside of the organization.

Re: Word-count example

2019-09-19 Thread Kyle Weaver
I'm guessing you need to install virtualenv: `pip install virtualenv` Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com On Thu, Sep 19, 2019 at 11:27 AM Matthew Patterson wrote: > Kyle, > > > > Excellent, will do: unfortunately switch to 2.16 was only thing that fixed >

Re: Word-count example

2019-09-19 Thread Matthew Patterson
Kyle, Excellent, will do: unfortunately switch to 2.16 was only thing that fixed FAILURE: Build failed with an exception. * What went wrong: Execution failed for task ':sdks:python:setupVirtualenv'. > A problem occurred starting process 'command 'virtualenv'' …if you have suggestions

Re: Word-count example

2019-09-19 Thread Kyle Weaver
You should probably use 2.15, since 2.16 release artifacts have not been published yet. Just follow the instructions that say --runner=PortableRunner, not --runner=FlinkRunner, otherwise you'll hit that other deserialization bug that was mentioned.. Kyle Weaver | Software Engineer |

Re: Word-count example

2019-09-19 Thread Matthew Patterson
Kyle, Happily: should I be working from (source) release-2.15.0 or release-2.16.0: presumably still specifying flink-1.8. Matt From: Kyle Weaver Reply-To: "user@beam.apache.org" Date: Thursday, September 19, 2019 at 2:16 PM To: "user@beam.apache.org" Subject: Re: Word-count example CAUTION:

Re: Word-count example

2019-09-19 Thread Kyle Weaver
Re Benjamin & Ankur: I don't think `--experiments=beam_fn_api` applies here. If I understand correctly, Matthew is just trying to run an old-fashioned Beam Java jar, nothing to do with portability/Python, and judging by the stack trace provided, https://issues.apache.org/jira/browse/BEAM-8037 is

Publish message to pubsub topic after processing current input in beam streaming pipeline

2019-09-19 Thread Anjana Pydi
Hi , I have a beam streaming pipeline which reads data from pubsub topic, use it to call an API and get responses, apply some transformations on the obtained responses and writes to output sinks. Now, I need to add logic to write a 'process completed' message to another pubsub topic once

Re: Word-count example

2019-09-19 Thread Matthew Patterson
Thanks Ankur, As one who speaks almost no gradle, is there a preferred way to get gradle to respect the conda configured python on `gradlew build` ? Matt From: Ankur Goenka Reply-To: "user@beam.apache.org" Date: Thursday, September 19, 2019 at 1:50 PM To: "user@beam.apache.org" Subject: Re:

Re: # of Readers MaxNumWorkers for UnboundedSource. Causing deadlocks

2019-09-19 Thread Ken Barr
The Start() seems to be working so focus on advance(). Is there any way to prove if I am blocked in advance() for dataflow runner? I have been through code and cannot see anything. But I know that does not mean much. Ken On 2019/09/19 14:57:09, Jan Lukavský wrote: > Hi Ken, > > I have

Re: Word-count example

2019-09-19 Thread Ankur Goenka
We have this bug in 2.15 which is discussed here https://lists.apache.org/thread.html/76150a1ffca859bae7af0c6fb91724dc405dc55efc51c3b515f0520b@%3Cuser.beam.apache.org%3E For now, please add "--experiments=beam_fn_api" to your pipeline to make it work. On Thu, Sep 19, 2019 at 6:03 AM Matthew

Re: KinesisIO.write() returning NoSuchMethodError: com.google.common.util.concurrent.Futures.addCallback

2019-09-19 Thread Ankit Jhalaria
Thanks for looking into this Alexey. I was able to get past this issue after I upgraded to beam 2.15.0. Let me know if you would still like a JIRA for this. That said, I am running into another issue which is that the producer within the KinesisIO is throwing a NullPointerException More

Re: Beam AmqpIO

2019-09-19 Thread Alexey Romanenko
For reference: https://issues.apache.org/jira/browse/BEAM-8282 > On 18 Sep 2019, at 13:58, Jean-Baptiste Onofré wrote: > > Hi, > > As the original author of AmqpIO, I can update, but it requires some > internal changes to the IO (especially

Re: # of Readers < MaxNumWorkers for UnboundedSource. Causing deadlocks

2019-09-19 Thread Jan Lukavský
Hi Ken, I have seen some deadlock behavior with custom sources earlier (different runner, but that might not be important). Some lessons learned:  a) please make sure your advance() or start() methods do not block, that will cause issues and possibly deadlocks you describe  b) if you want

# of Readers < MaxNumWorkers for UnboundedSource. Causing deadlocks

2019-09-19 Thread Ken Barr
I have a custom UnboundedSource IO that reads from a series of messaging queues. I have implemented this such that the IO takes a list of queues and expands a UnboundedSource/UnboundedReader for each queue. If I use autoscaling with maxNumWorkers <= # number of queues everything works well.

Re: Autoscaling stuck at 1, never see getSplitBacklogBytes() execute

2019-09-19 Thread Ken Barr
Figured it out. Basically you cannot log inside the getSplitBacklogBytes() method and get it to show in the UnboundedSource.Read log stack and could not figure out another way to get at this logs. So I added stats in this method and dumped the stats periodically in the advance() method to get

Re: Word-count example

2019-09-19 Thread Matthew Patterson
Hi Ankur, Yes, I was using 2.15, but was getting failure to deserialize. Thanks, Matt From: Ankur Goenka Reply-To: "user@beam.apache.org" Date: Wednesday, September 18, 2019 at 9:34 PM To: "user@beam.apache.org" Subject: Re: Word-count example CAUTION: This email originated from outside of