Re: Regarding Beam Slack Channel

2017-11-30 Thread Jean-Baptiste Onofré

Invite sent as well.

Regards
JB

On 11/30/2017 07:19 PM, Yanael Barbier wrote:

Hello
Can I get an invite too?

Thanks,
Yanael

On Thu, Nov 30, 2017 at 19:15, Wesley Tanaka wrote:


Invite sent


On 11/30/2017 08:11 AM, Nalseez Duke wrote:

Hello

Can someone please add me to the Beam slack channel?

Thanks.



-- 
Wesley Tanaka

https://wtanaka.com/



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Using JDBC IO read transform, running out of memory on DataflowRunner.

2017-11-30 Thread Chet Aldrich
Hey Eugene, 

Thanks for this; I didn’t realize this was a parameter I could tune. It fixed my 
problems straight away. 

Chet

> On Nov 29, 2017, at 2:14 PM, Eugene Kirpichov wrote:
> 
> Hi,
> I think you're hitting something that can be fixed by configuring the Redshift 
> driver:
> http://docs.aws.amazon.com/redshift/latest/dg/queries-troubleshooting.html#set-the-JDBC-fetch-size-parameter
> 
> By default, the JDBC driver collects all the results for a query at one time. 
> As a result, when you attempt to retrieve a large result set over a JDBC 
> connection, you might encounter a client-side out-of-memory error. To enable 
> your client to retrieve result sets in batches instead of in a single 
> all-or-nothing fetch, set the JDBC fetch size parameter in your client 
> application.
> 
> On Wed, Nov 29, 2017 at 1:41 PM Chet Aldrich wrote:
> Hey all, 
> 
> I’m running a Dataflow job that uses the JDBC IO transform to pull in a bunch 
> of data (20 million rows, for reference) from Redshift, and I’m noticing that 
> I’m getting an OutOfMemoryError on the Dataflow workers once I reach around 
> 4 million rows. 
> 
> It seems, given the code that I’m reading inside JDBC IO and the guide here 
> (https://beam.apache.org/documentation/io/authoring-overview/#read-transforms), 
> that it’s just pulling the data from the result set row by row and then 
> emitting each output. Considering that this is sort of a limitation of the 
> driver, this makes sense, but is there a way I can get around the memory 
> limitation somehow? It seems like Dataflow repeatedly tries to create more 
> workers to handle the work, but it can’t, which is part of the problem. 
> 
> If more info is needed in order to help me sort out what I could do to not 
> run into the memory limitations I’m happy to provide it. 
> 
> 
> Thanks,
> 
> Chet 
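
For reference, a minimal sketch of the fix discussed above, assuming a Beam
release whose JdbcIO exposes withFetchSize (check your version; on older
releases the fetch size may need to be set through the driver directly). The
driver class, connection URL, credentials, and query are placeholders, not
values from this thread:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.KV;

public class RedshiftReadWithFetchSize {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadFromRedshift", JdbcIO.<KV<String, String>>read()
        .withDataSourceConfiguration(
            JdbcIO.DataSourceConfiguration.create(
                    "com.amazon.redshift.jdbc.Driver",   // placeholder driver class
                    "jdbc:redshift://host:5439/db")      // placeholder URL
                .withUsername("user")
                .withPassword("password"))
        .withQuery("SELECT id, name FROM big_table")     // placeholder query
        .withRowMapper(rs -> KV.of(rs.getString(1), rs.getString(2)))
        .withCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()))
        // Fetch rows in batches of 10,000 instead of materializing the whole
        // result set client-side, which is what causes the OOM above.
        .withFetchSize(10_000));

    p.run();
  }
}

Note that, as the AWS page quoted above implies, some drivers only honor the
fetch size when autocommit is disabled, so it's worth confirming the setting
actually takes effect against your driver.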



Re: Regarding Beam Slack Channel

2017-11-30 Thread Wesley Tanaka

Invite sent

On 11/30/2017 08:11 AM, Nalseez Duke wrote:

Hello

Can someone please add me to the Beam slack channel?

Thanks.



--
Wesley Tanaka
https://wtanaka.com/



Regarding Beam Slack Channel

2017-11-30 Thread Nalseez Duke
Hello

Can someone please add me to the Beam slack channel?

Thanks.


Re: Pubsub -> Beam -> many files

2017-11-30 Thread Eugene Kirpichov
TextIO.write().to(DynamicDestinations), available in Beam 2.2, does exactly
this.

On Thu, Nov 30, 2017, 9:35 AM Andrew Jones wrote:

> Hi,
>
> I'm new to Beam. I have a use case where I want to read from a Pubsub
> stream, transform the data in Beam, and write to many outputs.
>
> As a simple example, say I'm reading words from Pubsub, I get the first
> letter of the word, and then I write to a file for that letter.
>
> I want to do this programmatically, so I don't want to have to know all
> the outputs beforehand; rather, they should be created as needed, based
> on the data that comes in.
>
> Has anyone done something similar with Beam, or have any examples?
>
> At the moment I'm looking at tagged outputs, but the documentation
> suggests that I need to know the outputs beforehand and create
> TupleTags for each.
>
> Another option might simply be to use GroupByKey, but then I'm not sure
> if I can pass the result to TextIO.
>
> Thanks,
> Andrew
>
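
A minimal sketch of the pattern Eugene describes, written against the
FileIO.writeDynamic API from later Beam releases (in 2.2 itself the
equivalent entry point is TextIO.write().to(DynamicDestinations)). The
topic, bucket, and naming scheme below are placeholders:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

public class WordsByFirstLetter {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(PubsubIO.readStrings().fromTopic("projects/my-project/topics/words"))
        // Unbounded (streaming) writes need windowing and an explicit shard count.
        .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(5))))
        .apply(FileIO.<String, String>writeDynamic()
            // The destination is computed per element, so the set of output
            // files does not need to be known before the pipeline runs.
            .by(word -> word.isEmpty() ? "empty" : word.substring(0, 1))
            .withDestinationCoder(StringUtf8Coder.of())
            .via(TextIO.sink())
            .to("gs://my-bucket/words")
            .withNaming(letter -> FileIO.Write.defaultNaming("letter-" + letter, ".txt"))
            .withNumShards(1));

    p.run();
  }
}

Unlike the TupleTag approach mentioned above, nothing here requires
enumerating the destinations beforehand.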