Re: dataflow HDF5 loading pipeline errors

2018-02-13 Thread Ahmet Altay
Hi Eila, The error "work item was attempted 4 times without success" indicates that some operation is consistently failing. You can find more information in Dataflow worker logs [1] about the actual error. I cannot tell for sure without looking at the logs, I suspect your issue is related to

Re: ParDo requires its input to use KvCoder in order to use state and timers

2018-02-13 Thread Kenneth Knowles
If I am connecting the threads properly, you are trying to simulate triggering based on the size of buffered data, while you are using the type of message to route messages to one GCS location or another. Yes? On Tue, Feb 13, 2018 at 10:29 AM, Carlos Alonso wrote: > I have

Fwd: dataflow HDF5 loading pipeline errors

2018-02-13 Thread OrielResearch Eila Arich-Landkof
Hello, Any help will be greatly appreciated!!! I an using dataflow to process H5 (HDF5 format ) file. The H5 file was uploaded to google storage from: https://amp.pharm.mssm. edu/archs4/download.html H5 / HDF5 is an hierarchical data structure to present

Re: How does TextIO decides when to finalise a file?

2018-02-13 Thread Eugene Kirpichov
It is quite complicated. See https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java in particular the expand() method. At a high level, it assigns a shard index to every element and then groups by destination and shard index (implicitly also

Re: How does TextIO decides when to finalise a file?

2018-02-13 Thread Carlos Alonso
Cool thanks! How does it work internally? Are all the elements routed to the same path grouped and processed within the same bundle? Thanks! On Tue, Feb 13, 2018 at 9:03 PM Eugene Kirpichov wrote: > It will do its best to throw an exception if duplicate names are

Re: How does TextIO decides when to finalise a file?

2018-02-13 Thread Carlos Alonso
Cool, thanks. What if the destination is not properly coded and the File naming policy then produces a duplicated path? Will it throw an exception? Overwrite? Thanks! On Tue, Feb 13, 2018 at 6:23 PM Eugene Kirpichov wrote: > Dynamic file writes generate 1 set of files

Re: ParDo requires its input to use KvCoder in order to use state and timers

2018-02-13 Thread Carlos Alonso
I have a question about this. My scenario is: * PubSub input with a timestampAttribute named doc_timestamp * Fixed windowing of one hour size. * Keys are an internal attribute of the messages (the type) * Messages of one particular type are way more frequent than the others, so it is likely a hot

Re: working with hot keys

2018-02-13 Thread Lukasz Cwik
Both are doing the same thing effectively by loading the entire iterable into memory in the first case and the partitioned iterable into memory in the second case. The side input performance varies a lot depending on whether your running a pipeline with bounded or unbounded PCollections,

Re: How does TextIO decides when to finalise a file?

2018-02-13 Thread Eugene Kirpichov
Dynamic file writes generate 1 set of files (shards) for every pane firing of every window of every destination. File naming policy is required to produce different names for every combination of (destination, shard index, window, pane) so you never have to append or overwrite. A new element

Re: working with hot keys

2018-02-13 Thread Jacob Marble
On Mon, Feb 12, 2018 at 3:59 PM, Lukasz Cwik wrote: > The optimization that you have done is that you have forced the V1 > iterable to reside in memory completely since it is now counted as a single > element. This will fall apart as soon your V1 iterable exceeds memory. >

Re: slack invite pls

2018-02-13 Thread Aviem Zur
Invitation sent On Tue, Feb 13, 2018 at 5:43 PM ramesh krishnan muthusamy < ramkrish1...@gmail.com> wrote: > Hi > > please could I have a invite to the slack channel? > > -Ramesh >

slack invite pls

2018-02-13 Thread ramesh krishnan muthusamy
Hi please could I have a invite to the slack channel? -Ramesh

Re: Slack invite please

2018-02-13 Thread Aviem Zur
Invitation sent On Tue, Feb 13, 2018 at 5:40 PM Sanjeev Nutan wrote: > Hi fellow Beamers, please could I have a invite to the slack channel? > Thanks. >

Slack invite please

2018-02-13 Thread Sanjeev Nutan
Hi fellow Beamers, please could I have a invite to the slack channel? Thanks.

Fwd: Beam Spark Dataframe support

2018-02-13 Thread ramesh krishnan muthusamy
Hi Team, Does Apache Beam provide API's for Apache Spark Dataframes as a part for Spark 2.0. -Ramesh

How does TextIO decides when to finalise a file?

2018-02-13 Thread Carlos Alonso
Hi everyone!! I'm wondering how a TextIO with dynamic routing knows/decides when to finalise a file and what happens if after it is finalised, another element routed for the same file appears. Thanks!