Re: Can we allow SimpleFunction and SerializableFunction to throw Exception?

2018-11-28 Thread Kenneth Knowles
Nice! A clean solution and an opportunity to bikeshed on names. This has everything I love. Kenn On Wed, Nov 28, 2018 at 6:43 PM Jeff Klukas wrote: > It looks like we can make the new interface a superinterface for the > existing SerializableFunction while maintaining binary compatibility

Re: Can we allow SimpleFunction and SerializableFunction to throw Exception?

2018-11-28 Thread Jeff Klukas
It looks like we can make the new interface a superinterface for the existing SerializableFunction while maintaining binary compatibility [0]. We'd have: public interface NewSerializableFunction<InputT, OutputT> extends Serializable { OutputT apply(InputT input) throws Exception; } and then modify
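A minimal sketch of the proposal, assuming the NewSerializableFunction name from the message plus a simplified stand-in called SerializableFunction (this is not Beam's source). The subinterface re-declares apply() without a throws clause, so existing lambdas and call sites keep compiling, while code typed against the new supertype has to handle Exception. FunctionDemo.applyOrNull is a hypothetical helper.

```java
import java.io.Serializable;

// Name taken from the thread: a supertype whose apply() may throw.
interface NewSerializableFunction<InputT, OutputT> extends Serializable {
  OutputT apply(InputT input) throws Exception;
}

// Simplified stand-in for the existing SerializableFunction (not Beam's code).
// Re-declaring apply() without "throws Exception" narrows the contract, so
// existing lambdas and callers that never catch Exception still compile.
interface SerializableFunction<InputT, OutputT>
    extends NewSerializableFunction<InputT, OutputT> {
  @Override
  OutputT apply(InputT input);
}

final class FunctionDemo {
  private FunctionDemo() {}

  // Hypothetical helper: code written against the new supertype must handle
  // the checked exception; here any failure is mapped to null.
  static <InputT, OutputT> OutputT applyOrNull(
      NewSerializableFunction<InputT, OutputT> fn, InputT input) {
    try {
      return fn.apply(input);
    } catch (Exception e) {
      return null;
    }
  }
}
```

Because SerializableFunction is a subtype, any existing function can be passed wherever a NewSerializableFunction is accepted, which is what lets the new type sit above the old one.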

Re: Handling large values

2018-11-28 Thread Lukasz Cwik
I don't believe we would need to change any other coders, since SeekableInputStream wouldn't change how a regular InputStream works, so coders that don't care about the implementation would still use it as a forward-only input stream. Coders that care about seeking would use the new
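To illustrate the point, here is a minimal sketch assuming a hypothetical SeekableInputStream API with position() and seek(long); the message names the class but not its methods, and ByteArraySeekableInputStream and ForwardOnlyDecoder are made up for the example. A decoder that only calls InputStream methods works unchanged, while a seek-aware caller can jump back and re-read.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical API: only the class name comes from the thread.
abstract class SeekableInputStream extends InputStream {
  // Current byte offset from the start of the stream.
  abstract long position() throws IOException;

  // Moves the read position to an absolute byte offset.
  abstract void seek(long newPosition) throws IOException;
}

// In-memory implementation, just enough to exercise the API.
final class ByteArraySeekableInputStream extends SeekableInputStream {
  private final byte[] data;
  private int pos;

  ByteArraySeekableInputStream(byte[] data) {
    this.data = data;
  }

  @Override
  public int read() {
    // Next byte as 0..255, or -1 at end of stream.
    return pos < data.length ? (data[pos++] & 0xFF) : -1;
  }

  @Override
  long position() {
    return pos;
  }

  @Override
  void seek(long newPosition) throws IOException {
    if (newPosition < 0 || newPosition > data.length) {
      throw new IOException("seek out of range: " + newPosition);
    }
    pos = (int) newPosition;
  }
}

// A forward-only decoder: it sees a plain InputStream and never seeks.
final class ForwardOnlyDecoder {
  private ForwardOnlyDecoder() {}

  static int readInt(InputStream in) throws IOException {
    // DataInputStream consumes exactly 4 bytes here and does not buffer ahead.
    return new DataInputStream(in).readInt();
  }
}
```

Since SeekableInputStream is itself an InputStream, it can be handed to any existing coder without that coder noticing.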

Question about checkpoint logic of the Dataflow Runner

2018-11-28 Thread flyisland
Hi Gurus, I need to understand the checkpoint logic of the Dataflow Runner: when and how does the runner trigger a finalize on a checkpoint, and is the finalize thread the same as the reader thread? Could you share the information with me, or point me to the related source code? Thanks in advance!

Re: Handling large values

2018-11-28 Thread Robert Bradshaw
On Wed, Nov 28, 2018 at 11:57 PM Lukasz Cwik wrote: > > Re-adding +datapls-portability-t...@google.com > +datapls-unified-wor...@google.com > > On Wed, Nov 28, 2018 at 2:23 PM Robert Bradshaw wrote: >> >> Thanks for bringing this to the list. More below. >> >> On Wed, Nov 28, 2018 at 11:10 PM

Re: Handling large values

2018-11-28 Thread Lukasz Cwik
Re-adding +datapls-portability-t...@google.com +datapls-unified-wor...@google.com On Wed, Nov 28, 2018 at 2:23 PM Robert Bradshaw wrote: > Thanks for bringing this to the list. More below. > > On Wed, Nov 28, 2018 at 11:10 PM Kenneth Knowles wrote: > >> FWIW I deliberately limited the thread

Re: Handling large values

2018-11-28 Thread Robert Bradshaw
Thanks for bringing this to the list. More below. On Wed, Nov 28, 2018 at 11:10 PM Kenneth Knowles wrote: > FWIW I deliberately limited the thread to not mix public and private > lists, so people intending private replies do not accidentally send to > dev@beam. > > I've left them on this time,

Re: Handling large values

2018-11-28 Thread Kenneth Knowles
FWIW I deliberately limited the thread to not mix public and private lists, so people intending private replies do not accidentally send to dev@beam. I've left them on this time, to avoid contradicting your action, but I recommend removing them. Kenn On Wed, Nov 28, 2018 at 12:59 PM Lukasz Cwik

Re: Handling large values

2018-11-28 Thread Lukasz Cwik
Re-adding +datapls-portability-t...@google.com +datapls-unified-wor...@google.com On Wed, Nov 28, 2018 at 12:58 PM Lukasz Cwik wrote: > That is correct Kenn. An important point would be that SomeOtherCoder > would be given a seekable stream (instead of the forward only stream it > gets right

Re: Handling large values

2018-11-28 Thread Lukasz Cwik
That is correct, Kenn. An important point would be that SomeOtherCoder would be given a seekable stream (instead of the forward-only stream it gets right now) so it can either decode all the data or lazily decode parts as it needs to as in the case of an iterable coder when used to support large
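A minimal sketch of lazy decoding, assuming a made-up encoding (a 4-byte element count, then each string as a 4-byte length plus UTF-8 bytes) and using java.nio.ByteBuffer as a stand-in for a seekable stream. LazyStringIterable is hypothetical, not Beam's IterableCoder. Each element is decoded only when the iterator reaches it, and each new iterator seeks back to the first element, so the iterable can be traversed more than once.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical format: [int count] then, per element, [int len][utf-8 bytes].
final class LazyStringIterable implements Iterable<String> {
  private static final int FIRST_ELEMENT_OFFSET = 4; // right after the count
  private final ByteBuffer encoded;
  private final int count;

  LazyStringIterable(ByteBuffer encoded) {
    // Read-only view so iterators cannot modify the shared bytes.
    this.encoded = encoded.asReadOnlyBuffer();
    this.count = this.encoded.getInt(0);
  }

  @Override
  public Iterator<String> iterator() {
    // Each iterator gets its own cursor, "seeked" to the first element.
    final ByteBuffer cursor = encoded.duplicate();
    cursor.position(FIRST_ELEMENT_OFFSET);
    return new Iterator<String>() {
      private int remaining = count;

      @Override
      public boolean hasNext() {
        return remaining > 0;
      }

      @Override
      public String next() {
        if (remaining == 0) {
          throw new NoSuchElementException();
        }
        remaining--;
        // Decode exactly one element, only when it is requested.
        byte[] bytes = new byte[cursor.getInt()];
        cursor.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
      }
    };
  }

  // Encoder for the same hypothetical format, used to build inputs.
  static ByteBuffer encode(String... values) {
    byte[][] parts = new byte[values.length][];
    int size = FIRST_ELEMENT_OFFSET;
    for (int i = 0; i < values.length; i++) {
      parts[i] = values[i].getBytes(StandardCharsets.UTF_8);
      size += 4 + parts[i].length;
    }
    ByteBuffer buf = ByteBuffer.allocate(size);
    buf.putInt(values.length);
    for (byte[] part : parts) {
      buf.putInt(part.length);
      buf.put(part);
    }
    buf.flip();
    return buf;
  }
}
```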

Re: Handling large values

2018-11-28 Thread Kenneth Knowles
Interesting! Having large iterables within rows would be great for the interactions between SQL and the core SDK's schema/Row support, and we weren't sure how that could work, exactly. My (very basic) understanding would be that LengthPrefixedCoder(SomeOtherCoder) has an encoding that is a length
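As an illustration of why a length prefix helps, here is a sketch that writes each value as an unsigned varint length followed by its raw bytes. The varint format and the LengthPrefix helper are assumptions for the example, not Beam's LengthPrefixedCoder. A reader can skip a value it does not need by reading the length and advancing the position, without decoding the payload.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical helper, not Beam's LengthPrefixedCoder: each value is written
// as an unsigned varint length followed by the raw payload bytes.
final class LengthPrefix {
  private LengthPrefix() {}

  // Unsigned base-128 varint: 7 bits per byte, low-order group first,
  // high bit set on every byte except the last.
  static void writeVarInt(ByteArrayOutputStream out, int value) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  static int readVarInt(ByteBuffer in) {
    int result = 0;
    for (int shift = 0; shift < 35; shift += 7) {
      byte b = in.get();
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
    }
    throw new IllegalArgumentException("varint longer than 5 bytes");
  }

  static void writeValue(ByteArrayOutputStream out, byte[] payload) {
    writeVarInt(out, payload.length);
    out.write(payload, 0, payload.length);
  }

  // Skips one value without decoding it: read the length, jump past the payload.
  static void skipValue(ByteBuffer in) {
    int length = readVarInt(in);
    in.position(in.position() + length);
  }

  static byte[] readValue(ByteBuffer in) {
    byte[] payload = new byte[readVarInt(in)];
    in.get(payload);
    return payload;
  }
}
```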

Handling large values

2018-11-28 Thread Lukasz Cwik
There is a discussion happening on PR 7127 [1], where Robert is working on providing the first implementation for supporting large iterables resulting from a GroupByKey. This is in line with the original proposal for remote references over the Fn Data & State API [2]. I had thought about this issue
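One way to picture a large GroupByKey iterable backed by remote references is a paged iterable: the first page arrives with the element, and later pages are fetched on demand using a continuation token. The sketch below assumes hypothetical Page, PageFetcher and PagingIterable types; it is not the Fn State API itself, only the lazy-fetch pattern.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical: one page of values plus a token for the next page (null = last page).
final class Page<T> {
  final List<T> values;
  final String continuationToken;

  Page(List<T> values, String continuationToken) {
    this.values = values;
    this.continuationToken = continuationToken;
  }
}

// Hypothetical remote lookup, e.g. a state request keyed by the token.
interface PageFetcher<T> {
  Page<T> fetch(String continuationToken);
}

final class PagingIterable<T> implements Iterable<T> {
  private final Page<T> firstPage; // delivered inline with the grouped element
  private final PageFetcher<T> fetcher;

  PagingIterable(Page<T> firstPage, PageFetcher<T> fetcher) {
    this.firstPage = firstPage;
    this.fetcher = fetcher;
  }

  @Override
  public Iterator<T> iterator() {
    return new Iterator<T>() {
      private Iterator<T> current = firstPage.values.iterator();
      private String token = firstPage.continuationToken;

      @Override
      public boolean hasNext() {
        // Fetch more only when the current page is exhausted;
        // loop in case a fetched page is empty.
        while (!current.hasNext() && token != null) {
          Page<T> page = fetcher.fetch(token);
          current = page.values.iterator();
          token = page.continuationToken;
        }
        return current.hasNext();
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return current.next();
      }
    };
  }
}
```

The design keeps memory bounded by one page per iterator, at the cost of a remote round trip each time a page runs out.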

[ANNOUNCEMENT] [SQL] [BEAM-6133] Support for user defined table functions (UDTF)

2018-11-28 Thread Gleb Kanterov
At the moment we support only ScalarFunction UDFs, which are functions that operate on row fields. In Calcite, there are 3 kinds of UDFs: aggregate functions (which we already support), table macros, and table functions. The difference between table functions and macros is that macros expand to relations,
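To make the scalar versus table distinction concrete, here is a plain-Java sketch; it is not the Calcite or Beam SQL UDF API, and UdfKinds, shout and splitWords are hypothetical names. A scalar UDF returns one value per input row, while a table function returns a whole relation that a query can select from (in Calcite-style SQL, typically via FROM TABLE(...)).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; not the Calcite or Beam SQL UDF API.
final class UdfKinds {
  private UdfKinds() {}

  // Scalar function: called once per input row, returns a single value.
  // e.g. SELECT shout(name) FROM people
  static String shout(String field) {
    return field.toUpperCase(Locale.ROOT) + "!";
  }

  // Table function: called with arguments, returns a relation, here rows of
  // (position, word), e.g. SELECT * FROM TABLE(split_words('hello beam sql'))
  static List<List<Object>> splitWords(String text) {
    List<List<Object>> rows = new ArrayList<>();
    String[] words = text.trim().split("\\s+");
    for (int i = 0; i < words.length; i++) {
      if (!words[i].isEmpty()) {
        rows.add(Arrays.<Object>asList(i, words[i]));
      }
    }
    return rows;
  }
}
```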

Re: BigqueryIO field clustering

2018-11-28 Thread Chamikara Jayalath
Thanks for the contribution. I can take a look later this week. On Wed, Nov 28, 2018 at 12:29 AM Wout Scheepers <wout.scheep...@vente-exclusive.com> wrote: > Hey all, > Almost two weeks ago, I created a PR to support BigQuery clustering [1]. > Can someone please have a look?

Re: contributor in the Beam

2018-11-28 Thread Jean-Baptiste Onofré
Hi, I already upgraded locally. Let me push the PR. Regards JB On 28/11/2018 16:02, Chaim Turkel wrote: > is there any reason that the mongo client version is still on 3.2.2? > can you upgrade it to 3.9.0? > chaim > On Tue, Nov 27, 2018 at 4:48 PM Jean-Baptiste Onofré > wrote: >> >> Hi Chaim,

Re: contributor in the Beam

2018-11-28 Thread Chaim Turkel
Is there any reason that the Mongo client version is still on 3.2.2? Can you upgrade it to 3.9.0? Chaim On Tue, Nov 27, 2018 at 4:48 PM Jean-Baptiste Onofré wrote: > > Hi Chaim, > > The best is to create a Jira describing the new features you want to > add. Then, you can create a PR related to

Re: TextIO setting file dynamically issue

2018-11-28 Thread Jeff Klukas
You can likely achieve what you want using FileIO with dynamic destinations, which is described in the "Advanced features" section of the TextIO docs [0]. Your case might look something like: PCollection events = ...; events.apply(FileIO.writeDynamic() .by(event ->
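The core idea of dynamic destinations, sketched in plain Java rather than with FileIO itself: a function computes a destination from each element (the role of .by(...) in the snippet), and a naming function turns each destination into a file name. DynamicDestinations.route is hypothetical and only mimics the routing, not the actual writing of files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java illustration of dynamic destinations; not the FileIO API.
final class DynamicDestinations {
  private DynamicDestinations() {}

  // by:     element -> destination key (the role of writeDynamic().by(...))
  // naming: destination key -> file name
  // Returns file name -> elements that would be written to that file.
  static <T> Map<String, List<T>> route(
      List<T> elements, Function<T, String> by, Function<String, String> naming) {
    Map<String, List<T>> files = new TreeMap<>();
    for (T element : elements) {
      String fileName = naming.apply(by.apply(element));
      files.computeIfAbsent(fileName, k -> new ArrayList<>()).add(element);
    }
    return files;
  }
}
```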

Re: contributor in the Beam

2018-11-28 Thread Chaim Turkel
I have created the pull request: https://github.com/apache/beam/pull/7148 On Tue, Nov 27, 2018 at 4:48 PM Jean-Baptiste Onofré wrote: > > Hi Chaim, > > The best is to create a Jira describing the new features you want to > add. Then, you can create a PR related to this Jira. > > As I'm the

Re: [FEEDBACK REQUEST] Re: [ANNOUNCEMENT] Nexmark included to the CI

2018-11-28 Thread Etienne Chauchot
Hi Alex, Exporting results to the dashboards is as easy as writing to a BigQuery table and then configuring the dashboard SQL request to display it. Here is an example: - exporting:

Re: BigqueryIO field clustering

2018-11-28 Thread Wout Scheepers
Hey all, Almost two weeks ago, I created a PR to support BigQuery clustering [1]. Can someone please have a look? Thanks, Wout 1: https://github.com/apache/beam/pull/7061 From: Lukasz Cwik Reply-To: "u...@beam.apache.org" Date: Wednesday, 29 August 2018 at 18:32 To: dev ,