date:20190531

Re: [DISCUSS] Cookbooks for users with knowledge in other frameworks

2019-05-31 Thread Ahmet Altay

Thank you Reza. That separation makes sense to me.

On Wed, May 29, 2019 at 6:26 PM Reza Rokni  wrote:

> +1
>
> I think there will be at least two layers to this:
>
> Layer 1 - Using primitives: I do joins, GBK, aggregation... with system x
> this way; what is the canonical equivalent in Beam?
> Layer 2 - Patterns: I read and join unbounded and bounded data in system
> x this way; what is the canonical equivalent in Beam?
>
> I suspect that, as a first pass, Layer 1 is reasonably well-bounded work.
> There would need to be agreement on the "canonical" version of how to do
> something in Beam, as this could be seen as opinionated: there are often
> a multitude of ways of doing x.
>

Once we identify a set of layer 1 items, we could crowdsource the
canonical implementations. I believe we can use our usual code review
process to settle on a version that is agreeable. (Examples have the same
issue: they are probably opinionated today based on the author, but it
works out.)


>
>
> On Thu, 30 May 2019 at 08:56, Ahmet Altay  wrote:
>
>> Hi all,
>>
>> Inspired by the user asking about a Spark feature in Beam [1] in the
>> release thread, I searched the user@ list and noticed a few instances of
>> people asking questions like "I can do X in Spark, how can I do that in
>> Beam?" Would it make sense to add documentation that explains how certain
>> tasks can be accomplished in Beam, with side-by-side examples of doing
>> the same task in Beam, Spark, etc.? It could help with onboarding because
>> it will be easier for people to leverage their existing knowledge. It
>> could also help other frameworks, because it would serve as a Rosetta
>> Stone with two translations.
>>
>> Questions I have are:
>> - Would such a thing be helpful?
>> - Is it feasible? Could a few pages' worth of examples cover enough
>> use cases?
>>
>> Thank you!
>> Ahmet
>>
>> [1]
>> https://lists.apache.org/thread.html/b73a54aa1e6e9933628f177b04a8f907c26cac854745fa081c478eff@%3Cdev.beam.apache.org%3E
>>
>
>

Re: 1 Million Lines of Code (1 MLOC)

2019-05-31 Thread Alex Amato

Interesting. So if we play with https://github.com/cgag/loc, could we break
it down further, e.g. test files vs. code files, or by folder? That
could be interesting as well.
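
A rough sketch of that kind of breakdown, in Python rather than with loc
itself: it counts raw lines per top-level folder, split into test and
non-test files. The `breakdown` and `is_test_file` names and the test-file
heuristic (a `test`/`tests` directory, or a file stem ending in `_test` or
`Test`) are assumptions for illustration, not something loc provides, and
unlike loc it does not separate blank and comment lines.

```python
import os
from collections import Counter


def count_lines(path):
    """Count lines in a file, reading bytes so odd encodings do not fail."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def is_test_file(rel_path):
    """Crude heuristic (an assumption): test code lives in a 'test'/'tests'
    directory, or has a file stem ending in '_test' or 'Test'."""
    parts = rel_path.split(os.sep)
    stem = os.path.splitext(parts[-1])[0]
    in_test_dir = any(p.lower() in ("test", "tests") for p in parts[:-1])
    return in_test_dir or stem.endswith("_test") or stem.endswith("Test")


def breakdown(root):
    """Return a Counter of line counts keyed by (top-level folder, kind),
    where kind is 'test' or 'main'. Files directly under root use '.'."""
    totals = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            top = rel.split(os.sep)[0] if os.sep in rel else "."
            kind = "test" if is_test_file(rel) else "main"
            totals[(top, kind)] += count_lines(full)
    return totals
```

Calling `breakdown(".")` from the root of a checkout returns totals such as
`(folder, "test")` and `(folder, "main")` that can be sorted or printed as
needed.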

On Fri, May 31, 2019 at 4:20 PM Brian Hulette  wrote:

> Dennis Nedry needed 2 million lines of code to control Jurassic Park, and
> he only had to manage eight computers! I think we may actually need to pick
> up the pace.
>
> On Fri, May 31, 2019 at 4:11 PM Anton Kedin  wrote:
>
>> And to reduce the effort of future rewrites we should start doing it on a
>> schedule. I propose we start over once a week :)
>>
>> On Fri, May 31, 2019 at 4:02 PM Lukasz Cwik  wrote:
>>
>>> 1 million lines is too much, time to delete the entire project and start
>>> over again, :-)
>>>
>>> On Fri, May 31, 2019 at 3:12 PM Ankur Goenka  wrote:
>>>
>>>> Thanks for sharing.
>>>> These are really interesting metrics.
>>>> One use I can see is to track LOC vs Comments to make sure that we keep
>>>> up with the practice of writing maintainable code.
>>>>
>>>> On Fri, May 31, 2019 at 3:04 PM Ismaël Mejía wrote:
>>>>
>>>>> I was checking some metrics in our codebase and found by chance that
>>>>> we have passed the 1 million lines of code (MLOC). Of course lines of
>>>>> code may not matter much but anyway it is interesting to see the size
>>>>> of our project at this moment.
>>>>>
>>>>> This is the detailed information returned by loc [1]:
>>>>>
>>>>> ---------------------------------------------------------
>>>>>  Language      Files     Lines    Blank  Comment     Code
>>>>> ---------------------------------------------------------
>>>>>  Java           3681    673007    78265   140753   453989
>>>>>  Python          497    131082    22560    13378    95144
>>>>>  Go              333    105775    13681    11073    81021
>>>>>  Markdown        205     31989     6526        0    25463
>>>>>  Plain Text       11     21979     6359        0    15620
>>>>>  Sass             92      9867     1434     1900     6533
>>>>>  JavaScript       19      5157     1197      467     3493
>>>>>  YAML             14      4601      454     1104     3043
>>>>>  Bourne Shell     30      3874      470     1028     2376
>>>>>  Protobuf         17      4258      677     1373     2208
>>>>>  XML              17      2789      296      559     1934
>>>>>  Kotlin           19      3501      347     1370     1784
>>>>>  HTML             60      2447      148      914     1385
>>>>>  Batch             3       249       57        0      192
>>>>>  INI               1       206       21       16      169
>>>>>  C++               2        72        4       36       32
>>>>>  Autoconf          1        21        1       16        4
>>>>> ---------------------------------------------------------
>>>>>  Total          5002   1000874   132497   173987   694390
>>>>> ---------------------------------------------------------
>>>>>
>>>>> [1] https://github.com/cgag/loc
>>>>>

Re: 1 Million Lines of Code (1 MLOC)

2019-05-31 Thread Brian Hulette

Dennis Nedry needed 2 million lines of code to control Jurassic Park, and
he only had to manage eight computers! I think we may actually need to pick
up the pace.

On Fri, May 31, 2019 at 4:11 PM Anton Kedin  wrote:

> And to reduce the effort of future rewrites we should start doing it on a
> schedule. I propose we start over once a week :)
>
> On Fri, May 31, 2019 at 4:02 PM Lukasz Cwik  wrote:
>
>> 1 million lines is too much, time to delete the entire project and start
>> over again, :-)
>>
>> On Fri, May 31, 2019 at 3:12 PM Ankur Goenka  wrote:
>>
>>> Thanks for sharing.
>>> These are really interesting metrics.
>>> One use I can see is to track LOC vs Comments to make sure that we keep
>>> up with the practice of writing maintainable code.
>>>
>>> On Fri, May 31, 2019 at 3:04 PM Ismaël Mejía  wrote:
>>>
>>>> I was checking some metrics in our codebase and found by chance that
>>>> we have passed the 1 million lines of code (MLOC). Of course lines of
>>>> code may not matter much but anyway it is interesting to see the size
>>>> of our project at this moment.
>>>>
>>>> This is the detailed information returned by loc [1]:
>>>>
>>>> ---------------------------------------------------------
>>>>  Language      Files     Lines    Blank  Comment     Code
>>>> ---------------------------------------------------------
>>>>  Java           3681    673007    78265   140753   453989
>>>>  Python          497    131082    22560    13378    95144
>>>>  Go              333    105775    13681    11073    81021
>>>>  Markdown        205     31989     6526        0    25463
>>>>  Plain Text       11     21979     6359        0    15620
>>>>  Sass             92      9867     1434     1900     6533
>>>>  JavaScript       19      5157     1197      467     3493
>>>>  YAML             14      4601      454     1104     3043
>>>>  Bourne Shell     30      3874      470     1028     2376
>>>>  Protobuf         17      4258      677     1373     2208
>>>>  XML              17      2789      296      559     1934
>>>>  Kotlin           19      3501      347     1370     1784
>>>>  HTML             60      2447      148      914     1385
>>>>  Batch             3       249       57        0      192
>>>>  INI               1       206       21       16      169
>>>>  C++               2        72        4       36       32
>>>>  Autoconf          1        21        1       16        4
>>>> ---------------------------------------------------------
>>>>  Total          5002   1000874   132497   173987   694390
>>>> ---------------------------------------------------------
>>>>
>>>> [1] https://github.com/cgag/loc

>>>

Re: 1 Million Lines of Code (1 MLOC)

2019-05-31 Thread Anton Kedin

And to reduce the effort of future rewrites we should start doing it on a
schedule. I propose we start over once a week :)

On Fri, May 31, 2019 at 4:02 PM Lukasz Cwik  wrote:

> 1 million lines is too much, time to delete the entire project and start
> over again, :-)
>
> On Fri, May 31, 2019 at 3:12 PM Ankur Goenka  wrote:
>
>> Thanks for sharing.
>> These are really interesting metrics.
>> One use I can see is to track LOC vs Comments to make sure that we keep
>> up with the practice of writing maintainable code.
>>
>> On Fri, May 31, 2019 at 3:04 PM Ismaël Mejía  wrote:
>>
>>> I was checking some metrics in our codebase and found by chance that
>>> we have passed the 1 million lines of code (MLOC). Of course lines of
>>> code may not matter much but anyway it is interesting to see the size
>>> of our project at this moment.
>>>
>>> This is the detailed information returned by loc [1]:
>>>
>>>
>>> ---------------------------------------------------------
>>>  Language      Files     Lines    Blank  Comment     Code
>>> ---------------------------------------------------------
>>>  Java           3681    673007    78265   140753   453989
>>>  Python          497    131082    22560    13378    95144
>>>  Go              333    105775    13681    11073    81021
>>>  Markdown        205     31989     6526        0    25463
>>>  Plain Text       11     21979     6359        0    15620
>>>  Sass             92      9867     1434     1900     6533
>>>  JavaScript       19      5157     1197      467     3493
>>>  YAML             14      4601      454     1104     3043
>>>  Bourne Shell     30      3874      470     1028     2376
>>>  Protobuf         17      4258      677     1373     2208
>>>  XML              17      2789      296      559     1934
>>>  Kotlin           19      3501      347     1370     1784
>>>  HTML             60      2447      148      914     1385
>>>  Batch             3       249       57        0      192
>>>  INI               1       206       21       16      169
>>>  C++               2        72        4       36       32
>>>  Autoconf          1        21        1       16        4
>>> ---------------------------------------------------------
>>>  Total          5002   1000874   132497   173987   694390
>>> ---------------------------------------------------------
>>>
>>> [1] https://github.com/cgag/loc
>>>
>>

Re: 1 Million Lines of Code (1 MLOC)

2019-05-31 Thread Lukasz Cwik

1 million lines is too much, time to delete the entire project and start
over again, :-)

On Fri, May 31, 2019 at 3:12 PM Ankur Goenka  wrote:

> Thanks for sharing.
> These are really interesting metrics.
> One use I can see is to track LOC vs Comments to make sure that we keep up
> with the practice of writing maintainable code.
>
> On Fri, May 31, 2019 at 3:04 PM Ismaël Mejía  wrote:
>
>> I was checking some metrics in our codebase and found by chance that
>> we have passed the 1 million lines of code (MLOC). Of course lines of
>> code may not matter much but anyway it is interesting to see the size
>> of our project at this moment.
>>
>> This is the detailed information returned by loc [1]:
>>
>>
>> ---------------------------------------------------------
>>  Language      Files     Lines    Blank  Comment     Code
>> ---------------------------------------------------------
>>  Java           3681    673007    78265   140753   453989
>>  Python          497    131082    22560    13378    95144
>>  Go              333    105775    13681    11073    81021
>>  Markdown        205     31989     6526        0    25463
>>  Plain Text       11     21979     6359        0    15620
>>  Sass             92      9867     1434     1900     6533
>>  JavaScript       19      5157     1197      467     3493
>>  YAML             14      4601      454     1104     3043
>>  Bourne Shell     30      3874      470     1028     2376
>>  Protobuf         17      4258      677     1373     2208
>>  XML              17      2789      296      559     1934
>>  Kotlin           19      3501      347     1370     1784
>>  HTML             60      2447      148      914     1385
>>  Batch             3       249       57        0      192
>>  INI               1       206       21       16      169
>>  C++               2        72        4       36       32
>>  Autoconf          1        21        1       16        4
>> ---------------------------------------------------------
>>  Total          5002   1000874   132497   173987   694390
>> ---------------------------------------------------------
>>
>> [1] https://github.com/cgag/loc
>>
>

Design Proposal for Cost Estimation

2019-05-31 Thread Alireza Samadian

Dear Members of Apache Beam Dev List,

My name is Alireza; I am a Software Engineer Intern at Google, and I am
working closely with Anton on the Beam SQL query optimizer. Currently, it
uses Apache Calcite without any cost estimation; I am proposing to implement
a cost estimator for it.
The first step would be implementing a cost estimator for the sources; this
is my design proposal for this implementation. I would appreciate your
comments and suggestions.

https://docs.google.com/document/d/1vi1PBBu5IqSy-qZl1Gk-49CcANOpbNs1UAud6LnOaiY/edit#heading=h.6rlkpwwx7gvf

Best,
Alireza Samadian

Re: [DISCUSS] Portability representation of schemas

2019-05-31 Thread Brian Hulette

> Can you propose what the protos would look like in this case? Right now
> LogicalType does not contain the to/from conversion functions in the proto.
> Do you think we'll need to add these in?

Maybe. Right now the proposed LogicalType message is pretty simple/generic:
message LogicalType {
  FieldType representation = 1;
  string logical_urn = 2;
  bytes logical_payload = 3;
}

If we keep just logical_urn and logical_payload, the logical_payload could
itself be a protobuf with attributes of 1) a serialized class and 2/3)
to/from functions. Or, alternatively, we could have a generalization of the
SchemaRegistry for logical types. Implementations for standard types and
user-defined types would be registered by URN, and the SDK could look them
up given just a URN. I put a brief section about this alternative in the
doc last week [1]. What I suggested there included removing the
logical_payload field, which is probably overkill. The critical piece is
just relying on a registry in the SDK to look up types and to/from
functions rather than storing them in the portable schema itself.
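
To make the registry alternative a bit more concrete, here is a minimal
sketch in Python. Everything in it is hypothetical: the
`LogicalTypeRegistry` and `LogicalTypeEntry` classes, the `UserId` wrapper
type, and the URN string are illustrative names, not part of Beam or of the
proposed protos. It only shows the idea that the portable schema carries a
URN and the SDK resolves the to/from functions locally.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class LogicalTypeEntry:
    """What an SDK knows locally about one logical type (hypothetical)."""
    representation: str                  # name of the base type, e.g. "INT64"
    to_base: Callable[[Any], Any]        # language type -> base type
    from_base: Callable[[Any], Any]      # base type -> language type


class LogicalTypeRegistry:
    """Maps a logical type URN to its SDK-side conversion functions."""

    def __init__(self) -> None:
        self._entries: Dict[str, LogicalTypeEntry] = {}

    def register(self, urn: str, entry: LogicalTypeEntry) -> None:
        if urn in self._entries:
            raise ValueError(f"URN already registered: {urn!r}")
        self._entries[urn] = entry

    def lookup(self, urn: str) -> LogicalTypeEntry:
        try:
            return self._entries[urn]
        except KeyError:
            raise KeyError(f"no logical type registered for URN {urn!r}") from None


@dataclass(frozen=True)
class UserId:
    """Example user type that is effectively a wrapper around an integer."""
    value: int


registry = LogicalTypeRegistry()
registry.register(
    "example:logical_type:user_id:v1",   # made-up URN for illustration
    LogicalTypeEntry("INT64", to_base=lambda u: u.value, from_base=UserId),
)

# The portable schema would only carry the URN; the SDK looks up the rest.
entry = registry.lookup("example:logical_type:user_id:v1")
encoded = entry.to_base(UserId(42))
decoded = entry.from_base(encoded)
```

With this shape, an SDK that does not recognize a URN fails at lookup time,
which is one of the trade-offs against shipping the functions in
logical_payload.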

I kind of like keeping the LogicalType message generic for now, since it
gives us a way to try out these various approaches, but maybe that's just a
cop out.

[1]
https://docs.google.com/document/d/1uu9pJktzT_O3DxGd1-Q2op4nRk4HekIZbzi-0oTAips/edit?ts=5cdf6a5b#heading=h.jlt5hdrolfy

On Fri, May 31, 2019 at 12:36 PM Reuven Lax  wrote:

>
>
> On Tue, May 28, 2019 at 10:11 AM Brian Hulette 
> wrote:
>
>>
>>
>> On Sun, May 26, 2019 at 1:25 PM Reuven Lax  wrote:
>>
>>>
>>>
>>> On Fri, May 24, 2019 at 11:42 AM Brian Hulette 
>>> wrote:
>>>
>>>> *tl;dr:* SchemaCoder represents a logical type with a base type of Row
>>>> and we should think about that.
>>>>
>>>> I'm a little concerned that the current proposals for a portable
>>>> representation don't actually fully represent Schemas. It seems to me that
>>>> the current Java-only Schemas are made up of three concepts that are
>>>> intertwined:
>>>> (a) The Java SDK-specific code for schema inference, type coercion, and
>>>> "schema-aware" transforms.
>>>> (b) A RowCoder[1] that encodes Rows[2] which have a particular
>>>> Schema[3].
>>>> (c) A SchemaCoder[4] that has a RowCoder for a particular schema, and
>>>> functions for converting Rows with that schema to/from a Java type T. Those
>>>> functions and the RowCoder are then composed to provide a Coder for the
>>>> type T.

>>>
>>> RowCoder is currently just an internal implementation detail, it can be
>>> eliminated. SchemaCoder is the only thing that determines a schema today.
>>>
>> Why not keep it around? I think it would make sense to have a RowCoder
>> implementation in every SDK, as well as something like SchemaCoder that
>> defines a conversion from that SDK's "Row" to the language type.
>>
>
> The point is that from a programmer's perspective, there is nothing much
> special about Row. Any type can have a schema, and the only special thing
> about Row is that it's always guaranteed to exist. From that standpoint,
> Row is nearly an implementation detail. Today RowCoder is never set on
> _any_ PCollection, it's literally just used as a helper library, so there's
> no real need for it to exist as a "Coder."
>
>
>>
>>>

>>>> We're not concerned with (a) at this time since that's specific to the
>>>> SDK, not the interface between them. My understanding is we just want to
>>>> define a portable representation for (b) and/or (c).
>>>>
>>>> What has been discussed so far is really just a portable representation
>>>> for (b), the RowCoder, since the discussion is only around how to represent
>>>> the schema itself and not the to/from functions.

>>>
>>> Correct. The to/from functions are actually related to (a). One of the
>>> big goals of schemas was that users should not be forced to operate on rows
>>> to get schemas. A user can create PCollection<MyRandomType> and as long as
>>> the SDK can infer a schema from MyRandomType, the user never needs to even
>>> see a Row object. The to/fromRow functions are what make this work today.
>>>
>>>
>>
>> One of the points I'd like to make is that this type coercion is a useful
>> concept on its own, separate from schemas. It's especially useful for a
>> type that has a schema and is encoded by RowCoder since that can represent
>> many more types, but the type coercion doesn't have to be tied to just
>> schemas and RowCoder. We could also do type coercion for types that are
>> effectively wrappers around an integer or a string. It could just be a
>> general way to map language types to base types (i.e. types that we have a
>> coder for). Then it just becomes a general framework for extending coders
>> to represent more language types.
>>
>
> Let's not tie those conversations. Maybe a similar concept will hold true
> for general coders (or we might decide to get rid of coders in favor of
> schemas, in which case that becomes moot), but I don't think we should
> prematurely generalize.
>
>
>>
>>
>>
>>>

Re: 1 Million Lines of Code (1 MLOC)

2019-05-31 Thread Ankur Goenka

Thanks for sharing.
These are really interesting metrics.
One use I can see is to track LOC vs Comments to make sure that we keep up
with the practice of writing maintainable code.
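
As a quick illustration of the LOC vs. comments idea, the sketch below
computes comment-to-code ratios from the numbers in Ismaël's table (the top
three languages plus the total). The `comment_ratio` helper is just an
assumed name for this example, and this ratio is only one possible way to
track it.

```python
# (language, lines, blank, comment, code), copied from the loc output
# quoted in this thread.
ROWS = [
    ("Java", 673007, 78265, 140753, 453989),
    ("Python", 131082, 22560, 13378, 95144),
    ("Go", 105775, 13681, 11073, 81021),
    ("Total", 1000874, 132497, 173987, 694390),
]


def comment_ratio(comment, code):
    """Comment lines per line of code."""
    return comment / code


ratios = {}
for language, lines, blank, comment, code in ROWS:
    # Sanity check: blank + comment + code should add up to the line total.
    assert blank + comment + code == lines
    ratios[language] = comment_ratio(comment, code)

for language, ratio in ratios.items():
    print(f"{language:<8}{ratio:.3f}")
```

Recording these ratios at each release would show whether comments keep
pace with code over time.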

On Fri, May 31, 2019 at 3:04 PM Ismaël Mejía  wrote:

> I was checking some metrics in our codebase and found by chance that
> we have passed the 1 million lines of code (MLOC). Of course lines of
> code may not matter much but anyway it is interesting to see the size
> of our project at this moment.
>
> This is the detailed information returned by loc [1]:
>
>
> ---------------------------------------------------------
>  Language      Files     Lines    Blank  Comment     Code
> ---------------------------------------------------------
>  Java           3681    673007    78265   140753   453989
>  Python          497    131082    22560    13378    95144
>  Go              333    105775    13681    11073    81021
>  Markdown        205     31989     6526        0    25463
>  Plain Text       11     21979     6359        0    15620
>  Sass             92      9867     1434     1900     6533
>  JavaScript       19      5157     1197      467     3493
>  YAML             14      4601      454     1104     3043
>  Bourne Shell     30      3874      470     1028     2376
>  Protobuf         17      4258      677     1373     2208
>  XML              17      2789      296      559     1934
>  Kotlin           19      3501      347     1370     1784
>  HTML             60      2447      148      914     1385
>  Batch             3       249       57        0      192
>  INI               1       206       21       16      169
>  C++               2        72        4       36       32
>  Autoconf          1        21        1       16        4
> ---------------------------------------------------------
>  Total          5002   1000874   132497   173987   694390
> ---------------------------------------------------------
>
> [1] https://github.com/cgag/loc
>

1 Million Lines of Code (1 MLOC)

2019-05-31 Thread Ismaël Mejía

I was checking some metrics in our codebase and found by chance that
we have passed the 1 million lines of code (MLOC). Of course lines of
code may not matter much but anyway it is interesting to see the size
of our project at this moment.

This is the detailed information returned by loc [1]:


---------------------------------------------------------
 Language      Files     Lines    Blank  Comment     Code
---------------------------------------------------------
 Java           3681    673007    78265   140753   453989
 Python          497    131082    22560    13378    95144
 Go              333    105775    13681    11073    81021
 Markdown        205     31989     6526        0    25463
 Plain Text       11     21979     6359        0    15620
 Sass             92      9867     1434     1900     6533
 JavaScript       19      5157     1197      467     3493
 YAML             14      4601      454     1104     3043
 Bourne Shell     30      3874      470     1028     2376
 Protobuf         17      4258      677     1373     2208
 XML              17      2789      296      559     1934
 Kotlin           19      3501      347     1370     1784
 HTML             60      2447      148      914     1385
 Batch             3       249       57        0      192
 INI               1       206       21       16      169
 C++               2        72        4       36       32
 Autoconf          1        21        1       16        4
---------------------------------------------------------
 Total          5002   1000874   132497   173987   694390
---------------------------------------------------------


[1] https://github.com/cgag/loc

Re: [VOTE] Release 2.13.0, release candidate #2

2019-05-31 Thread Ahmet Altay

+1

I validated the Python 2 quickstarts.

On Fri, May 31, 2019 at 10:22 AM Lukasz Cwik  wrote:

> I did the Java local quickstart for all the runners in the release
> validation sheet and gearpump failed for me due to a missing dependency.
> Even after I fixed up the dependency, the pipeline then got stuck. I filed
> BEAM-7467 with all the details.
>
> Note that I tried the quickstart for 2.8.0 through 2.12.0:
> - 2.8.0 and 2.9.0 failed due to a timeout (maybe I was using the wrong
>   command, but this test [1] suggests that I was using a correct one).
> - 2.10.0 and higher fail due to the missing gs-collections dependency.
>
> Manu, could you help figure out what is going on?
>
> [1]
> https://github.com/apache/beam/blob/2d3bcdc542536037c3e657a8b00ebc222487476b/release/src/main/groovy/quickstart-java-gearpump.groovy#L33
>
> On Thu, May 30, 2019 at 7:53 PM Ankur Goenka  wrote:
>
>> Hi everyone,
>>
>> Please review and vote on the release candidate #2 for the version
>> 2.13.0, as follows:
>>
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>> The complete staging area is available for your review, which includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org
>> [2], which is signed with the key with fingerprint
>> 6356C1A9F089B0FA3DE8753688934A6699985948 [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.13.0-RC2" [5],
>> * website pull request listing the release [6] and publishing the API
>> reference manual [7].
>> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2].
>> * Validation sheet with a tab for 2.13.0 release to help with validation
>> [8].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>> Ankur
>>
>> [1]
>> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12345166
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.13.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1070/
>> [5] https://github.com/apache/beam/tree/v2.13.0-RC2
>> [6] https://github.com/apache/beam/pull/8645
>> [7] https://github.com/apache/beam-site/pull/589
>> [8]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1031196952
>>
>

Re: [VOTE] Release 2.13.0, release candidate #2

2019-05-31 Thread Lukasz Cwik

I did the Java local quickstart for all the runners in the release
validation sheet and gearpump failed for me due to a missing dependency.
Even after I fixed up the dependency, the pipeline then got stuck. I filed
BEAM-7467 with all the details.

Note that I tried the quickstart for 2.8.0 through 2.12.0:
- 2.8.0 and 2.9.0 failed due to a timeout (maybe I was using the wrong
  command, but this test [1] suggests that I was using a correct one).
- 2.10.0 and higher fail due to the missing gs-collections dependency.

Manu, could you help figure out what is going on?

[1]
https://github.com/apache/beam/blob/2d3bcdc542536037c3e657a8b00ebc222487476b/release/src/main/groovy/quickstart-java-gearpump.groovy#L33

On Thu, May 30, 2019 at 7:53 PM Ankur Goenka  wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #2 for the version 2.13.0,
> as follows:
>
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint
> 6356C1A9F089B0FA3DE8753688934A6699985948 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.13.0-RC2" [5],
> * website pull request listing the release [6] and publishing the API
> reference manual [7].
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.13.0 release to help with validation
> [8].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Ankur
>
> [1]
> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12345166
> [2] https://dist.apache.org/repos/dist/dev/beam/2.13.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1070/
> [5] https://github.com/apache/beam/tree/v2.13.0-RC2
> [6] https://github.com/apache/beam/pull/8645
> [7] https://github.com/apache/beam-site/pull/589
> [8]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1031196952
>

Re: Support for PaneInfo in Python SDK

2019-05-31 Thread Tanay Tummalapalli

Hi Pablo,
Thanks for that example. It would be great to be able to use the
fileio.WriteToFiles transform to write files with filenames that are based
on their PaneInfo.

Thanks @Charles Chen, for adding the remaining work on the issue - the
emission of PaneInfo in the Python implementation of GBK (either in the
Python FnAPIRunner or the old Python DirectRunner / triggers.py).

I will certainly make sure that [BEAM-3759] is completed after my GSoC
project is implemented. It's a good opportunity to get into the runner code.

Regards
- TT

On Fri, May 31, 2019 at 2:35 AM Pablo Estrada  wrote:

> Hi Tanay,
> thanks for bringing this to the mailing list. I believe this is certainly
> useful, and necessary. As an example, the fileio.WriteToFiles transform
> does not work well without PaneInfo data (since we can't know how many
> firings there are for each window, and we can't give names to files based
> on this).
>
> Best
> -P.
>
> On Thu, May 30, 2019 at 1:00 PM Tanay Tummalapalli 
> wrote:
>
>> Hi everyone,
>>
>> The PR linked in [BEAM-3759] - "Add support for PaneInfo descriptor in
>> Python SDK" [1] was merged, but the issue is still open.
>> There might be some work left on this for full support for PaneInfo. E.g.,
>> although the PaneInfo class exists, it is not accessible in a DoFn via a
>> kwarg (PaneInfoParam) like TimestampParam or WindowParam.
>>
>> Please let me know the remaining work to be done on this issue as this
>> may be needed in the near future.
>>
>> Regards
>> Tanay Tummalapalli
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-3759
>>
>
