Re: Apache Beam, version 2.2.0

2017-12-04 Thread Jean-Baptiste Onofré
My apologizes, I thought we had a consensus already. Regards JB On 12/04/2017 11:22 PM, Eugene Kirpichov wrote: Thanks JB for sending the detailed notes about new stuff in 2.2.0! A lot of exciting things indeed. Regarding Java 8: I thought our consensus was to have the release notes say that

Re: Global sum of latest help

2017-12-04 Thread Kenneth Knowles
On Mon, Dec 4, 2017 at 3:22 PM, Lukasz Cwik wrote: > Since processing can happen out of order, for example if the input was: > ``` > {"id": "2", parent_id: "a", "timestamp": 2, "amount": 3} > {"id": "1", parent_id: "a", "timestamp": 1. "amount": 1} > {"id": "1", parent_id: "a",

Re: Callbacks/other functions run after a PDone/output transform

2017-12-04 Thread Eugene Kirpichov
I agree that the proper API for enabling the use case "do something after the data has been written" is to return a PCollection of objects where each object represents the result of writing some identifiable subset of the data. Then one can apply a ParDo to this PCollection, in order to "do

Re: Global sum of latest help

2017-12-04 Thread Lukasz Cwik
Since processing can happen out of order, for example if the input was: ``` {"id": "2", parent_id: "a", "timestamp": 2, "amount": 3} {"id": "1", parent_id: "a", "timestamp": 1. "amount": 1} {"id": "1", parent_id: "a", "timestamp": 3, "amount": 2} ``` would the output be 3 and then 5 or would you

Re: Apache Beam, version 2.2.0

2017-12-04 Thread Lukasz Cwik
I also believe we were still in the investigatory phase for dropping support for Java 7. On Mon, Dec 4, 2017 at 2:22 PM, Eugene Kirpichov wrote: > Thanks JB for sending the detailed notes about new stuff in 2.2.0! A lot > of exciting things indeed. > > Regarding Java 8: I

Re: Apache Beam, version 2.2.0

2017-12-04 Thread Eugene Kirpichov
Thanks JB for sending the detailed notes about new stuff in 2.2.0! A lot of exciting things indeed. Regarding Java 8: I thought our consensus was to have the release notes say that we're *considering* going Java8-only, and use that to get more opinions from the user community - but I can't find

Re: Apache Beam, version 2.2.0

2017-12-04 Thread Vilhelm von Ehrenheim
I'm super excited about this release! Great work everyone involved! On Mon, Dec 4, 2017 at 10:58 AM, Jean-Baptiste Onofré wrote: > Just an important note that we forgot to mention. > > !! The 2.2.0 release will be the last one supporting Spark 1.x and Java 7 > !! > > Starting

Re: Callbacks/other functions run after a PDone/output transform

2017-12-04 Thread Robert Bradshaw
+1 At the very least an empty PCollection could be produced with no promises about its contents but the ability to be followed (e.g. as a side input), which is forward compatible with whatever actual metadata one may decide to produce in the future. On Mon, Dec 4, 2017 at 11:06 AM, Kenneth

Re: Spark Runner Issues with YARN

2017-12-04 Thread Lukasz Cwik
It seems like your trying to use Spark 2.1.0. Apache Beam currently relies on users using Spark 1.6.3. There is an open pull request[1] to migrate to Spark 2.2.0. 1: https://github.com/apache/beam/pull/4208/ On Mon, Dec 4, 2017 at 10:58 AM, Opitz, Daniel A wrote: > We

Re: Callbacks/other functions run after a PDone/output transform

2017-12-04 Thread Kenneth Knowles
+dev@ I am in complete agreement with Luke. Data dependencies are easy to understand and a good way for an IO to communicate and establish causal dependencies. Converting an IO from PDone to real output may spur further useful thoughts based on the design decisions about what sort of output is

Spark Runner Issues with YARN

2017-12-04 Thread Opitz, Daniel A
We are trying to submit a Spark job through YARN with the following command: spark-submit --conf spark.yarn.stagingDir=/path/to/stage --verbose --class com.my.class --jars /path/to/jar1,path/to/jar2 /path/to/main/jar/application.jar The application is being populated in the YARN scheduler

Re: Apache Beam, version 2.2.0

2017-12-04 Thread Jean-Baptiste Onofré
Just an important note that we forgot to mention. !! The 2.2.0 release will be the last one supporting Spark 1.x and Java 7 !! Starting from Beam 2.3.0, the Spark runner will work only with Spark 2.x and we will focus only Java 8. Regards JB On 12/04/2017 10:15 AM, Jean-Baptiste Onofré

Re: Apache Beam, version 2.2.0

2017-12-04 Thread Jean-Baptiste Onofré
Thanks Reuven ! I would like to emphasize on some highlights in 2.2.0 release: - New IOs have been introduced: * TikaIO leveraging Apache Tika, allowing the deal with a lot of different data formats * RedisIO to read and write key/value pairs from a Redis server. This IO will be soon