Re: Reducing database connection with JdbcIO

2018-03-14 Thread Aleksandr
I mean that in the case of many threads Beam will create many connections (per
thread, per query). Let's say I have 10 different tables. So for each table
insert Beam will create its own connection, plus one more per thread doing that insert.

Let's say I have some UUID generation and a BQ insert. In case of problems
with the BQ service the exception will be thrown, but my job will be restored
from the latest checkpoint, so I will not generate a UUID for the same message twice.
With JdbcIO it is possible to get a UUID for the same message twice (in the case
of multiple IOs this might be a problem).

Aleksandr.

On 14 March 2018 at 10:37 PM, "Eugene Kirpichov" <kirpic...@google.com> wrote:

Aleksandr - it seems that you're assuming that every prepared statement
uses a connection. This is not the case: we open a connection, and use that
connection to create prepared statements. For any given thread, there's at
most 1 connection open at the same time, and the connection has at most 1
prepared statement open.

Create thread -> (open connection -> (open prepared statement ->
executeBatch* -> close prepared statement)* -> close connection)*
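
For illustration, a minimal sketch of that lifecycle in a hand-written DoFn could
look like the following (the class name, table and JDBC URL are made up; this is
not the actual JdbcIO source):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.beam.sdk.transforms.DoFn;

class SimpleJdbcWriteFn extends DoFn<String, Void> {
  private final String jdbcUrl; // e.g. "jdbc:postgresql://host/db" (illustrative)
  private transient Connection connection;

  SimpleJdbcWriteFn(String jdbcUrl) {
    this.jdbcUrl = jdbcUrl;
  }

  @Setup
  public void setup() throws Exception {
    // One connection per DoFn instance, i.e. roughly one per worker thread.
    connection = DriverManager.getConnection(jdbcUrl);
  }

  @ProcessElement
  public void processElement(ProcessContext c) throws Exception {
    // At most one prepared statement open on this connection at a time.
    try (PreparedStatement statement =
        connection.prepareStatement("INSERT INTO my_table (value) VALUES (?)")) {
      statement.setString(1, c.element());
      statement.executeUpdate();
    }
  }

  @Teardown
  public void teardown() throws Exception {
    if (connection != null) {
      connection.close();
    }
  }
}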

I'm not sure what you mean by checkpoints, can you elaborate?

On Wed, Mar 14, 2018 at 1:20 PM Aleksandr <aleksandr...@gmail.com> wrote:

> So let's say I have 10 prepared statements and hundreds of threads, for
> example 300. Dataflow will create 3000 connections to SQL, and in case of
> autoscaling another node will create another 3000 connections?
>
> Another problem here is that JdbcIO doesn't use any checkpoints (BQ, for
> example, does), so every connection exception will be thrown upwards.
>
>
> On 14 March 2018 at 10:09 PM, "Eugene Kirpichov" <kirpic...@google.com> wrote:
>
> In a streaming job it'll be roughly once per thread per worker, and
> Dataflow Streaming runner may create hundreds of threads per worker because
> it assumes that they are not heavyweight and that low latency is the
> primary goal rather than high throughput (as in batch runner).
>
> A hacky way to limit this parallelism would be to emulate the
> "repartition", by inserting a chain of transforms: pair with a random key
> in [0,n), group by key, ungroup - processing of the result until the next
> GBK will not be parallelized more than n-wise in practice in the Dataflow
> streaming runner, so in the particular case of JdbcIO.write() with its
> current implementation it should help. It may break in the future, e.g. if
> JdbcIO.write() ever changes to include a GBK before writing. Unfortunately
> I can't recommend a long-term reliable solution for the moment.
>
> On Wed, Mar 14, 2018 at 12:57 PM Aleksandr <aleksandr...@gmail.com> wrote:
>
>> Hello,
>> How many times will setup be called per node? Is it possible to limit
>> ParDo instances in Google Dataflow?
>>
>> Aleksandr.
>>
>>
>>
>> On 14 March 2018 at 9:22 PM, "Eugene Kirpichov" <kirpic...@google.com> wrote:
>>
>> "Jdbcio will create for each prepared statement new connection" - this is
>> not the case: the connection is created in @Setup and deleted in @Teardown.
>> https://github.com/apache/beam/blob/v2.3.0/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L503
>> https://github.com/apache/beam/blob/v2.3.0/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L631
>>
>> Something else must be going wrong.
>>
>> On Wed, Mar 14, 2018 at 12:11 PM Aleksandr <aleksandr...@gmail.com>
>> wrote:
>>
>>> Hello, we had a similar problem. The current JdbcIO will cause a lot of
>>> connection errors.
>>>
>>> Typically you have more than one prepared statement. JdbcIO will create a
>>> new connection for each prepared statement (and close it only in teardown),
>>> so it is possible that the connection will time out, or in case of
>>> autoscaling you will get too many connections to SQL.
>>> Our solution was to create a connection pool in setup and to get a connection
>>> and return it to the pool in processElement.
>>>
>>> Best Regards,
>>> Aleksandr Gortujev.
>>>
>>> On 14 March 2018 at 8:52 PM, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
>>>
>>> Agreed, especially with the current JdbcIO impl that creates the connection
>>> in @Setup. Or does it mean that @Teardown is never called?
>>>
>>> Regards
>>> JB
>>> On 14 March 2018 at 11:40, Eugene Kirpichov <kirpic...@google.com> wrote:
>>>>
>>>> Hi Derek - could you explain where the "3000

Re: Reducing database connection with JdbcIO

2018-03-14 Thread Aleksandr
So let's say I have 10 prepared statements and hundreds of threads, for
example 300. Dataflow will create 3000 connections to SQL, and in case of
autoscaling another node will create another 3000 connections?

Another problem here is that JdbcIO doesn't use any checkpoints (BQ, for
example, does), so every connection exception will be thrown upwards.

On 14 March 2018 at 10:09 PM, "Eugene Kirpichov" <kirpic...@google.com> wrote:

In a streaming job it'll be roughly once per thread per worker, and
Dataflow Streaming runner may create hundreds of threads per worker because
it assumes that they are not heavyweight and that low latency is the
primary goal rather than high throughput (as in batch runner).

A hacky way to limit this parallelism would be to emulate the
"repartition", by inserting a chain of transforms: pair with a random key
in [0,n), group by key, ungroup - processing of the result until the next
GBK will not be parallelized more than n-wise in practice in the Dataflow
streaming runner, so in the particular case of JdbcIO.write() with its
current implementation it should help. It may break in the future, e.g. if
JdbcIO.write() ever changes to include a GBK before writing. Unfortunately
I can't recommend a long-term reliable solution for the moment.
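
As a rough sketch of that chain of transforms (pair with a random key, GroupByKey,
ungroup), under the assumption that the input is already windowed/triggered so the
GroupByKey is valid on an unbounded collection; the names and shard count below are
illustrative:

import java.util.concurrent.ThreadLocalRandom;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

class LimitParallelism {
  static <T> PCollection<T> limitTo(PCollection<T> input, final int numShards) {
    return input
        // Pair each element with a random key in [0, numShards).
        .apply("AssignRandomKey", ParDo.of(new DoFn<T, KV<Integer, T>>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            c.output(KV.of(ThreadLocalRandom.current().nextInt(numShards), c.element()));
          }
        }))
        // Group by key: processing up to the next GBK is then at most numShards-wise
        // parallel on the Dataflow streaming runner (per the caveat above).
        .apply(GroupByKey.<Integer, T>create())
        // Ungroup back to individual elements.
        .apply("Ungroup", ParDo.of(new DoFn<KV<Integer, Iterable<T>>, T>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            for (T element : c.element().getValue()) {
              c.output(element);
            }
          }
        }));
  }
}

The limited-parallelism result would then be fed into JdbcIO.write().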

On Wed, Mar 14, 2018 at 12:57 PM Aleksandr <aleksandr...@gmail.com> wrote:

> Hello,
> How many times will setup be called per node? Is it possible to limit
> ParDo instances in Google Dataflow?
>
> Aleksandr.
>
>
>
> On 14 March 2018 at 9:22 PM, "Eugene Kirpichov" <kirpic...@google.com> wrote:
>
> "Jdbcio will create for each prepared statement new connection" - this is
> not the case: the connection is created in @Setup and deleted in @Teardown.
> https://github.com/apache/beam/blob/v2.3.0/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L503
> https://github.com/apache/beam/blob/v2.3.0/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L631
>
> Something else must be going wrong.
>
> On Wed, Mar 14, 2018 at 12:11 PM Aleksandr <aleksandr...@gmail.com> wrote:
>
>> Hello, we had a similar problem. The current JdbcIO will cause a lot of
>> connection errors.
>>
>> Typically you have more than one prepared statement. JdbcIO will create a
>> new connection for each prepared statement (and close it only in teardown),
>> so it is possible that the connection will time out, or in case of
>> autoscaling you will get too many connections to SQL.
>> Our solution was to create a connection pool in setup and to get a connection
>> and return it to the pool in processElement.
>>
>> Best Regards,
>> Aleksandr Gortujev.
>>
>> On 14 March 2018 at 8:52 PM, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
>>
>> Agreed, especially with the current JdbcIO impl that creates the connection in
>> @Setup. Or does it mean that @Teardown is never called?
>>
>> Regards
>> JB
>> On 14 March 2018 at 11:40, Eugene Kirpichov <kirpic...@google.com> wrote:
>>>
>>> Hi Derek - could you explain where the "3000 connections" number
>>> comes from, i.e. how did you measure it? It's weird that 5-6 workers would
>>> use 3000 connections.
>>>
>>> On Wed, Mar 14, 2018 at 3:50 AM Derek Chan <derek...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are new to Beam and need some help.
>>>>
>>>> We are working on a flow that ingests events and writes the aggregated
>>>> counts to a database. The input rate is rather low (~2000 messages per
>>>> sec), but the processing is heavy enough that we need to scale out
>>>> to 5~6 nodes. The output (via JDBC) is aggregated, so the volume is also
>>>> low. But because of the number of workers, it keeps 3000 connections to
>>>> the database and keeps hitting the database connection limits.
>>>>
>>>> Is there a way that we can reduce the concurrency only at the output
>>>> stage? (In Spark we would have done a repartition/coalesce).
>>>>
>>>> And, if it matters, we are using Apache Beam 2.2 via Scio, on Google
>>>> Dataflow.
>>>>
>>>> Thank you in advance!
>>>>
>>>>
>>>>
>>>>
>>
>


Re: Reducing database connection with JdbcIO

2018-03-14 Thread Aleksandr
Hello,
How many times will setup be called per node? Is it possible to limit
ParDo instances in Google Dataflow?

Aleksandr.



On 14 March 2018 at 9:22 PM, "Eugene Kirpichov" <kirpic...@google.com> wrote:

"Jdbcio will create for each prepared statement new connection" - this is
not the case: the connection is created in @Setup and deleted in @Teardown.
https://github.com/apache/beam/blob/v2.3.0/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L503
https://github.com/apache/beam/blob/v2.3.0/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L631

Something else must be going wrong.

On Wed, Mar 14, 2018 at 12:11 PM Aleksandr <aleksandr...@gmail.com> wrote:

> Hello, we had a similar problem. The current JdbcIO will cause a lot of
> connection errors.
>
> Typically you have more than one prepared statement. JdbcIO will create a new
> connection for each prepared statement (and close it only in teardown), so it
> is possible that the connection will time out, or in case of autoscaling you
> will get too many connections to SQL.
> Our solution was to create a connection pool in setup and to get a connection
> and return it to the pool in processElement.
>
> Best Regards,
> Aleksandr Gortujev.
>
> On 14 March 2018 at 8:52 PM, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:
>
> Agreed, especially with the current JdbcIO impl that creates the connection in
> @Setup. Or does it mean that @Teardown is never called?
>
> Regards
> JB
> On 14 March 2018 at 11:40, Eugene Kirpichov <kirpic...@google.com> wrote:
>>
>> Hi Derek - could you explain where the "3000 connections" number comes
>> from, i.e. how did you measure it? It's weird that 5-6 workers would
>> use 3000 connections.
>>
>> On Wed, Mar 14, 2018 at 3:50 AM Derek Chan <derek...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We are new to Beam and need some help.
>>>
>>> We are working on a flow that ingests events and writes the aggregated
>>> counts to a database. The input rate is rather low (~2000 messages per
>>> sec), but the processing is heavy enough that we need to scale out
>>> to 5~6 nodes. The output (via JDBC) is aggregated, so the volume is also
>>> low. But because of the number of workers, it keeps 3000 connections to
>>> the database and keeps hitting the database connection limits.
>>>
>>> Is there a way that we can reduce the concurrency only at the output
>>> stage? (In Spark we would have done a repartition/coalesce).
>>>
>>> And, if it matters, we are using Apache Beam 2.2 via Scio, on Google
>>> Dataflow.
>>>
>>> Thank you in advance!
>>>
>>>
>>>
>>>
>


Re: Reducing database connection with JdbcIO

2018-03-14 Thread Aleksandr
Hello, we had a similar problem. The current JdbcIO will cause a lot of
connection errors.

Typically you have more than one prepared statement. JdbcIO will create a new
connection for each prepared statement (and close it only in teardown), so it is
possible that the connection will time out, or in case of autoscaling you will
get too many connections to SQL.
Our solution was to create a connection pool in setup and to get a connection and
return it to the pool in processElement.

Best Regards,
Aleksandr Gortujev.

On 14 March 2018 at 8:52 PM, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:

Agreed, especially with the current JdbcIO impl that creates the connection in
@Setup. Or does it mean that @Teardown is never called?

Regards
JB
On 14 March 2018 at 11:40, Eugene Kirpichov <kirpic...@google.com> wrote:
>
> Hi Derek - could you explain where the "3000 connections" number comes
> from, i.e. how did you measure it? It's weird that 5-6 workers would use
> 3000 connections.
>
> On Wed, Mar 14, 2018 at 3:50 AM Derek Chan <derek...@gmail.com> wrote:
>
>> Hi,
>>
>> We are new to Beam and need some help.
>>
>> We are working on a flow that ingests events and writes the aggregated
>> counts to a database. The input rate is rather low (~2000 messages per
>> sec), but the processing is heavy enough that we need to scale out
>> to 5~6 nodes. The output (via JDBC) is aggregated, so the volume is also
>> low. But because of the number of workers, it keeps 3000 connections to
>> the database and keeps hitting the database connection limits.
>>
>> Is there a way that we can reduce the concurrency only at the output
>> stage? (In Spark we would have done a repartition/coalesce).
>>
>> And, if it matters, we are using Apache Beam 2.2 via Scio, on Google
>> Dataflow.
>>
>> Thank you in advance!
>>
>>
>>
>>


Re: Reducing database connection with JdbcIO

2018-03-14 Thread Aleksandr
Hello,
We made our own JdbcIO with a connection pool per JVM (using lazy initialization
in @Setup). In processElement we get a connection and return it to the pool.
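
A rough sketch of that approach, assuming HikariCP is available on the workers as
the pool implementation (the pool size, JDBC URL and table name below are
illustrative, not the actual code):

import java.sql.Connection;
import java.sql.PreparedStatement;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import org.apache.beam.sdk.transforms.DoFn;

class PooledJdbcWriteFn extends DoFn<String, Void> {
  // Shared across all DoFn instances (threads) in this worker JVM.
  private static volatile HikariDataSource pool;

  private static HikariDataSource getPool() {
    if (pool == null) {
      synchronized (PooledJdbcWriteFn.class) {
        if (pool == null) {
          HikariConfig config = new HikariConfig();
          config.setJdbcUrl("jdbc:postgresql://db-host/mydb"); // illustrative
          config.setMaximumPoolSize(8); // caps connections per worker JVM
          pool = new HikariDataSource(config);
        }
      }
    }
    return pool;
  }

  @Setup
  public void setup() {
    getPool(); // lazy, idempotent initialization per JVM
  }

  @ProcessElement
  public void processElement(ProcessContext c) throws Exception {
    // Borrow a connection for this element only, then return it to the pool.
    try (Connection connection = getPool().getConnection();
        PreparedStatement statement =
            connection.prepareStatement("INSERT INTO my_table (value) VALUES (?)")) {
      statement.setString(1, c.element());
      statement.executeUpdate();
    }
  }
}

Because the pool is static, all DoFn instances in the JVM share it, so the number
of database connections per worker is capped by the pool size regardless of how
many threads the runner creates.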

Best Regards,
Aleksandr Gortujev.

On 14 March 2018 at 12:49 PM, "Derek Chan" <derek...@gmail.com> wrote:

Hi,

We are new to Beam and need some help.

We are working on a flow that ingests events and writes the aggregated counts
to a database. The input rate is rather low (~2000 messages per sec), but the
processing is heavy enough that we need to scale out to 5~6 nodes. The output
(via JDBC) is aggregated, so the volume is also low. But because of the number
of workers, it keeps 3000 connections to the database and it keeps hitting the
database connection limits.

Is there a way that we can reduce the concurrency only at the output stage?
(In Spark we would have done a repartition/coalesce).

And, if it matters, we are using Apache Beam 2.2 via Scio, on Google
Dataflow.

Thank you in advance!


Re: IllegalStateException when combining 3 streams?

2017-11-03 Thread Aleksandr
Hello,
You can try putting the PCollection after the Flatten into the same global window,
with the same triggers it had before flattening.
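
A sketch of what that suggestion could look like, re-applying one global
windowing/triggering strategy to the flattened collection (the trigger, lateness
and element types below are only placeholders; use whatever strategy the inputs
had before the Flatten):

import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;
import org.joda.time.Duration;

class FlattenAndRewindow {
  static PCollection<KV<Integer, String>> apply(
      PCollection<KV<Integer, String>> customers,
      PCollection<KV<Integer, String>> policies,
      PCollection<KV<Integer, String>> claims) {
    return PCollectionList.of(customers).and(policies).and(claims)
        // Merge the three streams.
        .apply(Flatten.<KV<Integer, String>>pCollections())
        // Re-window the merged stream so one consistent strategy applies downstream.
        .apply(Window.<KV<Integer, String>>into(new GlobalWindows())
            .triggering(Repeatedly.forever(
                AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.standardSeconds(10))))
            .accumulatingFiredPanes()
            .withAllowedLateness(Duration.ZERO));
  }
}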

Best regards
Aleksandr Gortujev


On 3 November 2017 at 11:04 AM, "Artur Mrozowski" <art...@gmail.com> wrote:

Hi,
I am in the second week of our PoC with Beam and I am really amazed by the
capabilities of the framework and how well engineered it is.

Amazed does not mean experienced so please bear with me.

What we are trying to achieve is to join several streams using windowing and
triggers, and that is where I fear we hit the limits of what can be done.

In case A we run in global windows and we are able to combine two unbounded
PCollections, but when I try to combine the results with a third collection I
get the exception below. I tried many different trigger combinations, but
can't make it work.

Exception in thread "main" java.lang.IllegalStateException: Inputs to
Flatten had incompatible triggers: Repeatedly.forever(
AfterSynchronizedProcessingTime.pastFirstElementInPane()),
AfterEach.inOrder(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(10
seconds), AfterProcessingTime.pastFirstElementInPane().plusDelayOf(10
seconds))

In case B I use fixed windows. Again, I can successfully join two
collections and print the output in the console. When I add the third it runs
without errors, but I am not able to materialize the results in the console.
I am, however, able to print the results of the merge using Flatten, so the
error above is no longer an issue.

Does anyone have experience with joining three or more unbounded PCollections?
What would be a successful windowing and triggering strategy for global or fixed
windows, respectively?

Below are code snippets from the fixed-windows case. Windows are defined in the
same manner for all three collections: customer, claim and policy. The Join
class I use comes from
https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java


I would be really grateful if any of you would like to share your knowledge.

Best Regards,
Artur

PCollection claimInput = pipeline
    .apply(KafkaIO.<String, String>read()
        .withTopics(ImmutableList.of(claimTopic))
        .updateConsumerProperties(consumerProps)
        .withBootstrapServers(bootstrapServers)
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        .withoutMetadata())
    .apply(Values.<String>create())
    .apply("ParseJsonEventFn2", ParDo.of(new ParseClaim()))
    .apply(Window.into(FixedWindows.of(Duration.standardSeconds(100)))
        .triggering(AfterWatermark.pastEndOfWindow())
        .accumulatingFiredPanes()
        .withAllowedLateness(Duration.standardSeconds(1)));

/** JOIN **/
PCollection<KV<Integer, KV<String, String>>> joinedCustomersAndPolicies =
    Join.innerJoin(all_customers, all_policies);
PCollectionList<KV<Integer, String>> collections =
    PCollectionList.of(all_customers).and(all_policies).and(all_claims);
PCollection<KV<Integer, KV<KV<String, String>, String>>> joinedCustomersPoliciesAndClaims =
    Join.innerJoin(joinedCustomersAndPolicies, all_claims);
// PCollectionList<KV<Integer, String>> collections =
//     PCollectionList.of(all_customers).and(all_policies);

PCollection<KV<Integer, String>> merged =
    collections.apply(Flatten.<KV<Integer, String>>pCollections());


Re: PipelineTest with TestStreams: unable to serialize

2017-11-03 Thread Aleksandr
Hello,
The error is probably in your tuple tag classes, which are anonymous classes.
That means your test is trying to serialize the TestPipeline.
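
A sketch of one way around this, assuming the tags are currently declared inline
in the test: move them into a static context so the anonymous TupleTag subclasses
do not capture the enclosing test instance (and, through it, the non-serializable
TestPipeline). The tag names below are made up.

import org.apache.beam.sdk.values.TupleTag;

class Tags {
  // Declared as static fields, these anonymous TupleTag subclasses have no
  // enclosing instance, so serializing a transform that references them does
  // not drag the test class (and its TestPipeline field) into the closure.
  static final TupleTag<String> EVENTS_ENRICHED_KEYED = new TupleTag<String>() {};
  static final TupleTag<String> UNIQUE_EVENTS = new TupleTag<String>() {};
}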

Best regards
Aleksandr Gortujev



On 3 November 2017 at 3:33 PM, "Matthias Baetens" <matthias.baet...@datatonic.com> wrote:

Hi all,

I'm currently trying to write a TestStream to validate the windowing logic
in a Beam pipeline.

I'm creating a TestStream of Strings and applying the different PTransforms
to the stream, ending with a PAssert on some of the events I created:

TestStream<String> events = TestStream.create(AvroCoder.of(String.class))
    .addElements("", "")
    .advanceWatermarkToInfinity();

PCollection<KV<String, ArrayList>> eventsSessionised = p.apply(events)
    .apply(new Processing(
        new TupleTag() {},
        new TupleTag() {},
        new TupleTag() {},
        eventsEnrichedKeyedTag, "", "", ""))
    .get(eventsEnrichedKeyedTag)
    .apply(new Sessionisation(SESSION_GAP_SIZE_HOURS, SESSION_CUT_OFF,
        ALLOWED_LATENESS_MINUTES))
    .apply(new Aggregation(uniqueEventsTag, new TupleTag() {}))
    .get(uniqueEventsTag)
    .apply(ParDo.of(new EventToKV()));

PAssert.that(eventsSessionised)
    .inOnTimePane(new IntervalWindow(baseTime, endWindow1))
    .containsInAnyOrder(e1, e2);

Running the test function from a main function (new
IngestionPipeLineTest().testOnTimeEvents();) causes the following error:

Exception in thread "main" java.lang.IllegalArgumentException: unable
to serialize

pointing at a custom DoFn which runs fine when running the main pipeline.


I'm not sure why this error gets thrown all of a sudden; any pointers /
help would be greatly appreciated.

Full stacktrace:

Exception in thread "main" java.lang.IllegalArgumentException: unable
to serialize xxx
at 
org.apache.beam.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:53)
at 
org.apache.beam.sdk.util.SerializableUtils.clone(SerializableUtils.java:90)
at 
org.apache.beam.sdk.transforms.ParDo$SingleOutput.(ParDo.java:591)
at org.apache.beam.sdk.transforms.ParDo.of(ParDo.java:435)
at xxx.transforms.Processing.expand(Processing.java:52)
at xxx.transforms.Processing.expand(Processing.java:1)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:514)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:454)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:284)
at 
xxx.IngestionPipeLineTest.testOnTimeEvents(IngestionPipeLineTest.java:96)
at xxx.IngestionPipeLineTest.main(IngestionPipeLineTest.java:155)
Caused by: java.io.NotSerializableException:
org.apache.beam.sdk.testing.TestPipeline
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at 
org.apache.beam.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:49)
... 10 more


Best,

Matthias


Re: Infinite retry in streaming - is there a workaround?

2017-10-25 Thread Aleksandr
Hello Derek,
There is no general solution for a failing bundle. Some kinds of Dataflow errors
you can fix using the Dataflow update feature. Another solution is to catch
exceptions in the ParDo function.
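
A sketch of that last approach, catching per-element exceptions inside a DoFn and
routing failures to a dead-letter output instead of letting the bundle fail and
retry (the tag names and the processing logic are placeholders):

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.TupleTag;

class CatchingDoFn extends DoFn<String, String> {
  static final TupleTag<String> MAIN_OUTPUT = new TupleTag<String>() {};
  static final TupleTag<String> DEAD_LETTER = new TupleTag<String>() {};

  @ProcessElement
  public void processElement(ProcessContext c) {
    try {
      // Whatever per-element work might throw (parsing, external calls, ...).
      c.output(process(c.element()));
    } catch (Exception e) {
      // Swallow the exception so the bundle does not fail and retry forever;
      // emit the element to a side output for later inspection or reprocessing.
      c.output(DEAD_LETTER, c.element());
    }
  }

  private static String process(String element) {
    return element.trim(); // placeholder for the real processing
  }
}

Applied with ParDo.of(new CatchingDoFn()).withOutputTags(MAIN_OUTPUT,
TupleTagList.of(DEAD_LETTER)), the failed elements end up in a separate
PCollection that can be logged or written elsewhere while the rest of the
pipeline keeps moving. This does not help with exceptions thrown inside
Beam-provided IO transforms, which is the concern raised below.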

On 25 October 2017 at 9:42 PM, "Griselda Cuevas" wrote:

Hi Derek, yes you can use that mailing list and also the SO channel.

Cheers,
G


> BTW, do you know if there's a Dataflow mailing list for questions like
> this? Would dataflow-feedback be the appropriate mailing list?
>
> Thanks,
>
> Derek
>
> On Wed, Oct 25, 2017 at 10:58 AM, Griselda Cuevas  wrote:
>
>> Hi Derek - It sounds like this is a Dataflow-specific question, so I'd
>> recommend you also reach out through Dataflow's Stack Overflow
>> channel. I'm also cc'ing Thomas Groh who might be able to help.
>>
>>
>>
>> On 20 October 2017 at 11:35, Derek Hao Hu  wrote:
>>
>>> Kindly ping as I'm really curious about this. :p
>>>
>>> Derek
>>>
>>> On Thu, Oct 19, 2017 at 2:15 PM, Derek Hao Hu 
>>> wrote:
>>>
 Hi,

 ​We are trying to use Dataflow in Prod and right now one of our main
 concerns is this "infinite retry" behavior which might stall the whole
 pipeline.

 Right now for all the DoFns we've implemented ourselves we've added
 some error handling or exception swallowing mechanism to make sure some
 bundles can just fail and we log the exceptions. But we are a bit concerned
 about the other Beam native transforms which we cannot easily wrap, e.g.
 PubSubIO transforms and DatastoreV1 transforms.

 A few days ago I asked a specific question in this group about how one
 can catch exceptions in DatastoreV1 transforms, and the recommended approach
 is to 1) either duplicate the code in the current DatastoreV1
 implementation and swallow the exception instead of throwing, or 2) follow
 the implementation of BigQueryIO to add the ability to support a custom retry
 policy. Both are feasible options, but I'm a bit concerned: doesn't
 that mean eventually all Beam native transforms need to implement something
 like 2) if we want to use them in Prod?

 So in short, I want to know right now what the recommended approach
 or workaround is to say, hey, just let this bundle fail and we can process the
 rest of the elements instead of just stalling the pipeline?

 Thanks!
 --
 Derek Hao Hu

 Software Engineer | Snapchat
 Snap Inc.

>>>
>>>
>>>
>>> --
>>> Derek Hao Hu
>>>
>>> Software Engineer | Snapchat
>>> Snap Inc.
>>>
>>
>>
>
>
> --
> Derek Hao Hu
>
> Software Engineer | Snapchat
> Snap Inc.
>


Re: [VOTE] [DISCUSSION] Remove support for Java 7

2017-10-17 Thread Aleksandr
+1

On 17 October 2017 at 7:17 PM, "Ismaël Mejía" wrote:

We have discussed recently in the developer mailing list about the
idea of removing support for Java 7 on Beam. There are multiple
reasons for this:

- Java 7 has not received public updates for almost two years and most
companies are moving / have already moved to Java 8.
- A good amount of the systems Beam users rely on have decided to drop
Java 7 support, e.g. Spark, Flink, Elasticsearch, even Hadoop plans to
do it on version 3.
- Most Big data distributions and Cloud managed Spark/Hadoop services
have already moved to Java 8.
- Recent versions of core libraries Beam uses are moving to be Java 8
only (or mostly), e.g. Guava, Google Auto, etc.
- Java 8 has some nice features that can make Beam code nicer e.g.
lambdas, streams.

Considering that Beam is a ‘recent’ project we expect users to be
already using Java 8. However we wanted first to ask the opinion of
the Beam users on this subject. It could be the case that some of the
users are still dealing with some old cluster running on Java 7 or
have another argument to keep the Java 7 compatibility.

So, please vote:
+1 Yes, go ahead and move Beam support to Java 8.
 0 Do whatever you want. I don’t have a preference.
-1 Please keep Java 7 compatibility (if possible add your argument to
keep supporting for Java 7).