[jira] [Created] (FLINK-8711) Flink with YARN use wrong SlotsPerTaskManager

2018-02-19 Thread Aleksandr (JIRA)
Aleksandr created FLINK-8711:


 Summary: Flink with YARN use wrong SlotsPerTaskManager
 Key: FLINK-8711
 URL: https://issues.apache.org/jira/browse/FLINK-8711
 Project: Flink
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.3.2
Reporter: Aleksandr


I see incorrect behavior when running Flink on YARN.

I tried to set the slots per TaskManager using "-ys 2", but only 1 slot was used.

The relevant code is here:
[https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java#L387]

For example, when I run with "-yn 7 -ys 2 -p 2", the log shows:

"The YARN cluster has 14 slots available, but the user requested a parallelism 
of 2 on YARN. Each of the 7 TaskManagers will get 1 slots."

Why can't we use -ys with -p?
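The log message above is consistent with the CLI deriving the slots per TaskManager from -p and -yn, silently overriding -ys. A minimal sketch of that arithmetic (a hypothetical simplification for illustration, not the actual FlinkYarnSessionCli code):

```java
public class SlotDerivationSketch {

    // Hypothetical: ceil(parallelism / numTaskManagers), at least 1.
    static int derivedSlotsPerTaskManager(int parallelism, int numTaskManagers) {
        return Math.max((parallelism + numTaskManagers - 1) / numTaskManagers, 1);
    }

    public static void main(String[] args) {
        // With -yn 7 -ys 2 -p 2, the -ys value is ignored and each of the
        // 7 TaskManagers ends up with 1 slot, matching the log output.
        System.out.println(derivedSlotsPerTaskManager(2, 7)); // prints 1
    }
}
```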

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8710) AbstractYarnClusterDescriptor doesn't use pre-defined configs in Hadoop's YarnConfiguration

2018-02-19 Thread Bowen Li (JIRA)
Bowen Li created FLINK-8710:
---

 Summary: AbstractYarnClusterDescriptor doesn't use pre-defined 
configs in Hadoop's YarnConfiguration
 Key: FLINK-8710
 URL: https://issues.apache.org/jira/browse/FLINK-8710
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.5.0
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.5.0


{{AbstractYarnClusterDescriptor}} should use Hadoop's 
{{YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB}} constant rather than the 
raw string "yarn.scheduler.minimum-allocation-mb".
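The benefit is that a named constant is checked at compile time, while a raw key string is typo-prone. A minimal stand-in illustration (plain map-based config, not Hadoop's actual Configuration API; the constant's string value is the one named in this issue):

```java
import java.util.HashMap;
import java.util.Map;

public class NamedConstantExample {

    // Mirrors Hadoop's YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
    // whose value is the raw key below.
    static final String RM_SCHEDULER_MINIMUM_ALLOCATION_MB =
            "yarn.scheduler.minimum-allocation-mb";

    public static void main(String[] args) {
        Map<String, Integer> conf = new HashMap<>();
        conf.put(RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 1024);

        // Reading through the constant: a typo here fails to compile,
        // whereas a mistyped raw string would silently return null.
        System.out.println(conf.get(RM_SCHEDULER_MINIMUM_ALLOCATION_MB)); // prints 1024
    }
}
```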





[jira] [Created] (FLINK-8709) Flaky test: SlotPoolRpcTest.testCancelSlotAllocationWithoutResourceManager

2018-02-19 Thread Bowen Li (JIRA)
Bowen Li created FLINK-8709:
---

 Summary: Flaky test: 
SlotPoolRpcTest.testCancelSlotAllocationWithoutResourceManager
 Key: FLINK-8709
 URL: https://issues.apache.org/jira/browse/FLINK-8709
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.5.0
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.5.0



https://travis-ci.org/apache/flink/jobs/343258724

{code:java}
Running org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.77 sec <<< 
FAILURE! - in org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest
testCancelSlotAllocationWithoutResourceManager(org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest)
  Time elapsed: 0.622 sec  <<< FAILURE!
java.lang.AssertionError: expected:<0> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at 
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest.testCancelSlotAllocationWithoutResourceManager(SlotPoolRpcTest.java:171)
{code}






[jira] [Created] (FLINK-8708) Unintended integer division in StandaloneThreadedGenerator

2018-02-19 Thread Ted Yu (JIRA)
Ted Yu created FLINK-8708:
-

 Summary: Unintended integer division in StandaloneThreadedGenerator
 Key: FLINK-8708
 URL: https://issues.apache.org/jira/browse/FLINK-8708
 Project: Flink
  Issue Type: Bug
Reporter: Ted Yu


In 
flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/statemachine/generator/StandaloneThreadedGenerator.java
 :
{code}
double factor = (ts - lastTimeStamp) / 1000;
{code}
Proper casting should be done before the integer division.
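Since both operands of the division are integral (the timestamps are presumably longs), the result truncates before it is widened to double. A self-contained illustration of the bug and the fix (timestamp values are made up):

```java
public class IntegerDivisionFix {
    public static void main(String[] args) {
        long ts = 1500L;            // current timestamp, millis
        long lastTimeStamp = 250L;  // previous timestamp, millis

        // Bug: long / int is an integer division; the fractional part is
        // discarded before the result is widened to double.
        double buggy = (ts - lastTimeStamp) / 1000;

        // Fix: a floating-point divisor (or an explicit (double) cast on
        // the numerator) forces floating-point division.
        double fixed = (ts - lastTimeStamp) / 1000.0;

        System.out.println(buggy); // 1.0 (fraction lost)
        System.out.println(fixed); // 1.25
    }
}
```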





[jira] [Created] (FLINK-8707) Excessive amount of files opened by flink task manager

2018-02-19 Thread Alexander Gardner (JIRA)
Alexander Gardner created FLINK-8707:


 Summary: Excessive amount of files opened by flink task manager
 Key: FLINK-8707
 URL: https://issues.apache.org/jira/browse/FLINK-8707
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 1.3.2
 Environment: NAME="Red Hat Enterprise Linux Server"
VERSION="7.3 (Maipo)"



Two boxes, each with a Job Manager & Task Manager, using Zookeeper for HA.

flink.yaml below with some settings (removed exact box names) etc:

env.log.dir: ...some dir...residing on the same box
env.pid.dir: some dir...residing on the same box
metrics.reporter.jmx.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporters: jmx
state.backend: filesystem
state.backend.fs.checkpointdir: file:///some_nfs_mount
state.checkpoints.dir: file:///some_nfs_mount
state.checkpoints.num-retained: 3
high-availability.cluster-id: /tst
high-availability.storageDir: file:///some_nfs_mount/ha
high-availability: zookeeper
high-availability.zookeeper.path.root: /flink
high-availability.zookeeper.quorum: ...list of zookeeper boxes
env.java.opts.jobmanager: ...some extra jar args
jobmanager.archive.fs.dir: some dir...residing on the same box
jobmanager.web.submit.enable: true
jobmanager.web.tmpdir:  some dir...residing on the same box
env.java.opts.taskmanager: some extra jar args
taskmanager.tmp.dirs:  some dir...residing on the same box/var/tmp
taskmanager.network.memory.min: 1024MB
taskmanager.network.memory.max: 2048MB
blob.storage.directory:  some dir...residing on the same box
Reporter: Alexander Gardner


** NOTE ** - THE COMPONENT IS TASK MANAGER NOT JOB MANAGER 

The job manager has fewer FDs than the task manager.

 

Hi

A support alert indicated that there were a lot of open files for the boxes 
running Flink.

There were 4 flink jobs that were dormant but had consumed a number of msgs 
from Kafka using the FlinkKafkaConsumer010.

A simple general lsof:

$ lsof | wc -l  -> returned 153114 open file descriptors.

Focusing on the TaskManager process (process ID = 12154):

$ lsof | grep 12154 | wc -l  -> returned 129322 open FDs

$ lsof -p 12154 | wc -l  -> returned 531 FDs

There were 228 threads running for the task manager.

Drilling down a bit further, looking at a_inode and FIFO entries:

$ lsof -p 12154 | grep a_inode | wc -l  -> 100 FDs

$ lsof -p 12154 | grep FIFO | wc -l  -> 200 FDs

$ wc -l /proc/12154/maps  -> 920 entries.
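A small script along the following lines reproduces these counts and breaks FDs down by type for a given PID (Linux-only sketch; it uses the current shell's own PID as a stand-in for 12154):

```shell
#!/bin/sh
# Count and classify open file descriptors of a process on Linux.
# Substitute the TaskManager PID (e.g. 12154) for $$ in practice.
pid=$$

# /proc/<pid>/fd has one entry per open descriptor.
total=$(ls "/proc/$pid/fd" | wc -l)
echo "total fds: $total"

# Per-type breakdown (a_inode, FIFO, REG, ...) via lsof, if available.
# Column 5 of lsof output is the descriptor TYPE.
if command -v lsof >/dev/null 2>&1; then
  lsof -p "$pid" 2>/dev/null | awk 'NR > 1 { n[$5]++ } END { for (t in n) print t, n[t] }'
fi
```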

Apart from lsof identifying lots of JARs and SOs being referenced, there were 
also 244 child processes for the task manager process.

We noticed a creep of file descriptors in each environment. Are the above 
figures deemed excessive for the number of FDs in use? I know Flink uses Netty - 
is it using a separate Selector for reads & writes?

Additionally, does Flink use memory-mapped files or direct ByteBuffers, and 
could these be skewing the number of FDs shown?

Example of one child process ID 6633:

java 12154 6633 dfdev 387u a_inode 0,9 0 5869 [eventpoll]
 java 12154 6633 dfdev 388r FIFO 0,8 0t0 459758080 pipe
 java 12154 6633 dfdev 389w FIFO 0,8 0t0 459758080 pipe

Lastly, we cannot yet identify the reason for the creep in FDs, even though 
Flink is fairly dormant or has dormant jobs. The production nodes are not 
experiencing excessive amounts of throughput yet either.

 

 

 





Re: Flink 1.3.2 -- mvn clean package -Pbuild-jar command takes too much time

2018-02-19 Thread Melekh, Gregory
Thanks for the quick reply.
If I comment out all this code, the project compiles in 7 seconds.

The attached build.log contains the debug output of the “mvn clean package 
-Pbuild-jar” command.





On 19/02/2018, 12:29, "Stephan Ewen"  wrote:

Hi!

It looks like something is not correct with your project setup. The build
puts countless dependencies of core flink into your application jar (like
akka, scala, netty).

The important thing is to set the core Flink dependencies to "provided"
(they will not be in the Jar file), but keep the connectors and libraries
in the proper scope.

If you use one of the newer Flink quickstart projects, this should
automatically happen.
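As a sketch of what that scoping looks like in a POM (artifact ids and versions here are illustrative for a Flink 1.3/1.4-era Scala 2.11 build, not taken from the project in question; the quickstart archetypes generate this for you):

```xml
<!-- Core Flink APIs: "provided" keeps them out of the fat jar,
     since the cluster already ships them. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>

<!-- Connectors (e.g. Kafka): default compile scope, so they ARE
     bundled into the application jar. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.10_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
```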

Best,
Stephan


On Sun, Feb 18, 2018 at 3:38 PM, Melekh, Gregory 
wrote:

> Hi all.
> I have a streaming job that reads from Kafka 0.10, manipulates the data, and
> writes to Cassandra (Tuple18).
> The job also has a window and a CustomReducer class involved to reduce data.
> If the groupedBy_windowed_stream DataStream is defined with 9 fields (Tuple9),
> compilation takes 5 seconds.
> In the current (Tuple11) setup it takes more than 16 minutes.
>
> The artifact is OK; it runs and does the job…
> The problem is only the long compilation time.
>
> Has anyone run into this issue?
>
>
> DataStream Integer, String,Integer, Integer,Integer>> forAggregations
> = streamParsedWithTimestamps  // This stream is Tuple18
> .map(x -> new Tuple11<>(x.f6, x.f7, x.f8, x.f0, x.f1, x.f2, x.f3,
> x.f11,x.f16,x.f17,new Integer(1)));
>
>
> WindowedStream Integer, String, Integer,Integer,Integer>, Tuple, TimeWindow>
> windowed_stream = forAggregations
> .keyBy(7)
> .window(TumblingProcessingTimeWindows.of(Time.hours(1)));
>
>
>
> DataStream Integer, String, Integer,Integer,Integer>> groupedBy_windowed_stream =
> windowed_stream
>  .reduce(new CustomReducer());
>
> …….
>
>
> private static class CustomReducer
> implements ReduceFunction Integer, Integer, Integer, String, Integer,Integer,Integer>> {
> @Override
> public  Tuple11 Integer, String, Integer,Integer,Integer> reduce(
> Tuple11 Integer, String, Integer,Integer,Integer> v1,
> Tuple11 Integer, String, Integer,Integer,Integer> v2) throws Exception {
>
> v1.f8 += v2.f8;
> v1.f9 += v2.f9;
> v1.f10 += v2.f10;
>
> return new Tuple11<>(v1.f0,v1.f1,v1.f2,
> v1.f3,v1.f4,v1.f5,v1.f6,v1.f7,v1.f8,v1.f9,v1.f10);
> }
>
> }
>
>
>
>  mvn clean package -Pbuild-jar
>
> [INFO] Scanning for projects...
>
> [INFO]
>
> [INFO] 
> 
>
> [INFO] Building analytics 1.0-SNAPSHOT
>
> [INFO] 
> 
>
> [INFO]
>
> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ analytics ---
>
> [INFO] Deleting /Users/gm5806/activity-monitoring/analytics/target
>
> [INFO]
>
> [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @
> analytics ---
>
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>
> [INFO] Copying 1 resource
>
> [INFO]
>
> [INFO] --- maven-compiler-plugin:3.7.0:compile (default-compile) @
> analytics ---
>
> [INFO] Changes detected - recompiling the module!
>
> [INFO] Compiling 12 source files to /Users/gm5806/activity-
> monitoring/analytics/target/classes
>
> [INFO]
>
> [INFO] --- maven-resources-plugin:2.6:testResources
> (default-testResources) @ analytics ---
>
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>
> [INFO] skip non existing resourceDirectory /Users/gm5806/activity-
> monitoring/analytics/src/test/resources
>
> [INFO]
>
> [INFO] --- maven-compiler-plugin:3.7.0:testCompile (default-testCompile)
> @ analytics ---
>
> [INFO] No sources to compile
>
> [INFO]
>
> [INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ analytics
> ---
>
> [INFO] No tests to run.
>
> [INFO]
>
> [INFO] --- maven-jar-plugin:3.0.2:jar (default-jar) @ analytics ---
>
> 

[jira] [Created] (FLINK-8706) Supported Data Types - Six or Seven

2018-02-19 Thread Matt Hagen (JIRA)
Matt Hagen created FLINK-8706:
-

 Summary: Supported Data Types - Six or Seven
 Key: FLINK-8706
 URL: https://issues.apache.org/jira/browse/FLINK-8706
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.4.1
Reporter: Matt Hagen
 Attachments: supported-data-types.png

See [Supported Data 
Types|https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/api_concepts.html#supported-data-types]
 in the online documentation. The text specifies that there are "six different 
categories of data types," but the list below contains seven items. I suggest 
that you omit a specific number in the sentence preceding the list. Here is a 
suggestion: "DataSet and DataStream support the following categories of data 
elements." See the attached screenshot. Note: Please let me know if these types 
of issues are too trivial to report. Thx.





[jira] [Created] (FLINK-8705) Integrate Remote(Stream)Environment with Flip-6 cluster

2018-02-19 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-8705:


 Summary: Integrate Remote(Stream)Environment with Flip-6 cluster
 Key: FLINK-8705
 URL: https://issues.apache.org/jira/browse/FLINK-8705
 Project: Flink
  Issue Type: New Feature
  Components: Client
Affects Versions: 1.5.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.5.0


Allow the {{Remote(Stream)Environment}} to submit jobs to a Flip-6 cluster. 
This entails that we create the correct {{ClusterClient}} to communicate with 
the respective cluster.





[jira] [Created] (FLINK-8704) Migrate tests from TestingCluster to MiniClusterResource

2018-02-19 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-8704:
---

 Summary: Migrate tests from TestingCluster to MiniClusterResource
 Key: FLINK-8704
 URL: https://issues.apache.org/jira/browse/FLINK-8704
 Project: Flink
  Issue Type: Sub-task
Reporter: Aljoscha Krettek








[jira] [Created] (FLINK-8703) Migrate tests from LocalFlinkMiniCluster to MiniClusterResource

2018-02-19 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-8703:
---

 Summary: Migrate tests from LocalFlinkMiniCluster to 
MiniClusterResource
 Key: FLINK-8703
 URL: https://issues.apache.org/jira/browse/FLINK-8703
 Project: Flink
  Issue Type: Sub-task
Reporter: Aljoscha Krettek








[jira] [Created] (FLINK-8702) Migrate tests from FlinkMiniCluster to MiniClusterResource

2018-02-19 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-8702:
---

 Summary: Migrate tests from FlinkMiniCluster to MiniClusterResource
 Key: FLINK-8702
 URL: https://issues.apache.org/jira/browse/FLINK-8702
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Aljoscha Krettek
 Fix For: 1.5.0








[jira] [Created] (FLINK-8701) Migrate SavepointMigrationTestBase to MiniClusterResource

2018-02-19 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-8701:
---

 Summary: Migrate SavepointMigrationTestBase to MiniClusterResource
 Key: FLINK-8701
 URL: https://issues.apache.org/jira/browse/FLINK-8701
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Aljoscha Krettek
 Fix For: 1.5.0


This requires adding support for {{ClusterClient.getAccumulators()}} to the new 
{{RestClusterClient}} and requires migrating the custom cluster communication 
that the test does.





Re: [VOTE] Release flink-shaded 3.0, release candidate #2

2018-02-19 Thread Chesnay Schepler

+1

 * compiles without error
 * every jar is self-contained
 * all classes are properly relocated
 * jackson services are properly relocated
 * all added dependencies use the Apache License 2.0
 o com.fasterxml.jackson.module:jackson-module-jsonSchema
 + javax.validation:validation-api
 o com.fasterxml.jackson.dataformat:jackson-dataformat-yaml
 + org.yaml:snakeyaml
 * verified compatibility with Flink
 o force-shading can be replaced
 o REST docs are correctly generated
 o flink-sql-client tests ran successfully

On 19.02.2018 14:13, Chesnay Schepler wrote:

Hi everyone,
Please review and vote on the release candidate #2 for the version 3.0 
of flink-shaded, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)
The complete staging area is available for your review, which includes:
* GitHub release notes [1],
* the official Apache source release to be deployed to dist.apache.org 
[2], which are signed with the key with 
fingerprint 19F2195E1B4816D765A2C324C2EED7B111D464BA [3],

* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag “release-3.0-rc2”[5].
* A complete list of all new commits in release-3.0-rc2, since 
release-2.0 [6]



The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.


Thanks,
Chesnay
[1] https://github.com/apache/flink-shaded/milestone/3?closed=1
[2] http://people.apache.org/~chesnay/flink-shaded-3.0-rc2/ 


[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] 
https://repository.apache.org/content/repositories/orgapacheflink-1147/
[5] 
https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=commit;h=fd0f8cc5b555bf17e0af7599c0976585da24cec3

[6]
fd0f8cc5b555bf17e0af7599c0976585da24cec3 (#38) Include missing javax 
validation-api dependency
7b4fe16f8a8217771b39495cd8f14af86041c8f5 (#37) Include 
jackson-dataformat-yaml and snakeyaml

1233f1bb0e2b9fafa4260603aa130b7eb9995a7a (#35) Bump netty to 4.0.56
eb064f6e24e98f3ae91bdfd91fc96a32c821620d (#31) Hard-code 
jackson-parent version
f7f3eca859b417a1dcb2aa4c53b1bd9076fce43e (#31) Add extended 
flink-shadaed-jackson-module-jsonSchema module
36dc4840b3624fd83a0d00295013a933297801b7 (#32) Add 
flink-shaded-force-shading
2b69aa648a7d5c689c1e3c2709d7e55cb765b0a2 (#28) Bump maven-shade-plugin 
version to 3.0.0

0c4e83f87d36060c0a5c6c139e00b9afec4f5d19 Increment version to 3.0








[jira] [Created] (FLINK-8700) Port tests from FlinkMiniCluster to MiniClusterResource

2018-02-19 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-8700:
---

 Summary: Port tests from FlinkMiniCluster to MiniClusterResource
 Key: FLINK-8700
 URL: https://issues.apache.org/jira/browse/FLINK-8700
 Project: Flink
  Issue Type: New Feature
  Components: Tests
Reporter: Aljoscha Krettek
 Fix For: 1.5.0


Quite a few ITCases rely on {{FlinkMiniCluster}} and its subclasses 
({{LocalFlinkMiniCluster}} and {{TestingCluster}}). This means they use the 
legacy {{JobManager}} and {{TaskManager}} and not the new FLIP-6 components 
which are enabled by default in Flink 1.5.0.

{{AbstractTestBase}} uses the new {{MiniClusterResource}} which encapsulates 
creation of a FLIP-6 cluster or legacy cluster. We should use this in all 
ITCases, which probably means that we have to extend it a bit, for example to 
allow access to a {{ClusterClient}}.

Transitively, {{TestBaseUtils.startCluster()}} also uses the legacy 
{{LocalFlinkMiniCluster}}.





[VOTE] Release flink-shaded 3.0, release candidate #2

2018-02-19 Thread Chesnay Schepler

Hi everyone,
Please review and vote on the release candidate #2 for the version 3.0 
of flink-shaded, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)
The complete staging area is available for your review, which includes:
* GitHub release notes [1],
* the official Apache source release to be deployed to dist.apache.org 
[2], which are signed with the key with 
fingerprint 19F2195E1B4816D765A2C324C2EED7B111D464BA [3],

* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag “release-3.0-rc2”[5].
* A complete list of all new commits in release-3.0-rc2, since 
release-2.0 [6]



The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.


Thanks,
Chesnay
[1] https://github.com/apache/flink-shaded/milestone/3?closed=1
[2] http://people.apache.org/~chesnay/flink-shaded-3.0-rc2/ 


[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1147/
[5] 
https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=commit;h=fd0f8cc5b555bf17e0af7599c0976585da24cec3

[6]
fd0f8cc5b555bf17e0af7599c0976585da24cec3 (#38) Include missing javax 
validation-api dependency
7b4fe16f8a8217771b39495cd8f14af86041c8f5 (#37) Include 
jackson-dataformat-yaml and snakeyaml

1233f1bb0e2b9fafa4260603aa130b7eb9995a7a (#35) Bump netty to 4.0.56
eb064f6e24e98f3ae91bdfd91fc96a32c821620d (#31) Hard-code jackson-parent 
version
f7f3eca859b417a1dcb2aa4c53b1bd9076fce43e (#31) Add extended 
flink-shadaed-jackson-module-jsonSchema module
36dc4840b3624fd83a0d00295013a933297801b7 (#32) Add 
flink-shaded-force-shading
2b69aa648a7d5c689c1e3c2709d7e55cb765b0a2 (#28) Bump maven-shade-plugin 
version to 3.0.0

0c4e83f87d36060c0a5c6c139e00b9afec4f5d19 Increment version to 3.0





[CANCEL] Release flink-shaded 3.0, release candidate #1

2018-02-19 Thread Chesnay Schepler
I'll cancel the vote to include jackson-dataformats-yaml in 
flink-shaded-jackson.


This comes at no overhead btw. since I found another issue:
flink-shaded-jackson-module-jsonSchema is not self-contained as it 
is missing javax.validation:validation-api


On 19.02.2018 13:39, Timo Walther wrote:

Hi Chesnay,

thanks for working on a new flink-shaded version. I just had an 
offline discussion with Stephan Ewen about shading Jackson also in 
flink-sql-client. The problem is that we use jackson-dataformat-yaml 
there which is incompatible with our shaded version. Would it be 
possible to include this as well in the next shaded version in order 
to have a cleaner 1.5 release?


Regards,
Timo

Am 2/19/18 um 1:29 PM schrieb Chesnay Schepler:

Previous mail contained some funky pipes, let's try this again...

Hi everyone,
Please review and vote on the release candidate #1 for the version 
3.0 of flink-shaded, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)
The complete staging area is available for your review, which includes:
* GitHub release notes [1],

* the official Apache source release to be deployed to 
dist.apache.org [2], which are signed with the key with 
fingerprint 19F2195E1B4816D765A2C324C2EED7B111D464BA [3],

* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag “release-3.0-rc1”[5].
* A complete list of all new commits in release-3.0-rc1, since 
release-2.0 [6]



The vote will be open for at least 72 hours. It is adopted by 
majority approval, with at least 3 PMC affirmative votes.


Thanks,
Chesnay
[1] https://github.com/apache/flink-shaded/milestone/3?closed=1
[2] http://people.apache.org/~chesnay/flink-shaded-3.0-rc1/ 


[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] 
https://repository.apache.org/content/repositories/orgapacheflink-1146/
[5] 
https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=commit;h=1233f1bb0e2b9fafa4260603aa130b7eb9995a7a

[6]
1233f1bb0e2b9fafa4260603aa130b7eb9995a7a (#35) Bump netty to 4.0.56
eb064f6e24e98f3ae91bdfd91fc96a32c821620d (#31) Hard-code 
jackson-parent version
f7f3eca859b417a1dcb2aa4c53b1bd9076fce43e (#31) Add extended 
flink-shadaed-jackson-module-jsonSchema module
36dc4840b3624fd83a0d00295013a933297801b7 (#32) Add 
flink-shaded-force-shading
2b69aa648a7d5c689c1e3c2709d7e55cb765b0a2 (#28) Bump 
maven-shade-plugin version to 3.0.0

0c4e83f87d36060c0a5c6c139e00b9afec4f5d19 Increment version to 3.0








Re: [VOTE] Release flink-shaded 3.0, release candidate #1

2018-02-19 Thread Timo Walther

Hi Chesnay,

thanks for working on a new flink-shaded version. I just had an offline 
discussion with Stephan Ewen about shading Jackson also in 
flink-sql-client. The problem is that we use jackson-dataformat-yaml 
there which is incompatible with our shaded version. Would it be 
possible to include this as well in the next shaded version in order to 
have a cleaner 1.5 release?


Regards,
Timo

Am 2/19/18 um 1:29 PM schrieb Chesnay Schepler:

Previous mail contained some funky pipes, let's try this again...

Hi everyone,
Please review and vote on the release candidate #1 for the version 3.0 
of flink-shaded, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)
The complete staging area is available for your review, which includes:
* GitHub release notes [1],

* the official Apache source release to be deployed to dist.apache.org 
[2], which are signed with the key with 
fingerprint 19F2195E1B4816D765A2C324C2EED7B111D464BA [3],

* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag “release-3.0-rc1”[5].
* A complete list of all new commits in release-3.0-rc1, since 
release-2.0 [6]



The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.


Thanks,
Chesnay
[1] https://github.com/apache/flink-shaded/milestone/3?closed=1
[2] http://people.apache.org/~chesnay/flink-shaded-3.0-rc1/ 


[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] 
https://repository.apache.org/content/repositories/orgapacheflink-1146/
[5] 
https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=commit;h=1233f1bb0e2b9fafa4260603aa130b7eb9995a7a

[6]
1233f1bb0e2b9fafa4260603aa130b7eb9995a7a (#35) Bump netty to 4.0.56
eb064f6e24e98f3ae91bdfd91fc96a32c821620d (#31) Hard-code 
jackson-parent version
f7f3eca859b417a1dcb2aa4c53b1bd9076fce43e (#31) Add extended 
flink-shadaed-jackson-module-jsonSchema module
36dc4840b3624fd83a0d00295013a933297801b7 (#32) Add 
flink-shaded-force-shading
2b69aa648a7d5c689c1e3c2709d7e55cb765b0a2 (#28) Bump maven-shade-plugin 
version to 3.0.0

0c4e83f87d36060c0a5c6c139e00b9afec4f5d19 Increment version to 3.0





Re: [VOTE] Release flink-shaded 3.0, release candidate #1

2018-02-19 Thread Chesnay Schepler

Previous mail contained some funky pipes, let's try this again...

Hi everyone,
Please review and vote on the release candidate #1 for the version 3.0 
of flink-shaded, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)
The complete staging area is available for your review, which includes:
* GitHub release notes [1],

* the official Apache source release to be deployed to dist.apache.org 
[2], which are signed with the key with 
fingerprint 19F2195E1B4816D765A2C324C2EED7B111D464BA [3],

* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag “release-3.0-rc1”[5].
* A complete list of all new commits in release-3.0-rc1, since 
release-2.0 [6]



The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.


Thanks,
Chesnay
[1] https://github.com/apache/flink-shaded/milestone/3?closed=1
[2] http://people.apache.org/~chesnay/flink-shaded-3.0-rc1/ 


[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1146/
[5] 
https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=commit;h=1233f1bb0e2b9fafa4260603aa130b7eb9995a7a

[6]
1233f1bb0e2b9fafa4260603aa130b7eb9995a7a (#35) Bump netty to 4.0.56
eb064f6e24e98f3ae91bdfd91fc96a32c821620d (#31) Hard-code jackson-parent 
version
f7f3eca859b417a1dcb2aa4c53b1bd9076fce43e (#31) Add extended 
flink-shadaed-jackson-module-jsonSchema module
36dc4840b3624fd83a0d00295013a933297801b7 (#32) Add 
flink-shaded-force-shading
2b69aa648a7d5c689c1e3c2709d7e55cb765b0a2 (#28) Bump maven-shade-plugin 
version to 3.0.0

0c4e83f87d36060c0a5c6c139e00b9afec4f5d19 Increment version to 3.0




[VOTE] Release flink-shaded 3.0, release candidate #1

2018-02-19 Thread Chesnay Schepler

Hi everyone,
Please review and vote on the release candidate #1 for the version 3.0 
of flink-shaded, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)
The complete staging area is available for your review, which includes:
* GitHub release notes [1],
* the official Apache source release to be deployed to dist.apache.org 
[2], which are signed with the key with 
fingerprint 19F2195E1B4816D765A2C324C2EED7B111D464BA [3],

* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag “release-3.0-rc1” [5].
* A complete list of all new commits in release-3.0-rc1, since 
release-2.0 [6]

The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.

Thanks,
Chesnay
[1] https://github.com/apache/flink-shaded/milestone/3?closed=1
[2] http://people.apache.org/~chesnay/flink-shaded-3.0-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1146/
[5] https://gitbox.apache.org/repos/asf?p=flink-shaded.git;a=commit;h=1233f1bb0e2b9fafa4260603aa130b7eb9995a7a

[6]
1233f1bb0e2b9fafa4260603aa130b7eb9995a7a (#35) Bump netty to 4.0.56
eb064f6e24e98f3ae91bdfd91fc96a32c821620d (#31) Hard-code jackson-parent 
version
f7f3eca859b417a1dcb2aa4c53b1bd9076fce43e (#31) Add extended 
flink-shadaed-jackson-module-jsonSchema module
36dc4840b3624fd83a0d00295013a933297801b7 (#32) Add 
flink-shaded-force-shading
2b69aa648a7d5c689c1e3c2709d7e55cb765b0a2 (#28) Bump maven-shade-plugin 
version to 3.0.0

0c4e83f87d36060c0a5c6c139e00b9afec4f5d19 Increment version to 3.0


[jira] [Created] (FLINK-8699) Fix concurrency problem in rocksdb full checkpoint

2018-02-19 Thread Sihua Zhou (JIRA)
Sihua Zhou created FLINK-8699:
-

 Summary: Fix concurrency problem in rocksdb full checkpoint
 Key: FLINK-8699
 URL: https://issues.apache.org/jira/browse/FLINK-8699
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing
Affects Versions: 1.5.0
Reporter: Sihua Zhou
Assignee: Sihua Zhou
 Fix For: 1.5.0


In a full checkpoint, `kvStateInformation` is not a copied object, so it can be 
modified concurrently while writeKVStateMetaData() is iterating over it. This 
can lead to serious problems.
{code}
private void writeKVStateMetaData() throws IOException {
  // ...
for (Map.Entry> column :
stateBackend.kvStateInformation.entrySet()) {
}
  //...
}
{code}
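A typical remedy for this class of bug is to iterate over a snapshot copy, taken under whatever lock guards mutations, instead of the live map. A minimal stand-alone sketch of the idea (a plain HashMap stand-in, not the actual RocksDB state backend code):

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotCopyExample {
    public static void main(String[] args) {
        // Stand-in for the live, mutable state registry.
        Map<String, Integer> live = new HashMap<>();
        live.put("state-a", 1);

        // Take a defensive copy (under the appropriate lock in real code),
        // then iterate the copy; concurrent registrations in `live` can no
        // longer interfere with the iteration.
        Map<String, Integer> snapshot = new HashMap<>(live);
        live.put("state-b", 2); // a concurrent registration, simulated

        System.out.println(snapshot.size()); // prints 1: the snapshot is unaffected
    }
}
```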





[jira] [Created] (FLINK-8697) Rename DummyFlinkKafkaConsumer in Kinesis tests

2018-02-19 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-8697:
---

 Summary: Rename DummyFlinkKafkaConsumer in Kinesis tests
 Key: FLINK-8697
 URL: https://issues.apache.org/jira/browse/FLINK-8697
 Project: Flink
  Issue Type: Improvement
  Components: Kinesis Connector, Tests
Affects Versions: 1.5.0
Reporter: Chesnay Schepler
 Fix For: 1.5.0


{{KinesisDataFetcherTest}} contains a class
{code}
private static class DummyFlinkKafkaConsumer extends FlinkKinesisConsumer 
{
{code}

The class should be called {{DummyFlinkKinesisConsumer}}.





[jira] [Created] (FLINK-8696) Remove JobManager local mode from the Unix Shell Scripts

2018-02-19 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8696:
---

 Summary: Remove JobManager local mode from the Unix Shell Scripts
 Key: FLINK-8696
 URL: https://issues.apache.org/jira/browse/FLINK-8696
 Project: Flink
  Issue Type: Sub-task
  Components: Startup Shell Scripts
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


In order to work towards removing the local JobManager mode, the shell scripts 
need to be changed so that they no longer use or assume that mode.





Re: [ANNOUNCE] Apache Flink 1.4.1 released

2018-02-19 Thread Stephan Ewen
Great, thanks a lot for being the release manager, Gordon!

On Fri, Feb 16, 2018 at 12:54 AM, Hao Sun  wrote:

> This is great!
>
> On Thu, Feb 15, 2018 at 2:50 PM Bowen Li  wrote:
>
>> Congratulations everyone!
>>
>> On Thu, Feb 15, 2018 at 10:04 AM, Tzu-Li (Gordon) Tai <
>> tzuli...@apache.org> wrote:
>>
>>> The Apache Flink community is very happy to announce the release of
>>> Apache Flink 1.4.1, which is the first bugfix release for the Apache Flink
>>> 1.4 series.
>>>
>>>
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data streaming
>>> applications.
>>>
>>>
>>> The release is available for download at:
>>>
>>> https://flink.apache.org/downloads.html
>>>
>>>
>>> Please check out the release blog post for an overview of the
>>> improvements for this bugfix release:
>>>
>>> https://flink.apache.org/news/2018/02/15/release-1.4.1.html
>>>
>>>
>>> The full release notes are available in Jira:
>>>
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12342212
>>>
>>>
>>> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>>
>>>
>>> Cheers,
>>>
>>> Gordon
>>>
>>>
>>


Re: Flink 1.3.2 -- mvn clean package -Pbuild-jar command takes too much time

2018-02-19 Thread Stephan Ewen
Hi!

It looks like something is not correct in your project setup. The build
puts countless core Flink dependencies (like Akka, Scala, and Netty) into
your application jar.

The important thing is to set the core Flink dependencies to "provided"
(they will not be in the Jar file), but keep the connectors and libraries
in the proper scope.

If you use one of the newer Flink quickstart projects, this should
automatically happen.
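
As a rough illustration, a hypothetical pom.xml fragment with that split might
look as follows (the artifact names and versions assume the Flink 1.3.2 /
Scala 2.11 build visible in the log below; adjust them for your setup):

```xml
<dependencies>
  <!-- Core Flink: provided scope, so it is NOT bundled into the fat jar -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.3.2</version>
    <scope>provided</scope>
  </dependency>
  <!-- Connectors: default (compile) scope, so they ARE bundled -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.10_2.11</artifactId>
    <version>1.3.2</version>
  </dependency>
</dependencies>
```

The cluster already ships the core classes, which is why bundling them only
bloats the jar and slows down the shade step.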

Best,
Stephan


On Sun, Feb 18, 2018 at 3:38 PM, Melekh, Gregory 
wrote:

> Hi all.
> I have a streaming job that reads from Kafka 0.10, manipulates the data, and
> writes to Cassandra (Tuple18).
> The job also has a window and a CustomReducer class involved to reduce data.
> If the groupedBy_windowed_stream DataStream is defined with 9 fields (Tuple9),
> compilation takes 5 seconds.
> In the current (Tuple11) setup it takes more than 16 minutes.
>
> The artifact is OK; it runs and does the job…
> The problem is only the long compilation time.
>
> Has anyone run into this issue?
>
>
> DataStream Integer, String,Integer, Integer,Integer>> forAggregations
> = streamParsedWithTimestamps  // This stream is Tuple18
> .map(x -> new Tuple11<>(x.f6, x.f7, x.f8, x.f0, x.f1, x.f2, x.f3,
> x.f11,x.f16,x.f17,new Integer(1)));
>
>
> WindowedStream Integer, String, Integer,Integer,Integer>, Tuple, TimeWindow>
> windowed_stream = forAggregations
> .keyBy(7)
> .window(TumblingProcessingTimeWindows.of(Time.hours(1)));
>
>
>
> DataStream Integer, String, Integer,Integer,Integer>> groupedBy_windowed_stream =
> windowed_stream
>  .reduce(new CustomReducer());
>
> …….
>
>
> private static class CustomReducer
> implements ReduceFunction Integer, Integer, Integer, String, Integer,Integer,Integer>> {
> @Override
> public  Tuple11 Integer, String, Integer,Integer,Integer> reduce(
> Tuple11 Integer, String, Integer,Integer,Integer> v1,
> Tuple11 Integer, String, Integer,Integer,Integer> v2) throws Exception {
>
> v1.f8 += v2.f8;
> v1.f9 += v2.f9;
> v1.f10 += v2.f10;
>
> return new Tuple11<>(v1.f0,v1.f1,v1.f2,
> v1.f3,v1.f4,v1.f5,v1.f6,v1.f7,v1.f8,v1.f9,v1.f10);
> }
>
> }
>
>
>
>  mvn clean package -Pbuild-jar
>
> [INFO] Scanning for projects...
>
> [INFO]
>
> [INFO] 
> 
>
> [INFO] Building analytics 1.0-SNAPSHOT
>
> [INFO] 
> 
>
> [INFO]
>
> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ analytics ---
>
> [INFO] Deleting /Users/gm5806/activity-monitoring/analytics/target
>
> [INFO]
>
> [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @
> analytics ---
>
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>
> [INFO] Copying 1 resource
>
> [INFO]
>
> [INFO] --- maven-compiler-plugin:3.7.0:compile (default-compile) @
> analytics ---
>
> [INFO] Changes detected - recompiling the module!
>
> [INFO] Compiling 12 source files to /Users/gm5806/activity-
> monitoring/analytics/target/classes
>
> [INFO]
>
> [INFO] --- maven-resources-plugin:2.6:testResources
> (default-testResources) @ analytics ---
>
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>
> [INFO] skip non existing resourceDirectory /Users/gm5806/activity-
> monitoring/analytics/src/test/resources
>
> [INFO]
>
> [INFO] --- maven-compiler-plugin:3.7.0:testCompile (default-testCompile)
> @ analytics ---
>
> [INFO] No sources to compile
>
> [INFO]
>
> [INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ analytics
> ---
>
> [INFO] No tests to run.
>
> [INFO]
>
> [INFO] --- maven-jar-plugin:3.0.2:jar (default-jar) @ analytics ---
>
> [INFO] Building jar: /Users/gm5806/activity-monitoring/analytics/target/
> analytics-1.0-SNAPSHOT.jar
>
> [INFO]
>
> [INFO] --- maven-shade-plugin:2.4.1:shade (default) @ analytics ---
>
> [INFO] Including org.slf4j:slf4j-api:jar:1.7.7 in the shaded jar.
>
> [INFO] Including org.xerial.snappy:snappy-java:jar:1.0.5 in the shaded
> jar.
>
> [INFO] Including org.apache.flink:force-shading:jar:1.3.2 in the shaded
> jar.
>
> [INFO] Including org.apache.flink:flink-table_2.11:jar:1.3.2 in the
> shaded jar.
>
> [INFO] Including org.scala-lang:scala-library:jar:2.11.7 in the shaded
> jar.
>
> [INFO] Including org.scala-lang.modules:scala-
> parser-combinators_2.11:jar:1.0.4 in the shaded jar.
>
> [INFO] Including org.apache.flink:flink-cep_2.11:jar:1.3.2 in the shaded
> jar.
>
> [INFO] Including 

Re: Reading a single input file in parallel?

2018-02-19 Thread Fabian Hueske
Hi Niels,

Jörn is right: although it offers different methods, Flink's InputFormat is
very similar to Hadoop's InputFormat interface.
The InputFormat.createInputSplits() method generates splits that can be
read in parallel.
The FileInputFormat splits files at fixed boundaries (usually the HDFS
block size) and leaves it to the concrete format to find the right places to
start and stop reading.
Formats that read files line-wise (TextInputFormat) or by a record delimiter
(DelimitedInputFormat) start with the first record after the first delimiter
in their split and stop at the first delimiter after the split boundary.
The BinaryInputFormat extends FileInputFormat but overrides the
createInputSplits method.

So, how exactly a file is read in parallel depends on the
createInputSplits() method of the InputFormat.
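
The delimiter handling is easiest to see in a toy model. The sketch below is
plain Java, not Flink's actual implementation; it only illustrates the
contract: a split skips the partial record at its start (the previous split is
responsible for finishing it) and reads past its end until the next delimiter:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Toy model of how a delimiter-based input format reads one file split. */
public class SplitReadDemo {

    /** Reads the newline-delimited records that "belong" to the split [start, end). */
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        // Unless we are at the file start, skip the (partial) record we landed
        // in -- the previous split finishes it.
        if (start != 0) {
            while (pos < data.length && data[pos - 1] != '\n') {
                pos++;
            }
        }
        // Read whole records; a record that *starts* before 'end' is ours,
        // even if it extends past the split boundary.
        while (pos < end && pos < data.length) {
            int recordStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            records.add(new String(data, recordStart, pos - recordStart,
                    StandardCharsets.UTF_8));
            pos++; // skip the delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] file = "alpha\nbravo\ncharlie\ndelta\n".getBytes(StandardCharsets.UTF_8);
        // Two fixed-size splits; the boundary at byte 9 falls inside "bravo".
        List<String> split1 = readSplit(file, 0, 9);
        List<String> split2 = readSplit(file, 9, file.length);
        System.out.println(split1); // [alpha, bravo]
        System.out.println(split2); // [charlie, delta]
    }
}
```

This is why all splits of one file can be read in parallel without losing or
duplicating records, as long as the format can recognize record boundaries.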

Hope this helps,
Fabian


2018-02-18 13:36 GMT+01:00 Jörn Franke :

> AFAIK Flink has a similar notion of splittable as Hadoop. Furthermore, for
> custom FileInputFormats you can set the attribute unsplittable = true if
> your file format cannot be split.
>
> > On 18. Feb 2018, at 13:28, Niels Basjes  wrote:
> >
> > Hi,
> >
> > In Hadoop MapReduce there is the notion of "splittable" in the
> > FileInputFormat. This has the effect that a single input file can be fed
> > into multiple separate instances of the mapper that read the data.
> > A lot has been documented (e.g. text is splittable per line, gzipped text
> > is not splittable) and designed into the various file formats (like Avro
> > and Parquet) to allow splittability.
> >
> > The goal is that reading and parsing files can be done by multiple
> > cpus/systems in parallel.
> >
> > How is this handled in Flink?
> > Can Flink read a single file in parallel?
> > How does Flink administrate/handle the possibilities regarding the
> various
> > file formats?
> >
> >
> > The reason I ask is because I want to see if I can port this (now Hadoop
> > specific) hobby project of mine to work with Flink:
> > https://github.com/nielsbasjes/splittablegzip
> >
> > Thanks.
> >
> > --
> > Best regards / Met vriendelijke groeten,
> >
> > Niels Basjes
>


[jira] [Created] (FLINK-8695) Move RocksDB State Backend from 'flink-contrib' to 'flink-state-backends'

2018-02-19 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8695:
---

 Summary: Move RocksDB State Backend from 'flink-contrib' to 
'flink-state-backends'
 Key: FLINK-8695
 URL: https://issues.apache.org/jira/browse/FLINK-8695
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


Having the RocksDB State Backend in {{flink-contrib}} is a bit of an 
understatement...





Re: Multiple windows on a single stream

2018-02-19 Thread Aljoscha Krettek
Hi Carsten,

If you're using event-time windowing you can do something like this:

source = env.addSource(...)

window1 = source
  .keyBy()
  .window(10 sec)
  .aggregate()/reduce()

window1.addSink(...)

window2 = window1
  .keyBy()
  .window(30 sec)
  .aggregate()/reduce()

window2.addSink(...)

And so on...
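
To see why chaining the windows like this works, here is a small plain-Java
model (a sketch, not the Flink API): the 30-second result is computed from the
10-second pre-aggregates rather than from the raw events. Note that this is
only sound because the aggregation (here, a sum) is associative; a function
like a median could not be cascaded this way:

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of cascading windows: 30s sums computed from 10s pre-aggregates. */
public class CascadingWindowsDemo {

    /** Sums timestamped values into buckets of 'sizeMillis', keyed by bucket start. */
    static TreeMap<Long, Long> window(Map<Long, Long> timestampedValues, long sizeMillis) {
        TreeMap<Long, Long> buckets = new TreeMap<>();
        for (Map.Entry<Long, Long> e : timestampedValues.entrySet()) {
            long bucketStart = e.getKey() - (e.getKey() % sizeMillis);
            buckets.merge(bucketStart, e.getValue(), Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // (timestamp -> value) events spread over ~30 seconds
        TreeMap<Long, Long> events = new TreeMap<>();
        events.put(1_000L, 1L);
        events.put(12_000L, 2L);
        events.put(14_000L, 3L);
        events.put(25_000L, 4L);

        // First level: 10-second windows over the raw events.
        TreeMap<Long, Long> tenSec = window(events, 10_000L);
        // Second level: 30-second windows over the 10s *results*, not the raw
        // events -- each pre-aggregate carries its window start as timestamp.
        TreeMap<Long, Long> thirtySec = window(tenSec, 30_000L);

        System.out.println(tenSec);    // {0=1, 10000=5, 20000=4}
        System.out.println(thirtySec); // {0=10}
    }
}
```

In the Flink program sketched above, window2 consumes window1's output in the
same way: each 10-second result flows into the 30-second window as a single
element, so the raw events are only processed once.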

Does this solve your problem?

Best,
Aljoscha

> On 17. Feb 2018, at 09:40, Alexandru Gutan  wrote:
> 
> Dear Carsten,
> 
> Maybe you need a window with multiple triggers.
> 
> Best,
> Alex.
> 
> On 17 February 2018 at 01:39, Carsten 
> wrote:
> 
>> Hello all,
>> 
>> for some of our sensor data we would like to aggregate data for 10sec,
>> 30sec, 1 min etc., thus conceptually have multiple windows on a single
>> stream. Currently, I am simply duplicating the data stream (separate
>> execution environments etc.) and processing each of the required windows. Is
>> there a better way? I heard about cascading windows, but I am not sure if
>> this approach exists, needs to be implemented from scratch, or how to use it.
>> 
>> 
>> Any link/hint/suggestion, would be greatly appreciated.
>> 
>> 
>> Have a great day,
>> 
>> Carsten
>> 



[jira] [Created] (FLINK-8694) Make notifyDataAvailable call reliable

2018-02-19 Thread Piotr Nowojski (JIRA)
Piotr Nowojski created FLINK-8694:
-

 Summary: Make notifyDataAvailable call reliable
 Key: FLINK-8694
 URL: https://issues.apache.org/jira/browse/FLINK-8694
 Project: Flink
  Issue Type: Sub-task
Reporter: Piotr Nowojski
Assignee: Piotr Nowojski


After FLINK-8591, calls to
org.apache.flink.runtime.io.network.netty.SequenceNumberingViewReader#notifyDataAvailable
(and the same for credit-based flow control) can sometimes be ignored due to a
race condition.


