Re: Flink Run command :Replace placeholder value in spring xml

2017-01-17 Thread raikarsunil
Hi,

Let me give an overall picture:

I am using a properties file that contains environment-specific (Dev, QA, Production) values related to the DB, Kafka, Flink, etc. (passwords, Kafka hostnames, schema names, and so on).

Using this properties file in the Spring XML, I set values on the beans. The relevant snippet is a property placeholder configured with:

    file:#{systemProperties['configFileName']}
    SYSTEM_PROPERTIES_MODE_OVERRIDE

*NOTE: the ${} values are in the properties file.*

I bundle this into a jar containing a main class which takes arguments.

Using *System.setProperty("configFileName", args[0])*, I try to set the value for *configFileName*.
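
For reference, a minimal sketch of such an entry point, assuming the Spring context is built from a classpath XML file that declares a property placeholder reading file:#{systemProperties['configFileName']}; the file name "spring-context.xml" and the bean wiring are illustrative assumptions, not taken from the original job:

import org.springframework.context.support.ClassPathXmlApplicationContext;

public class JobEntryPoint {
    public static void main(String[] args) {
        // Expose the externally supplied properties file location to Spring.
        System.setProperty("configFileName", args[0]);

        // The property must be set before the context is created, otherwise the
        // placeholder configurer cannot resolve the file location.
        ClassPathXmlApplicationContext context =
                new ClassPathXmlApplicationContext("spring-context.xml");

        // ... build and execute the Flink job using beans from the context ...

        context.close();
    }
}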

With the java command below, I am able to run the code successfully.

java -jar my-code-1.0.0-SNAPSHOT.jar
"/home/MyFileLocation/property.properties"

But it fails when using the Flink command below.

./flink run ../../my-code-1.0.0-SNAPSHOT.jar
"/home/MyFileLocation/property.properties"

Thanks,
Sunil







Rolling sink parquet/Avro output

2017-01-17 Thread Biswajit Das
Hi There ,

Does anyone have a reference for a rolling sink Parquet/Avro writer? I'm seeing
an issue with how the stream is handled by the rolling sink, and I don't see many
options to override, or even open in a subclass. I could resolve this with a
custom rolling sink writer; I'm just wondering if anyone has done something
similar or if something is already out there. Please correct me if I'm missing anything here.

~Biswajit


Re: 1.1.1: JobManager config endpoint no longer supplies port

2017-01-17 Thread Shannon Carey
A followup (in case anyone is interested): we worked around this by making a 
request to the "/jars" endpoint of the UI. The response has an attribute called 
"address" which includes the DNS name and port where the UI is accessible.


RE: Release 1.2?

2017-01-17 Thread denis.dollfus
Thanks for the quick update, sounds perfect!
And I'll try the staging repo trick in pom.xml.

Denis

From: Stephan Ewen [mailto:se...@apache.org]
Sent: Tuesday, 17 January 2017 11:42
To: user@flink.apache.org
Subject: Re: Release 1.2?

So far it looks as if the next release candidate will come in a few days (end of 
this week, beginning of next).

Keep your fingers crossed that it passes!

If you want to help speed the release up, I would recommend checking the next 
release candidate (simply use it as the deployed version and see if it works 
for you).
You can use release candidates like regular releases if you add the Apache 
staging repository to the repositories in your application's pom.xml file.

The email announcing the release candidate usually has the details for that.


On Tue, Jan 17, 2017 at 10:26 AM, Timo Walther 
> wrote:
Hi Denis,

the first 1.2 RC0 has already been released and the RC1 is on the way (maybe 
already this week). I think that we can expect a 1.2 release in 3-4 weeks.

Regards,
Timo


On 17/01/17 at 10:04, denis.doll...@thomsonreuters.com wrote:
Hi all,

Do you have some ballpark estimate for a stable release of Flink 1.2?
We are still at a proof-of-concept stage and are interested in several features 
of 1.2, notably async stream operations (FLINK-4391).

Thank you,

Denis








Re: Zeppelin: Flink Kafka Connector

2017-01-17 Thread Neil Derraugh
I re-read that enough times and it finally made sense. I wasn't paying 
attention and thought 0.10.2 was the Kafka version (which hasn't been released 
yet either - ha :( ). I switched to a recent version and it's all good. :)

Thanks !
Neil

> On Jan 17, 2017, at 11:14 AM, Neil Derraugh 
>  wrote:
> 
> Hi Timo & Fabian,
> 
> Thanks for replying.  I'm using Zeppelin built off master.  And Flink 1.2
> built off the release-1.2 branch.  Is that the right branch?
> 
> Neil
> 
> 
> 



Re: Zeppelin: Flink Kafka Connector

2017-01-17 Thread Neil Derraugh
Hi Timo & Fabian,

Thanks for replying.  I'm using Zeppelin built off master.  And Flink 1.2
built off the release-1.2 branch.  Is that the right branch?

Neil





Re: Zeppelin: Flink Kafka Connector

2017-01-17 Thread Fabian Hueske
The connectors are included in the release and available as individual
Maven artifacts.
So Flink 1.2.0 will provide a flink-connector-kafka-0.10 artifact (with
version 1.2.0).
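
For illustration, a minimal sketch of using the 0.10 consumer once the job (or the Zeppelin interpreter) depends on that 1.2.0 connector artifact; the topic name, bootstrap servers and group id below are placeholder assumptions:

import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaConnectorExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "zeppelin-test");

        // Consume a Kafka 0.10 topic as a stream of strings.
        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props));
        stream.print();

        env.execute("Kafka 0.10 connector example");
    }
}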

2017-01-17 16:22 GMT+01:00 Foster, Craig :

> Are connectors being included in the 1.2.0 release or do you mean Kafka
> specifically?
>
>
>
> *From: *Fabian Hueske 
> *Reply-To: *"user@flink.apache.org" 
> *Date: *Tuesday, January 17, 2017 at 7:10 AM
> *To: *"user@flink.apache.org" 
> *Subject: *Re: Zeppelin: Flink Kafka Connector
>
>
>
> One thing to add: Flink 1.2.0 has not been release yet.
> The FlinkKafkaConsumer010 is only available in a SNAPSHOT release or the
> first release candidate (RC0).
>
> Best, Fabian
>
>
>
> 2017-01-17 16:08 GMT+01:00 Timo Walther :
>
> You are using an old version of Flink (0.10.2). A FlinkKafkaConsumer010
> was not present at that time. You need to upgrade to Flink 1.2.
>
> Timo
>
>
> Am 17/01/17 um 15:58 schrieb Neil Derraugh:
>
>
>
> This is really a Zeppelin question, and I’ve already posted to the user
> list there.  I’m just trying to draw in as many relevant eyeballs as
> possible.  If you can help please reply on the Zeppelin mailing list.
>
> In my Zeppelin notebook I’m having a problem importing the Kafka streaming
> library for Flink.
>
> I added org.apache.flink:flink-connector-kafka_2.11:0.10.2 to the
> Dependencies on the Flink interpreter.
>
> The Flink interpreter runs code, just not if I have the following import.
> import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
>
> I get this error:
> :72: error: object FlinkKafkaConsumer010 is not a member of
> package org.apache.flink.streaming.connectors.kafka
> import org.apache.flink.streaming.connectors.kafka.
> FlinkKafkaConsumer010
>
> Am I doing something wrong here?
>
> Neil
>
>
>
>
>


Re: Zeppelin: Flink Kafka Connector

2017-01-17 Thread Foster, Craig
Are connectors being included in the 1.2.0 release or do you mean Kafka 
specifically?

From: Fabian Hueske 
Reply-To: "user@flink.apache.org" 
Date: Tuesday, January 17, 2017 at 7:10 AM
To: "user@flink.apache.org" 
Subject: Re: Zeppelin: Flink Kafka Connector

One thing to add: Flink 1.2.0 has not been released yet.
The FlinkKafkaConsumer010 is only available in a SNAPSHOT release or the first 
release candidate (RC0).
Best, Fabian

2017-01-17 16:08 GMT+01:00 Timo Walther 
>:
You are using an old version of Flink (0.10.2). A FlinkKafkaConsumer010 was not 
present at that time. You need to upgrade to Flink 1.2.

Timo


On 17/01/17 at 15:58, Neil Derraugh wrote:

This is really a Zeppelin question, and I’ve already posted to the user list 
there.  I’m just trying to draw in as many relevant eyeballs as possible.  If 
you can help please reply on the Zeppelin mailing list.

In my Zeppelin notebook I’m having a problem importing the Kafka streaming 
library for Flink.

I added org.apache.flink:flink-connector-kafka_2.11:0.10.2 to the Dependencies 
on the Flink interpreter.

The Flink interpreter runs code, just not if I have the following import.
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

I get this error:
<console>:72: error: object FlinkKafkaConsumer010 is not a member of package 
org.apache.flink.streaming.connectors.kafka
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

Am I doing something wrong here?

Neil




Re: Zeppelin: Flink Kafka Connector

2017-01-17 Thread Fabian Hueske
One thing to add: Flink 1.2.0 has not been released yet.
The FlinkKafkaConsumer010 is only available in a SNAPSHOT release or the
first release candidate (RC0).

Best, Fabian

2017-01-17 16:08 GMT+01:00 Timo Walther :

> You are using an old version of Flink (0.10.2). A FlinkKafkaConsumer010
> was not present at that time. You need to upgrade to Flink 1.2.
>
> Timo
>
>
> Am 17/01/17 um 15:58 schrieb Neil Derraugh:
>
> This is really a Zeppelin question, and I’ve already posted to the user
>> list there.  I’m just trying to draw in as many relevant eyeballs as
>> possible.  If you can help please reply on the Zeppelin mailing list.
>>
>> In my Zeppelin notebook I’m having a problem importing the Kafka
>> streaming library for Flink.
>>
>> I added org.apache.flink:flink-connector-kafka_2.11:0.10.2 to the
>> Dependencies on the Flink interpreter.
>>
>> The Flink interpreter runs code, just not if I have the following import.
>> import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
>>
>> I get this error:
>> :72: error: object FlinkKafkaConsumer010 is not a member of
>> package org.apache.flink.streaming.connectors.kafka
>> import org.apache.flink.streaming.con
>> nectors.kafka.FlinkKafkaConsumer010
>>
>> Am I doing something wrong here?
>>
>> Neil
>>
>
>
>


Re: Zeppelin: Flink Kafka Connector

2017-01-17 Thread Timo Walther
You are using an old version of Flink (0.10.2). A FlinkKafkaConsumer010 
was not present at that time. You need to upgrade to Flink 1.2.


Timo


On 17/01/17 at 15:58, Neil Derraugh wrote:

This is really a Zeppelin question, and I’ve already posted to the user list 
there.  I’m just trying to draw in as many relevant eyeballs as possible.  If 
you can help please reply on the Zeppelin mailing list.

In my Zeppelin notebook I’m having a problem importing the Kafka streaming 
library for Flink.

I added org.apache.flink:flink-connector-kafka_2.11:0.10.2 to the Dependencies 
on the Flink interpreter.

The Flink interpreter runs code, just not if I have the following import.
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

I get this error:
<console>:72: error: object FlinkKafkaConsumer010 is not a member of package 
org.apache.flink.streaming.connectors.kafka
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

Am I doing something wrong here?

Neil





Zeppelin: Flink Kafka Connector

2017-01-17 Thread Neil Derraugh
This is really a Zeppelin question, and I’ve already posted to the user list 
there.  I’m just trying to draw in as many relevant eyeballs as possible.  If 
you can help please reply on the Zeppelin mailing list.

In my Zeppelin notebook I’m having a problem importing the Kafka streaming 
library for Flink.

I added org.apache.flink:flink-connector-kafka_2.11:0.10.2 to the Dependencies 
on the Flink interpreter.

The Flink interpreter runs code, just not if I have the following import.
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

I get this error:
<console>:72: error: object FlinkKafkaConsumer010 is not a member of package 
org.apache.flink.streaming.connectors.kafka
   import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

Am I doing something wrong here?

Neil

Re: Flink Run command :Replace placeholder value in spring xml

2017-01-17 Thread Timo Walther

Hi Sunil,

what is the content of args[0] when you execute:

public static void main(String[] args) {
    System.out.println(args[0]);
}


On 17/01/17 at 14:55, raikarsunil wrote:

Hi,

I am not able to replace the value in the Spring placeholder. Below is the XML code snippet:

    file:#{systemProperties['configFileName']}

In the above code I need to replace *configFileName* with the actual file, which I need to provide externally. Below is the Java code, with *args = "/home/myfilepath/file.properties"*:

public static void main(String[] args) {
    System.setProperty("configFileName", args[0]);
}

The above does not replace *configFileName* with the *args* provided.

Flink command: *./flink run ../../my-code-1.0.0-SNAPSHOT.jar "/home/myfilepath/file.properties"*

Any help on this?

Thanks,
Sunil







Re: Possible JVM native memory leak

2017-01-17 Thread Timo Walther

This sounds like a RocksDB issue. Maybe Stefan (in CC) has an idea?

Timo


On 17/01/17 at 14:52, Avihai Berkovitz wrote:


Hello,

I am running a streaming job on a small cluster, and after a few hours 
I noticed that my TaskManager processes are being killed by the OOM 
killer. The processes were using too much memory. After a bit of 
monitoring, I have the following status:


  * The maximum heap size (Xmx) is 4M
  * Native Memory Tracking reports that the process has 44180459KB committed, which is reasonable given the GC and threads overhead (the full summary report is attached later)
  * There are 644 threads
  * The Status.JVM.Memory.NonHeap.Committed metric is 245563392
  * The Status.JVM.Memory.Direct.MemoryUsed metric is 354777032
  * Using pmap we find that the private committed memory is *54879428K* and mapped is 62237852K

So we have about 10GB of memory that was allocated in the process but 
is unknown to the JVM itself.


Some more info:

  * I am running Flink 1.1.4
  * I am using RocksDB for state
  * The state is saved to Azure Blob Store, using the
NativeAzureFileSystem HDFS connector over the wasbs protocol
  * The cluster is a standalone HA cluster
  * The machine is an Ubuntu 14.04.5 LTS 64 bit server
  * I have around 2 GB of state per TaskManager

Another thing I noticed is that the job sometimes fails (due to 
external DB connectivity issues) and is restarted automatically as 
expected. But in some cases the failures also cause one or more of the 
following error logs:


  * Could not close the file system output stream. Trying to delete the underlying file.
  * Could not discard the 1th operator state of checkpoint 93 for operator Operator2.

java.lang.NullPointerException: null

at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointThread.run(StreamTask.java:953) 
~[flink-dist_2.10-1.1.4.jar:1.1.4]


I have 2 theories, and I hope to hear any ideas from you:

 1. RocksDB uses this memory for internal caching. If so, how can this usage be tracked, and what options can tune and limit it?
 2. Internal RocksDB objects are not being disposed of properly, probably during the aforementioned job restarts, and so we have a growing memory leak. If so, do you have any idea what can cause this?

Thank you,

Avihai

Attached Native Memory Tracking (jcmd <pid> VM.native_memory summary):

Total: reserved=44399603KB, committed=44180459KB
- Java Heap (reserved=4096KB, committed=4096KB)
(mmap: reserved=4096KB, committed=4096KB)

- Class (reserved=134031KB, committed=132751KB)
(classes #22310)
(malloc=2959KB #43612)
(mmap: reserved=131072KB, committed=129792KB)

- Thread (reserved=716331KB, committed=716331KB)
(thread #694)
(stack: reserved=712404KB, committed=712404KB)
(malloc=2283KB #3483)
(arena=1644KB #1387)

- Code (reserved=273273KB, committed=135409KB)
(malloc=23673KB #30410)
(mmap: reserved=249600KB, committed=111736KB)

- GC (reserved=1635902KB, committed=1635902KB)
(malloc=83134KB #70605)
(mmap: reserved=1552768KB, committed=1552768KB)

- Compiler (reserved=1634KB, committed=1634KB)
(malloc=1504KB #2062)
(arena=131KB #3)

- Internal (reserved=575283KB, committed=575283KB)
(malloc=575251KB #106644)
(mmap: reserved=32KB, committed=32KB)

- Symbol (reserved=16394KB, committed=16394KB)
(malloc=14468KB #132075)
(arena=1926KB #1)

- Native Memory Tracking (reserved=6516KB, committed=6516KB)
(malloc=338KB #5024)
(tracking overhead=6178KB)

- Arena Chunk (reserved=237KB, committed=237KB)
(malloc=237KB)

- Unknown (reserved=8KB, committed=0KB)
(mmap: reserved=8KB, committed=0KB)





Flink Run command :Replace placeholder value in spring xml

2017-01-17 Thread raikarsunil
Hi,

I am not able to replace the value in the Spring placeholder. Below is the XML code snippet:

    file:#{systemProperties['configFileName']}

In the above code I need to replace *configFileName* with the actual file, which I need to provide externally. Below is the Java code, with *args = "/home/myfilepath/file.properties"*:

public static void main(String[] args) {
    System.setProperty("configFileName", args[0]);
}

The above does not replace *configFileName* with the *args* provided.

Flink command: *./flink run ../../my-code-1.0.0-SNAPSHOT.jar "/home/myfilepath/file.properties"*

Any help on this?

Thanks,
Sunil




Possible JVM native memory leak

2017-01-17 Thread Avihai Berkovitz
Hello,

I am running a streaming job on a small cluster, and after a few hours I 
noticed that my TaskManager processes are being killed by the OOM killer. The 
processes were using too much memory. After a bit of monitoring, I have the 
following status:

  *   The maximum heap size (Xmx) is 4M
  *   Native Memory Tracking reports that the process has 44180459KB committed, which is reasonable given the GC and threads overhead (the full summary report is attached later)
  *   There are 644 threads
  *   The Status.JVM.Memory.NonHeap.Committed metric is 245563392
  *   The Status.JVM.Memory.Direct.MemoryUsed metric is 354777032
  *   Using pmap we find that the private committed memory is 54879428K and 
mapped is 62237852K

So we have about 10GB of memory that was allocated in the process but is 
unknown to the JVM itself.

Some more info:

  *   I am running Flink 1.1.4
  *   I am using RocksDB for state
  *   The state is saved to Azure Blob Store, using the NativeAzureFileSystem 
HDFS connector over the wasbs protocol
  *   The cluster is a standalone HA cluster
  *   The machine is an Ubuntu 14.04.5 LTS 64 bit server
  *   I have around 2 GB of state per TaskManager

Another thing I noticed is that the job sometimes fails (due to external DB 
connectivity issues) and is restarted automatically as expected. But in some 
cases the failures also cause one or more of the following error logs:

  *   Could not close the file system output stream. Trying to delete the 
underlying file.
  *   Could not discard the 1th operator state of checkpoint 93 for operator 
Operator2.
java.lang.NullPointerException: null

at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointThread.run(StreamTask.java:953)
 ~[flink-dist_2.10-1.1.4.jar:1.1.4]

I have 2 theories, and I hope to hear any ideas from you:

  1.  RocksDB uses this memory for internal caching. If so, how can this usage be tracked, and what options can tune and limit it?
  2.  Internal RocksDB objects are not being disposed of properly, probably during the aforementioned job restarts, and so we have a growing memory leak. If so, do you have any idea what can cause this?
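
Regarding theory 1: a minimal sketch of pointing a job at the RocksDB backend with one of Flink's predefined option profiles, which is the coarse knob over RocksDB's write buffers and caches. The checkpoint URI and the chosen profile are assumptions for illustration, not a verified fix for this leak:

import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbBackendSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The checkpoint URI below is a placeholder for the Azure blob location.
        RocksDBStateBackend backend =
                new RocksDBStateBackend("wasbs://container@account.blob.core.windows.net/checkpoints");

        // Predefined option profiles are the coarse-grained way to influence
        // RocksDB's write buffers and caches without writing custom options code.
        backend.setPredefinedOptions(PredefinedOptions.SPILLING_DISK_OPTIMIZED);

        env.setStateBackend(backend);

        // ... define and execute the job ...
    }
}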

Thank you,
Avihai


Attached Native Memory Tracking (jcmd <pid> VM.native_memory summary):

Total: reserved=44399603KB, committed=44180459KB
- Java Heap (reserved=4096KB, committed=4096KB)
(mmap: reserved=4096KB, committed=4096KB)

- Class (reserved=134031KB, committed=132751KB)
(classes #22310)
(malloc=2959KB #43612)
(mmap: reserved=131072KB, committed=129792KB)

-Thread (reserved=716331KB, committed=716331KB)
(thread #694)
(stack: reserved=712404KB, committed=712404KB)
(malloc=2283KB #3483)
(arena=1644KB #1387)

-  Code (reserved=273273KB, committed=135409KB)
(malloc=23673KB #30410)
(mmap: reserved=249600KB, committed=111736KB)

-GC (reserved=1635902KB, committed=1635902KB)
(malloc=83134KB #70605)
(mmap: reserved=1552768KB, committed=1552768KB)

-  Compiler (reserved=1634KB, committed=1634KB)
(malloc=1504KB #2062)
(arena=131KB #3)

-  Internal (reserved=575283KB, committed=575283KB)
(malloc=575251KB #106644)
(mmap: reserved=32KB, committed=32KB)

-Symbol (reserved=16394KB, committed=16394KB)
(malloc=14468KB #132075)
(arena=1926KB #1)

-Native Memory Tracking (reserved=6516KB, committed=6516KB)
(malloc=338KB #5024)
(tracking overhead=6178KB)

-   Arena Chunk (reserved=237KB, committed=237KB)
(malloc=237KB)

-   Unknown (reserved=8KB, committed=0KB)
(mmap: reserved=8KB, committed=0KB)



Re: Three input stream operator and back pressure

2017-01-17 Thread Stephan Ewen
Hi!

Just to avoid confusion: the DataStream network readers do not currently
support backpressuring only one input (as this conflicts with other design
aspects). (The DataSet network readers do support that, FYI.)

How about simply "correcting" the order later? If you have pre-sorted data
per stream, you can generate frequent watermarks trivially (every 100 ms,
based on the event's timestamp that you would use for the merge) and then
apply event time windows of, say, 100 ms, inside which you sort and emit the
elements. The windows are strictly evaluated in order, so the resulting
stream is sorted. This would be similar to a form of incremental
"bucketing" sort over the merged stream.

That will give you a sorted stream easily, and may not even be too expensive.
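
A rough sketch of that approach, not a tested implementation: the event type, the 100 ms window size and the out-of-orderness bound are assumptions for illustration.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class SortedMerge {

    /** Placeholder event type; assumed to carry the timestamp used for merging. */
    public static class MyEvent {
        public long timestamp;
    }

    /** Union two per-stream-sorted inputs and re-emit them in global timestamp order. */
    public static DataStream<MyEvent> sortedMerge(DataStream<MyEvent> a, DataStream<MyEvent> b) {
        return a.union(b)
            // Frequent watermarks: each input is already sorted, so a small
            // out-of-orderness bound covers the interleaving of the two inputs.
            .assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.milliseconds(100)) {
                    @Override
                    public long extractTimestamp(MyEvent event) {
                        return event.timestamp;
                    }
                })
            // Small event-time windows; windowAll is non-parallel, which matches
            // having a single global merge point.
            .windowAll(TumblingEventTimeWindows.of(Time.milliseconds(100)))
            // Sort within each window and emit; windows fire strictly in order,
            // so the concatenation of the sorted windows is globally sorted.
            .apply(new AllWindowFunction<MyEvent, MyEvent, TimeWindow>() {
                @Override
                public void apply(TimeWindow window, Iterable<MyEvent> values, Collector<MyEvent> out) {
                    List<MyEvent> buffer = new ArrayList<>();
                    for (MyEvent e : values) {
                        buffer.add(e);
                    }
                    Collections.sort(buffer, new Comparator<MyEvent>() {
                        @Override
                        public int compare(MyEvent left, MyEvent right) {
                            return Long.compare(left.timestamp, right.timestamp);
                        }
                    });
                    for (MyEvent e : buffer) {
                        out.collect(e);
                    }
                }
            });
    }
}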

Stephan


On Tue, Jan 17, 2017 at 1:05 PM, Dmitry Golubets 
wrote:

> Hi Stephan,
>
> In one of our components we have to process events in order, due to
> business logic requirements.
> That is for sure introduces a bottleneck, but other aspects are fine.
>
> I'm not taking about really resorting data, but just about consuming it in
> the right order.
> I.e. if two streams are already in order, all that has to be done is to
> consume one that has the Min element at it's head and backpressure another
> one.
>
> What I can do ofc is to create a custom Source for it. But I would prefer
> not to mix source dependent logic (e.g. Kafka connection, etc) and merging
> logic.
>
> Best regards,
> Dmitry
>
> On Tue, Jan 17, 2017 at 10:46 AM, Stephan Ewen  wrote:
>
>> Hi Dmitry!
>>
>> The streaming runtime makes a conscious decision to not merge streams as
>> in an ordered merge.
>> The reason is that this is at large scale typically bad for scalability /
>> network performance.
>> Also, in certain DAGs, it may lead to deadlocks.
>>
>> Even the two input operator delivers records on a low level in a
>> first-come-first-serve order as driven by network events (NIO events).
>>
>> Flink's operators tolerate out-of-order records to compensate for that.
>> Overall, that seemed the more scalable design to us.
>> Can your use case follow a similar approach?
>>
>> Stephan
>>
>>
>>
>> On Tue, Jan 17, 2017 at 10:57 AM, Dmitry Golubets 
>> wrote:
>>
>>> Hi Timo,
>>>
>>> I don't have any key to join on, so I'm not sure Window Join would work
>>> for me.
>>>
>>> Can I implement my own "low level" operator in any way?
>>> I would appreciate if you can give me a hint or a link to example of how
>>> to do it.
>>>
>>>
>>>
>>> Best regards,
>>> Dmitry
>>>
>>> On Tue, Jan 17, 2017 at 9:24 AM, Timo Walther 
>>> wrote:
>>>
 Hi Dmitry,

 the runtime supports an arbitrary number of inputs, however, the API
 does currently not provide a convenient way. You could use the "union"
 operator to reduce the number of inputs. Otherwise I think you have to
 implement your own operator. That depends on your use case though.

 You can maintain backpressure by using Flink's operator state. But did
 you also thought about a Window Join instead?

 I hope that helps.

 Timo




 Am 17/01/17 um 00:20 schrieb Dmitry Golubets:

 Hi,

 there are only *two *interfaces defined at the moment:
 *OneInputStreamOperator*
 and
 *TwoInputStreamOperator.*

 Is there any way to define an operator with arbitrary number of inputs?

 My another concern is how to maintain *backpressure *in the operator?
 Let's say I read events from two Kafka sources, both of which are
 ordered by time. I want to merge them keeping the global order. But to do
 it, I need to stop block one input if another one has no data yet.

 Best regards,
 Dmitry



>>>
>>
>


Re: Three input stream operator and back pressure

2017-01-17 Thread Dmitry Golubets
Hi Stephan,

In one of our components we have to process events in order, due to
business logic requirements.
That for sure introduces a bottleneck, but other aspects are fine.

I'm not talking about really re-sorting data, just about consuming it in
the right order.
I.e. if two streams are already in order, all that has to be done is to
consume the one that has the Min element at its head and backpressure the other one.

What I can do, of course, is create a custom Source for it. But I would prefer
not to mix source-dependent logic (e.g. Kafka connection, etc.) and merging
logic.

Best regards,
Dmitry

On Tue, Jan 17, 2017 at 10:46 AM, Stephan Ewen  wrote:

> Hi Dmitry!
>
> The streaming runtime makes a conscious decision to not merge streams as
> in an ordered merge.
> The reason is that this is at large scale typically bad for scalability /
> network performance.
> Also, in certain DAGs, it may lead to deadlocks.
>
> Even the two input operator delivers records on a low level in a
> first-come-first-serve order as driven by network events (NIO events).
>
> Flink's operators tolerate out-of-order records to compensate for that.
> Overall, that seemed the more scalable design to us.
> Can your use case follow a similar approach?
>
> Stephan
>
>
>
> On Tue, Jan 17, 2017 at 10:57 AM, Dmitry Golubets 
> wrote:
>
>> Hi Timo,
>>
>> I don't have any key to join on, so I'm not sure Window Join would work
>> for me.
>>
>> Can I implement my own "low level" operator in any way?
>> I would appreciate if you can give me a hint or a link to example of how
>> to do it.
>>
>>
>>
>> Best regards,
>> Dmitry
>>
>> On Tue, Jan 17, 2017 at 9:24 AM, Timo Walther  wrote:
>>
>>> Hi Dmitry,
>>>
>>> the runtime supports an arbitrary number of inputs, however, the API
>>> does currently not provide a convenient way. You could use the "union"
>>> operator to reduce the number of inputs. Otherwise I think you have to
>>> implement your own operator. That depends on your use case though.
>>>
>>> You can maintain backpressure by using Flink's operator state. But did
>>> you also thought about a Window Join instead?
>>>
>>> I hope that helps.
>>>
>>> Timo
>>>
>>>
>>>
>>>
>>> Am 17/01/17 um 00:20 schrieb Dmitry Golubets:
>>>
>>> Hi,
>>>
>>> there are only *two *interfaces defined at the moment:
>>> *OneInputStreamOperator*
>>> and
>>> *TwoInputStreamOperator.*
>>>
>>> Is there any way to define an operator with arbitrary number of inputs?
>>>
>>> My another concern is how to maintain *backpressure *in the operator?
>>> Let's say I read events from two Kafka sources, both of which are
>>> ordered by time. I want to merge them keeping the global order. But to do
>>> it, I need to stop block one input if another one has no data yet.
>>>
>>> Best regards,
>>> Dmitry
>>>
>>>
>>>
>>
>


Re: Three input stream operator and back pressure

2017-01-17 Thread Stephan Ewen
Hi Dmitry!

The streaming runtime makes a conscious decision not to merge streams as in
an ordered merge.
The reason is that, at large scale, this is typically bad for scalability /
network performance.
Also, in certain DAGs, it may lead to deadlocks.

Even the two-input operator delivers records on a low level in a
first-come-first-served order, as driven by network events (NIO events).

Flink's operators tolerate out-of-order records to compensate for that.
Overall, that seemed the more scalable design to us.
Can your use case follow a similar approach?

Stephan



On Tue, Jan 17, 2017 at 10:57 AM, Dmitry Golubets 
wrote:

> Hi Timo,
>
> I don't have any key to join on, so I'm not sure Window Join would work
> for me.
>
> Can I implement my own "low level" operator in any way?
> I would appreciate if you can give me a hint or a link to example of how
> to do it.
>
>
>
> Best regards,
> Dmitry
>
> On Tue, Jan 17, 2017 at 9:24 AM, Timo Walther  wrote:
>
>> Hi Dmitry,
>>
>> the runtime supports an arbitrary number of inputs, however, the API does
>> currently not provide a convenient way. You could use the "union" operator
>> to reduce the number of inputs. Otherwise I think you have to implement
>> your own operator. That depends on your use case though.
>>
>> You can maintain backpressure by using Flink's operator state. But did
>> you also thought about a Window Join instead?
>>
>> I hope that helps.
>>
>> Timo
>>
>>
>>
>>
>> Am 17/01/17 um 00:20 schrieb Dmitry Golubets:
>>
>> Hi,
>>
>> there are only *two *interfaces defined at the moment:
>> *OneInputStreamOperator*
>> and
>> *TwoInputStreamOperator.*
>>
>> Is there any way to define an operator with arbitrary number of inputs?
>>
>> My another concern is how to maintain *backpressure *in the operator?
>> Let's say I read events from two Kafka sources, both of which are ordered
>> by time. I want to merge them keeping the global order. But to do it, I
>> need to stop block one input if another one has no data yet.
>>
>> Best regards,
>> Dmitry
>>
>>
>>
>


Re: Release 1.2?

2017-01-17 Thread Stephan Ewen
So far it looks as if the next release candidate will come in a few days (end
of this week, beginning of next).

Keep your fingers crossed that it passes!

If you want to help speed the release up, I would recommend checking the
next release candidate (simply use it as the deployed version and see if it
works for you).
You can use release candidates like regular releases if you add the Apache
staging repository to the repositories in your application's pom.xml file.

The email announcing the release candidate usually has the details for that.


On Tue, Jan 17, 2017 at 10:26 AM, Timo Walther  wrote:

> Hi Denis,
>
> the first 1.2 RC0 has already been released and the RC1 is on the way
> (maybe already this week). I think that we can expect a 1.2 release in 3-4
> weeks.
>
> Regards,
> Timo
>
>
> Am 17/01/17 um 10:04 schrieb denis.doll...@thomsonreuters.com:
>
> Hi all,
>
>
>
> Do you have some ballpark estimate for a stable release of Flink 1.2?
>
> We are still at a proof-of-concept stage and are interested in several
> features of 1.2, notably async stream operations (FLINK-4391).
>
>
>
> Thank you,
>
>
>
> Denis
>
> --
>
> This e-mail is for the sole use of the intended recipient and contains
> information that may be privileged and/or confidential. If you are not an
> intended recipient, please notify the sender by return e-mail and delete
> this e-mail and any attachments. Certain required legal entity disclosures
> can be accessed on our website.
> 
>
>
>


Re: Help using HBase with Flink 1.1.4

2017-01-17 Thread Stephan Ewen
Flavio is right: Flink should not expose Guava at all. Make sure you build
it following this trick:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/building.html#dependency-shading

On Tue, Jan 17, 2017 at 11:18 AM, Flavio Pompermaier 
wrote:

> I had very annoying problem in deploying a Flink job for Hbase 1.2 on
> cloudera cdh 5.9.0the problem was caused by the fact that with maven <
> 3.3 you could build flink dist just using mvn clean install, with maven >=
> 3.3 you should do another mvn clean install from the flink-dist directory
> (I still don't know why).
> See https://ci.apache.org/projects/flink/flink-docs-
> release-1.1/setup/building.html#dependency-shading for more details...
>
> I hope this could help,
> Flavio
>
>
>
> On 17 Jan 2017 02:31, "Ted Yu"  wrote:
>
>> Logged FLINK-5517 for upgrading hbase version to 1.3.0
>>
>> On Mon, Jan 16, 2017 at 5:26 PM, Ted Yu  wrote:
>>
>>> hbase uses Guava 12.0.1 and Flink uses 18.0 where Stopwatch.<init>()V
>>> is no longer accessible.
>>> HBASE-14963 removes the use of Stopwatch at this location.
>>>
>>> hbase 1.3.0 RC has passed voting period.
>>>
>>> Please use 1.3.0 where you wouldn't see the IllegalAccessError
>>>
>>> On Mon, Jan 16, 2017 at 4:50 PM, Giuliano Caliari <
>>> giuliano.cali...@gmail.com> wrote:
>>>
 Hello,

 I'm trying to use HBase on one of my stream transformations and I'm
 running into the Guava/Stopwatch dependency problem

 java.lang.IllegalAccessError: tried to access method 
 com.google.common.base.Stopwatch.<init>()V from class 
 org.apache.hadoop.hbase.zookeeper.MetaTableLocator


 Reading on the problem it seems that there is a way to avoid it using
 shading:
 https://ci.apache.org/projects/flink/flink-docs-release-1.1/
 setup/building.html#dependency-shading

 But I can't get it to work.
 I followed the documented steps and it builds but when I try to run the
 newly built version it fails when trying to connect to the Resource 
 Manager:

 2017-01-17 00:42:05,872 INFO  org.apache.flink.yarn.YarnClusterDescriptor
   - Using values:
 2017-01-17 00:42:05,872 INFO  org.apache.flink.yarn.YarnClusterDescriptor
   - TaskManager count = 4
 2017-01-17 00:42:05,873 INFO  org.apache.flink.yarn.YarnClusterDescriptor
   - JobManager memory = 1024
 2017-01-17 00:42:05,873 INFO  org.apache.flink.yarn.YarnClusterDescriptor
   - TaskManager memory = 32768
 2017-01-17 00:42:05,892 INFO  org.apache.hadoop.yarn.client.RMProxy
   - Connecting to ResourceManager at /0.0.0.0:8032
 2017-01-17 00:42:07,023 INFO  org.apache.hadoop.ipc.Client
  - Retrying connect to server: 0.0.0.0/0.0.0.0:8032.
 Already tried 0 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
 sleepTime=1000 MILLISECONDS)
 2017-01-17 00:42:08,024 INFO  org.apache.hadoop.ipc.Client
  - Retrying connect to server: 0.0.0.0/0.0.0.0:8032.
 Already tried 1 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
 sleepTime=1000 MILLISECONDS)


 I'm currently building version 1.1.4 of Flink based on the github repo.
 Building it without shading (not executing `mvn clean install` on the
 flink-dist sub-project) works fine until I try to use HBase, at which point
 I get the Stopwatch exception.

 Has anyone been able to solve this?

 Thanks you,

 Giuliano Caliari
 --
 --
 Giuliano Caliari (+55 11 984898464)
 +Google
 
 Twitter 

 Master Software Engineer by Escola Politécnica da USP
 Bachelor in Computer Science by Instituto de Matemática e Estatística
 da USP


>>>
>>


Re: Help using HBase with Flink 1.1.4

2017-01-17 Thread Flavio Pompermaier
I had a very annoying problem deploying a Flink job for HBase 1.2 on
Cloudera CDH 5.9.0. The problem was caused by the fact that with Maven < 3.3
you could build flink-dist just using mvn clean install, while with Maven >=
3.3 you have to do another mvn clean install from the flink-dist directory
(I still don't know why).
See
https://ci.apache.org/projects/flink/flink-docs-release-1.1/setup/building.html#dependency-shading
for more details...

I hope this could help,
Flavio



On 17 Jan 2017 02:31, "Ted Yu"  wrote:

> Logged FLINK-5517 for upgrading hbase version to 1.3.0
>
> On Mon, Jan 16, 2017 at 5:26 PM, Ted Yu  wrote:
>
>> hbase uses Guava 12.0.1 and Flink uses 18.0 where Stopwatch.<init>()V is
>> no longer accessible.
>> HBASE-14963 removes the use of Stopwatch at this location.
>>
>> hbase 1.3.0 RC has passed voting period.
>>
>> Please use 1.3.0 where you wouldn't see the IllegalAccessError
>>
>> On Mon, Jan 16, 2017 at 4:50 PM, Giuliano Caliari <
>> giuliano.cali...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I'm trying to use HBase on one of my stream transformations and I'm
>>> running into the Guava/Stopwatch dependency problem
>>>
>>> java.lang.IllegalAccessError: tried to access method 
>>> com.google.common.base.Stopwatch.<init>()V from class 
>>> org.apache.hadoop.hbase.zookeeper.MetaTableLocator
>>>
>>>
>>> Reading on the problem it seems that there is a way to avoid it using
>>> shading:
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.1/
>>> setup/building.html#dependency-shading
>>>
>>> But I can't get it to work.
>>> I followed the documented steps and it builds but when I try to run the
>>> newly built version it fails when trying to connect to the Resource Manager:
>>>
>>> 2017-01-17 00:42:05,872 INFO  org.apache.flink.yarn.YarnClusterDescriptor
>>>   - Using values:
>>> 2017-01-17 00:42:05,872 INFO  org.apache.flink.yarn.YarnClusterDescriptor
>>>   - TaskManager count = 4
>>> 2017-01-17 00:42:05,873 INFO  org.apache.flink.yarn.YarnClusterDescriptor
>>>   - JobManager memory = 1024
>>> 2017-01-17 00:42:05,873 INFO  org.apache.flink.yarn.YarnClusterDescriptor
>>>   - TaskManager memory = 32768
>>> 2017-01-17 00:42:05,892 INFO  org.apache.hadoop.yarn.client.RMProxy
>>> - Connecting to ResourceManager at /0.0.0.0:8032
>>> 2017-01-17 00:42:07,023 INFO  org.apache.hadoop.ipc.Client
>>>- Retrying connect to server: 0.0.0.0/0.0.0.0:8032.
>>> Already tried 0 time(s); retry policy is 
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
>>> sleepTime=1000 MILLISECONDS)
>>> 2017-01-17 00:42:08,024 INFO  org.apache.hadoop.ipc.Client
>>>- Retrying connect to server: 0.0.0.0/0.0.0.0:8032.
>>> Already tried 1 time(s); retry policy is 
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
>>> sleepTime=1000 MILLISECONDS)
>>>
>>>
>>> I'm currently building version 1.1.4 of Flink based on the github repo.
>>> Building it without shading (not executing `mvn clean install` on the
>>> flink-dist sub-project) works fine until I try to use HBase, at which point
>>> I get the Stopwatch exception.
>>>
>>> Has anyone been able to solve this?
>>>
>>> Thanks you,
>>>
>>> Giuliano Caliari
>>> --
>>> --
>>> Giuliano Caliari (+55 11 984898464)
>>> +Google
>>> 
>>> Twitter 
>>>
>>> Master Software Engineer by Escola Politécnica da USP
>>> Bachelor in Computer Science by Instituto de Matemática e Estatística da
>>> USP
>>>
>>>
>>
>


Re: Three input stream operator and back pressure

2017-01-17 Thread Dmitry Golubets
Hi Timo,

I don't have any key to join on, so I'm not sure a Window Join would work for
me.

Can I implement my own "low level" operator in any way?
I would appreciate it if you could give me a hint or a link to an example of
how to do it.



Best regards,
Dmitry

On Tue, Jan 17, 2017 at 9:24 AM, Timo Walther  wrote:

> Hi Dmitry,
>
> the runtime supports an arbitrary number of inputs, however, the API does
> currently not provide a convenient way. You could use the "union" operator
> to reduce the number of inputs. Otherwise I think you have to implement
> your own operator. That depends on your use case though.
>
> You can maintain backpressure by using Flink's operator state. But did you
> also thought about a Window Join instead?
>
> I hope that helps.
>
> Timo
>
>
>
>
> Am 17/01/17 um 00:20 schrieb Dmitry Golubets:
>
> Hi,
>
> there are only *two *interfaces defined at the moment:
> *OneInputStreamOperator*
> and
> *TwoInputStreamOperator.*
>
> Is there any way to define an operator with arbitrary number of inputs?
>
> My another concern is how to maintain *backpressure *in the operator?
> Let's say I read events from two Kafka sources, both of which are ordered
> by time. I want to merge them keeping the global order. But to do it, I
> need to stop block one input if another one has no data yet.
>
> Best regards,
> Dmitry
>
>
>


Re: Apache Flink 1.1.4 - Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException

2017-01-17 Thread Miguel Coimbra
Hello Vasia,

I am going to look into this.
Hopefully I will contribute to the implementation and documentation.

Regards,

-- Forwarded message --
From: Vasiliki Kalavri 
To: user@flink.apache.org
Cc:
Date: Sun, 15 Jan 2017 18:01:41 +0100
Subject: Re: Apache Flink 1.1.4 - Java 8 - CommunityDetection.java:158 -
java.lang.NullPointerException
Hi Miguel,

this is a bug, thanks a lot for reporting! I think the problem is that the
implementation assumes that labelsWithHighestScores contains the vertex
itself as initial label.

Could you please open a JIRA ticket for this and attach your code and data
as an example to reproduce? We should also improve the documentation for
this library method. I see that you are initializing vertex values and you
have called getUndirected(), but the library method already does both of
these operations internally.

Cheers,
-Vasia.

On 13 January 2017 at 17:12, Miguel Coimbra 
wrote:
Hello,

If I missed the answer to this or some essential step of the documentation,
please do tell.
I am having the following problem while trying out the
org.apache.flink.graph.library.CommunityDetection algorithm of the Gelly
API (Java).

Specs: JDK 1.8.0_102 x64
Apache Flink: 1.1.4

Suppose I have a very small (I tried with an example with 38 vertices as
well) dataset stored in a tab-separated file 3-vertex.tsv:

#id1	id2	score
0	1	0
0	2	0
0	3	0

This is just a central vertex with 3 neighbors (disconnected between
themselves).
I am loading the dataset and executing the algorithm with the following
code:


---
// Load the data from the .tsv file.
final DataSet<Tuple3<Long, Long, Double>> edgeTuples = env.readCsvFile(inputPath)
        .fieldDelimiter("\t")  // fields are separated by tabs
        .ignoreComments("#")   // comments start with "#"
        .types(Long.class, Long.class, Double.class);

// Generate a graph and add reverse edges (undirected).
final Graph<Long, Long, Double> graph = Graph.fromTupleDataSet(edgeTuples,
        new MapFunction<Long, Long>() {
            private static final long serialVersionUID = 8713516577419451509L;
            public Long map(Long value) {
                return value;
            }
        },
        env).getUndirected();

// CommunityDetection parameters.
final double hopAttenuationDelta = 0.5d;
final int iterationCount = 10;

// Prepare and trigger the execution.
DataSet<Vertex<Long, Long>> vs = graph.run(
        new org.apache.flink.graph.library.CommunityDetection(iterationCount, hopAttenuationDelta))
        .getVertices();

vs.print();
---

​Running this code throws the following exception​ (check the bold line):

​org.apache.flink.runtime.client.JobExecutionException: Job execution
failed.
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply$mcV$sp(JobManager.scala:805)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

*Caused by: java.lang.NullPointerException
at org.apache.flink.graph.library.CommunityDetection$VertexLabelUpdater.updateVertex(CommunityDetection.java:158)*
at org.apache.flink.graph.spargel.ScatterGatherIteration$GatherUdfSimpleVV.coGroup(ScatterGatherIteration.java:389)
at org.apache.flink.runtime.operators.CoGroupWithSolutionSetSecondDriver.run(CoGroupWithSolutionSetSecondDriver.java:218)
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486)
at org.apache.flink.runtime.iterative.task.AbstractIterativeTask.run(AbstractIterativeTask.java:146)
at org.apache.flink.runtime.iterative.task.IterationTailTask.run(IterationTailTask.java:107)
at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:642)
at java.lang.Thread.run(Thread.java:745)


​After a further look, I set a breakpoint (Eclipse IDE 

Re: Release 1.2?

2017-01-17 Thread Timo Walther

Hi Denis,

the first 1.2 RC0 has already been released and the RC1 is on the way 
(maybe already this week). I think that we can expect a 1.2 release in 
3-4 weeks.


Regards,
Timo


On 17/01/17 at 10:04, denis.doll...@thomsonreuters.com wrote:


Hi all,

Do you have some ballpark estimate for a stable release of Flink 1.2?

We are still at a proof-of-concept stage and are interested in several 
features of 1.2, notably async stream operations (FLINK-4391).


Thank you,

Denis










Re: Three input stream operator and back pressure

2017-01-17 Thread Timo Walther

Hi Dmitry,

the runtime supports an arbitrary number of inputs; however, the API 
does not currently provide a convenient way to use them. You could use the "union" 
operator to reduce the number of inputs. Otherwise I think you have to 
implement your own operator. That depends on your use case though.
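
As a tiny illustration of the union route, a sketch assuming three inputs that share one type (String here is a placeholder):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;

public class UnionExample {

    /** Collapse three same-typed inputs into one stream for a single-input operator. */
    public static DataStream<String> combine(DataStream<String> a,
                                             DataStream<String> b,
                                             DataStream<String> c) {
        return a.union(b, c)
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String value) {
                        // The single-input processing logic would go here.
                        return value;
                    }
                });
    }
}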


You can maintain backpressure by using Flink's operator state. But did 
you also think about a Window Join instead?


I hope that helps.

Timo




On 17/01/17 at 00:20, Dmitry Golubets wrote:

Hi,

there are only *two *interfaces defined at the moment:
/OneInputStreamOperator/
and
/TwoInputStreamOperator./

Is there any way to define an operator with arbitrary number of inputs?

Another concern of mine is how to maintain *backpressure* in the operator.
Let's say I read events from two Kafka sources, both of which are 
ordered by time. I want to merge them keeping the global order. But to 
do that, I need to block one input if the other one has no data yet.


Best regards,
Dmitry





Release 1.2?

2017-01-17 Thread denis.dollfus
Hi all,

Do you have some ballpark estimate for a stable release of Flink 1.2?
We are still at a proof-of-concept stage and are interested in several features 
of 1.2, notably async stream operations (FLINK-4391).

Thank you,

Denis


