How to customize trigger for Count Time Window

2018-07-14 Thread Soheil Pourbafrani
I want to have a time window that triggers data processing under either of
the following conditions:
1 - The window has 3 messages, or
2 - Any number of messages (fewer than 3) is in the window and a timeout is
reached.

I know one should extend the Trigger class:

public static class MyWindowTrigger<W extends Window> extends Trigger<Object, W> {

    @Override
    public TriggerResult onElement(Object o, long l, W w,
            TriggerContext triggerContext) throws Exception {
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long l, W w,
            TriggerContext triggerContext) throws Exception {
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long l, W w,
            TriggerContext triggerContext) throws Exception {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(W w, TriggerContext triggerContext) throws Exception {
    }
}

But I don't know how I should check the number of messages in the window
or set a timeout. Can someone help?
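
One way to do this (a minimal sketch, not tested against a real cluster; it
assumes a processing-time timeout, and CountOrTimeoutTrigger is a made-up
name) is to count elements in the trigger's partitioned state and register a
processing-time timer when the first element arrives:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.Window;

public static class CountOrTimeoutTrigger<W extends Window> extends Trigger<Object, W> {

    private final long maxCount;      // fire once this many elements arrived (3 here)
    private final long timeoutMillis; // or once this much time passed since the first one

    private final ValueStateDescriptor<Long> countDesc =
            new ValueStateDescriptor<>("count", Long.class);

    public CountOrTimeoutTrigger(long maxCount, long timeoutMillis) {
        this.maxCount = maxCount;
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, W window,
            TriggerContext ctx) throws Exception {
        ValueState<Long> count = ctx.getPartitionedState(countDesc);
        long current = count.value() == null ? 0L : count.value();
        if (current == 0) {
            // first element of this window: start the timeout clock
            ctx.registerProcessingTimeTimer(ctx.getCurrentProcessingTime() + timeoutMillis);
        }
        current++;
        if (current >= maxCount) {
            count.clear();
            return TriggerResult.FIRE_AND_PURGE;
        }
        count.update(current);
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx)
            throws Exception {
        // timeout reached: fire with whatever the window holds. If the window
        // already fired on count, this fires an empty window, which is
        // typically harmless; remembering and deleting the timer would avoid it.
        ctx.getPartitionedState(countDesc).clear();
        return TriggerResult.FIRE_AND_PURGE;
    }

    @Override
    public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(W window, TriggerContext ctx) throws Exception {
        ctx.getPartitionedState(countDesc).clear();
    }
}

It would then be attached with something like
.window(...).trigger(new CountOrTimeoutTrigger<>(3, timeoutMillis)).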


Re: StateMigrationException when switching from TypeInformation.of to createTypeInformation

2018-07-14 Thread Elias Levy
Apologies for the delay.  I've been traveling.

On Mon, Jul 9, 2018 at 8:44 AM Till Rohrmann  wrote:

> could you check whether the `TypeInformation` returned by
> `TypeInformation.of(new TypeHint[ConfigState]() {})` and
> `createTypeInformation[ConfigState]` return the same `TypeInformation`
> subtype? The problem is that the former goes through the Java TypeExtractor,
> whereas the latter goes through the Scala `TypeUtils#createTypeInfo`, where
> the resulting `TypeInformation` is created via Scala macros. It must be the
> case that the Scala `TypeUtils` generates a different `TypeInformation`
> (e.g. Java generating a GenericTypeInfo whereas Scala generates a
> TraversableTypeInfo).
>

TypeInformation.of returns a GenericTypeInfo, and its toString reports it as
GenericType.

createTypeInformation returns an anonymous class, but its toString reports it
as interface scala.collection.mutable.Map[scala.Tuple2(_1: String, _2:
scala.Tuple2(_1: GenericType, _2: byte[]))].

Looks like you are correct about the Java version using GenericTypeInfo.  I
suppose the only way around this, if we wanted to move over to
createTypeInformation, is to release a job that supports both types, upgrade
the state from one to the other, and then drop support for the older state.
Yes?

> It would also be helpful if you could share the definition of `ConfigState`
> in order to test it ourselves.
>

ConfigState is defined as type ConfigState =
mutable.Map[String, ConfigStateValue], and ConfigStateValue is defined as
type ConfigStateValue = (LazyObject, Array[Byte]).  LazyObject is from the
Doubledutch LazyJSON package.


Flink Query Optimizer

2018-07-14 Thread Albert Jonathan
Hello,

I am just wondering: does Flink use Apache Calcite's query optimizer to
generate an optimal logical plan for stream queries, or does it have its
own independent query optimizer?
From what I have observed so far, Flink's query optimizer only groups
operators together without changing the order of aggregation operators
(e.g., joins). Did I miss anything?

I am thinking of extending Flink to apply query optimization as in the
context of a DBMS, either by integrating it with Calcite or by implementing
it as a new module.
Any feedback or guidelines will be highly appreciated.

Thank you,
Albert


Re: understanding purpose of TextInputFormat

2018-07-14 Thread Jörn Franke
TextInputFormat defines the format of the data; it could also be something
other than text, e.g. ORC, Parquet, etc.
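
To illustrate that point with a sketch (not from the original reply; paths
and row types are placeholders): readFile is generic over the InputFormat,
so swapping the format changes both how the monitored files are parsed and
the element type of the resulting stream.

import org.apache.flink.api.java.io.TupleCsvInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

// parse each monitored file as CSV rows instead of plain text lines
TupleTypeInfo<Tuple2<String, Integer>> rowType =
        TupleTypeInfo.getBasicTupleTypeInfo(String.class, Integer.class);
TupleCsvInputFormat<Tuple2<String, Integer>> csvFormat =
        new TupleCsvInputFormat<>(new Path("file:///tmp/csvdir/"), rowType);

// env is the StreamExecutionEnvironment; the stream's element type now
// follows the format instead of being String
DataStream<Tuple2<String, Integer>> rows = env.readFile(
        csvFormat,
        "file:///tmp/csvdir/",
        FileProcessingMode.PROCESS_CONTINUOUSLY,
        100);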

> On 14. Jul 2018, at 19:15, chrisr123  wrote:
> 
> I'm building a streaming app that continuously monitors a directory for new
> files, and I'm confused about why I have to specify a TextInputFormat - see
> the source code below.  It seems redundant, but it is a required parameter.
> It makes perfect sense to specify the directory I want to monitor, but what
> purpose does the TextInputFormat serve, and what should I set it to?
> Example: a simple word count app that reads lines of text.
> 
> 
>TextInputFormat format = new TextInputFormat(
>    new org.apache.flink.core.fs.Path("file:///tmp/dir/"));
> 
>DataStream<String> inputStream = env.readFile(
>    format,
>    "file:///tmp/dir/",
>    FileProcessingMode.PROCESS_CONTINUOUSLY,
>    100);
> 
> 
> 


Re: SinkFunction invoke method signature

2018-07-14 Thread Chesnay Schepler

The variables T and IN aren't related to each other.

That is, whether the context interface is defined as Context<T> or
Context<IN> makes no difference,
since it is an interface, which is always static.

At runtime, the context given to the function should be of type Context<IN>,
but I don't know why the invoke method (and StreamSink, for that matter)
use raw parameters.


On 13.07.2018 19:35, Ashwin Sinha wrote:

+1

We encountered the exact same issue today.

On Fri, Jul 13, 2018 at 10:51 PM Philip Doctor
<philip.doc...@physiq.com> wrote:


Dear Flink Users,
I noticed my sink's `invoke` method was deprecated, so I went to
update it. I'm a little surprised by the new method signature,
especially on Context (copy+pasted below for ease of discussion).
Shouldn't Context<T> be Context<IN>, not Context<T>, based on the
docs?  I'm having a hard time understanding what's getting sent to
me here in Context.  Does anyone have any insights on why these might
be different?

/**
 * Interface for implementing user defined sink functionality.
 *
 * @param <IN> Input type parameter.
 */
@Public
public interface SinkFunction<IN> extends Function, Serializable {

    /**
     * Context that {@link SinkFunction SinkFunctions} can use for
     * getting additional data about an input record.
     *
     * The context is only valid for the duration of a
     * {@link SinkFunction#invoke(Object, Context)} call. Do not store the
     * context and use afterwards!
     *
     * @param <T> The type of elements accepted by the sink.
     */
    @Public // Interface might be extended in the future with additional methods.
    interface Context<T> {

org.apache.flink/flink-streaming-java_2.11/1.5.0/784a58515da2194a2e74666e8d53d50fac8c03/flink-streaming-java_2.11-1.5.0-sources.jar!/org/apache/flink/streaming/api/functions/sink/SinkFunction.java



--
Ashwin Sinha | Data Engineer
ashwin.si...@go-mmt.com  | 9452075361









specifying prefix for print(), printToErr() ?

2018-07-14 Thread chrisr123


The documentation states that there is a way to specify a prefix (msg) to
distinguish between different calls to print() (see below), but I have not
found a way to do this. Can anyone show me how I would code this?

What I'd like to do conceptually, with the prefix msg showing up in the
output so I know where I am in the transformation process:

DataStream myStream = ...
myStream.print("beforeFilter");


DataStream myFilteredStream = myStream.filter(new MyFilter());
myFilteredStream.print("afterFilter");


From the docs:
print() / printToErr() - Prints the toString() value of each element on the
standard out / standard error stream. Optionally, a prefix (msg) can be
provided which is prepended to the output. This can help to distinguish
between different calls to print. If the parallelism is greater than 1, the
output will also be prepended with the identifier of the task which produced
the output.
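
Depending on the Flink version, DataStream#print(String sinkIdentifier) may
already do exactly this; if your version lacks that overload, a trivial
custom sink gives the same effect. A sketch (PrefixPrintSink is a made-up
name, not a Flink class):

import org.apache.flink.streaming.api.functions.sink.SinkFunction;

// prints each element with a fixed prefix, mirroring print(msg)
public class PrefixPrintSink<T> implements SinkFunction<T> {
    private final String prefix;

    public PrefixPrintSink(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public void invoke(T value, Context context) {
        System.out.println(prefix + "> " + value);
    }
}

// usage:
// myStream.addSink(new PrefixPrintSink<>("beforeFilter"));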





Re: SinkFunction invoke method signature

2018-07-14 Thread Philip Doctor
> That is, whether the context interface is defined as Context<T> or
> Context<IN> makes no difference,
> since it is an interface, which is always static.


I don't think this is the case.  Context<T> is an inner interface; <T> has a
meaning in that scope, <IN> does not, so there's a very real difference.  When
you go to consume it, you have to consume Context<*> in order to meet the
requirements of the interface. In my example, I want to write:

override fun invoke(value: ByteArray, context: SinkFunction.Context<ByteArray>) {


but I can't, I have to write

override fun invoke(value: ByteArray, context: SinkFunction.Context<*>) {

In order to avoid a compile error and actually override the interface.


This means that with Context<*>, as a consumer, I have no type information
about Context, and need to just unsafely downcast if I wanted to use it.
This feels, at a minimum, like a confusing API to consume.  Can you provide
some guidance on how I would consume this other than unsafely downcasting
the contents of Context<*>?
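
(A Java-side sketch for comparison, not from the thread: Java still permits
the raw Context in the override, which Kotlin forbids, and none of Context's
methods involve the type parameter, so the wildcard costs nothing in
practice.)

import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public class ByteArraySink implements SinkFunction<byte[]> {
    @Override
    public void invoke(byte[] value, Context context) throws Exception {
        // Context's methods (timestamp(), currentWatermark(),
        // currentProcessingTime()) don't depend on the type parameter,
        // so no downcast is ever needed to call them.
        long ts = context.currentProcessingTime();
        System.out.println(ts + ": " + value.length);
    }
}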


physIQ





Multi-tenancy environment with mutual auth

2018-07-14 Thread ashish pok
All,
We are running into a blocking production deployment issue. It looks like
Flink's internal communication doesn't support SSL mutual authentication.
Are there any plans or ways to support it? Without it we will have to create
a DMZ for each tenant, which of course is not preferable.


- Ashish

Filtering and mapping data after window operator

2018-07-14 Thread Soheil Pourbafrani
Hi, I'm getting a data stream from a source, and after gathering data in a
time window I want to do some operations like filtering and mapping on the
windowed data. But the output of the time window operation only allows a
reduce, aggregate, or similar function, and after that I want to apply
functions like filter or map. How can I apply a filter function to the
windowed data without using a reduce function before it?

temp.keyBy(0)
    .timeWindow(Time.milliseconds(INTERVAL_TIME))
    .reduce(new ReduceFunction<Tuple2<Long, byte[]>>() {
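
One option (a sketch, not from the thread; it assumes the elements are
Tuple2<Long, byte[]> as in the snippet above): replace reduce() with a
ProcessWindowFunction, which receives all elements of the window as an
Iterable, so arbitrary filtering and mapping can happen inside it before
emitting results.

import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

temp.keyBy(0)
    .timeWindow(Time.milliseconds(INTERVAL_TIME))
    .process(new ProcessWindowFunction<Tuple2<Long, byte[]>, String, Tuple, TimeWindow>() {
        @Override
        public void process(Tuple key, Context ctx,
                Iterable<Tuple2<Long, byte[]>> elements,
                Collector<String> out) {
            for (Tuple2<Long, byte[]> e : elements) {
                if (e.f1.length > 0) {                      // the "filter"
                    out.collect(e.f0 + ":" + e.f1.length);  // the "map"
                }
            }
        }
    });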


reduce a data stream to other type

2018-07-14 Thread Soheil Pourbafrani
Hi, I have a keyed datastream of type Tuple2<Long, byte[]>, and I want to
reduce it and merge all of the byte[] for a key (the first field (Long) is
the key). So I need the reduce function to return the type
Tuple2<Long, List<byte[]>>, but the reduce function doesn't allow that! How
can I do such a job in Flink?
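
One way around this (a sketch, not from the thread; it assumes a windowed
aggregation and uses List<byte[]> as the merged type): AggregateFunction,
unlike ReduceFunction, lets the accumulator and output types differ from the
input type.

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;

// IN = Tuple2<Long, byte[]>, ACC = OUT = Tuple2<Long, List<byte[]>>
AggregateFunction<Tuple2<Long, byte[]>, Tuple2<Long, List<byte[]>>, Tuple2<Long, List<byte[]>>> merger =
        new AggregateFunction<Tuple2<Long, byte[]>, Tuple2<Long, List<byte[]>>, Tuple2<Long, List<byte[]>>>() {

    @Override
    public Tuple2<Long, List<byte[]>> createAccumulator() {
        return new Tuple2<>(null, new ArrayList<byte[]>());
    }

    @Override
    public Tuple2<Long, List<byte[]>> add(Tuple2<Long, byte[]> value,
            Tuple2<Long, List<byte[]>> acc) {
        acc.f0 = value.f0;    // remember the key
        acc.f1.add(value.f1); // collect this element's bytes
        return acc;
    }

    @Override
    public Tuple2<Long, List<byte[]>> getResult(Tuple2<Long, List<byte[]>> acc) {
        return acc;
    }

    @Override
    public Tuple2<Long, List<byte[]>> merge(Tuple2<Long, List<byte[]>> a,
            Tuple2<Long, List<byte[]>> b) {
        a.f1.addAll(b.f1);
        return a;
    }
};

// used e.g. as: stream.keyBy(0).timeWindow(...).aggregate(merger);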


understanding purpose of TextInputFormat

2018-07-14 Thread chrisr123
I'm building a streaming app that continuously monitors a directory for new
files, and I'm confused about why I have to specify a TextInputFormat - see
the source code below.  It seems redundant, but it is a required parameter.
It makes perfect sense to specify the directory I want to monitor, but what
purpose does the TextInputFormat serve, and what should I set it to?
Example: a simple word count app that reads lines of text.


TextInputFormat format = new TextInputFormat(
    new org.apache.flink.core.fs.Path("file:///tmp/dir/"));

DataStream<String> inputStream = env.readFile(
    format,
    "file:///tmp/dir/",
    FileProcessingMode.PROCESS_CONTINUOUSLY,
    100);


