Re: counting words (not frequency)

2016-07-22 Thread Roshan Naik
Seems a bit convoluted for such a simple problem. I am thinking a custom
streaming count() operator will simplify. Wasn¹t able to find examples for
custom Streaming operators.
-roshan


On 7/21/16, 8:00 PM, "hrajaram"  wrote:

>Can't you use a KeyedStream, I mean keyBy with the sameKey?  something
>like
>this,
>source.flatMap(new Tokenizer()).keyBy(0).sum(2).project(2).print();
>
>Assuming tokenizer is giving Tuple3
>
>1-> is always the same key, say "test"
>2->the actual word
>3-> 1
>
>
>
>There might be some other good choices but this is the first thing that
>quickly came in my mind :-)
>
>Hari
>
>
>
>
>--
>View this message in context:
>http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/counti
>ng-words-not-frequency-tp8099p8100.html
>Sent from the Apache Flink User Mailing List archive. mailing list
>archive at Nabble.com.
>



flink batch data processing

2016-07-22 Thread Paul Joireman
I'm evaluating for some processing batches of data.  As a simple example say I 
have 2000 points which I would like to pass through an FIR filter using 
functionality provided by the Python scipy libraryjk.  The scipy filter is a 
simple function which accepts a set of coefficients and the data to filter and 
returns the data.   Is is possible to create a transformation to handle this in 
flink?  It seems flink transformations are applied on a point by point basis 
but I may be missing something.

Paul


FlinkShell with standalone HA cluster

2016-07-22 Thread Scott Clasen
Hi All-

  I am having trouble using the FlinkShell against a standalone HA cluster
(recovery.mode: zookeeper)

 If I remove the zookeeper conf from flink-conf.yaml and restart the
cluster, I can execute stuff from the shell just fine. (One master is
running)

  Adding back the config, and restarting, the cluster is up, with
taskmanagers registered, etc, but even when pointing the FlinkShell at the
master and its dynamically chosen port, I get an error "he program
execution failed: Communication with JobManager failed: Lost connection to
the JobManager."

Is there a requirement that multiple masters be up, or some other way to
tell FlinkShell that the cluster is a standalone HA cluster?

Thanks!
SC


Re: Variable not initialized in the open() method of RichMapFunction

2016-07-22 Thread Stephan Ewen
Initializing in "open(Configuration)" means that the ObjectMapper is
created only in the cluster once the MapFunction is started.

Otherwise it is created before (on the client) and Serialization-copied
into the cluster, together with the MapFunction.

If the second approach works well (i.e., the ObjectMapper is serialization
friendly), then there is no downside to it.

On Fri, Jul 22, 2016 at 6:01 PM, Dong iL, Kim  wrote:

> declare objectMapper out of map class.
>
> final ObjectMapper objectMapper = new ObjectMapper();
>
> source.map(str -> objectMapper.readValue(value, Request.class));
>
> On Sat, Jul 23, 2016 at 12:28 AM, Yassin Marzouki 
> wrote:
>
>> Thank you Stephan and Kim, that solved the problem.
>> Just to make sure, is using a MapFunction as in the following code any
>> different? i.e. does it initialize the objectMapper for every element in
>> the stream?
>>
>> .map(new MapFunction() {
>>
>> private ObjectMapper objectMapper = new ObjectMapper();
>>
>> @Override
>>  public Request map(String value) throws Exception {
>>  return objectMapper.readValue(value, Request.class);
>> }
>> })
>>
>> On Fri, Jul 22, 2016 at 5:20 PM, Dong iL, Kim  wrote:
>>
>>> oops. stephan already answered.
>>> sorry. T^T
>>>
>>> On Sat, Jul 23, 2016 at 12:16 AM, Dong iL, Kim 
>>> wrote:
>>>
 is open method signature right? or typo?

 void open(Configuration parameters) throws Exception;

 On Sat, Jul 23, 2016 at 12:09 AM, Stephan Ewen 
 wrote:

> I think you overrode the open method with the wrong signature. The
> right signature would be "open(Configuration cfg) {...}". You probably
> overlooked this because you missed the "@Override" annotation.
>
> On Fri, Jul 22, 2016 at 4:49 PM, Yassin Marzouki  > wrote:
>
>> Hi everyone,
>>
>> I want to convert a stream of json strings to POJOs using Jackson, so
>> I did the following:
>>
>> .map(new RichMapFunction() {
>>
>> private ObjectMapper objectMapper;
>>
>> public void open() {
>> objectMapper = new ObjectMapper();
>> }
>>
>> @Override
>>  public Request map(String value) throws Exception {
>>  return objectMapper.readValue(value, Request.class);
>> }
>> })
>>
>> But this code gave me a NullPointerException because the objectMapper
>> was not initialized successfully.
>>
>> 1. Isn't the open() method supposed to be called before map() and
>> initialize objectMapper?
>> 2. I figured out that initializing objectMapper before the open()
>> method resolves the problem, and that it works also with a simple
>> MapFunction. In that case, is there an advantage for using a
>> RichMapFunction?
>>
>> Best,
>> Yassine
>>
>
>


 --
 http://www.kiva.org; TARGET="_top">
 http://www.kiva.org/images/bannerlong.png; WIDTH="460"
 HEIGHT="60" ALT="Kiva - loans that change lives" BORDER="0"
 ALIGN="BOTTOM">

>>>
>>>
>>>
>>> --
>>> http://www.kiva.org; TARGET="_top">
>>> http://www.kiva.org/images/bannerlong.png; WIDTH="460"
>>> HEIGHT="60" ALT="Kiva - loans that change lives" BORDER="0"
>>> ALIGN="BOTTOM">
>>>
>>
>>
>
>
> --
> http://www.kiva.org; TARGET="_top">
> http://www.kiva.org/images/bannerlong.png; WIDTH="460"
> HEIGHT="60" ALT="Kiva - loans that change lives" BORDER="0"
> ALIGN="BOTTOM">
>


Re: Variable not initialized in the open() method of RichMapFunction

2016-07-22 Thread Dong iL, Kim
declare objectMapper out of map class.

final ObjectMapper objectMapper = new ObjectMapper();

source.map(str -> objectMapper.readValue(value, Request.class));

On Sat, Jul 23, 2016 at 12:28 AM, Yassin Marzouki 
wrote:

> Thank you Stephan and Kim, that solved the problem.
> Just to make sure, is using a MapFunction as in the following code any
> different? i.e. does it initialize the objectMapper for every element in
> the stream?
>
> .map(new MapFunction() {
>
> private ObjectMapper objectMapper = new ObjectMapper();
>
> @Override
>  public Request map(String value) throws Exception {
>  return objectMapper.readValue(value, Request.class);
> }
> })
>
> On Fri, Jul 22, 2016 at 5:20 PM, Dong iL, Kim  wrote:
>
>> oops. stephan already answered.
>> sorry. T^T
>>
>> On Sat, Jul 23, 2016 at 12:16 AM, Dong iL, Kim 
>> wrote:
>>
>>> is open method signature right? or typo?
>>>
>>> void open(Configuration parameters) throws Exception;
>>>
>>> On Sat, Jul 23, 2016 at 12:09 AM, Stephan Ewen  wrote:
>>>
 I think you overrode the open method with the wrong signature. The
 right signature would be "open(Configuration cfg) {...}". You probably
 overlooked this because you missed the "@Override" annotation.

 On Fri, Jul 22, 2016 at 4:49 PM, Yassin Marzouki 
 wrote:

> Hi everyone,
>
> I want to convert a stream of json strings to POJOs using Jackson, so
> I did the following:
>
> .map(new RichMapFunction() {
>
> private ObjectMapper objectMapper;
>
> public void open() {
> objectMapper = new ObjectMapper();
> }
>
> @Override
>  public Request map(String value) throws Exception {
>  return objectMapper.readValue(value, Request.class);
> }
> })
>
> But this code gave me a NullPointerException because the objectMapper
> was not initialized successfully.
>
> 1. Isn't the open() method supposed to be called before map() and
> initialize objectMapper?
> 2. I figured out that initializing objectMapper before the open()
> method resolves the problem, and that it works also with a simple
> MapFunction. In that case, is there an advantage for using a
> RichMapFunction?
>
> Best,
> Yassine
>


>>>
>>>
>>> --
>>> http://www.kiva.org; TARGET="_top">
>>> http://www.kiva.org/images/bannerlong.png; WIDTH="460"
>>> HEIGHT="60" ALT="Kiva - loans that change lives" BORDER="0"
>>> ALIGN="BOTTOM">
>>>
>>
>>
>>
>> --
>> http://www.kiva.org; TARGET="_top">
>> http://www.kiva.org/images/bannerlong.png; WIDTH="460"
>> HEIGHT="60" ALT="Kiva - loans that change lives" BORDER="0"
>> ALIGN="BOTTOM">
>>
>
>


-- 
http://www.kiva.org; TARGET="_top">
http://www.kiva.org/images/bannerlong.png; WIDTH="460"
HEIGHT="60" ALT="Kiva - loans that change lives" BORDER="0"
ALIGN="BOTTOM">


Re: Variable not initialized in the open() method of RichMapFunction

2016-07-22 Thread Yassin Marzouki
Thank you Stephan and Kim, that solved the problem.
Just to make sure, is using a MapFunction as in the following code any
different? i.e. does it initialize the objectMapper for every element in
the stream?

.map(new MapFunction() {

private ObjectMapper objectMapper = new ObjectMapper();

@Override
 public Request map(String value) throws Exception {
 return objectMapper.readValue(value, Request.class);
}
})

On Fri, Jul 22, 2016 at 5:20 PM, Dong iL, Kim  wrote:

> oops. stephan already answered.
> sorry. T^T
>
> On Sat, Jul 23, 2016 at 12:16 AM, Dong iL, Kim  wrote:
>
>> is open method signature right? or typo?
>>
>> void open(Configuration parameters) throws Exception;
>>
>> On Sat, Jul 23, 2016 at 12:09 AM, Stephan Ewen  wrote:
>>
>>> I think you overrode the open method with the wrong signature. The right
>>> signature would be "open(Configuration cfg) {...}". You probably overlooked
>>> this because you missed the "@Override" annotation.
>>>
>>> On Fri, Jul 22, 2016 at 4:49 PM, Yassin Marzouki 
>>> wrote:
>>>
 Hi everyone,

 I want to convert a stream of json strings to POJOs using Jackson, so I
 did the following:

 .map(new RichMapFunction() {

 private ObjectMapper objectMapper;

 public void open() {
 objectMapper = new ObjectMapper();
 }

 @Override
  public Request map(String value) throws Exception {
  return objectMapper.readValue(value, Request.class);
 }
 })

 But this code gave me a NullPointerException because the objectMapper
 was not initialized successfully.

 1. Isn't the open() method supposed to be called before map() and
 initialize objectMapper?
 2. I figured out that initializing objectMapper before the open()
 method resolves the problem, and that it works also with a simple
 MapFunction. In that case, is there an advantage for using a
 RichMapFunction?

 Best,
 Yassine

>>>
>>>
>>
>>
>> --
>> http://www.kiva.org; TARGET="_top">
>> http://www.kiva.org/images/bannerlong.png; WIDTH="460"
>> HEIGHT="60" ALT="Kiva - loans that change lives" BORDER="0"
>> ALIGN="BOTTOM">
>>
>
>
>
> --
> http://www.kiva.org; TARGET="_top">
> http://www.kiva.org/images/bannerlong.png; WIDTH="460"
> HEIGHT="60" ALT="Kiva - loans that change lives" BORDER="0"
> ALIGN="BOTTOM">
>


Re: Variable not initialized in the open() method of RichMapFunction

2016-07-22 Thread Dong iL, Kim
oops. stephan already answered.
sorry. T^T

On Sat, Jul 23, 2016 at 12:16 AM, Dong iL, Kim  wrote:

> is open method signature right? or typo?
>
> void open(Configuration parameters) throws Exception;
>
> On Sat, Jul 23, 2016 at 12:09 AM, Stephan Ewen  wrote:
>
>> I think you overrode the open method with the wrong signature. The right
>> signature would be "open(Configuration cfg) {...}". You probably overlooked
>> this because you missed the "@Override" annotation.
>>
>> On Fri, Jul 22, 2016 at 4:49 PM, Yassin Marzouki 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I want to convert a stream of json strings to POJOs using Jackson, so I
>>> did the following:
>>>
>>> .map(new RichMapFunction() {
>>>
>>> private ObjectMapper objectMapper;
>>>
>>> public void open() {
>>> objectMapper = new ObjectMapper();
>>> }
>>>
>>> @Override
>>>  public Request map(String value) throws Exception {
>>>  return objectMapper.readValue(value, Request.class);
>>> }
>>> })
>>>
>>> But this code gave me a NullPointerException because the objectMapper
>>> was not initialized successfully.
>>>
>>> 1. Isn't the open() method supposed to be called before map() and
>>> initialize objectMapper?
>>> 2. I figured out that initializing objectMapper before the open() method
>>> resolves the problem, and that it works also with a simple MapFunction. In
>>> that case, is there an advantage for using a RichMapFunction?
>>>
>>> Best,
>>> Yassine
>>>
>>
>>
>
>
> --
> http://www.kiva.org; TARGET="_top">
> http://www.kiva.org/images/bannerlong.png; WIDTH="460"
> HEIGHT="60" ALT="Kiva - loans that change lives" BORDER="0"
> ALIGN="BOTTOM">
>



-- 
http://www.kiva.org; TARGET="_top">
http://www.kiva.org/images/bannerlong.png; WIDTH="460"
HEIGHT="60" ALT="Kiva - loans that change lives" BORDER="0"
ALIGN="BOTTOM">


Re: Variable not initialized in the open() method of RichMapFunction

2016-07-22 Thread Dong iL, Kim
is open method signature right? or typo?

void open(Configuration parameters) throws Exception;

On Sat, Jul 23, 2016 at 12:09 AM, Stephan Ewen  wrote:

> I think you overrode the open method with the wrong signature. The right
> signature would be "open(Configuration cfg) {...}". You probably overlooked
> this because you missed the "@Override" annotation.
>
> On Fri, Jul 22, 2016 at 4:49 PM, Yassin Marzouki 
> wrote:
>
>> Hi everyone,
>>
>> I want to convert a stream of json strings to POJOs using Jackson, so I
>> did the following:
>>
>> .map(new RichMapFunction() {
>>
>> private ObjectMapper objectMapper;
>>
>> public void open() {
>> objectMapper = new ObjectMapper();
>> }
>>
>> @Override
>>  public Request map(String value) throws Exception {
>>  return objectMapper.readValue(value, Request.class);
>> }
>> })
>>
>> But this code gave me a NullPointerException because the objectMapper was
>> not initialized successfully.
>>
>> 1. Isn't the open() method supposed to be called before map() and
>> initialize objectMapper?
>> 2. I figured out that initializing objectMapper before the open() method
>> resolves the problem, and that it works also with a simple MapFunction. In
>> that case, is there an advantage for using a RichMapFunction?
>>
>> Best,
>> Yassine
>>
>
>


-- 
http://www.kiva.org; TARGET="_top">
http://www.kiva.org/images/bannerlong.png; WIDTH="460"
HEIGHT="60" ALT="Kiva - loans that change lives" BORDER="0"
ALIGN="BOTTOM">


Re: Variable not initialized in the open() method of RichMapFunction

2016-07-22 Thread Stephan Ewen
I think you overrode the open method with the wrong signature. The right
signature would be "open(Configuration cfg) {...}". You probably overlooked
this because you missed the "@Override" annotation.

On Fri, Jul 22, 2016 at 4:49 PM, Yassin Marzouki 
wrote:

> Hi everyone,
>
> I want to convert a stream of json strings to POJOs using Jackson, so I
> did the following:
>
> .map(new RichMapFunction() {
>
> private ObjectMapper objectMapper;
>
> public void open() {
> objectMapper = new ObjectMapper();
> }
>
> @Override
>  public Request map(String value) throws Exception {
>  return objectMapper.readValue(value, Request.class);
> }
> })
>
> But this code gave me a NullPointerException because the objectMapper was
> not initialized successfully.
>
> 1. Isn't the open() method supposed to be called before map() and
> initialize objectMapper?
> 2. I figured out that initializing objectMapper before the open() method
> resolves the problem, and that it works also with a simple MapFunction. In
> that case, is there an advantage for using a RichMapFunction?
>
> Best,
> Yassine
>


Variable not initialized in the open() method of RichMapFunction

2016-07-22 Thread Yassin Marzouki
Hi everyone,

I want to convert a stream of json strings to POJOs using Jackson, so I did
the following:

.map(new RichMapFunction() {

private ObjectMapper objectMapper;

public void open() {
objectMapper = new ObjectMapper();
}

@Override
 public Request map(String value) throws Exception {
 return objectMapper.readValue(value, Request.class);
}
})

But this code gave me a NullPointerException because the objectMapper was
not initialized successfully.

1. Isn't the open() method supposed to be called before map() and
initialize objectMapper?
2. I figured out that initializing objectMapper before the open() method
resolves the problem, and that it works also with a simple MapFunction. In
that case, is there an advantage for using a RichMapFunction?

Best,
Yassine


AW: Getting the NumberOfParallelSubtask

2016-07-22 Thread Paschek, Robert
Hi Chesnay, hi Robert

Thank you for your explanations : - )
(And sorry for the late reply).

Regards,
Robert

Von: Robert Metzger [mailto:rmetz...@apache.org]
Gesendet: Dienstag, 21. Juni 2016 12:12
An: user@flink.apache.org
Betreff: Re: Getting the NumberOfParallelSubtask

Hi Robert,

the number of parallel subtasks is the parallelism of the job or the individual 
operator. Only when executing Flink locally, the parallelism is set to the CPU 
cores.
The number of groups generated by the groupBy() transformation doesn't affect 
the parallelism. Very often the number of groups is much higher than the 
parallelism, in those cases, each parallel instance will process multiple 
groups.

If you want to know the parallelism of your operators globally, you'll need to 
set it manually (say all operators to a parallelism of 8).

Regards,
Robert


On Mon, Jun 20, 2016 at 10:00 PM, Chesnay Schepler 
> wrote:
Within the mapper you cannot access the parallelism of the following nor 
preceding operation.


On 20.06.2016 15:56, Paschek, Robert wrote:
Hi Mailing list,

using a RichMapPartitionFunction i can access the total number m of this mapper 
utilized in my job with
int m = getRuntimeContext().getNumberOfParallelSubtasks();

I think that would be - in general - the total number of CPU Cores used by 
Apache Flink among the cluster.

Is there a way to access the number of the following reducer?

In general i would assume that the number of the following reducers depends on 
the number of groups generated by the groupBy() transformation. So the number 
of the reducer r would be 1 <= r <= m.

My Job:
DataSet output = input
.mapPartition(new MR_GPMRS_Mapper())
.groupBy(0)
.reduceGroup(new MR_GPMRS_Reducer());

Thank you in advance
Robert




Re: State in external db (dynamodb)

2016-07-22 Thread Josh
Hi all,

>(1)  Only write to the DB upon a checkpoint, at which point it is known
that no replay of that data will occur any more. Values from partially
successful writes will be overwritten >with correct value. I assume that is
what you thought of when referring to the State Backend, because in some sense,
that is what that state backend would do.

>I think it is simpler to realize that in a custom sink, than developing a
new state backend.  Another Flink committer (Chesnay) has developed some
nice tooling for that, to >be merged into Flink soon.

I am planning to implement something like this:

Say I have a topology which looks like this: [source => operator => sink],
I would like it to work like this:
1. Upon receiving an element, the operator retrieves some state from an
external key-value store (would like to put an in-memory cache on top of
this with a TTL)
2. The operator emits a new state (and updates its in-memory cache with the
new state)
3. The sink batches up all the new states and upon checkpoint flushes them
to the external store

Could anyone point me at the work that's already been done on this? Has it
already been merged into Flink?

Thanks,
Josh

On Thu, Apr 7, 2016 at 12:53 PM, Aljoscha Krettek 
wrote:

> Hi,
> regarding windows and incremental aggregation. This is already happening
> in Flink as of now. When you give a ReduceFunction on a window, which "sum"
> internally does, the result for a window is incrementally updated whenever
> a new element comes in. This incremental aggregation only happens when you
> specify a ReduceFunction or a FoldFunction, not for the general case of a
> WindowFunction, where all elements in the window are required.
>
> You are right about incremental snapshots. We mainly want to introduce
> them to reduce latency incurred by snapshotting. Right now, processing
> stalls when a checkpoint happens.
>
> Cheers,
> Aljoscha
>
> On Thu, 7 Apr 2016 at 13:12 Shannon Carey  wrote:
>
>> Thanks very kindly for your response, Stephan!
>>
>> We will definitely use a custom sink for persistence of idempotent
>> mutations whenever possible. Exposing state as read-only to external
>> systems is a complication we will try to avoid. Also, we will definitely
>> only write to the DB upon checkpoint, and the write will be synchronous and
>> transactional (no possibility of partial success/failure).
>>
>> However, we do want Flink state to be durable, we want it to be in memory
>> when possible, and we want to avoid running out of memory due to the size
>> of the state. For example, if you have a wide window that hasn't gotten an
>> event for a long time, we want to evict that window state from memory.
>> We're now thinking of using Redis (via AWS Elasticache) which also
>> conveniently has TTL, instead of DynamoDB.
>>
>> I just wanted to check whether eviction of (inactive/quiet) state from
>> memory is something that I should consider implementing, or whether Flink
>> already had some built-in way of doing it.
>>
>> Along the same lines, I am also wondering whether Flink already has means
>> of compacting the state of a window by applying an aggregation function to
>> the elements so-far (eg. every time window is triggered)? For example, if
>> you are only executing a sum on the contents of the window, the window
>> state doesn't need to store all the individual items in the window, it only
>> needs to store the sum. Aggregations other than "sum" might have that
>> characteristic too. I don't know if Flink is already that intelligent or
>> whether I should figure out how to aggregate window contents myself when
>> possible with something like a window fold? Another poster (Aljoscha) was
>> talking about adding incremental snapshots, but it sounds like that would
>> only improve the write throughput not the memory usage.
>>
>> Thanks again!
>> Shannon Carey
>>
>>
>> From: Stephan Ewen 
>> Date: Wednesday, April 6, 2016 at 10:37 PM
>> To: 
>> Subject: Re: State in external db (dynamodb)
>>
>> Hi Shannon!
>>
>> Welcome to the Flink community!
>>
>> You are right, sinks need in general to be idempotent if you want
>> "exactly-once" semantics, because there can be a replay of elements that
>> were already written.
>>
>> However, what you describe later, overwriting of a key with a new value
>> (or the same value again) is pretty much sufficient. That means that when a
>> duplicate write happens during replay, the value for the key is simply
>> overwritten with the same value again.
>> As long as all computation is purely in Flink and you only write to the
>> key/value store (rather than read from k/v, modify in Flink, write to k/v),
>> you get the consistency that for example counts/aggregates never have
>> duplicates.
>>
>> If Flink needs to look up state from the database (because it is no
>> longer in Flink), it is a bit more tricky. I assume that is where you are
>> going with "Subsequently, 

Re: Logical plan optimization with Calcite

2016-07-22 Thread gallenvara
Thanks Max and Timo for the explanation. :)



--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Logical-plan-optimization-with-Calcite-tp8037p8106.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.