Re: [Dev] Dependencies issue related to implementing InputFormat Interface

2017-01-17 Thread Fabian Hueske
Hi Pawan,

If you want to read a file, you might want to extend the FileInputFormat
class. It already has a lot of file-related functionality implemented.
OT is the type of the records produced by the InputFormat, for example
Tuple2<String, Integer> if the input format produces tuples with two fields
of String and Integer types.

Best, Fabian

2017-01-18 4:52 GMT+01:00 Pawan Manishka Gunarathna <
pawan.manis...@gmail.com>:

> Hi,
> Yeah, I also wrote it the way you have written:
>
> public class ReadFromFile implements InputFormat{
> }
>
> Is that a problem with that declaration or dependencies ?
>
> Thanks,
> Pawan
>
> On Tue, Jan 17, 2017 at 7:56 PM, Chesnay Schepler 
> wrote:
>
> > Hello,
> >
> > Did you write something like this?
> >
> > >public class MyInputFormat implements InputFormat<MyType, InputSplit> {
> > >
> > >}
> >
> > Regards,
> > Chesnay
> >
> > On 17.01.2017 04:18, Pawan Manishka Gunarathna wrote:
> >
> >> Hi,
> >>
> >> I'm currently working on the Flink InputFormat interface implementation. I'm
> >> writing a Java program to read data from a file using the InputFormat
> >> interface. I used a Maven project and added the following dependencies
> to
> >> the pom.xml.
> >>
> >> <dependencies>
> >>     <dependency>
> >>         <groupId>org.apache.flink</groupId>
> >>         <artifactId>flink-core</artifactId>
> >>         <version>1.1.4</version>
> >>     </dependency>
> >>
> >>     <dependency>
> >>         <groupId>org.apache.flink</groupId>
> >>         <artifactId>flink-clients_2.11</artifactId>
> >>         <version>1.1.4</version>
> >>     </dependency>
> >>
> >>     <dependency>
> >>         <groupId>org.apache.flink</groupId>
> >>         <artifactId>flink-java</artifactId>
> >>         <version>1.1.4</version>
> >>     </dependency>
> >> </dependencies>
> >>
> >>
> >> I have a Java class that implements InputFormat. It works with the raw
> >> *InputFormat* type, but it didn't allow me to use
> >> *InputFormat<OT, T extends InputSplit>*. That
> >> OT type parameter wasn't recognized.
> >>
> >> Any help to solve this problem would be appreciated.
> >>
> >> Thanks,
> >> Pawan
> >>
> >>
> >
>
>
> --
>
> *Pawan Gunaratne*
> *Mob: +94 770373556*
>
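For readers hitting the same generics confusion: the OT parameter is simply the record type the format produces. The following pure-JDK sketch uses simplified stand-ins for Flink's `InputSplit` and `InputFormat<OT, T extends InputSplit>` (these are NOT the real Flink interfaces, only a mirror of the generic signature) to show where OT goes:

```java
// Simplified stand-ins for Flink's InputSplit and InputFormat<OT, T extends InputSplit>.
// They only mirror the generic signature to show how the type parameters line up.
interface Split { int getSplitNumber(); }

interface SimpleInputFormat<OT, T extends Split> {
    boolean reachedEnd();
    OT nextRecord(OT reuse);
}

// A format producing String records: OT is bound to String here.
class LineFormat implements SimpleInputFormat<String, Split> {
    private final String[] lines;
    private int pos = 0;

    LineFormat(String... lines) { this.lines = lines; }

    @Override
    public boolean reachedEnd() { return pos >= lines.length; }

    @Override
    public String nextRecord(String reuse) { return lines[pos++]; }
}

public class InputFormatSketch {
    public static void main(String[] args) {
        LineFormat format = new LineFormat("a", "b");
        StringBuilder out = new StringBuilder();
        while (!format.reachedEnd()) {
            out.append(format.nextRecord(null));
        }
        System.out.println(out); // prints "ab"
    }
}
```

With the real Flink interface, the declaration would follow the same pattern: the first type parameter is the produced record type, the second a subtype of InputSplit.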


[jira] [Created] (FLINK-5545) remove FlinkAggregateExpandDistinctAggregatesRule when we upgrade calcite

2017-01-17 Thread Zhenghua Gao (JIRA)
Zhenghua Gao created FLINK-5545:
---

 Summary: remove FlinkAggregateExpandDistinctAggregatesRule when we 
upgrade calcite
 Key: FLINK-5545
 URL: https://issues.apache.org/jira/browse/FLINK-5545
 Project: Flink
  Issue Type: Bug
Reporter: Zhenghua Gao
Assignee: Zhenghua Gao
Priority: Minor


We copied Calcite's AggregateExpandDistinctAggregatesRule into the Flink project and 
applied a quick fix to avoid some bad cases mentioned in CALCITE-1558.
We should drop it and use Calcite's AggregateExpandDistinctAggregatesRule once we 
upgrade to Calcite 1.12 (or above).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [jira] [Created] (FLINK-5544) Implement Internal Timer Service in RocksDB

2017-01-17 Thread SHI Xiaogang
Hi Florian,

The memory usage depends on the types of keys and namespaces.

We have not experienced that many concurrently open windows.
But given that each open window needs some tens to hundreds of bytes for its timer
(key, namespace, and timestamp), 2M open windows may cost up to hundreds of MB.

Regards
Xiaogang
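A back-of-the-envelope check of that estimate (the 100 bytes per timer is an assumption for illustration; the actual size depends on the key and namespace types):

```java
public class TimerMemoryEstimate {
    public static void main(String[] args) {
        long openWindows = 2_000_000L;  // 2M concurrently open windows
        long bytesPerTimer = 100L;      // assumed: key + namespace + timestamp + heap overhead
        long totalBytes = openWindows * bytesPerTimer;
        System.out.println(totalBytes / (1024 * 1024) + " MB"); // prints "190 MB"
    }
}
```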

2017-01-18 14:45 GMT+08:00 Florian König :

> Hi,
>
> that sounds very useful. We are using quite a lot of timers in custom
> windows. Does anybody have experience with the memory requirements of,
> let’s say, 2 million concurrently open windows and the associated timers?
>
> Thanks
> Florian
>
> > Am 18.01.2017 um 04:40 schrieb Xiaogang Shi (JIRA) :
> >
> > Xiaogang Shi created FLINK-5544:
> > ---
> >
> > Summary: Implement Internal Timer Service in RocksDB
> > Key: FLINK-5544
> > URL: https://issues.apache.org/jira/browse/FLINK-5544
> > Project: Flink
> >  Issue Type: Bug
> >  Components: Streaming
> >Reporter: Xiaogang Shi
> >
> >
> > Now the only implementation of internal timer service is
> HeapInternalTimerService which stores all timers in memory. In the cases
> where the number of keys is very large, the timer service will cost too
> much memory. An implementation which stores timers in RocksDB seems well
> suited to deal with these cases.
> >
> > It might be a little challenging to implement a RocksDB timer service
> because the timers are accessed in different ways. When timers are
> triggered, we need to access timers in the order of timestamp. But when
> performing checkpoints, we must have a method to obtain all timers of a
> given key group.
> >
> > A good implementation, as suggested by [~StephanEwen], follows the idea
> of merge sorting. We can store timers in RocksDB with the format
> {{KEY_GROUP#TIMER#KEY}}. In this way, the timers under a key group are put
> together and are sorted.
> >
> > Then we can deploy an in-memory heap which keeps the first timer of each
> key group to get the next timer to trigger. When a key group's first timer
> is updated, we can efficiently update the heap.
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v6.3.4#6332)
>
>
>

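The design sketched in the JIRA description above (timers stored sorted per key group, plus an in-memory heap holding each group's first timer) can be modeled with plain JDK collections. RocksDB's sorted byte-key iteration is imitated here by a TreeMap, and all names are illustrative, not Flink code:

```java
import java.util.PriorityQueue;
import java.util.TreeMap;

public class TimerServiceSketch {
    // Imitates RocksDB's sorted iteration: keys sorted as (keyGroup, timestamp, key),
    // mirroring the KEY_GROUP#TIMER#KEY layout from the JIRA description.
    static class TimerStore {
        final TreeMap<String, Boolean> db = new TreeMap<>();

        static String compositeKey(int keyGroup, long timestamp, String key) {
            // Zero-padded so that lexicographic order equals numeric order.
            return String.format("%04d#%019d#%s", keyGroup, timestamp, key);
        }

        void register(int keyGroup, long timestamp, String key) {
            db.put(compositeKey(keyGroup, timestamp, key), Boolean.TRUE);
        }

        // Earliest timer of a key group, or null if the group has none.
        String firstTimerOf(int keyGroup) {
            String from = String.format("%04d#", keyGroup);
            String entry = db.ceilingKey(from);
            return (entry != null && entry.startsWith(from)) ? entry : null;
        }
    }

    public static void main(String[] args) {
        TimerStore store = new TimerStore();
        store.register(1, 5000L, "a");
        store.register(1, 1000L, "b");
        store.register(2, 3000L, "c");

        // In-memory heap keeping only the first timer of each key group:
        // the overall next timer to trigger is the heap's head.
        PriorityQueue<String> heads = new PriorityQueue<>(
            (x, y) -> x.split("#")[1].compareTo(y.split("#")[1]));
        for (int group : new int[]{1, 2}) {
            String first = store.firstTimerOf(group);
            if (first != null) heads.add(first);
        }
        System.out.println(heads.peek()); // group 1's timer at t=1000 fires first
    }
}
```

When a key group's first timer changes, only that group's entry in the heap needs updating, which is what keeps the heap small (one entry per key group, not per timer).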

Re: [jira] [Created] (FLINK-5544) Implement Internal Timer Service in RocksDB

2017-01-17 Thread Florian König
Hi,

that sounds very useful. We are using quite a lot of timers in custom windows. 
Does anybody have experience with the memory requirements of, let’s say, 2 
million concurrently open windows and the associated timers?

Thanks
Florian

> Am 18.01.2017 um 04:40 schrieb Xiaogang Shi (JIRA) :
> 
> Xiaogang Shi created FLINK-5544:
> ---
> 
> Summary: Implement Internal Timer Service in RocksDB
> Key: FLINK-5544
> URL: https://issues.apache.org/jira/browse/FLINK-5544
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Xiaogang Shi
> 
> 
> Now the only implementation of internal timer service is 
> HeapInternalTimerService which stores all timers in memory. In the cases 
> where the number of keys is very large, the timer service will cost too much 
> memory. An implementation which stores timers in RocksDB seems well suited 
> to deal with these cases.
> 
> It might be a little challenging to implement a RocksDB timer service because 
> the timers are accessed in different ways. When timers are triggered, we need 
> to access timers in the order of timestamp. But when performing checkpoints, 
> we must have a method to obtain all timers of a given key group.
> 
> A good implementation, as suggested by [~StephanEwen], follows the idea of 
> merge sorting. We can store timers in RocksDB with the format 
> {{KEY_GROUP#TIMER#KEY}}. In this way, the timers under a key group are put 
> together and are sorted. 
> 
> Then we can deploy an in-memory heap which keeps the first timer of each key 
> group to get the next timer to trigger. When a key group's first timer is 
> updated, we can efficiently update the heap.
> 
> 
> 
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)




Re: [Dev] Dependencies issue related to implementing InputFormat Interface

2017-01-17 Thread Pawan Manishka Gunarathna
Hi,
Yeah, I also wrote it the way you have written:

public class ReadFromFile implements InputFormat{
}

Is that a problem with that declaration or dependencies ?

Thanks,
Pawan

On Tue, Jan 17, 2017 at 7:56 PM, Chesnay Schepler 
wrote:

> Hello,
>
> Did you write something like this?
>
>public class MyInputFormat implements InputFormat<MyType, InputSplit> {
>
>}
>
> Regards,
> Chesnay
>
> On 17.01.2017 04:18, Pawan Manishka Gunarathna wrote:
>
>> Hi,
>>
>> I'm currently working on the Flink InputFormat interface implementation. I'm
>> writing a Java program to read data from a file using the InputFormat
>> interface. I used a Maven project and added the following dependencies to
>> the pom.xml.
>>
>> <dependencies>
>>     <dependency>
>>         <groupId>org.apache.flink</groupId>
>>         <artifactId>flink-core</artifactId>
>>         <version>1.1.4</version>
>>     </dependency>
>>
>>     <dependency>
>>         <groupId>org.apache.flink</groupId>
>>         <artifactId>flink-clients_2.11</artifactId>
>>         <version>1.1.4</version>
>>     </dependency>
>>
>>     <dependency>
>>         <groupId>org.apache.flink</groupId>
>>         <artifactId>flink-java</artifactId>
>>         <version>1.1.4</version>
>>     </dependency>
>> </dependencies>
>>
>>
>> I have a Java class that implements InputFormat. It works with the raw
>> *InputFormat* type, but it didn't allow me to use
>> *InputFormat<OT, T extends InputSplit>*. That
>> OT type parameter wasn't recognized.
>>
>> Any help to solve this problem would be appreciated.
>>
>> Thanks,
>> Pawan
>>
>>
>


-- 

*Pawan Gunaratne*
*Mob: +94 770373556*


[jira] [Created] (FLINK-5544) Implement Internal Timer Service in RocksDB

2017-01-17 Thread Xiaogang Shi (JIRA)
Xiaogang Shi created FLINK-5544:
---

 Summary: Implement Internal Timer Service in RocksDB
 Key: FLINK-5544
 URL: https://issues.apache.org/jira/browse/FLINK-5544
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Xiaogang Shi


Now the only implementation of internal timer service is 
HeapInternalTimerService which stores all timers in memory. In the cases where 
the number of keys is very large, the timer service will cost too much memory. 
An implementation which stores timers in RocksDB seems well suited to deal with 
these cases.

It might be a little challenging to implement a RocksDB timer service because 
the timers are accessed in different ways. When timers are triggered, we need 
to access timers in the order of timestamp. But when performing checkpoints, we 
must have a method to obtain all timers of a given key group.

A good implementation, as suggested by [~StephanEwen], follows the idea of 
merge sorting. We can store timers in RocksDB with the format 
{{KEY_GROUP#TIMER#KEY}}. In this way, the timers under a key group are put 
together and are sorted. 

Then we can deploy an in-memory heap which keeps the first timer of each key 
group to get the next timer to trigger. When a key group's first timer is 
updated, we can efficiently update the heap.





[jira] [Created] (FLINK-5543) customCommandLine tips in CliFrontend

2017-01-17 Thread shijinkui (JIRA)
shijinkui created FLINK-5543:


 Summary: customCommandLine tips in CliFrontend
 Key: FLINK-5543
 URL: https://issues.apache.org/jira/browse/FLINK-5543
 Project: Flink
  Issue Type: Improvement
  Components: Client
Reporter: shijinkui


Tip: DefaultCLI must be added at the end, because 
getActiveCustomCommandLine(..) returns the first active CustomCommandLine in 
order, and DefaultCLI's isActive always returns true.
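The ordering constraint can be illustrated with a first-match lookup (names are illustrative, not the actual CliFrontend code): because the default's isActive always returns true, anything added after it is unreachable.

```java
import java.util.ArrayList;
import java.util.List;

public class CommandLineOrderSketch {
    interface CustomCommandLine {
        boolean isActive(String[] args);
        String name();
    }

    // Returns the FIRST active command line, mirroring getActiveCustomCommandLine(..).
    static CustomCommandLine firstActive(List<CustomCommandLine> clis, String[] args) {
        for (CustomCommandLine cli : clis) {
            if (cli.isActive(args)) return cli;
        }
        throw new IllegalStateException("no active command line");
    }

    public static void main(String[] args) {
        CustomCommandLine yarn = new CustomCommandLine() {
            public boolean isActive(String[] a) { return a.length > 0 && a[0].equals("-yarn"); }
            public String name() { return "yarn"; }
        };
        CustomCommandLine def = new CustomCommandLine() {
            public boolean isActive(String[] a) { return true; } // always active
            public String name() { return "default"; }
        };

        List<CustomCommandLine> clis = new ArrayList<>();
        clis.add(yarn);
        clis.add(def); // must come last, or "yarn" could never be selected
        System.out.println(firstActive(clis, new String[]{"-yarn"}).name()); // prints "yarn"
        System.out.println(firstActive(clis, new String[]{}).name());        // prints "default"
    }
}
```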





[jira] [Created] (FLINK-5542) YARN client incorrectly uses local YARN config to check vcore capacity

2017-01-17 Thread Shannon Carey (JIRA)
Shannon Carey created FLINK-5542:


 Summary: YARN client incorrectly uses local YARN config to check 
vcore capacity
 Key: FLINK-5542
 URL: https://issues.apache.org/jira/browse/FLINK-5542
 Project: Flink
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.1.4
Reporter: Shannon Carey


See 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/1-1-4-on-YARN-vcores-change-td11016.html

When using bin/yarn-session.sh, AbstractYarnClusterDescriptor line 271 in 1.1.4 
is comparing the user's selected number of vcores to the vcores configured in 
the local node's YARN config (from YarnConfiguration eg. yarn-site.xml and 
yarn-default.xml). It incorrectly prevents Flink from launching even if there 
is sufficient vcore capacity on the cluster.

That is not correct, because the application will not necessarily run on the 
local node. For example, if running the yarn-session.sh client from the AWS EMR 
master node, the vcore count there may be different from the vcore count on the 
core nodes where Flink will actually run.

A reasonable way to fix this would probably be to reuse the logic from 
"yarn-session.sh -q" (FlinkYarnSessionCli line 550) which knows how to get 
vcore information from the real worker nodes.  Alternatively, perhaps we could 
remove the check entirely and rely on YARN's Scheduler to determine whether 
sufficient resources exist.
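The distinction the report draws (requested vcores vs. the local node's configured maximum vs. the capacity of the actual worker nodes) can be sketched as follows; the method and variable names are illustrative, not Flink's actual code:

```java
public class VcoreCheckSketch {
    // The problematic check: compares the request against the LOCAL node's
    // yarn-site.xml value, which may differ from the worker nodes' capacity.
    static boolean localCheck(int requestedVcores, int localConfiguredMax) {
        return requestedVcores <= localConfiguredMax;
    }

    // The suggested fix: compare against the largest capacity reported by
    // the actual worker nodes (as "yarn-session.sh -q" already queries).
    static boolean clusterCheck(int requestedVcores, int[] workerNodeVcores) {
        int clusterMax = 0;
        for (int v : workerNodeVcores) clusterMax = Math.max(clusterMax, v);
        return requestedVcores <= clusterMax;
    }

    public static void main(String[] args) {
        // EMR-like scenario: master node has 4 vcores, core nodes have 16.
        int requested = 8;
        System.out.println(localCheck(requested, 4));                   // false: wrongly rejected
        System.out.println(clusterCheck(requested, new int[]{16, 16})); // true: would fit
    }
}
```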





[jira] [Created] (FLINK-5541) Missing null check for localJar in FlinkSubmitter#submitTopology()

2017-01-17 Thread Ted Yu (JIRA)
Ted Yu created FLINK-5541:
-

 Summary: Missing null check for localJar in 
FlinkSubmitter#submitTopology()
 Key: FLINK-5541
 URL: https://issues.apache.org/jira/browse/FLINK-5541
 Project: Flink
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


{code}
  if (localJar == null) {
try {
  for (final URL url : ((ContextEnvironment) 
ExecutionEnvironment.getExecutionEnvironment())
  .getJars()) {
// TODO verify that there is only one jar
localJar = new File(url.toURI()).getAbsolutePath();
  }
} catch (final URISyntaxException e) {
  // ignore
} catch (final ClassCastException e) {
  // ignore
}
  }

  logger.info("Submitting topology " + name + " in distributed mode with 
conf " + serConf);
  client.submitTopologyWithOpts(name, localJar, topology);
{code}
Since the try block may encounter URISyntaxException / ClassCastException, we 
should check that localJar is not null before calling submitTopologyWithOpts().
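The suggested guard can be sketched as follows (a hypothetical helper, not the actual FlinkSubmitter code): fail fast with a clear message instead of passing a null jar path on to submitTopologyWithOpts().

```java
public class SubmitGuardSketch {
    // Mirrors the issue's suggestion: reject a null jar path up front rather
    // than letting the submit call fail with a less obvious error later.
    static String requireLocalJar(String localJar) {
        if (localJar == null) {
            throw new IllegalStateException(
                "Could not determine the jar to submit: no jar was configured and "
                + "none could be derived from the execution environment.");
        }
        return localJar;
    }

    public static void main(String[] args) {
        System.out.println(requireLocalJar("topology.jar")); // prints "topology.jar"
        try {
            requireLocalJar(null);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```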





Re: 答复: States split over to external storage

2017-01-17 Thread Chen Qin
Hi liuxinchun,

Thanks for expedite feedback!

I think if the dev community finds it makes sense to invest in this feature,
allowing users to configure the eviction strategy (2) makes sense to me. Given
how much the growth of Flink job state varies, there could be an
interface that lets the state backend decide which state can be evicted or
restored.

Regarding (1), I see there are optimizations that can give a performance boost
immediately. I would suggest raising a JIRA and discussing it with the whole dev
community. There might be cases where it conflicts with upcoming refactorings.
Note that the Flink devs are super busy releasing 1.2, so expect a late response :)

Thanks,
Chen


>
> (1) The organization of the current sliding windows (SlidingProcessingTimeWindow
> and SlidingEventTimeWindow) has a drawback: when using ListState, an
> element may be kept in multiple windows (size / slide of them). It is time consuming
> and wastes storage when checkpointing.
>   Opinion: I think this is a point worth optimizing. Elements can be organized
> according to the key and split (which may also be called a pane). When
> triggering cleanup, only the oldest split (pane) needs to be cleaned up.
> (2) Incremental backup strategy. In the original idea, we planned to only back up
> the newly arriving elements, which means a whole window may span several
> checkpoints; we have developed this idea in our private SPS. But in
> Flink, the window may not keep raw data (for example, ReducingState and
> FoldingState). The idea of Chen Qin may be a candidate strategy. We can keep
> in touch and exchange our respective strategies.
> -Original Message-
> From: Chen Qin [mailto:c...@uber.com]
> Sent: 2017-01-17 13:30
> To: dev@flink.apache.org
> Cc: iuxinc...@huawei.com; Aljoscha Krettek; shijinkui
> Subject: States split over to external storage
>
> Hi there,
>
> I would like to discuss spilling local state over to external storage. The
> use case is NOT another external state backend like HDFS, but rather to
> expand beyond what local disk/memory can hold, when a large key space exceeds
> what the task managers can handle. Since realizing FLINK-4266 might be hard
> to tackle all-in-one, I would like to give the spill-over a shot first.
>
> An intuitive approach would be to treat the HeapStateBackend as an LRU cache and
> spill over to an external key/value store when a threshold is triggered. To make
> this happen, we need a minor refactoring of the runtime plus set/get logic.
> One nice thing about keeping HDFS to store snapshots would be avoiding
> versioning conflicts: once a checkpoint restore happens, partially written data
> will be overwritten with the previously checkpointed values.
>
> Comments?
>
> --
> -Chen Qin
>



-- 
-Chen Qin
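Chen's LRU idea maps naturally onto an access-ordered map that evicts to a slower store when it exceeds a threshold. This pure-JDK sketch (illustrative names, with a HashMap standing in for the external key/value store) shows the shape of the spill-over:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class SpillOverStateSketch {
    private final Map<String, String> externalStore = new HashMap<>(); // stand-in for a real KV store
    private final LinkedHashMap<String, String> hotState;

    SpillOverStateSketch(int maxHotEntries) {
        // Access-ordered map: the eldest entry is the least recently used one.
        this.hotState = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > maxHotEntries) {
                    externalStore.put(eldest.getKey(), eldest.getValue()); // spill, don't drop
                    return true;
                }
                return false;
            }
        };
    }

    void put(String key, String value) { hotState.put(key, value); }

    String get(String key) {
        String v = hotState.get(key);
        if (v == null) {
            v = externalStore.remove(key); // restore on access
            if (v != null) hotState.put(key, v);
        }
        return v;
    }

    int hotSize() { return hotState.size(); }

    public static void main(String[] args) {
        SpillOverStateSketch state = new SpillOverStateSketch(2);
        state.put("k1", "v1");
        state.put("k2", "v2");
        state.put("k3", "v3");               // spills k1 (least recently used)
        System.out.println(state.hotSize()); // prints "2"
        System.out.println(state.get("k1")); // prints "v1", restored from the external store
    }
}
```

A real implementation would of course need the checkpointing interplay described above, but the cache shape is the same.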


[jira] [Created] (FLINK-5539) CLI: info/list/stop/cancel

2017-01-17 Thread Eron Wright (JIRA)
Eron Wright  created FLINK-5539:
---

 Summary: CLI: info/list/stop/cancel
 Key: FLINK-5539
 URL: https://issues.apache.org/jira/browse/FLINK-5539
 Project: Flink
  Issue Type: Sub-task
Reporter: Eron Wright 


Implement the remaining CLI options (other than savepoint which is tracked by a 
different sub-task).





[jira] [Created] (FLINK-5540) CLI: savepoint

2017-01-17 Thread Eron Wright (JIRA)
Eron Wright  created FLINK-5540:
---

 Summary: CLI: savepoint
 Key: FLINK-5540
 URL: https://issues.apache.org/jira/browse/FLINK-5540
 Project: Flink
  Issue Type: Sub-task
Reporter: Eron Wright 



Implement CLI support for savepoints, in both the 'run' and 'cancel' operations.





[jira] [Created] (FLINK-5538) Config option: Kerberos

2017-01-17 Thread Eron Wright (JIRA)
Eron Wright  created FLINK-5538:
---

 Summary: Config option: Kerberos
 Key: FLINK-5538
 URL: https://issues.apache.org/jira/browse/FLINK-5538
 Project: Flink
  Issue Type: Sub-task
Reporter: Eron Wright 








[jira] [Created] (FLINK-5537) Config option: SSL

2017-01-17 Thread Eron Wright (JIRA)
Eron Wright  created FLINK-5537:
---

 Summary: Config option: SSL
 Key: FLINK-5537
 URL: https://issues.apache.org/jira/browse/FLINK-5537
 Project: Flink
  Issue Type: Sub-task
Reporter: Eron Wright 








[jira] [Created] (FLINK-5535) Config option: HDFS

2017-01-17 Thread Eron Wright (JIRA)
Eron Wright  created FLINK-5535:
---

 Summary: Config option: HDFS
 Key: FLINK-5535
 URL: https://issues.apache.org/jira/browse/FLINK-5535
 Project: Flink
  Issue Type: Sub-task
Reporter: Eron Wright 








[jira] [Created] (FLINK-5536) Config option: HA

2017-01-17 Thread Eron Wright (JIRA)
Eron Wright  created FLINK-5536:
---

 Summary: Config option: HA
 Key: FLINK-5536
 URL: https://issues.apache.org/jira/browse/FLINK-5536
 Project: Flink
  Issue Type: Sub-task
Reporter: Eron Wright 








[jira] [Created] (FLINK-5534) Config option: 'flink-options'

2017-01-17 Thread Eron Wright (JIRA)
Eron Wright  created FLINK-5534:
---

 Summary: Config option: 'flink-options'
 Key: FLINK-5534
 URL: https://issues.apache.org/jira/browse/FLINK-5534
 Project: Flink
  Issue Type: Sub-task
Reporter: Eron Wright 
Priority: Critical








[jira] [Created] (FLINK-5533) DCOS Integration

2017-01-17 Thread Eron Wright (JIRA)
Eron Wright  created FLINK-5533:
---

 Summary: DCOS Integration
 Key: FLINK-5533
 URL: https://issues.apache.org/jira/browse/FLINK-5533
 Project: Flink
  Issue Type: New Feature
  Components: Mesos
Reporter: Eron Wright 
Assignee: Till Rohrmann



Umbrella issue for DCOS integration, including production-level features but 
not future improvements/bugs (for which a new 'DCOS' component might work best).





[jira] [Created] (FLINK-5532) Make the marker windowassignes for the fast aligned windows non-extendable.

2017-01-17 Thread Kostas Kloudas (JIRA)
Kostas Kloudas created FLINK-5532:
-

 Summary: Make the marker windowassignes for the fast aligned 
windows non-extendable.
 Key: FLINK-5532
 URL: https://issues.apache.org/jira/browse/FLINK-5532
 Project: Flink
  Issue Type: Bug
  Components: Windowing Operators
Affects Versions: 1.2.0
Reporter: Kostas Kloudas
 Fix For: 1.2.0








[jira] [Created] (FLINK-5531) SSl code block formatting is broken

2017-01-17 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-5531:
---

 Summary: SSl code block formatting is broken
 Key: FLINK-5531
 URL: https://issues.apache.org/jira/browse/FLINK-5531
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.2.0, 1.3.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.2.0, 1.3.0


Most code blocks on the ssl page aren't rendered properly and are simply shown 
as text.





[jira] [Created] (FLINK-5530) race condition in AbstractRocksDBState#getSerializedValue

2017-01-17 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5530:
--

 Summary: race condition in AbstractRocksDBState#getSerializedValue
 Key: FLINK-5530
 URL: https://issues.apache.org/jira/browse/FLINK-5530
 Project: Flink
  Issue Type: Bug
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber
Priority: Blocker


AbstractRocksDBState#getSerializedValue() uses the same key serialisation 
stream as the ordinary state access methods but is called in parallel during 
state queries thus violating the assumption of only one thread accessing it. 

This may lead to either wrong results in queries or corrupt data while queries 
are executed.
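One standard remedy for this class of bug, sketched here with plain JDK types (an illustration of the pattern, not the actual Flink fix): give each querying thread its own serialization buffer instead of sharing one mutable stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class PerThreadBufferSketch {
    // A single shared mutable stream is the bug: concurrent queries interleave bytes.
    // ThreadLocal gives every thread (state access vs. query threads) its own buffer.
    private static final ThreadLocal<ByteArrayOutputStream> BUFFER =
        ThreadLocal.withInitial(ByteArrayOutputStream::new);

    static byte[] serializeKey(String key) throws IOException {
        ByteArrayOutputStream out = BUFFER.get();
        out.reset(); // safe: no other thread writes to this instance
        out.write(key.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new String(serializeKey("window-42"), StandardCharsets.UTF_8));
        // prints "window-42"
    }
}
```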





Re: [Dev] Dependencies issue related to implementing InputFormat Interface

2017-01-17 Thread Chesnay Schepler

Hello,

Did you write something like this?

   public class MyInputFormat implements InputFormat<MyType, InputSplit> {

   }

Regards,
Chesnay

On 17.01.2017 04:18, Pawan Manishka Gunarathna wrote:

Hi,

I'm currently working on the Flink InputFormat interface implementation. I'm
writing a Java program to read data from a file using the InputFormat
interface. I used a Maven project and added the following dependencies to
the pom.xml.

<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-core</artifactId>
        <version>1.1.4</version>
    </dependency>

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_2.11</artifactId>
        <version>1.1.4</version>
    </dependency>

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.1.4</version>
    </dependency>
</dependencies>


I have a Java class that implements InputFormat. It works with the raw
*InputFormat* type, but it didn't allow me to use
*InputFormat<OT, T extends InputSplit>*. That OT type parameter wasn't recognized.

Any help to solve this problem would be appreciated.

Thanks,
Pawan





[jira] [Created] (FLINK-5529) Improve / extends windowing documentation

2017-01-17 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-5529:
---

 Summary: Improve / extends windowing documentation
 Key: FLINK-5529
 URL: https://issues.apache.org/jira/browse/FLINK-5529
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: Stephan Ewen
Assignee: Kostas Kloudas
 Fix For: 1.2.0, 1.3.0


Suggested Outline:

{code}
Windows

(0) Outline: The anatomy of a window operation

  stream
 [.keyBy(...)] <-  keyed versus non-keyed windows
  .window(...) <-  required: "assigner"
 [.trigger(...)]   <-  optional: "trigger" (else default trigger)
 [.evictor(...)]   <-  optional: "evictor" (else no evictor)
 [.allowedLateness()]  <-  optional, else zero
  .reduce/fold/apply() <-  required: "function"

(1) Types of windows

  - tumble
  - slide
  - session
  - global

(2) Pre-defined windows

   timeWindow() (tumble, slide)
   countWindow() (tumble, slide)
 - mention that count windows are inherently
   resource leaky unless limited key space

(3) Window Functions

  - apply: most basic, iterates over elements in window
  
  - aggregating: reduce and fold, can be used with "apply()" which will get one 
element
  
  - forward reference to state size section

(4) Advanced Windows

  - assigner
- simple
- merging
  - trigger
- registering timers (processing time, event time)
- state in triggers
  - life cycle of a window
- create
- state
- cleanup
  - when is window contents purged
  - when is state dropped
  - when is metadata (like merging set) dropped

(5) Late data
  - picture
  - fire vs fire_and_purge: late accumulates vs late resurrects (cf discarding 
mode)
  
(6) Evictors
  - TDB
  
(7) State size: How large will the state be?

Basic rule: Each element has one copy per window it is assigned to
  --> num windows * num elements in window
  --> example: tumbling is one copy, sliding(n,m) is n/m copies
  --> per key

Pre-aggregation:
  - if reduce or fold is set -> one element per window (rather than num 
elements in window)
  - evictor voids pre-aggregation from the perspective of state

Special rules:
  - fold cannot pre-aggregate on session windows (and other merging windows)


(8) Non-keyed windows
  - all elements through the same windows
  - currently not parallel
  - possible parallel in the future when having pre-aggregation functions
  - inherently (by definition) produce a result stream with parallelism one
  - state similar to one key of keyed windows
{code}





[jira] [Created] (FLINK-5528) tests: reduce the retry delay in QueryableStateITCase

2017-01-17 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5528:
--

 Summary: tests: reduce the retry delay in QueryableStateITCase
 Key: FLINK-5528
 URL: https://issues.apache.org/jira/browse/FLINK-5528
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


The QueryableStateITCase uses a retry delay of 1 second, e.g. if a queried key does 
not exist yet. This seems a bit too conservative as the job may not take that 
long to deploy and especially since getKvStateWithRetries() recovers from 
failures by retrying.





[jira] [Created] (FLINK-5527) QueryableState: requesting a non-existing key in MemoryStateBackend or FsStateBackend does not return the default value

2017-01-17 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5527:
--

 Summary: QueryableState: requesting a non-existing key in 
MemoryStateBackend or FsStateBackend does not return the default value
 Key: FLINK-5527
 URL: https://issues.apache.org/jira/browse/FLINK-5527
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Querying for a non-existing key for a state that has a default value set 
currently results in an UnknownKeyOrNamespace exception when the 
MemoryStateBackend or FsStateBackend is used. It should instead return the default 
value, just like the RocksDBStateBackend does.
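The desired behavior can be sketched with a plain map lookup (illustrative, not the actual state backend code): fall back to the configured default instead of signaling an unknown key.

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultValueLookupSketch {
    static final String DEFAULT_VALUE = "default";
    static final Map<String, String> state = new HashMap<>();

    // Current MemoryStateBackend/FsStateBackend behavior: throws for unknown
    // keys (analogous to the UnknownKeyOrNamespace exception).
    static String getOrThrow(String key) {
        String v = state.get(key);
        if (v == null) throw new IllegalArgumentException("unknown key: " + key);
        return v;
    }

    // Desired behavior: fall back to the default value, as RocksDBStateBackend does.
    static String getOrDefault(String key) {
        return state.getOrDefault(key, DEFAULT_VALUE);
    }

    public static void main(String[] args) {
        state.put("known", "42");
        System.out.println(getOrDefault("known"));   // prints "42"
        System.out.println(getOrDefault("missing")); // prints "default"
        try {
            getOrThrow("missing");
        } catch (IllegalArgumentException e) {
            System.out.println("old behavior: " + e.getMessage());
        }
    }
}
```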





[jira] [Created] (FLINK-5526) QueryableState: notify upon receiving a query but having queryable state disabled

2017-01-17 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5526:
--

 Summary: QueryableState: notify upon receiving a query but having 
queryable state disabled
 Key: FLINK-5526
 URL: https://issues.apache.org/jira/browse/FLINK-5526
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Priority: Minor


When querying state but having it disabled in the config, a warning should be 
presented to the user that a query was received but the component is disabled. 
This is in addition to the query itself failing with a rather generic exception 
that is not pointing to this fact.





[jira] [Created] (FLINK-5525) Streaming Version of a Linear Regression model

2017-01-17 Thread Stavros Kontopoulos (JIRA)
Stavros Kontopoulos created FLINK-5525:
--

 Summary: Streaming Version of a Linear Regression model
 Key: FLINK-5525
 URL: https://issues.apache.org/jira/browse/FLINK-5525
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Stavros Kontopoulos


Given the nature of Flink we should have a streaming version of the algorithms 
when possible.
Update of the model should be done on a per-window basis.
An extreme case is: https://en.wikipedia.org/wiki/Online_machine_learning

Resources

[1] 
http://scikit-learn.org/dev/modules/scaling_strategies.html#incremental-learning
[2] 
http://stats.stackexchange.com/questions/6920/efficient-online-linear-regression






Re: New Flink team member - Kate Eri.

2017-01-17 Thread Katherin Sotenko
ok, I've got it.
I will take a look at  https://github.com/apache/flink/pull/2735.

вт, 17 янв. 2017 г. в 14:36, Theodore Vasiloudis <
theodoros.vasilou...@gmail.com>:

> Hello Katherin,
>
> Welcome to the Flink community!
>
> The ML component definitely needs a lot of work you are correct, we are
> facing similar problems to CEP, which we'll hopefully resolve with the
> restructuring Stephan has mentioned in that thread.
>
> If you'd like to help out with PRs we have many open, one I have started
> reviewing but got side-tracked is the Word2Vec one [1].
>
> Best,
> Theodore
>
> [1] https://github.com/apache/flink/pull/2735
>
> On Tue, Jan 17, 2017 at 12:17 PM, Fabian Hueske  wrote:
>
> > Hi Katherin,
> >
> > welcome to the Flink community!
> > Help with reviewing PRs is always very welcome and a great way to
> > contribute.
> >
> > Best, Fabian
> >
> >
> >
> > 2017-01-17 11:17 GMT+01:00 Katherin Sotenko :
> >
> > > Thank you, Timo.
> > > I have started the analysis of the topic.
> > > And if necessary, I will try to review other pull requests)
> > >
> > >
> > > вт, 17 янв. 2017 г. в 13:09, Timo Walther :
> > >
> > > > Hi Katherin,
> > > >
> > > > great to hear that you would like to contribute! Welcome!
> > > >
> > > > I gave you contributor permissions. You can now assign issues to
> > > > yourself. I assigned FLINK-1750 to you.
> > > > Right now there are many open ML pull requests, you are very welcome
> to
> > > > review the code of others, too.
> > > >
> > > > Timo
> > > >
> > > >
> > > > Am 17/01/17 um 10:39 schrieb Katherin Sotenko:
> > > > > Hello, All!
> > > > >
> > > > >
> > > > >
> > > > > I'm Kate Eri, I'm java developer with 6-year enterprise experience,
> > > also
> > > > I
> > > > > have some expertise with scala (half of the year).
> > > > >
> > > > > Last 2 years I have participated in several BigData projects that
> > were
> > > > > related to Machine Learning (Time series analysis, Recommender
> > systems,
> > > > > Social networking) and ETL. I have experience with Hadoop, Apache
> > Spark
> > > > and
> > > > > Hive.
> > > > >
> > > > >
> > > > > I’m fond of ML topic, and I see that Flink project requires some
> work
> > > in
> > > > > this area, so that’s why I would like to join Flink and ask me to
> > grant
> > > > the
> > > > > assignment of the ticket
> > > > https://issues.apache.org/jira/browse/FLINK-1750
> > > > > to me.
> > > > >
> > > >
> > > >
> > >
> >
>


Re: New Flink team member - Kate Eri.

2017-01-17 Thread Theodore Vasiloudis
Hello Katherin,

Welcome to the Flink community!

The ML component definitely needs a lot of work you are correct, we are
facing similar problems to CEP, which we'll hopefully resolve with the
restructuring Stephan has mentioned in that thread.

If you'd like to help out with PRs we have many open, one I have started
reviewing but got side-tracked is the Word2Vec one [1].

Best,
Theodore

[1] https://github.com/apache/flink/pull/2735

On Tue, Jan 17, 2017 at 12:17 PM, Fabian Hueske  wrote:

> Hi Katherin,
>
> welcome to the Flink community!
> Help with reviewing PRs is always very welcome and a great way to
> contribute.
>
> Best, Fabian
>
>
>
> 2017-01-17 11:17 GMT+01:00 Katherin Sotenko :
>
> > Thank you, Timo.
> > I have started the analysis of the topic.
> > And if necessary, I will try to review other pull requests)
> >
> >
> > вт, 17 янв. 2017 г. в 13:09, Timo Walther :
> >
> > > Hi Katherin,
> > >
> > > great to hear that you would like to contribute! Welcome!
> > >
> > > I gave you contributor permissions. You can now assign issues to
> > > yourself. I assigned FLINK-1750 to you.
> > > Right now there are many open ML pull requests, you are very welcome to
> > > review the code of others, too.
> > >
> > > Timo
> > >
> > >
> > > Am 17/01/17 um 10:39 schrieb Katherin Sotenko:
> > > > Hello, All!
> > > >
> > > >
> > > >
> > > > I'm Kate Eri, a Java developer with 6 years of enterprise experience;
> > > > I also have some expertise with Scala (about half a year).
> > > >
> > > > For the last 2 years I have participated in several Big Data projects
> > > > related to Machine Learning (time series analysis, recommender
> > > > systems, social networking) and ETL. I have experience with Hadoop,
> > > > Apache Spark and Hive.
> > > >
> > > >
> > > > I'm fond of the ML topic, and I see that the Flink project requires
> > > > some work in this area, so I would like to join Flink and ask to be
> > > > assigned the ticket
> > > > https://issues.apache.org/jira/browse/FLINK-1750
> > > >
> > >
> > >
> >
>


[jira] [Created] (FLINK-5524) Support early out for code generated conjunctive conditions

2017-01-17 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-5524:


 Summary: Support early out for code generated conjunctive 
conditions
 Key: FLINK-5524
 URL: https://issues.apache.org/jira/browse/FLINK-5524
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Affects Versions: 1.1.4, 1.2.0, 1.3.0
Reporter: Fabian Hueske


Currently, all nested conditions for a conjunctive predicate are evaluated 
before the conjunction is checked.

A condition like {{(v1 == v2) && (v3 < 5)}} would be compiled into

{code}
boolean res1;
if (v1 == v2) {
  res1 = true;
} else {
  res1 = false;
}

boolean res2;
if (v3 < 5) {
  res2 = true;
} else {
  res2 = false;
}

boolean res3;
if (res1 && res2) {
  res3 = true;
} else {
  res3 = false;
}

if (res3) {
  // emit something
}
{code}

It would be better to exit the generated code as early as possible, e.g., with 
a {{return}} instead of {{res1 = false}}. The code generator needs a bit of 
context information for that.
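For illustration, a hand-written sketch (not actual generated code; the class and method names are invented) comparing the current eager evaluation of the condition {{(v1 == v2) && (v3 < 5)}} with the proposed early-out style:

```java
public class EarlyOutCondition {

    // Current style: every sub-condition is evaluated and materialized in a
    // boolean before the conjunction itself is checked.
    static boolean evalEagerly(int v1, int v2, int v3) {
        boolean res1 = (v1 == v2);
        boolean res2 = (v3 < 5);
        return res1 && res2;
    }

    // Proposed style: return as soon as one sub-condition fails, so later
    // sub-conditions are never evaluated for non-matching records.
    static boolean evalEarlyOut(int v1, int v2, int v3) {
        if (v1 != v2) {
            return false; // early out: (v3 < 5) is skipped entirely
        }
        if (v3 >= 5) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Both strategies must agree on every input.
        int[][] inputs = {{1, 1, 3}, {1, 2, 3}, {1, 1, 7}};
        for (int[] t : inputs) {
            if (evalEagerly(t[0], t[1], t[2]) != evalEarlyOut(t[0], t[1], t[2])) {
                throw new AssertionError("strategies disagree");
            }
        }
        System.out.println("ok");
    }
}
```

The two are semantically equivalent; the early-out version just avoids evaluating later sub-conditions (and their field accesses) once the result is known to be false.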



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5523) Improve access of fields in code generated functions with filters

2017-01-17 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-5523:


 Summary: Improve access of fields in code generated functions with 
filters
 Key: FLINK-5523
 URL: https://issues.apache.org/jira/browse/FLINK-5523
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Affects Versions: 1.1.4, 1.2.0, 1.3.0
Reporter: Fabian Hueske
Priority: Minor


The generated code for Table API / SQL queries accesses all required fields 
(for conditions and projections) and performs null checks before the first 
condition is evaluated.

It would be better to move the access of fields that are only required for the 
projection behind the condition.
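A minimal sketch of the idea (hand-written stand-in, not Flink's actual generated code; the names and the read counter are invented for illustration). `readField` simulates the per-field deserialization and null check; the lazy variant only pays for the projection field when the row actually passes the filter:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyFieldAccess {
    // Counts simulated field accesses so the two strategies can be compared.
    static final AtomicInteger READS = new AtomicInteger();

    // Stand-in for the generated field access (deserialization + null check).
    static Integer readField(Integer[] row, int idx) {
        READS.incrementAndGet();
        return row[idx];
    }

    // Current style: all required fields (condition + projection) are
    // accessed before the condition is evaluated.
    static Integer eager(Integer[] row) {
        Integer f0 = readField(row, 0); // needed by the condition
        Integer f1 = readField(row, 1); // needed only by the projection
        return (f0 != null && f0 > 10) ? f1 : null;
    }

    // Proposed style: projection-only fields are accessed behind the
    // condition, so filtered-out rows never pay for them.
    static Integer lazy(Integer[] row) {
        Integer f0 = readField(row, 0);
        if (f0 != null && f0 > 10) {
            return readField(row, 1);
        }
        return null;
    }

    public static void main(String[] args) {
        Integer[] filteredOut = {5, 42}; // fails the condition f0 > 10

        READS.set(0);
        eager(filteredOut);
        int eagerReads = READS.get(); // both fields were accessed

        READS.set(0);
        lazy(filteredOut);
        int lazyReads = READS.get();  // projection field was skipped

        System.out.println(eagerReads + " vs " + lazyReads);
    }
}
```

For rows that are filtered out, the lazy variant performs strictly fewer field accesses; for rows that pass, the cost is identical.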





Re: New Flink team member - Kate Eri.

2017-01-17 Thread Fabian Hueske
Hi Katherin,

welcome to the Flink community!
Help with reviewing PRs is always very welcome and a great way to
contribute.

Best, Fabian



2017-01-17 11:17 GMT+01:00 Katherin Sotenko :

> Thank you, Timo.
> I have started the analysis of the topic.
> And if necessary, I will try to review other pull requests as well.
>
>
> Tue, 17 Jan 2017 at 13:09, Timo Walther :
>
> > Hi Katherin,
> >
> > great to hear that you would like to contribute! Welcome!
> >
> > I gave you contributor permissions. You can now assign issues to
> > yourself. I assigned FLINK-1750 to you.
> > Right now there are many open ML pull requests, you are very welcome to
> > review the code of others, too.
> >
> > Timo
> >
> >
> > Am 17/01/17 um 10:39 schrieb Katherin Sotenko:
> > > Hello, All!
> > >
> > >
> > >
> > > I'm Kate Eri, a Java developer with 6 years of enterprise experience;
> > > I also have some expertise with Scala (about half a year).
> > >
> > > For the last 2 years I have participated in several Big Data projects
> > > related to Machine Learning (time series analysis, recommender systems,
> > > social networking) and ETL. I have experience with Hadoop, Apache Spark
> > > and Hive.
> > >
> > >
> > > I'm fond of the ML topic, and I see that the Flink project requires
> > > some work in this area, so I would like to join Flink and ask to be
> > > assigned the ticket
> > > https://issues.apache.org/jira/browse/FLINK-1750
> > >
> >
> >
>


[jira] [Created] (FLINK-5522) Storm LocalCluster can't run with powermock

2017-01-17 Thread liuyuzhong7 (JIRA)
liuyuzhong7 created FLINK-5522:
--

 Summary: Storm LocalCluster can't run with powermock
 Key: FLINK-5522
 URL: https://issues.apache.org/jira/browse/FLINK-5522
 Project: Flink
  Issue Type: Improvement
  Components: DataStream API
Affects Versions: 1.3.0
Reporter: liuyuzhong7
 Fix For: 1.3.0


Storm's LocalCluster can't run with PowerMock. For example, see the code that 
is commented out in WrapperSetupHelperTest.testCreateTopologyContext.






Re: New Flink team member - Kate Eri.

2017-01-17 Thread Katherin Sotenko
Thank you, Timo.
I have started the analysis of the topic.
And if necessary, I will try to review other pull requests as well.


Tue, 17 Jan 2017 at 13:09, Timo Walther :

> Hi Katherin,
>
> great to hear that you would like to contribute! Welcome!
>
> I gave you contributor permissions. You can now assign issues to
> yourself. I assigned FLINK-1750 to you.
> Right now there are many open ML pull requests, you are very welcome to
> review the code of others, too.
>
> Timo
>
>
> Am 17/01/17 um 10:39 schrieb Katherin Sotenko:
> > Hello, All!
> >
> >
> >
> > I'm Kate Eri, a Java developer with 6 years of enterprise experience; I
> > also have some expertise with Scala (about half a year).
> >
> > For the last 2 years I have participated in several Big Data projects
> > related to Machine Learning (time series analysis, recommender systems,
> > social networking) and ETL. I have experience with Hadoop, Apache Spark
> > and Hive.
> >
> >
> > I'm fond of the ML topic, and I see that the Flink project requires some
> > work in this area, so I would like to join Flink and ask to be assigned
> > the ticket
> > https://issues.apache.org/jira/browse/FLINK-1750
> >
>
>


Re: New Flink team member - Kate Eri.

2017-01-17 Thread Timo Walther

Hi Katherin,

great to hear that you would like to contribute! Welcome!

I gave you contributor permissions. You can now assign issues to 
yourself. I assigned FLINK-1750 to you.
Right now there are many open ML pull requests; you are very welcome to 
review the code of others, too.


Timo


Am 17/01/17 um 10:39 schrieb Katherin Sotenko:

Hello, All!



I'm Kate Eri, a Java developer with 6 years of enterprise experience; I also
have some expertise with Scala (about half a year).

For the last 2 years I have participated in several Big Data projects
related to Machine Learning (time series analysis, recommender systems,
social networking) and ETL. I have experience with Hadoop, Apache Spark and
Hive.


I'm fond of the ML topic, and I see that the Flink project requires some
work in this area, so I would like to join Flink and ask to be assigned the
ticket https://issues.apache.org/jira/browse/FLINK-1750.





[jira] [Created] (FLINK-5521) remove unused KvStateRequestSerializer#serializeList

2017-01-17 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5521:
--

 Summary: remove unused KvStateRequestSerializer#serializeList
 Key: FLINK-5521
 URL: https://issues.apache.org/jira/browse/FLINK-5521
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


KvStateRequestSerializer#serializeList is unused; the state backends' own 
serialization functions are used instead. Therefore, remove it and make sure 
KvStateRequestSerializer#deserializeList works with the state backends' 
serializers.





New Flink team member - Kate Eri.

2017-01-17 Thread Katherin Sotenko
Hello, All!



I'm Kate Eri, a Java developer with 6 years of enterprise experience; I also
have some expertise with Scala (about half a year).

For the last 2 years I have participated in several Big Data projects
related to Machine Learning (time series analysis, recommender systems,
social networking) and ETL. I have experience with Hadoop, Apache Spark and
Hive.


I'm fond of the ML topic, and I see that the Flink project requires some
work in this area, so I would like to join Flink and ask to be assigned the
ticket https://issues.apache.org/jira/browse/FLINK-1750.