[jira] [Created] (FLINK-4690) Replace SlotAllocationFuture with flink's own future

2016-09-26 Thread Kurt Young (JIRA)
Kurt Young created FLINK-4690:
-

 Summary: Replace SlotAllocationFuture with flink's own future
 Key: FLINK-4690
 URL: https://issues.apache.org/jira/browse/FLINK-4690
 Project: Flink
  Issue Type: Sub-task
Reporter: Kurt Young
Assignee: Kurt Young






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Flink: How to handle external app configuration changes in flink

2016-09-26 Thread Govindarajan Srinivasaraghavan
Hi Jamie,

Thanks a lot for the response. Appreciate your help.

Regards,
Govind

On Mon, Sep 26, 2016 at 3:26 AM, Jamie Grier 
wrote:

> Hi Govindarajan,
>
> Typically the way people do this is to create a stream of configuration
> changes and consume this like any other stream.  For the specific case of
> filtering for example you may have a data stream and a stream of filters
> that you want to run the data through.  The typical approach in the Flink
> API would then be
>
> val dataStream = env.addSource(dataSource).keyBy("userId")
> val filterStream = env.addSource(filterSource).keyBy("userId")
> val connectedStream = dataStream
>   .connect(filterStream)
>   .flatMap(yourFilterFunction)
>
> You would maintain your filters as state in your filter function.  Notice
> that in this example both streams are keyed the same way.
>
> If it is not possible to distribute the configuration by key (it really
> depends on your use case) you can instead "broadcast" that state so that
> each instance of yourFilterFunction sees the same configuration messages
> and will end up building the same state.  For example:
>
> val dataStream = env.addSource(dataSource).keyBy("userId")
> val filterStream = env.addSource(filterSource).broadcast()
> val connectedStream = dataStream
>   .connect(filterStream)
>   .flatMap(yourFilterFunction)
>
> I hope that helps.
>
> -Jamie
>
>
>
>
> On Mon, Sep 26, 2016 at 4:34 AM, Govindarajan Srinivasaraghavan <
> govindragh...@gmail.com> wrote:
>
> > Hi,
> >
> > My requirement is to stream millions of records in a day and it has huge
> > dependency on external configuration parameters. For example, a user can
> go
> > and change the required setting anytime in the web application and after
> > the change is made, the streaming has to happen with the new application
> > config parameters. These are app level configurations and we also have
> some
> > dynamic exclude parameters which each data has to be passed through and
> > filtered.
> >
> > I see that flink doesn’t have global state which is shared across all
> task
> > managers and subtasks. Having a centralized cache is an option but for
> each
> > parameter I would have to read it from cache which will increase the
> > latency. Please advise on the better approach to handle these kind of
> > scenarios and how other applications are handling it. Thanks.
> >
>
>
>
> --
>
> Jamie Grier
> data Artisans, Director of Applications Engineering
> @jamiegrier 
> ja...@data-artisans.com
>


[jira] [Created] (FLINK-4689) Implement a simple slot provider for the new job manager

2016-09-26 Thread Kurt Young (JIRA)
Kurt Young created FLINK-4689:
-

 Summary: Implement a simple slot provider for the new job manager
 Key: FLINK-4689
 URL: https://issues.apache.org/jira/browse/FLINK-4689
 Project: Flink
  Issue Type: Sub-task
Reporter: Kurt Young
Assignee: Kurt Young


In the flip-6 branch, we need to adjust the existing scheduling model. As a first 
step, we should introduce a simple / naive slot provider which just ignores all 
sharing and location constraints, to make the whole thing work. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[DISCUSS] Mesos Dispatcher (FF2016)

2016-09-26 Thread Wright, Eron

 Hello,

The code I presented at FF2016 represents a 'status-quo' approach to realizing 
a specific scenario - "mesos-session.sh". But the final solution will involve 
CLI changes and the full realization of a dispatcher, which conflicts with 
FLIP-6. We should advance the client/dispatcher design of FLIP-6 before I 
make further changes. Also we must decide whether the Mesos work should take 
FLIP-6 as a dependency, and/or strive to land a useable subset into master.

Here's a summary of the code I presented, as reference for ongoing FLIP-6 
design discussion.  A fresh PR for convenience:
https://github.com/EronWright/flink/pull/1/files

1. MesosDispatcher (Backend). The 'backend' acts as a Mesos framework, to 
launch an AppMaster as a Mesos task. For each session, the backend accepts 
"session parameters" which define the libraries, configuration, resource 
profile and other parameters needed to launch the AppMaster. The backend 
persists the information for recovery purposes. Leader election is also 
used. Not included is a dispatcher 'frontend' - a formalized API surface or 
REST server.

2. SessionParameters. Captures the information needed to launch a session 
based on an AppMaster. A session is a stateful execution environment for a 
program. The session parameters can also be understood as the historical 
inputs to yarn-session.sh, plus job inputs per FLIP-6. I acknowledge that the 
term 'session' is a working term.

3. MesosDispatcherRunner. The runner for the 'remote dispatcher' scenario, 
which would be started by Marathon, expose a REST API with which to 
submit/manage jobs, and host the above dispatcher backend.

4. FlinkMesosSessionCli. This class mirrors the FlinkYarnSessionCli, which is 
used in both the 'flink run' scenario and 'yarn-session' scenario. I didn't 
fully implement the CustomCommandLine interface which yields a ClusterClient, 
because the dispatcher API must be fleshed out first.

5. SessionArtifactHelper.   An attempt to consolidate logic related to session 
artifacts (i.e. ship files).

6. DispatcherClient. The client interface for the dispatcher. In concept 
there could be numerous implementations - a 'remote' impl which would make 
REST calls to a remote dispatcher, and a 'local' impl which would host the 
dispatcher directly. Seen in this PR is only the latter, but it might be 
throw-away code.

7. LaunchableMesosSession.   This class generates the effective container 
environment at launch time.

8. ContaineredJobMasterParameters.  Refactored from YARN code for sharing 
purposes.

-Eron

[jira] [Created] (FLINK-4688) Optimizer hangs for hours when optimizing complex plans

2016-09-26 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-4688:


 Summary: Optimizer hangs for hours when optimizing complex plans
 Key: FLINK-4688
 URL: https://issues.apache.org/jira/browse/FLINK-4688
 Project: Flink
  Issue Type: Bug
  Components: Optimizer
Affects Versions: 1.2.0, 1.1.3
Reporter: Fabian Hueske


When optimizing a plan with many operators (more than 250), the optimizer gets 
stuck for hours.

A user reported this problem on the user@f.a.o list [1] and provided 
stacktraces taken at different points in time (shortly after submission, 32 
minutes and 76 minutes after submission) (see attachments).

The stacktraces show deeply recursive {{hasDamOnPathDownTo()}} calls. Maybe it 
is possible to improve the performance by caching the results?

[1] 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Complex-batch-workflow-needs-too-much-time-to-create-executionPlan-tp8596.html
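
To illustrate the caching suggestion, here is a generic memoization sketch. It is purely illustrative: the {{Node}} type and the dam check are hypothetical stand-ins and do not correspond to the actual optimizer classes or to {{hasDamOnPathDownTo()}} itself.

{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative memoization of an expensive recursive graph traversal.
public class MemoizedDamCheck {

    /** Hypothetical plan-node type used only for this sketch. */
    public interface Node {
        boolean hasDam();
        Iterable<Node> getInputs();
    }

    private final Map<Node, Boolean> cache = new HashMap<>();

    /** Returns true if this node or any node reachable through its inputs has a dam. */
    public boolean damOnAnyPath(Node node) {
        Boolean cached = cache.get(node);
        if (cached != null) {
            return cached;                 // reuse a previously computed result
        }
        boolean result = node.hasDam();
        if (!result) {
            for (Node input : node.getInputs()) {
                if (damOnAnyPath(input)) { // recursion also benefits from the cache
                    result = true;
                    break;
                }
            }
        }
        cache.put(node, result);
        return result;
    }
}
{code}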



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4687) Add getAddress method to RpcService

2016-09-26 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-4687:


 Summary: Add getAddress method to RpcService
 Key: FLINK-4687
 URL: https://issues.apache.org/jira/browse/FLINK-4687
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination
Affects Versions: 1.2.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann


It would be useful to expose the hostname to which the {{RpcService}} has been 
bound. This can then be used to retrieve the network interface which is 
reachable from the outside.
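
A minimal sketch of what such an addition could look like; the method name and Javadoc are assumptions based on this description, not the final implementation:

{code}
public interface RpcService {

    // ... existing methods for starting and connecting to rpc endpoints omitted ...

    /**
     * Sketch of the proposed method: returns the hostname / address to which
     * this RpcService has been bound, so that the externally reachable network
     * interface can be determined.
     */
    String getAddress();
}
{code}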



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4686) Add possibility to get column names

2016-09-26 Thread Timo Walther (JIRA)
Timo Walther created FLINK-4686:
---

 Summary: Add possibility to get column names
 Key: FLINK-4686
 URL: https://issues.apache.org/jira/browse/FLINK-4686
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Timo Walther


For debugging and maybe for visualization in the future (e.g. in a shell) it would 
be good to have the possibility to get the names of {{Table}} columns. At the 
moment the user has no idea how the table columns are named, e.g. if they need to be 
matched with POJO fields.

My suggestion:

{code}
Schema s = table.schema();
TypeInformation type1 = s.getType(1);
TypeInformation type2 = s.getType("col");
String name = s.getColumnName(1);
String[] names = s.getColumnNames();
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4685) Gather operator checkpoint durations and data sizes from the runtime

2016-09-26 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-4685:
---

 Summary: Gather operator checkpoint durations and data sizes from the 
runtime
 Key: FLINK-4685
 URL: https://issues.apache.org/jira/browse/FLINK-4685
 Project: Flink
  Issue Type: Sub-task
  Components: State Backends, Checkpointing
Affects Versions: 1.1.2
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.2.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4684) Remove obsolete classloader from CheckpointCoordinator

2016-09-26 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-4684:
---

 Summary: Remove obsolete classloader from CheckpointCoordinator
 Key: FLINK-4684
 URL: https://issues.apache.org/jira/browse/FLINK-4684
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Affects Versions: 1.2.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Minor
 Fix For: 1.2.0


With the latest checkpointing changes, the {{CheckpointCoordinator}} should not 
execute user code any more, and thus should not use a user code ClassLoader any more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Flink JDBCOutputFormat - Flush last batch enhancement

2016-09-26 Thread Chesnay Schepler

* setting the batch interval _to 1_

On 26.09.2016 15:25, Chesnay Schepler wrote:

Hello Swapnil,

setting the batch interval should be pretty much equivalent to having 
a streaming jdbc connector.


Regards,
Chesnay

On 26.09.2016 13:21, Swapnil Chougule wrote:

Hi Stephen/Chesnay,

I have used JDBCOutputFormat from batch connectors for my streaming use
case as I didn't find jdbc connector from streaming connectors.
If it is not there, may we have jdbc connector for streaming use cases?

Thanks,
Swapnil

On Mon, Sep 26, 2016 at 3:32 PM, Chesnay Schepler 
wrote:


The JDBCOutputFormat writes records in batches, that's what he is
referring to.


On 26.09.2016 11:48, Stephan Ewen wrote:


Hi!

I am not sure I understand what you want to do, but here are some
comments:

- There is no "batching" in Flink's streaming API, not sure 
what you

are
referring to in with the "last batch"
- JDBC connections are not closed between windows, they remain 
open as

long as the operator is open.

Thanks,
Stephan


On Mon, Sep 26, 2016 at 9:29 AM, Swapnil Chougule <
the.swapni...@gmail.com>
wrote:

Hi Team,
Can we handle one case in connector JDBCOutputFormat to update the last batch
(the batch count might be less than the batch interval) without closing the jdbc
connection?

During the use case of my streaming project, I am updating a jdbc sink (mysql
db) after every window.

Case: Say I have 450 queries to be updated in mysql with batch interval 100.
400 queries are executed in 4 batches (4 x 100).
The last 50 queries go into a pending state in the batch to be executed and
wait for the next 50 queries from the next window.
If the window is of size 5 minutes, then it will take the next 4-5 minutes to
reflect the last 50 queries in mysql.

Can we have functionality in JDBCOutputFormat to flush the last batch to the
jdbc sink while persisting the same db connection?

Thanks,
Swapnil









Re: No support for request PutMappingRequest

2016-09-26 Thread Ozan DENİZ
Hi Aljoscha,


We are trying to add a sortable feature for Elasticsearch. To do this, we need to 
add a mapping to the index.


We are trying to sort some fields in Elasticsearch. To make this work, our JSON 
mapping should look like this:


"tweet": {
"type": "string",
"analyzer": "english",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}



We should add "not_analyzed" to the mapping.


private XContentBuilder buildMapping(String typeName)
{
    XContentBuilder mapping = null;
    try
    {
        mapping = jsonBuilder()
            .startObject()
                .startObject(typeName)
                    .startObject("properties")
                        .startObject(LogFields.MESSAGE)
                            .field("type", "string")
                            .field("analyzer", "standard")
                            .startObject("fields")
                                .startObject("raw")
                                    .field("type", "string")
                                    .field("index", "not_analyzed")
                                .endObject()
                            .endObject()
                        .endObject()
                    .endObject()
                .endObject()
            .endObject();
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }

    return mapping;
}
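
For illustration: since the sink's BulkProcessor only accepts index/update/delete requests, one possible workaround (not mentioned in this thread, and only a sketch assuming the Elasticsearch 2.x transport client API, with placeholder host, index and type names and exception handling omitted) is to apply the mapping once with a standalone admin client before the Flink job starts, reusing the buildMapping() method above:

Settings settings = Settings.settingsBuilder()
        .put("cluster.name", "my-cluster")        // placeholder cluster name
        .build();

Client client = TransportClient.builder().settings(settings).build()
        .addTransportAddress(new InetSocketTransportAddress(
                InetAddress.getByName("es-host"), 9300));  // placeholder host
try {
    client.admin().indices()
            .preparePutMapping("my-index")        // placeholder index name
            .setType("my-type")                   // placeholder type name
            .setSource(buildMapping("my-type"))   // the XContentBuilder built above
            .get();
} finally {
    client.close();
}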





From: Aljoscha Krettek 
Sent: Monday, 26 September 2016 16:27:12
To: dev@flink.apache.org
Subject: Re: No support for request PutMappingRequest

Hi,
I think PutMappingRequest is a request that can only be sent using
IndicesAdminClient. In my understanding this is an administrative command
that isn't related to actually storing data in an index.

What are you trying to store with the PutMappingRequest?

Cheers,
Aljoscha

On Mon, 26 Sep 2016 at 15:16 Ozan DENİZ  wrote:

> Hi,
>
>
> I am sending the error message below;
>
>
> java.lang.IllegalArgumentException: No support for request
> [org.elasticsearch.action.admin.indices.mapping.put.PutMappingRequest@3ed05d8b
> ]
> at org.elasticsearch.action.bulk.BulkRequest.add(BulkRequest.java:107)
> ~[elasticsearch-2.3.5.jar:2.3.5]
> at
> org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:284)
> ~[elasticsearch-2.3.5.jar:2.3.5]
> at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:268)
> ~[elasticsearch-2.3.5.jar:2.3.5]
> at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:264)
> ~[elasticsearch-2.3.5.jar:2.3.5]
> at
> org.apache.flink.streaming.connectors.elasticsearch2.BulkProcessorIndexer.add(BulkProcessorIndexer.java:32)
> ~[flink-connector-elasticsearch2_2.10-1.2-20160926.041955-45.jar:1.2-SNAPSHOT]
> at sink.ESSink.process(ESSink.java:77) ~[classes/:?]
> at sink.ESSink.process(ESSink.java:25) ~[classes/:?]
> at
> org.apache.flink.streaming.connectors.elasticsearch2.ElasticsearchSink.invoke(ElasticsearchSink.java:232)
> ~[flink-connector-elasticsearch2_2.10-1.2-20160926.041955-45.jar:1.2-SNAPSHOT]
> at
> org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:39)
> ~[flink-streaming-java_2.10-1.1.1.jar:1.1.1]
> at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:176)
> ~[flink-streaming-java_2.10-1.1.1.jar:1.1.1]
> at
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:66)
> ~[flink-streaming-java_2.10-1.1.1.jar:1.1.1]
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:266)
> ~[flink-streaming-java_2.10-1.1.1.jar:1.1.1]
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
> [flink-runtime_2.10-1.1.1.jar:1.1.1]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
> 09/26/2016 16:14:05 Sink: Unnamed(1/1) switched to FAILED
>
>
> We are trying to add a mapping request to Elasticsearch. We cannot access
> the client attribute (it is private) in the Elasticsearch sink class.
>
>
> Is there any way to overcome this problem?
>
>
> Thanks,
>
>
> Ozan
>
>
>
> 
> From: Till Rohrmann 
> Sent: Monday, 26 September 2016 13:30:34
> To: dev@flink.apache.org
> Cc: Aljoscha Krettek
> Subject: Re: No support for request PutMappingRequest
>
> Hi Ozan,
>
> I'm not super experienced with Flink's elasticsearch connector, but could
> you post the complete stack trace to figure out where the problem comes
> from?
>
> I've also pulled in Aljoscha, the original author of the elasticsearch
> sink. Maybe he can give you a detailed answer.
>
> Cheers,
> Till
>
> On Fri, Sep 23, 2016 at 1:41 PM, Ozan DENİZ  wrote:
>
> > Hi everyone,
> >
> >
> > We are trying to use elasticsearch (2.x) connector for Flink application.
> > However, we encounter a problem when we try to add mapping to
> elasticsearch
> > index.
> >
> >
> > The error message is below when we run the Flink application.
> >
> >
> >  No support for request 

[jira] [Created] (FLINK-4678) Add SessionRow row-windows for streaming tables (FLIP-11)

2016-09-26 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-4678:


 Summary: Add SessionRow row-windows for streaming tables (FLIP-11)
 Key: FLINK-4678
 URL: https://issues.apache.org/jira/browse/FLINK-4678
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Affects Versions: 1.2.0
Reporter: Fabian Hueske


Add SessionRow row-windows for streaming tables as described in FLIP-11. 

This task requires implementing a custom stream operator and integrating it with 
checkpointing and timestamp / watermark logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4683) Add SlideRow row-windows for batch tables

2016-09-26 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-4683:


 Summary: Add SlideRow row-windows for batch tables
 Key: FLINK-4683
 URL: https://issues.apache.org/jira/browse/FLINK-4683
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Affects Versions: 1.2.0
Reporter: Fabian Hueske


Add SlideRow row-windows for batch tables as described in 
[FLIP-11|https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations].
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4682) Add TumbleRow row-windows for batch tables.

2016-09-26 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-4682:


 Summary: Add TumbleRow row-windows for batch tables.
 Key: FLINK-4682
 URL: https://issues.apache.org/jira/browse/FLINK-4682
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Affects Versions: 1.2.0
Reporter: Fabian Hueske


Add TumbleRow row-windows for batch tables as described in 
[FLIP-11|https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations].
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4681) Add SessionRow row-windows for batch tables.

2016-09-26 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-4681:


 Summary: Add SessionRow row-windows for batch tables.
 Key: FLINK-4681
 URL: https://issues.apache.org/jira/browse/FLINK-4681
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Affects Versions: 1.2.0
Reporter: Fabian Hueske


Add SessionRow row-windows for batch tables as described in 
[FLIP-11|https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations].
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4680) Add SlidingRow row-windows for streaming tables

2016-09-26 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-4680:


 Summary: Add SlidingRow row-windows for streaming tables
 Key: FLINK-4680
 URL: https://issues.apache.org/jira/browse/FLINK-4680
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Affects Versions: 1.2.0
Reporter: Fabian Hueske


Add SlideRow row-windows for streaming tables as described in 
[FLIP-11|https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations].
 

This task requires implementing a custom stream operator and integrating it with 
checkpointing and timestamp / watermark logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4679) Add TumbleRow row-windows for streaming tables

2016-09-26 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-4679:


 Summary: Add TumbleRow row-windows for streaming tables
 Key: FLINK-4679
 URL: https://issues.apache.org/jira/browse/FLINK-4679
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Affects Versions: 1.2.0
Reporter: Fabian Hueske


Add TumbleRow row-windows for streaming tables as described in 
[FLIP-11|https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations].
 

This task requires implementing a custom stream operator and integrating it with 
checkpointing and timestamp / watermark logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: No support for request PutMappingRequest

2016-09-26 Thread Aljoscha Krettek
Hi,
I think PutMappingRequest is a request that can only be sent using
IndicesAdminClient. In my understanding this is an administrative command
that isn't related to actually storing data in an index.

What are you trying to store with the PutMappingRequest?

Cheers,
Aljoscha

On Mon, 26 Sep 2016 at 15:16 Ozan DENİZ  wrote:

> Hi,
>
>
> I am sending the error message below;
>
>
> java.lang.IllegalArgumentException: No support for request
> [org.elasticsearch.action.admin.indices.mapping.put.PutMappingRequest@3ed05d8b
> ]
> at org.elasticsearch.action.bulk.BulkRequest.add(BulkRequest.java:107)
> ~[elasticsearch-2.3.5.jar:2.3.5]
> at
> org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:284)
> ~[elasticsearch-2.3.5.jar:2.3.5]
> at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:268)
> ~[elasticsearch-2.3.5.jar:2.3.5]
> at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:264)
> ~[elasticsearch-2.3.5.jar:2.3.5]
> at
> org.apache.flink.streaming.connectors.elasticsearch2.BulkProcessorIndexer.add(BulkProcessorIndexer.java:32)
> ~[flink-connector-elasticsearch2_2.10-1.2-20160926.041955-45.jar:1.2-SNAPSHOT]
> at sink.ESSink.process(ESSink.java:77) ~[classes/:?]
> at sink.ESSink.process(ESSink.java:25) ~[classes/:?]
> at
> org.apache.flink.streaming.connectors.elasticsearch2.ElasticsearchSink.invoke(ElasticsearchSink.java:232)
> ~[flink-connector-elasticsearch2_2.10-1.2-20160926.041955-45.jar:1.2-SNAPSHOT]
> at
> org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:39)
> ~[flink-streaming-java_2.10-1.1.1.jar:1.1.1]
> at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:176)
> ~[flink-streaming-java_2.10-1.1.1.jar:1.1.1]
> at
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:66)
> ~[flink-streaming-java_2.10-1.1.1.jar:1.1.1]
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:266)
> ~[flink-streaming-java_2.10-1.1.1.jar:1.1.1]
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
> [flink-runtime_2.10-1.1.1.jar:1.1.1]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
> 09/26/2016 16:14:05 Sink: Unnamed(1/1) switched to FAILED
>
>
> We are trying to add a mapping request to Elasticsearch. We cannot access
> the client attribute (it is private) in the Elasticsearch sink class.
>
>
> Is there any way to overcome this problem?
>
>
> Thanks,
>
>
> Ozan
>
>
>
> 
> From: Till Rohrmann 
> Sent: Monday, 26 September 2016 13:30:34
> To: dev@flink.apache.org
> Cc: Aljoscha Krettek
> Subject: Re: No support for request PutMappingRequest
>
> Hi Ozan,
>
> I'm not super experienced with Flink's elasticsearch connector, but could
> you post the complete stack trace to figure out where the problem comes
> from?
>
> I've also pulled in Aljoscha, the original author of the elasticsearch
> sink. Maybe he can give you a detailed answer.
>
> Cheers,
> Till
>
> On Fri, Sep 23, 2016 at 1:41 PM, Ozan DENİZ  wrote:
>
> > Hi everyone,
> >
> >
> > We are trying to use elasticsearch (2.x) connector for Flink application.
> > However, we encounter a problem when we try to add mapping to
> elasticsearch
> > index.
> >
> >
> > The error message is below when we run the Flink application.
> >
> >
> >  No support for request [org.elasticsearch.action.
> > admin.indices.mapping.put.PutMappingRequest]
> >
> >
> > We are using "putMappingRequest" function for mapping. In Flink, is there
> > any way to add mapping to indexes in elasticsearch?
> >
> >
> > If not, can we contribute for adding mapping feature to elasticsearch
> > connector?
> >
>


Re: Flink JDBCOutputFormat - Flush last batch enhancement

2016-09-26 Thread Chesnay Schepler

Hello Swapnil,

setting the batch interval should be pretty much equivalent to having a 
streaming jdbc connector.


Regards,
Chesnay
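
For illustration, a rough sketch of what that looks like with the flink-jdbc builder (assuming the JDBCOutputFormat builder API of Flink 1.1.x; driver, URL and query are placeholders). A batch interval of 1 effectively writes every record as it arrives, which is what makes it behave like a streaming sink:

JDBCOutputFormat jdbcOutput = JDBCOutputFormat.buildJDBCOutputFormat()
        .setDrivername("com.mysql.jdbc.Driver")                     // placeholder driver
        .setDBUrl("jdbc:mysql://localhost:3306/mydb")               // placeholder URL
        .setQuery("INSERT INTO results (id, value) VALUES (?, ?)")  // placeholder query
        .setBatchInterval(1)   // write each record immediately instead of batching
        .finish();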

On 26.09.2016 13:21, Swapnil Chougule wrote:

Hi Stephen/Chesnay,

I have used JDBCOutputFormat from batch connectors for my streaming use
case as I didn't find jdbc connector from streaming connectors.
If it is not there, may we have jdbc connector for streaming use cases?

Thanks,
Swapnil

On Mon, Sep 26, 2016 at 3:32 PM, Chesnay Schepler 
wrote:


The JDBCOutputFormat writes records in batches, that's what he is
referring to.


On 26.09.2016 11:48, Stephan Ewen wrote:


Hi!

I am not sure I understand what you want to do, but here are some
comments:

- There is no "batching" in Flink's streaming API, not sure what you
are
referring to in with the "last batch"
- JDBC connections are not closed between windows, they remain open as
long as the operator is open.

Thanks,
Stephan


On Mon, Sep 26, 2016 at 9:29 AM, Swapnil Chougule <
the.swapni...@gmail.com>
wrote:

Hi Team,

Can we handle one case in connector JDBCOutputFormat to update last batch
(might be batch count is less than batch interval) without closing jdbc
connection?

During use case of my streaming project, I am updating jdbc sink (mysql
db)
after every window.

Case : Say I have 450 queries to be updated in mysql with batch interval
100.
400 queries are executed in 4 batches (4 x 100).
Last 50 queries go into pending state in batch to be executed & wait for
next 50 queries from next window.
If window is of size 5 minutes, then it will take next 4-5 minutes to
reflect last 50 queries in mysql.

Can we have functionality in JDBCOutputFormat to flush the last batch to the jdbc
sink while persisting the same db connection?

Thanks,
Swapnil






Re: No support for request PutMappingRequest

2016-09-26 Thread Ozan DENİZ
Hi,


I am sending the error message below;


java.lang.IllegalArgumentException: No support for request 
[org.elasticsearch.action.admin.indices.mapping.put.PutMappingRequest@3ed05d8b]
at org.elasticsearch.action.bulk.BulkRequest.add(BulkRequest.java:107) 
~[elasticsearch-2.3.5.jar:2.3.5]
at 
org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:284) 
~[elasticsearch-2.3.5.jar:2.3.5]
at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:268) 
~[elasticsearch-2.3.5.jar:2.3.5]
at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:264) 
~[elasticsearch-2.3.5.jar:2.3.5]
at 
org.apache.flink.streaming.connectors.elasticsearch2.BulkProcessorIndexer.add(BulkProcessorIndexer.java:32)
 ~[flink-connector-elasticsearch2_2.10-1.2-20160926.041955-45.jar:1.2-SNAPSHOT]
at sink.ESSink.process(ESSink.java:77) ~[classes/:?]
at sink.ESSink.process(ESSink.java:25) ~[classes/:?]
at 
org.apache.flink.streaming.connectors.elasticsearch2.ElasticsearchSink.invoke(ElasticsearchSink.java:232)
 ~[flink-connector-elasticsearch2_2.10-1.2-20160926.041955-45.jar:1.2-SNAPSHOT]
at 
org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:39)
 ~[flink-streaming-java_2.10-1.1.1.jar:1.1.1]
at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:176)
 ~[flink-streaming-java_2.10-1.1.1.jar:1.1.1]
at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:66)
 ~[flink-streaming-java_2.10-1.1.1.jar:1.1.1]
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:266) 
~[flink-streaming-java_2.10-1.1.1.jar:1.1.1]
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584) 
[flink-runtime_2.10-1.1.1.jar:1.1.1]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
09/26/2016 16:14:05 Sink: Unnamed(1/1) switched to FAILED


We are trying to add a mapping request to Elasticsearch. We cannot access the client 
attribute (it is private) in the Elasticsearch sink class.


Is there any way to overcome this problem?


Thanks,


Ozan




From: Till Rohrmann 
Sent: Monday, 26 September 2016 13:30:34
To: dev@flink.apache.org
Cc: Aljoscha Krettek
Subject: Re: No support for request PutMappingRequest

Hi Ozan,

I'm not super experienced with Flink's elasticsearch connector, but could
you post the complete stack trace to figure out where the problem comes
from?

I've also pulled in Aljoscha, the original author of the elasticsearch
sink. Maybe he can give you a detailed answer.

Cheers,
Till

On Fri, Sep 23, 2016 at 1:41 PM, Ozan DENİZ  wrote:

> Hi everyone,
>
>
> We are trying to use elasticsearch (2.x) connector for Flink application.
> However, we encounter a problem when we try to add mapping to elasticsearch
> index.
>
>
> The error message is below when we run the Flink application.
>
>
>  No support for request [org.elasticsearch.action.
> admin.indices.mapping.put.PutMappingRequest]
>
>
> We are using "putMappingRequest" function for mapping. In Flink, is there
> any way to add mapping to indexes in elasticsearch?
>
>
> If not, can we contribute for adding mapping feature to elasticsearch
> connector?
>


Re: [DISCUSS] FLIP-11: Table API Stream Aggregations

2016-09-26 Thread Fabian Hueske
Hi everybody,

Timo proposed our FLIP-11 a bit more than three weeks ago.
I will update the status of the FLIP to accepted.

Thanks,
Fabian

2016-09-19 9:16 GMT+02:00 Timo Walther :

> Hi Jark,
>
> yes I think enough time has passed. We can start implementing the changes.
> What do you think Fabian?
>
> If there are no objections, I will create the subtasks in Jira today. For
> FLIP-11/1 I already have implemented a prototype, I just have to do some
> refactoring/documentation before opening a PR.
>
> Timo
>
>
> On 18/09/16 at 04:46, Jark Wu wrote:
>
> Hi all,
>>
>> It seems that there’s no objections to the window design. So could we
>> open subtasks to start working on it now ?
>>
>> - Jark Wu
>>
>> On 7 September 2016, at 16:29, Jark Wu wrote:
>>>
>>> Hi Fabian,
>>>
>>> Thanks for sharing your ideas.
>>>
>>> They all make sense to me. Regarding to reassigning timestamp, I do not
>>> have an use case. I come up with this because DataStream has a
>>> TimestampAssigner :)
>>>
>>> +1 for this FLIP.
>>>
>>> - Jark Wu
>>>
>>> On 7 September 2016, at 14:59, Fabian Hueske <fhue...@gmail.com> wrote:

 Hi,

 thanks for your comments and questions!
 Actually, you are bringing up the points that Timo and I discussed the
 most
 when designing the FLIP ;-)

 - We also thought about the syntactic shortcut for running aggregates
 like
 you proposed (table.groupBy(‘a).select(…)). Our motivation to not allow
 this shortcut is to prevent users from accidentally performing a
 "dangerous" operation. The problem with unbounded sliding row-windows is
 that their state does never expire. If you have an evolving key space,
 you
 will likely run into problems at some point because the operator state
 grows too large. IMO, a row-window session is a better approach,
 because it
 defines a timeout after which state can be discarded. groupBy.select is
 a
 very common operation in batch but its semantics in streaming are very
 different. In my opinion it makes sense to make users aware of these
 differences through the API.

 - Reassigning timestamps and watermarks is a very delicate issue. You
 are
 right, that Calcite exposes this field which is necessary due to the
 semantics of SQL. However, also in Calcite you cannot freely choose the
 timestamp attribute for streaming queries (it must be a monotone or
 quasi-monotone attribute) which is hard to reason about (and guarantee)
 after a few operators have been applied. Streaming tables in Flink will
 likely have a time attribute which is identical to the initial rowtime.
 However, Flink does modify timestamps internally, e.g., for records that
 are emitted from time windows, in order to ensure that consecutive
 windows
 perform as expected. Modify or reassign timestamps in the middle of a
 job
 can result in unexpected results which are very hard to reason about. Do
 you have a concrete use case in mind for reassigning timestamps?

 - The idea to represent rowtime and systime as object is good. Our
 motivation to go for reserved Scala symbols was to have a uniform syntax
 with windows over streaming and batch tables. On batch tables you can
 compute time windows basically over every time attribute (they are
 treated
 similar to grouping attributes with a bit of extra logic to extract the
 grouping key for sliding and session windows). If you write
 window(Tumble
 over 10.minutes on 'rowtime) on a streaming table, 'rowtime would
 indicate
 event-time. On a batch table with a 'rowtime attribute, the same
 operator
 would be internally converted into a group by. By going for the object
 approach we would lose this compatibility (or would need to introduce an
 additional column attribute to specifiy the window attribute for batch
 tables).

 As usual some of the design decisions are based on preferences.
 Do they make sense to you? Let me know what you think.

 Best, Fabian


 2016-09-07 5:12 GMT+02:00 Jark Wu >> wuchong...@alibaba-inc.com>>:

 Hi all,
>
> I'm on vacation for about five days , sorry to have missed this great
> FLIP.
>
> Yes, the non-windowed aggregates is a special case of row-window. And
> the
> proposal looks really good.  Can we have a simplified form for the
> special
> case? Such as: table.groupBy(‘a).rowWindow(SlideRows.unboundedPreceding).select(…)
> can be simplified to table.groupBy(‘a).select(…). The latter will actually
> call the former.
>
> Another question is about the rowtime. As the FLIP said, DataStream and
> StreamTableSource is responsible to assign timestamps and watermarks,
> furthermore “rowtime” and “systemtime” are not real column. IMO, it is

[DISCUSS] add netty tcp/restful pushed source support

2016-09-26 Thread shijinkui
Hi, all

1. In order to support an end-to-end pushed source, I created 
FLINK-4630. I want to know 
whether this idea is worthwhile.

---
When the source starts, it listens on a provided tcp port and receives stream data 
from the user's data source.
This netty tcp source is kept alive and is end-to-end, that is, data flows from the 
business system to the flink worker directly.

user app push ->  netty server source of Flink

The source is described in detail below:

1. The source runs as a netty tcp server.
2. The user provides a tcp port; if the port is in use, the port number is increased 
within the range 1024 to 65535. The source can run in parallel.
3. The source calls back the provided url to report the real port it listens on.
4. The user pushes streaming data to the netty server, which then collects the data 
into flink (a simplified sketch follows below).
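
For illustration, a rough, simplified sketch of such a source. This is not the implementation proposed in FLINK-4630: it uses a plain ServerSocket instead of netty and leaves out the port-probing and callback steps 2 and 3.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

// Push-based tcp source sketch: listens on a port and emits every received line.
public class TcpPushSource extends RichParallelSourceFunction<String> {

    private final int port;
    private volatile boolean running = true;
    private transient ServerSocket serverSocket;

    public TcpPushSource(int port) {
        this.port = port;
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        serverSocket = new ServerSocket(port);
        while (running) {
            try (Socket socket = serverSocket.accept();
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while (running && (line = reader.readLine()) != null) {
                    synchronized (ctx.getCheckpointLock()) {
                        ctx.collect(line);      // hand the pushed data to Flink
                    }
                }
            } catch (IOException e) {
                if (running) {
                    throw e;                    // real failure; otherwise we were cancelled
                }
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
        try {
            if (serverSocket != null) {
                serverSocket.close();           // unblocks accept()
            }
        } catch (IOException ignored) {
            // nothing to do on shutdown
        }
    }
}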


Thanks

Jinkui Shi



[jira] [Created] (FLINK-4677) Jars with no job executions produces NullPointerException in ClusterClient

2016-09-26 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-4677:
-

 Summary: Jars with no job executions produces NullPointerException 
in ClusterClient
 Key: FLINK-4677
 URL: https://issues.apache.org/jira/browse/FLINK-4677
 Project: Flink
  Issue Type: Bug
  Components: Client
Affects Versions: 1.1.2, 1.2.0
Reporter: Maximilian Michels
Assignee: Maximilian Michels
Priority: Minor
 Fix For: 1.2.0, 1.1.3


When the user jar contains no job executions, the command-line client displays 
a NullPointerException. This is not a big issue but should be changed to 
something more descriptive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Merge batch and stream connector modules

2016-09-26 Thread Fabian Hueske
Thanks everybody for your comments.

I opened FLINK-4676 [1] for merging the connector modules.

[1] https://issues.apache.org/jira/browse/FLINK-4676

2016-09-26 13:17 GMT+02:00 Robert Metzger :

> +1 good suggestion.
>
> On Mon, Sep 26, 2016 at 1:03 PM, Stephan Ewen  wrote:
>
> > The module would have both dependencies, but both are provided anyways,
> so
> > that would not be much of an issue, I think.
> >
> > On Mon, Sep 26, 2016 at 12:25 PM, Till Rohrmann 
> > wrote:
> >
> > > I think this only holds true for modules which depend on the batch or
> > > streaming counter part, respectively. We could refactor these modules
> by
> > > pulling out common types which are independent of streaming/batch and
> are
> > > used by the batch and streaming module.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Sep 23, 2016 at 11:15 AM, Aljoscha Krettek <
> aljos...@apache.org>
> > > wrote:
> > >
> > > > I don't think it's that easy. The streaming connectors have
> > > flink-streaming
> > > > as dependency while the batch connectors have the batch dependencies.
> > > >
> > > > Combining them would mean that users always have all dependencies,
> > right?
> > > >
> > > > On Thu, 22 Sep 2016 at 15:41 Stephan Ewen  wrote:
> > > >
> > > > > +1 for Fabian's suggestion
> > > > >
> > > > > On Thu, Sep 22, 2016 at 3:25 PM, Swapnil Chougule <
> > > > the.swapni...@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > > It will be good to have one module flink-connectors (union of
> > > streaming
> > > > > and
> > > > > > batch connectors).
> > > > > >
> > > > > > Regards,
> > > > > > Swapnil
> > > > > >
> > > > > > On Thu, Sep 22, 2016 at 6:35 PM, Fabian Hueske <
> fhue...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi everybody,
> > > > > > >
> > > > > > > right now, we have two separate Maven modules for batch and
> > > streaming
> > > > > > > connectors (flink-batch-connectors and
> > flink-streaming-connectors)
> > > > that
> > > > > > > contain modules for the individual external systems and storage
> > > > formats
> > > > > > > such as HBase, Cassandra, Avro, Elasticsearch, etc.
> > > > > > >
> > > > > > > Some of these systems can be used in streaming as well as batch
> > > jobs
> > > > as
> > > > > > for
> > > > > > > instance HBase, Cassandra, and Elasticsearch. However, due to
> the
> > > > > > separate
> > > > > > > main modules for streaming and batch connectors, we currently
> > need
> > > to
> > > > > > > decide where to put a connector. For example, the
> > > > > > flink-connector-cassandra
> > > > > > > module is located in flink-streaming-connectors but includes a
> > > > > > > CassandraInputFormat and CassandraOutputFormat (i.e., a batch
> > > source
> > > > > and
> > > > > > > sink).
> > > > > > >
> > > > > > > In my opinion, it would be better to just merge
> > > > flink-batch-connectors
> > > > > > and
> > > > > > > flink-streaming-connectors into a joint flink-connectors
> module.
> > > > > > >
> > > > > > > This would be only an internal restructuring of code and not be
> > > > visible
> > > > > > to
> > > > > > > users (unless we change the module names of the individual
> > > connectors
> > > > > > which
> > > > > > > is not necessary, IMO).
> > > > > > >
> > > > > > > What do others think?
> > > > > > >
> > > > > > > Best, Fabian
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (FLINK-4676) Merge flink-batch-con

2016-09-26 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-4676:


 Summary: Merge flink-batch-con
 Key: FLINK-4676
 URL: https://issues.apache.org/jira/browse/FLINK-4676
 Project: Flink
  Issue Type: Task
Reporter: Fabian Hueske






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Flink JDBCOutputFormat - Flush last batch enhancement

2016-09-26 Thread Swapnil Chougule
Hi Stephen/Chesnay,

I have used JDBCOutputFormat from batch connectors for my streaming use
case as I didn't find jdbc connector from streaming connectors.
If it is not there, may we have jdbc connector for streaming use cases?

Thanks,
Swapnil

On Mon, Sep 26, 2016 at 3:32 PM, Chesnay Schepler 
wrote:

> The JDBCOutputFormat writes records in batches, that's what he is
> referring to.
>
>
> On 26.09.2016 11:48, Stephan Ewen wrote:
>
>> Hi!
>>
>> I am not sure I understand what you want to do, but here are some
>> comments:
>>
>>- There is no "batching" in Flink's streaming API, not sure what you are
>> referring to with the "last batch"
>>- JDBC connections are not closed between windows, they remain open as
>> long as the operator is open.
>>
>> Thanks,
>> Stephan
>>
>>
>> On Mon, Sep 26, 2016 at 9:29 AM, Swapnil Chougule <
>> the.swapni...@gmail.com>
>> wrote:
>>
>> Hi Team,
>>>
>>> Can we handle one case in connector JDBCOutputFormat to update last batch
>>> (might be batch count is less than batch interval) without closing jdbc
>>> connection?
>>>
>>> During use case of my streaming project, I am updating jdbc sink (mysql
>>> db)
>>> after every window.
>>>
>>> Case : Say I have 450 queries to be updated in mysql with batch interval
>>> 100.
>>> 400 queries are executed in 4 batches (4 x 100).
>>> Last 50 queries go into pending state in batch to be executed & wait for
>>> next 50 queries from next window.
>>> If window is of size 5 minutes, then it will take next 4-5 minutes to
>>> reflect last 50 queries in mysql.
>>>
>>> Can we have functionality in JDBCOutputFormat to flush the last batch to the jdbc
>>> sink while persisting the same db connection?
>>>
>>> Thanks,
>>> Swapnil
>>>
>>>
>


Re: Re: [discuss] merge module flink-yarn and flink-yarn-test

2016-09-26 Thread shijinkui
Hi Maximilian Michels,

Thanks for your reply.

In order to test submitting a Flink job to a YARN cluster, that is, testing the client and 
session CLI, the unit tests were split out of flink-yarn.
First of all, such a YARN cluster test module should be renamed from 'flink-yarn-tests' 
to 'flink-yarn-cluster-test', and moved into the 'flink-tests' folder as a 
sub-module.

I think unit tests can be divided into a module's own unit tests, which can be executed 
when the module is built, and separate unit test modules which should be executed 
independently.
The top-level modules in the root folder shouldn't be unit test modules, not only for 
the sake of appearance, but also because of their responsibility.

-----Original Message-----
From: Maximilian Michels [mailto:m...@apache.org] 
Sent: 26 September 2016 16:33
To: dev@flink.apache.org
Subject: Re: Re: [discuss] merge module flink-yarn and flink-yarn-test

Hello Jinkui Shi,

Due to the nature of most of the Yarn tests, we need them to be in a separate 
module. More concretely, these tests have a dependency on 'flink-dist' because 
they need to deploy the Flink fat jar to the Yarn tests cluster. The fat jar 
also contains the 'flink-yarn' code. Thus, 'flink-yarn' needs to be a separate 
module and built before 'flink-yarn-tests'.

That being said, some of the tests don't need the fat jar, so we could move 
some of the tests to 'flink-yarn'. However, that is mostly a cosmetic change 
and not important for the testing coverage.

Best,
Max

On Thu, Sep 22, 2016 at 12:26 PM, Stephan Ewen  wrote:
>
> "flink-test-utils" contains, as the name says, utils for testing. 
> Intended to be used by users in writing their own tests.
> "flink-tests" contains cross module tests, no user should ever need to 
> have a dependency on that.
>
> They are different because users explicitly asked for test utils to be 
> factored into a separate project.
>
> As an honest reply here: Setting up a project as huge as Flink need to 
> take many things into account
>
>   - Multiple languages (Java / Scala), with limitations of IDEs in mind
>   - Dependency conflicts and much shading magic
>   - Dependency matrices (multiple hadoop and scala versions)
>   - Supporting earlier Java versions
>   - clean scope differentiation, so users can reuse utils and testing 
> code
>
>
> That simply requires some extra modules once in a while. Flink people 
> have worked hard on coming up with a structure that serves the need of 
> the production users and automated build/testing systems. These 
> production user requests are most important to us, and sometimes, we 
> need to take cuts in "beauty of directory structure" to help them.
>
> Constantly accusing the community of creating bad structures before 
> even trying to understand the reasoning behind that does not come 
> across as very friendly. Constantly accusing the community of sloppy 
> work just because your laptop settings are incompatible with the 
> default configuration likewise.
>
> I hope you understand that.
>
>
> On Thu, Sep 22, 2016 at 2:58 AM, shijinkui  wrote:
>
> > Hi, Stephan
> >
> > Thanks for your reply.
> >
> > In my mind, Maven-shade-plugin and sbt-assembly both default exclude 
> > test code for the fat jar.
> >
> > In fact, unit tests are use to test the main code, ensure our code 
> > logic fit our expect . This is general convention. I think. Flink 
> > has be a top apache project. We shouldn't be special. We're 
> > programmer, should be professional.
> >
> > Even more, there are `flink-tes-utils-parent` and `flink-tests` 
> > module, what's the relation between them.
> >
> > I have to ask why they are exist? Where is the start of such 
> > confusion modules?
> >
> > I think we shouldn't do nothing for this. Code and design should be 
> > comfortable.
> >
> > Thanks
> >
> > From Jinkui Shi
> >
> > -----Original Message-----
> > From: Stephan Ewen [mailto:se...@apache.org]
> > Sent: 21 September 2016 22:19
> > To: dev@flink.apache.org
> > Subject: Re: [discuss] merge module flink-yarn and flink-yarn-test
> >
> > I would like Robert to comment on this.
> >
> > I think there was a reason to have different modules, which had 
> > again something to do with the Maven Shade Plugin Dependencies and 
> > shading really seem the trickiest thing in bigger Java/Scala 
> > projects ;-)
> >
> > On Wed, Sep 21, 2016 at 11:04 AM, shijinkui  wrote:
> >
> > > Hi, All
> > >
> > > There too much module in the root. There are no necessary to 
> > > separate the test code from sub-module.
> > >
> > > I never see such design: two modules, one is main code, the other 
> > > is test code.
> > >
> > > Is there some special reason?
> > >
> > > From Jinkui Shi
> > >
> >


Re: [DISCUSS] Merge batch and stream connector modules

2016-09-26 Thread Robert Metzger
+1 good suggestion.

On Mon, Sep 26, 2016 at 1:03 PM, Stephan Ewen  wrote:

> The module would have both dependencies, but both are provided anyways, so
> that would not be much of an issue, I think.
>
> On Mon, Sep 26, 2016 at 12:25 PM, Till Rohrmann 
> wrote:
>
> > I think this only holds true for modules which depend on the batch or
> > streaming counter part, respectively. We could refactor these modules by
> > pulling out common types which are independent of streaming/batch and are
> > used by the batch and streaming module.
> >
> > Cheers,
> > Till
> >
> > On Fri, Sep 23, 2016 at 11:15 AM, Aljoscha Krettek 
> > wrote:
> >
> > > I don't think it's that easy. The streaming connectors have
> > flink-streaming
> > > as dependency while the batch connectors have the batch dependencies.
> > >
> > > Combining them would mean that users always have all dependencies,
> right?
> > >
> > > On Thu, 22 Sep 2016 at 15:41 Stephan Ewen  wrote:
> > >
> > > > +1 for Fabian's suggestion
> > > >
> > > > On Thu, Sep 22, 2016 at 3:25 PM, Swapnil Chougule <
> > > the.swapni...@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > +1
> > > > > It will be good to have one module flink-connectors (union of
> > streaming
> > > > and
> > > > > batch connectors).
> > > > >
> > > > > Regards,
> > > > > Swapnil
> > > > >
> > > > > On Thu, Sep 22, 2016 at 6:35 PM, Fabian Hueske 
> > > > wrote:
> > > > >
> > > > > > Hi everybody,
> > > > > >
> > > > > > right now, we have two separate Maven modules for batch and
> > streaming
> > > > > > connectors (flink-batch-connectors and
> flink-streaming-connectors)
> > > that
> > > > > > contain modules for the individual external systems and storage
> > > formats
> > > > > > such as HBase, Cassandra, Avro, Elasticsearch, etc.
> > > > > >
> > > > > > Some of these systems can be used in streaming as well as batch
> > jobs
> > > as
> > > > > for
> > > > > > instance HBase, Cassandra, and Elasticsearch. However, due to the
> > > > > separate
> > > > > > main modules for streaming and batch connectors, we currently
> need
> > to
> > > > > > decide where to put a connector. For example, the
> > > > > flink-connector-cassandra
> > > > > > module is located in flink-streaming-connectors but includes a
> > > > > > CassandraInputFormat and CassandraOutputFormat (i.e., a batch
> > source
> > > > and
> > > > > > sink).
> > > > > >
> > > > > > In my opinion, it would be better to just merge
> > > flink-batch-connectors
> > > > > and
> > > > > > flink-streaming-connectors into a joint flink-connectors module.
> > > > > >
> > > > > > This would be only an internal restructuring of code and not be
> > > visible
> > > > > to
> > > > > > users (unless we change the module names of the individual
> > connectors
> > > > > which
> > > > > > is not necessary, IMO).
> > > > > >
> > > > > > What do others think?
> > > > > >
> > > > > > Best, Fabian
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Merge batch and stream connector modules

2016-09-26 Thread Stephan Ewen
The module would have both dependencies, but both are provided anyways, so
that would not be much of an issue, I think.

On Mon, Sep 26, 2016 at 12:25 PM, Till Rohrmann 
wrote:

> I think this only holds true for modules which depend on the batch or
> streaming counter part, respectively. We could refactor these modules by
> pulling out common types which are independent of streaming/batch and are
> used by the batch and streaming module.
>
> Cheers,
> Till
>
> On Fri, Sep 23, 2016 at 11:15 AM, Aljoscha Krettek 
> wrote:
>
> > I don't think it's that easy. The streaming connectors have
> flink-streaming
> > as dependency while the batch connectors have the batch dependencies.
> >
> > Combining them would mean that users always have all dependencies, right?
> >
> > On Thu, 22 Sep 2016 at 15:41 Stephan Ewen  wrote:
> >
> > > +1 for Fabian's suggestion
> > >
> > > On Thu, Sep 22, 2016 at 3:25 PM, Swapnil Chougule <
> > the.swapni...@gmail.com
> > > >
> > > wrote:
> > >
> > > > +1
> > > > It will be good to have one module flink-connectors (union of
> streaming
> > > and
> > > > batch connectors).
> > > >
> > > > Regards,
> > > > Swapnil
> > > >
> > > > On Thu, Sep 22, 2016 at 6:35 PM, Fabian Hueske 
> > > wrote:
> > > >
> > > > > Hi everybody,
> > > > >
> > > > > right now, we have two separate Maven modules for batch and
> streaming
> > > > > connectors (flink-batch-connectors and flink-streaming-connectors)
> > that
> > > > > contain modules for the individual external systems and storage
> > formats
> > > > > such as HBase, Cassandra, Avro, Elasticsearch, etc.
> > > > >
> > > > > Some of these systems can be used in streaming as well as batch
> jobs
> > as
> > > > for
> > > > > instance HBase, Cassandra, and Elasticsearch. However, due to the
> > > > separate
> > > > > main modules for streaming and batch connectors, we currently need
> to
> > > > > decide where to put a connector. For example, the
> > > > flink-connector-cassandra
> > > > > module is located in flink-streaming-connectors but includes a
> > > > > CassandraInputFormat and CassandraOutputFormat (i.e., a batch
> source
> > > and
> > > > > sink).
> > > > >
> > > > > In my opinion, it would be better to just merge
> > flink-batch-connectors
> > > > and
> > > > > flink-streaming-connectors into a joint flink-connectors module.
> > > > >
> > > > > This would be only an internal restructuring of code and not be
> > visible
> > > > to
> > > > > users (unless we change the module names of the individual
> connectors
> > > > which
> > > > > is not necessary, IMO).
> > > > >
> > > > > What do others think?
> > > > >
> > > > > Best, Fabian
> > > > >
> > > >
> > >
> >
>


Re: Exception from in-progress implementation of Python API bulk iterations

2016-09-26 Thread Chesnay Schepler

Hello Geoffrey,

I could not reproduce this issue with the commits and plan you provided.

I tried out both the FLINK-4098 and bulk-iterations branches (and 
reverted back to the specified commits) and built Flink from scratch.


Could you double check that the code you provided produces the error? 
Also, which OS/python version are you using?


Regards,
Chesnay

On 20.09.2016 11:13, Chesnay Schepler wrote:

Hello,

I'll try to take a look this week.

Regards,
Chesnay

On 20.09.2016 02:38, Geoffrey Mon wrote:

Hello all,

I have recently been working on adding bulk iterations to the Python 
API of

Flink in order to facilitate a research project I am working on. The
current changes can be seen in this GitHub diff:
https://github.com/apache/flink/compare/master...GEOFBOT:e8c9833b43675af66ce897da9880c4f8cd16aad0 



This implementation seems to work for, at least, simple examples, 
such as

incrementing numbers in a data set. However, with the transformations
required for my project, I get an exception 
"java.lang.ClassCastException:
[B cannot be cast to org.apache.flink.api.java.tuple.Tuple" thrown 
from the

deserializers called by
org.apache.flink.python.api.streaming.data.PythonReceiver.collectBuffer.
I've created the following simplified Python plan by stripping down my
research project code to the problem-causing parts:
https://gist.github.com/GEOFBOT/abb7f81030aab160e6908093ebaa3b4a

I have been working on this issue but I don't have any ideas on what 
might

be the problem. Perhaps someone more knowledgeable about the interior of
the Python API could kindly help?

Thank you very much.

Geoffrey Mon








Re: No support for request PutMappingRequest

2016-09-26 Thread Till Rohrmann
Hi Ozan,

I'm not super experienced with Flink's elasticsearch connector, but could
you post the complete stack trace to figure out where the problem comes
from?

I've also pulled in Aljoscha, the original author of the elasticsearch
sink. Maybe he can give you a detailed answer.

Cheers,
Till

On Fri, Sep 23, 2016 at 1:41 PM, Ozan DENİZ  wrote:

> Hi everyone,
>
>
> We are trying to use elasticsearch (2.x) connector for Flink application.
> However, we encounter a problem when we try to add mapping to elasticsearch
> index.
>
>
> The error message is below when we run the Flink application.
>
>
>  No support for request [org.elasticsearch.action.
> admin.indices.mapping.put.PutMappingRequest]
>
>
> We are using "putMappingRequest" function for mapping. In Flink, is there
> any way to add mapping to indexes in elasticsearch?
>
>
> If not, can we contribute for adding mapping feature to elasticsearch
> connector?
>


Re: Flink: How to handle external app configuration changes in flink

2016-09-26 Thread Jamie Grier
Hi Govindarajan,

Typically the way people do this is to create a stream of configuration
changes and consume this like any other stream.  For the specific case of
filtering for example you may have a data stream and a stream of filters
that you want to run the data through.  The typical approach in the Flink
API would then be

val dataStream = env.addSource(dataSource).keyBy("userId")
val filterStream = env.addSource(filterSource).keyBy("userId")
val connectedStream = dataStream
  .connect(filterStream)
  .flatMap(yourFilterFunction)

You would maintain your filters as state in your filter function.  Notice
that in this example both streams are keyed the same way.
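
For illustration, yourFilterFunction could be a CoFlatMapFunction that keeps the latest filter per key in Flink's keyed state. This is only a sketch (in Java, assuming the Flink 1.1 keyed-state API); the Event and FilterRule types and the accepts() method are hypothetical placeholders, not part of the original answer.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

// Sketch only: keeps the most recent FilterRule per key and uses it to filter
// the data stream. Event, FilterRule and accepts() are hypothetical placeholders.
public class ConfigurableFilter extends RichCoFlatMapFunction<Event, FilterRule, Event> {

    private transient ValueState<FilterRule> currentRule;

    @Override
    public void open(Configuration parameters) {
        currentRule = getRuntimeContext().getState(
                new ValueStateDescriptor<>("filter-rule", FilterRule.class, null));
    }

    @Override
    public void flatMap1(Event event, Collector<Event> out) throws Exception {
        FilterRule rule = currentRule.value();
        // Forward the event if no rule has arrived yet or the rule accepts it.
        if (rule == null || rule.accepts(event)) {
            out.collect(event);
        }
    }

    @Override
    public void flatMap2(FilterRule rule, Collector<Event> out) throws Exception {
        // A new configuration record arrived for this key: remember it.
        currentRule.update(rule);
    }
}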

If it is not possible to distribute the configuration by key (it really
depends on your use case) you can instead "broadcast" that state so that
each instance of yourFilterFunction sees the same configuration messages
and will end up building the same state.  For example:

val dataStream = env.addSource(dataSource).keyBy("userId")
val filterStream = env.addSource(filterSource).broadcast()
val connectedStream = dataStream
  .connect(filterStream)
  .flatMap(yourFilterFunction)

I hope that helps.

-Jamie




On Mon, Sep 26, 2016 at 4:34 AM, Govindarajan Srinivasaraghavan <
govindragh...@gmail.com> wrote:

> Hi,
>
> My requirement is to stream millions of records in a day and it has huge
> dependency on external configuration parameters. For example, a user can go
> and change the required setting anytime in the web application and after
> the change is made, the streaming has to happen with the new application
> config parameters. These are app level configurations and we also have some
> dynamic exclude parameters which each data has to be passed through and
> filtered.
>
> I see that flink doesn’t have global state which is shared across all task
> managers and subtasks. Having a centralized cache is an option but for each
> parameter I would have to read it from cache which will increase the
> latency. Please advise on the better approach to handle these kind of
> scenarios and how other applications are handling it. Thanks.
>



-- 

Jamie Grier
data Artisans, Director of Applications Engineering
@jamiegrier 
ja...@data-artisans.com


Re: [DISCUSS] Merge batch and stream connector modules

2016-09-26 Thread Till Rohrmann
I think this only holds true for modules which depend on the batch or
streaming counterpart, respectively. We could refactor these modules by
pulling out common types which are independent of streaming/batch and are
used by the batch and streaming module.

Cheers,
Till

On Fri, Sep 23, 2016 at 11:15 AM, Aljoscha Krettek 
wrote:

> I don't think it's that easy. The streaming connectors have flink-streaming
> as a dependency while the batch connectors have the batch dependencies.
>
> Combining them would mean that users always have all dependencies, right?
>
> On Thu, 22 Sep 2016 at 15:41 Stephan Ewen  wrote:
>
> > +1 for Fabian's suggestion
> >
> > On Thu, Sep 22, 2016 at 3:25 PM, Swapnil Chougule <
> the.swapni...@gmail.com
> > >
> > wrote:
> >
> > > +1
> > > It will be good to have one module flink-connectors (union of streaming
> > and
> > > batch connectors).
> > >
> > > Regards,
> > > Swapnil
> > >
> > > On Thu, Sep 22, 2016 at 6:35 PM, Fabian Hueske 
> > wrote:
> > >
> > > > Hi everybody,
> > > >
> > > > right now, we have two separate Maven modules for batch and streaming
> > > > connectors (flink-batch-connectors and flink-streaming-connectors)
> that
> > > > contain modules for the individual external systems and storage
> formats
> > > > such as HBase, Cassandra, Avro, Elasticsearch, etc.
> > > >
> > > > Some of these systems can be used in streaming as well as batch jobs
> as
> > > for
> > > > instance HBase, Cassandra, and Elasticsearch. However, due to the
> > > separate
> > > > main modules for streaming and batch connectors, we currently need to
> > > > decide where to put a connector. For example, the
> > > flink-connector-cassandra
> > > > module is located in flink-streaming-connectors but includes a
> > > > CassandraInputFormat and CassandraOutputFormat (i.e., a batch source
> > and
> > > > sink).
> > > >
> > > > In my opinion, it would be better to just merge
> flink-batch-connectors
> > > and
> > > > flink-streaming-connectors into a joint flink-connectors module.
> > > >
> > > > This would be only an internal restructuring of code and not be
> visible
> > > to
> > > > users (unless we change the module names of the individual connectors
> > > which
> > > > is not necessary, IMO).
> > > >
> > > > What do others think?
> > > >
> > > > Best, Fabian
> > > >
> > >
> >
>


Re: FailureRate Restart Strategy is not picked from Config file

2016-09-26 Thread Till Rohrmann
Hi Deepak,

are you running Flink streaming jobs with checkpointing enabled? In this
case Flink will check if you've set a restart strategy on your job and, if
not, it will set the fixed-delay restart strategy. This will effectively
overwrite the default restart strategy which you define in the
flink-conf.yaml file.
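A minimal sketch of setting it explicitly on a checkpointed job (the values mirror the flink-conf.yaml entries quoted below; the checkpoint interval is illustrative):

import java.util.concurrent.TimeUnit
import org.apache.flink.api.common.restartstrategy.RestartStrategies
import org.apache.flink.api.common.time.Time
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.enableCheckpointing(10000) // checkpointing is what triggers the fixed-delay default
env.setRestartStrategy(RestartStrategies.failureRateRestart(
  300,                            // max failures per interval
  Time.of(12, TimeUnit.MINUTES),  // failure rate measurement interval
  Time.of(120, TimeUnit.SECONDS)  // delay between restart attempts
))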

Cheers,
Till

On Thu, Sep 22, 2016 at 10:01 PM, Deepak Jha  wrote:

> Hi All,
> I tried to use FailureRate restart strategy by setting values for it in
> flink-conf.yaml but flink (v 1.1.2) did not pick it up.
>
> # Flink Restart strategy
> restart-strategy: failure-rate
> restart-strategy.failure-rate.delay: 120 s
> restart-strategy.failure-rate.failure-rate-interval: 12 minute
> restart-strategy.failure-rate.max-failures-per-interval: 300
>
> It works when I set it up explicitly in topology using
> *env.setRestartStrategy *
>
> PFA snapshot of the Jobmanager log.
> Thanks,
> Deepak Jha
>
>


Re: Questions on flink

2016-09-26 Thread Jamie Grier
Hi Govindarajan,

I've put some answers in-line below..

On Sat, Sep 24, 2016 at 7:32 PM, Govindarajan Srinivasaraghavan <
govindragh...@gmail.com> wrote:

> Hi,
>
> I'm working on apache flink for data streaming and I have few questions.
> Any help is greatly appreciated. Thanks.
>
> 1) Are there any restrictions on creating tumbling windows. For example,
> if I want to create a tumbling window per user id for 2 secs and let’s say
> if I have more than 10 million user id's would that be a problem. (I'm
> using keyBy user id and then creating a timeWindow for 2 secs)? How are
> these windows maintained internally in flink?
>

That should not be a problem in general.  An important question may be how
many unique keys you will see in two seconds.  This is more important than
your total key cardinality of 10 million and probably a *much* smaller
number unless your input message rate is really high.
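A minimal sketch of the keyBy/timeWindow pattern being discussed (Event, its fields, and the bounded toy input are placeholders; a real job would read from an unbounded source):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

case class Event(userId: String, value: Long)

val env = StreamExecutionEnvironment.getExecutionEnvironment

// toy bounded input just to keep the sketch self-contained
val events: DataStream[Event] = env.fromElements(
  Event("user-1", 1L), Event("user-2", 5L), Event("user-1", 3L))

val perUserSums = events
  .keyBy(_.userId)              // one logical window per user id
  .timeWindow(Time.seconds(2))  // 2-second tumbling window
  .sum("value")

perUserSums.print()
env.execute("per-user tumbling window sketch")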

>
> 2) I looked at rebalance for round robin partitioning. Let’s say I have a
> cluster set up and if I have a parallelism of 1 for source and if I do a
> rebalance, will my data be shuffled across machines to improve performance?
> If so is there a specific port using which the data is transferred to other
> nodes in the cluster?
>

Yes, rebalance() does a round-robin distribution of messages to other
machines in the cluster.  There is not a specific port used for each
TaskManager to communicate on but rather an available port is assigned at
runtime.  This is the default.  You can also set this to a specific port if
you have reason to, but a lot depends on how you will deploy -- via YARN or as
a standalone Flink cluster.
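A rough sketch of the rebalance call (placeholder names), with an illustrative flink-conf.yaml entry for pinning the data exchange port; check the configuration reference of your Flink version before relying on that key:

// round-robin redistribution of a parallelism-1 source across the cluster
val rebalanced = env.addSource(dataSource).setParallelism(1).rebalance

// flink-conf.yaml (illustrative value):
// taskmanager.data.port: 6121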


>
> 3) Are there any limitations on state maintenance? I'm planning to
> maintain some user id related data which could grow very large. I read
> about flink using rocks db to maintain the state. Just wanted to check if
> there are any limitations on how much data can be maintained?
>

Yes, there are limits.  The total data that can be maintained today is
determined by the fact that Flink has to periodically snapshot this data
and copy it to a persistent storage system such as HDFS whether you are
using RocksDB or not.  The aggregate bandwidth required to your storage
system (like HDFS) is roughly your total Flink state size divided by your
Flink checkpoint interval.
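To put made-up numbers on that (illustrative, not from this thread): roughly 600 GB of total state checkpointed every 60 seconds needs on the order of 600 GB / 60 s = 10 GB/s of sustained write bandwidth to the checkpoint store.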


> 4) Also where is the state maintained if the amount of data is less? (I
> guess in JVM memory) If I have several machines on my cluster can every
> node get the current state version?
>

I'm not exactly sure what you're asking here.  All data is check-pointed to
a persistent store which must be accessible from each machine in the
cluster.


> 5) I need a way to send external configuration changes to Flink. Let's say
> there is a new parameter that has to be added or an external change that has
> to be applied to Flink's state -- how can this be done?
>

The typical way to do this is to consume that configuration as a stream and
hold the configuration internally in the state of a particular user
function.


>
> Thanks
>



-- 

Jamie Grier
data Artisans, Director of Applications Engineering
@jamiegrier 
ja...@data-artisans.com


Re: Flink JDBCOutputFormat - Flush last batch enhancement

2016-09-26 Thread Chesnay Schepler
The JDBCOutputFormat writes records in batches, that's what he is 
referring to.
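Purely as a sketch of the behaviour being asked for -- not the shipped JDBCOutputFormat -- a hand-rolled sink could flush its JDBC batch on either a count or a time threshold without closing the connection. The URL, SQL, and tuple type below are placeholders:

import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction

class TimedBatchJdbcSink(batchSize: Int, flushIntervalMillis: Long)
    extends RichSinkFunction[(Int, String)] {

  @transient private var connection: Connection = _
  @transient private var statement: PreparedStatement = _
  private var pending = 0
  private var lastFlush = 0L

  override def open(parameters: Configuration): Unit = {
    connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/db", "user", "pw")
    statement = connection.prepareStatement("INSERT INTO t (id, payload) VALUES (?, ?)")
    lastFlush = System.currentTimeMillis()
  }

  override def invoke(value: (Int, String)): Unit = {
    statement.setInt(1, value._1)
    statement.setString(2, value._2)
    statement.addBatch()
    pending += 1
    val now = System.currentTimeMillis()
    // flush on count OR elapsed time; the connection stays open either way.
    // Note: this only fires when a record arrives; a production version would
    // also flush from a timer or on checkpoints.
    if (pending >= batchSize || now - lastFlush >= flushIntervalMillis) {
      statement.executeBatch()
      pending = 0
      lastFlush = now
    }
  }

  override def close(): Unit = {
    if (statement != null) {
      if (pending > 0) statement.executeBatch() // flush the trailing partial batch
      statement.close()
    }
    if (connection != null) connection.close()
  }
}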


On 26.09.2016 11:48, Stephan Ewen wrote:

Hi!

I am not sure I understand what you want to do, but here are some comments:

   - There is no "batching" in Flink's streaming API, not sure what you are
referring to with the "last batch"
   - JDBC connections are not closed between windows, they remain open as
long as the operator is open.

Thanks,
Stephan


On Mon, Sep 26, 2016 at 9:29 AM, Swapnil Chougule 
wrote:


Hi Team,

Can we handle one case in the JDBCOutputFormat connector to flush the last
batch (where the batch count may be less than the batch interval) without
closing the jdbc connection?

In my streaming project's use case, I am updating a jdbc sink (mysql db)
after every window.

Case: Say I have 450 queries to be updated in mysql with a batch interval of
100.
400 queries are executed in 4 batches (4 x 100).
The last 50 queries stay pending in the batch, waiting for the next 50
queries from the next window.
If the window is 5 minutes long, it will take another 4-5 minutes for the
last 50 queries to be reflected in mysql.

Can we have functionality in JDBCOutputFormat to flush the last batch to the
jdbc sink while keeping the same db connection?

Thanks,
Swapnil





Re: Flink JDBCOutputFormat - Flush last batch enhancement

2016-09-26 Thread Stephan Ewen
Hi!

I am not sure I understand what you want to do, but here are some comments:

  - There is no "batching" in Flink's streaming API, not sure what you are
referring to with the "last batch"
  - JDBC connections are not closed between windows, they remain open as
long as the operator is open.

Thanks,
Stephan


On Mon, Sep 26, 2016 at 9:29 AM, Swapnil Chougule 
wrote:

> Hi Team,
>
> Can we handle one case in the JDBCOutputFormat connector to flush the last
> batch (where the batch count may be less than the batch interval) without
> closing the jdbc connection?
>
> In my streaming project's use case, I am updating a jdbc sink (mysql db)
> after every window.
>
> Case: Say I have 450 queries to be updated in mysql with a batch interval of
> 100.
> 400 queries are executed in 4 batches (4 x 100).
> The last 50 queries stay pending in the batch, waiting for the next 50
> queries from the next window.
> If the window is 5 minutes long, it will take another 4-5 minutes for the
> last 50 queries to be reflected in mysql.
>
> Can we have functionality in JDBCOutputFormat to flush the last batch to the
> jdbc sink while keeping the same db connection?
>
> Thanks,
> Swapnil
>


Re: 答复: [discuss] merge module flink-yarn and flink-yarn-test

2016-09-26 Thread Maximilian Michels
Hello Jinkui Shi,

Due to the nature of most of the Yarn tests, we need them to be in a
separate module. More concretely, these tests have a dependency on
'flink-dist' because they need to deploy the Flink fat jar to the Yarn
tests cluster. The fat jar also contains the 'flink-yarn' code. Thus,
'flink-yarn' needs to be a separate module and built before
'flink-yarn-tests'.

That being said, some of the tests don't need the fat jar, so we could
move some of the tests to 'flink-yarn'. However, that is mostly a
cosmetic change and not important for the testing coverage.

Best,
Max

On Thu, Sep 22, 2016 at 12:26 PM, Stephan Ewen  wrote:
>
> "flink-test-utils" contains, as the name says, utils for testing. Intended
> to be used by users in writing their own tests.
> "flink-tests" contains cross module tests, no user should ever need to have
> a dependency on that.
>
> They are different because users explicitly asked for test utils to be
> factored into a separate project.
>
> As an honest reply here: Setting up a project as huge as Flink needs to take
> many things into account
>
>   - Multiple languages (Java / Scala), with limitations of IDEs in mind
>   - Dependency conflicts and much shading magic
>   - Dependency matrices (multiple hadoop and scala versions)
>   - Supporting earlier Java versions
>   - clean scope differentiation, so users can reuse utils and testing code
>
>
> That simply requires some extra modules once in a while. Flink people have
> worked hard on coming up with a structure that serves the need of the
> production users and automated build/testing systems. These production user
> requests are most important to us, and sometimes, we need to take cuts in
> "beauty of directory structure" to help them.
>
> Constantly accusing the community of creating bad structures before even
> trying to understand the reasoning behind that does not come across as very
> friendly. Constantly accusing the community of sloppy work just because
> your laptop settings are incompatible with the default configuration
> likewise.
>
> I hope you understand that.
>
>
> On Thu, Sep 22, 2016 at 2:58 AM, shijinkui  wrote:
>
> > Hi, Stephan
> >
> > Thanks for your reply.
> >
> > In my mind, maven-shade-plugin and sbt-assembly both exclude test code
> > from the fat jar by default.
> >
> > In fact, unit tests are used to test the main code and ensure our code
> > logic meets our expectations. This is a general convention, I think. Flink
> > is a top Apache project; we shouldn't be special. We're programmers and
> > should be professional.
> >
> > Moreover, there are the `flink-test-utils-parent` and `flink-tests`
> > modules -- what's the relation between them?
> >
> > I have to ask why they exist. Where did such confusing modules start?
> >
> > I don't think we should do nothing about this. Code and design should be
> > comfortable to work with.
> >
> > Thanks
> >
> > From Jinkui Shi
> >
> > -----Original Message-----
> > From: Stephan Ewen [mailto:se...@apache.org]
> > Sent: September 21, 2016 22:19
> > To: dev@flink.apache.org
> > Subject: Re: [discuss] merge module flink-yarn and flink-yarn-test
> >
> > I would like Robert to comment on this.
> >
> > I think there was a reason to have different modules, which again had
> > something to do with the Maven Shade Plugin. Dependencies and shading
> > really seem to be the trickiest things in bigger Java/Scala projects ;-)
> >
> > On Wed, Sep 21, 2016 at 11:04 AM, shijinkui  wrote:
> >
> > > Hi, All
> > >
> > > There are too many modules in the root. There is no need to separate
> > > the test code from its sub-module.
> > >
> > > I have never seen such a design: two modules, one with the main code,
> > > the other with the test code.
> > >
> > > Is there some special reason?
> > >
> > > From Jinkui Shi
> > >
> >


Flink JDBCOutputFormat - Flush last batch enhancement

2016-09-26 Thread Swapnil Chougule
Hi Team,

Can we handle one case in the JDBCOutputFormat connector to flush the last
batch (where the batch count may be less than the batch interval) without
closing the jdbc connection?

In my streaming project's use case, I am updating a jdbc sink (mysql db)
after every window.

Case: Say I have 450 queries to be updated in mysql with a batch interval of
100.
400 queries are executed in 4 batches (4 x 100).
The last 50 queries stay pending in the batch, waiting for the next 50
queries from the next window.
If the window is 5 minutes long, it will take another 4-5 minutes for the
last 50 queries to be reflected in mysql.

Can we have functionality in JDBCOutputFormat to flush the last batch to the
jdbc sink while keeping the same db connection?

Thanks,
Swapnil