[jira] [Created] (FLINK-4666) Make constants final in ParameterTool

2016-09-22 Thread Alexander Pivovarov (JIRA)
Alexander Pivovarov created FLINK-4666:
--

 Summary: Make constants final in ParameterTool
 Key: FLINK-4666
 URL: https://issues.apache.org/jira/browse/FLINK-4666
 Project: Flink
  Issue Type: Bug
  Components: Java API
Reporter: Alexander Pivovarov
Priority: Trivial


NO_VALUE_KEY and DEFAULT_UNDEFINED in ParameterTool should be final
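For illustration, the change amounts to adding the final modifier (a sketch;
the literal values shown are illustrative, not necessarily those in
ParameterTool):

// before: accidentally mutable public constants
public static String NO_VALUE_KEY = "__NO_VALUE_KEY";
public static String DEFAULT_UNDEFINED = "<undefined>";

// after: immutable, as proposed
public static final String NO_VALUE_KEY = "__NO_VALUE_KEY";
public static final String DEFAULT_UNDEFINED = "<undefined>";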



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4665) Remove boxing/unboxing to parse a primitive

2016-09-22 Thread Alexander Pivovarov (JIRA)
Alexander Pivovarov created FLINK-4665:
--

 Summary: Remove boxing/unboxing to parse a primitive
 Key: FLINK-4665
 URL: https://issues.apache.org/jira/browse/FLINK-4665
 Project: Flink
  Issue Type: Bug
  Components: Java API
Reporter: Alexander Pivovarov
Priority: Trivial


I found the following issues with boxing/unboxing and Integer:

1. The current code does boxing/unboxing to parse a primitive. It is more
efficient to just call the static parseXXX method.

2. Boxing/unboxing is used to perform a type cast.

3. new Integer is used instead of valueOf. Using new Integer(int) is
guaranteed to always result in a new object, whereas Integer.valueOf(int)
allows values to be cached by the compiler or class library.
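A minimal sketch of the three patterns (illustrative code, not the actual
Flink sources):

String text = "42";

// (1) parsing via boxing: valueOf creates an Integer that is immediately
// auto-unboxed on assignment
int slow = Integer.valueOf(text);
// preferred: parse straight to the primitive
int fast = Integer.parseInt(text);

// (2) boxing immediately unboxed to perform a coercion
double d = 3.14;
int viaBox = new Double(d).intValue();  // allocates a Double just to cast
int direct = (int) d;                   // plain primitive cast

// (3) new Integer(int) always allocates; Integer.valueOf(int) may return a
// cached instance for small values
Integer fresh = new Integer(42);
Integer maybeCached = Integer.valueOf(42);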



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Flink Scala - data load from CSV to postgress database.

2016-09-22 Thread Jagan
Hi Guys,

We have a requirement to load data from a local CSV file into a Postgres
database using Flink Scala.

Do you have any sample Flink Scala code for this?

We have searched Google and the Flink website for data-load examples, but we
haven't found any sample code for this requirement.

Code: Flink Scala.

Load: from CSV to a local Postgres database.

Thanks

Jagan
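
A minimal sketch of one way to do this with the DataSet API and the
flink-jdbc connector (Java shown; the Scala API is analogous; the connection
details, table name, and CSV schema below are all placeholders, and the exact
Row type depends on the Flink version, 1.1-era API assumed here):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.table.Row;

public class CsvToPostgres {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Read the CSV file as (id, name) records.
        DataSet<Tuple2<Integer, String>> csv = env
                .readCsvFile("file:///tmp/input.csv")
                .types(Integer.class, String.class);

        // JDBCOutputFormat expects Row records, so convert each tuple.
        DataSet<Row> rows = csv.map(new MapFunction<Tuple2<Integer, String>, Row>() {
            @Override
            public Row map(Tuple2<Integer, String> t) {
                Row row = new Row(2);
                row.setField(0, t.f0);
                row.setField(1, t.f1);
                return row;
            }
        });

        rows.output(JDBCOutputFormat.buildJDBCOutputFormat()
                .setDrivername("org.postgresql.Driver")
                .setDBUrl("jdbc:postgresql://localhost:5432/mydb")
                .setUsername("user")
                .setPassword("secret")
                .setQuery("INSERT INTO my_table (id, name) VALUES (?, ?)")
                .finish());

        env.execute("CSV to Postgres");
    }
}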

On 22 September 2016 at 20:37, Jagan  wrote:

> Hi Team,
>
>   Will you be able to guide me on this? Is this a known issue that we
> can't implement  dataload in flink scala ?
>
>data load from csv to postgress or any relational database in Flink
> Scala
>
> Thanks
>
> Jagan.
>
> On 22 September 2016 at 20:15, Jagan  wrote:
>
>> Thanks Suneel,
>>
>> but client want to implement the data load in Flink Scala..
>>
>>
>> On 22 September 2016 at 20:07, Suneel Marthi  wrote:
>>
>>> Couldn't u use SQLLoader or something for doing that?
>>>
>>> http://stackoverflow.com/questions/2987433/how-to-import-csv-file-data-into-a-postgresql-table
>>>
>>>
>>>
>>> On Thu, Sep 22, 2016 at 3:01 PM, Jagan  wrote:
>>>
>>> > Hi Guys,
>>> >
>>> > We have a requirement like – loading data from local CSV file to
>>> Postgress
>>> > database using Flink Scala…We have tried number of ways all failed
>>> >
>>> > Do you have any example for this? With dependency libraries to
>>> understand
>>> > how to load data from CSV to postgres
>>> >
>>> > We have tried and searched in Google/Flinkweb website for data load, we
>>> > haven’t found any sample code for this requisite.
>>> >
>>> > Code: Flink Scala.
>>> >
>>> > Load: from CSV to local postgres database.
>>> >
>>> > Thanks
>>> >
>>> > Jagan
>>> >
>>> > 0044-7411239688
>>> >
>>> >
>>> >
>>> > --
>>> > Regards.
>>> > Jagan.
>>> >
>>> > 
>>> > -
>>> > *  The Good You Do Today, People will often forget tomorrow: Do
>>> Good
>>> > Anyway 
>>> >
>>>
>>
>>
>>
>> --
>> Regards.
>> Jagan.
>>
>> 
>> -
>> *  The Good You Do Today, People will often forget tomorrow: Do Good
>> Anyway 
>>
>
>
>
> --
> Regards.
> Jagan.
>
> 
> -
> *  The Good You Do Today, People will often forget tomorrow: Do Good
> Anyway 
>



-- 
Regards.
Jagan.

-
*  The Good You Do Today, People will often forget tomorrow: Do Good
Anyway 


Flink Accumulators vs Metrics

2016-09-22 Thread Chawla,Sumit
Hi All

Based on my code reading, I have the following understanding of Metrics and
Accumulators:

1. Accumulators for a Flink job work like global counters. They are designed
so that accumulator values from different instances of an execution vertex
can be combined. They are essentially distributed counters.

2. Flink Metrics are local to the Task Manager that reports them, and need
external aggregation for a job-centric view.

I see that one can define user metrics as part of writing Flink programs, but
these metrics would not be consolidated when the job is running the same task
on different task managers. Having said that, is it fair to say that Metrics
are for surfacing operational details only and will not replace Accumulators
anytime soon?
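
A minimal sketch contrasting the two mechanisms (1.1-era APIs; the class and
metric names are illustrative): the accumulator below is merged across all
parallel subtasks by the JobManager, while the metric counter is reported per
task and needs external aggregation.

import org.apache.flink.api.common.accumulators.IntCounter;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

public class CountingMapper extends RichMapFunction<String, String> {
    private final IntCounter processedAcc = new IntCounter(); // job-wide, merged centrally
    private transient Counter processedMetric;                // local to this task instance

    @Override
    public void open(Configuration parameters) {
        getRuntimeContext().addAccumulator("processed-acc", processedAcc);
        processedMetric = getRuntimeContext().getMetricGroup().counter("processed-metric");
    }

    @Override
    public String map(String value) {
        processedAcc.add(1);     // combined across subtasks into one job-level value
        processedMetric.inc();   // per-subtask value, surfaced via metric reporters
        return value;
    }
}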

For my use case, I wanted to maintain some global counters/histograms (like
those available in Storm, e.g. total messages processed in the last 1 minute,
the last 10 minutes, etc.). Metrics would have been a perfect fit for these,
but one would need to employ external aggregation to come up with a holistic,
job-level view.

Please correct my understanding if I am missing something here.


Regards
Sumit Chawla


Re: Get Flink ExecutionGraph Programmatically

2016-09-22 Thread Chawla,Sumit
Hi Aljoscha,

I was able to get the ClusterClient and Accumulators using the following:

DefaultCLI defaultCLI = new DefaultCLI();
CommandLine line = new DefaultParser().parse(new Options(), new String[]{}, true);
ClusterClient clusterClient = defaultCLI.retrieveCluster(line, configuration);
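
From there the accumulators can be fetched as Aljoscha suggested; a hedged
one-liner (jobID is assumed to be known, e.g. from the job submission, and
the exact return type may differ by version):

Map<String, Object> accumulators = clusterClient.getAccumulators(jobID);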



Regards
Sumit Chawla


On Thu, Sep 22, 2016 at 4:55 AM, Aljoscha Krettek 
wrote:

> Hi,
> there is ClusterClient.getAccumulators(JobID jobID) which should be able
> to
> get the accumulators for a running job. If you can construct a
> ClusterClient that should be a good solution.
>
> Cheers,
> Aljoscha
>
> On Wed, 21 Sep 2016 at 21:15 Chawla,Sumit  wrote:
>
> > Hi Sean
> >
> > My goal here is to get User Accumulators.  I know there exists the REST
> > Calls.  But since i am running my code in the same JVM, i wanted to avoid
> > go over HTTP.  I saw this code in JobAccumulatorsHandler and tried to use
> > this.  Would you suggest some alternative approach to avoid this over the
> > network serialization for Akka?
> >
> > Regards
> > Sumit Chawla
> >
> >
> > On Wed, Sep 21, 2016 at 11:37 AM, Stephan Ewen  wrote:
> >
> > > Between two different actor systems in the same JVM, messages are still
> > > serialized (they go through a local socket, I think).
> > >
> > > Getting the execution graph is not easily possible, and not intended,
> as
> > it
> > > actually contains RPC resources, etc.
> > >
> > > What do you need from the execution graph? Maybe there is another way
> to
> > > achieve that...
> > >
> > > On Wed, Sep 21, 2016 at 8:08 PM, Chawla,Sumit 
> > > wrote:
> > >
> > > > Hi Chesney
> > > >
> > > > I am actually running this code in the same JVM as the WebInterface
> and
> > > > JobManager.  I am programmatically, starting the JobManager. and
> then
> > > > running this code in same JVM to query metrics.  Only difference
> could
> > be
> > > > that i am creating a new Akka ActorSystem, and ActorGateway. Not sure
> > if
> > > it
> > > > forces it to execute the code as if request is coming over the
> wire.  I
> > > am
> > > > not very well aware of Akka internals, so may be somebody can shed
> some
> > > > light on it.
> > > >
> > > > Regards
> > > > Sumit Chawla
> > > >
> > > >
> > > > On Wed, Sep 21, 2016 at 1:06 AM, Chesnay Schepler <
> ches...@apache.org>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > this is a rather subtle issue you stumbled upon here.
> > > > >
> > > > > The ExecutionGraph is not serializable. The only reason why the
> > > > > WebInterface can access it is because it runs in the same JVM as
> the
> > > > > JobManager.
> > > > >
> > > > > I'm not sure if there is a way for what you are trying to do.
> > > > >
> > > > > Regards,
> > > > > Chesnay
> > > > >
> > > > >
> > > > > On 21.09.2016 06:11, Chawla,Sumit wrote:
> > > > >
> > > > >> Hi All
> > > > >>
> > > > >>
> > > > >> I am trying to get JOB  accumulators.  ( I am aware that I can get
> > the
> > > > >> accumulators through REST APIs as well, but i wanted to avoid JSON
> > > > >> parsing).
> > > > >>
> > > > >> Looking at JobAccumulatorsHandler i am trying to get execution
> graph
> > > for
> > > > >> currently running job.  Following is my code:
> > > > >>
> > > > >> InetSocketAddress initialJobManagerAddress = new InetSocketAddress(hostName, port);
> > > > >> InetAddress ownHostname;
> > > > >> ownHostname = ConnectionUtils.findConnectingAddress(initialJobManagerAddress, 2000, 400);
> > > > >>
> > > > >> ActorSystem actorSystem = AkkaUtils.createActorSystem(configuration,
> > > > >> new Some(new Tuple2(ownHostname.getCanonicalHostName(), 0)));
> > > > >>
> > > > >> FiniteDuration timeout = FiniteDuration.apply(10, TimeUnit.SECONDS);
> > > > >>
> > > > >> ActorGateway akkaActorGateway = LeaderRetrievalUtils.retrieveLeaderGateway(
> > > > >> LeaderRetrievalUtils.createLeaderRetrievalService(configuration),
> > > > >> actorSystem, timeout);
> > > > >>
> > > > >> Future future = akkaActorGateway.ask(new RequestJobDetails(true, false), timeout);
> > > > >>
> > > > >> MultipleJobsDetails result = (MultipleJobsDetails) Await.result(future, timeout);
> > > > >> ExecutionGraphHolder executionGraphHolder = new ExecutionGraphHolder(timeout);
> > > > >> LOG.info(result.toString());
> > > > >> for (JobDetails detail : result.getRunningJobs()) {
> > > > >> LOG.info(detail.getJobName() + "  ID " + detail.getJobId());
> > > > >>
> > > > >> ExecutionGraph executionGraph = executionGraphHolder.getExecutionGraph(
> > > > >> detail.getJobId(), akkaActorGateway);
> > > > >>
> > 

[jira] [Created] (FLINK-4664) Add translator to NullValue

2016-09-22 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-4664:
-

 Summary: Add translator to NullValue
 Key: FLINK-4664
 URL: https://issues.apache.org/jira/browse/FLINK-4664
 Project: Flink
  Issue Type: New Feature
  Components: Gelly
Affects Versions: 1.2.0
Reporter: Greg Hogan
Assignee: Greg Hogan
Priority: Minor
 Fix For: 1.2.0


Existing translators convert from LongValue (the output label type of graph 
generators) to IntValue, StringValue, and an offset LongValue. Translators can 
also be used to convert vertex or edge values. This translator will be 
appropriate for translating these vertex or edge values to NullValue when the 
values are not used in an algorithm.
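
A hedged sketch of what such a translator might look like, modeled on the
existing Gelly TranslateFunction implementations (the class name ToNullValue
is hypothetical here):

import org.apache.flink.graph.asm.translate.TranslateFunction;
import org.apache.flink.types.NullValue;

public class ToNullValue<T> implements TranslateFunction<T, NullValue> {
    @Override
    public NullValue translate(T value, NullValue reuse) {
        // discard the input value; NullValue is a stateless singleton
        return NullValue.getInstance();
    }
}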



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


FailureRate Restart Strategy is not picked from Config file

2016-09-22 Thread Deepak Jha
Hi All,
I tried to use the failure-rate restart strategy by setting values for it in
flink-conf.yaml, but Flink (v1.1.2) did not pick them up.

# Flink Restart strategy
restart-strategy: failure-rate
restart-strategy.failure-rate.delay: 120 s
restart-strategy.failure-rate.failure-rate-interval: 12 minute
restart-strategy.failure-rate.max-failures-per-interval: 300

It works when I set it explicitly in the topology using env.setRestartStrategy,
as sketched below.
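
A minimal sketch of the programmatic call that worked, with values matching
the config above:

// assumes: org.apache.flink.api.common.restartstrategy.RestartStrategies,
//          org.apache.flink.api.common.time.Time, java.util.concurrent.TimeUnit
env.setRestartStrategy(RestartStrategies.failureRateRestart(
        300,                            // max failures per interval
        Time.of(12, TimeUnit.MINUTES),  // failure-rate measurement interval
        Time.of(120, TimeUnit.SECONDS)  // delay between restart attempts
));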

PFA snapshot of the Jobmanager log.
Thanks,
Deepak Jha


Flink Scala data load from CSV to postgress database.

2016-09-22 Thread Jagan
Hi Guys,

We have a requirement to load data from a local CSV file into a Postgres
database using Flink Scala. We have tried a number of ways; all failed.

Do you have any example for this, with the dependency libraries, to show how
to load data from CSV to Postgres?

We have searched Google and the Flink website for data-load examples, but we
haven't found any sample code for this requirement.

Code: Flink Scala.

Load: from CSV to local postgres database.

Thanks

Jagan

0044-7411239688


Re: Flink Scala - data load from CSV to postgress database.

2016-09-22 Thread Jagan
Hi Team,

  Will you be able to guide me on this? Is it a known issue that we can't
implement data load in Flink Scala?

We want to load data from CSV into Postgres, or any relational database, in
Flink Scala.

Thanks

Jagan.

On 22 September 2016 at 20:15, Jagan  wrote:

> Thanks Suneel,
>
> but client want to implement the data load in Flink Scala..
>
>
> On 22 September 2016 at 20:07, Suneel Marthi  wrote:
>
>> Couldn't u use SQLLoader or something for doing that?
>>
>> http://stackoverflow.com/questions/2987433/how-to-import-csv-file-data-into-a-postgresql-table
>>
>>
>>
>> On Thu, Sep 22, 2016 at 3:01 PM, Jagan  wrote:
>>
>> > Hi Guys,
>> >
>> > We have a requirement like – loading data from local CSV file to
>> Postgress
>> > database using Flink Scala…We have tried number of ways all failed
>> >
>> > Do you have any example for this? With dependency libraries to
>> understand
>> > how to load data from CSV to postgres
>> >
>> > We have tried and searched in Google/Flinkweb website for data load, we
>> > haven’t found any sample code for this requisite.
>> >
>> > Code: Flink Scala.
>> >
>> > Load: from CSV to local postgres database.
>> >
>> > Thanks
>> >
>> > Jagan
>> >
>> > 0044-7411239688
>> >
>> >
>> >
>> > --
>> > Regards.
>> > Jagan.
>> >
>> > 
>> > -
>> > *  The Good You Do Today, People will often forget tomorrow: Do Good
>> > Anyway 
>> >
>>
>
>
>
> --
> Regards.
> Jagan.
>
> 
> -
> *  The Good You Do Today, People will often forget tomorrow: Do Good
> Anyway 
>



-- 
Regards.
Jagan.

-
*  The Good You Do Today, People will often forget tomorrow: Do Good
Anyway 


Re: Flink Scala - data load from CSV to postgress database.

2016-09-22 Thread Jagan
Thanks Suneel,

but the client wants to implement the data load in Flink Scala.


On 22 September 2016 at 20:07, Suneel Marthi  wrote:

> Couldn't u use SQLLoader or something for doing that?
>
> http://stackoverflow.com/questions/2987433/how-to-import-csv-file-data-into-a-postgresql-table
>
>
>
> On Thu, Sep 22, 2016 at 3:01 PM, Jagan  wrote:
>
> > Hi Guys,
> >
> > We have a requirement like – loading data from local CSV file to
> Postgress
> > database using Flink Scala…We have tried number of ways all failed
> >
> > Do you have any example for this? With dependency libraries to understand
> > how to load data from CSV to postgres
> >
> > We have tried and searched in Google/Flinkweb website for data load, we
> > haven’t found any sample code for this requisite.
> >
> > Code: Flink Scala.
> >
> > Load: from CSV to local postgres database.
> >
> > Thanks
> >
> > Jagan
> >
> > 0044-7411239688
> >
> >
> >
> > --
> > Regards.
> > Jagan.
> >
> > 
> > -
> > *  The Good You Do Today, People will often forget tomorrow: Do Good
> > Anyway 
> >
>



-- 
Regards.
Jagan.

-
*  The Good You Do Today, People will often forget tomorrow: Do Good
Anyway 


Re: Flink Scala - data load from CSV to postgress database.

2016-09-22 Thread Suneel Marthi
Couldn't u use SQLLoader or something for doing that?

http://stackoverflow.com/questions/2987433/how-to-import-csv-file-data-into-a-postgresql-table



On Thu, Sep 22, 2016 at 3:01 PM, Jagan  wrote:

> Hi Guys,
>
> We have a requirement like – loading data from local CSV file to Postgress
> database using Flink Scala…We have tried number of ways all failed
>
> Do you have any example for this? With dependency libraries to understand
> how to load data from CSV to postgres
>
> We have tried and searched in Google/Flinkweb website for data load, we
> haven’t found any sample code for this requisite.
>
> Code: Flink Scala.
>
> Load: from CSV to local postgres database.
>
> Thanks
>
> Jagan
>
> 0044-7411239688
>
>
>
> --
> Regards.
> Jagan.
>
> 
> -
> *  The Good You Do Today, People will often forget tomorrow: Do Good
> Anyway 
>


Fwd: Flink Scala - data load from CSV to postgress database.

2016-09-22 Thread Jagan
Hi Guys,

We have a requirement to load data from a local CSV file into a Postgres
database using Flink Scala. We have tried a number of ways; all failed.

Do you have any example for this, with the dependency libraries, to show how
to load data from CSV to Postgres?

We have searched Google and the Flink website for data-load examples, but we
haven't found any sample code for this requirement.

Code: Flink Scala.

Load: from CSV to local postgres database.

Thanks

Jagan

0044-7411239688



-- 
Regards.
Jagan.

-
*  The Good You Do Today, People will often forget tomorrow: Do Good
Anyway 


Re: [DISCUSS] Proposal for Asynchronous I/O in FLINK

2016-09-22 Thread Shannon Carey
Not to derail this thread onto another topic, but the problem with using a
static instance is that there's no way to shut it down when the job stops. So
if, for example, it starts threads, I don't think those threads will stop when
the job stops. I'm not very well versed in how various Java 8 implementations
unload classloaders and the class definitions/statics therein, but it seems
problematic unless the job provides a shutdown hook to which user code can
subscribe. A reference-counted sketch of the pattern is below.
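
A minimal sketch of the "static broker" idea under discussion, assuming Netty
as the shared resource (all names are illustrative): reference counting lets
the last task to call release() actually shut the event loop down, which is
one way to address the shutdown concern above.

import io.netty.channel.nio.NioEventLoopGroup;

public final class SharedEventLoop {
    private static NioEventLoopGroup group;
    private static int refCount = 0;

    // Called from RichFunction.open(); the first caller creates the shared group.
    public static synchronized NioEventLoopGroup acquire() {
        if (refCount++ == 0) {
            group = new NioEventLoopGroup();
        }
        return group;
    }

    // Called from RichFunction.close(); the last caller shuts the group down.
    public static synchronized void release() {
        if (--refCount == 0) {
            group.shutdownGracefully();
            group = null;
        }
    }
}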




On 9/21/16, 8:05 PM, "David Wang"  wrote:

>Hi Shannon,
>
>That's right. This FLIP aims to boost TPS of the task workers with async
>i/o operation.
>
>As what Stephan has mentioned, by placing static attribute to shared
>resources(like event pool, connection), it is possible to share those
>resources among different slots in the same JVM.
>
>I will make a note in the FLIP about how to share resources ;D
>
>Thanks,
>David
>
>2016-09-22 1:46 GMT+08:00 Stephan Ewen :
>
>> @Shannon: One could have a "static" broker to share the same netty across
>> slots in the same JVM. Implicitly, Flink does the same with broadcast
>> variables.
>>
>> On Wed, Sep 21, 2016 at 5:04 PM, Shannon Carey  wrote:
>>
>> > David,
>> >
>> > I just wanted to say "thanks" for making this proposal! I'm also
>> > interested in performing nonblocking I/O (multiplexing threads/reactive
>> > programming) within Flink operators so that we can, for example,
>> > communicate with external web services with Netty/RxNetty without
>> blocking
>> > an entire Flink slot (aka a thread) while we wait for the operation to
>> > complete. It looks like your FLIP will enable that use case.
>> >
>> > I'm not sure whether it will be possible to share one Netty
>> EventLoopGroup
>> > (or the equivalent for any other non-blocking framework, connection pool,
>> > etc.) among multiple slots in a single JVM though. Flink supports
>> > open/close operation on a RichFunction, but that's on a per-slot basis. I
>> > don't know of a way to open/close objects on a per-job-JVM basis. But I
>> > suppose that's an issue that should be discussed and resolved separately.
>> >
>> > -Shannon
>> >
>> >
>>


Re: [DISCUSS] Merge batch and stream connector modules

2016-09-22 Thread Stephan Ewen
+1 for Fabian's suggestion

On Thu, Sep 22, 2016 at 3:25 PM, Swapnil Chougule 
wrote:

> +1
> It will be good to have one module flink-connectors (union of streaming and
> batch connectors).
>
> Regards,
> Swapnil
>
> On Thu, Sep 22, 2016 at 6:35 PM, Fabian Hueske  wrote:
>
> > Hi everybody,
> >
> > right now, we have two separate Maven modules for batch and streaming
> > connectors (flink-batch-connectors and flink-streaming-connectors) that
> > contain modules for the individual external systems and storage formats
> > such as HBase, Cassandra, Avro, Elasticsearch, etc.
> >
> > Some of these systems can be used in streaming as well as batch jobs as
> for
> > instance HBase, Cassandra, and Elasticsearch. However, due to the
> separate
> > main modules for streaming and batch connectors, we currently need to
> > decide where to put a connector. For example, the
> flink-connector-cassandra
> > module is located in flink-streaming-connectors but includes a
> > CassandraInputFormat and CassandraOutputFormat (i.e., a batch source and
> > sink).
> >
> > In my opinion, it would be better to just merge flink-batch-connectors
> and
> > flink-streaming-connectors into a joint flink-connectors module.
> >
> > This would be only an internal restructuring of code and not be visible
> to
> > users (unless we change the module names of the individual connectors
> which
> > is not necessary, IMO).
> >
> > What do others think?
> >
> > Best, Fabian
> >
>


Re: [DISCUSS] Merge batch and stream connector modules

2016-09-22 Thread Swapnil Chougule
+1
It will be good to have one module flink-connectors (union of streaming and
batch connectors).

Regards,
Swapnil

On Thu, Sep 22, 2016 at 6:35 PM, Fabian Hueske  wrote:

> Hi everybody,
>
> right now, we have two separate Maven modules for batch and streaming
> connectors (flink-batch-connectors and flink-streaming-connectors) that
> contain modules for the individual external systems and storage formats
> such as HBase, Cassandra, Avro, Elasticsearch, etc.
>
> Some of these systems can be used in streaming as well as batch jobs as for
> instance HBase, Cassandra, and Elasticsearch. However, due to the separate
> main modules for streaming and batch connectors, we currently need to
> decide where to put a connector. For example, the flink-connector-cassandra
> module is located in flink-streaming-connectors but includes a
> CassandraInputFormat and CassandraOutputFormat (i.e., a batch source and
> sink).
>
> In my opinion, it would be better to just merge flink-batch-connectors and
> flink-streaming-connectors into a joint flink-connectors module.
>
> This would be only an internal restructuring of code and not be visible to
> users (unless we change the module names of the individual connectors which
> is not necessary, IMO).
>
> What do others think?
>
> Best, Fabian
>


[DISCUSS] Merge batch and stream connector modules

2016-09-22 Thread Fabian Hueske
Hi everybody,

right now, we have two separate Maven modules for batch and streaming
connectors (flink-batch-connectors and flink-streaming-connectors) that
contain modules for the individual external systems and storage formats
such as HBase, Cassandra, Avro, Elasticsearch, etc.

Some of these systems can be used in streaming as well as batch jobs as for
instance HBase, Cassandra, and Elasticsearch. However, due to the separate
main modules for streaming and batch connectors, we currently need to
decide where to put a connector. For example, the flink-connector-cassandra
module is located in flink-streaming-connectors but includes a
CassandraInputFormat and CassandraOutputFormat (i.e., a batch source and
sink).

In my opinion, it would be better to just merge flink-batch-connectors and
flink-streaming-connectors into a joint flink-connectors module.

This would be only an internal restructuring of code and not be visible to
users (unless we change the module names of the individual connectors which
is not necessary, IMO).

What do others think?

Best, Fabian


Re: Get Flink ExecutionGraph Programmatically

2016-09-22 Thread Aljoscha Krettek
Hi,
there is ClusterClient.getAccumulators(JobID jobID) which should be able to
get the accumulators for a running job. If you can construct a
ClusterClient that should be a good solution.

Cheers,
Aljoscha

On Wed, 21 Sep 2016 at 21:15 Chawla,Sumit  wrote:

> Hi Sean
>
> My goal here is to get User Accumulators.  I know there exists the REST
> Calls.  But since i am running my code in the same JVM, i wanted to avoid
> go over HTTP.  I saw this code in JobAccumulatorsHandler and tried to use
> this.  Would you suggest some alternative approach to avoid this over the
> network serialization for Akka?
>
> Regards
> Sumit Chawla
>
>
> On Wed, Sep 21, 2016 at 11:37 AM, Stephan Ewen  wrote:
>
> > Between two different actor systems in the same JVM, messages are still
> > serialized (they go through a local socket, I think).
> >
> > Getting the execution graph is not easily possible, and not intended, as
> it
> > actually contains RPC resources, etc.
> >
> > What do you need from the execution graph? Maybe there is another way to
> > achieve that...
> >
> > On Wed, Sep 21, 2016 at 8:08 PM, Chawla,Sumit 
> > wrote:
> >
> > > Hi Chesney
> > >
> > > I am actually running this code in the same JVM as the WebInterface and
> > > JobManager.  I am programmatically, starting the JobManager. and  then
> > > running this code in same JVM to query metrics.  Only difference could
> be
> > > that i am creating a new Akka ActorSystem, and ActorGateway. Not sure
> if
> > it
> > > forces it to execute the code as if request is coming over the wire.  I
> > am
> > > not very well aware of Akka internals, so may be somebody can shed some
> > > light on it.
> > >
> > > Regards
> > > Sumit Chawla
> > >
> > >
> > > On Wed, Sep 21, 2016 at 1:06 AM, Chesnay Schepler 
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > this is a rather subtle issue you stumbled upon here.
> > > >
> > > > The ExecutionGraph is not serializable. The only reason why the
> > > > WebInterface can access it is because it runs in the same JVM as the
> > > > JobManager.
> > > >
> > > > I'm not sure if there is a way for what you are trying to do.
> > > >
> > > > Regards,
> > > > Chesnay
> > > >
> > > >
> > > > On 21.09.2016 06:11, Chawla,Sumit wrote:
> > > >
> > > >> Hi All
> > > >>
> > > >>
> > > >> I am trying to get JOB  accumulators.  ( I am aware that I can get
> the
> > > >> accumulators through REST APIs as well, but i wanted to avoid JSON
> > > >> parsing).
> > > >>
> > > >> Looking at JobAccumulatorsHandler i am trying to get execution graph
> > for
> > > >> currently running job.  Following is my code:
> > > >>
> > > >> InetSocketAddress initialJobManagerAddress = new InetSocketAddress(hostName, port);
> > > >> InetAddress ownHostname;
> > > >> ownHostname = ConnectionUtils.findConnectingAddress(initialJobManagerAddress, 2000, 400);
> > > >>
> > > >> ActorSystem actorSystem = AkkaUtils.createActorSystem(configuration,
> > > >> new Some(new Tuple2(ownHostname.getCanonicalHostName(), 0)));
> > > >>
> > > >> FiniteDuration timeout = FiniteDuration.apply(10, TimeUnit.SECONDS);
> > > >>
> > > >> ActorGateway akkaActorGateway = LeaderRetrievalUtils.retrieveLeaderGateway(
> > > >> LeaderRetrievalUtils.createLeaderRetrievalService(configuration),
> > > >> actorSystem, timeout);
> > > >>
> > > >> Future future = akkaActorGateway.ask(new RequestJobDetails(true, false), timeout);
> > > >>
> > > >> MultipleJobsDetails result = (MultipleJobsDetails) Await.result(future, timeout);
> > > >> ExecutionGraphHolder executionGraphHolder = new ExecutionGraphHolder(timeout);
> > > >> LOG.info(result.toString());
> > > >> for (JobDetails detail : result.getRunningJobs()) {
> > > >> LOG.info(detail.getJobName() + "  ID " + detail.getJobId());
> > > >>
> > > >> ExecutionGraph executionGraph = executionGraphHolder.getExecutionGraph(
> > > >> detail.getJobId(), akkaActorGateway);
> > > >>
> > > >> LOG.info("Accumulators " + executionGraph.aggregateUserAccumulators());
> > > >> }
> > > >>
> > > >> However, I am receiving the following error in Flink:
> > > >>
> > > >> 2016-09-20T14:19:53,201 [flink-akka.actor.default-dispatcher-3] nobody
> > > >> ERROR akka.remote.EndpointWriter - Transient association error
> > > >> (association remains live)
> > > >> java.io.NotSerializableException:
> > > >> org.apache.flink.runtime.checkpoint.CheckpointCoordinator
> > > >>  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
> > > >> ~[?:1.8.0_92]

[jira] [Created] (FLINK-4663) Flink JDBCOutputFormat logs wrong WARN message

2016-09-22 Thread Swapnil Chougule (JIRA)
Swapnil Chougule created FLINK-4663:
---

 Summary: Flink JDBCOutputFormat logs wrong WARN message
 Key: FLINK-4663
 URL: https://issues.apache.org/jira/browse/FLINK-4663
 Project: Flink
  Issue Type: Bug
  Components: Batch Connectors and Input/Output Formats
Affects Versions: 1.1.2, 1.1.1
 Environment: Across Platform
Reporter: Swapnil Chougule
 Fix For: 1.1.3


Flink JDBCOutputFormat logs the wrong WARN message,
"Column SQL types array doesn't match arity of passed Row! Check the passed
array...",
even if there is no mismatch between the SQL types array and the arity of the
passed Row.

 
(flink/flink-batch-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCOutputFormat.java)
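
A hedged reconstruction of the kind of bug described (not necessarily the
exact Flink code): the comparison in the guard is inverted, so the warning
fires when the lengths match instead of when they differ.

// buggy guard: warns when the arity matches
if (typesArray != null && typesArray.length > 0
        && typesArray.length == row.productArity()) {
    LOG.warn("Column SQL types array doesn't match arity of passed Row! Check the passed array...");
}

// intended guard: warn only on an actual mismatch
if (typesArray != null && typesArray.length > 0
        && typesArray.length != row.productArity()) {
    LOG.warn("Column SQL types array doesn't match arity of passed Row! Check the passed array...");
}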




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: On (FLINK-1526) JIRA issue

2016-09-22 Thread Vasiliki Kalavri
Exactly :) That's why we haven't added neither the spanning tree nor the
strongly connected components algorithms yet.

On Sep 22, 2016 12:16 PM, "Stephan Ewen"  wrote:

> Just as a general comment:
>
> A program with nested loops is most likely not going to be performant in any
> way. It makes sense to re-think the algorithm and come up with a modified
> or different pattern, rather than trying to implement the exact algorithm
> line by line.
>
> It may be worth checking that, because I am not sure if Gelly should have
> algorithms that don't perform well.
>
> On Thu, Sep 22, 2016 at 11:40 AM, Vasiliki Kalavri <
> vasilikikala...@gmail.com> wrote:
>
> > Hi Olga,
> >
> > when you use mapEdges() or mapVertices() with generics, Flink cannot
> > determine the type because of type erasure, like the exception says.
> That's
> > why we also provide methods that take the type information as a
> parameter.
> > You can use those to make the return type explicit. In your example, you
> > should do something like the following (line 41):
> >
> > final TypeInformation<Long> longType = BasicTypeInfo.LONG_TYPE_INFO;
> > final TypeInformation<Double> doubleType = BasicTypeInfo.DOUBLE_TYPE_INFO;
> > Graph<Long, Long, Tuple3<Double, Long, Long>> graphOut =
> > graph.mapEdges(new InitializeEdges(),
> > new TupleTypeInfo<Edge<Long, Tuple3<Double, Long, Long>>>(longType, longType,
> > new TupleTypeInfo<Tuple3<Double, Long, Long>>(doubleType,
> > longType, longType)));
> >
> > Regarding the nested loops, I am almost sure that you will face problems
> if
> > you try to experiment with large datasets. I haven't looked into your
> code
> > yet, but according to the JIRA discussion, we've faced this problem
> before
> > and afaik, this is still an issue.
> >
> > Cheers,
> > -Vasia.
> >
> > On 22 September 2016 at 01:12, Olga Golovneva 
> wrote:
> >
> > > Hi Vasia,
> > >
> > > I have uploaded these tests on github:
> > > https://github.com/OlgaGolovneva/MST/tree/master/tests
> > >
> > > I have also uploaded source code, but I'm still working on it:
> > > https://github.com/OlgaGolovneva/MST/tree/master/src
> > >
> > > ​>I think you cannot add attachments to the mailing list. Could you
> > upload
> > > >your example somewhere and post a link here? I'm actually surprised
> that
> > > >the while-loop works without problems.
> > >
> > > I have run the program on several simple tests, and I was going to try
> > > large datasets in the next few days. Please, let me know if this
> approach
> > > is wrong.
> > >
> > > Thanks,
> > > Olga
> > >
> > > On Wed, Sep 21, 2016 at 4:55 PM, Vasiliki Kalavri <
> > > vasilikikala...@gmail.com
> > > > wrote:
> > >
> > > > Hi Olga,
> > > >
> > > > On 21 September 2016 at 18:50, Olga Golovneva 
> > > wrote:
> > > >
> > > > > Hi devs,
> > > > >
> > > > > I was working on  (FLINK-1526) "Add Minimum Spanning Tree library
> > > method
> > > > > and example" issue. I've developed (Java) code that implements
> > > > distributed
> > > > > Boruvka's algorithm in Gelly library. I've run several tests and it
> > > seems
> > > > > to work fine, although I didn't test it on extremely large input
> > graphs
> > > > > yet, and I'm also trying to optimize my code.
> > > > > Particularly, I have two main issues:
> > > > >
> > > > > 1. Nested loops.
> > > > > I have to use nested loops, and I do not see the way to avoid them.
> > As
> > > > > they are currently not supported, I'm using Bulk Iterations inside
> a
> > > > > "classic" while loop. I've included in attachment simple example
> > > > > MyNestedIterationExample that shows this issue.
> > > > >
> > > >
> > > > ​I think you cannot add attachments to the mailing list. Could you
> > upload
> > > > your example somewhere and post a link here? I'm actually surprised
> > that
> > > > the while-loop works without problems.
> > > >
> > > >
> > > > >
> > > > > 2. For some reason I cannot create class that works with types with
> > > > > generic variables in Tuple2(or Tuple3), thus my code does not
> support
> > > > > generic types. I also included simple example MyTuple3Example. Here
> > is
> > > > the
> > > > > Exception I get:
> > > > > "Exception in thread "main" org.apache.flink.api.common.functions.
> > > > InvalidTypesException:
> > > > > Type of TypeVariable 'EV' in 'class org.apache.flink.graph.
> > > > > examples.MyTuple3Example$InitializeEdges' could not be determined.
> > > This
> > > > > is most likely a type erasure problem. The type extraction
> currently
> > > > > supports types with generic variables only in cases where all
> > variables
> > > > in
> > > > > the return type can be deduced from the input type(s)."
> > > > >
> > > >
> > > > ​Can you upload this example and link to it too?
> > > >
> > > > Thanks,
> > > > -Vasia.
> > > >
> > > >
> > > > >
> > > > > I would really appreciate if someone could explain me know how to
> > avoid
> > > > > this Exception. Otherwise, I could submit my code for testing.
> > > > >
> > > 

Re: Reply: [discuss] merge module flink-yarn and flink-yarn-test

2016-09-22 Thread Stephan Ewen
"flink-test-utils" contains, as the name says, utils for testing. Intended
to be used by users in writing their own tests.
"flink-tests" contains cross module tests, no user should ever need to have
a dependency on that.

They are different because users explicitly asked for test utils to be
factored into a separate project.

As an honest reply here: setting up a project as huge as Flink needs to take
many things into account:

  - Multiple languages (Java / Scala), with limitations of IDEs in mind
  - Dependency conflicts and much shading magic
  - Dependency matrices (multiple hadoop and scala versions)
  - Supporting earlier Java versions
  - clean scope differentiation, so users can reuse utils and testing code


That simply requires some extra modules once in a while. Flink people have
worked hard on coming up with a structure that serves the need of the
production users and automated build/testing systems. These production user
requests are most important to us, and sometimes, we need to take cuts in
"beauty of directory structure" to help them.

Constantly accusing the community of creating bad structures before even
trying to understand the reasoning behind them does not come across as very
friendly. Likewise, constantly accusing the community of sloppy work just
because your laptop settings are incompatible with the default configuration.

I hope you understand that.


On Thu, Sep 22, 2016 at 2:58 AM, shijinkui  wrote:

> Hi, Stephan
>
> Thanks for your reply.
>
> In my mind, maven-shade-plugin and sbt-assembly both exclude test code from
> the fat jar by default.
>
> In fact, unit tests exist to test the main code and ensure our code logic
> meets our expectations. This is the general convention, I think. Flink is a
> top-level Apache project; we shouldn't be special. We're programmers and
> should be professional.
>
> Furthermore, there are the `flink-test-utils-parent` and `flink-tests`
> modules; what's the relation between them?
>
> I have to ask why they exist, and where such confusing modules started.
>
> I don't think we should do nothing about this. Code and design should be
> comfortable.
>
> Thanks
>
> From Jinkui Shi
>
> -----Original Message-----
> From: Stephan Ewen [mailto:se...@apache.org]
> Sent: September 21, 2016 22:19
> To: dev@flink.apache.org
> Subject: Re: [discuss] merge module flink-yarn and flink-yarn-test
>
> I would like Robert to comment on this.
>
> I think there was a reason to have different modules, which had again
> something to do with the Maven Shade Plugin Dependencies and shading really
> seem the trickiest thing in bigger Java/Scala projects ;-)
>
> On Wed, Sep 21, 2016 at 11:04 AM, shijinkui  wrote:
>
> > Hi, All
> >
> > There are too many modules in the root. There is no need to separate the
> > test code from its sub-module.
> >
> > I have never seen such a design: two modules, one with the main code, the
> > other with the test code.
> >
> > Is there some special reason?
> >
> > From Jinkui Shi
> >
>


Re: On (FLINK-1526) JIRA issue

2016-09-22 Thread Stephan Ewen
Just as a general comment:

A program with nested loops is most likely not going to be performant in any
way. It makes sense to re-think the algorithm and come up with a modified or
different pattern, rather than trying to implement the exact algorithm line
by line.

It may be worth checking that, because I am not sure if Gelly should have
algorithms that don't perform well.

On Thu, Sep 22, 2016 at 11:40 AM, Vasiliki Kalavri <
vasilikikala...@gmail.com> wrote:

> Hi Olga,
>
> when you use mapEdges() or mapVertices() with generics, Flink cannot
> determine the type because of type erasure, like the exception says. That's
> why we also provide methods that take the type information as a parameter.
> You can use those to make the return type explicit. In your example, you
> should do something like the following (line 41):
>
> final TypeInformation<Long> longType = BasicTypeInfo.LONG_TYPE_INFO;
> final TypeInformation<Double> doubleType = BasicTypeInfo.DOUBLE_TYPE_INFO;
> Graph<Long, Long, Tuple3<Double, Long, Long>> graphOut =
> graph.mapEdges(new InitializeEdges(),
> new TupleTypeInfo<Edge<Long, Tuple3<Double, Long, Long>>>(longType, longType,
> new TupleTypeInfo<Tuple3<Double, Long, Long>>(doubleType,
> longType, longType)));
>
> Regarding the nested loops, I am almost sure that you will face problems if
> you try to experiment with large datasets. I haven't looked into your code
> yet, but according to the JIRA discussion, we've faced this problem before
> and afaik, this is still an issue.
>
> Cheers,
> -Vasia.
>
> On 22 September 2016 at 01:12, Olga Golovneva  wrote:
>
> > Hi Vasia,
> >
> > I have uploaded these tests on github:
> > https://github.com/OlgaGolovneva/MST/tree/master/tests
> >
> > I have also uploaded source code, but I'm still working on it:
> > https://github.com/OlgaGolovneva/MST/tree/master/src
> >
> > ​>I think you cannot add attachments to the mailing list. Could you
> upload
> > >your example somewhere and post a link here? I'm actually surprised that
> > >the while-loop works without problems.
> >
> > I have run the program on several simple tests, and I was going to try
> > large datasets in the next few days. Please, let me know if this approach
> > is wrong.
> >
> > Thanks,
> > Olga
> >
> > On Wed, Sep 21, 2016 at 4:55 PM, Vasiliki Kalavri <
> > vasilikikala...@gmail.com
> > > wrote:
> >
> > > Hi Olga,
> > >
> > > On 21 September 2016 at 18:50, Olga Golovneva 
> > wrote:
> > >
> > > > Hi devs,
> > > >
> > > > I was working on  (FLINK-1526) "Add Minimum Spanning Tree library
> > method
> > > > and example" issue. I've developed (Java) code that implements
> > > distributed
> > > > Boruvka's algorithm in Gelly library. I've run several tests and it
> > seems
> > > > to work fine, although I didn't test it on extremely large input
> graphs
> > > > yet, and I'm also trying to optimize my code.
> > > > Particularly, I have two main issues:
> > > >
> > > > 1. Nested loops.
> > > > I have to use nested loops, and I do not see the way to avoid them.
> As
> > > > they are currently not supported, I'm using Bulk Iterations inside a
> > > > "classic" while loop. I've included in attachment simple example
> > > > MyNestedIterationExample that shows this issue.
> > > >
> > >
> > > ​I think you cannot add attachments to the mailing list. Could you
> upload
> > > your example somewhere and post a link here? I'm actually surprised
> that
> > > the while-loop works without problems.
> > >
> > >
> > > >
> > > > 2. For some reason I cannot create class that works with types with
> > > > generic variables in Tuple2(or Tuple3), thus my code does not support
> > > > generic types. I also included simple example MyTuple3Example. Here
> is
> > > the
> > > > Exception I get:
> > > > "Exception in thread "main" org.apache.flink.api.common.functions.
> > > InvalidTypesException:
> > > > Type of TypeVariable 'EV' in 'class org.apache.flink.graph.
> > > > examples.MyTuple3Example$InitializeEdges' could not be determined.
> > This
> > > > is most likely a type erasure problem. The type extraction currently
> > > > supports types with generic variables only in cases where all
> variables
> > > in
> > > > the return type can be deduced from the input type(s)."
> > > >
> > >
> > > ​Can you upload this example and link to it too?
> > >
> > > Thanks,
> > > -Vasia.
> > >
> > >
> > > >
> > > > I would really appreciate if someone could explain me know how to
> avoid
> > > > this Exception. Otherwise, I could submit my code for testing.
> > > >
> > > > Best regards,
> > > > Olga Golovneva
> > > >
> > >
> >
>


[jira] [Created] (FLINK-4662) Bump Calcite version up to 1.9

2016-09-22 Thread Timo Walther (JIRA)
Timo Walther created FLINK-4662:
---

 Summary: Bump Calcite version up to 1.9
 Key: FLINK-4662
 URL: https://issues.apache.org/jira/browse/FLINK-4662
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: Timo Walther


Calcite just released version 1.9. We should also adopt it in the Table API,
especially for FLINK-4294.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: On (FLINK-1526) JIRA issue

2016-09-22 Thread Vasiliki Kalavri
Hi Olga,

when you use mapEdges() or mapVertices() with generics, Flink cannot
determine the type because of type erasure, like the exception says. That's
why we also provide methods that take the type information as a parameter.
You can use those to make the return type explicit. In your example, you
should do something like the following (line 41):

final TypeInformation<Long> longType = BasicTypeInfo.LONG_TYPE_INFO;
final TypeInformation<Double> doubleType = BasicTypeInfo.DOUBLE_TYPE_INFO;
Graph<Long, Long, Tuple3<Double, Long, Long>> graphOut =
    graph.mapEdges(new InitializeEdges(),
        new TupleTypeInfo<Edge<Long, Tuple3<Double, Long, Long>>>(longType, longType,
            new TupleTypeInfo<Tuple3<Double, Long, Long>>(doubleType, longType, longType)));

Regarding the nested loops, I am almost sure that you will face problems if
you try to experiment with large datasets. I haven't looked into your code
yet, but according to the JIRA discussion, we've faced this problem before
and afaik, this is still an issue.

Cheers,
-Vasia.

On 22 September 2016 at 01:12, Olga Golovneva  wrote:

> Hi Vasia,
>
> I have uploaded these tests on github:
> https://github.com/OlgaGolovneva/MST/tree/master/tests
>
> I have also uploaded source code, but I'm still working on it:
> https://github.com/OlgaGolovneva/MST/tree/master/src
>
> ​>I think you cannot add attachments to the mailing list. Could you upload
> >your example somewhere and post a link here? I'm actually surprised that
> >the while-loop works without problems.
>
> I have run the program on several simple tests, and I was going to try
> large datasets in the next few days. Please, let me know if this approach
> is wrong.
>
> Thanks,
> Olga
>
> On Wed, Sep 21, 2016 at 4:55 PM, Vasiliki Kalavri <
> vasilikikala...@gmail.com
> > wrote:
>
> > Hi Olga,
> >
> > On 21 September 2016 at 18:50, Olga Golovneva 
> wrote:
> >
> > > Hi devs,
> > >
> > > I was working on  (FLINK-1526) "Add Minimum Spanning Tree library
> method
> > > and example" issue. I've developed (Java) code that implements
> > distributed
> > > Boruvka's algorithm in Gelly library. I've run several tests and it
> seems
> > > to work fine, although I didn't test it on extremely large input graphs
> > > yet, and I'm also trying to optimize my code.
> > > Particularly, I have two main issues:
> > >
> > > 1. Nested loops.
> > > I have to use nested loops, and I do not see the way to avoid them. As
> > > they are currently not supported, I'm using Bulk Iterations inside a
> > > "classic" while loop. I've included in attachment simple example
> > > MyNestedIterationExample that shows this issue.
> > >
> >
> > ​I think you cannot add attachments to the mailing list. Could you upload
> > your example somewhere and post a link here? I'm actually surprised that
> > the while-loop works without problems.
> >
> >
> > >
> > > 2. For some reason I cannot create class that works with types with
> > > generic variables in Tuple2(or Tuple3), thus my code does not support
> > > generic types. I also included simple example MyTuple3Example. Here is
> > the
> > > Exception I get:
> > > "Exception in thread "main" org.apache.flink.api.common.functions.
> > InvalidTypesException:
> > > Type of TypeVariable 'EV' in 'class org.apache.flink.graph.
> > > examples.MyTuple3Example$InitializeEdges' could not be determined.
> This
> > > is most likely a type erasure problem. The type extraction currently
> > > supports types with generic variables only in cases where all variables
> > in
> > > the return type can be deduced from the input type(s)."
> > >
> >
> > ​Can you upload this example and link to it too?
> >
> > Thanks,
> > -Vasia.
> >
> >
> > >
> > > I would really appreciate if someone could explain me know how to avoid
> > > this Exception. Otherwise, I could submit my code for testing.
> > >
> > > Best regards,
> > > Olga Golovneva
> > >
> >
>


[jira] [Created] (FLINK-4661) Failure to find org.apache.flink:flink-runtime_2.10:jar:tests

2016-09-22 Thread shijinkui (JIRA)
shijinkui created FLINK-4661:


 Summary: Failure to find 
org.apache.flink:flink-runtime_2.10:jar:tests
 Key: FLINK-4661
 URL: https://issues.apache.org/jira/browse/FLINK-4661
 Project: Flink
  Issue Type: Bug
Reporter: shijinkui


[ERROR] Failed to execute goal on project flink-streaming-java_2.10: Could not 
resolve dependencies for project 
org.apache.flink:flink-streaming-java_2.10:jar:1.2-SNAPSHOT: Failure to find 
org.apache.flink:flink-runtime_2.10:jar:tests:1.2-SNAPSHOT in 
http://localhost:/repository/maven-public/ was cached in the local 
repository, resolution will not be reattempted until the update interval of 
nexus-releases has elapsed or updates are forced -> [Help 1]


Failure to find org.apache.flink:flink-runtime_2.10:jar:tests

I can't find where this tests jar is generated.

By the way, in the half month since I started using Flink, I have not once
been able to compile the Flink project with the default settings.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)