[jira] [Created] (FLINK-5500) Error when created array literals

2017-01-16 Thread Yuhong Hong (JIRA)
Yuhong Hong created FLINK-5500:
--

 Summary: Error when created array literals 
 Key: FLINK-5500
 URL: https://issues.apache.org/jira/browse/FLINK-5500
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.2.0
Reporter: Yuhong Hong


An error is reported when I create an array literal with the Table API or SQL.
TableAPI:
dataStream.toTable(tEnv).select("array(1,2,3)")
The complete stacktrace looks like:
{code:java}
Exception in thread "main" org.apache.flink.table.api.TableException: Result 
field does not match expected type. Expected: int[]; Actual: 
ObjectArrayTypeInfo
at 
org.apache.flink.table.typeutils.TypeConverter$.determineReturnType(TypeConverter.scala:109)
at 
org.apache.flink.table.plan.nodes.datastream.DataStreamCalc.translateToPlan(DataStreamCalc.scala:79)
at 
org.apache.flink.table.api.StreamTableEnvironment.translate(StreamTableEnvironment.scala:340)
at 
org.apache.flink.table.api.StreamTableEnvironment.translate(StreamTableEnvironment.scala:322)
at 
org.apache.flink.table.api.scala.StreamTableEnvironment.toDataStream(StreamTableEnvironment.scala:142)
at 
org.apache.flink.table.api.scala.TableConversions.toDataStream(TableConversions.scala:52)
at com.huawei.example.flink.SimpleSql$.main(SimpleSql.scala:77)
at com.huawei.example.flink.SimpleSql.main(SimpleSql.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
{code}

SQL:
tEnv.sql("select ARRAY[1,2,3] FROM OrderA")
{code:java}
Exception in thread "main" org.apache.flink.table.api.TableException: Type is 
not supported: ARRAY
at org.apache.flink.table.api.TableException$.apply(exceptions.scala:51)
at 
org.apache.flink.table.calcite.FlinkTypeFactory$.toTypeInfo(FlinkTypeFactory.scala:227)
at 
org.apache.flink.table.plan.logical.LogicalRelNode$$anonfun$15.apply(operators.scala:516)
at 
org.apache.flink.table.plan.logical.LogicalRelNode$$anonfun$15.apply(operators.scala:515)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at 
org.apache.flink.table.plan.logical.LogicalRelNode.(operators.scala:515)
at 
org.apache.flink.table.api.StreamTableEnvironment.sql(StreamTableEnvironment.scala:183)
at com.huawei.example.flink.SimpleSql$.main(SimpleSql.scala:62)
{code}

Calcite parses the SQL and translates the array to the type ArraySqlType, 
but toTypeInfo in FlinkTypeFactory only supports ArrayRelType.
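For illustration, the SQL failure mode can be modeled as a type converter with a missing ARRAY case. This is a hedged sketch only: the enum and method below are illustrative stand-ins, not Flink's actual FlinkTypeFactory code.

```java
// Illustrative model of why the SQL path fails: the converter handles
// scalar SQL types but has no case for ARRAY, so it throws
// "Type is not supported". Names here are assumptions, not Flink's API.
public class TypeConverterDemo {
    enum SqlTypeName { INTEGER, DOUBLE, VARCHAR, ARRAY }

    static String toTypeInfo(SqlTypeName t) {
        switch (t) {
            case INTEGER: return "IntTypeInfo";
            case DOUBLE:  return "DoubleTypeInfo";
            case VARCHAR: return "StringTypeInfo";
            default:
                // Mirrors the "Type is not supported: ARRAY" exception above.
                throw new RuntimeException("Type is not supported: " + t);
        }
    }

    public static void main(String[] args) {
        System.out.println(toTypeInfo(SqlTypeName.INTEGER));
        try {
            toTypeInfo(SqlTypeName.ARRAY);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```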




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Need guidance for write a client connector for 'Flink'

2017-01-16 Thread Fabian Hueske
Hi Pawan,

this sounds like you need to implement a custom InputFormat [1].
An InputFormat is basically executed in two phases. In the first phase it
generates InputSplits. An InputSplit references a chunk of data that
needs to be read. Hence, InputSplits define how the input data is split to
be read in parallel. In the second phase, multiple InputFormats are started
and request InputSplits from an InputSplitProvider. Each instance of the
InputFormat processes one InputSplit at a time.

It is hard to give general advice on implementing InputFormats because this
very much depends on the data source and data format to read from.

I'd suggest to have a look at other InputFormats.

Best, Fabian

[1]
https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/io/InputFormat.java
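The two-phase pattern Fabian describes can be sketched with a toy model. Note this is NOT Flink's actual InputFormat interface (see the link above); the split class and method names below are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the two-phase InputFormat pattern: phase 1 generates splits,
// phase 2 has parallel instances each read one split at a time.
public class TwoPhaseReadDemo {

    // Phase 1 output: a split references a chunk of the input to read.
    static final class InputSplit {
        final int start, length;
        InputSplit(int start, int length) { this.start = start; this.length = length; }
    }

    // Phase 1: divide the whole input into splits for parallel reading.
    static List<InputSplit> createInputSplits(int totalRecords, int minNumSplits) {
        List<InputSplit> splits = new ArrayList<>();
        int chunk = (totalRecords + minNumSplits - 1) / minNumSplits; // ceiling division
        for (int start = 0; start < totalRecords; start += chunk) {
            splits.add(new InputSplit(start, Math.min(chunk, totalRecords - start)));
        }
        return splits;
    }

    // Phase 2: one instance reads the records of a single split.
    static List<Integer> readSplit(int[] data, InputSplit split) {
        List<Integer> records = new ArrayList<>();
        for (int i = split.start; i < split.start + split.length; i++) {
            records.add(data[i]);
        }
        return records;
    }

    public static void main(String[] args) {
        int[] source = {10, 20, 30, 40, 50, 60, 70};
        List<InputSplit> splits = createInputSplits(source.length, 3);

        // Each parallel instance would request splits and read them one at a time.
        int total = 0;
        for (InputSplit s : splits) {
            total += readSplit(source, s).size();
        }
        System.out.println(splits.size() + " splits, " + total + " records read");
    }
}
```

In real Flink code the split assignment is done by an InputSplitProvider at runtime rather than by a simple loop.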


2017-01-16 6:18 GMT+01:00 Pawan Manishka Gunarathna <
pawan.manis...@gmail.com>:

> Hi,
>
> we have a data analytics server that has analytics data tables. So I need
> to write a custom *Java* implementation to read data from that data source
> and do processing (*batch* processing) using Apache Flink. Basically it's
> like a new client connector for Flink.
>
> So it would be great if you could provide guidance for my requirement.
>
> Thanks,
> Pawan
>


Re: Need guidance for write a client connector for 'Flink'

2017-01-16 Thread Pawan Manishka Gunarathna
Hi Fabian,
Thanks for providing those information.

On Mon, Jan 16, 2017 at 2:36 PM, Fabian Hueske  wrote:

> Hi Pawan,
>
> this sounds like you need to implement a custom InputFormat [1].
> An InputFormat is basically executed in two phases. In the first phase it
> generates InputSplits. An InputSplit references a chunk of data that
> needs to be read. Hence, InputSplits define how the input data is split to
> be read in parallel. In the second phase, multiple InputFormats are started
> and request InputSplits from an InputSplitProvider. Each instance of the
> InputFormat processes one InputSplit at a time.
>
> It is hard to give general advice on implementing InputFormats because this
> very much depends on the data source and data format to read from.
>
> I'd suggest to have a look at other InputFormats.
>
> Best, Fabian
>
> [1]
> https://github.com/apache/flink/blob/master/flink-core/
> src/main/java/org/apache/flink/api/common/io/InputFormat.java
>
>
> 2017-01-16 6:18 GMT+01:00 Pawan Manishka Gunarathna <
> pawan.manis...@gmail.com>:
>
> > Hi,
> >
> > we have a data analytics server that has analytics data tables. So I need
> > to write a custom *Java* implementation to read data from that data
> > source
> > and do processing (*batch* processing) using Apache Flink. Basically it's
> > like a new client connector for Flink.
> >
> > So it would be great if you could provide guidance for my requirement.
> >
> > Thanks,
> > Pawan
> >
>



-- 

*Pawan Gunaratne*
*Mob: +94 770373556*


[jira] [Created] (FLINK-5501) Determine whether the job starts from last JobManager failure

2017-01-16 Thread Zhijiang Wang (JIRA)
Zhijiang Wang created FLINK-5501:


 Summary: Determine whether the job starts from last JobManager 
failure
 Key: FLINK-5501
 URL: https://issues.apache.org/jira/browse/FLINK-5501
 Project: Flink
  Issue Type: Sub-task
  Components: JobManager
Reporter: Zhijiang Wang


When the {{JobManagerRunner}} is granted leadership, it should check whether the 
current job is already running or not. If the job is running, the 
{{JobManager}} should reconcile itself (enter the RECONCILING state) and wait for 
the {{TaskManager}} to report task status. Otherwise the {{JobManager}} can 
schedule the {{ExecutionGraph}} in the usual way.

The {{RunningJobsRegistry}} provides a way to check the job's running 
status, but we should extend the current interface and adjust the related process 
to support this function.

1. The {{RunningJobsRegistry}} sets the RUNNING status after the 
{{JobManagerRunner}} is granted leadership for the first time.
2. If the job finishes, the job status is set to FINISHED by the 
{{RunningJobsRegistry}} and the status is deleted before exit. 
3. If the {{JobManager}} fails, the job status will still be RUNNING, so 
when a {{JobManagerRunner}} (the previous one or a new one) is granted leadership 
again, it checks the job status and enters the {{RECONCILING}} state.
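The decision made on a leadership grant, as described in the three points above, could be sketched as follows. The enum values and method name are illustrative assumptions, not Flink's actual API.

```java
// Hedged sketch of the leadership-grant decision logic described above;
// names are illustrative, not Flink's actual JobManagerRunner code.
public class ReconcileDecisionDemo {
    enum JobStatus { NONE, RUNNING, FINISHED }
    enum Action { SCHEDULE_FRESH, RECONCILE, DO_NOTHING }

    // On grant of leadership: consult the registry's recorded job status.
    static Action onGrantLeadership(JobStatus registryStatus) {
        switch (registryStatus) {
            case RUNNING:  return Action.RECONCILE;      // job survived a JobManager failure
            case FINISHED: return Action.DO_NOTHING;     // job already completed
            default:       return Action.SCHEDULE_FRESH; // first leadership grant
        }
    }

    public static void main(String[] args) {
        System.out.println(onGrantLeadership(JobStatus.NONE));
        System.out.println(onGrantLeadership(JobStatus.RUNNING));
        System.out.println(onGrantLeadership(JobStatus.FINISHED));
    }
}
```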



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Taking time off

2017-01-16 Thread Vasiliki Kalavri
Hi Max,

thank you for all your work! Enjoy your time off and hope to have you back
with us soon ^^

Cheers,
-Vasia.

On 14 January 2017 at 09:03, Maximilian Michels  wrote:

> Dear Squirrels,
>
> Thank you! It's been very exciting to see the Flink community grow and
> flourish over the past two years.
>
> For the beginning of this year, I decided to take some time off, which
> means I'll be less engaged on the mailing list or on GitHub/JIRA.
>
> In the meantime, if you have any questions I might be able to answer, feel
> free to contact me. Looking forward to see the squirrels rise further!
>
> Best,
> Max
>


[jira] [Created] (FLINK-5502) Add documentation about migrating from 1.1 to 1.2

2017-01-16 Thread Kostas Kloudas (JIRA)
Kostas Kloudas created FLINK-5502:
-

 Summary: Add documentation about migrating from 1.1 to 1.2
 Key: FLINK-5502
 URL: https://issues.apache.org/jira/browse/FLINK-5502
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.2.0
Reporter: Kostas Kloudas
Assignee: Kostas Kloudas
 Fix For: 1.2.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Flink Code Base Best Practices

2017-01-16 Thread Ufuk Celebi
Hey all,

I've just created a Wiki page with a loose collection of some coding
"best practices" for the Flink project. The list was the result of a
discussion with other PMCs, committers, and contributors. It is not a
set of enforced rules, but a general list of tips, links to Flink
utilities, common patterns etc.

https://cwiki.apache.org/confluence/display/FLINK/Best+Practices+and+Lessons+Learned

Since the project only has a minimum set of enforced or automatically
checked rules, the goal of the document is to help both new and old
contributors alike with some guidelines. In the future we might
consider translating some of the listed items to automatic checks or
IDE setup templates.

The document is in progress. Feel free to comment, add or remove items
while browsing or working on the Flink code base.

– Ufuk


Re: 答复: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing release candidate)

2017-01-16 Thread Fabian Hueske
A user reported that outer joins on the Table API and SQL compute wrong
results:

https://issues.apache.org/jira/browse/FLINK-5498

2017-01-15 20:23 GMT+01:00 Till Rohrmann :

> I found two problematic issues with Mesos HA mode which breaks it:
>
> https://issues.apache.org/jira/browse/FLINK-5495
> https://issues.apache.org/jira/browse/FLINK-5496
>
> On Fri, Jan 13, 2017 at 11:29 AM, Fabian Hueske  wrote:
>
> > I tested the Table API / SQL a bit.
> >
> > I implemented a windowed aggregation with the streaming Table API and it
> > produced the same results as a DataStream API implementation.
> > Joining a stream with a TableFunction also seemed to work well.
> > Moreover, I checked the results of a bunch of TPC-H queries (batch SQL)
> > and all produced correct results.
> >
> >
> >
> > 2017-01-12 17:45 GMT+01:00 Till Rohrmann :
> >
> >> I'm wondering whether we should not make the webserver encryption depend
> >> on the global encryption activation and instead activate it by default.
> >>
> >> On Thu, Jan 12, 2017 at 4:54 PM, Chesnay Schepler 
> >> wrote:
> >>
> >> > FLINK-5470 is a duplicate of FLINK-5298 for which there is also an
> open
> >> PR.
> >> >
> >> > FLINK-5472 is imo invalid since the webserver does support https, you
> >> just
> >> > have to enable it as per the security documentation.
> >> >
> >> >
> >> > On 12.01.2017 16:20, Till Rohrmann wrote:
> >> >
> >> > I also found an issue:
> >> >
> >> > https://issues.apache.org/jira/browse/FLINK-5470
> >> >
> >> > I also noticed that Flink's webserver does not support https requests.
> >> It
> >> > might be worthwhile to add it, though.
> >> >
> >> > https://issues.apache.org/jira/browse/FLINK-5472
> >> >
> >> > On Thu, Jan 12, 2017 at 11:24 AM, Robert Metzger  >
> >> > wrote:
> >> >
> >> >> I also found a bunch of issues
> >> >>
> >> >> https://issues.apache.org/jira/browse/FLINK-5465
> >> >> https://issues.apache.org/jira/browse/FLINK-5462
> >> >> https://issues.apache.org/jira/browse/FLINK-5464
> >> >> https://issues.apache.org/jira/browse/FLINK-5463
> >> >>
> >> >>
> >> >> On Thu, Jan 12, 2017 at 9:56 AM, Fabian Hueske < 
> >> >> fhue...@gmail.com> wrote:
> >> >>
> >> >> > I have another bugfix for 1.2.:
> >> >> >
> >> >> > https://issues.apache.org/jira/browse/FLINK-2662 (pending PR)
> >> >> >
> >> >> > 2017-01-10 15:16 GMT+01:00 Robert Metzger < 
> >> >> rmetz...@apache.org>:
> >> >> >
> >> >> > > Hi,
> >> >> > >
> >> >> > > this depends a lot on the number of issues we find during the
> >> testing.
> >> >> > >
> >> >> > >
> >> >> > > These are the issues I found so far:
> >> >> > >
> >> >> > > https://issues.apache.org/jira/browse/FLINK-5379 (unresolved)
> >> >> > > https://issues.apache.org/jira/browse/FLINK-5383 (resolved)
> >> >> > > https://issues.apache.org/jira/browse/FLINK-5382 (resolved)
> >> >> > > https://issues.apache.org/jira/browse/FLINK-5381 (resolved)
> >> >> > > https://issues.apache.org/jira/browse/FLINK-5380 (pending PR)
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > On Tue, Jan 10, 2017 at 11:58 AM, shijinkui <
> shijin...@huawei.com>
> >> >> > wrote:
> >> >> > >
> >> >> > > > Do we have a probable time of 1.2 release? This month or Next
> >> month?
> >> >> > > >
> >> >> > > > -邮件原件-
> >> >> > > > 发件人: Robert Metzger [mailto: 
> >> >> rmetz...@apache.org]
> >> >> > > > 发送时间: 2017年1月3日 20:44
> >> >> > > > 收件人: dev@flink.apache.org
> >> >> > > > 抄送: u...@flink.apache.org
> >> >> > > > 主题: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing
> release
> >> >> > > candidate)
> >> >> > > >
> >> >> > > > Hi,
> >> >> > > >
> >> >> > > > First of all, I wish everybody a happy new year 2017.
> >> >> > > >
> >> >> > > > I've set user@flink in CC so that users who are interested in
> >> >> helping
> >> >> > > > with the testing get notified. Please respond only to the dev@
> >> >> list to
> >> >> > > > keep the discussion there!
> >> >> > > >
> >> >> > > > According to the 1.2 release discussion thread, I've created a
> >> first
> >> >> > > > release candidate for Flink 1.2.
> >> >> > > > The release candidate will not be the final release, because
> I'm
> >> >> > certain
> >> >> > > > that we'll find at least one blocking issue in the candidate :)
> >> >> > > >
> >> >> > > > Therefore, the RC is meant as a testing only release candidate.
> >> >> > > > Please report every issue we need to fix before the next RC in
> >> this
> >> >> > > thread
> >> >> > > > so that we have a good overview.
> >> >> > > >
> >> >> > > > The release artifacts are located here:
> >> >> > > > http://people.apache.org/~rmetzger/flink-1.2.0-rc0/
> >> >> > > >
> >> >> > > > The maven staging repository is located here:
> >> >> > > > https://repository.apache.org/content/repositories/orgapache
> >> >> flink-
> >> >> > > >
> >> >> > > > The release commit (in branch "release-1.2.0-rc0"):
> >> >> > > > http://git-wip-us.apache.org/repos/asf/flink/commit/f3c59ced
> >> >> > > >
> >> >> > > >
> >> >> > > > Happy testing!
> >> >> > > >
> >> >

[jira] [Created] (FLINK-5503) mesos-appmaster.sh script could print return value message

2017-01-16 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-5503:


 Summary: mesos-appmaster.sh script could print return value message
 Key: FLINK-5503
 URL: https://issues.apache.org/jira/browse/FLINK-5503
 Project: Flink
  Issue Type: Improvement
  Components: Mesos
Affects Versions: 1.2.0, 1.3.0
Reporter: Till Rohrmann
Priority: Minor
 Fix For: 1.3.0


The {{mesos-appmaster.sh}} script does not print an error message if the return 
value of {{MesosApplicationMasterRunner}} is nonzero. Printing one would help the 
user realize that something went wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5504) mesos-appmaster.sh logs to wrong directory

2017-01-16 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-5504:


 Summary: mesos-appmaster.sh logs to wrong directory
 Key: FLINK-5504
 URL: https://issues.apache.org/jira/browse/FLINK-5504
 Project: Flink
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.2.0, 1.3.0
Reporter: Till Rohrmann
Priority: Minor
 Fix For: 1.3.0


The {{mesos-appmaster.sh}} script does not create the log file under 
{{FLINK_HOME/log}} and does not follow the naming convention. I think we should 
correct the behaviour.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5505) Rename recovery.zookeeper.path.mesos-workers into high-availability.zookeeper.path.mesos-workers

2017-01-16 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-5505:


 Summary: Rename recovery.zookeeper.path.mesos-workers into 
high-availability.zookeeper.path.mesos-workers
 Key: FLINK-5505
 URL: https://issues.apache.org/jira/browse/FLINK-5505
 Project: Flink
  Issue Type: Improvement
  Components: Mesos
Affects Versions: 1.2.0, 1.3.0
Reporter: Till Rohrmann
Priority: Trivial
 Fix For: 1.3.0


In order to harmonize configuration parameter names I think we should rename 
{{recovery.zookeeper.path.mesos-workers}} into 
{{high-availability.zookeeper.path.mesos-workers}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Flink CEP development is stalling

2017-01-16 Thread Stephan Ewen
Hi Ivan!

Thank you for bringing this thread up. I agree, we need to do something
about how some modules are currently handled.

The CEP library definitely needs more active committers. Adding new
committers will be necessary, I think, but as you mentioned, it needs at
least one (better: more) experienced committer who can help the new
committers get into the process and the technical matter. Otherwise we
cannot keep up good quality.

How important the involvement of experienced committers is, even for
something that seems more or less self-contained (like the CEP library),
has already been visible in some previous pull requests to the CEP library:
some were not compatible with the overall design or with the strategy
for making streaming applications rescalable. It is hard for new committers
to be aware of all that, hence the need for some experienced committers to
help.

Currently, the community is pushing hard on the 1.2 release: testing, docs,
fixes, usability. I expect that to take not too much longer.
After that, we will kick off some threads discussing community and
project structure. That should involve how to deal with projects like the
CEP library, and also with the sheer size of the project and code base in
general.

Greetings,
Stephan


On Sun, Jan 15, 2017 at 11:28 PM, Ivan Mushketyk 
wrote:

> Hi Alexey,
>
> I agree with you. Most contributors are overloaded, but PRs for other
> sub-projects are reviewed much faster. In my experience, in most cases, you
> can get a first review for a PR in less than a week and it's usually merged
> within a month or less.
> Flink CEP is a notable exception. I believe the main reason for that is
> that there is only one core committer who currently can review Flink CEP
> PRs (Till) and he is very busy with other work.
>
> Best regards,
> Ivan.
>
> On Sun, 15 Jan 2017 at 19:22 Alexey Demin  wrote:
>
> > Hi Ivan,
> >
> > I think the problem is not only with the CEP project.
> > The main contributors are overloaded, and simple fixes frequently stay as
> > PRs without being merged.
> >
> > You can see how the number of open PRs is increasing over time.
> >
> > Thanks,
> > Alexey Diomin
> >
> >
> > 2017-01-15 17:18 GMT+04:00 Ivan Mushketyk :
> >
> > > Hi Dmitry,
> > >
> > > Your contributions are welcomed, but right now the most critical issue is
> > > that the CEP project does not have an experienced Flink contributor who can
> > > review and approve new pull requests.
> > >
> > > I hope that the Flink community will promptly resolve the issue, so feel
> > > free to select a JIRA issue and work on it.
> > >
> > > Best regards,
> > > Ivan.
> > >
> > > On Sat, 14 Jan 2017 at 12:29 Dmitry Vorobiov <2belikespr...@gmail.com>
> > > wrote:
> > >
> > > I would be interested in contributing to CEP. I am following the Flink
> > > project, but haven't contributed yet. For the next 2 weeks I am a bit busy
> > > with my work, but then I'll be happy to dig into it. I used to work in IoT,
> > > so event processing is a close topic for me.
> > >
> > > Dmitry.
> > > On Fri, 13 Jan 2017 at 14:58, Ivan Mushketyk  >
> > > wrote:
> > >
> > > > Hi Till,
> > > >
> > > > Thank you for your reply.
> > > >
> > > > I wonder if the following will work.
> > > > What if you can find a Flink committer/committers that will review
> and
> > > > iterate on CEP PRs before you review them. They don't need to know
> all
> > > CEP
> > > > internals, but they will help to eradicate most of the issues.
> > > > Then you will have to review PRs only when most of the issues are
> fixed
> > > and
> > > > to make a final decision about whether to merge a PR or not. In this
> > > case,
> > > > you probably won't need to spend much time on reviewing CEP PRs. As
> an
> > > > additional bonus, after some time these new CEP reviewers will learn
> > > enough
> > > > about CEP to review them by themselves without your input.
> > > >
> > > > What do you think about this?
> > > >
> > > > Best regards,
> > > > Ivan.
> > > >
> > > > On Fri, 13 Jan 2017 at 11:28 Till Rohrmann 
> > wrote:
> > > >
> > > > Hi Ivan,
> > > >
> > > > first of all let me apologise for the bad experience you've had with
> > > > opening CEP PRs in the past.
> > > >
> > > > The general problem, as you've said, is that there is nobody who reviews
> > > > the open PRs. I used to do this in the past, but at the moment I hardly
> > > > find time due to other commitments.
> > > >
> > > > I think the way to mitigate the problem is to attract more
> contributors
> > > and
> > > > committers who are willing to spend time on PR reviews and finally
> > (this
> > > > applies only to committers) to commit the PRs. I can try to reach out
> > to
> > > > other committers to make them aware of the CEP library.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Thu, Jan 12, 2017 at 9:15 AM,  >
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > I have some clientes interested in CEP features
> > > > >
> > > > > El 11/1/17 16:23, "Ivan Mushketyk" 
> > > escribió:
> >

[DISCUSS] (Not) tagging reviewers

2017-01-16 Thread Stephan Ewen
Hi!

I have seen that recently many pull requests designate reviews by writing
"@personA review please" or so.

I am personally quite strongly against that, I think it hurts the community
work:

  - The same few people get usually "designated" and will typically get
overloaded and often not do the review.

  - At the same time, this discourages other community members from looking
at the pull request, which is totally undesirable.

  - In general, review participation should be "pull based" (person decides
what they want to work on) not "push based" (random person pushes work to
another person). Push-based just creates the wrong feeling in a community
of volunteers.

  - In many cases the designated reviewers are not the ones most
knowledgeable in the code, which is understandable, because how should
contributors know whom to tag?


Long story short, why don't we just drop that habit?


Greetings,
Stephan


Re: [DISCUSS] (Not) tagging reviewers

2017-01-16 Thread Fabian Hueske
Thanks for bringing this up Stephan.
I completely agree with you.

Cheers, Fabian

2017-01-16 12:42 GMT+01:00 Stephan Ewen :

> Hi!
>
> I have seen that recently many pull requests designate reviews by writing
> "@personA review please" or so.
>
> I am personally quite strongly against that, I think it hurts the community
> work:
>
>   - The same few people get usually "designated" and will typically get
> overloaded and often not do the review.
>
>   - At the same time, this discourages other community members from looking
> at the pull request, which is totally undesirable.
>
>   - In general, review participation should be "pull based" (person decides
> what they want to work on) not "push based" (random person pushes work to
> another person). Push-based just creates the wrong feeling in a community
> of volunteers.
>
>   - In many cases the designated reviewers are not the ones most
> knowledgeable in the code, which is understandable, because how should
> contributors know whom to tag?
>
>
> Long story short, why don't we just drop that habit?
>
>
> Greetings,
> Stephan
>


[jira] [Created] (FLINK-5506) Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException

2017-01-16 Thread Miguel Carvalho Valente Esaguy Coimbra (JIRA)
Miguel Carvalho Valente Esaguy Coimbra created FLINK-5506:
-

 Summary: Java 8 - CommunityDetection.java:158 - 
java.lang.NullPointerException
 Key: FLINK-5506
 URL: https://issues.apache.org/jira/browse/FLINK-5506
 Project: Flink
  Issue Type: Bug
  Components: Gelly
Affects Versions: 1.1.4
Reporter: Miguel Carvalho Valente Esaguy Coimbra


Reporting this here as per Vasia's advice.
I am having the following problem while trying out the 
org.apache.flink.graph.library.CommunityDetection algorithm of the Gelly API 
(Java).

Specs: JDK 1.8.0_102 x64
Apache Flink: 1.1.4

Suppose I have a very small (I tried an example with 38 vertices as well) 
dataset stored in a tab-separated file 3-vertex.tsv:

#id1	id2	score
0	1	0
0	2	0
0	3	0

This is just a central vertex with 3 neighbors (disconnected between 
themselves).
I am loading the dataset and executing the algorithm with the following code:


---
{{// Load the data from the .tsv file.
final DataSet<Tuple3<Long, Long, Double>> edgeTuples = 
env.readCsvFile(inputPath)
.fieldDelimiter("\t") // fields are separated by tabs
.ignoreComments("#")  // comments start with "#"
.types(Long.class, Long.class, Double.class);

// Generate a graph and add reverse edges (undirected).
final Graph<Long, Long, Double> graph = Graph.fromTupleDataSet(edgeTuples, new 
MapFunction<Long, Long>() {
private static final long serialVersionUID = 8713516577419451509L;
public Long map(Long value) {
return value;
}
},
env).getUndirected();

// CommunityDetection parameters.
final double hopAttenuationDelta = 0.5d;
final int iterationCount = 10;

// Prepare and trigger the execution.
DataSet<Vertex<Long, Long>> vs = graph.run(new 
org.apache.flink.graph.library.CommunityDetection(iterationCount, 
hopAttenuationDelta)).getVertices();

vs.print();}}
​---​

Running this code throws the following exception (note the CommunityDetection.java:158 line):

{{​org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply$mcV$sp(JobManager.scala:805)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NullPointerException
at 
org.apache.flink.graph.library.CommunityDetection$VertexLabelUpdater.updateVertex(CommunityDetection.java:158)
at 
org.apache.flink.graph.spargel.ScatterGatherIteration$GatherUdfSimpleVV.coGroup(ScatterGatherIteration.java:389)
at 
org.apache.flink.runtime.operators.CoGroupWithSolutionSetSecondDriver.run(CoGroupWithSolutionSetSecondDriver.java:218)
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486)
at 
org.apache.flink.runtime.iterative.task.AbstractIterativeTask.run(AbstractIterativeTask.java:146)
at 
org.apache.flink.runtime.iterative.task.IterationTailTask.run(IterationTailTask.java:107)
at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:642)
at java.lang.Thread.run(Thread.java:745)​}}


After a further look, I set a breakpoint (Eclipse IDE debugging) at that line:

{{org.apache.flink.graph.library.CommunityDetection.java (source code accessed 
automatically by Maven)
// find the highest score of maxScoreLabel
double highestScore = labelsWithHighestScore.get(maxScoreLabel);}}

- maxScoreLabel has the value 3.

- labelsWithHighestScore was initialized as: Map<Long, Double> 
labelsWithHighestScore = new TreeMap<>();

- labelsWithHighestScore is a TreeMap and has the values:

{0=0.0}
null
null
[0=0.0]
null
1​

It seems that the value 3 should have been added to labelsWithHighestScore 
at some point during execution, but because it wasn't, an 
exception is thrown.
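A minimal standalone reproduction of this failure mode, assuming (as the debugger output suggests) that the map is a Map<Long, Double>: Map.get() on a missing key returns null, and auto-unboxing that null into a primitive double throws the NullPointerException.

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal reproduction of the NPE pattern at CommunityDetection.java:158:
// get() on a missing key returns null, and assigning it to a primitive
// double auto-unboxes the null, throwing NullPointerException.
public class UnboxNpeDemo {
    public static void main(String[] args) {
        Map<Long, Double> labelsWithHighestScore = new TreeMap<>();
        labelsWithHighestScore.put(0L, 0.0);

        long maxScoreLabel = 3L; // a label that was never inserted into the map

        boolean npeThrown = false;
        try {
            // Same shape as the line in CommunityDetection.java:158.
            double highestScore = labelsWithHighestScore.get(maxScoreLabel);
            System.out.println("highestScore = " + highestScore); // never reached
        } catch (NullPointerException e) {
            npeThrown = true;
        }
        System.out.println("NPE on missing key: " + npeThrown);
    }
}
```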

I

Re: [DISCUSS] (Not) tagging reviewers

2017-01-16 Thread Paris Carbone
I also agree with all the points, especially when it comes to new PRs.

Though, when someone has started reviewing a PR and shows interest, it probably 
makes sense to finish doing so. Wouldn’t tagging be acceptable there?
In those cases tagging triggers direct notifications, so that people already 
involved in a conversation get reminded and answer pending questions. 

> On 16 Jan 2017, at 12:45, Fabian Hueske  wrote:
> 
> Thanks for bringing this up Stephan.
> I completely agree with you.
> 
> Cheers, Fabian
> 
> 2017-01-16 12:42 GMT+01:00 Stephan Ewen :
> 
>> Hi!
>> 
>> I have seen that recently many pull requests designate reviews by writing
>> "@personA review please" or so.
>> 
>> I am personally quite strongly against that, I think it hurts the community
>> work:
>> 
>>  - The same few people get usually "designated" and will typically get
>> overloaded and often not do the review.
>> 
>>  - At the same time, this discourages other community members from looking
>> at the pull request, which is totally undesirable.
>> 
>>  - In general, review participation should be "pull based" (person decides
>> what they want to work on) not "push based" (random person pushes work to
>> another person). Push-based just creates the wrong feeling in a community
>> of volunteers.
>> 
>>  - In many cases the designated reviewers are not the ones most
>> knowledgeable in the code, which is understandable, because how should
>> contributors know whom to tag?
>> 
>> 
>> Long story short, why don't we just drop that habit?
>> 
>> 
>> Greetings,
>> Stephan
>> 



[jira] [Created] (FLINK-5507) remove queryable list state sink

2017-01-16 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5507:
--

 Summary: remove queryable list state sink
 Key: FLINK-5507
 URL: https://issues.apache.org/jira/browse/FLINK-5507
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


The queryable state "sink" using ListState 
(".asQueryableState(, ListStateDescriptor)") stores all incoming data 
forever and is never cleaned. Eventually, it will consume too much memory and 
is thus of limited use.

We should remove it from the API.
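A toy illustration of why such a sink is problematic, using a plain Java list standing in for Flink's ListState (this is not the actual queryable-state API): every incoming record is retained, so memory use grows linearly with the stream.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a ListState-backed queryable state "sink": state that only
// appends and is never cleaned grows without bound as records arrive.
public class UnboundedListStateDemo {
    public static void main(String[] args) {
        List<Long> listState = new ArrayList<>();
        for (long record = 0; record < 1_000_000; record++) {
            listState.add(record); // every incoming record is kept forever
        }
        System.out.println("retained elements: " + listState.size());
    }
}
```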



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] (Not) tagging reviewers

2017-01-16 Thread Ufuk Celebi
On 16 January 2017 at 12:59:04, Paris Carbone (par...@kth.se) wrote:
> > Though, when someone has started reviewing a PR and shows interest  
> it probably makes sense to finish doing so. Wouldn’t tagging  
> be acceptable there?
> In those case tagging triggers direct notifications, so that  
> people already involved in a conversation get reminded and answer  
> pending questions.

I think that's totally fine Paris since it is more of a reminder in that case.

Stephan is referring to PRs that have a last line in the description like "@XYZ 
for review please".

– Ufuk




Re: [DISCUSS] (Not) tagging reviewers

2017-01-16 Thread Alexey Demin
Hi all

A view from my perspective:
in the middle of summer - 150 open PRs
in the middle of autumn - 180
now - 206.

This is a mix of bugfixes and improvements.
I understand that work on new features is important, but when small and
trivial fixes stay in the PR state for more than 2-3 months,
users start thinking about switching to another product.

The only way to push people to merge these fixes into master is tags.

I am not talking about big changes, only about small and trivial ones that
take less than 5 minutes to review.

Features are important, but if those features work incorrectly, users can
choose a more stable product without any hesitation.

Thanks
Alexey Diomin



2017-01-16 16:36 GMT+04:00 Ufuk Celebi :

> On 16 January 2017 at 12:59:04, Paris Carbone (par...@kth.se) wrote:
> > > Though, when someone has started reviewing a PR and shows interest
> > it probably makes sense to finish doing so. Wouldn’t tagging
> > be acceptable there?
> > In those case tagging triggers direct notifications, so that
> > people already involved in a conversation get reminded and answer
> > pending questions.
>
> I think that's totally fine Paris since it is more of a reminder in that
> case.
>
> Stephan is referring to PRs that have a last line in the description like
> "@XYZ for review please".
>
> – Ufuk
>
>
>


[jira] [Created] (FLINK-5508) Remove Mesos dynamic class loading

2017-01-16 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-5508:


 Summary: Remove Mesos dynamic class loading
 Key: FLINK-5508
 URL: https://issues.apache.org/jira/browse/FLINK-5508
 Project: Flink
  Issue Type: Improvement
  Components: Mesos
Affects Versions: 1.2.0, 1.3.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
Priority: Minor
 Fix For: 1.2.0, 1.3.0


Mesos uses dynamic class loading in order to load the 
{{ZooKeeperStateHandleStore}} and the {{CuratorFramework}} class. This can be 
replaced by a compile time dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] (Not) tagging reviewers

2017-01-16 Thread Stephan Ewen
Thanks for the comments.

@Paris - Ufuk has it right, tagging as a reminder (or just because it helps
with referring to the comment from a specific reviewer) makes total sense
to me, I would keep doing that.


On Mon, Jan 16, 2017 at 1:36 PM, Ufuk Celebi  wrote:

> On 16 January 2017 at 12:59:04, Paris Carbone (par...@kth.se) wrote:
> > > Though, when someone has started reviewing a PR and shows interest
> > it probably makes sense to finish doing so. Wouldn’t tagging
> > be acceptable there?
> > In those case tagging triggers direct notifications, so that
> > people already involved in a conversation get reminded and answer
> > pending questions.
>
> I think that's totally fine Paris since it is more of a reminder in that
> case.
>
> Stephan is referring to PRs that have a last line in the description like
> "@XYZ for review please".
>
> – Ufuk
>
>
>


[jira] [Created] (FLINK-5509) Replace QueryableStateClient keyHashCode argument

2017-01-16 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5509:
--

 Summary: Replace QueryableStateClient keyHashCode argument
 Key: FLINK-5509
 URL: https://issues.apache.org/jira/browse/FLINK-5509
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Reporter: Ufuk Celebi
Priority: Minor


When going over the low-level QueryableStateClient with [~NicoK], we noticed 
that the key hashCode argument can be confusing to users:

{code}
Future getKvState(
  JobID jobId,
  String name,
  int keyHashCode,
  byte[] serializedKeyAndNamespace)
{code}

The {{keyHashCode}} argument is the result of calling {{hashCode()}} on the key 
to look up. This is what is sent to the JobManager in order to look up the 
location of the key. While pretty straightforward, it is repetitive and 
possibly confusing.

As an alternative, we suggest making the method generic and simply calling 
hashCode() on the key object ourselves. This way the user just provides the key 
object.

Since there are already some early users of the queryable state API, we suggest 
renaming the method in order to provoke a compilation error after upgrading to 
the actually released 1.2 version.

(This would also work without renaming, since the hashCode of an Integer (what 
users currently provide) is the same number, but it would be confusing why it 
actually works.)
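As a rough sketch of the proposal (class and method names here are illustrative, not the actual Flink API), the generic variant could simply delegate to the hash-based one:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

// Illustrative sketch only - not the real QueryableStateClient API.
public class KvStateClientSketch {

    // Current style: the caller must pass key.hashCode() explicitly.
    static Future<byte[]> getKvState(String name, int keyHashCode, byte[] serializedKeyAndNamespace) {
        return CompletableFuture.completedFuture(new byte[0]); // stub for the actual lookup
    }

    // Proposed generic variant: the client computes the hash itself,
    // so the user only provides the key object.
    static <K> Future<byte[]> getKvStateForKey(String name, K key, byte[] serializedKeyAndNamespace) {
        return getKvState(name, key.hashCode(), serializedKeyAndNamespace);
    }

    public static void main(String[] args) {
        // Integer.hashCode() returns the value itself, which is why existing
        // callers passing the raw int would accidentally keep working.
        Integer key = 42;
        System.out.println(key.hashCode());
    }
}
```

Note that for Integer keys both variants end up with the same hash, which is exactly why a rename is needed to surface the API change at compile time.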



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Flink Code Base Best Practices

2017-01-16 Thread Stephan Ewen
Very nice, thanks Ufuk!



On Mon, Jan 16, 2017 at 12:05 PM, Ufuk Celebi  wrote:

> Hey all,
>
> I've just created a Wiki page with a loose collection of some coding
> "best practices" for the Flink project. The list was the result of a
> discussion with other PMCs, committers, and contributors. It is not a
> set of enforced rules, but a general list of tips, links to Flink
> utilities, common patterns etc.
>
> https://cwiki.apache.org/confluence/display/FLINK/Best+
> Practices+and+Lessons+Learned
>
> Since the project only has a minimum set of enforced or automatically
> checked rules, the goal of the document is to help both new and old
> contributors alike with some guidelines. In the future we might
> consider translating some of the listed items to automatic checks or
> IDE setup templates.
>
> The document is in progress. Feel free to comment, add or remove items
> while browsing or working on the Flink code base.
>
> – Ufuk
>


[jira] [Created] (FLINK-5510) Replace Scala Future with FlinkFuture in QueryableStateClient

2017-01-16 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5510:
--

 Summary: Replace Scala Future with FlinkFuture in 
QueryableStateClient
 Key: FLINK-5510
 URL: https://issues.apache.org/jira/browse/FLINK-5510
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Reporter: Ufuk Celebi
Priority: Minor


The entry point for queryable state users is the {{QueryableStateClient}} which 
returns query results via Scala Futures. Since merging the initial version of 
QueryableState we have introduced the FlinkFuture wrapper type in order to not 
expose our Scala dependency via the API.

Since APIs tend to stick around longer than expected, it might be worthwhile to 
port the exposed QueryableStateClient interface to use the FlinkFuture. Early 
users can still get the Scala Future via FlinkFuture#getScalaFuture().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5511) Add support for outer joins with local predicates

2017-01-16 Thread lincoln.lee (JIRA)
lincoln.lee created FLINK-5511:
--

 Summary: Add support for outer joins with local predicates
 Key: FLINK-5511
 URL: https://issues.apache.org/jira/browse/FLINK-5511
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: lincoln.lee
Assignee: lincoln.lee
Priority: Minor


Currently the test case in 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/batch/table/JoinITCase.scala
will throw a ValidationException indicating: "Invalid non-join predicate 'b < 
3'. For non-join predicates use Table#where." 
{code:title=JoinITCase.scala} 
@Test(expected = classOf[ValidationException]) 
def testNoJoinCondition(): Unit = { 
 … 
 val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, 
'c) 
 val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, 
'f, 'g, 'h) 

 val joinT = ds2.leftOuterJoin(ds1, 'b === 'd && 'b < 3).select('c, 'g) 
} 
{code} 
This JIRA aims to support this kind of local predicate in outer joins. 

More detailed description: http://goo.gl/gK6vP3 
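To illustrate the intended semantics (a hand-rolled sketch in plain Java, not Table API code): a local predicate folded into a left outer join condition restricts which rows match, but left rows whose predicate fails must still be emitted with null padding rather than dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of a left outer join with the local predicate b < 3
// folded into the join condition (b == d AND b < 3).
public class OuterJoinLocalPredicateDemo {
    public static void main(String[] args) {
        // left input rows: (d, g); right input keyed as b -> c
        List<int[]> left = Arrays.asList(new int[]{1, 10}, new int[]{3, 30});
        Map<Integer, String> right = new HashMap<>();
        right.put(1, "one");
        right.put(3, "three");

        List<String> result = new ArrayList<>();
        for (int[] l : left) {
            int d = l[0], g = l[1];
            String c = right.get(d);
            // the local predicate only restricts matches - unmatched
            // left rows survive with null padding instead of being dropped
            if (c != null && d < 3) {
                result.add(c + "," + g);
            } else {
                result.add("null," + g);
            }
        }
        System.out.println(result); // [one,10, null,30]
    }
}
```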



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5512) RabbitMQ documentation should inform that exactly-once holds for RMQSource only when parallelism is 1

2017-01-16 Thread Tzu-Li (Gordon) Tai (JIRA)
Tzu-Li (Gordon) Tai created FLINK-5512:
--

 Summary: RabbitMQ documentation should inform that exactly-once 
holds for RMQSource only when parallelism is 1  
 Key: FLINK-5512
 URL: https://issues.apache.org/jira/browse/FLINK-5512
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai
 Fix For: 1.2.0


See here for the reasoning: FLINK-2624. We should add an informative warning 
about this limitation in the docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5513) Remove relocation of internal API classes

2017-01-16 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-5513:


 Summary: Remove relocation of internal API classes
 Key: FLINK-5513
 URL: https://issues.apache.org/jira/browse/FLINK-5513
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.2.0, 1.3.0
Reporter: Till Rohrmann
 Fix For: 1.3.0


Currently, we are relocating the {{curator}} dependency in order to avoid 
conflicts with user code classes. This happens for example in the 
{{flink-runtime}} module. The problem with that is that {{curator}} classes, 
such as the {{CuratorFramework}}, are part of Flink's internal API. So for 
example, the {{ZooKeeperStateHandleStore}} requires a {{CuratorFramework}} as 
argument in order to instantiate it. By relocating {{curator}} it is no longer 
possible to use this class outside of {{flink-runtime}} without some nasty 
tricks (see {{flink-mesos}} for that).

I think it is not good practice to relocate internal API classes because it 
hinders easy code reuse. I propose to either introduce our own interfaces which 
abstract the {{CuratorFramework}} away or (imo the better solution) to get rid 
of the {{Curator}} relocation. The latter might entail that we properly 
separate the API modules from the runtime modules so that users don't have to 
pull in the runtime dependencies if they want to develop a Flink job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Flink Code Base Best Practices

2017-01-16 Thread Shaoxuan Wang
Ufuk,
Thanks a lot for noting down all those "good to know" and sharing them to
the community.

On Mon, Jan 16, 2017 at 7:05 PM, Ufuk Celebi  wrote:

> Hey all,
>
> I've just created a Wiki page with a loose collection of some coding
> "best practices" for the Flink project. The list was the result of a
> discussion with other PMCs, committers, and contributors. It is not a
> set of enforced rules, but a general list of tips, links to Flink
> utilities, common patterns etc.
>
> https://cwiki.apache.org/confluence/display/FLINK/Best+
> Practices+and+Lessons+Learned
>
> Since the project only has a minimum set of enforced or automatically
> checked rules, the goal of the document is to help both new and old
> contributors alike with some guidelines. In the future we might
> consider translating some of the listed items to automatic checks or
> IDE setup templates.
>
> The document is in progress. Feel free to comment, add or remove items
> while browsing or working on the Flink code base.
>
> – Ufuk
>


[jira] [Created] (FLINK-5514) Implement an efficient physical execution for CUBE/ROLLUP/GROUPING SETS

2017-01-16 Thread Timo Walther (JIRA)
Timo Walther created FLINK-5514:
---

 Summary: Implement an efficient physical execution for 
CUBE/ROLLUP/GROUPING SETS
 Key: FLINK-5514
 URL: https://issues.apache.org/jira/browse/FLINK-5514
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: Timo Walther


A first support for GROUPING SETS has been added in FLINK-5303. However, the 
current runtime implementation is not very efficient, as it basically only 
translates logical operators to physical operators; i.e., grouping sets are 
currently only translated into multiple groupings that are unioned together. A 
rough design document for this has been created in FLINK-2980.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


RE: [DISCUSS] (Not) tagging reviewers

2017-01-16 Thread Anton Solovev
Hi, Alexey

I will check abandoned PRs to reduce obviously outdated ones and add them to a 
cleanup list https://issues.apache.org/jira/browse/FLINK-5384 


-Original Message-
From: Alexey Demin [mailto:diomi...@gmail.com] 
Sent: Monday, January 16, 2017 5:05 PM
To: dev@flink.apache.org
Subject: Re: [DISCUSS] (Not) tagging reviewers

Hi all

A view from my perspective:
in the middle of summer - 150 open PRs
in the middle of autumn - 180
now - 206.

This is a mix of bugfixes and improvements.
I understand that work on new features is important, but when small and trivial 
fixes stay in the PR state for more than 2-3 months, users start thinking about 
switching to another product.

The only way to push people to merge these fixes into master is tags.

I am not talking about big changes, only about small and trivial ones that take 
less than 5 minutes to review.

Features are important, but if those features work incorrectly, users can choose 
a more stable product without any hesitation.

Thanks
Alexey Diomin



2017-01-16 16:36 GMT+04:00 Ufuk Celebi :

> On 16 January 2017 at 12:59:04, Paris Carbone (par...@kth.se) wrote:
> > > Though, when someone has started reviewing a PR and shows interest
> > it probably makes sense to finish doing so. Wouldn’t tagging be 
> > acceptable there?
> > In those case tagging triggers direct notifications, so that people 
> > already involved in a conversation get reminded and answer pending 
> > questions.
>
> I think that's totally fine Paris since it is more of a reminder in 
> that case.
>
> Stephan is referring to PRs that have a last line in the description 
> like "@XYZ for review please".
>
> – Ufuk
>
>
>


[jira] [Created] (FLINK-5515) fix unused kvState.getSerializedValue call in KvStateServerHandler

2017-01-16 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5515:
--

 Summary: fix unused kvState.getSerializedValue call in 
KvStateServerHandler
 Key: FLINK-5515
 URL: https://issues.apache.org/jira/browse/FLINK-5515
 Project: Flink
  Issue Type: Improvement
Reporter: Nico Kruber


This was added in 4809f5367b08a9734fc1bd4875be51a9f3bb65aa and is probably a 
left-over from a merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5516) Hardcoded paths in flink-python

2017-01-16 Thread Felix seibert (JIRA)
Felix seibert created FLINK-5516:


 Summary: Hardcoded paths in flink-python
 Key: FLINK-5516
 URL: https://issues.apache.org/jira/browse/FLINK-5516
 Project: Flink
  Issue Type: Improvement
Reporter: Felix seibert


The PythonPlanBinder.java contains three hardcoded filesystem paths:

{code:java}
public static final String FLINK_PYTHON_FILE_PATH = 
System.getProperty("java.io.tmpdir") + File.separator + "flink_plan";

private static String FLINK_HDFS_PATH = "hdfs:/tmp";
public static final String FLINK_TMP_DATA_DIR = 
System.getProperty("java.io.tmpdir") + File.separator + "flink_data";
{code}

{noformat}FLINK_PYTHON_FILE_PATH{noformat} and 
{noformat}FLINK_TMP_DATA_DIR{noformat} are configurable by modifying 
{noformat}java.io.tmpdir{noformat}.
For {noformat}FLINK_HDFS_PATH{noformat}, there is no way to configure it 
other than modifying the source. 

Is it possible to make all three parameters configurable in the usual Flink 
configuration files (like flink-conf.yaml)?
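As a rough illustration of the requested behaviour (the property name below is hypothetical, not an existing Flink configuration key), the hardcoded value could become a fallback default that is overridable from configuration:

```java
// Hedged sketch: resolve the HDFS staging path from a (hypothetical)
// property, falling back to the currently hardcoded default.
public class PythonPathsSketch {
    static String resolveHdfsPath() {
        // "python.hdfs.tmpdir" is an assumed key, not an actual Flink option
        return System.getProperty("python.hdfs.tmpdir", "hdfs:/tmp");
    }

    public static void main(String[] args) {
        System.out.println(resolveHdfsPath()); // default unless overridden
    }
}
```

In the actual fix this lookup would presumably go through Flink's Configuration rather than a system property.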



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Flink CEP development is stalling

2017-01-16 Thread Ivan Mushketyk
Hi Stephan!

Thank you for your answer.

I appreciate your efforts and I know how busy you and other Flink
committers are.
I am looking forward to the upcoming discussion of Flink modules!

Best regards,
Ivan.

On Mon, 16 Jan 2017 at 11:32 Stephan Ewen  wrote:

> Hi Ivan!
>
> Thank you for bringing this thread up. I agree, we need to do something
> about how some modules are currently handled.
>
> The CEP library definitely needs more active committers. Adding new
> committers will be necessary, I think, but as you mentioned, it needs at
> least one (better more) experienced committers that can help the new
> committers to get into the process and the technical matter. Otherwise we
> cannot keep up a good quality.
>
> How important the involvement of experienced committers is even for
> something that seems more or less self-contained (like the CEP library),
> has already been visible in some previous pull requests to the CEP library
> - those were not compatible with the overall design or with the strategy
> for making streaming applications rescalable. It is hard for new committers
> to be aware of all that, hence the need for some experienced committers to
> help.
>
> Currently, the community is pushing hard on the 1.2 release: testing, docs,
> fixes, usability. I expect that to take not too much longer.
> After that, we will kick off some threads discussing about community and
> project structure. That should involve how to deal with projects like the
> CEP library, and also with the sheer size of the project and code base in
> general.
>
> Greetings,
> Stephan
>
>
> On Sun, Jan 15, 2017 at 11:28 PM, Ivan Mushketyk  >
> wrote:
>
> > Hi Alexey,
> >
> > I agree with you. Most contributors are overloaded, but PRs for other
> > sub-projects are reviewed much faster. In my experience, in most cases,
> you
> > can get a first review for a PR in less than a week and it's usually
> merged
> > within a month or less.
> > Flink CEP is a notable exception. I believe the main reason for that is
> > that there is only one core committer who currently can review Flink CEP
> > PRs (Till) and he is very busy with other work.
> >
> > Best regards,
> > Ivan.
> >
> > On Sun, 15 Jan 2017 at 19:22 Alexey Demin  wrote:
> >
> > > Hi Ivan,
> > >
> > > I think the problem is not only with the CEP project.
> > > The main contributors are overloaded and simple fixes frequently stay as
> > PRs
> > > without being merged.
> > >
> > > You can see how the number of open PRs increases over time.
> > >
> > > Thanks,
> > > Alexey Diomin
> > >
> > >
> > > 2017-01-15 17:18 GMT+04:00 Ivan Mushketyk :
> > >
> > > > Hi Dmitry,
> > > >
> > > > Your contributions are welcomed, but right now the most critical
> issue
> > is
> > > > that CEP project does not have an experienced Flink contributor who
> can
> > > > review and approve new pull requests.
> > > >
> > > > I hope that the Flink community will promptly resolve the issue, so feel
> > free
> > > > to select a JIRA issue and work on it.
> > > >
> > > > Best regards,
> > > > Ivan.
> > > >
> > > > On Sat, 14 Jan 2017 at 12:29 Dmitry Vorobiov <
> 2belikespr...@gmail.com>
> > > > wrote:
> > > >
> > > > I would be interested to contribute to CEP. I am following Flink
> > project,
> > > > but haven't contributed yet. Next 2 weeks I am a bit busy with my
> work,
> > > but
> > > > then I'll be happy to dig into it. I used to work in IoT so event
> > > processing
> > > > is a close topic for me.
> > > >
> > > > Dmitry.
> > > > On Fri, 13 Jan 2017 at 14:58, Ivan Mushketyk <
> ivan.mushke...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hi Till,
> > > > >
> > > > > Thank you for your reply.
> > > > >
> > > > > I wonder if the following will work.
> > > > > What if you can find a Flink committer/committers that will review
> > and
> > > > > iterate on CEP PRs before you review them. They don't need to know
> > all
> > > > CEP
> > > > > internals, but they will help to eradicate most of the issues.
> > > > > Then you will have to review PRs only when most of the issues are
> > fixed
> > > > and
> > > > > to make a final decision about whether to merge a PR or not. In
> this
> > > > case,
> > > > > you probably won't need to spend much time on reviewing CEP PRs. As
> > an
> > > > > additional bonus, after some time these new CEP reviewers will
> learn
> > > > enough
> > > > > about CEP to review them by themselves without your input.
> > > > >
> > > > > What do you think about this?
> > > > >
> > > > > Best regards,
> > > > > Ivan.
> > > > >
> > > > > On Fri, 13 Jan 2017 at 11:28 Till Rohrmann 
> > > wrote:
> > > > >
> > > > > Hi Ivan,
> > > > >
> > > > > first of all let me apologise for the bad experience you've had
> with
> > > > > opening CEP PRs in the past.
> > > > >
> > > > > The general problem as you've said is that there is nobody who
> > reviews
> > > > the
> > > > > open PRs. I used to do this in the past but at the moment I hardly find
> > time
> > > > due
> > > > > to other commitments.
> > > > >
> > > > > I thin

[jira] [Created] (FLINK-5517) Upgrade hbase version to 1.3.0

2017-01-16 Thread Ted Yu (JIRA)
Ted Yu created FLINK-5517:
-

 Summary: Upgrade hbase version to 1.3.0
 Key: FLINK-5517
 URL: https://issues.apache.org/jira/browse/FLINK-5517
 Project: Flink
  Issue Type: Improvement
Reporter: Ted Yu


In the thread 'Help using HBase with Flink 1.1.4', Giuliano reported seeing:
{code}
java.lang.IllegalAccessError: tried to access method 
com.google.common.base.Stopwatch.<init>()V from class 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator
{code}
The above has been solved by HBASE-14963.

HBase 1.3.0 is being released.

We should upgrade the hbase dependency to 1.3.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[Dev] Dependencies issue related to implementing InputFormat Interface

2017-01-16 Thread Pawan Manishka Gunarathna
Hi,

I'm currently working on implementing Flink's InputFormat interface. I'm
writing a Java program to read data from a file using the InputFormat
interface. I used a Maven project and added the following dependencies to
the pom.xml:



<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-core</artifactId>
        <version>1.1.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_2.11</artifactId>
        <version>1.1.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.1.4</version>
    </dependency>
</dependencies>





I have a Java class that implements InputFormat. It works with the raw
InputFormat type, but it did not allow using the generic form - the OT type
parameter was not recognized.

Any help to solve this problem would be appreciated.

Thanks,
Pawan

-- 

*Pawan Gunaratne*
*Mob: +94 770373556*


States split over to external storage

2017-01-16 Thread Chen Qin
Hi there,

I would like to discuss spilling local state over to external storage. The
use case is NOT another external state backend like HDFS; rather, it is just to
expand beyond what local disk/memory can hold when a large key space exceeds
what task managers can handle. Realizing FLINK-4266 might be hard to
tackle all-in-one, so I would like to give spill-over a shot first.

An intuitive approach would be to treat the HeapStateBackend as an LRU cache and
spill over to external key/value storage when a threshold is triggered. To make
this happen, we need a minor refactor of the runtime, adding set/get logic.
One nice thing about keeping HDFS to store the snapshots is that it avoids
versioning conflicts: once a checkpoint restore happens, partially written data
will be overwritten with the previously checkpointed values.

Comments?

-- 
-Chen Qin
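A minimal sketch of the LRU idea in plain Java (the class and the external store are stand-ins, not Flink code): keep hot entries on the heap and spill the eldest to an external key/value store once a size threshold is exceeded.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in sketch of the proposed spill-over, not actual Flink state backend code.
public class SpillingStateMap<K, V> extends LinkedHashMap<K, V> {
    private final int threshold;
    private final Map<K, V> externalStore; // stands in for a remote key/value store

    public SpillingStateMap(int threshold, Map<K, V> externalStore) {
        super(16, 0.75f, true); // access-order iteration = LRU behaviour
        this.threshold = threshold;
        this.externalStore = externalStore;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > threshold) {
            // spill the coldest entry out of the heap before evicting it
            externalStore.put(eldest.getKey(), eldest.getValue());
            return true;
        }
        return false;
    }

    // On a heap miss, fall back to the external store.
    public V lookup(K key) {
        V v = get(key);
        return v != null ? v : externalStore.get(key);
    }

    public static void main(String[] args) {
        Map<Integer, String> external = new HashMap<>();
        SpillingStateMap<Integer, String> state = new SpillingStateMap<>(2, external);
        state.put(1, "a");
        state.put(2, "b");
        state.put(3, "c"); // evicts key 1 into the external store
        System.out.println(state.lookup(1)); // prints "a", served externally
    }
}
```

This ignores the hard parts Chen Qin mentions (checkpoint consistency, set/get plumbing in the runtime); it only illustrates the threshold-triggered spill itself.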


[jira] [Created] (FLINK-5518) HadoopInputFormat throws NPE when close() is called before open()

2017-01-16 Thread Jakub Havlik (JIRA)
Jakub Havlik created FLINK-5518:
---

 Summary: HadoopInputFormat throws NPE when close() is called 
before open()
 Key: FLINK-5518
 URL: https://issues.apache.org/jira/browse/FLINK-5518
 Project: Flink
  Issue Type: Bug
  Components: Batch Connectors and Input/Output Formats
Affects Versions: 1.1.4
Reporter: Jakub Havlik


When developing a simple Flink application reading ORC files, it crashes with a 
NullPointerException when the number of instances/executor threads is higher than 
the number of files, because it tries to close a HadoopInputFormat which in turn 
closes a RecordReader that was never initialized, as there was no file for which 
it should have been opened. The issue is caused when
{code:java}
public void run(SourceContext ctx) throws Exception {
try {
...
while (isRunning) {
format.open(splitIterator.next());
...
} finally {
format.close();
...
}
{code}
in file {{InputFormatSourceFunction.java}} which calls
{code:java}
public void close() throws IOException {

// enforce sequential close() calls
synchronized (CLOSE_MUTEX) {
this.recordReader.close();
}
}
{code}
from {{HadoopInputFormatBase.java}}.

As there is just this one implementation of the {{close()}} method, it may be 
enough to add a null check for {{this.recordReader}} there.
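A minimal sketch of the proposed fix, with stand-in names mirroring the report rather than the actual Flink classes:

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch of the proposed null guard; class/field names mirror the report,
// not the real HadoopInputFormatBase.
public class HadoopInputFormatSketch {
    private static final Object CLOSE_MUTEX = new Object();
    private Closeable recordReader; // stays null until open() succeeds

    public void close() throws IOException {
        // enforce sequential close() calls
        synchronized (CLOSE_MUTEX) {
            if (this.recordReader != null) { // proposed null check
                this.recordReader.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // no NPE even though open() never ran and recordReader is still null
        new HadoopInputFormatSketch().close();
        System.out.println("close() before open() is safe");
    }
}
```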



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


RE: States split over to external storage

2017-01-16 Thread liuxinchun
Dear Chen Qin:
I am liuxinchun, and my email is liuxinc...@huawei.com (the email address in the 
"Copy To" is wrong). I have left a message on FLINK-4266 under the name SyinChwun 
Leo. We meet a similar problem in our applications, and I hope we can develop 
this feature together. The following is my opinion:

(1) The organization of the current sliding windows (SlidingProcessingTimeWindow 
and SlidingEventTimeWindow) has a drawback: when using ListState, an element may 
be kept in multiple windows (size / slide). This is time-consuming and wastes 
storage when checkpointing.
  Opinion: I think this is a point to optimize. Elements can be organized 
according to the key and split (which may also be called a pane). When cleanup is 
triggered, only the oldest split (pane) needs to be cleaned up.
(2) Incremental backup strategy. In the original idea, we planned to back up only 
the newly arriving elements, which means a whole window may span several 
checkpoints; we have developed this idea in our private SPS. But in Flink, a 
window may not keep the raw data (for example, with ReducingState and 
FoldingState). Chen Qin's idea may be a candidate strategy. We can keep in touch 
and exchange our respective strategies.
-----Original Message-----
From: Chen Qin [mailto:c...@uber.com] 
Sent: Tuesday, January 17, 2017 13:30
To: dev@flink.apache.org
Cc: iuxinc...@huawei.com; Aljoscha Krettek; shijinkui
Subject: States split over to external storage

Hi there,

I would like to discuss spilling local state over to external storage. The use 
case is NOT another external state backend like HDFS; rather, it is just to 
expand beyond what local disk/memory can hold when a large key space exceeds 
what task managers can handle. Realizing FLINK-4266 might be hard to tackle 
all-in-one, so I would like to give spill-over a shot first.

An intuitive approach would be to treat the HeapStateBackend as an LRU cache and 
spill over to external key/value storage when a threshold is triggered. To make 
this happen, we need a minor refactor of the runtime, adding set/get logic.
One nice thing about keeping HDFS to store the snapshots is that it avoids 
versioning conflicts: once a checkpoint restore happens, partially written data 
will be overwritten with the previously checkpointed values.

Comments?

--
-Chen Qin


[jira] [Created] (FLINK-5519) scala-maven-plugin version all change to 3.2.2

2017-01-16 Thread shijinkui (JIRA)
shijinkui created FLINK-5519:


 Summary: scala-maven-plugin version all change to 3.2.2
 Key: FLINK-5519
 URL: https://issues.apache.org/jira/browse/FLINK-5519
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Reporter: shijinkui


1. Change the scala-maven-plugin version to 3.2.2 in all modules
2. Change the parent POM version from apache-14 to apache-18



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5520) Disable outer joins with non-equality predicates

2017-01-16 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-5520:


 Summary: Disable outer joins with non-equality predicates
 Key: FLINK-5520
 URL: https://issues.apache.org/jira/browse/FLINK-5520
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.2.0
Reporter: Fabian Hueske
Priority: Blocker
 Fix For: 1.2.0


Outer joins with non-equality predicates (and at least one equality predicate) 
compute incorrect results. 

Since this is not a very common requirement, I propose to disable this feature 
for the 1.2.0 release and correctly implement it for a later version.

The fix should add checks in the Table API validation phase (to get a good 
error message) and in the DataSetJoinRule to prevent translation of SQL queries 
with non-equality predicates on outer joins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)