[jira] [Created] (FLINK-5500) Error when creating array literals
Yuhong Hong created FLINK-5500: -- Summary: Error when creating array literals Key: FLINK-5500 URL: https://issues.apache.org/jira/browse/FLINK-5500 Project: Flink Issue Type: Bug Components: Table API & SQL Affects Versions: 1.2.0 Reporter: Yuhong Hong

An error is reported when I create array literals in the Table API or SQL.

Table API: dataStream.toTable(tEnv).select("array(1,2,3)")

The complete stacktrace looks like:
{code:java}
Exception in thread "main" org.apache.flink.table.api.TableException: Result field does not match expected type. Expected: int[]; Actual: ObjectArrayTypeInfo
	at org.apache.flink.table.typeutils.TypeConverter$.determineReturnType(TypeConverter.scala:109)
	at org.apache.flink.table.plan.nodes.datastream.DataStreamCalc.translateToPlan(DataStreamCalc.scala:79)
	at org.apache.flink.table.api.StreamTableEnvironment.translate(StreamTableEnvironment.scala:340)
	at org.apache.flink.table.api.StreamTableEnvironment.translate(StreamTableEnvironment.scala:322)
	at org.apache.flink.table.api.scala.StreamTableEnvironment.toDataStream(StreamTableEnvironment.scala:142)
	at org.apache.flink.table.api.scala.TableConversions.toDataStream(TableConversions.scala:52)
	at com.huawei.example.flink.SimpleSql$.main(SimpleSql.scala:77)
	at com.huawei.example.flink.SimpleSql.main(SimpleSql.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
{code}

SQL: tEnv.sql("select ARRAY[1,2,3] FROM OrderA")

{code:java}
Exception in thread "main" org.apache.flink.table.api.TableException: Type is not supported: ARRAY
	at org.apache.flink.table.api.TableException$.apply(exceptions.scala:51)
	at org.apache.flink.table.calcite.FlinkTypeFactory$.toTypeInfo(FlinkTypeFactory.scala:227)
	at org.apache.flink.table.plan.logical.LogicalRelNode$$anonfun$15.apply(operators.scala:516)
	at org.apache.flink.table.plan.logical.LogicalRelNode$$anonfun$15.apply(operators.scala:515)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
	at org.apache.flink.table.plan.logical.LogicalRelNode.<init>(operators.scala:515)
	at org.apache.flink.table.api.StreamTableEnvironment.sql(StreamTableEnvironment.scala:183)
	at com.huawei.example.flink.SimpleSql$.main(SimpleSql.scala:62)
{code}

Calcite parses the SQL and translates the array to the type ArraySqlType, but toTypeInfo in FlinkTypeFactory only supports ArrayRelType. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
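The reporter's diagnosis — Calcite produces an ArraySqlType that FlinkTypeFactory's toTypeInfo does not handle — can be illustrated with a heavily simplified, hypothetical sketch. The class and mapping below are stand-ins for illustration only; the real Flink code pattern-matches on Calcite RelDataType subclasses, not on strings:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a SQL-type-to-TypeInformation mapping.
class TypeMappingSketch {
    private static final Map<String, String> SUPPORTED = new HashMap<>();
    static {
        SUPPORTED.put("INTEGER", "Int");
        SUPPORTED.put("VARCHAR", "String");
        // Adding a case for the array type is the essence of the reported gap:
        // without it, the lookup below fails for "ARRAY".
        SUPPORTED.put("ARRAY", "ObjectArrayTypeInfo");
    }

    static String toTypeInfo(String sqlTypeName) {
        String typeInfo = SUPPORTED.get(sqlTypeName);
        if (typeInfo == null) {
            // Without the ARRAY entry above, this is the path the reporter hit.
            throw new UnsupportedOperationException("Type is not supported: " + sqlTypeName);
        }
        return typeInfo;
    }
}
```

The sketch only shows the shape of the fix: the type factory needs a branch for the array type that Calcite produces, mirroring the branches that already exist for scalar types.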
Re: Need guidance for writing a client connector for 'Flink'
Hi Pawan, this sounds like you need to implement a custom InputFormat [1]. An InputFormat is basically executed in two phases. In the first phase it generates InputSplits. An InputSplit references a chunk of data that needs to be read. Hence, InputSplits define how the input data is split to be read in parallel. In the second phase, multiple InputFormats are started and request InputSplits from an InputSplitProvider. Each instance of the InputFormat processes one InputSplit at a time. It is hard to give general advice on implementing InputFormats because this very much depends on the data source and data format to read from. I'd suggest having a look at other InputFormats. Best, Fabian [1] https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/io/InputFormat.java 2017-01-16 6:18 GMT+01:00 Pawan Manishka Gunarathna < pawan.manis...@gmail.com>: > Hi, > > we have a data analytics server that has analytics data tables. So I need > to write a custom *Java* implementation for read data from that data source > and do processing (*batch* processing) using Apache Flink. Basically it's > like a new client connector for Flink. > > So It would be great if you can provide a guidance for my requirement. > > Thanks, > Pawan >
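The two phases Fabian describes can be sketched with a minimal, self-contained model. The interfaces below are illustrative stand-ins, not Flink's actual org.apache.flink.api.common.io.InputFormat API (which has a richer lifecycle: configure, createInputSplits, open, reachedEnd, nextRecord, close):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative stand-ins for the InputFormat/InputSplit contract.
interface SimpleSplit { }

interface SimpleInputFormat<T, S extends SimpleSplit> {
    // Phase 1: describe how the input is partitioned for parallel reading.
    List<S> createSplits(int parallelism);

    // Phase 2: each parallel instance reads the records of one split at a time.
    Iterator<T> readSplit(S split);
}

// Example source: "read" a range of integers, split into contiguous chunks.
class RangeSplit implements SimpleSplit {
    final int start, end; // half-open interval [start, end)
    RangeSplit(int start, int end) { this.start = start; this.end = end; }
}

class RangeInputFormat implements SimpleInputFormat<Integer, RangeSplit> {
    private final int total;
    RangeInputFormat(int total) { this.total = total; }

    @Override
    public List<RangeSplit> createSplits(int parallelism) {
        List<RangeSplit> splits = new ArrayList<>();
        int chunk = (total + parallelism - 1) / parallelism; // ceiling division
        for (int s = 0; s < total; s += chunk) {
            splits.add(new RangeSplit(s, Math.min(s + chunk, total)));
        }
        return splits;
    }

    @Override
    public Iterator<Integer> readSplit(RangeSplit split) {
        List<Integer> records = new ArrayList<>();
        for (int i = split.start; i < split.end; i++) {
            records.add(i);
        }
        return records.iterator();
    }
}
```

In Flink itself the splits are handed out lazily by an InputSplitProvider, so a slow reader gets fewer splits than a fast one; the sketch only shows the split/read separation that makes parallel reading possible.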
Re: Need guidance for writing a client connector for 'Flink'
Hi Fabian, Thanks for providing this information. On Mon, Jan 16, 2017 at 2:36 PM, Fabian Hueske wrote: > Hi Pawan, > > this sounds like you need to implement a custom InputFormat [1]. > An InputFormat is basically executed in two phases. In the first phase it > generates InputSplits. An InputSplit references a a chunk of data that > needs to be read. Hence, InputSplits define how the input data is split to > be read in parallel. In the second phase, multiple InputFormats are started > and request InputSplits from an InputSplitProvider. Each instance of the > InputFormat processes one InputSplit at a time. > > It is hard to give general advice on implementing InputFormats because this > very much depends on the data source and data format to read from. > > I'd suggest to have a look at other InputFormats. > > Best, Fabian > > [1] > https://github.com/apache/flink/blob/master/flink-core/ > src/main/java/org/apache/flink/api/common/io/InputFormat.java > > > 2017-01-16 6:18 GMT+01:00 Pawan Manishka Gunarathna < > pawan.manis...@gmail.com>: > > > Hi, > > > > we have a data analytics server that has analytics data tables. So I need > > to write a custom *Java* implementation for read data from that data > source > > and do processing (*batch* processing) using Apache Flink. Basically it's > > like a new client connector for Flink. > > > > So It would be great if you can provide a guidance for my requirement. > > > > Thanks, > > Pawan > > > -- *Pawan Gunaratne* *Mob: +94 770373556*
[jira] [Created] (FLINK-5501) Determine whether the job starts from last JobManager failure
Zhijiang Wang created FLINK-5501: Summary: Determine whether the job starts from last JobManager failure Key: FLINK-5501 URL: https://issues.apache.org/jira/browse/FLINK-5501 Project: Flink Issue Type: Sub-task Components: JobManager Reporter: Zhijiang Wang

When the {{JobManagerRunner}} is granted leadership, it should check whether the current job is already running. If the job is running, the {{JobManager}} should reconcile itself (enter the RECONCILING state) and wait for the {{TaskManager}} instances to report task status. Otherwise, the {{JobManager}} can schedule the {{ExecutionGraph}} in the usual way. The {{RunningJobsRegistry}} can provide a way to check the job's running status, but we need to extend the current interface and adjust the related process to support this:
1. {{RunningJobsRegistry}} sets the RUNNING status after the {{JobManagerRunner}} is granted leadership for the first time.
2. If the job finishes, the job status is set to FINISHED by the {{RunningJobsRegistry}} and is deleted before exit.
3. If the {{JobManager}} fails, the job status remains RUNNING, so when a {{JobManagerRunner}} (the previous one or a new one) is granted leadership again, it checks the job status and enters the {{RECONCILING}} state.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
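The leadership-grant decision described above can be modeled with a small, self-contained sketch. All names below (the enum, the registry class, the return strings) are assumptions for illustration, not the actual JobManagerRunner or RunningJobsRegistry code:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the proposed RunningJobsRegistry interaction.
enum JobSchedulingStatus { PENDING, RUNNING, FINISHED }

class RunningJobsRegistrySketch {
    private final Map<String, JobSchedulingStatus> registry = new HashMap<>();

    JobSchedulingStatus getStatus(String jobId) {
        return registry.getOrDefault(jobId, JobSchedulingStatus.PENDING);
    }

    void setFinished(String jobId) { registry.put(jobId, JobSchedulingStatus.FINISHED); }
    void clear(String jobId)       { registry.remove(jobId); }

    // On grant of leadership: decide between fresh scheduling and reconciliation.
    String onGrantLeadership(String jobId) {
        if (getStatus(jobId) == JobSchedulingStatus.RUNNING) {
            // A previous JobManager was already running this job (and failed
            // without clearing the status): reconcile and wait for TaskManagers
            // to report their task status.
            return "RECONCILING";
        }
        // First grant: mark RUNNING and schedule the ExecutionGraph normally.
        registry.put(jobId, JobSchedulingStatus.RUNNING);
        return "SCHEDULE";
    }
}
```

The key point of the proposal is visible in the model: because a JobManager failure leaves the status at RUNNING, the next leader can distinguish a recovery (reconcile) from a first start (schedule).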
Re: Taking time off
Hi Max, thank you for all your work! Enjoy your time off and hope to have you back with us soon ^^ Cheers, -Vasia. On 14 January 2017 at 09:03, Maximilian Michels wrote: > Dear Squirrels, > > Thank you! It's been very exciting to see the Flink community grow and > flourish over the past two years. > > For the beginning of this year, I decided to take some time off, which > means I'll be less engaged on the mailing list or on GitHub/JIRA. > > In the meantime, if you have any questions I might be able to answer, feel > free to contact me. Looking forward to see the squirrels rise further! > > Best, > Max >
[jira] [Created] (FLINK-5502) Add documentation about migrating from 1.1 to 1.2
Kostas Kloudas created FLINK-5502: - Summary: Add documentation about migrating from 1.1 to 1.2 Key: FLINK-5502 URL: https://issues.apache.org/jira/browse/FLINK-5502 Project: Flink Issue Type: Bug Components: Documentation Affects Versions: 1.2.0 Reporter: Kostas Kloudas Assignee: Kostas Kloudas Fix For: 1.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Flink Code Base Best Practices
Hey all, I've just created a Wiki page with a loose collection of coding "best practices" for the Flink project. The list is the result of a discussion with other PMC members, committers, and contributors. It is not a set of enforced rules, but a general list of tips, links to Flink utilities, common patterns etc. https://cwiki.apache.org/confluence/display/FLINK/Best+Practices+and+Lessons+Learned Since the project only has a minimal set of enforced or automatically checked rules, the goal of the document is to give both new and long-time contributors some guidelines. In the future we might consider translating some of the listed items into automatic checks or IDE setup templates. The document is a work in progress. Feel free to comment, add or remove items while browsing or working on the Flink code base. – Ufuk
Re: Re: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing release candidate)
A user reported that outer joins on the Table API and SQL compute wrong results: https://issues.apache.org/jira/browse/FLINK-5498 2017-01-15 20:23 GMT+01:00 Till Rohrmann : > I found two problematic issues with Mesos HA mode which breaks it: > > https://issues.apache.org/jira/browse/FLINK-5495 > https://issues.apache.org/jira/browse/FLINK-5496 > > On Fri, Jan 13, 2017 at 11:29 AM, Fabian Hueske wrote: > > > I tested the Table API / SQL a bit. > > > > I implemented a windowed aggregation with the streaming Table API and it > > produced the same results as a DataStream API implementation. > > Joining a stream with a TableFunction also seemed to work well. > > Moreover, I checked the results of a bunch of TPC-H queries (batch SQL) > > and all produced correct results. > > > > > > > > 2017-01-12 17:45 GMT+01:00 Till Rohrmann : > > > >> I'm wondering whether we should not depend the webserver encryption on > the > >> global encryption activation and activating it instead per default. > >> > >> On Thu, Jan 12, 2017 at 4:54 PM, Chesnay Schepler > >> wrote: > >> > >> > FLINK-5470 is a duplicate of FLINK-5298 for which there is also an > open > >> PR. > >> > > >> > FLINK-5472 is imo invalid since the webserver does support https, you > >> just > >> > have to enable it as per the security documentation. > >> > > >> > > >> > On 12.01.2017 16:20, Till Rohrmann wrote: > >> > > >> > I also found an issue: > >> > > >> > https://issues.apache.org/jira/browse/FLINK-5470 > >> > > >> > I also noticed that Flink's webserver does not support https requests. > >> It > >> > might be worthwhile to add it, though. 
> >> > > >> > https://issues.apache.org/jira/browse/FLINK-5472 > >> > > >> > On Thu, Jan 12, 2017 at 11:24 AM, Robert Metzger > > >> > wrote: > >> > > >> >> I also found a bunch of issues > >> >> > >> >> https://issues.apache.org/jira/browse/FLINK-5465 > >> >> https://issues.apache.org/jira/browse/FLINK-5462 > >> >> https://issues.apache.org/jira/browse/FLINK-5464 > >> >> https://issues.apache.org/jira/browse/FLINK-5463 > >> >> > >> >> > >> >> On Thu, Jan 12, 2017 at 9:56 AM, Fabian Hueske < > >> >> fhue...@gmail.com> wrote: > >> >> > >> >> > I have another bugfix for 1.2.: > >> >> > > >> >> > https://issues.apache.org/jira/browse/FLINK-2662 (pending PR) > >> >> > > >> >> > 2017-01-10 15:16 GMT+01:00 Robert Metzger < > >> >> rmetz...@apache.org>: > >> >> > > >> >> > > Hi, > >> >> > > > >> >> > > this depends a lot on the number of issues we find during the > >> testing. > >> >> > > > >> >> > > > >> >> > > These are the issues I found so far: > >> >> > > > >> >> > > https://issues.apache.org/jira/browse/FLINK-5379 (unresolved) > >> >> > > https://issues.apache.org/jira/browse/FLINK-5383 (resolved) > >> >> > > https://issues.apache.org/jira/browse/FLINK-5382 (resolved) > >> >> > > https://issues.apache.org/jira/browse/FLINK-5381 (resolved) > >> >> > > https://issues.apache.org/jira/browse/FLINK-5380 (pending PR) > >> >> > > > >> >> > > > >> >> > > > >> >> > > > >> >> > > On Tue, Jan 10, 2017 at 11:58 AM, shijinkui < > shijin...@huawei.com> > >> >> > wrote: > >> >> > > > >> >> > > > Do we have a probable time of 1.2 release? This month or Next > >> month? 
> >> >> > > > > >> >> > > > -Original Message- > >> >> > > > From: Robert Metzger [mailto: > >> >> rmetz...@apache.org] > >> >> > > > Sent: 3 January 2017 20:44 > >> >> > > > To: dev@flink.apache.org > >> >> > > > Cc: u...@flink.apache.org > >> >> > > > Subject: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing > release > >> >> > > candidate) > >> >> > > > > >> >> > > > Hi, > >> >> > > > > >> >> > > > First of all, I wish everybody a happy new year 2017. > >> >> > > > > >> >> > > > I've set user@flink in CC so that users who are interested in > >> >> helping > >> >> > > > with the testing get notified. Please respond only to the dev@ > >> >> list to > >> >> > > > keep the discussion there! > >> >> > > > > >> >> > > > According to the 1.2 release discussion thread, I've created a > >> first > >> >> > > > release candidate for Flink 1.2. > >> >> > > > The release candidate will not be the final release, because > I'm > >> >> > certain > >> >> > > > that we'll find at least one blocking issue in the candidate :) > >> >> > > > > >> >> > > > Therefore, the RC is meant as a testing only release candidate. > >> >> > > > Please report every issue we need to fix before the next RC in > >> this > >> >> > > thread > >> >> > > > so that we have a good overview. > >> >> > > > > >> >> > > > The release artifacts are located here: > >> >> > > > http://people.apache.org/~rmetzger/flink-1.2.0-rc0/ > >> >> > > > > >> >> > > > The maven staging repository is located here: > >> >> > > > https://repository.apache.org/content/repositories/orgapache > >> >> flink- > >> >> > > > > >> >> > > > The release commit (in branch "release-1.2.0-rc0"): > >> >> > > > http://git-wip-us.apache.org/repos/asf/flink/commit/f3c59ced > >> >> > > > > >> >> > > > > >> >> > > > Happy testing! > >> >> > > > > >> >
[jira] [Created] (FLINK-5503) mesos-appmaster.sh script could print return value message
Till Rohrmann created FLINK-5503: Summary: mesos-appmaster.sh script could print return value message Key: FLINK-5503 URL: https://issues.apache.org/jira/browse/FLINK-5503 Project: Flink Issue Type: Improvement Components: Mesos Affects Versions: 1.2.0, 1.3.0 Reporter: Till Rohrmann Priority: Minor Fix For: 1.3.0 The {{mesos-appmaster.sh}} script does not print an error message if the return value of {{MesosApplicationMasterRunner}} is non-zero. Printing one would help the user realize that something went wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5504) mesos-appmaster.sh logs to wrong directory
Till Rohrmann created FLINK-5504: Summary: mesos-appmaster.sh logs to wrong directory Key: FLINK-5504 URL: https://issues.apache.org/jira/browse/FLINK-5504 Project: Flink Issue Type: Bug Components: Mesos Affects Versions: 1.2.0, 1.3.0 Reporter: Till Rohrmann Priority: Minor Fix For: 1.3.0 The {{mesos-appmaster.sh}} script does not create the log file under {{FLINK_HOME/log}} and does not follow the naming convention. I think we should correct the behaviour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5505) Rename recovery.zookeeper.path.mesos-workers into high-availability.zookeeper.path.mesos-workers
Till Rohrmann created FLINK-5505: Summary: Rename recovery.zookeeper.path.mesos-workers into high-availability.zookeeper.path.mesos-workers Key: FLINK-5505 URL: https://issues.apache.org/jira/browse/FLINK-5505 Project: Flink Issue Type: Improvement Components: Mesos Affects Versions: 1.2.0, 1.3.0 Reporter: Till Rohrmann Priority: Trivial Fix For: 1.3.0 In order to harmonize configuration parameter names I think we should rename {{recovery.zookeeper.path.mesos-workers}} into {{high-availability.zookeeper.path.mesos-workers}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Flink CEP development is stalling
Hi Ivan! Thank you for bringing this thread up. I agree, we need to do something about how some modules are currently handled. The CEP library definitely needs more active committers. Adding new committers will be necessary, I think, but as you mentioned, it needs at least one (better more) experienced committers that can help the new committers to get into the process and the technical matter. Otherwise we cannot keep up a good quality. How important the involvement of experienced committers is even for something that seems more or less self-contained (like the CEP library), has already been visible in some previous pull requests to the CEP library - those were not compatible with the overall design or with the strategy for making streaming applications rescalable. It is hard for new committers to be aware of all that, hence the need for some experienced committers to help. Currently, the community is pushing hard on the 1.2 release: testing, docs, fixes, usability. I expect that to take not too much longer. After that, we will kick off some threads discussing about community and project structure. That should involve how to deal with projects like the CEP library, and also with the sheer size of the project and code base in general. Greetings, Stephan On Sun, Jan 15, 2017 at 11:28 PM, Ivan Mushketyk wrote: > Hi Alexey, > > I agree with you. Most contributors are overloaded, but PRs for other > sub-projects are reviewed much faster. In my experience, in most cases, you > can get a first review for a PR in less than a week and it's usually merged > within a month or less. > Flink CEP is a notable exception. I believe the main reason for that is > that there is only one core committer who currently can review Flink CEP > PRs (Till) and he is very busy with other work. > > Best regards, > Ivan. > > On Sun, 15 Jan 2017 at 19:22 Alexey Demin wrote: > > > Hi Ivan, > > > > I think problem not only with CEP project. 
> > Main contributors overloaded and simple fixes frequently are staying as > PR > > without merge. > > > > You can see how amount of open PR increasing over time. > > > > Thanks, > > Alexey Diomin > > > > > > 2017-01-15 17:18 GMT+04:00 Ivan Mushketyk : > > > > > Hi Dmitry, > > > > > > Your contributions are welcomed, but right now the most critical issue > is > > > that CEP project does not have an experienced Flink contributor who can > > > review and approve new pull requests. > > > > > > I hope that Flink community will promptly resolve the issue, so feel > free > > > to take select a JIRA issue and work on it. > > > > > > Best regards, > > > Ivan. > > > > > > On Sat, 14 Jan 2017 at 12:29 Dmitry Vorobiov <2belikespr...@gmail.com> > > > wrote: > > > > > > I would be interested to contribute to CEP. I am following Flink > project, > > > but haven't contributed yet. Next 2 weeks I am a bit busy with my work, > > but > > > then I llbe happy to dig into it. I used to work in IoT so event > > processing > > > is a close topic for me. > > > > > > Dmitry. > > > On Fri, 13 Jan 2017 at 14:58, Ivan Mushketyk > > > > wrote: > > > > > > > Hi Till, > > > > > > > > Thank you for your reply. > > > > > > > > I wonder if the following will work. > > > > What if you can find a Flink committer/committers that will review > and > > > > iterate on CEP PRs before you review them. They don't need to know > all > > > CEP > > > > internals, but they will help to eradicate most of the issues. > > > > Then you will have to review PRs only when most of the issues are > fixed > > > and > > > > to make a final decision about whether to merge a PR or not. In this > > > case, > > > > you probably won't need to spend much time on reviewing CEP PRs. As > an > > > > additional bonus, after some time these new CEP reviewers will learn > > > enough > > > > about CEP to review them by themselves without your input. > > > > > > > > What do you think about this? 
> > > > > > > > Best regards, > > > > Ivan. > > > > > > > > On Fri, 13 Jan 2017 at 11:28 Till Rohrmann > > wrote: > > > > > > > > Hi Ivan, > > > > > > > > first of all let me apologise for the bad experience you've had with > > > > opening CEP PRs in the past. > > > > > > > > The general problem as you've said is that there is nobody who > reviews > > > the > > > > open PRs. I used to do this in the but at the moment I hardly find > time > > > due > > > > to other commitments. > > > > > > > > I think the way to mitigate the problem is to attract more > contributors > > > and > > > > committers who are willing to spend time on PR reviews and finally > > (this > > > > applies only to committers) to commit the PRs. I can try to reach out > > to > > > > other committers to make them aware of the CEP library. > > > > > > > > Cheers, > > > > Till > > > > > > > > On Thu, Jan 12, 2017 at 9:15 AM, > > > > > wrote: > > > > > > > > > +1 > > > > > > > > > > I have some clientes interested in CEP features > > > > > > > > > > El 11/1/17 16:23, "Ivan Mushketyk" > > > escribió: > >
[DISCUSS] (Not) tagging reviewers
Hi! I have seen that recently many pull requests designate reviewers by writing "@personA review please" or so. I am personally quite strongly against that; I think it hurts the community work: - The same few people usually get "designated" and will typically get overloaded and often not do the review. - At the same time, this discourages other community members from looking at the pull request, which is totally undesirable. - In general, review participation should be "pull based" (a person decides what they want to work on) not "push based" (a random person pushes work to another person). Push-based just creates the wrong feeling in a community of volunteers. - In many cases the designated reviewers are not the ones most knowledgeable in the code, which is understandable, because how should contributors know whom to tag? Long story short, why don't we just drop that habit? Greetings, Stephan
Re: [DISCUSS] (Not) tagging reviewers
Thanks for bringing this up Stephan. I completely agree with you. Cheers, Fabian 2017-01-16 12:42 GMT+01:00 Stephan Ewen : > Hi! > > I have seen that recently many pull requests designate reviews by writing > "@personA review please" or so. > > I am personally quite strongly against that, I think it hurts the community > work: > > - The same few people get usually "designated" and will typically get > overloaded and often not do the review. > > - At the same time, this discourages other community members from looking > at the pull request, which is totally undesirable. > > - In general, review participation should be "pull based" (person decides > what they want to work on) not "push based" (random person pushes work to > another person). Push-based just creates the wrong feeling in a community > of volunteers. > > - In many cases the designated reviews are not the ones most > knowledgeable in the code, which is understandable, because how should > contributors know whom to tag? > > > Long story short, why don't we just drop that habit? > > > Greetings, > Stephan >
[jira] [Created] (FLINK-5506) Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException
Miguel Carvalho Valente Esaguy Coimbra created FLINK-5506: - Summary: Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException Key: FLINK-5506 URL: https://issues.apache.org/jira/browse/FLINK-5506 Project: Flink Issue Type: Bug Components: Gelly Affects Versions: 1.1.4 Reporter: Miguel Carvalho Valente Esaguy Coimbra

Reporting this here as per Vasia's advice. I am having the following problem while trying out the org.apache.flink.graph.library.CommunityDetection algorithm of the Gelly API (Java). Specs: JDK 1.8.0_102 x64, Apache Flink 1.1.4.

Suppose I have a very small dataset (I tried an example with 38 vertices as well) stored in a tab-separated file 3-vertex.tsv:

#id1	id2	score
0	1	0
0	2	0
0	3	0

This is just a central vertex with 3 neighbors (disconnected between themselves). I am loading the dataset and executing the algorithm with the following code:
---
{code:java}
// Load the data from the .tsv file.
final DataSet<Tuple3<Long, Long, Double>> edgeTuples = env.readCsvFile(inputPath)
    .fieldDelimiter("\t")   // fields are separated by tabs
    .ignoreComments("#")    // comments start with "#"
    .types(Long.class, Long.class, Double.class);

// Generate a graph and add reverse edges (undirected).
final Graph<Long, Long, Double> graph = Graph.fromTupleDataSet(edgeTuples,
    new MapFunction<Long, Long>() {
        private static final long serialVersionUID = 8713516577419451509L;
        public Long map(Long value) {
            return value;
        }
    }, env).getUndirected();

// CommunityDetection parameters.
final double hopAttenuationDelta = 0.5d;
final int iterationCount = 10;

// Prepare and trigger the execution.
DataSet<Vertex<Long, Long>> vs = graph.run(
    new org.apache.flink.graph.library.CommunityDetection<Long>(
        iterationCount, hopAttenuationDelta)).getVertices();
vs.print();
{code}
---
Running this code throws the following exception (note the CommunityDetection.java:158 frame):
{code}
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply$mcV$sp(JobManager.scala:805)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NullPointerException
	at org.apache.flink.graph.library.CommunityDetection$VertexLabelUpdater.updateVertex(CommunityDetection.java:158)
	at org.apache.flink.graph.spargel.ScatterGatherIteration$GatherUdfSimpleVV.coGroup(ScatterGatherIteration.java:389)
	at org.apache.flink.runtime.operators.CoGroupWithSolutionSetSecondDriver.run(CoGroupWithSolutionSetSecondDriver.java:218)
	at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486)
	at org.apache.flink.runtime.iterative.task.AbstractIterativeTask.run(AbstractIterativeTask.java:146)
	at org.apache.flink.runtime.iterative.task.IterationTailTask.run(IterationTailTask.java:107)
	at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:642)
	at java.lang.Thread.run(Thread.java:745)
{code}
After a further look, I set a breakpoint (Eclipse IDE debugging) at that line in org.apache.flink.graph.library.CommunityDetection.java (source code accessed automatically by Maven):
{code:java}
// find the highest score of maxScoreLabel
double highestScore = labelsWithHighestScore.get(maxScoreLabel);
{code}
- maxScoreLabel has the value 3.
- labelsWithHighestScore was initialized as: Map<Long, Double> labelsWithHighestScore = new TreeMap<>();
- labelsWithHighestScore is a TreeMap and has the values: {0=0.0} null null [0=0.0] null 1

It seems that the value 3 should have been added to labelsWithHighestScore at some point during execution, but because it wasn't, an exception is thrown. I
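The reported NullPointerException is consistent with auto-unboxing a null returned by TreeMap.get for an absent key: the debugger shows the map only contains label 0, while the lookup asks for label 3. The following self-contained snippet demonstrates the mechanism (an illustration only, not the actual Gelly code):

```java
import java.util.Map;
import java.util.TreeMap;

class UnboxingNpeSketch {
    static boolean triggersNpe() {
        // Same shape as in the bug report: only label 0 is present.
        Map<Long, Double> labelsWithHighestScore = new TreeMap<>();
        labelsWithHighestScore.put(0L, 0.0);

        long maxScoreLabel = 3L;
        try {
            // get(3L) returns null because the label was never added;
            // unboxing that null into the primitive double throws an NPE,
            // matching the failure at CommunityDetection.java:158.
            double highestScore = labelsWithHighestScore.get(maxScoreLabel);
            return false; // not reached when the key is absent
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

A defensive fix in such code is to check containsKey (or compare get's boxed result against null) before unboxing; why the label is missing in the first place is the actual Gelly question.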
Re: [DISCUSS] (Not) tagging reviewers
I also agree with all the points, especially when it comes to new PRs. Though, when someone has started reviewing a PR and shows interest it probably makes sense to finish doing so. Wouldn’t tagging be acceptable there? In those case tagging triggers direct notifications, so that people already involved in a conversation get reminded and answer pending questions. > On 16 Jan 2017, at 12:45, Fabian Hueske wrote: > > Thanks for bringing this up Stephan. > I completely agree with you. > > Cheers, Fabian > > 2017-01-16 12:42 GMT+01:00 Stephan Ewen : > >> Hi! >> >> I have seen that recently many pull requests designate reviews by writing >> "@personA review please" or so. >> >> I am personally quite strongly against that, I think it hurts the community >> work: >> >> - The same few people get usually "designated" and will typically get >> overloaded and often not do the review. >> >> - At the same time, this discourages other community members from looking >> at the pull request, which is totally undesirable. >> >> - In general, review participation should be "pull based" (person decides >> what they want to work on) not "push based" (random person pushes work to >> another person). Push-based just creates the wrong feeling in a community >> of volunteers. >> >> - In many cases the designated reviews are not the ones most >> knowledgeable in the code, which is understandable, because how should >> contributors know whom to tag? >> >> >> Long story short, why don't we just drop that habit? >> >> >> Greetings, >> Stephan >>
[jira] [Created] (FLINK-5507) remove queryable list state sink
Nico Kruber created FLINK-5507: -- Summary: remove queryable list state sink Key: FLINK-5507 URL: https://issues.apache.org/jira/browse/FLINK-5507 Project: Flink Issue Type: Improvement Components: State Backends, Checkpointing Affects Versions: 1.2.0 Reporter: Nico Kruber Assignee: Nico Kruber The queryable state "sink" using ListState (".asQueryableState(name, ListStateDescriptor)") stores all incoming data forever and is never cleaned up. Eventually, it will consume too much memory and is thus of limited use. We should remove it from the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] (Not) tagging reviewers
On 16 January 2017 at 12:59:04, Paris Carbone (par...@kth.se) wrote: > > Though, when someone has started reviewing a PR and shows interest > it probably makes sense to finish doing so. Wouldn’t tagging > be acceptable there? > In those case tagging triggers direct notifications, so that > people already involved in a conversation get reminded and answer > pending questions. I think that's totally fine Paris since it is more of a reminder in that case. Stephan is referring to PRs that have a last line in the description like "@XYZ for review please". – Ufuk
Re: [DISCUSS] (Not) tagging reviewers
Hi all, A view from my perspective: in the middle of summer there were 150 open PRs, in the middle of autumn 180, and now 206. This is a mix of bugfixes and improvements. I understand that work on new features is important, but when small and trivial fixes stay open as PRs for more than 2-3 months, users start thinking about switching to another product. The only way to push people to merge these fixes into master is tags. I am not talking about big changes, only about small and trivial ones whose review takes less than 5 minutes. Features are important, but if those features work incorrectly, then users will select a more stable product without any hesitation. Thanks, Alexey Diomin 2017-01-16 16:36 GMT+04:00 Ufuk Celebi : > On 16 January 2017 at 12:59:04, Paris Carbone (par...@kth.se) wrote: > > > Though, when someone has started reviewing a PR and shows interest > > it probably makes sense to finish doing so. Wouldn’t tagging > > be acceptable there? > > In those case tagging triggers direct notifications, so that > > people already involved in a conversation get reminded and answer > > pending questions. > > I think that's totally fine Paris since it is more of a reminder in that > case. > > Stephan is referring to PRs that have a last line in the description like > "@XZY for review please". > > – Ufuk > > >
[jira] [Created] (FLINK-5508) Remove Mesos dynamic class loading
Till Rohrmann created FLINK-5508: Summary: Remove Mesos dynamic class loading Key: FLINK-5508 URL: https://issues.apache.org/jira/browse/FLINK-5508 Project: Flink Issue Type: Improvement Components: Mesos Affects Versions: 1.2.0, 1.3.0 Reporter: Till Rohrmann Assignee: Till Rohrmann Priority: Minor Fix For: 1.2.0, 1.3.0 Mesos uses dynamic class loading in order to load the {{ZooKeeperStateHandleStore}} and the {{CuratorFramework}} class. This can be replaced by a compile time dependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] (Not) tagging reviewers
Thanks for the comments. @Paris - Ufuk has it right, tagging as a reminder (or just because it helps with referring to the comment from a specific reviewer) makes total sense to me, I would keep doing that. On Mon, Jan 16, 2017 at 1:36 PM, Ufuk Celebi wrote: > On 16 January 2017 at 12:59:04, Paris Carbone (par...@kth.se) wrote: > > > Though, when someone has started reviewing a PR and shows interest > > it probably makes sense to finish doing so. Wouldn’t tagging > > be acceptable there? > > In those case tagging triggers direct notifications, so that > > people already involved in a conversation get reminded and answer > > pending questions. > > I think that's totally fine Paris since it is more of a reminder in that > case. > > Stephan is referring to PRs that have a last line in the description like > "@XZY for review please". > > – Ufuk > > >
[jira] [Created] (FLINK-5509) Replace QueryableStateClient keyHashCode argument
Ufuk Celebi created FLINK-5509: -- Summary: Replace QueryableStateClient keyHashCode argument Key: FLINK-5509 URL: https://issues.apache.org/jira/browse/FLINK-5509 Project: Flink Issue Type: Improvement Components: Queryable State Reporter: Ufuk Celebi Priority: Minor When going over the low level QueryableStateClient with [~NicoK] we noticed that the key hashCode argument can be confusing to users: {code} Future getKvState( JobID jobId, String name, int keyHashCode, byte[] serializedKeyAndNamespace) {code} The {{keyHashCode}} argument is the result of calling {{hashCode()}} on the key to look up. This is what is sent to the JobManager in order to look up the location of the key. While pretty straightforward, it is repetitive and possibly confusing. As an alternative we suggest making the method generic and simply calling hashCode on the object ourselves. This way the user just provides the key object. Since there are some early users of the queryable state API already, we would suggest renaming the method in order to provoke a compilation error after upgrading to the actually released 1.2 version. (This would also work without renaming since the hashCode of Integer (what users currently provide) is the same number, but it would be confusing why it actually works.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
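As an illustration of the proposal (plain Java with hypothetical names, not the actual QueryableStateClient code): a generic method can compute the hash itself, and for Integer keys the result is identical to passing the raw int, because Integer.hashCode() returns the value itself -- which is exactly why old call sites would keep working by accident.

```java
public class KeyHashSketch {
    // Old style (hypothetical stand-in): caller must precompute key.hashCode().
    static int lookupByHash(int keyHashCode) {
        return keyHashCode; // stand-in for "route the request by key group"
    }

    // Proposed style (hypothetical stand-in): generic, hash computed internally.
    static <K> int lookupByKey(K key) {
        return key.hashCode();
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the value itself, so both calls agree:
        System.out.println(lookupByHash(Integer.valueOf(42).hashCode())); // 42
        System.out.println(lookupByKey(Integer.valueOf(42)));             // 42
        // For non-Integer keys only the generic variant stays convenient:
        System.out.println(lookupByKey("some-key"));
    }
}
```

Renaming the method (as suggested) would then surface the change as a compile error rather than silently accepting an int key as a hash.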
Re: Flink Code Base Best Practices
Very nice, thanks Ufuk! On Mon, Jan 16, 2017 at 12:05 PM, Ufuk Celebi wrote: > Hey all, > > I've just created a Wiki page with a loose collection of some coding > "best practices" for the Flink project. The list was the result of a > discussion with other PMCs, committers, and contributors. It is not a > set of enforced rules, but a general list of tips, links to Flink > utilities, common patterns etc. > > https://cwiki.apache.org/confluence/display/FLINK/Best+ > Practices+and+Lessons+Learned > > Since the project only has a minimum set of enforced or automatically > checked rules, the goal of the document is to help both new and old > contributors alike with some guidelines. In the future we might > consider translating some of the listed items to automatic checks or > IDE setup templates. > > The document is in progress. Feel free to comment, add or remove items > while browsing or working on the Flink code base. > > – Ufuk >
[jira] [Created] (FLINK-5510) Replace Scala Future with FlinkFuture in QueryableStateClient
Ufuk Celebi created FLINK-5510: -- Summary: Replace Scala Future with FlinkFuture in QueryableStateClient Key: FLINK-5510 URL: https://issues.apache.org/jira/browse/FLINK-5510 Project: Flink Issue Type: Improvement Components: Queryable State Reporter: Ufuk Celebi Priority: Minor The entry point for queryable state users is the {{QueryableStateClient}} which returns query results via Scala Futures. Since merging the initial version of QueryableState we have introduced the FlinkFuture wrapper type in order to not expose our Scala dependency via the API. Since APIs tend to stick around longer than expected, it might be worthwhile to port the exposed QueryableStateClient interface to use the FlinkFuture. Early users can still get the Scala Future via FlinkFuture#getScalaFuture(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5511) Add support for outer joins with local predicates
lincoln.lee created FLINK-5511: -- Summary: Add support for outer joins with local predicates Key: FLINK-5511 URL: https://issues.apache.org/jira/browse/FLINK-5511 Project: Flink Issue Type: Improvement Components: Table API & SQL Reporter: lincoln.lee Assignee: lincoln.lee Priority: Minor Currently the test case in flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/batch/table/JoinITCase.scala will throw a ValidationException indicating: “Invalid non-join predicate 'b < 3. For non-join predicates use Table#where.” {code:title=JoinITCase.scala} @Test(expected = classOf[ValidationException]) def testNoJoinCondition(): Unit = { … val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c) val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, 'f, 'g, 'h) val joinT = ds2.leftOuterJoin(ds1, 'b === 'd && 'b < 3).select('c, 'g) } {code} This jira aims to support this kind of local predicate in outer joins. More detailed description: http://goo.gl/gK6vP3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5512) RabbitMQ documentation should inform that exactly-once holds for RMQSource only when parallelism is 1
Tzu-Li (Gordon) Tai created FLINK-5512: -- Summary: RabbitMQ documentation should inform that exactly-once holds for RMQSource only when parallelism is 1 Key: FLINK-5512 URL: https://issues.apache.org/jira/browse/FLINK-5512 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Tzu-Li (Gordon) Tai Assignee: Tzu-Li (Gordon) Tai Fix For: 1.2.0 See here for the reasoning: FLINK-2624. We should add an informative warning about this limitation in the docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5513) Remove relocation of internal API classes
Till Rohrmann created FLINK-5513: Summary: Remove relocation of internal API classes Key: FLINK-5513 URL: https://issues.apache.org/jira/browse/FLINK-5513 Project: Flink Issue Type: Improvement Affects Versions: 1.2.0, 1.3.0 Reporter: Till Rohrmann Fix For: 1.3.0 Currently, we are relocating the {{curator}} dependency in order to avoid conflicts with user code classes. This happens for example in the {{flink-runtime}} module. The problem with that is that {{curator}} classes, such as the {{CuratorFramework}}, are part of Flink's internal API. So for example, the {{ZooKeeperStateHandleStore}} requires a {{CuratorFramework}} as argument in order to instantiate it. By relocating {{curator}} it is no longer possible to use this class outside of {{flink-runtime}} without some nasty tricks (see {{flink-mesos}} for that). I think it is not good practice to relocate internal API classes because it hinders easy code reuse. I propose to either introduce our own interfaces which abstract the {{CuratorFramework}} away or (imo the better solution) to get rid of the {{Curator}} relocation. The latter might entail that we properly separate the API modules from the runtime modules so that users don't have to pull in the runtime dependencies if they want to develop a Flink job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
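For context, this kind of relocation is typically configured through the maven-shade-plugin; a fragment of roughly the following shape (package names illustrative, not copied from the actual Flink pom) is what makes the relocated classes unusable outside the shaded module:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- rewrites curator class references into an internal package,
                 so downstream modules cannot link against the original names -->
            <pattern>org.apache.curator</pattern>
            <shadedPattern>org.apache.flink.shaded.org.apache.curator</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Removing the `<relocation>` entry is the "get rid of the Curator relocation" option described above; the alternative is wrapping {{CuratorFramework}} behind Flink-owned interfaces.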
Re: Flink Code Base Best Practices
Ufuk, Thanks a lot for noting down all those "good to know" items and sharing them with the community. On Mon, Jan 16, 2017 at 7:05 PM, Ufuk Celebi wrote: > Hey all, > > I've just created a Wiki page with a loose collection of some coding > "best practices" for the Flink project. The list was the result of a > discussion with other PMCs, committers, and contributors. It is not a > set of enforced rules, but a general list of tips, links to Flink > utilities, common patterns etc. > > https://cwiki.apache.org/confluence/display/FLINK/Best+ > Practices+and+Lessons+Learned > > Since the project only has a minimum set of enforced or automatically > checked rules, the goal of the document is to help both new and old > contributors alike with some guidelines. In the future we might > consider translating some of the listed items to automatic checks or > IDE setup templates. > > The document is in progress. Feel free to comment, add or remove items > while browsing or working on the Flink code base. > > – Ufuk >
[jira] [Created] (FLINK-5514) Implement an efficient physical execution for CUBE/ROLLUP/GROUPING SETS
Timo Walther created FLINK-5514: --- Summary: Implement an efficient physical execution for CUBE/ROLLUP/GROUPING SETS Key: FLINK-5514 URL: https://issues.apache.org/jira/browse/FLINK-5514 Project: Flink Issue Type: Improvement Components: Table API & SQL Reporter: Timo Walther A first support for GROUPING SETS has been added in FLINK-5303. However, the current runtime implementation is not very efficient as it basically only translates logical operators to physical operators i.e. grouping sets are currently only translated into multiple groupings that are unioned together. A rough design document for this has been created in FLINK-2980. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
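To make the described inefficiency concrete, here is a hedged illustration (table and column names made up; the real plan also carries a grouping indicator, omitted here) of how one GROUPING SETS query is currently answered as a union of plain GROUP BY queries, scanning the input once per grouping set instead of once overall:

```sql
-- One logical query ...
SELECT region, product, SUM(amount)
FROM Sales
GROUP BY GROUPING SETS ((region), (product));

-- ... currently executed roughly as this expansion:
SELECT region, CAST(NULL AS VARCHAR) AS product, SUM(amount)
FROM Sales GROUP BY region
UNION ALL
SELECT CAST(NULL AS VARCHAR) AS region, product, SUM(amount)
FROM Sales GROUP BY product;
```

An efficient physical operator would compute all grouping sets in a single pass over the input, sharing the scan and (where possible) partial aggregates.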
RE: [DISCUSS] (Not) tagging reviewers
Hi, Alexey I will check abandoned PRs to reduce obviously outdated ones and add them to a cleanup list https://issues.apache.org/jira/browse/FLINK-5384 -Original Message- From: Alexey Demin [mailto:diomi...@gmail.com] Sent: Monday, January 16, 2017 5:05 PM To: dev@flink.apache.org Subject: Re: [DISCUSS] (Not) tagging reviewers Hi all View from my prospective: in middle of summer - 150 PR in middle of autumn - 180 now 206. This is mix of bugfixes and improvements. I understand that work on new features important, but when small and trivial fixes stay in states of PR more then 2-3 month, then all users think about changing engine on other product. Only way push people to merge this fixes in master it's tags. I don't speak about big changes, only about small and trivial with review less then 5 min. Features important, but if this features work incorrect, then user can select more stability product without any hesitation. Thanks Alexey Diomin 2017-01-16 16:36 GMT+04:00 Ufuk Celebi : > On 16 January 2017 at 12:59:04, Paris Carbone (par...@kth.se) wrote: > > > Though, when someone has started reviewing a PR and shows interest > > it probably makes sense to finish doing so. Wouldn’t tagging be > > acceptable there? > > In those case tagging triggers direct notifications, so that people > > already involved in a conversation get reminded and answer pending > > questions. > > I think that's totally fine Paris since it is more of a reminder in > that case. > > Stephan is referring to PRs that have a last line in the description > like "@XZY for review please". > > – Ufuk > > >
[jira] [Created] (FLINK-5515) fix unused kvState.getSerializedValue call in KvStateServerHandler
Nico Kruber created FLINK-5515: -- Summary: fix unused kvState.getSerializedValue call in KvStateServerHandler Key: FLINK-5515 URL: https://issues.apache.org/jira/browse/FLINK-5515 Project: Flink Issue Type: Improvement Reporter: Nico Kruber This was added in 4809f5367b08a9734fc1bd4875be51a9f3bb65aa and is probably a left-over from a merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5516) Hardcoded paths in flink-python
Felix seibert created FLINK-5516: Summary: Hardcoded paths in flink-python Key: FLINK-5516 URL: https://issues.apache.org/jira/browse/FLINK-5516 Project: Flink Issue Type: Improvement Reporter: Felix seibert The PythonPlanBinder.java contains three hardcoded filesystem paths: {code:java} public static final String FLINK_PYTHON_FILE_PATH = System.getProperty("java.io.tmpdir") + File.separator + "flink_plan"; private static String FLINK_HDFS_PATH = "hdfs:/tmp"; public static final String FLINK_TMP_DATA_DIR = System.getProperty("java.io.tmpdir") + File.separator + "flink_data"; {code} {noformat}FLINK_PYTHON_FILE_PATH{noformat} and {noformat}FLINK_TMP_DATA_DIR{noformat} are configurable by modifying {noformat}java.io.tmpdir{noformat}. For {noformat}FLINK_HDFS_PATH{noformat}, there is no way to configure it other than modifying the source. Would it be possible to make all three parameters configurable in the usual Flink configuration files (like flink-conf.yaml)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
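One possible shape of the fix, sketched in plain Java (the configuration key below is invented; the real change would presumably read flink-conf.yaml through Flink's configuration machinery rather than a system property): look up the path under a key and fall back to the current hardcoded default.

```java
import java.io.File;

public class PythonPathConfig {
    // Hypothetical key name; the actual fix would define a proper config option.
    static final String HDFS_PATH_KEY = "python.hdfs.tmp.path";

    // Configurable replacement for the hardcoded FLINK_HDFS_PATH.
    static String flinkHdfsPath() {
        // Falls back to today's hardcoded value when nothing is configured.
        return System.getProperty(HDFS_PATH_KEY, "hdfs:/tmp");
    }

    // These two are already indirectly configurable via java.io.tmpdir.
    static String flinkTmpDataDir() {
        return System.getProperty("java.io.tmpdir") + File.separator + "flink_data";
    }

    public static void main(String[] args) {
        System.out.println(flinkHdfsPath()); // default: hdfs:/tmp
        System.setProperty(HDFS_PATH_KEY, "hdfs:///user/flink/tmp");
        System.out.println(flinkHdfsPath()); // now the configured value
    }
}
```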
Re: Flink CEP development is stalling
Hi Stephan! Thank you for your answer. I appreciate your efforts and I know how busy you and other Flink committers are. I am looking forward to the upcoming discussion of Flink modules! Best regards, Ivan. On Mon, 16 Jan 2017 at 11:32 Stephan Ewen wrote: > Hi Ivan! > > Thank you for bringing this thread up. I agree, we need to do something > about how some modules are currently handled. > > The CEP library definitely needs more active committers. Adding new > committers will be necessary, I think, but as you mentioned, it needs at > least one (better more) experienced committers that can help the new > committers to get into the process and the technical matter. Otherwise we > cannot keep up a good quality. > > How important the involvement of experienced committers is even for > something that seems more or less self-contained (like the CEP library), > has already been visible in some previous pull requests to the CEP library > - those were not compatible with the overall design or with the strategy > for making streaming applications rescalable. It is hard for new committers > to be aware of all that, hence the need for some experienced committers to > help. > > Currently, the community is pushing hard on the 1.2 release: testing, docs, > fixes, usability. I expect that to take not too much longer. > After that, we will kick off some threads discussing about community and > project structure. That should involve how to deal with projects like the > CEP library, and also with the sheer size of the project and code base in > general. > > Greetings, > Stephan > > > On Sun, Jan 15, 2017 at 11:28 PM, Ivan Mushketyk > > wrote: > > > Hi Alexey, > > > > I agree with you. Most contributors are overloaded, but PRs for other > > sub-projects are reviewed much faster. In my experience, in most cases, > you > > can get a first review for a PR in less than a week and it's usually > merged > > within a month or less. > > Flink CEP is a notable exception.
I believe the main reason for that is > > that there is only one core committer who currently can review Flink CEP > > PRs (Till) and he is very busy with other work. > > > > Best regards, > > Ivan. > > > > On Sun, 15 Jan 2017 at 19:22 Alexey Demin wrote: > > > > > Hi Ivan, > > > > > > I think problem not only with CEP project. > > > Main contributors overloaded and simple fixes frequently are staying as > > PR > > > without merge. > > > > > > You can see how amount of open PR increasing over time. > > > > > > Thanks, > > > Alexey Diomin > > > > > > > > > 2017-01-15 17:18 GMT+04:00 Ivan Mushketyk : > > > > > > > Hi Dmitry, > > > > > > > > Your contributions are welcomed, but right now the most critical > issue > > is > > > > that CEP project does not have an experienced Flink contributor who > can > > > > review and approve new pull requests. > > > > > > > > I hope that Flink community will promptly resolve the issue, so feel > > free > > > > to take select a JIRA issue and work on it. > > > > > > > > Best regards, > > > > Ivan. > > > > > > > > On Sat, 14 Jan 2017 at 12:29 Dmitry Vorobiov < > 2belikespr...@gmail.com> > > > > wrote: > > > > > > > > I would be interested to contribute to CEP. I am following Flink > > project, > > > > but haven't contributed yet. Next 2 weeks I am a bit busy with my > work, > > > but > > > > then I llbe happy to dig into it. I used to work in IoT so event > > > processing > > > > is a close topic for me. > > > > > > > > Dmitry. > > > > On Fri, 13 Jan 2017 at 14:58, Ivan Mushketyk < > ivan.mushke...@gmail.com > > > > > > > wrote: > > > > > > > > > Hi Till, > > > > > > > > > > Thank you for your reply. > > > > > > > > > > I wonder if the following will work. > > > > > What if you can find a Flink committer/committers that will review > > and > > > > > iterate on CEP PRs before you review them. They don't need to know > > all > > > > CEP > > > > > internals, but they will help to eradicate most of the issues. 
> > > > > Then you will have to review PRs only when most of the issues are > > fixed > > and > > > > > to make a final decision about whether to merge a PR or not. In > this > > > > case, > > > > > you probably won't need to spend much time on reviewing CEP PRs. As > > an > > > > > additional bonus, after some time these new CEP reviewers will > learn > > > > enough > > > > > about CEP to review them by themselves without your input. > > > > > > > > > > What do you think about this? > > > > > > > > > > Best regards, > > > > > Ivan. > > > > > > > > > > On Fri, 13 Jan 2017 at 11:28 Till Rohrmann > > > wrote: > > > > > > > > > > Hi Ivan, > > > > > > > > > > first of all let me apologise for the bad experience you've had > with > > > > > opening CEP PRs in the past. > > > > > > > > > > The general problem as you've said is that there is nobody who > > reviews > > > > the > > > > > open PRs. I used to do this in the past but at the moment I hardly find > > time > > > > due > > > > > to other commitments. > > > > > > > > > > I thin
[jira] [Created] (FLINK-5517) Upgrade hbase version to 1.3.0
Ted Yu created FLINK-5517: - Summary: Upgrade hbase version to 1.3.0 Key: FLINK-5517 URL: https://issues.apache.org/jira/browse/FLINK-5517 Project: Flink Issue Type: Improvement Reporter: Ted Yu In the thread 'Help using HBase with Flink 1.1.4', Giuliano reported seeing: {code} java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.hbase.zookeeper.MetaTableLocator {code} The above has been solved by HBASE-14963. hbase 1.3.0 is being released. We should upgrade the hbase dependency to 1.3.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[Dev] Dependencies issue related to implementing InputFormat Interface
Hi, I'm currently working on a Flink InputFormat interface implementation. I'm writing a Java program to read data from a file using the InputFormat interface. I use a Maven project and I have added the following dependencies to the pom.xml: org.apache.flink:flink-core:1.1.4, org.apache.flink:flink-clients_2.11:1.1.4, org.apache.flink:flink-java:1.1.4. I have a Java class that implements InputFormat. It works with *InputFormat*, but it didn't allow me to use *InputFormat*. The OT type parameter wasn't recognized. I need any kind of help to solve this problem. Thanks, Pawan -- *Pawan Gunaratne* *Mob: +94 770373556*
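The formatting of the message above has eaten the generic parameters, but the symptom described (the OT type not being recognized) usually comes from implementing the raw interface instead of the parameterized one. A plain-Java analogue, with `Reader` standing in for Flink's `InputFormat<OT, T extends InputSplit>` (this is an illustration, not Flink code):

```java
// Stand-in for a generic interface like Flink's InputFormat<OT, ...>.
interface Reader<OT> {
    OT nextRecord(OT reuse);
}

public class GenericsDemo {
    // Raw use: 'OT' erases to Object, so record types are not recognized.
    static class RawReader implements Reader { // compiles, with a raw-type warning
        public Object nextRecord(Object reuse) { return "raw"; }
    }

    // Parameterized use: 'OT' is bound to String, so nextRecord is fully typed.
    static class StringReader implements Reader<String> {
        public String nextRecord(String reuse) { return reuse + "!"; }
    }

    public static void main(String[] args) {
        Reader<String> r = new StringReader();
        String s = r.nextRecord("record"); // no cast needed
        System.out.println(s);
    }
}
```

In the Flink case the fix would be to declare the implementing class with concrete type arguments (e.g. the produced record type for OT) rather than the raw `InputFormat`.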
States split over to external storage
Hi there, I would like to discuss splitting over local states to external storage. The use case is NOT another external state backend like HDFS, but rather to expand beyond what local disk/memory can hold when a large key space exceeds what task managers can handle. Realizing that FLINK-4266 might be hard to tackle all-in-one, I would like to give split-over a shot first. An intuitive approach would be to treat the HeapStateBackend as an LRU cache and split over to external key/value storage when a threshold is triggered. To make this happen, we need a minor refactoring of the runtime and to add set/get logic. One nice thing about keeping HDFS to store snapshots would be avoiding versioning conflicts. Once a checkpoint restore happens, partially written data will be overwritten with the previously checkpointed value. Comments? -- -Chen Qin
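The LRU split-over idea above can be prototyped with a LinkedHashMap in access order. Everything below is an illustrative sketch, not Flink code: the "external store" is just an in-memory map standing in for a real key/value service, and the threshold is a simple entry count.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: keep hot state on the heap, spill cold (LRU) entries to an
// external key/value store when a size threshold is exceeded.
public class SpillingStateCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxHeapEntries;
    final Map<K, V> externalStore = new HashMap<>(); // stand-in for a remote KV store

    public SpillingStateCache(int maxHeapEntries) {
        super(16, 0.75f, true); // accessOrder = true => LRU iteration order
        this.maxHeapEntries = maxHeapEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxHeapEntries) {
            externalStore.put(eldest.getKey(), eldest.getValue()); // spill, don't drop
            return true;
        }
        return false;
    }

    @Override
    @SuppressWarnings("unchecked")
    public V get(Object key) {
        V v = super.get(key);
        if (v == null && externalStore.containsKey(key)) {
            v = externalStore.remove(key);
            put((K) key, v); // fault the entry back onto the heap
        }
        return v;
    }

    public static void main(String[] args) {
        SpillingStateCache<String, Integer> cache = new SpillingStateCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3);                  // evicts "a" to the external store
        System.out.println(cache.get("a")); // 1, transparently faulted back in
    }
}
```

A real implementation would of course need to deal with serialization, failure of the external store, and checkpoint consistency, which is where the "HDFS snapshots avoid versioning conflicts" point above comes in.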
[jira] [Created] (FLINK-5518) HadoopInputFormat throws NPE when close() is called before open()
Jakub Havlik created FLINK-5518: --- Summary: HadoopInputFormat throws NPE when close() is called before open() Key: FLINK-5518 URL: https://issues.apache.org/jira/browse/FLINK-5518 Project: Flink Issue Type: Bug Components: Batch Connectors and Input/Output Formats Affects Versions: 1.1.4 Reporter: Jakub Havlik When developing a simple Flink application reading ORC files, it crashes with a NullPointerException when the number of instances/executor threads is higher than the number of files, because it tries to close a HadoopInputFormat whose RecordReader was never initialized, as there was no file for which it should have been opened. The issue is caused when {code:java} public void run(SourceContext ctx) throws Exception { try { ... while (isRunning) { format.open(splitIterator.next()); ... } finally { format.close(); ... } {code} in file {{InputFormatSourceFunction.java}} calls {code:java} public void close() throws IOException { // enforce sequential close() calls synchronized (CLOSE_MUTEX) { this.recordReader.close(); } } {code} from {{HadoopInputFormatBase.java}}. As there is just this one implementation of the {{close()}} method, it may be enough to add a null check for {{this.recordReader}} there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
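The suggested null check is small; a minimal sketch of the shape of the fix (stand-in types, not the actual HadoopInputFormatBase code):

```java
import java.io.Closeable;
import java.io.IOException;

// Minimal stand-in demonstrating the null-guard fix for close()-before-open().
public class GuardedFormat {
    private static final Object CLOSE_MUTEX = new Object();
    private Closeable recordReader; // stays null until open() is called

    void open() {
        recordReader = () -> { /* release reader resources */ };
    }

    void close() throws IOException {
        // enforce sequential close() calls (as in HadoopInputFormatBase)
        synchronized (CLOSE_MUTEX) {
            if (this.recordReader != null) { // the proposed null check
                this.recordReader.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        GuardedFormat f = new GuardedFormat();
        f.close(); // previously an NPE; now a harmless no-op
        f.open();
        f.close();
        System.out.println("ok");
    }
}
```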
Re: States split over to external storage
Dear Chen Qin: I am liuxinchun, and my email is liuxinc...@huawei.com (the email address in the "Copy To" is wrong). I have left a message in FLINK-4266 using the name SyinChwun Leo. We meet a similar problem in our applications. I hope we can develop this feature together. The following is my opinion: (1) The organization of the current sliding windows (SlidingProcessingTimeWindow and SlidingEventTimeWindow) has a drawback: when using ListState, an element may be kept in multiple windows (size / slide). It is time consuming and wastes storage when checkpointing. Opinion: I think this is a point for optimization. Elements can be organized according to the key and split (which may also be called a pane). When cleanup is triggered, only the oldest split (pane) needs to be cleaned up. (2) Incremental backup strategy. In the original idea, we planned to back up only the newly arriving elements, which means a whole window may span several checkpoints; we have developed this idea in our private SPS. But in Flink, the window may not keep the raw data (for example, ReducingState and FoldingState). Chen Qin's idea may be a candidate strategy. We can keep in touch and exchange our respective strategies. -Original Message- From: Chen Qin [mailto:c...@uber.com] Sent: January 17, 2017 13:30 To: dev@flink.apache.org Cc: iuxinc...@huawei.com; Aljoscha Krettek; shijinkui Subject: States split over to external storage Hi there, I would like to discuss split over local states to external storage. The use case is NOT another external state backend like HDFS, rather just to expand beyond what local disk/ memory can hold when large key space exceeds what task managers could handle. Realizing FLINK-4266 might be hard to tacking all-in-one, I would live give a shot to split-over first. An intuitive approach would be treat HeapStatebackend as LRU cache and split over to external key/value storage when threshold triggered. To make this happen, we need minor refactor to runtime and adding set/get logic.
One nice thing of keeping HDFS to store snapshots would be avoid versioning conflicts. Once checkpoint restore happens, partial write data will be overwritten with previously checkpointed value. Comments? -- -Chen Qin
[jira] [Created] (FLINK-5519) scala-maven-plugin version all change to 3.2.2
shijinkui created FLINK-5519: Summary: scala-maven-plugin version all change to 3.2.2 Key: FLINK-5519 URL: https://issues.apache.org/jira/browse/FLINK-5519 Project: Flink Issue Type: Improvement Components: Build System Reporter: shijinkui 1. Change the scala-maven-plugin version to 3.2.2 in all modules. 2. Change the parent pom version from apache-14 to apache-18. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5520) Disable outer joins with non-equality predicates
Fabian Hueske created FLINK-5520: Summary: Disable outer joins with non-equality predicates Key: FLINK-5520 URL: https://issues.apache.org/jira/browse/FLINK-5520 Project: Flink Issue Type: Bug Components: Table API & SQL Affects Versions: 1.2.0 Reporter: Fabian Hueske Priority: Blocker Fix For: 1.2.0 Outer joins with non-equality predicates (and at least one equality predicate) compute incorrect results. Since this is not a very common requirement, I propose to disable this feature for the 1.2.0 release and correctly implement it for a later version. The fix should add checks in the Table API validation phase (to get a good error message) and in the DataSetJoinRule to prevent translation of SQL queries with non-equality predicates on outer joins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)