Diagnosing TaskManager disappearance

2015-10-29 Thread Greg Hogan
I am testing again on a 64 node cluster (the JobManager is running fine after I reduced some operators' parallelism and fixed the string conversion performance). I am seeing TaskManagers drop like flies every other job or so. I am not seeing any output in the .out log files corresponding to the

Re: Diagnosing TaskManager disappearance

2015-10-29 Thread Greg Hogan
I recently discovered that AWS uses NUMA for its largest nodes. An example c4.8xlarge:
    $ numactl --hardware
    available: 2 nodes (0-1)
    node 0 cpus: 0 1 2 3 4 5 6 7 8 18 19 20 21 22 23 24 25 26
    node 0 size: 29813 MB
    node 0 free: 24537 MB
    node 1 cpus: 9 10 11 12 13 14 15 16 17 27 28 29 30 31 32 33 34

Re: Diagnosing TaskManager disappearance

2015-10-29 Thread Greg Hogan
connection to the > JobManager? > > Greetings, > Stephan > > > On Thu, Oct 29, 2015 at 9:56 AM, Greg Hogan <c...@greghogan.com> wrote: > > > I recently discovered that AWS uses NUMA for its largest nodes. An > example > > c4.8xlarge: > > > > $ numa

Forwarding Strategies

2015-10-20 Thread Greg Hogan
Looking at org.apache.flink.runtime.operators.shipping.OutputEmitter, shipping strategies FORWARD, PARTITION_RANDOM, and PARTITION_FORCED_REBALANCE all call a local round-robin partitioning function. I'd like to patch this so that the round-robin count starts at the local task index, but shouldn't
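The round-robin change proposed above can be sketched in a few lines. This is a minimal, self-contained Java illustration, not Flink's OutputEmitter; the class OffsetRoundRobinSelector and its methods are hypothetical. It assumes each sending task knows its own index and the number of output channels, and simply starts its counter at that index so that parallel senders do not all begin on channel 0.

```java
// Hypothetical sketch: a round-robin channel selector whose counter starts at
// the sending task's index rather than at 0. Not Flink's OutputEmitter code.
class OffsetRoundRobinSelector {
    private final int numberOfChannels;
    private int next;

    OffsetRoundRobinSelector(int taskIndex, int numberOfChannels) {
        if (numberOfChannels <= 0) {
            throw new IllegalArgumentException("numberOfChannels must be positive");
        }
        this.numberOfChannels = numberOfChannels;
        // Start at the local task index, wrapped into the valid channel range.
        this.next = Math.floorMod(taskIndex, numberOfChannels);
    }

    /** Returns the channel for the next record and advances the counter. */
    int selectChannel() {
        int channel = next;
        next = (next + 1) % numberOfChannels;
        return channel;
    }
}
```

For example, with four channels the sender with index 2 would emit to channels 2, 3, 0, 1, 2, ... while the sender with index 0 starts at channel 0, spreading the first records of each sender across different receivers.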

Java type erasure and object reuse

2015-09-17 Thread Greg Hogan
ceFunction that needs to collect objects. With object reuse we need to make a copy and with type erasure we cannot call new. Greg Hogan

Re: Towards Flink 0.10

2015-10-05 Thread Greg Hogan
Max, Stephan noted that FLINK-2723 is an API breaking change. The CopyableValue interface has a new method "T copy()". Commit e727355e42bd0ad7d403aee703aaf33a68a839d2 Greg On Mon, Oct 5, 2015 at 10:20 AM, Maximilian Michels wrote: > Hi Flinksters, > > After a lot of
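The copy requirement described in these threads (object reuse plus type erasure) can be illustrated with a small sketch. Because generic code cannot call new T(), the object is asked to copy itself. The Copyable interface below is a hypothetical stand-in for an interface exposing the "T copy()" method mentioned for CopyableValue; it is not Flink's actual interface, and MutableLong and CopyingCollector are likewise illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an interface exposing "T copy()"; not Flink's CopyableValue.
interface Copyable<T> {
    T copy();
}

// A mutable value type of the kind a runtime might reuse across calls.
class MutableLong implements Copyable<MutableLong> {
    long value;

    MutableLong(long value) {
        this.value = value;
    }

    @Override
    public MutableLong copy() {
        return new MutableLong(value);
    }
}

// Collects values that the caller may overwrite and pass again (object reuse).
// Type erasure rules out "new T()", so the copy is requested from the value itself.
class CopyingCollector<T extends Copyable<T>> {
    private final List<T> collected = new ArrayList<>();

    void collect(T value) {
        collected.add(value.copy());
    }

    List<T> result() {
        return collected;
    }
}
```

Without the copy() call, a collector that stored the reused instance directly would end up holding several references to the same object, all showing the last value written.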

Re: Diagnosing TaskManager disappearance

2015-12-12 Thread Greg Hogan
he network buffers to be re-used by Netty and save half of the network buffer memory? I created FLINK-3164 which would reduce the number of necessary network buffers. Greg Hogan On Fri, Oct 30, 2015 at 12:33 PM, Till Rohrmann <trohrm...@apache.org> wrote: > The logging of the TaskManager sto

Re: Side-effects of DataSet::count

2016-05-30 Thread Greg Hogan
Hi Stephan, Is there a design document, prior discussion, or background material on this enhancement? Am I correct in understanding that this only applies to DataSet since streams run indefinitely? Thanks, Greg On Mon, May 30, 2016 at 5:49 PM, Stephan Ewen wrote: > Hi Eron!

Re: Side-effects of DataSet::count

2016-05-30 Thread Greg Hogan
Hi Simone, This can be done with a map followed by a reduce. DataSet#count leverages accumulators which perform an inherent reduce. Also, DataSet#count implements RichOutputFormat as an optimization to only require a single operator. Previously the counting and accumulating were handled in a
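As a rough illustration of counting with a map followed by a reduce, here is a plain Java sketch that uses java.util.stream in place of the DataSet API. This substitution is an assumption made to keep the example self-contained; it does not use Flink's accumulators or RichOutputFormat, and MapReduceCount is a hypothetical name.

```java
import java.util.List;

class MapReduceCount {
    /** Counts elements by mapping each one to 1 and reducing with addition. */
    static <T> long count(List<T> elements) {
        return elements.stream()
                .mapToLong(element -> 1L)   // map: every element becomes the number 1
                .reduce(0L, Long::sum);     // reduce: add the ones together
    }
}
```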

Re: Hotfixes on the master

2016-05-27 Thread Greg Hogan
13 Ufuk Celebi
9 Fabian Hueske
9 Maximilian Michels
6 Greg Hogan
5 Stefano Baghino
3 smarthi
2 Andrea Sella
2 Gyula Fora
2 Jun Aoki
2 Sachin Goel
2 mjsax
2 zentol
1 Alexander Alexandrov
1 Gabor Gevay
1 Prez Cannady

Iteration Intermediate Output

2016-05-26 Thread Greg Hogan
Hi y'all, I think this is an oft-requested feature [0] and there are many graph algorithms for which intermediate output is the desired result. I'd like to take Stephan up on his offer [1] for pointers. I have yet to get in deep, but I see that iteration tasks are treated specially as

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-06-01 Thread Greg Hogan
Is "Observer" too passive? Maintainer -> Guide and/or Shepherd -> Reviewer? Are the component leads the first name in each list? If so, +1 from me :) On Wed, Jun 1, 2016 at 1:59 PM, Chesnay Schepler wrote: > sounds like "Observer" would fit. > > > On 01.06.2016 19:11,

web-dashboard Bower dependencies

2016-01-15 Thread Greg Hogan
Happy Friday, I am looking to submit a pull request for FLINK-3160 which updates files in flink-runtime-web. What is the proper way to handle updated dependencies from bower.json? For example, bootstrap is specified with version "~3.3.5" which permits the patch update to 3.3.6. When I run `npm
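The tilde range mentioned above can be sketched as a small check. This assumes Bower follows node-semver tilde semantics for a fully specified version, where "~3.3.5" accepts versions at or above 3.3.5 but below 3.4.0. It ignores pre-release tags and partial versions, and TildeRange is a hypothetical helper, not part of Bower or npm.

```java
class TildeRange {
    /** Parses "MAJOR.MINOR.PATCH" into three integers. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected MAJOR.MINOR.PATCH: " + version);
        }
        return new int[] {
            Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2])
        };
    }

    /** True if the candidate has the same major.minor and a patch at least the base patch. */
    static boolean satisfies(String tildeRange, String candidate) {
        if (!tildeRange.startsWith("~")) {
            throw new IllegalArgumentException("not a tilde range: " + tildeRange);
        }
        int[] base = parse(tildeRange.substring(1));
        int[] v = parse(candidate);
        return v[0] == base[0] && v[1] == base[1] && v[2] >= base[2];
    }
}
```

Under these assumptions, "~3.3.5" admits the patch release 3.3.6 but not the minor release 3.4.0, which is why a fresh dependency install can pick up a newer bootstrap than the one previously checked in.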

Option to disable chaining?

2016-02-08 Thread Greg Hogan
Is it possible to force operator chaining to be disabled? Similar to how object reuse can be enabled or disabled? Greg

Re: Option to disable chaining?

2016-02-08 Thread Greg Hogan
) > > On Mon, Feb 8, 2016 at 10:34 AM, Greg Hogan <c...@greghogan.com> wrote: > > > Is it possible to force operator chaining to be disabled? Similar to how > > object reuse can be enabled or disabled? > > > > Greg > > >

Limitations on grouped ReduceFunction

2016-02-02 Thread Greg Hogan
If a user modifies keyed fields of a grouped reduce during a combine then the reduce will receive incorrect groupings. For example, a useless modification to word count:

    public WC reduce(WC in1, WC in2) {
        return new WC(in1.word + " " + in2.word, in1.count + in2.count);
    }

I don't see an
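The grouping problem described for the key-modifying word-count reduce can be reproduced outside Flink with a small simulation. This Java sketch assumes a WC record with word and count fields (inferred from the snippet), models a combine as a per-partition grouped reduce, and then groups the combined output again by its current word. It illustrates the failure mode only; it is not Flink's combine or reduce driver, and KeyModifyingReduceDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Word-count record; fields assumed from the thread's example.
class WC {
    final String word;
    final long count;

    WC(String word, long count) {
        this.word = word;
        this.count = count;
    }
}

class KeyModifyingReduceDemo {
    // The "useless modification": the reduce rewrites the key field.
    static WC reduce(WC in1, WC in2) {
        return new WC(in1.word + " " + in2.word, in1.count + in2.count);
    }

    // Groups records by their current word and reduces each group in arrival order.
    static Map<String, WC> groupReduce(List<WC> records) {
        Map<String, WC> groups = new TreeMap<>();
        for (WC record : records) {
            groups.merge(record.word, record, KeyModifyingReduceDemo::reduce);
        }
        return groups;
    }

    // Simulated combine: reduce each partition locally, then regroup and reduce the output.
    static Map<String, WC> combineThenReduce(List<List<WC>> partitions) {
        List<WC> combined = new ArrayList<>();
        for (List<WC> partition : partitions) {
            combined.addAll(groupReduce(partition).values());
        }
        return groupReduce(combined);
    }
}
```

Reducing three ("hello", 1) records in one group yields a single result, but combining two of them first produces a record keyed "hello hello", which the final reduce places in a different group from the remaining "hello".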

Re: [VOTE] Release Apache Flink 1.0.0 (RC1)

2016-02-25 Thread Greg Hogan
Hi, I have two bugfix pull requests in the stack. [FLINK-3340] [runtime] Fix object juggling in drivers https://github.com/apache/flink/pull/1626 [FLINK-3437] [web-dashboard] Fix UI router state for job plan https://github.com/apache/flink/pull/1661 Greg On Thu, Feb 25, 2016 at 8:32 AM,

Re: [VOTE] Release Apache Flink 1.0.0 (RC1)

2016-02-25 Thread Greg Hogan
Hi Vasia, In the WebUI, the Subtasks and TaskManagers list the same operator statistics but expand to show either per-subtask or per-TaskManager statistics. Summarizing the statistics by TaskManager is valuable when viewing larger clusters. Greg On Thu, Feb 25, 2016 at 11:23 AM, Vasiliki

Fix version

2016-02-22 Thread Greg Hogan
Hi, With 1.0.0 imminent there are 112 tickets with a "fix version" of 1.0.0, the earliest from 2014. From the ticket logs it looks like we typically bump the fix version once the target release has passed. Would it be better to wait to assign a fix version until achieving some combination of

Re: Guarantees for object reuse modes and documentation

2016-02-18 Thread Greg Hogan
Hi Fabian, I would only add to your citations Stephan's comment [1] concerning the design, implementation, and use of object reuse. I see two separate concerns addressed in code. First, as Stephan noted, for certain classes deserialization is sufficiently expensive relative to object creation

Re: Association failure ClassNotFoundException

2016-03-15 Thread Greg Hogan
example program with us which reproduces the problem? I > suspect that, somehow, your user code class BlockInfo is sent directly to > the JobManager where it is deserialized without the user code class loader. > > Cheers, > Till > > On Tue, Mar 15, 2016 at 4:19 PM, Greg Hogan

Association failure ClassNotFoundException

2016-03-15 Thread Greg Hogan
I am seeing a failure running my code starting with commit 0f8d76c6 (ExecutionConfig to JobGraph). Logs and stack trace are below. I am using the default configuration, so there is a single TaskManager. From the web UI, the data port is 33245 and the path is akka.tcp://flink@192.168.14.134:41339/user/taskmanager.

[DISCUSS] Macro-benchmarking for performance tuning and regression detection

2016-04-06 Thread Greg Hogan
I'd like to discuss the creation of a macro-benchmarking module for Flink. This could be run during pre-release testing to detect performance regressions and during development when refactoring or performance tuning code on the hot path. Many users have published benchmarks and the Flink

Re: Association failure ClassNotFoundException

2016-03-19 Thread Greg Hogan
build at: https://s3.amazonaws.com/apache-flink/flink-1.1-SNAPSHOT.txz Are you able to replicate with the following command: $ ./bin/flink run -c org.apache.flink.graph.examples.Graph500 flink-gelly_with_examples_2.10-1.1-SNAPSHOT.jar On Tue, Mar 15, 2016 at 5:16 PM, Greg Hogan

Re: Tuple performance and the curious JIT compiler

2016-03-07 Thread Greg Hogan
> > I have to dig into the serializers, to see if they could suffer from that. > The "getField(pos)" method for example should always have many overrides > (though few would be loaded at any time, because one usually does not use > all Tuple classes at the same time). >

Tuple performance and the curious JIT compiler

2016-03-04 Thread Greg Hogan
I am noticing what looks like the same drop-off in performance when introducing TupleN subclasses as expressed in "Understanding the JIT and tuning the implementation" [1]. I start my single-node cluster, run an algorithm which relies purely on Tuples, and measure the runtime. I execute a

Re: Fix version

2016-03-04 Thread Greg Hogan
to express their wish for fast > resolution. > > I also saw some cases where issues were reopened. > > > > I agree with your suggestion to clear the "fix version" field once 1.0.0 > > has been released. > > > > On Mon, Feb 22, 2016 at 4:43 PM, Greg Hogan

Parallelizing ExecutionConfig.fromCollection

2016-04-25 Thread Greg Hogan
Hi, CollectionInputFormat currently enforces a parallelism of 1 by implementing NonParallelInput and serializing the entire Collection. If my understanding is correct this serialized InputFormat is often the cause of a new job exceeding the akka message size limit. As an alternative the

Re: Parallelizing ExecutionConfig.fromCollection

2016-04-25 Thread Greg Hogan
se we don't know the number of sub tasks yet. In > the latter case, which can also be cause by large closure objects, we > should send the job via the blob manager to the `JobManager` to solve the > problem. > > Cheers, > Till > > On Mon, Apr 25, 2016 at 3:45 PM, Greg Ho

[DISCUSS] Graph algorithms for vertex and edge degree

2016-04-21 Thread Greg Hogan
Vasia and I are looking for additional feedback on FLINK-3772. This ticket [0] and PR [1] provide a set of graph algorithms which compute and store the degree for vertices and edges. Degree annotation is a basic component of many algorithms. For example, PageRank requires the vertex out-degree
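Computing a vertex out-degree, as PageRank needs, amounts to counting edges per source vertex. The sketch below does this over an in-memory edge array in plain Java; it is not the Gelly API from FLINK-3772, and DegreeAnnotation and outDegrees are hypothetical names. Vertices that appear only as edge targets are recorded with degree 0 so that every vertex in the edge list receives a value.

```java
import java.util.HashMap;
import java.util.Map;

class DegreeAnnotation {
    /** Out-degree per vertex; each edge is a {source, target} pair. */
    static Map<Long, Long> outDegrees(long[][] edges) {
        Map<Long, Long> degrees = new HashMap<>();
        for (long[] edge : edges) {
            degrees.merge(edge[0], 1L, Long::sum);  // one more outgoing edge for the source
            degrees.putIfAbsent(edge[1], 0L);       // sink-only vertices still get degree 0
        }
        return degrees;
    }
}
```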

[DISCUSS] Methods for translating Graphs

2016-04-21 Thread Greg Hogan
Vasia and I are looking for additional feedback on FLINK-3771. This ticket [0] and PR [1] provide methods for translating the type or value of graph labels, vertex values, and edge values. My use cases are provided in JIRA, but I think users will find many more. Translators compose well with
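Translating vertex labels, for example from Long IDs to Strings, can be sketched as applying one function to both endpoints of every edge. This plain Java version uses java.util.function.Function and Map.Entry pairs as assumed stand-ins for graph types; GraphTranslation and translateVertexIds are hypothetical and do not reflect the API proposed in FLINK-3771.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class GraphTranslation {
    /** Applies the translator to the source and target of every edge, preserving edge order. */
    static <K, N> List<Map.Entry<N, N>> translateVertexIds(
            List<? extends Map.Entry<K, K>> edges, Function<? super K, ? extends N> translator) {
        List<Map.Entry<N, N>> translated = new ArrayList<>(edges.size());
        for (Map.Entry<K, K> edge : edges) {
            N source = translator.apply(edge.getKey());
            N target = translator.apply(edge.getValue());
            translated.add(new AbstractMap.SimpleImmutableEntry<>(source, target));
        }
        return translated;
    }
}
```

Because each translator is just a function, several can be chained with Function.andThen before being applied once to the edge list.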

Re: Eclipse Problems

2016-04-28 Thread Greg Hogan
Matthias, Won't this be a compile-time error as long as the user is parameterizing the return type since .fromElements(OUT...) returns DataStreamSource and will bind to the nearest common superclass? The new .fromElements(Class, OUT...) does give the user the choice of common superclass. Greg

Re: Master test stability poor

2016-04-27 Thread Greg Hogan
We have also started running over Travis' 2 hour limit for the longest build. Greg > On Apr 27, 2016, at 7:53 AM, Ufuk Celebi wrote: > > Hi Till, > > thank you for bringing this up. We really need to fix this. > > Filing JIRAs with critical priority was how we tried to

Re: remote debugging

2016-05-17 Thread Greg Hogan
I also just modify the startup scripts but would it be better to have variants of env.java.opts specific to the JobManager, TaskManager, client, etc.? On Tue, May 17, 2016 at 5:24 AM, Stephan Ewen wrote: > Hey Stefano! > > I think that question is bound to come up again. I

Performance and accuracy of Flink iterations

2016-05-16 Thread Greg Hogan
Hi, This question has arisen with the HITS algorithm (Hubs and Authorities) but the question is the same as with PageRank, for which Stephan published an excellent discussion and comparison of bulk and delta iterations [0]. Delta iterations are clearly faster. Has there been a comparison as to

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-13 Thread Greg Hogan
+1 to better scaling :) Many Jira tickets are good ideas with no current traction. Some have a pull request (usually closed), many have comments or discussion. It seems these old tickets tend to hang around because closing the ticket feels like rejecting the idea. How do we track requested

Re: [DISCUSS] Releasing Flink 1.1.0

2016-07-05 Thread Greg Hogan
ully merged today) > https://github.com/apache/flink/pull/2158 > In regards to metrics: To add a counter metric a user currently has to call "counter(...)" on a MetricGro

Re: sampling function

2016-07-09 Thread Greg Hogan
Hi Do, DataSet provides a stable @Public interface. DataSetUtils is marked @PublicEvolving which is intended for public use, has stable behavior, but method signatures may change. It's also good to limit DataSet to common methods whereas the utility methods tend to be used for specific

Re: [DISCUSS] API breaking change in DataStream Windows

2016-08-09 Thread Greg Hogan
I agree that expecting users to cast is undesirable. Upon changing the API, why would we not mark the next release as 2.0? The same issue arose with Gabor's addition of hash-combine in the Scala DataSet API where DataSet was returned rather than a specialized Operator. The solution was to add an

Re: [DISCUSS] Releasing Flink 1.1.0

2016-06-30 Thread Greg Hogan
It would be great if hash-based combine (FLINK-3477) could make it in to be tested for this release. We've seen impressive improvements in performance (though, admittedly, some sort-based enhancements are yet to be worked on). This PR looks to be ripe. Also, as we tidy up a few things with Gelly

Re: [Discuss] Organizing Documentation for Configuration Options

2017-02-07 Thread Greg Hogan
mating the network buffer configuration in order to > get rid of any manual tuning for most users (because of the issues you > described + streaming and batch jobs require different tuning, which > complicates things even more). > > – Ufuk > > On 6 February 2017 at 19:21:28,

Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-07 Thread Greg Hogan
Welcome Jark and Kostas! Thank you for your contributions and many more to come. On Tue, Feb 7, 2017 at 3:16 PM, Fabian Hueske wrote: > Hi everybody, > > I'm very happy to announce that Jark Wu and Kostas Kloudas accepted the > invitation of the Flink PMC to become committers

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-02-08 Thread Greg Hogan
Hi Pat, Serkan, and Gábor, This looks very nice. I'll treat this like a pre-FLIP and ask my question here. Do I understand correctly that the generated code is only dependent on the length of the sort key? So we could separate the writing and reading of keys and records and from the generated

[Discuss] Organizing Documentation for Configuration Options

2017-02-06 Thread Greg Hogan
Hi devs, Flink's Configuration page [1] has grown intimidatingly long and complex. Options are described across three main sections: common options (single section), advanced options (multiple sections), and full reference. The trailing "background" section further describes the most impactful

Re: [DISCUSS] (Not) tagging reviewers

2017-01-27 Thread Greg Hogan
> I took a quick skim on the PRs and I noticed that only a few of them are actually in mergeable shapes (i.e., properly rebased and passing CI). Although TravisCI is quite unstable, Flink executes multiple tests with different configurations so you'll want to instead look at which tests are

Re: [DISCUSS] Code style / checkstyle

2017-02-22 Thread Greg Hogan
Won't the code style be applied on save to any user-modified file? That would clutter PRs and overwrite history. On Wed, Feb 22, 2017 at 6:19 AM, Dawid Wysakowicz <wysakowicz.da...@gmail.com> wrote: > I also agree with Till and Chesnay. Anyway as to "capture the current > style" I have

Re: [DISCUSS] Project build time and possible restructuring

2017-02-22 Thread Greg Hogan
An additional option for reducing time to build and test is parallel execution. This would help users more than on TravisCI since we're generally running on multi-core machines rather than VM slices. Is the idea that each user would only check out the modules that he or she is developing with?

Re: Visualizing topologies

2017-02-24 Thread Greg Hogan
Ken and Fabian, Is the use case to generate and act on the dot file from within the user program? Would it be more maintainable to make the plan JSON more accessible (through the CLI and web interface) which users could then pipe through a converter script? Greg On Fri, Feb 24, 2017 at 4:55 AM,

Re: [DISCUSS] Gelly planning for release 1.3 and roadmap

2017-02-24 Thread Greg Hogan
Thanks, Vasia, for starting the discussion. I was expecting more changes from the recent discussion on restructuring the project, in particular regarding the libraries. Gelly has always collected algorithms and I have personally taken an algorithms-first approach for contributions. Is that

Re: [DISCUSS] Code style / checkstyle

2017-02-24 Thread Greg Hogan
I agree wholeheartedly with Ufuk. We cannot reformat the codebase, cannot pause while flushing the PR queue, and won't find a consensus code style. I think we can create a baseline code style for new and existing contributors for which reformatting on changed files will be acceptable for PR

Re: KeyGroupRangeAssignment ?

2017-02-21 Thread Greg Hogan
Integer's hashCode is the identity function. Store your slot index in an Integer or IntValue and key off that field. On Tue, Feb 21, 2017 at 6:04 AM, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote: > Hi, > > As in my example, each key is a window so I want to evenly distributed >
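The suggestion above relies on Integer.hashCode() returning the int value itself, so a hashCode-modulo partitioner sends slot index i to partition i when there are as many partitions as slots. The sketch below shows only that simple modulo scheme; SlotPartitioner and partitionFor are hypothetical names, and it assumes nothing further scrambles the hash. Flink's key-group assignment may additionally rehash hashCode() (for example with a murmur hash) before mapping a key to a subtask, so check the version in use before relying on identity placement.

```java
class SlotPartitioner {
    /** Maps a key to a partition by taking its hashCode modulo the partition count. */
    static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```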

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-02-14 Thread Greg Hogan
Pat, Thanks for adding the new test results. The idea for this implementation was Gábor's from the FLINK-3722 description. Since you will be filing a FLIP I recommend including these benchmarks for consideration and discussion on the mailing list. In part because the PR is 4 months old and need

Re: [ANNOUNCE] Welcome Stefan Richter as a new committer

2017-02-10 Thread Greg Hogan
Welcome, Stefan, and thank you for your contributions! On Fri, Feb 10, 2017 at 5:00 AM, Ufuk Celebi wrote: > Hey everyone, > > I'm very happy to announce that the Flink PMC has accepted Stefan > Richter to become a committer of the Apache Flink project. > > Stefan is part of

Re: [DISCUSS] Time-based releases in Flink

2017-01-18 Thread Greg Hogan
I'm +0 on switching to a pre-determined schedule. It may be that the Flink codebase has reached a level of maturity allowing for a time-based release schedule, and I'm hopeful that a known schedule will improve communication about and expectations for new features. I'd like to hear a

Re: [DISCUSS] Python API for Flink libraries

2016-08-22 Thread Greg Hogan
Hi Ivan, My expectation would be that programs written for the Python API would be much slower than when implementing with Java or Scala. A performance comparison would be quite interesting. Gelly has both iterative and non-iterative algorithms. Greg On Sat, Aug 20, 2016 at 7:11 PM, Ivan

Re: [DISCUSS] Gelly planning for release 1.3 and roadmap

2017-03-01 Thread Greg Hogan
sary), rather than high-level things (e.g. > algorithms, performance) on top of it. What if we can change both the > edges' values and vertices' values during an iteration one day? :) > > Best, > Xingcan > > > On Sat, Feb 25, 2017 at 2:43 AM, Vasiliki Kalavri <vasilikikal

Re: [DISCUSS] Gelly planning for release 1.3 and roadmap

2017-03-01 Thread Greg Hogan
On Fri, Feb 24, 2017 at 1:43 PM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote: Hi Greg, On 24 February 2017 at 18:09, Greg Hogan <c...@greghogan.com> wrote: > Thanks, Vasia, for starting the disc

Re: [DISCUSS] Code style / checkstyle

2017-02-27 Thread Greg Hogan
ve to go manually through all past commits until you find the commit which changed a given line before the reformatting. > Cheers, > Till > On Sun,

Re: [Discuss] Upgrade JUnit to 4.12

2016-10-05 Thread Greg Hogan
Tests are passing with one additional change to an inner test class visibility. The ticket is FLINK-4740. On Wed, Oct 5, 2016 at 3:52 AM, Till Rohrmann <trohrm...@apache.org> wrote: > +1 for that :-) > > On Tue, Oct 4, 2016 at 10:11 PM, Greg Hogan <c...@greghogan.com> w

Re: [Discuss] Upgrade JUnit to 4.12

2016-10-04 Thread Greg Hogan
test 1.10.19. Since the changes are more than a single version I'll create a ticket and PR so the test results can be discussed. Greg On Tue, Oct 4, 2016 at 3:19 PM, Stephan Ewen <se...@apache.org> wrote: > From my side +1, unless there are known issues with JUnit 4.12 > > On Tue, Oc

[Discuss] Upgrade JUnit to 4.12

2016-10-04 Thread Greg Hogan
JUnit 4.12 was released 4 Dec 2014. Flink is currently using JUnit 4.11 from 14 Nov 2012. https://github.com/junit-team/junit4/releases My use case is the support for assert equals on boolean arrays, but in general this looks to be an innocuous change and I could not find any prior discussion.

Re: Performance and Latency Chart for Flink

2016-09-19 Thread Greg Hogan
Hi Amir, You may see improved performance setting "taskmanager.memory.preallocate: true" in order to use off-heap memory. Also, your number of buffers looks quite low and you may want to increase "taskmanager.network.numberOfBuffers". Your setting of 4096 is only 128 MiB. As this is a only
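The 128 MiB figure above follows from multiplying the buffer count by the buffer size. This Java sketch assumes 32 KiB network buffers (the default memory segment size in Flink 1.x configurations at the time) and also encodes the rule of thumb quoted in the follow-up reply: slots per TaskManager squared, times the number of TaskManagers, times 4. NetworkBufferMath is a hypothetical helper.

```java
class NetworkBufferMath {
    // Assumed default network buffer (memory segment) size of 32 KiB.
    static final long SEGMENT_SIZE_BYTES = 32L * 1024;

    /** Rule of thumb: slotsPerTaskManager^2 * numTaskManagers * 4. */
    static long recommendedBuffers(int slotsPerTaskManager, int numTaskManagers) {
        return (long) slotsPerTaskManager * slotsPerTaskManager * numTaskManagers * 4;
    }

    /** Total buffer memory in MiB for the given number of buffers. */
    static double toMiB(long numBuffers) {
        return numBuffers * SEGMENT_SIZE_BYTES / (1024.0 * 1024.0);
    }
}
```

With 16 slots per TaskManager and 4 TaskManagers this gives 16 × 16 × 4 × 4 = 4096 buffers, and 4096 × 32 KiB = 128 MiB.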

Re: Performance and Latency Chart for Flink

2016-09-19 Thread Greg Hogan
ion: Configuration > 4096 = (16x16)x4x4 where 16 is number of tasks per TM, 4 is # of TMs & 4 is there in t

Re: Performance and Latency Chart for Flink

2016-09-19 Thread Greg Hogan
y, a metric travels from TaskManager -> WebInterface -> User. > FLINK-4389 was about the first arrow, which is a prerequisite step for the > second one. > > Regards, > Chesnay > > > On 19.09.2016 21:35, Greg Hogan wrote: > >> The nightly snapshots now inclu

Re: Performance and Latency Chart for Flink

2016-09-19 Thread Greg Hogan
: > Thanks Greg."Your setting of 4096 is only 128 MiB."...Correct. Cz I > followed that formula :-)))I can bump it up to twice as much like what the > example is doing to for instance 300 MiB.Is this reasonable? what do you > suggest as a reasonable range?Thanks Greg > >

Re: why job submit timeout is 21474835 second

2016-08-29 Thread Greg Hogan
Could be rewritten as "val INF_TIMEOUT = Integer.MAX_VALUE seconds"? On Mon, Aug 29, 2016 at 4:22 AM, 时金魁 wrote: > > > AkkaUtils.scala > val INF_TIMEOUT = 21474835 seconds > > > That is job submit timeout 248.55 days. > > > Why is this number? > > > >
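The arithmetic behind the quoted 248.55 days, and the size of the Integer.MAX_VALUE alternative, can be checked directly. This Java sketch uses java.time.Duration; TimeoutMath is a hypothetical helper. Note that 21474835 sits just below Integer.MAX_VALUE / 100 (21474836); the excerpt does not say whether that relationship is the reason for the value.

```java
import java.time.Duration;

class TimeoutMath {
    // Value quoted from AkkaUtils.scala in the thread.
    static final long INF_TIMEOUT_SECONDS = 21474835L;

    /** Fractional days for a number of seconds. */
    static double toDays(long seconds) {
        return seconds / 86_400.0;
    }

    /** Whole days, truncated, via java.time.Duration. */
    static long wholeDays(long seconds) {
        return Duration.ofSeconds(seconds).toDays();
    }
}
```

By the same arithmetic, Integer.MAX_VALUE seconds would be roughly 24,855 days, or about 68 years.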

Re: Additional project downloads

2016-08-25 Thread Greg Hogan
gt; > Maybe we should put a link to maven central. We could parameterize the > > link > > > so that it always links to the current release linked on our downloads > > > page. > > > > > > On Wed, Aug 24, 2016 at 5:04 PM, Greg Hogan <c...@gre

Additional project downloads

2016-08-24 Thread Greg Hogan
Hi, Should Flink add-ons such as CEP, Gelly, ML, and the optional Metrics Reporters be available from the download page? Is the alternative to direct users to Maven Central? Greg

Re: 答复: [DISCUSS] add netty tcp/restful pushed source support

2016-09-27 Thread Greg Hogan
Apache Bahir's website only suggests support for additional frameworks, but there is a Flink repository at https://github.com/apache/bahir-flink On Tue, Sep 27, 2016 at 8:38 AM, shijinkui wrote: > Hey, Stephan Ewen > > 1. bahir's target is spark. The contributer are

Duplicate sort keys

2016-10-03 Thread Greg Hogan
Is it correct to expect that Flink should remove duplicate sort keys? I'm working on instrumenting the FixedLengthRecordSorter (FLINK-4705) and the following test case from TypeHintITCase:200 is having an unexpected effect due to the keyPositions = {0, 0} being passed to TupleComparator. DataSet
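If duplicate sort keys were removed before building a comparator, a key-positions array such as {0, 0} would collapse to {0}, since sorting on field 0 a second time cannot change the order. The sketch below keeps the first occurrence of each position and preserves order; SortKeys.dedupe is a hypothetical helper, and it deliberately ignores per-key sort direction, which a real implementation would also need to reconcile.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class SortKeys {
    /** Removes repeated key positions, keeping first occurrences in their original order. */
    static int[] dedupe(int[] keyPositions) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int position : keyPositions) {
            seen.add(position);
        }
        int[] result = new int[seen.size()];
        int i = 0;
        for (int position : seen) {
            result[i++] = position;
        }
        return result;
    }
}
```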

Travis CI

2016-11-10 Thread Greg Hogan
We're getting the dreaded "The job exceeded the maximum time limit for jobs, and has been terminated." error for some recent Travis-CI builds. https://travis-ci.org/apache/flink/builds/174615801 The docs state that termination will occur when "A job takes longer than 50 minutes on

Re: [DISCUSS] Deprecate Hadoop source method from (batch) ExecutionEnvironment

2016-10-14 Thread Greg Hogan
+1 On Fri, Oct 14, 2016 at 5:29 AM, Fabian Hueske wrote: > Hi everybody, > > I would like to propose to deprecate the utility methods to read data with > Hadoop InputFormats from the (batch) ExecutionEnvironment. > > The motivation for deprecating these methods is reduce

Re: Removing flink-contrib/flink-operator-stats

2016-10-19 Thread Greg Hogan
Based on a cursory reading of FLINK-1297 I would lean toward dropping the code rather than moving it to Apache Bahir. This looks to only be appropriate for batch and this module was not integrated into the runtime. If there is a way forward to make use of this code in core Flink then that would be even

[DISCUSS] @Public libraries

2016-11-22 Thread Greg Hogan
Hi all, Should stable APIs in Flink's CEP, ML, and Gelly libraries be annotated @Public or restricted to use of @PublicEvolving? We would ensure that library APIs do not add restrictions to the core APIs. Libraries could use @PublicEvolving or @Internal core APIs within @Public or

Contributing to flink-web

2016-10-27 Thread Greg Hogan
Should we align the process for contributing to apache/flink-web to mirror that for apache/flink? Flink's JIRA has an existing component for "Project Website". Commits to flink-web are sent to the commits mailing list. Does Jira require further integration?

Re: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

2016-10-13 Thread Greg Hogan
Hi Robert, What are the benefits to Flink for dropping Hadoop 1 support? Is there significant code cleanup or would we simply be publishing one less set of artifacts? Greg On Thu, Oct 13, 2016 at 10:47 AM, Robert Metzger wrote: > Hi, > > The Apache Hadoop community has

Re: [DISCUSS] Support Suspending and Resuming of Flink Jobs

2016-10-12 Thread Greg Hogan
Sorry, I haven't followed this development, but roughly how much more costly is the new serialization for savepoints? On Wed, Oct 12, 2016 at 5:51 AM, SHI Xiaogang wrote: > Hi all, > > Currently, savepoints are exactly the completed checkpoints, and Flink > provides

Re: [DISCUSS] Merging the FLIP-6 feature branch into the Master branch

2016-12-02 Thread Greg Hogan
Hi Stephan, How soon are you expecting the "release-1.2" fork? I am sure you have considered merging the FLIP-6 branch after the fork. Do we anticipate the new tests pushing Flink over Travis CI's new 50 minute limit? This might be a good opportunity to rebalance the test ranges as the most

[DISCUSS] TravisCI auto cancellation

2017-03-26 Thread Greg Hogan
Hi, Just saw this TravisCI beta feature. I think this would be worthwhile to enable on pull request builds. We could leave branch builds unchanged since there are fewer builds of this type and skipping builds would make it harder to locate a broken build. It’s not uncommon to see three or more

Re: [DISCUSS] Flink dist directory management

2017-03-25 Thread Greg Hogan
Hi Jinkui, +1 to moving gelly-examples into examples/. Also sounds nice to similarly organize the Python examples. Docs will also need to be updated (docs/dev/lib/gelly/index.md). Greg > On Mar 25, 2017, at 3:46 AM, shijinkui wrote: > > Hi, all > > The Flink

Re: [DISCUSS] TravisCI auto cancellation

2017-03-29 Thread Greg Hogan
> Cheers, > Till > On Sun, Mar 26, 2017 at 11:57 PM, Ted Yu <yuzhih...@gmail.com> wrote: > +1 to Greg's suggestion. > On Sun, Mar 26, 2017 at 2:22 PM, Greg Hogan <c...@greghogan.com> wrote: > Hi,

Re: [DISCUSS] TravisCI auto cancellation

2017-03-29 Thread Greg Hogan
Wow, that was a quick response that this feature was already enabled. > On Mar 29, 2017, at 9:31 AM, Greg Hogan <c...@greghogan.com> wrote: > > Ticket: https://issues.apache.org/jira/browse/INFRA-13778

Re: [DISCUSS] FLIP-18: Code Generation for improving sorting performance

2017-03-23 Thread Greg Hogan
I would be more than happy to shepherd and review this PR. I have two discussion points. First, a strategy for developing with templates. IntelliJ has a FreeMarker plugin but we lose formatting and code completion. To minimize this issue we can retain the untemplated code in an abstract class

Bumping API stability check version

2017-03-16 Thread Greg Hogan
Hi, I see in the parent pom.xml that 1.3-SNAPSHOT is checking for API stability against 1.1.4. Also, that this version was only bumped with FLINK-5617 late in the 1.2 development cycle. Should we bump this version as part of the release process, i.e. on the 1.2.0 release updating 1.3-SNAPSHOT

Re: [DISCUSS] Could we Improve tests time and stability?

2017-03-17 Thread Greg Hogan
Dmytro, This is a good idea and a nice speedup, though I notice that nearly half of the speedup (1104s of 2461s) is from job 7 which appears to have hung and timed out in the initial run. Could you test the two changes in isolation (increased maximum memory and garbage collector)? If the

Re: [DISCUSS] Project build time and possible restructuring

2017-03-17 Thread Greg Hogan
much complexity and too many repositories. > "flink" and "flink-libraries" are hopefully enough to get the build time significantly down. > We can also consider putting the connectors into the

Re: [DISCUSS] Project build time and possible restructuring

2017-03-15 Thread Greg Hogan
we have library repository depend on snapshot Flink versions, we need to make sure that the snapshot deployment always works. This also means that people working on a library repository

[DISCUSS] TravisCI status on GitHub Page

2017-03-20 Thread Greg Hogan
We are now showing the TravisCI build status on Flink’s GitHub page. I think Robert’s comment in Jira may have gone unnoticed when the PR was committed. https://issues.apache.org/jira/browse/FLINK-6122 If not yet seeing the benefit even if

Re: [DISCUSS] Project build time and possible restructuring

2017-03-20 Thread Greg Hogan
I would actually suggest to do only the library split initially, to see > what the challenges are in setting up the multi-repo build and release > tooling. Once we gathered experience there, we can probably easily see what > else we can split out. > > Stephan > > > On Fr

Re: [Discuss] Permission of checkpoint directory

2017-03-20 Thread Greg Hogan
Prior discussion at https://github.com/apache/flink/pull/3335 > On Mar 19, 2017, at 11:34 PM, Wangtao (WangTao) wrote: > > Hi All, > > Checkpoint directory will store user data and it is better to keep it with > minimum

Re: Bumping API stability check version

2017-03-16 Thread Greg Hogan
nModifications". Does > it fail the build even if somebody did a change that is non API breaking on > a @Public class? > > On Thu, Mar 16, 2017 at 3:37 PM, Greg Hogan <c...@greghogan.com> wrote: > >> Hi, >> >> I see in the parent pom.xml that 1.3-SNAP

Re: [DISCUSS] Code style / checkstyle

2017-04-05 Thread Greg Hogan
of all code/comment lines. > I would like to have a well defined code style, such as the Google Code style, that has nice tooling and support but I don't think we will

Re: [DISCUSS] FLIP-18: Code Generation for improving sorting performance

2017-04-05 Thread Greg Hogan
Pat, Thanks for running additional tests and continuing to work on this contribution. My testing is also showing that the performance gains remain even when multiple classes are used for sorting. I think we should proceed in the order of FLINK-3722, FLINK-4705, and FLINK-5734. Gabor has

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-03-08 Thread Greg Hogan
Hi Pat, I’m still trying to understand the implications of Java’s Class Hierarchy Analysis [0]. Flink currently uses only a single implementation of InMemorySorter, which is NormalizedKeySorter. FLINK-4705 adds support for FixedLengthRecordSorter for Flink’s Value types and Tuples. This

Re: [DISCUSS] Project build time and possible restructuring

2017-03-31 Thread Greg Hogan
st: flink-cep-scala, flink-cep, flink-gelly-examples, flink-gelly-scala, flink-gelly, flink-ml > All other modules (e.g. in flink-contrib) are rather connectors. I think it would b

Re: [ANNOUNCE] New committer: Theodore Vasiloudis

2017-03-21 Thread Greg Hogan
Welcome, Theo, and great to have you onboard with Flink and ML! > On Mar 21, 2017, at 4:35 AM, Robert Metzger wrote: > > Hi everybody, > > On behalf of the PMC I am delighted to announce Theodore Vasiloudis as a > new Flink committer! > > Theo has been a community member

[ANNOUNCE] New Flink PMC member: Chesnay Schepler

2017-07-28 Thread Greg Hogan
Developers, On behalf of the Flink PMC I am delighted to announce Chesnay Schepler as a member of the Flink PMC. Chesnay is a longtime contributor, reviewer, and committer whose breadth of work and knowledge covers nearly the entire codebase. Please join me in congratulating Chesnay and

Re: [VOTE] Release 1.3.2, release candidate #2

2017-08-02 Thread Greg Hogan
-1 The Gelly examples jar is not included in the Scala 2.11 convenience binaries since change-scala-version.sh is not switching the hard-coded Scala version from 2.10 to 2.11 in ./flink-dist/src/main/assemblies/bin.xml. The simplest fix may be to revert FLINK-7211 and simply exclude the

Re: [POLL] Dropping savepoint format compatibility for 1.1.x in the Flink 1.4.0 release

2017-08-17 Thread Greg Hogan
There’s an argument for delaying this change to 1.5 since the feature freeze is two weeks away. There is little time to realize benefits from removing this code. "The reason for that is that there is a lot of code mapping between the completely different legacy format (1.1.x, not re-scalable)

Re: [ANNOUNCE] New Flink committer Jincheng Sun

2017-07-10 Thread Greg Hogan
Congrats and welcome, Jincheng! > On Jul 10, 2017, at 9:17 AM, Fabian Hueske wrote: > > Hi everybody, > > On behalf of the PMC, I'm very happy to announce that Jincheng Sun has > accepted the invitation of the PMC to become a Flink committer. > > Since more than nine
