[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-130677380 Thanks for the contribution @sachingoel0101! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/966 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-130979827 Sure. :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-130318970 For transformation functions, there is a clear case for thin versus rich, for Java8 lambdas. Input formats are a different game. They are super rich by default anyways. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-130324909 Functions also need to extend RichFunction to have access to `open()` and `close()`. I think the two things a different enough that any strife for consistency is actually pretty random. If your thoughts currently revolve around the RuntimeContext, it apprears more consistent. If you thoughts are on the life cycle methods, it seems inconsistent. Random. I think you should go ahead and just call them Rich. It is just a name, and what matters is that the JavaDocs describe what it actually means... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-130324770 Rich does not refer to the number of methods but the fact that it has the RuntimeContext available. All non-rich variants do not get state inserted. This follows a naming convention in Flink. `AbstractInputFormat` might be a more intuitive name for novices but I'm more inclined to naming consistency. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-130318553 We are back to square one ;-) - `Function`: single abstract method - `RichFunction` = 5 methods. I see how that gets rich. - `InputFormat`: 8 methods - `RichInputFormat`: 10 methods. We could call it `SlightlyMoreRichInputFormat` ;-) I cannot help but find this very confusing. Why the urgency to stick with the Rich prefix? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-130319611 The suggestion `AbstractInputFormat` was not so bad, in my opinion. If you want a name that explains what's happening, you can always call it `InputFormatWithContext` ;-) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-130319656 To keep it consistent with the remaining API. For functions you need to extend a RichFunction if you want to have access to the RuntimeContext. I agree that the name is not perfect (and I think everybody else got your point as well) but I think it's a valid point to aim for consistency. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-130384903 Oh apologies. I only saw the first comment on the email thread. I guess it's more or less settled. I'll leave it up to you guys to make a final decision on this. :') On Aug 12, 2015 10:59 PM, Sachin Goel sachingoel0...@gmail.com wrote: I agree with Stephan's argument that addition of context to I/O formats is a very marginal enhancement. He literally stole my words. :') However, from my perspective, when I first started using flink, Rich meant runtime context. The idea of open and close wasn't as nearly exciting as the runtime context. What if we changed back to the original name mentioned on jira and make it `ContextAwareInputFormat`? Would everyone be okay with that? On Aug 12, 2015 8:10 PM, Stephan Ewen notificati...@github.com wrote: Functions also need to extend RichFunction to have access to open() and close(). I think the two things a different enough that any strife for consistency is actually pretty random. If your thoughts currently revolve around the RuntimeContext, it apprears more consistent. If you thoughts are on the life cycle methods, it seems inconsistent. Random. I think you should go ahead and just call them Rich. It is just a name, and what matters is that the JavaDocs describe what it actually means... â Reply to this email directly or view it on GitHub https://github.com/apache/flink/pull/966#issuecomment-130324909. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-130383984 I agree with Stephan's argument that addition of context to I/O formats is a very marginal enhancement. He literally stole my words. :') However, from my perspective, when I first started using flink, Rich meant runtime context. The idea of open and close wasn't as nearly exciting as the runtime context. What if we changed back to the original name mentioned on jira and make it `ContextAwareInputFormat`? Would everyone be okay with that? On Aug 12, 2015 8:10 PM, Stephan Ewen notificati...@github.com wrote: Functions also need to extend RichFunction to have access to open() and close(). I think the two things a different enough that any strife for consistency is actually pretty random. If your thoughts currently revolve around the RuntimeContext, it apprears more consistent. If you thoughts are on the life cycle methods, it seems inconsistent. Random. I think you should go ahead and just call them Rich. It is just a name, and what matters is that the JavaDocs describe what it actually means... â Reply to this email directly or view it on GitHub https://github.com/apache/flink/pull/966#issuecomment-130324909. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-130078564 Okay. I'll hold off on making any changes right now. We should reach at a consensus first. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-129849477 Yes, existing formats should converted to rich formats. The name `Abstract..` makes more sense if it becomes the new default way of interfacing with the rich input/output formats. I still think, calling it `Rich...` makes more sense in terms of consistency with the user-defined functions. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-129847518 I'm in favor of making the existing formats rich. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-129834217 Okay. I think maybe calling them `Rich` maybe an overkill. I will change the names to `Abstract`. Do we keep the existing formats *non-rich* or would it be okay to make all of them *rich* [which is the case right now]? The problem with not making them *rich* is that then we're limiting any extending classes to never be *rich*. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-129830222 Okay, so how about calling the abstract base class`AbstractInputFormat` and have it implement the runtime context and leaves the other methods abstract? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-129826022 @StephanEwen @fhueske The `Rich` prefix might seem a bit odd but it is a naming convention that is consistent with the user-defined functions which have the RuntimeContext available. As for `abstract classes vs interface` I agree with @sachingoel0101 that it makes sense to implement access to the RuntimeContext once for the user. We may also add other rich functions; if we implemented that using an interface we would have to change it later on or add another interface. I think Input/Ouput formats should be rich by default. However, we might not want to break the API for existing external implementations. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-130091134 @mxm, @sachingoel0101 get your point. Being consistent with the remaining API is a valid argument. I would be OK with calling the abstract classes `RichInputFormat` and `RichOutputFormat`. Let's wait for at least 24 hours and merge this PR if nobody speaks up. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36352389 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/operators/GenericDataSinkBase.java --- @@ -328,9 +330,12 @@ public int compare(IN o1, IN o2) { if(format instanceof InitializeOnMaster) { ((InitializeOnMaster)format).initializeGlobal(1); } - + + if(format instanceof RichOutputFormat){ --- End diff -- Move after `configure()` call --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36352428 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/operators/GenericDataSourceBase.java --- @@ -200,14 +202,18 @@ public void accept(VisitorOperator? visitor) { visitor.postVisit(this); } } - + // - protected ListOUT executeOnCollections(ExecutionConfig executionConfig) throws Exception { + protected ListOUT executeOnCollections(RuntimeContext ctx, ExecutionConfig executionConfig) throws Exception { @SuppressWarnings(unchecked) InputFormatOUT, InputSplit inputFormat = (InputFormatOUT, InputSplit) this.formatWrapper.getUserCodeObject(); + + if(inputFormat instanceof RichInputFormat){ --- End diff -- Move after `configure()` call --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36352731 --- Diff: flink-core/src/test/java/org/apache/flink/api/common/io/RichInputFormatTest.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + +package org.apache.flink.api.common.io; + +import java.util.HashMap; + +import org.apache.flink.api.common.ExecutionConfig; +import org.apache.flink.api.common.accumulators.Accumulator; +import org.apache.flink.api.common.functions.util.RuntimeUDFContext; +import org.apache.flink.types.Record; +import org.junit.Assert; +import org.junit.Test; + +/** + * Tests runtime context access from inside an RichInputFormat class + */ +public class RichInputFormatTest { + + @Test + public void testCheckRuntimeContextAccess() { + final SerializedInputFormatRecord inputFormat = new SerializedInputFormatRecord(); --- End diff -- We are trying to reduce the use of the `Record` data type which is an artifact of a deprecated API. Could you replace it by another data type? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36352779 --- Diff: flink-core/src/test/java/org/apache/flink/api/common/io/RichOutputFormatTest.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + +package org.apache.flink.api.common.io; + +import java.util.HashMap; + +import org.apache.flink.api.common.ExecutionConfig; +import org.apache.flink.api.common.accumulators.Accumulator; +import org.apache.flink.api.common.functions.util.RuntimeUDFContext; +import org.apache.flink.types.Record; +import org.junit.Assert; +import org.junit.Test; + +/** + * Tests runtime context access from inside an RichOutputFormat class + */ +public class RichOutputFormatTest { + + @Test + public void testCheckRuntimeContextAccess() { + final SerializedOutputFormatRecord inputFormat = new SerializedOutputFormatRecord(); --- End diff -- Please replace `Record` data type. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36355211 --- Diff: flink-staging/flink-jdbc/src/main/java/org/apache/flink/api/java/record/io/jdbc/JDBCOutputFormat.java --- @@ -40,7 +40,7 @@ import org.apache.flink.types.StringValue; import org.apache.flink.types.Value; -public class JDBCOutputFormat implements OutputFormatRecord { --- End diff -- Class has been removed by a recent commit. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36355327 --- Diff: flink-tests/src/test/java/org/apache/flink/test/classloading/jar/CustomInputSplitProgram.java --- @@ -65,7 +65,7 @@ public static void main(String[] args) throws Exception { } // - public static final class CustomInputFormat implements InputFormatInteger, CustomInputSplit, ResultTypeQueryableInteger { + public static final class CustomInputFormat extends RichInputFormatInteger, CustomInputSplit implements ResultTypeQueryableInteger { --- End diff -- This is a test class which does not need to be adapted. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36355535 --- Diff: flink-staging/flink-hadoop-compatibility/src/main/java/org/apache/flink/hadoopcompatibility/mapred/record/HadoopRecordInputFormat.java --- @@ -37,7 +37,7 @@ import org.apache.hadoop.mapred.RecordReader; import org.apache.hadoop.util.ReflectionUtils; -public class HadoopRecordInputFormatK, V implements InputFormatRecord, HadoopInputSplit { +public class HadoopRecordInputFormatK, V extends RichInputFormatRecord, HadoopInputSplit { --- End diff -- Class will be removed soon. Not strictly necessary to adapt it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r3630 --- Diff: flink-staging/flink-hadoop-compatibility/src/main/java/org/apache/flink/hadoopcompatibility/mapred/record/HadoopRecordOutputFormat.java --- @@ -37,7 +37,7 @@ import org.apache.hadoop.util.ReflectionUtils; -public class HadoopRecordOutputFormatK,V implements OutputFormatRecord { +public class HadoopRecordOutputFormatK,V extends RichOutputFormatRecord { --- End diff -- Class will be removed soon. Not strictly necessary to adapt it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36357066 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/ReplicatingInputFormat.java --- @@ -59,7 +60,7 @@ * @see org.apache.flink.api.common.operators.base.FilterOperatorBase * @see org.apache.flink.api.common.operators.base.MapPartitionOperatorBase */ -public final class ReplicatingInputFormatOT, S extends InputSplit implements InputFormatOT, S { +public final class ReplicatingInputFormatOT, S extends InputSplit extends RichInputFormatOT, S { --- End diff -- The `ReplicatingInputFormat` can wrap any kind of `InputFormat` (with or without RuntimeContext). Therefore, it needs to extend `RichInputFormat` but overwrite the `setRuntimeContext()` method and forward the context to its wrapped IF, if this is also a `RichInputFormat`. Please check the updated JavaDocs as well. The wrapped IF does not need to be rich. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-128157289 Hi @sachingoel0101, Thanks for the update! The PR looks good in general. Regarding the interface vs. abstract class issue, I am a bit undecided with a slight preference towards the abstract class. However, I agree with @StephanEwen that `RichInputFormat` and `RichOutputFormat` are not good names. How about `ContextInputFormat` and `ContextOutputFormat`? Any other opinions on that matter? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-128230692 I'm inclined towards the Abstract class too. If we're saying something is context aware, we better provide access to it IMO. It doesn't make sense to ask the user to write it themselves and guarantee that we'd just call the relevant methods at runtime. As for the names, wouldn't it be better to maintain consistency in naming? All the * rich * functions are named `RichFunctions`, which is why I decided to name these `RichFormat`s too. I do agree though. The IO formats are already pretty * rich * in the sense that they allow the user to control everything that happens at runtime. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-127581951 Okay I guess we could leave the proper usage of `RuntimeContext` to the user. We make it explicit that the `RuntimeContext` will not be available in `configure` in the documentation, and if someone is overriding the `open` method, they likely know it'll be called several times and can take care of not performing, for example, duplicate `accumulator` additions. What do you think? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-127627512 Build failure is on unrelated tests. The recent error on `PartitionedStateCheckpointingITCase` and the other on `KafkaITCase`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-127512155 I am pretty much against `postConfigure()`. There are so many life cycle methods already, we should really not make it more complicated (I think even `configure()` is a problem, but that's an artifact). It is not so hard to make it right in the `open()` method, and there are methods that getOrCreate accumulators, afaik. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36136321 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/DiscardingOutputFormat.java --- @@ -27,7 +27,7 @@ * * @param T The type of the elements accepted by the output format. */ -public class DiscardingOutputFormatT implements OutputFormatT { +public class DiscardingOutputFormatT extends RichOutputFormatT { --- End diff -- Doesn't need to access the RuntimeContext. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36142648 --- Diff: flink-core/src/test/java/org/apache/flink/api/common/operators/util/NonRichGenericInputFormat.java --- @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + +package org.apache.flink.api.common.operators.util; + +import java.io.IOException; + +import org.apache.flink.api.common.io.InputFormat; +import org.apache.flink.api.common.io.NonParallelInput; +import org.apache.flink.api.common.io.DefaultInputSplitAssigner; +import org.apache.flink.api.common.io.statistics.BaseStatistics; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.core.io.GenericInputSplit; + +/** + * Generic base class for all inputs that are not based on files. + * This is copied from {@link org.apache.flink.api.common.io.GenericInputFormat} + * This class doesn't provide access to RuntimeContext. + */ +public abstract class NonRichGenericInputFormatOT implements InputFormatOT, GenericInputSplit { --- End diff -- This was only for testing purposes. I needed to verify that I didn't break any functionality by making every IO format rich. That is, to verify the class casting behavior in the Pact tasks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-127433865 @fhueske, in the life cycle of an Input format, the `open` method is called several times in each parallel instance of the `DataSource` operator. Consider for example, the simple example of adding an accumulator. Since access to `RuntimeContext` is enabled only after configure, we will end up adding the same accumulator several times, which of course would fail. This is why I decided to add a `postConfigure` method. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36136514 --- Diff: flink-java/src/test/java/org/apache/flink/api/java/type/extractor/TypeExtractorInputFormatsTest.java --- @@ -130,7 +131,7 @@ public void testQueryableFormatType() { // Test formats // - public static final class DummyFloatInputFormat implements InputFormatFloat, InputSplit { + public static final class DummyFloatInputFormat extends RichInputFormatFloat, InputSplit { --- End diff -- Test input/output formats don't need access to the RuntimeContext. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36136228 --- Diff: flink-core/src/test/java/org/apache/flink/api/common/operators/util/NonRichGenericInputFormat.java --- @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + +package org.apache.flink.api.common.operators.util; + +import java.io.IOException; + +import org.apache.flink.api.common.io.InputFormat; +import org.apache.flink.api.common.io.NonParallelInput; +import org.apache.flink.api.common.io.DefaultInputSplitAssigner; +import org.apache.flink.api.common.io.statistics.BaseStatistics; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.core.io.GenericInputSplit; + +/** + * Generic base class for all inputs that are not based on files. + * This is copied from {@link org.apache.flink.api.common.io.GenericInputFormat} + * This class doesn't provide access to RuntimeContext. + */ +public abstract class NonRichGenericInputFormatOT implements InputFormatOT, GenericInputSplit { --- End diff -- I don't think we need this class. Why would somebody need a generic input format without RuntimeContext, if you can have one with RC for free? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-127430006 @fhueske , what do we settle on then? Should I keep it as abstract class or change it to an interface? Also, which classes need to be Rich? Can I assume that the ones you didn't comment on are okay? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-127415936 Hi @sachingoel0101, thanks for opening this PR! I see your point that users would have to implement `setRuntimeContext()` and `getRuntimeContext()` if we would use an interface. However, I am not sure that this would affect many people because we could implement the methods in all our base Input/OutputFormats which should cover a large portion of new formats. Also, these methods can be implemented in less than a minute. I think @StephanEwen has a point that all Input/OutputFormats are always quite rich. I would opt for the interface solution but implement it for all relevant Input/OutputFormats that Flink provides. I would also remove the `postConfigure()` method in favor of a clear documentation of the interface stating that the `RuntimeContext`is available in `open()`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36142853 --- Diff: flink-core/src/test/java/org/apache/flink/api/common/operators/util/NonRichGenericInputFormat.java --- @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + +package org.apache.flink.api.common.operators.util; + +import java.io.IOException; + +import org.apache.flink.api.common.io.InputFormat; +import org.apache.flink.api.common.io.NonParallelInput; +import org.apache.flink.api.common.io.DefaultInputSplitAssigner; +import org.apache.flink.api.common.io.statistics.BaseStatistics; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.core.io.GenericInputSplit; + +/** + * Generic base class for all inputs that are not based on files. + * This is copied from {@link org.apache.flink.api.common.io.GenericInputFormat} + * This class doesn't provide access to RuntimeContext. + */ +public abstract class NonRichGenericInputFormatOT implements InputFormatOT, GenericInputSplit { --- End diff -- Ah, sorry. Didn't notice it is in test scope. Good have it for tests! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36136714 --- Diff: flink-tests/src/test/java/org/apache/flink/test/accumulators/AccumulatorLiveITCase.java --- @@ -307,7 +307,7 @@ public void flatMap(String value, CollectorInteger out) throws Exception { } - private static class WaitingOutputFormat implements OutputFormatInteger { + private static class WaitingOutputFormat extends RichOutputFormatInteger { --- End diff -- Test OF does not use the RuntimeContext. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36135989 --- Diff: flink-tests/src/test/java/org/apache/flink/test/recordJobs/util/DiscardingOutputFormat.java --- @@ -19,14 +19,14 @@ package org.apache.flink.test.recordJobs.util; -import org.apache.flink.api.common.io.OutputFormat; +import org.apache.flink.api.common.io.RichOutputFormat; import org.apache.flink.configuration.Configuration; import org.apache.flink.types.Record; /** * A simple output format that discards all data by doing nothing. */ -public class DiscardingOutputFormat implements OutputFormatRecord { +public class DiscardingOutputFormat extends RichOutputFormatRecord { --- End diff -- I don't think the DiscardingOutputFormat needs to access the RuntimeContext --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36136676 --- Diff: flink-staging/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseWriteStreamExample.java --- @@ -80,7 +80,7 @@ public void cancel() { * This class implements an OutputFormat for HBase * */ - private static class HBaseOutputFormat implements OutputFormatString { + private static class HBaseOutputFormat extends RichOutputFormatString { --- End diff -- No need to update this class if the example doesn't use the RuntimeContext. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/966#discussion_r36143535 --- Diff: flink-core/src/test/java/org/apache/flink/api/common/operators/util/NonRichGenericInputFormat.java --- @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + +package org.apache.flink.api.common.operators.util; + +import java.io.IOException; + +import org.apache.flink.api.common.io.InputFormat; +import org.apache.flink.api.common.io.NonParallelInput; +import org.apache.flink.api.common.io.DefaultInputSplitAssigner; +import org.apache.flink.api.common.io.statistics.BaseStatistics; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.core.io.GenericInputSplit; + +/** + * Generic base class for all inputs that are not based on files. + * This is copied from {@link org.apache.flink.api.common.io.GenericInputFormat} + * This class doesn't provide access to RuntimeContext. + */ +public abstract class NonRichGenericInputFormatOT implements InputFormatOT, GenericInputSplit { --- End diff -- No problem! :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-127037112 @StephanEwen, it would be much simpler to have an interface. But then we leave the part about implementing the `setRuntimeContext` and `getRuntimeContext` to the user. IMO that's not a very good option. So it makes to extend rich formats from abstract Rich formats. Doing that, I couldn't figure out a better solution if we want to make all existing IO formats rich. To sum it up: 1. If we only defined an interface, we leave the implementation of `setter` and `getter` methods to the user. 2. If we write a Rich abstract class, we have to make all existing IO formats rich since the user might want to extend one of these, and of course can't extend two classes. If there's a way to achieve this, let me know. I might have missed something. As for the `postConfigure` method, it was to guarantee that the `RuntimeContext` will be available now. I tried initializing the context in `DataSourceTask` and `DataSinkTask` before calling `configure`, but it seems the format is re-initialized somewhere as that information is lost. I could remove this function and we can just initialize `counters` etc. in `open` call. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-127032575 What is the `postConfigure` method for? Input and output formats have already the `open()` method, that is called after `configure()`. The `configure()` method itself is somewhat of a baggage from the past, when all code was shipped as classes + configuration, rather than via closures. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-127032323 Would not not be much simpler to add an interface `RuntimeContextFormat` with a method (`setRuntimeContext()`) and call this method on the formats that implement this interface? That would be three changed classes, done. I am also a bit sceptical about the difference between input format and rich input format. For transformation functions the difference between Function (a lambda) and RichFunction (many mode methods) makes sense. Input formats are per-se pretty rich... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-126705428 Thanks for the pull request! You did not convert all InputFormats and OutputFormats. There are a few more, e.g. HadoopInputFormatBase or JDBCInputFormat. Other than that, your pull request looks good. I think it is fine to keep the notion of normal and rich, just like we do for the operator user functions. That also means that we do not break the existing interfaces. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-126706999 Ah yes. I'll update them in a while. There's actually some problem with the unit test I've written too. Travis fails sporadically. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/966 [FLINK-1819][core]Allow access to RuntimeContext from Input and Output formats 1. Introduces new Rich Input and Output formats, similar to Rich Functions. 2. Makes all existing input and output formats rich, without any API breaking changes. 3. Provides RuntimeContext to GenericSink and GenericSource operators 4. DataSourceTask and DataSinkTask set DistributedUDFContext to their input and output formats respectively. 5. Complete access to RuntimeContext, including access to accumulators. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sachingoel0101/flink flink-1819 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/966.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #966 commit 627ee290fb88ac3e670fe86a734116b06f431ade Author: Sachin Goel sachingoel0...@gmail.com Date: 2015-07-31T04:36:20Z [FLINK-1819][core]Allow access to RuntimeContext from Input and Output formats --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---