[jira] [Created] (FLINK-11007) Update documentation to describe new checkpoint metadata file behavior
Josh Lemer created FLINK-11007: -- Summary: Update documentation to describe new checkpoint metadata file behavior Key: FLINK-11007 URL: https://issues.apache.org/jira/browse/FLINK-11007 Project: Flink Issue Type: Task Components: Documentation Reporter: Josh Lemer In the [documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/checkpointing.html#state-checkpoints-dir] about checkpointing, it is explained that you must set the config file setting `state.checkpoints.dir` to specify the directory where checkpoint metadata files will be stored. This is no longer the case, and apparently now checkpoint metadata files are stored in the checkpoint data directory itself. This should be updated in the docs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
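For reference, the setting in question lives in the Flink configuration file. A minimal sketch of the relevant `flink-conf.yaml` entries (the paths and backend value here are placeholders for illustration, not taken from the issue):

```yaml
# Checkpoint data directory; per this issue, the checkpoint metadata file
# is now written here as well, alongside the checkpoint data itself.
state.checkpoints.dir: hdfs:///flink/checkpoints

# State backend, shown only for context (placeholder value)
state.backend: filesystem
```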
[jira] [Commented] (FLINK-9221) Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]
[ https://issues.apache.org/jira/browse/FLINK-9221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531680#comment-16531680 ] Josh Lemer commented on FLINK-9221: --- [~yanghua] You may be right. One reason it may be wrong to put the method on `SinkFunction` is that the opposite methods (`map`, `flatMap`) are not on `SourceFunction` or `MapFunction`, but instead are on `DataStream`. However, there isn't really any other abstraction in Flink that contramap could go on, since there is no equivalent `DataSink` abstraction like there is in Akka Streams or similar, so putting it on `SinkFunction` may be the best option unless we come up with a proper `DataSink` abstraction. > Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B] > --- > > Key: FLINK-9221 > URL: https://issues.apache.org/jira/browse/FLINK-9221 > Project: Flink > Issue Type: Task > Components: DataSet API, DataStream API >Affects Versions: 1.5.0 >Reporter: Josh Lemer >Assignee: vinoyang >Priority: Minor > Labels: flink > > Just like it is very useful to use `DataStream[T]` as a sort of Functor or > Monad with `map`/`flatMap`/`filter` methods, it would be extremely handy to > have a `SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]` on > `SinkFunctions`, so that you can reuse existing complex sink functions, but > with a different input type. For example: > {code} > val bucketingStringSink: SinkFunction[String] = > new BucketingSink[String]("...") > .setBucketer(new DateTimeBucketer("-MM-dd-HHmm")) > val bucketingIntListSink: SinkFunction[List[Int]] = > bucketingStringSink.contramap[List[Int]](_.mkString(",")) > {code} > For some more formal motivation behind this, > https://typelevel.org/cats/typeclasses/contravariant.html is definitely a > great place to start! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
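The proposed combinator is small enough to sketch directly. The following is an illustrative Scala sketch using a simplified stand-in trait, not Flink's actual `SinkFunction` interface (the real one also has lifecycle methods and a runtime `Context`); the `SimpleSink` name and the collecting sink are assumptions for the example:

```scala
import scala.collection.mutable.ListBuffer

// Simplified stand-in for Flink's SinkFunction (illustrative only).
trait SimpleSink[A] { self =>
  def invoke(value: A): Unit

  // contramap: reuse a sink of A as a sink of B by pre-applying f: B => A
  // before the wrapped sink's invoke.
  def contramap[B](f: B => A): SimpleSink[B] = new SimpleSink[B] {
    def invoke(value: B): Unit = self.invoke(f(value))
  }
}

// Reuse a string-collecting sink for List[Int] input, mirroring the
// bucketingStringSink example from the issue description.
val collected = ListBuffer.empty[String]
val stringSink = new SimpleSink[String] {
  def invoke(value: String): Unit = collected += value
}
val intListSink: SimpleSink[List[Int]] =
  stringSink.contramap[List[Int]](_.mkString(","))

intListSink.invoke(List(1, 2, 3))
// collected now contains "1,2,3"
```

The pattern is exactly the `Contravariant` functor from the linked Cats page: `contramap` pre-composes an adapting function in front of the sink's input.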
[jira] [Created] (FLINK-9396) Allow streaming sources (especially Files, Kafka) to set max queued input elements
Josh Lemer created FLINK-9396: - Summary: Allow streaming sources (especially Files, Kafka) to set max queued input elements Key: FLINK-9396 URL: https://issues.apache.org/jira/browse/FLINK-9396 Project: Flink Issue Type: Task Components: DataStream API Reporter: Josh Lemer In some applications, there can be situations where a file source queues up so many files that, when it is time for a checkpoint to start, the checkpoint takes far too long. It would be nice if we could limit the number of queued files to read to some maximum, to place an upper bound on how long checkpoints take. See http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Fwd-Decrease-initial-source-read-speed-td20207.html for previous questions about this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9385) Operators with two inputs should show "Records Received" in Web UI separately, rather than added together
[ https://issues.apache.org/jira/browse/FLINK-9385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Lemer updated FLINK-9385: -- Component/s: Webfrontend, Web Client, Metrics > Operators with two inputs should show "Records Received" in Web UI > separately, rather than added together > - > > Key: FLINK-9385 > URL: https://issues.apache.org/jira/browse/FLINK-9385 > Project: Flink > Issue Type: Task > Components: Metrics, Web Client, Webfrontend >Reporter: Josh Lemer >Priority: Major > > In the Flink Web UI, there is a column in the Subtasks information view which > shows how many records each subtask has received. However, for subtasks such > as CoProcess operators which take two inputs, the number shown for "Records > Received" is the sum of the records received from the two upstream > datastreams. It would be much more helpful if it displayed the number of > records received from each of the two separately. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-9385) Operators with two inputs should show "Records Received" in Web UI separately, rather than added together
Josh Lemer created FLINK-9385: - Summary: Operators with two inputs should show "Records Received" in Web UI separately, rather than added together Key: FLINK-9385 URL: https://issues.apache.org/jira/browse/FLINK-9385 Project: Flink Issue Type: Task Reporter: Josh Lemer In the Flink Web UI, there is a column in the Subtasks information view which shows how many records each subtask has received. However, for subtasks such as CoProcess operators which take two inputs, the number shown for "Records Received" is the sum of the records received from the two upstream datastreams. It would be much more helpful if it displayed the number of records received from each of the two separately. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-9264) Add method Writer[A]#contramap[B](f: B => A): Writer[B]
Josh Lemer created FLINK-9264: - Summary: Add method Writer[A]#contramap[B](f: B => A): Writer[B] Key: FLINK-9264 URL: https://issues.apache.org/jira/browse/FLINK-9264 Project: Flink Issue Type: Task Components: FileSystem Affects Versions: 1.5.0 Reporter: Josh Lemer Similar to https://issues.apache.org/jira/browse/FLINK-9221, it would be very handy to have a `Writer[A]#contramap[B](f: B => A): Writer[B]` method, which would allow reuse of existing writers, with a "formatting" function placed in front of them. For example: val stringWriter = new StringWriter[String]() val intWriter: Writer[Int] = stringWriter.contramap[Int](_.toString) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9221) Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]
[ https://issues.apache.org/jira/browse/FLINK-9221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Lemer updated FLINK-9221: -- Component/s: DataSet API > Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B] > --- > > Key: FLINK-9221 > URL: https://issues.apache.org/jira/browse/FLINK-9221 > Project: Flink > Issue Type: Task > Components: DataSet API, DataStream API >Affects Versions: 1.5.0 >Reporter: Josh Lemer >Priority: Minor > Labels: flink > > Just like it is very useful to use `DataStream[T]` as a sort of Functor or > Monad with `map`/`flatMap`/`filter` methods, it would be extremely handy to > have a `SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]` on > `SinkFunctions`, so that you can reuse existing complex sink functions, but > with a different input type. For example: > {code} > val bucketingStringSink: SinkFunction[String] = > new BucketingSink[String]("...") > .setBucketer(new DateTimeBucketer("-MM-dd-HHmm") > val bucketingIntListSink: SinkFunction[List[Int]] = > bucketingStringSink.contramap[List[Int]](_.mkString(",")) > {code} > For some more formal motivation behind this, > https://typelevel.org/cats/typeclasses/contravariant.html is definitely a > great place to start! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9221) Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]
[ https://issues.apache.org/jira/browse/FLINK-9221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444250#comment-16444250 ] Josh Lemer commented on FLINK-9221: --- Fixed! > Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B] > --- > > Key: FLINK-9221 > URL: https://issues.apache.org/jira/browse/FLINK-9221 > Project: Flink > Issue Type: Task > Components: DataStream API >Affects Versions: 1.5.0 >Reporter: Josh Lemer >Priority: Minor > Labels: flink > > Just like it is very useful to use `DataStream[T]` as a sort of Functor or > Monad with `map`/`flatMap`/`filter` methods, it would be extremely handy to > have a `SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]` on > `SinkFunctions`, so that you can reuse existing complex sink functions, but > with a different input type. For example: > {code} > val bucketingStringSink: SinkFunction[String] = > new BucketingSink[String]("...") > .setBucketer(new DateTimeBucketer("-MM-dd-HHmm") > val bucketingIntListSink: SinkFunction[List[Int]] = > bucketingStringSink.contramap[List[Int]](_.mkString(",")) > {code} > For some more formal motivation behind this, > https://typelevel.org/cats/typeclasses/contravariant.html is definitely a > great place to start! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9221) Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]
[ https://issues.apache.org/jira/browse/FLINK-9221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Lemer updated FLINK-9221: -- Labels: flink (was: ) > Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B] > --- > > Key: FLINK-9221 > URL: https://issues.apache.org/jira/browse/FLINK-9221 > Project: Flink > Issue Type: Task > Components: DataStream API >Affects Versions: 1.5.0 >Reporter: Josh Lemer >Priority: Minor > Labels: flink > > Just like it is very useful to use `DataStream[T]` as a sort of Functor or > Monad with `map`/`flatMap`/`filter` methods, it would be extremely handy to > have a `SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]` on > `SinkFunctions`, so that you can reuse existing complex sink functions, but > with a different input type. For example: > {code} > val bucketingStringSink: SinkFunction[String] = > new BucketingSink[String]("...") > .setBucketer(new DateTimeBucketer("-MM-dd-HHmm") > val bucketingIntListSink: SinkFunction[List[Int]] = > bucketingStringSink.contramap[List[Int]](_.mkString(",")) > {code} > For some more formal motivation behind this, > https://typelevel.org/cats/typeclasses/contravariant.html is definitely a > great place to start! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9221) Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]
[ https://issues.apache.org/jira/browse/FLINK-9221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Lemer updated FLINK-9221: -- Affects Version/s: 1.5.0 > Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B] > --- > > Key: FLINK-9221 > URL: https://issues.apache.org/jira/browse/FLINK-9221 > Project: Flink > Issue Type: Task > Components: DataStream API >Affects Versions: 1.5.0 >Reporter: Josh Lemer >Priority: Minor > Labels: flink > > Just like it is very useful to use `DataStream[T]` as a sort of Functor or > Monad with `map`/`flatMap`/`filter` methods, it would be extremely handy to > have a `SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]` on > `SinkFunctions`, so that you can reuse existing complex sink functions, but > with a different input type. For example: > {code} > val bucketingStringSink: SinkFunction[String] = > new BucketingSink[String]("...") > .setBucketer(new DateTimeBucketer("-MM-dd-HHmm") > val bucketingIntListSink: SinkFunction[List[Int]] = > bucketingStringSink.contramap[List[Int]](_.mkString(",")) > {code} > For some more formal motivation behind this, > https://typelevel.org/cats/typeclasses/contravariant.html is definitely a > great place to start! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9221) Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]
[ https://issues.apache.org/jira/browse/FLINK-9221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Lemer updated FLINK-9221: -- Component/s: DataStream API > Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B] > --- > > Key: FLINK-9221 > URL: https://issues.apache.org/jira/browse/FLINK-9221 > Project: Flink > Issue Type: Task > Components: DataStream API >Affects Versions: 1.5.0 >Reporter: Josh Lemer >Priority: Minor > Labels: flink > > Just like it is very useful to use `DataStream[T]` as a sort of Functor or > Monad with `map`/`flatMap`/`filter` methods, it would be extremely handy to > have a `SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]` on > `SinkFunctions`, so that you can reuse existing complex sink functions, but > with a different input type. For example: > {code} > val bucketingStringSink: SinkFunction[String] = > new BucketingSink[String]("...") > .setBucketer(new DateTimeBucketer("-MM-dd-HHmm") > val bucketingIntListSink: SinkFunction[List[Int]] = > bucketingStringSink.contramap[List[Int]](_.mkString(",")) > {code} > For some more formal motivation behind this, > https://typelevel.org/cats/typeclasses/contravariant.html is definitely a > great place to start! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9221) Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]
[ https://issues.apache.org/jira/browse/FLINK-9221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Lemer updated FLINK-9221: -- Description: Just like it is very useful to use `DataStream[T]` as a sort of Functor or Monad with `map`/`flatMap`/`filter` methods, it would be extremely handy to have a `SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]` on `SinkFunctions`, so that you can reuse existing complex sink functions, but with a different input type. For example: {code} val bucketingStringSink: SinkFunction[String] = new BucketingSink[String]("...") .setBucketer(new DateTimeBucketer("-MM-dd-HHmm") val bucketingIntListSink: SinkFunction[List[Int]] = bucketingStringSink.contramap[List[Int]](_.mkString(",")) {code} For some more formal motivation behind this, https://typelevel.org/cats/typeclasses/contravariant.html is definitely a great place to start! was: Just like it is very useful to use `DataStream[T]` as a sort of Functor or Monad with `map`/`flatMap`/`filter` methods, it would be extremely handy to have a `SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]` on `SinkFunctions`, so that you can reuse existing complex sink functions, but with a different input type. For example: {code} val bucketingStringSink: SinkFunction[String] = new BucketingSink[String]("...") .setBucketr(new DateTimeBucketer("-MM-dd-HHmm") val bucketingIntListSink: SinkFunction[List[Int]] = bucketingStringSink.contramap[List[Int]](_.mkString(",")) {code} For some more formal motivation behind this, https://typelevel.org/cats/typeclasses/contravariant.html is definitely a great place to start! 
> Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B] > --- > > Key: FLINK-9221 > URL: https://issues.apache.org/jira/browse/FLINK-9221 > Project: Flink > Issue Type: Task >Reporter: Josh Lemer >Priority: Minor > > Just like it is very useful to use `DataStream[T]` as a sort of Functor or > Monad with `map`/`flatMap`/`filter` methods, it would be extremely handy to > have a `SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]` on > `SinkFunctions`, so that you can reuse existing complex sink functions, but > with a different input type. For example: > {code} > val bucketingStringSink: SinkFunction[String] = > new BucketingSink[String]("...") > .setBucketer(new DateTimeBucketer("-MM-dd-HHmm") > val bucketingIntListSink: SinkFunction[List[Int]] = > bucketingStringSink.contramap[List[Int]](_.mkString(",")) > {code} > For some more formal motivation behind this, > https://typelevel.org/cats/typeclasses/contravariant.html is definitely a > great place to start! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-9221) Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]
Josh Lemer created FLINK-9221: - Summary: Add method SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B] Key: FLINK-9221 URL: https://issues.apache.org/jira/browse/FLINK-9221 Project: Flink Issue Type: Task Reporter: Josh Lemer Just like it is very useful to use `DataStream[T]` as a sort of Functor or Monad with `map`/`flatMap`/`filter` methods, it would be extremely handy to have a `SinkFunction[A]#contramap[B](f: B => A): SinkFunction[B]` on `SinkFunctions`, so that you can reuse existing complex sink functions, but with a different input type. For example: {code} val bucketingStringSink: SinkFunction[String] = new BucketingSink[String]("...") .setBucketer(new DateTimeBucketer("-MM-dd-HHmm")) val bucketingIntListSink: SinkFunction[List[Int]] = bucketingStringSink.contramap[List[Int]](_.mkString(",")) {code} For some more formal motivation behind this, https://typelevel.org/cats/typeclasses/contravariant.html is definitely a great place to start! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-7484) CaseClassSerializer.duplicate() does not perform proper deep copy
[ https://issues.apache.org/jira/browse/FLINK-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439956#comment-16439956 ] Josh Lemer commented on FLINK-7484: --- Hey there folks, are we all sure that this issue has been entirely fixed? I am getting very similar errors when using `ValueState[scala.collection.mutable.PriorityQueue[(SomeKryoSerializedThing, Long, scala.collection.mutable.BitSet)]` with the following stack trace. This ONLY happens when async snapshots are enabled using the FileSystem State Backend. RocksDB works fine with async snapshots: {code:java} java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:657) at java.util.ArrayList.set(ArrayList.java:448) at com.esotericsoftware.kryo.util.MapReferenceResolver.setReadObject(MapReferenceResolver.java:56) at com.esotericsoftware.kryo.Kryo.reference(Kryo.java:875) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:710) at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:189) at org.apache.flink.api.scala.typeutils.CaseClassSerializer.copy(CaseClassSerializer.scala:101) at org.apache.flink.api.scala.typeutils.CaseClassSerializer.copy(CaseClassSerializer.scala:32) at org.apache.flink.api.scala.typeutils.TraversableSerializer$$anonfun$copy$1.apply(TraversableSerializer.scala:69) at org.apache.flink.api.scala.typeutils.TraversableSerializer$$anonfun$copy$1.apply(TraversableSerializer.scala:69) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at org.apache.flink.api.scala.typeutils.TraversableSerializer.copy(TraversableSerializer.scala:69) at org.apache.flink.api.scala.typeutils.TraversableSerializer.copy(TraversableSerializer.scala:33) at 
org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.get(CopyOnWriteStateTable.java:282) at org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.get(CopyOnWriteStateTable.java:306) at org.apache.flink.runtime.state.heap.HeapValueState.value(HeapValueState.java:55) at net.districtm.segmentsync.processing.JoinSegmentMappingWithSegmentAssignments.enqueueSegmentAssignment(JoinSegmentMappingWithSegmentAssignments.scala:104) at net.districtm.segmentsync.processing.JoinSegmentMappingWithSegmentAssignments.processElement2(JoinSegmentMappingWithSegmentAssignments.scala:218) at net.districtm.segmentsync.processing.JoinSegmentMappingWithSegmentAssignments.processElement2(JoinSegmentMappingWithSegmentAssignments.scala:77) at org.apache.flink.streaming.api.operators.co.KeyedCoProcessOperator.processElement2(KeyedCoProcessOperator.java:86) at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:270) at org.apache.flink.streaming.runtime.tasks.TwoInputStreamTask.run(TwoInputStreamTask.java:91) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:264) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718) at java.lang.Thread.run(Thread.java:748) 04/16/2018 19:37:54 Job execution switched to status FAILING. 
{code}
[jira] [Created] (FLINK-8440) Create Gitter Channel for Flink users
Josh Lemer created FLINK-8440: - Summary: Create Gitter Channel for Flink users Key: FLINK-8440 URL: https://issues.apache.org/jira/browse/FLINK-8440 Project: Flink Issue Type: Task Reporter: Josh Lemer Could we get a [www.gitter.im|http://www.gitter.im/] channel for Flink users (and/or contributors) to ask questions in? Lots of people in lots of projects online get a lot out of having a chat room, and the IRC channel is pretty dead and of course has a pretty terrible user experience (no history, etc). Thanks! -- This message was sent by Atlassian JIRA (v7.6.3#76005)