[GitHub] flink pull request: FLINK-3579 Improve String concatenation (Ram)
Github user ramkrish86 commented on the pull request: https://github.com/apache/flink/pull/1821#issuecomment-205139553 I have done the changes only to support the String concat operation. The cast() and abs() grammar change looks more complex and I need to understand things better in parser to support it. Suggest we do it in a seperate JIRA? Is that fine @tillrohrmann ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3579) Improve String concatenation
[ https://issues.apache.org/jira/browse/FLINK-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223674#comment-15223674 ] ASF GitHub Bot commented on FLINK-3579: --- Github user ramkrish86 commented on the pull request: https://github.com/apache/flink/pull/1821#issuecomment-205139553 I have done the changes only to support the String concat operation. The cast() and abs() grammar change looks more complex and I need to understand things better in parser to support it. Suggest we do it in a seperate JIRA? Is that fine @tillrohrmann ? > Improve String concatenation > > > Key: FLINK-3579 > URL: https://issues.apache.org/jira/browse/FLINK-3579 > Project: Flink > Issue Type: Bug > Components: Table API >Reporter: Timo Walther >Assignee: ramkrishna.s.vasudevan >Priority: Minor > > Concatenation of a String and non-String does not work properly. > e.g. {{f0 + 42}} leads to RelBuilder Exception > ExpressionParser does not like {{f0 + 42.cast(STRING)}} either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-3692) Develop a Kafka state backend
Elias Levy created FLINK-3692: - Summary: Develop a Kafka state backend Key: FLINK-3692 URL: https://issues.apache.org/jira/browse/FLINK-3692 Project: Flink Issue Type: New Feature Components: Core Reporter: Elias Levy Flink clusters usually consume of a Kafka cluster. It simplify operations if Flink could store its state checkpoints in Kafka. This should be possibly using different topics to write to, partitioning appropriately, and using compacted topics. This would avoid the need to run an HDFS cluster just to store Flink checkpoints. For inspiration you may want to take a look at how Samza checkpoints a task's local state to a Kafka topic, and how the newer Kafka consumers checkpoint their offsets to Kafka. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3035) Redis as State Backend
[ https://issues.apache.org/jira/browse/FLINK-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223577#comment-15223577 ] Elias Levy commented on FLINK-3035: --- This seems like a bad idea. Regis is not a strongly consistent distributed data store, which is what you want to securely store your state snapshots. > Redis as State Backend > -- > > Key: FLINK-3035 > URL: https://issues.apache.org/jira/browse/FLINK-3035 > Project: Flink > Issue Type: New Feature > Components: Streaming >Reporter: Matthias J. Sax >Assignee: Subhobrata Dey >Priority: Minor > > Add Redis as a state backend for distributed snapshots. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2375) Add Approximate Adamic Adar Similarity method using BloomFilters
[ https://issues.apache.org/jira/browse/FLINK-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223490#comment-15223490 ] ASF GitHub Bot commented on FLINK-2375: --- Github user shghatge closed the pull request at: https://github.com/apache/flink/pull/923 > Add Approximate Adamic Adar Similarity method using BloomFilters > > > Key: FLINK-2375 > URL: https://issues.apache.org/jira/browse/FLINK-2375 > Project: Flink > Issue Type: Task > Components: Gelly >Reporter: Shivani Ghatge >Assignee: Shivani Ghatge >Priority: Minor > > Just as Jaccard, the Adamic-Adar algorithm measures the similarity between a > set of nodes. However, instead of counting the common neighbors and dividing > them by the total number of neighbors, the similarity is weighted according > to the vertex degrees. In particular, it's equal to log(1/numberOfEdges). > The Adamic-Adar algorithm can be broken into three steps: > 1). For each vertex, compute the log of its inverse degrees (with the formula > above) and set it as the vertex value. > 2). Each vertex will then send this new computed value along with a list of > neighbors to the targets of its out-edges > 3). Weigh the edges with the Adamic-Adar index: Sum over n from CN of > log(1/k_n)(CN is the set of all common neighbors of two vertices x, y. k_n is > the degree of node n). See [2] > Using BloomFilters we increase the scalability of the algorithm. The values > calculated for the edges will be approximate. > Prerequisites: > Full understanding of the Jaccard Similarity Measure algorithm > Reading the associated literature: > [1] http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf > [2] > http://stackoverflow.com/questions/22565620/fast-algorithm-to-compute-adamic-adar -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2310) Add an Adamic-Adar Similarity example
[ https://issues.apache.org/jira/browse/FLINK-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223491#comment-15223491 ] ASF GitHub Bot commented on FLINK-2310: --- Github user shghatge closed the pull request at: https://github.com/apache/flink/pull/892 > Add an Adamic-Adar Similarity example > - > > Key: FLINK-2310 > URL: https://issues.apache.org/jira/browse/FLINK-2310 > Project: Flink > Issue Type: Task > Components: Gelly >Reporter: Andra Lungu >Assignee: Shivani Ghatge >Priority: Minor > > Just as Jaccard, the Adamic-Adar algorithm measures the similarity between a > set of nodes. However, instead of counting the common neighbors and dividing > them by the total number of neighbors, the similarity is weighted according > to the vertex degrees. In particular, it's equal to log(1/numberOfEdges). > The Adamic-Adar algorithm can be broken into three steps: > 1). For each vertex, compute the log of its inverse degrees (with the formula > above) and set it as the vertex value. > 2). Each vertex will then send this new computed value along with a list of > neighbors to the targets of its out-edges > 3). Weigh the edges with the Adamic-Adar index: Sum over n from CN of > log(1/k_n)(CN is the set of all common neighbors of two vertices x, y. k_n is > the degree of node n). See [2] > Prerequisites: > - Full understanding of the Jaccard Similarity Measure algorithm > - Reading the associated literature: > [1] http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf > [2] > http://stackoverflow.com/questions/22565620/fast-algorithm-to-compute-adamic-adar -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2375] Add Approximate Adamic Adar Simil...
Github user shghatge closed the pull request at: https://github.com/apache/flink/pull/923 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2310] Add an Adamic Adar Similarity exa...
Github user shghatge closed the pull request at: https://github.com/apache/flink/pull/892 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Comment Edited] (FLINK-3669) WindowOperator registers a lot of timers at StreamTask
[ https://issues.apache.org/jira/browse/FLINK-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223402#comment-15223402 ] Konstantin Knauf edited comment on FLINK-3669 at 4/3/16 5:32 PM: - I tried to implement the 1.. The hard part, I think, is to store the {{ScheduledFuture}}s to in the {{WindowOperator}}. We could use a {{Map}}, but this only works if we also implement timer coalescing. Besides that I am unsure about how to de/serialize this map. Could you give me pointer? Edit. Ok, serializing the future does not make sense, so it could not be checkpointed, which is not necessarily a problem, I think. Additionally, this should be a {{Map }} instead. was (Author: knaufk): I tried to implement the 1.. The hard part, I think, is to store the {{ScheduledFuture}}s to in the {{WindowOperator}}. We could use a {{Map }}, but this only works if we also implement timer coalescing. Besides that I am unsure about how to de/serialize this map. Could you give me pointer? > WindowOperator registers a lot of timers at StreamTask > -- > > Key: FLINK-3669 > URL: https://issues.apache.org/jira/browse/FLINK-3669 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 1.0.1 >Reporter: Aljoscha Krettek >Priority: Blocker > > Right now, the WindowOperator registers a timer at the StreamTask for every > processing-time timer that a Trigger registers. We should combine several > registered trigger timers to only register one low-level timer (timer > coalescing). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3669) WindowOperator registers a lot of timers at StreamTask
[ https://issues.apache.org/jira/browse/FLINK-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223402#comment-15223402 ] Konstantin Knauf commented on FLINK-3669: - I tried to implement the 1.. The hard part, I think, is to store the {{ScheduledFuture}}s to in the {{WindowOperator}}. We could use a {{Map}}, but this only works if we also implement timer coalescing. Besides that I am unsure about how to de/serialize this map. Could you give me pointer? > WindowOperator registers a lot of timers at StreamTask > -- > > Key: FLINK-3669 > URL: https://issues.apache.org/jira/browse/FLINK-3669 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 1.0.1 >Reporter: Aljoscha Krettek >Priority: Blocker > > Right now, the WindowOperator registers a timer at the StreamTask for every > processing-time timer that a Trigger registers. We should combine several > registered trigger timers to only register one low-level timer (timer > coalescing). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3657) Change access of DataSetUtils.countElements() to 'public'
[ https://issues.apache.org/jira/browse/FLINK-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223375#comment-15223375 ] ASF GitHub Bot commented on FLINK-3657: --- Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/1829#issuecomment-205001401 +1 for merging to master. But would love to isolate the changes to exclude refactoring changes if possible > Change access of DataSetUtils.countElements() to 'public' > -- > > Key: FLINK-3657 > URL: https://issues.apache.org/jira/browse/FLINK-3657 > Project: Flink > Issue Type: Improvement > Components: DataSet API >Affects Versions: 1.0.0 >Reporter: Suneel Marthi >Assignee: Suneel Marthi >Priority: Minor > Fix For: 1.0.1 > > > The access of DatasetUtils.countElements() is presently 'private', change > that to be 'public'. We happened to be replicating the functionality in our > project and realized the method already existed in Flink. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: FLINK-3657: Change access of DataSetUtils.coun...
Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/1829#issuecomment-205001401 +1 for merging to master. But would love to isolate the changes to exclude refactoring changes if possible --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3688) ClassCastException in StreamRecordSerializer when WindowOperator.trigger() is called and TimeCharacteristic = ProcessingTime
[ https://issues.apache.org/jira/browse/FLINK-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223353#comment-15223353 ] Konstantin Knauf commented on FLINK-3688: - Regarding dropping Watermarks in {{StreamRecordSerializer.serialize()}}. Did you mean to change {{StreamRecordSerializer extends TypeSerializerClassCastException in StreamRecordSerializer when WindowOperator.trigger() is > called and TimeCharacteristic = ProcessingTime > > > Key: FLINK-3688 > URL: https://issues.apache.org/jira/browse/FLINK-3688 > Project: Flink > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Konstantin Knauf >Assignee: Konstantin Knauf >Priority: Critical > > Hi, > when using {{TimeCharacteristics.ProcessingTime}} a ClassCastException is > thrown in {{StreamRecordSerializer}} when > {{WindowOperator.processWatermark()}} is called from > {{WindowOperator.trigger()}}, i.e. whenever a ProcessingTimeTimer is > triggered. > The problem seems to be that {{processWatermark()}} is also called in > {{trigger()}}, when time characteristic is ProcessingTime, but in > {{RecordWriterOutput}} {{enableWatermarkMultiplexing}} is {{false}} and the > {{TypeSerializer}} is a {{StreamRecordSerializer}}, which ultimately leads to > the ClassCastException. Do you agree? > If this is indeed a bug, there several possible solutions. > # Only calling {{processWatermark()}} in {{trigger()}}, when > TimeCharacteristic is EventTime > # Not calling {{processWatermark()}} in {{trigger()}} at all, instead wait > for the next watermark to trigger the EventTimeTimers with a timestamp behind > the current watermark. This is, of course, a trade off. > # Using {{MultiplexingStreamRecordSerializer}} all the time, but I have no > idea what the side effect of this change would be. I assume there is a reason > for existence of the StreamRecordSerializer ;) > StackTrace: > {quote} > TimerException\{java.lang.RuntimeException: > org.apache.flink.streaming.api.watermark.Watermark cannot be cast to > org.apache.flink.streaming.runtime.streamrecord.StreamRecord\} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:716) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.lang.RuntimeException: > org.apache.flink.streaming.api.watermark.Watermark cannot be cast to > org.apache.flink.streaming.runtime.streamrecord.StreamRecord > at > org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:93) > at > org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.emitWatermark(OperatorChain.java:370) > at > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processWatermark(WindowOperator.java:293) > at > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.trigger(WindowOperator.java:323) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:710) > ... 7 more > Caused by: java.lang.ClassCastException: > org.apache.flink.streaming.api.watermark.Watermark cannot be cast to > org.apache.flink.streaming.runtime.streamrecord.StreamRecord > at > org.apache.flink.streaming.runtime.streamrecord.StreamRecordSerializer.serialize(StreamRecordSerializer.java:42) > at >
[GitHub] flink pull request: FLINK-3657: Change access of DataSetUtils.coun...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/1829#issuecomment-204931011 @smarthi I think it's not too specific at all since its part of `DataSetUtils` and not `DataSet`. Let's merge the changes to master for `1.1`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3657) Change access of DataSetUtils.countElements() to 'public'
[ https://issues.apache.org/jira/browse/FLINK-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223193#comment-15223193 ] ASF GitHub Bot commented on FLINK-3657: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/1829#issuecomment-204931011 @smarthi I think it's not too specific at all since its part of `DataSetUtils` and not `DataSet`. Let's merge the changes to master for `1.1`. > Change access of DataSetUtils.countElements() to 'public' > -- > > Key: FLINK-3657 > URL: https://issues.apache.org/jira/browse/FLINK-3657 > Project: Flink > Issue Type: Improvement > Components: DataSet API >Affects Versions: 1.0.0 >Reporter: Suneel Marthi >Assignee: Suneel Marthi >Priority: Minor > Fix For: 1.0.1 > > > The access of DatasetUtils.countElements() is presently 'private', change > that to be 'public'. We happened to be replicating the functionality in our > project and realized the method already existed in Flink. -- This message was sent by Atlassian JIRA (v6.3.4#6332)