[GitHub] flink pull request: FLINK-3579 Improve String concatenation (Ram)

2016-04-03 Thread ramkrish86
Github user ramkrish86 commented on the pull request:

https://github.com/apache/flink/pull/1821#issuecomment-205139553
  
I have done the changes only to support the String concat operation. The 
cast() and abs() grammar change looks more complex and I need to understand 
things better in parser to support it. Suggest we do it in a seperate JIRA?  Is 
that fine @tillrohrmann ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3579) Improve String concatenation

2016-04-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223674#comment-15223674
 ] 

ASF GitHub Bot commented on FLINK-3579:
---

Github user ramkrish86 commented on the pull request:

https://github.com/apache/flink/pull/1821#issuecomment-205139553
  
I have done the changes only to support the String concat operation. The 
cast() and abs() grammar change looks more complex and I need to understand 
things better in parser to support it. Suggest we do it in a seperate JIRA?  Is 
that fine @tillrohrmann ?


> Improve String concatenation
> 
>
> Key: FLINK-3579
> URL: https://issues.apache.org/jira/browse/FLINK-3579
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Timo Walther
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
>
> Concatenation of a String and non-String does not work properly.
> e.g. {{f0 + 42}} leads to RelBuilder Exception
> ExpressionParser does not like {{f0 + 42.cast(STRING)}} either.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3692) Develop a Kafka state backend

2016-04-03 Thread Elias Levy (JIRA)
Elias Levy created FLINK-3692:
-

 Summary: Develop a Kafka state backend
 Key: FLINK-3692
 URL: https://issues.apache.org/jira/browse/FLINK-3692
 Project: Flink
  Issue Type: New Feature
  Components: Core
Reporter: Elias Levy


Flink clusters usually consume of a Kafka cluster.  It simplify operations if 
Flink could store its state checkpoints in Kafka.  This should be possibly 
using different topics to write to, partitioning appropriately, and using 
compacted topics.  This would avoid the need to run an HDFS cluster just to 
store Flink checkpoints.

For inspiration you may want to take a look at how Samza checkpoints a task's 
local state to a Kafka topic, and how the newer Kafka consumers checkpoint 
their offsets to Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3035) Redis as State Backend

2016-04-03 Thread Elias Levy (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223577#comment-15223577
 ] 

Elias Levy commented on FLINK-3035:
---

This seems like a bad idea. Regis is not a strongly consistent distributed data 
store, which is what you want to securely store your state snapshots.

> Redis as State Backend
> --
>
> Key: FLINK-3035
> URL: https://issues.apache.org/jira/browse/FLINK-3035
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming
>Reporter: Matthias J. Sax
>Assignee: Subhobrata Dey
>Priority: Minor
>
> Add Redis as a state backend for distributed snapshots.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2375) Add Approximate Adamic Adar Similarity method using BloomFilters

2016-04-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223490#comment-15223490
 ] 

ASF GitHub Bot commented on FLINK-2375:
---

Github user shghatge closed the pull request at:

https://github.com/apache/flink/pull/923


> Add Approximate Adamic Adar Similarity method using BloomFilters
> 
>
> Key: FLINK-2375
> URL: https://issues.apache.org/jira/browse/FLINK-2375
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Reporter: Shivani Ghatge
>Assignee: Shivani Ghatge
>Priority: Minor
>
> Just as Jaccard, the Adamic-Adar algorithm measures the similarity between a 
> set of nodes. However, instead of counting the common neighbors and dividing 
> them by the total number of neighbors, the similarity is weighted according 
> to the vertex degrees. In particular, it's equal to log(1/numberOfEdges).
> The Adamic-Adar algorithm can be broken into three steps:
> 1). For each vertex, compute the log of its inverse degrees (with the formula 
> above) and set it as the vertex value.
> 2). Each vertex will then send this new computed value along with a list of 
> neighbors to the targets of its out-edges
> 3). Weigh the edges with the Adamic-Adar index: Sum over n from CN of 
> log(1/k_n)(CN is the set of all common neighbors of two vertices x, y. k_n is 
> the degree of node n). See [2]
> Using BloomFilters we increase the scalability of the algorithm. The values 
> calculated for the edges will be approximate.
> Prerequisites:
> Full understanding of the Jaccard Similarity Measure algorithm
> Reading the associated literature:
> [1] http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf
> [2] 
> http://stackoverflow.com/questions/22565620/fast-algorithm-to-compute-adamic-adar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2310) Add an Adamic-Adar Similarity example

2016-04-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223491#comment-15223491
 ] 

ASF GitHub Bot commented on FLINK-2310:
---

Github user shghatge closed the pull request at:

https://github.com/apache/flink/pull/892


> Add an Adamic-Adar Similarity example
> -
>
> Key: FLINK-2310
> URL: https://issues.apache.org/jira/browse/FLINK-2310
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Reporter: Andra Lungu
>Assignee: Shivani Ghatge
>Priority: Minor
>
> Just as Jaccard, the Adamic-Adar algorithm measures the similarity between a 
> set of nodes. However, instead of counting the common neighbors and dividing 
> them by the total number of neighbors, the similarity is weighted according 
> to the vertex degrees. In particular, it's equal to log(1/numberOfEdges).
> The Adamic-Adar algorithm can be broken into three steps: 
> 1). For each vertex, compute the log of its inverse degrees (with the formula 
> above) and set it as the vertex value. 
> 2). Each vertex will then send this new computed value along with a list of 
> neighbors to the targets of its out-edges
> 3). Weigh the edges with the Adamic-Adar index: Sum over n from CN of 
> log(1/k_n)(CN is the set of all common neighbors of two vertices x, y. k_n is 
> the degree of node n). See [2]
> Prerequisites: 
> - Full understanding of the Jaccard Similarity Measure algorithm
> - Reading the associated literature: 
> [1] http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf
> [2] 
> http://stackoverflow.com/questions/22565620/fast-algorithm-to-compute-adamic-adar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2375] Add Approximate Adamic Adar Simil...

2016-04-03 Thread shghatge
Github user shghatge closed the pull request at:

https://github.com/apache/flink/pull/923


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2310] Add an Adamic Adar Similarity exa...

2016-04-03 Thread shghatge
Github user shghatge closed the pull request at:

https://github.com/apache/flink/pull/892


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Comment Edited] (FLINK-3669) WindowOperator registers a lot of timers at StreamTask

2016-04-03 Thread Konstantin Knauf (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223402#comment-15223402
 ] 

Konstantin Knauf edited comment on FLINK-3669 at 4/3/16 5:32 PM:
-

I tried to implement the 1.. The hard part, I think, is to store the 
{{ScheduledFuture}}s to in the {{WindowOperator}}. We could use a 
{{Map}}, but this only works if we also implement timer 
coalescing. Besides that I am unsure about how to de/serialize this map. Could 
you give me pointer?

Edit. Ok, serializing the future does not make sense, so it could not be 
checkpointed, which is not necessarily a problem, I think.  Additionally, this 
should be a {{Map}} instead.


was (Author: knaufk):
I tried to implement the 1.. The hard part, I think, is to store the 
{{ScheduledFuture}}s to in the {{WindowOperator}}. We could use a 
{{Map}}, but this only works if we also implement timer 
coalescing. Besides that I am unsure about how to de/serialize this map. Could 
you give me pointer?

> WindowOperator registers a lot of timers at StreamTask
> --
>
> Key: FLINK-3669
> URL: https://issues.apache.org/jira/browse/FLINK-3669
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.0.1
>Reporter: Aljoscha Krettek
>Priority: Blocker
>
> Right now, the WindowOperator registers a timer at the StreamTask for every 
> processing-time timer that a Trigger registers. We should combine several 
> registered trigger timers to only register one low-level timer (timer 
> coalescing).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3669) WindowOperator registers a lot of timers at StreamTask

2016-04-03 Thread Konstantin Knauf (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223402#comment-15223402
 ] 

Konstantin Knauf commented on FLINK-3669:
-

I tried to implement the 1.. The hard part, I think, is to store the 
{{ScheduledFuture}}s to in the {{WindowOperator}}. We could use a 
{{Map}}, but this only works if we also implement timer 
coalescing. Besides that I am unsure about how to de/serialize this map. Could 
you give me pointer?

> WindowOperator registers a lot of timers at StreamTask
> --
>
> Key: FLINK-3669
> URL: https://issues.apache.org/jira/browse/FLINK-3669
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.0.1
>Reporter: Aljoscha Krettek
>Priority: Blocker
>
> Right now, the WindowOperator registers a timer at the StreamTask for every 
> processing-time timer that a Trigger registers. We should combine several 
> registered trigger timers to only register one low-level timer (timer 
> coalescing).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3657) Change access of DataSetUtils.countElements() to 'public'

2016-04-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223375#comment-15223375
 ] 

ASF GitHub Bot commented on FLINK-3657:
---

Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/1829#issuecomment-205001401
  
+1 for merging to master. 

But would love to isolate the changes to exclude refactoring changes if 
possible


> Change access of DataSetUtils.countElements() to 'public' 
> --
>
> Key: FLINK-3657
> URL: https://issues.apache.org/jira/browse/FLINK-3657
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 1.0.1
>
>
> The access of DatasetUtils.countElements() is presently 'private', change 
> that to be 'public'. We happened to be replicating the functionality in our 
> project and realized the method already existed in Flink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3657: Change access of DataSetUtils.coun...

2016-04-03 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/1829#issuecomment-205001401
  
+1 for merging to master. 

But would love to isolate the changes to exclude refactoring changes if 
possible


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3688) ClassCastException in StreamRecordSerializer when WindowOperator.trigger() is called and TimeCharacteristic = ProcessingTime

2016-04-03 Thread Konstantin Knauf (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223353#comment-15223353
 ] 

Konstantin Knauf commented on FLINK-3688:
-

Regarding dropping Watermarks in {{StreamRecordSerializer.serialize()}}. Did 
you mean to change {{StreamRecordSerializer extends 
TypeSerializer ClassCastException in StreamRecordSerializer when WindowOperator.trigger() is 
> called and TimeCharacteristic = ProcessingTime
> 
>
> Key: FLINK-3688
> URL: https://issues.apache.org/jira/browse/FLINK-3688
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Konstantin Knauf
>Assignee: Konstantin Knauf
>Priority: Critical
>
> Hi,
> when using {{TimeCharacteristics.ProcessingTime}} a ClassCastException is 
> thrown in {{StreamRecordSerializer}} when 
> {{WindowOperator.processWatermark()}} is called from 
> {{WindowOperator.trigger()}}, i.e. whenever a ProcessingTimeTimer is 
> triggered. 
> The problem seems to be that {{processWatermark()}} is also called in 
> {{trigger()}}, when time characteristic is ProcessingTime, but in 
> {{RecordWriterOutput}} {{enableWatermarkMultiplexing}} is {{false}} and the 
> {{TypeSerializer}} is a {{StreamRecordSerializer}}, which ultimately leads to 
> the ClassCastException. Do you agree?
> If this is indeed a bug, there several possible solutions.
> # Only calling {{processWatermark()}} in {{trigger()}}, when 
> TimeCharacteristic is EventTime
> # Not calling {{processWatermark()}} in {{trigger()}} at all, instead wait 
> for the next watermark to trigger the EventTimeTimers with a timestamp behind 
> the current watermark. This is, of course, a trade off. 
> # Using {{MultiplexingStreamRecordSerializer}} all the time, but I have no 
> idea what the side effect of this change would be. I assume there is a reason 
> for existence of the StreamRecordSerializer ;)
> StackTrace: 
> {quote}
> TimerException\{java.lang.RuntimeException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord\}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:716)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:93)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.emitWatermark(OperatorChain.java:370)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processWatermark(WindowOperator.java:293)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.trigger(WindowOperator.java:323)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:710)
>   ... 7 more
> Caused by: java.lang.ClassCastException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecordSerializer.serialize(StreamRecordSerializer.java:42)
>   at 
> 

[GitHub] flink pull request: FLINK-3657: Change access of DataSetUtils.coun...

2016-04-03 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1829#issuecomment-204931011
  
@smarthi I think it's not too specific at all since its part of 
`DataSetUtils` and not `DataSet`. Let's merge the changes to master for `1.1`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3657) Change access of DataSetUtils.countElements() to 'public'

2016-04-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223193#comment-15223193
 ] 

ASF GitHub Bot commented on FLINK-3657:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1829#issuecomment-204931011
  
@smarthi I think it's not too specific at all since its part of 
`DataSetUtils` and not `DataSet`. Let's merge the changes to master for `1.1`.


> Change access of DataSetUtils.countElements() to 'public' 
> --
>
> Key: FLINK-3657
> URL: https://issues.apache.org/jira/browse/FLINK-3657
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 1.0.1
>
>
> The access of DatasetUtils.countElements() is presently 'private', change 
> that to be 'public'. We happened to be replicating the functionality in our 
> project and realized the method already existed in Flink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)