[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233304#comment-15233304
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user danielblazevski commented on the pull request:

https://github.com/apache/flink/pull/1220#issuecomment-207685012
  
@hsaputra I added apache/flink as upstream, namely:
`git remote add upstream https://github.com/apache/flink.git`
Then I ran what Chiwan above suggested, namely:
```
# fetch updated master branch
git fetch upstream master
# checkout local master branch
git checkout master 
# merge local master branch and upstream master branch (this should be 
fast-forward merge.)
git merge upstream/master
# checkout local FLINK-1745 branch
git checkout FLINK-1745
# rebase FLINK-1745 on local master branch
git rebase master
# force push local FLINK-1745 branch to github's FLINK-1745 branch
git push origin +FLINK-1745
```
I then moved the 4 knn files originally in flink-staging/ to 
flink-libraries/ and pushed again. 

The unfortunate thing now is that when I run `mvn clean package 
-DskipTests` I get errors (I can show you if you'd like, but I assume the 
Travis CI build won't go through and the error will pop up there too).  Did I 
do something wrong?  The good news is that I made a copy of the directory that 
I was working in since I've had rebasing problems before, so I can always try 
to go back to that and do a force push.

I wonder, since I'm only adding new files, whether it's even easier to just 
clone `apache/master`, run `mvn clean package -DskipTests`, put the new files in 
there and submit a new PR?



> Add exact k-nearest-neighbours algorithm to machine learning library
> 
>
> Key: FLINK-1745
> URL: https://issues.apache.org/jira/browse/FLINK-1745
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Daniel Blazevski
>  Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, 
> it is still used as a means to classify data and to do regression. This issue 
> focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as 
> proposed in [2].
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]
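
For intuition, a naive single-machine exact kNN looks like the following sketch (brute force over all points; the distributed H-BNLJ/H-BRJ variants from [2] turn this all-pairs comparison into a join over Flink DataSets, which this sketch does not attempt):

{code}
import java.util.Arrays;
import java.util.Comparator;

public class NaiveKnn {
    // Returns the k points nearest to 'query' under Euclidean distance.
    static double[][] kNearest(double[] query, double[][] points, int k) {
        return Arrays.stream(points)
                .sorted(Comparator.comparingDouble(p -> distance(p, query)))
                .limit(k)
                .toArray(double[][]::new);
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
{code}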





[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2016-04-08 Thread danielblazevski
Github user danielblazevski commented on the pull request:

https://github.com/apache/flink/pull/1220#issuecomment-207685012
  
@hsaputra I added apache/flink as upstream, namely:
`git remote add upstream https://github.com/apache/flink.git`
Then I ran what Chiwan above suggested, namely:
```
# fetch updated master branch
git fetch upstream master
# checkout local master branch
git checkout master 
# merge local master branch and upstream master branch (this should be 
fast-forward merge.)
git merge upstream/master
# checkout local FLINK-1745 branch
git checkout FLINK-1745
# rebase FLINK-1745 on local master branch
git rebase master
# force push local FLINK-1745 branch to github's FLINK-1745 branch
git push origin +FLINK-1745
```
I then moved the 4 knn files originally in flink-staging/ to 
flink-libraries/ and pushed again. 

The unfortunate thing now is that when I run `mvn clean package 
-DskipTests` I get errors (I can show you if you'd like, but I assume the 
Travis CI build won't go through and the error will pop up there too).  Did I 
do something wrong?  The good news is that I made a copy of the directory that 
I was working in since I've had rebasing problems before, so I can always try 
to go back to that and do a force push.

I wonder, since I'm only adding new files, whether it's even easier to just 
clone `apache/master`, run `mvn clean package -DskipTests`, put the new files in 
there and submit a new PR?





[jira] [Commented] (FLINK-3721) Min and max accumulators

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232942#comment-15232942
 ] 

ASF GitHub Bot commented on FLINK-3721:
---

GitHub user greghogan opened a pull request:

https://github.com/apache/flink/pull/1866

[FLINK-3721] Min and max accumulators

Flink already contains DoubleCounter, IntCounter, and LongCounter for 
adding numbers. This adds equivalent accumulators for storing the minimum and 
maximum double, int, and long values.
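
As a rough illustration of the idea (a hypothetical sketch against Flink's public `Accumulator`/`SimpleAccumulator` interfaces; the PR's actual class names and details may differ):

```
import org.apache.flink.api.common.accumulators.Accumulator;
import org.apache.flink.api.common.accumulators.SimpleAccumulator;

// Tracks the minimum of all added doubles.
public class DoubleMin implements SimpleAccumulator<Double> {
    private double min = Double.POSITIVE_INFINITY;

    @Override
    public void add(Double value) {
        min = Math.min(min, value);
    }

    @Override
    public Double getLocalValue() {
        return min;
    }

    @Override
    public void resetLocal() {
        min = Double.POSITIVE_INFINITY;
    }

    @Override
    public void merge(Accumulator<Double, Double> other) {
        // Combine partial results from parallel subtasks.
        min = Math.min(min, other.getLocalValue());
    }

    @Override
    public DoubleMin clone() {
        DoubleMin copy = new DoubleMin();
        copy.min = this.min;
        return copy;
    }
}
```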

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/greghogan/flink 3721_min_and_max_accumulators

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1866.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1866


commit fb413e766d84ba96b4f699fd7597c89082121625
Author: Greg Hogan 
Date:   2016-04-08T19:06:39Z

[FLINK-3721] Min and max accumulators

Flink already contains DoubleCounter, IntCounter, and LongCounter for
adding numbers. This adds equivalent accumulators for storing the
minimum and maximum double, int, and long values.




> Min and max accumulators
> 
>
> Key: FLINK-3721
> URL: https://issues.apache.org/jira/browse/FLINK-3721
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
>
> Flink already contains {{DoubleCounter}}, {{IntCounter}}, and {{LongCounter}} 
> for adding numbers. This will add equivalent accumulators for storing the 
> minimum and maximum double, int, and long values.





[GitHub] flink pull request: [FLINK-3721] Min and max accumulators

2016-04-08 Thread greghogan
GitHub user greghogan opened a pull request:

https://github.com/apache/flink/pull/1866

[FLINK-3721] Min and max accumulators

Flink already contains DoubleCounter, IntCounter, and LongCounter for 
adding numbers. This adds equivalent accumulators for storing the minimum and 
maximum double, int, and long values.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/greghogan/flink 3721_min_and_max_accumulators

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1866.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1866


commit fb413e766d84ba96b4f699fd7597c89082121625
Author: Greg Hogan 
Date:   2016-04-08T19:06:39Z

[FLINK-3721] Min and max accumulators

Flink already contains DoubleCounter, IntCounter, and LongCounter for
adding numbers. This adds equivalent accumulators for storing the
minimum and maximum double, int, and long values.






[GitHub] flink pull request: [FLINK][3720] Added Warm Starts

2016-04-08 Thread rawkintrevo
Github user rawkintrevo commented on the pull request:

https://github.com/apache/flink/pull/1865#issuecomment-207584968
  
There is an error in the docs (the code isn't in the highlighting) - waiting to 
see what else needs to be done (per reviewer comments) to minimize the commits 
/ squashing. 




[GitHub] flink pull request: [FLINK][3720] Added Warm Starts

2016-04-08 Thread rawkintrevo
GitHub user rawkintrevo opened a pull request:

https://github.com/apache/flink/pull/1865

[FLINK][3720] Added Warm Starts

This adds warm-start functionality for IterativeSolver and 
exposes/implements it in Multiple Linear Regression.

Everything in the checklist should be GTG except updating JavaDocs

- [x] General
- [ ] Documentation
  - JavaDoc for public methods has been added
- [x] Tests & Build


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rawkintrevo/flink warmstarts

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1865.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1865


commit 8576054f7f34361f96a0e815c2cb4f16ab9b688b
Author: Trevor Grant 
Date:   2016-04-08T15:06:28Z

[FLINK][3720] Added Warm Starts

[FLINK][3720] Added Tests and Documentation






[jira] [Commented] (FLINK-3398) Flink Kafka consumer should support auto-commit opt-outs

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232739#comment-15232739
 ] 

ASF GitHub Bot commented on FLINK-3398:
---

Github user shikhar commented on the pull request:

https://github.com/apache/flink/pull/1690#issuecomment-207563659
  
Another thing is that the presence of offsets in ZK (0.8) / Kafka (0.9) also 
affects what happens when a job starts without a checkpoint / savepoint. 
If they are present, the reset mode won't be respected, and there is potential 
for dropping messages (e.g. if the recovery strategy in the absence of a valid 
checkpoint or savepoint is to process from the beginning of available data with 
offset reset mode 'earliest').
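
To make the scenario concrete, a sketch with the 0.8 consumer (the property names are Kafka's own; the topic and group values are illustrative):

```
import java.util.Properties;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class ResetModeExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "localhost:2181");
        props.setProperty("group.id", "my-group");
        // Intended start position when no committed offset exists
        // ("smallest" is the 0.8 spelling of 0.9's "earliest").
        props.setProperty("auto.offset.reset", "smallest");

        // If ZK already holds offsets for "my-group", those take precedence
        // over the reset mode above, so a job starting without a
        // checkpoint/savepoint may skip records it expected to reprocess.
        env.addSource(new FlinkKafkaConsumer08<>(
                "my-topic", new SimpleStringSchema(), props))
           .print();
        env.execute("reset-mode example");
    }
}
```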


> Flink Kafka consumer should support auto-commit opt-outs
> 
>
> Key: FLINK-3398
> URL: https://issues.apache.org/jira/browse/FLINK-3398
> Project: Flink
>  Issue Type: Bug
>Reporter: Shikhar Bhushan
>
> Currently the Kafka source will commit consumer offsets to Zookeeper, either 
> upon a checkpoint if checkpointing is enabled, otherwise periodically based 
> on {{auto.commit.interval.ms}}
> It should be possible to opt-out of committing consumer offsets to Zookeeper. 
> Kafka has this config as {{auto.commit.enable}} (0.8) and 
> {{enable.auto.commit}} (0.9).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3398: Allow for opting-out from Kafka of...

2016-04-08 Thread shikhar
Github user shikhar commented on the pull request:

https://github.com/apache/flink/pull/1690#issuecomment-207563659
  
Another thing is that the presence of offsets in ZK (0.8) / Kafka (0.9) also 
affects what happens when a job starts without a checkpoint / savepoint. 
If they are present, the reset mode won't be respected, and there is potential 
for dropping messages (e.g. if the recovery strategy in the absence of a valid 
checkpoint or savepoint is to process from the beginning of available data with 
offset reset mode 'earliest').




[jira] [Created] (FLINK-3721) Min and max accumulators

2016-04-08 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3721:
-

 Summary: Min and max accumulators
 Key: FLINK-3721
 URL: https://issues.apache.org/jira/browse/FLINK-3721
 Project: Flink
  Issue Type: New Feature
  Components: Core
Affects Versions: 1.1.0
Reporter: Greg Hogan
Assignee: Greg Hogan
Priority: Minor


Flink already contains {{DoubleCounter}}, {{IntCounter}}, and {{LongCounter}} 
for adding numbers. This will add equivalent accumulators for storing the 
minimum and maximum double, int, and long values.





[jira] [Commented] (FLINK-2544) Some test cases using PowerMock fail with Java 8u20

2016-04-08 Thread Soila Kavulya (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232697#comment-15232697
 ] 

Soila Kavulya commented on FLINK-2544:
--

+1 for documenting this. I ran into this while building the unit tests with 
Java 8u20. It would be nice to have it in the README under "Building Apache 
Flink from source".

> Some test cases using PowerMock fail with Java 8u20
> ---
>
> Key: FLINK-2544
> URL: https://issues.apache.org/jira/browse/FLINK-2544
> Project: Flink
>  Issue Type: Bug
>Reporter: Till Rohrmann
>Priority: Minor
>
> I observed that some of the test cases using {{PowerMockRunner}} fail with 
> Java 8u20 with the following error:
> {code}
> java.lang.VerifyError: Bad <init> method call from inside of a branch
> Exception Details:
>   Location:
> 
> org/apache/flink/client/program/ClientTest$SuccessReturningActor.<init>()V 
> @32: invokespecial
>   Reason:
> Error exists in the bytecode
>   Bytecode:
> 0x000: 2a4c 1214 b800 1a03 bd00 0d12 1bb8 001f
> 0x010: b800 254e 2db2 0029 a500 0e2a 01c0 002b
> 0x020: b700 2ea7 0009 2bb7 0030 0157 2a01 4c01
> 0x030: 4d01 4e2b 01a5 0008 2b4e a700 0912 32b8
> 0x040: 001a 4e2d 1234 03bd 000d 1236 b800 1f12
> 0x050: 32b8 003a 3a04 1904 b200 29a6 000a b800
> 0x060: 3c4d a700 0919 04c0 0011 4d2c b800 42b5
> 0x070: 0046 b1
>   Stackmap Table:
> full_frame(@38,{UninitializedThis,UninitializedThis,Top,Object[#13]},{})
> full_frame(@44,{Object[#2],Object[#2],Top,Object[#13]},{})
> full_frame(@61,{Object[#2],Null,Null,Null},{Object[#2]})
> full_frame(@67,{Object[#2],Null,Null,Object[#15]},{Object[#2]})
> 
> full_frame(@101,{Object[#2],Null,Null,Object[#15],Object[#13]},{Object[#2]})
> 
> full_frame(@107,{Object[#2],Null,Object[#17],Object[#15],Object[#13]},{Object[#2]})
>   at java.lang.Class.getDeclaredConstructors0(Native Method)
>   at java.lang.Class.privateGetDeclaredConstructors(Class.java:2658)
>   at java.lang.Class.getConstructor0(Class.java:3062)
>   at java.lang.Class.getDeclaredConstructor(Class.java:2165)
>   at akka.util.Reflect$$anonfun$4.apply(Reflect.scala:86)
>   at akka.util.Reflect$$anonfun$4.apply(Reflect.scala:86)
>   at scala.util.Try$.apply(Try.scala:161)
>   at akka.util.Reflect$.findConstructor(Reflect.scala:86)
>   at akka.actor.NoArgsReflectConstructor.<init>(Props.scala:359)
>   at akka.actor.IndirectActorProducer$.apply(Props.scala:308)
>   at akka.actor.Props.producer(Props.scala:176)
>   at akka.actor.Props.<init>(Props.scala:189)
>   at akka.actor.Props$.create(Props.scala:99)
>   at akka.actor.Props$.create(Props.scala:99)
>   at akka.actor.Props.create(Props.scala)
>   at 
> org.apache.flink.client.program.ClientTest.shouldSubmitToJobClient(ClientTest.java:143)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:68)
>   at 
> org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.runTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:310)
>   at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:88)
>   at 
> org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:96)
>   at 
> org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.executeTest(PowerMockJUnit44RunnerDelegateImpl.java:294)
>   at 
> org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTestInSuper(PowerMockJUnit47RunnerDelegateImpl.java:127)
>   at 
> org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTest(PowerMockJUnit47RunnerDelegateImpl.java:82)
>   at 
> org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.runBeforesThenTestThenAfters(PowerMockJUnit44RunnerDelegateImpl.java:282)
>   at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:86)
>   at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:49)
>   at 
> org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.invokeTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:207)
>   at 
> org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.runMethods(PowerMockJUnit44RunnerDelegateImpl.java:146)
>   at 
> 

[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232661#comment-15232661
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/1220#issuecomment-207547199
  
@danielblazevski : Sorry, but could you help rebase the conflicts for this 
PR? Thanks


> Add exact k-nearest-neighbours algorithm to machine learning library
> 
>
> Key: FLINK-1745
> URL: https://issues.apache.org/jira/browse/FLINK-1745
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Daniel Blazevski
>  Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, 
> it is still used as a means to classify data and to do regression. This issue 
> focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as 
> proposed in [2].
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]





[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2016-04-08 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/1220#issuecomment-207547199
  
@danielblazevski : Sorry, but could you help rebase the conflicts for this 
PR? Thanks




[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232653#comment-15232653
 ] 

ASF GitHub Bot commented on FLINK-2909:
---

Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1807#issuecomment-207546079
  
@vasia thank you for the recommendations, the improvements are almost ready 
to push. Running the `Graph500` example on an AWS c4.8xlarge (with 36 'virtual 
cores') generated a billion edges (scale 26, edge factor 16) in 25.8s wall time 
(23.4s execution time). When simplifying the graph with 'clip-and-flip' the 
runtime was 2m33s wall time (2m31s execution time) and when performing a full 
flip (thus doubling the number of edges) the runtime was 5m00s wall time (4m58s 
execution time).

Many algorithms require edges to be sorted so that cost is already 
accounted for. Edge generation should scale beautifully.
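
For reference, generating such a graph looks roughly like this (based on the imports visible in the PR's `Graph500` example; the exact constructor signature may differ in the final PR):

```
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.generator.RMatGraph;
import org.apache.flink.graph.generator.random.JDKRandomGeneratorFactory;
import org.apache.flink.types.LongValue;
import org.apache.flink.types.NullValue;

public class RMatExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Graph500-style parameters: scale 26, edge factor 16
        long vertexCount = 1L << 26;
        long edgeCount = 16 * vertexCount;

        Graph<LongValue, NullValue, NullValue> graph =
            new RMatGraph<>(env, new JDKRandomGeneratorFactory(),
                vertexCount, edgeCount)
                .generate();

        System.out.println(graph.numberOfEdges());
    }
}
```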


> Gelly Graph Generators
> --
>
> Key: FLINK-2909
> URL: https://issues.apache.org/jira/browse/FLINK-2909
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> Include a selection of graph generators in Gelly. Generated graphs will be 
> useful for performing scalability, stress, and regression testing as well as 
> benchmarking and comparing algorithms, for both Flink users and developers. 
> Generated data is infinitely scalable yet described by a few simple 
> parameters and can often substitute for user data or sharing large files when 
> reporting issues.
> There are multiple categories of graphs as documented by 
> [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html]
>  and elsewhere.
> Graphs may be well-defined, e.g. the [Chvátal 
> graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be 
> sufficiently small to populate locally.
> Graphs may be scalable, e.g. complete and star graphs. These should use 
> Flink's distributed parallelism.
> Graphs may be stochastic, e.g. [RMat 
> graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf]. 
> A key consideration is that the graphs should source randomness from a 
> seedable PRNG and generate the same Graph regardless of parallelism.





[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators

2016-04-08 Thread greghogan
Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1807#issuecomment-207546079
  
@vasia thank you for the recommendations, the improvements are almost ready 
to push. Running the `Graph500` example on an AWS c4.8xlarge (with 36 'virtual 
cores') generated a billion edges (scale 26, edge factor 16) in 25.8s wall time 
(23.4s execution time). When simplifying the graph with 'clip-and-flip' the 
runtime was 2m33s wall time (2m31s execution time) and when performing a full 
flip (thus doubling the number of edges) the runtime was 5m00s wall time (4m58s 
execution time).

Many algorithms require edges to be sorted so that cost is already 
accounted for. Edge generation should scale beautifully.




[jira] [Commented] (FLINK-3709) [streaming] Graph event rates over time

2016-04-08 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232646#comment-15232646
 ] 

Nick Dimiduk commented on FLINK-3709:
-

That's good news, Robert. Is there a JIRA I should be following for this work? 
Maybe this one becomes a subtask? Thanks.

> [streaming] Graph event rates over time
> ---
>
> Key: FLINK-3709
> URL: https://issues.apache.org/jira/browse/FLINK-3709
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Reporter: Nick Dimiduk
>
> The streaming server job page displays bytes and records sent and received, 
> which answers the question "is data moving?" The next obvious question is "is 
> data moving over time?" That could be answered by a chart displaying 
> bytes/events rates. This would be a great chart to add to this display.





[jira] [Assigned] (FLINK-3279) Optionally disable DistinctOperator combiner

2016-04-08 Thread Greg Hogan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Hogan reassigned FLINK-3279:
-

Assignee: Greg Hogan

> Optionally disable DistinctOperator combiner
> 
>
> Key: FLINK-3279
> URL: https://issues.apache.org/jira/browse/FLINK-3279
> Project: Flink
>  Issue Type: New Feature
>  Components: DataSet API
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
>
> Calling {{DataSet.distinct()}} executes {{DistinctOperator.DistinctFunction}} 
> which is a combinable {{RichGroupReduceFunction}}. Sometimes we know that 
> there will be few duplicate records and disabling the combine would improve 
> performance.
> I propose adding {{public DistinctOperator setCombinable(boolean 
> combinable)}} to {{DistinctOperator}}.
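
A sketch of how the proposed method might be used (hypothetical until merged; {{setCombinable}} does not exist in Flink's API yet):

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class DistinctWithoutCombine {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> ids = env.fromElements("a", "b", "a", "c");

        // Proposed API (hypothetical): skip the combine phase when few
        // duplicates are expected, avoiding its sorting/hashing cost.
        DataSet<String> unique = ids.distinct().setCombinable(false);

        unique.print();
    }
}
{code}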





[jira] [Commented] (FLINK-1404) Add support to cache intermediate results

2016-04-08 Thread Ovidiu Marcu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232639#comment-15232639
 ] 

Ovidiu Marcu commented on FLINK-1404:
-

Hi, is this related to the 'persistent, blocking, no back pressure intermediate 
result partition' variant specified in 
https://github.com/apache/flink/pull/254?

> Add support to cache intermediate results
> -
>
> Key: FLINK-1404
> URL: https://issues.apache.org/jira/browse/FLINK-1404
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Runtime
>Reporter: Ufuk Celebi
>Assignee: Maximilian Michels
>
> With blocking intermediate results (FLINK-1350) and proper partition state 
> management (FLINK-1359) it is necessary to allow the network buffer pool to 
> request eviction of historic intermediate results when not enough buffers are 
> available. With the currently available pipelined intermediate partitions 
> this is not an issue, because buffer pools can be released as soon as a 
> partition is consumed.
> We need to be able to trigger the recycling of buffers held by historic 
> intermediate results when not enough buffers are available for new local 
> pools.





[GitHub] flink pull request: [FLINK-3469] Improve documentation for groupin...

2016-04-08 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1858#issuecomment-207507759
  
+1 to merge




[jira] [Commented] (FLINK-3469) Improve documentation for grouping keys

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232466#comment-15232466
 ] 

ASF GitHub Bot commented on FLINK-3469:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1858#issuecomment-207507759
  
+1 to merge


> Improve documentation for grouping keys
> ---
>
> Key: FLINK-3469
> URL: https://issues.apache.org/jira/browse/FLINK-3469
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> The transformation documentation for "Reduce on DataSet Grouped by 
> KeySelector Function" uses a field expression in the Java example.
> There are four ways to specify keys and only two have named examples in the 
> documentation. Expand the documentation to cover all cases.
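
For reference, the Java-side ways of specifying keys mentioned in the issue (a quick sketch; the fourth way, case class fields, is Scala-only):

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;

public class GroupingKeys {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Tuple2<String, Integer>> words = env.fromElements(
            Tuple2.of("a", 1), Tuple2.of("a", 2), Tuple2.of("b", 3));

        // Field position (tuples only)
        words.groupBy(0).sum(1).print();

        // Field expression
        words.groupBy("f0").sum(1).print();

        // KeySelector function
        words.groupBy(new KeySelector<Tuple2<String, Integer>, String>() {
            @Override
            public String getKey(Tuple2<String, Integer> value) {
                return value.f0;
            }
        }).sum(1).print();
    }
}
{code}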





[jira] [Assigned] (FLINK-3477) Add hash-based combine strategy for ReduceFunction

2016-04-08 Thread Gabor Gevay (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Gevay reassigned FLINK-3477:
--

Assignee: Gabor Gevay

> Add hash-based combine strategy for ReduceFunction
> --
>
> Key: FLINK-3477
> URL: https://issues.apache.org/jira/browse/FLINK-3477
> Project: Flink
>  Issue Type: Sub-task
>  Components: Local Runtime
>Reporter: Fabian Hueske
>Assignee: Gabor Gevay
>
> This issue is about adding a hash-based combine strategy for ReduceFunctions.
> The interface of the {{reduce()}} method is as follows:
> {code}
> public T reduce(T v1, T v2)
> {code}
> Input type and output type are identical and the function returns only a 
> single value. A ReduceFunction is applied incrementally to compute a final 
> aggregated value. This allows holding the pre-aggregated value in a hash table 
> and updating it with each function call. 
> The hash-based strategy requires a special implementation of an in-memory hash 
> table. The hash table should support in-place updates of elements (if the 
> updated value has the same size as the old value) but also appending updates 
> with invalidation of the old value (if the binary length of the new value 
> differs). The hash table needs to be able to evict and emit all elements if 
> it runs out of memory.
> We should also add {{HASH}} and {{SORT}} compiler hints to 
> {{DataSet.reduce()}} and {{Grouping.reduce()}} to allow users to pick the 
> execution strategy.
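
A conceptual sketch of that strategy (plain Java collections, not Flink's managed-memory hash table):

{code}
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Pre-aggregated values live in a hash table keyed by the grouping key
// and are updated with every incoming record, which is possible exactly
// because reduce(v1, v2) can be applied incrementally.
public class HashCombinerSketch<K, T> {
    private final Map<K, T> table = new LinkedHashMap<>();
    private final BinaryOperator<T> reduceFunction;

    public HashCombinerSketch(BinaryOperator<T> reduceFunction) {
        this.reduceFunction = reduceFunction;
    }

    public void combine(K key, T value) {
        // merge() either inserts the value or applies the reduce function
        // to the held pre-aggregate and the new record.
        table.merge(key, value, reduceFunction);
    }

    // A real implementation would evict and emit all entries downstream
    // when it runs out of (managed) memory; here we just hand them over.
    public Map<K, T> evictAll() {
        Map<K, T> result = new HashMap<>(table);
        table.clear();
        return result;
    }
}
{code}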





[jira] [Commented] (FLINK-3716) Kafka08ITCase.testFailOnNoBroker() timing out before it has a chance to pass

2016-04-08 Thread Todd Lisonbee (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232384#comment-15232384
 ] 

Todd Lisonbee commented on FLINK-3716:
--

I opened a pull request with a fix.  

By decreasing the socket timeout the test passes for me locally and also runs a 
lot faster.  The other solution would be to increase the JUnit timeout but that 
seemed less desirable.  I'm still not sure why this test would pass on some 
systems but not mine.  I suspect the systems where it is passing have extra 
configuration.  But in any case this fix seemed valid.

https://github.com/apache/flink/pull/1864

> Kafka08ITCase.testFailOnNoBroker() timing out before it has a chance to pass
> 
>
> Key: FLINK-3716
> URL: https://issues.apache.org/jira/browse/FLINK-3716
> Project: Flink
>  Issue Type: Bug
>Reporter: Todd Lisonbee
>  Labels: test-stability
>
> This is on the latest master 4/7/2016 with `mvn clean verify`.  
> Test also reliably fails running it directly from IntelliJ.
> Test has a 60 second timeout but it seems to need much more time to run (my 
> workstation has server class Xeon).
> Test 
> testFailOnNoBroker(org.apache.flink.streaming.connectors.kafka.Kafka08ITCase) 
> failed with:
> java.lang.Exception: test timed out after 60000 milliseconds
>   at sun.nio.ch.Net.poll(Native Method)
>   at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:954)
>   at 
> sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:204)
>   at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>   at 
> java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>   at kafka.utils.Utils$.read(Utils.scala:380)
>   at 
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>   at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>   at 
> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>   at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)
>   at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:79)
>   at 
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68)
>   at kafka.consumer.SimpleConsumer.send(SimpleConsumer.scala:91)
>   at kafka.javaapi.consumer.SimpleConsumer.send(SimpleConsumer.scala:68)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.getPartitionsForTopic(FlinkKafkaConsumer08.java:521)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.<init>(FlinkKafkaConsumer08.java:218)
>   at 
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl.getConsumer(KafkaTestEnvironmentImpl.java:95)
>   at 
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironment.getConsumer(KafkaTestEnvironment.java:65)
>   at 
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironment.getConsumer(KafkaTestEnvironment.java:73)
>   at 
> org.apache.flink.streaming.connectors.kafka.KafkaConsumerTestBase.runFailOnNoBrokerTest(KafkaConsumerTestBase.java:155)
>   at 
> org.apache.flink.streaming.connectors.kafka.Kafka08ITCase.testFailOnNoBroker(Kafka08ITCase.java:54)





[jira] [Commented] (FLINK-3716) Kafka08ITCase.testFailOnNoBroker() timing out before it has a chance to pass

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232374#comment-15232374
 ] 

ASF GitHub Bot commented on FLINK-3716:
---

GitHub user tlisonbee opened a pull request:

https://github.com/apache/flink/pull/1864

[FLINK-3716] Kafka08ITCase.testFailOnNoBroker() fix

Decreasing socket timeout so that testFailOnNoBroker() will be able to pass 
before the JUnit timeout.
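
For context, the relevant knob is the 0.8 consumer's socket timeout, roughly along these lines (the concrete value here is illustrative, not necessarily what the PR uses):

```
import java.util.Properties;

public class ConsumerTimeoutConfig {
    public static Properties props() {
        Properties props = new Properties();
        // Kafka 0.8 consumer setting: how long a socket read may block.
        // A lower value lets the expected "no broker" failure surface
        // well before the test's 60 s JUnit timeout.
        props.setProperty("socket.timeout.ms", "3000");
        return props;
    }
}
```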

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tlisonbee/flink FLINK-3716

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1864.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1864


commit c30f76392508ab743dfc56ff72cd9fc1fa4770a3
Author: Todd Lisonbee 
Date:   2016-04-08T00:47:19Z

[FLINK-3716] decreasing socket timeout so testFailOnNoBroker() will pass 
before JUnit timeout




> Kafka08ITCase.testFailOnNoBroker() timing out before it has a chance to pass
> 
>
> Key: FLINK-3716
> URL: https://issues.apache.org/jira/browse/FLINK-3716
> Project: Flink
>  Issue Type: Bug
>Reporter: Todd Lisonbee
>  Labels: test-stability
>
> This is on the latest master 4/7/2016 with `mvn clean verify`.  
> Test also reliably fails running it directly from IntelliJ.
> Test has a 60 second timeout but it seems to need much more time to run (my 
> workstation has server class Xeon).
> Test 
> testFailOnNoBroker(org.apache.flink.streaming.connectors.kafka.Kafka08ITCase) 
> failed with:
> java.lang.Exception: test timed out after 60000 milliseconds
>   at sun.nio.ch.Net.poll(Native Method)
>   at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:954)
>   at 
> sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:204)
>   at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>   at 
> java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>   at kafka.utils.Utils$.read(Utils.scala:380)
>   at 
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>   at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>   at 
> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>   at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)
>   at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:79)
>   at 
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68)
>   at kafka.consumer.SimpleConsumer.send(SimpleConsumer.scala:91)
>   at kafka.javaapi.consumer.SimpleConsumer.send(SimpleConsumer.scala:68)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.getPartitionsForTopic(FlinkKafkaConsumer08.java:521)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.<init>(FlinkKafkaConsumer08.java:218)
>   at 
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl.getConsumer(KafkaTestEnvironmentImpl.java:95)
>   at 
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironment.getConsumer(KafkaTestEnvironment.java:65)
>   at 
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironment.getConsumer(KafkaTestEnvironment.java:73)
>   at 
> org.apache.flink.streaming.connectors.kafka.KafkaConsumerTestBase.runFailOnNoBrokerTest(KafkaConsumerTestBase.java:155)
>   at 
> org.apache.flink.streaming.connectors.kafka.Kafka08ITCase.testFailOnNoBroker(Kafka08ITCase.java:54)





[GitHub] flink pull request: [FLINK-3716] Kafka08ITCase.testFailOnNoBroker(...

2016-04-08 Thread tlisonbee
GitHub user tlisonbee opened a pull request:

https://github.com/apache/flink/pull/1864

[FLINK-3716] Kafka08ITCase.testFailOnNoBroker() fix

Decreasing socket timeout so that testFailOnNoBroker() will be able to pass 
before the JUnit timeout.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tlisonbee/flink FLINK-3716

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1864.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1864


commit c30f76392508ab743dfc56ff72cd9fc1fa4770a3
Author: Todd Lisonbee 
Date:   2016-04-08T00:47:19Z

[FLINK-3716] decreasing socket timeout so testFailOnNoBroker() will pass 
before JUnit timeout






[jira] [Updated] (FLINK-3716) Kafka08ITCase.testFailOnNoBroker() timing out before it has a chance to pass

2016-04-08 Thread Todd Lisonbee (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lisonbee updated FLINK-3716:
-
Labels: test-stability  (was: )

> Kafka08ITCase.testFailOnNoBroker() timing out before it has a chance to pass
> 
>
> Key: FLINK-3716
> URL: https://issues.apache.org/jira/browse/FLINK-3716
> Project: Flink
>  Issue Type: Bug
>Reporter: Todd Lisonbee
>  Labels: test-stability
>
> This is on the latest master 4/7/2016 with `mvn clean verify`.  
> Test also reliably fails running it directly from IntelliJ.
> Test has a 60 second timeout but it seems to need much more time to run (my 
> workstation has server class Xeon).
> Test 
> testFailOnNoBroker(org.apache.flink.streaming.connectors.kafka.Kafka08ITCase) 
> failed with:
> java.lang.Exception: test timed out after 60000 milliseconds
>   at sun.nio.ch.Net.poll(Native Method)
>   at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:954)
>   at 
> sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:204)
>   at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>   at 
> java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>   at kafka.utils.Utils$.read(Utils.scala:380)
>   at 
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>   at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>   at 
> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>   at kafka.network.BlockingChannel.receive(BlockingChannel.scala:111)
>   at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:79)
>   at 
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68)
>   at kafka.consumer.SimpleConsumer.send(SimpleConsumer.scala:91)
>   at kafka.javaapi.consumer.SimpleConsumer.send(SimpleConsumer.scala:68)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.getPartitionsForTopic(FlinkKafkaConsumer08.java:521)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.<init>(FlinkKafkaConsumer08.java:218)
>   at 
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl.getConsumer(KafkaTestEnvironmentImpl.java:95)
>   at 
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironment.getConsumer(KafkaTestEnvironment.java:65)
>   at 
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironment.getConsumer(KafkaTestEnvironment.java:73)
>   at 
> org.apache.flink.streaming.connectors.kafka.KafkaConsumerTestBase.runFailOnNoBrokerTest(KafkaConsumerTestBase.java:155)
>   at 
> org.apache.flink.streaming.connectors.kafka.Kafka08ITCase.testFailOnNoBroker(Kafka08ITCase.java:54)





[jira] [Created] (FLINK-3720) Add warm starts for models

2016-04-08 Thread Trevor Grant (JIRA)
Trevor Grant created FLINK-3720:
---

 Summary: Add warm starts for models
 Key: FLINK-3720
 URL: https://issues.apache.org/jira/browse/FLINK-3720
 Project: Flink
  Issue Type: Improvement
  Components: Machine Learning Library
Reporter: Trevor Grant
Assignee: Trevor Grant
 Fix For: 1.1.0


Add 'warm start' to IterativeSolver; a rough sketch follows below. 

- Make the weight vector settable (this will allow for model saving/loading)
- Make the iterator use the existing weight vector if available
- Keep track of what iteration we're on for additional partial fits in SGD (and 
anywhere else it makes sense). 
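
A rough sketch of the shape this could take (FlinkML's IterativeSolver is Scala; this Java code is illustrative only, and names like {{setInitialWeights}}/{{partialFit}} are hypothetical):

{code}
public class WarmStartSolverSketch {
    private double[] weights;        // settable => models can be saved/loaded
    private int iterationsDone = 0;  // carried across partial fits

    public void setInitialWeights(double[] initial) {
        this.weights = initial.clone();
    }

    public void partialFit(double[][] data, double[] labels, int iterations) {
        if (weights == null) {
            weights = new double[data[0].length];  // cold start: zeros
        }
        for (int i = 0; i < iterations; i++) {
            sgdStep(data, labels, 0.1 / Math.sqrt(iterationsDone + 1));
            iterationsDone++;  // keeps the step-size schedule consistent
        }
    }

    private void sgdStep(double[][] data, double[] labels, double stepSize) {
        for (int r = 0; r < data.length; r++) {
            double prediction = 0.0;
            for (int j = 0; j < weights.length; j++) {
                prediction += weights[j] * data[r][j];
            }
            double error = prediction - labels[r];
            for (int j = 0; j < weights.length; j++) {
                weights[j] -= stepSize * error * data[r][j];  // squared loss
            }
        }
    }

    public double[] weights() {
        return weights.clone();
    }
}
{code}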





[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232286#comment-15232286
 ] 

ASF GitHub Bot commented on FLINK-2909:
---

Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59037618
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/generator/AbstractGraphGenerator.java
 ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.generator;
+
+public abstract class AbstractGraphGenerator<K, VV, EV>
--- End diff --

The alternative being to copy `setParallelism` into each generator class?
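
For readers without the diff context: the shared base class would presumably centralize state roughly like this (a hypothetical sketch, not the PR's actual code):

```
public abstract class GeneratorBaseSketch {
    // -1 conventionally means "use the environment's default parallelism"
    protected int parallelism = -1;

    // Shared here precisely so each concrete generator (complete graph,
    // star graph, RMat, ...) doesn't re-implement it.
    public GeneratorBaseSketch setParallelism(int parallelism) {
        this.parallelism = parallelism;
        return this;
    }
}
```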


> Gelly Graph Generators
> --
>
> Key: FLINK-2909
> URL: https://issues.apache.org/jira/browse/FLINK-2909
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> Include a selection of graph generators in Gelly. Generated graphs will be 
> useful for performing scalability, stress, and regression testing as well as 
> benchmarking and comparing algorithms, for both Flink users and developers. 
> Generated data is infinitely scalable yet described by a few simple 
> parameters and can often substitute for user data or sharing large files when 
> reporting issues.
> There are multiple categories of graphs as documented by 
> [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html]
>  and elsewhere.
> Graphs may be well-defined, e.g. the [Chvátal 
> graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be 
> sufficiently small to populate locally.
> Graphs may be scalable, e.g. complete and star graphs. These should use 
> Flink's distributed parallelism.
> Graphs may be stochastic, e.g. [RMat 
> graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf]. 
> A key consideration is that the graphs should source randomness from a 
> seedable PRNG and generate the same Graph regardless of parallelism.





[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators

2016-04-08 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59037618
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/generator/AbstractGraphGenerator.java
 ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.generator;
+
+public abstract class AbstractGraphGenerator<K, VV, EV>
--- End diff --

The alternative being to copy `setParallelism` into each generator class?




[jira] [Commented] (FLINK-3669) WindowOperator registers a lot of timers at StreamTask

2016-04-08 Thread Konstantin Knauf (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232228#comment-15232228
 ] 

Konstantin Knauf commented on FLINK-3669:
-

Is WindowCheckpointingIT the right place to put a test of restore/snapshot 
state? Generally, are there any testing guidelines? 

> WindowOperator registers a lot of timers at StreamTask
> --
>
> Key: FLINK-3669
> URL: https://issues.apache.org/jira/browse/FLINK-3669
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.0.1
>Reporter: Aljoscha Krettek
>Assignee: Konstantin Knauf
>Priority: Blocker
>
> Right now, the WindowOperator registers a timer at the StreamTask for every 
> processing-time timer that a Trigger registers. We should combine several 
> registered trigger timers to only register one low-level timer (timer 
> coalescing).
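
Conceptually, the coalescing could look like this (an illustrative sketch, not the WindowOperator's actual code):

{code}
import java.util.TreeSet;

// Many logical trigger timers, but at most one pending low-level timer:
// the one for the earliest requested timestamp.
public class CoalescingTimersSketch {
    private final TreeSet<Long> triggerTimes = new TreeSet<>();
    private long registeredTime = Long.MAX_VALUE;

    public void registerTriggerTimer(long timestamp) {
        triggerTimes.add(timestamp);
        if (timestamp < registeredTime) {
            registeredTime = timestamp;
            registerLowLevelTimer(timestamp);  // only when the deadline moves up
        }
    }

    public void onLowLevelTimer(long time) {
        // Fire every logical timer that is due, then re-arm for the next one.
        while (!triggerTimes.isEmpty() && triggerTimes.first() <= time) {
            fireTrigger(triggerTimes.pollFirst());
        }
        registeredTime = triggerTimes.isEmpty() ? Long.MAX_VALUE : triggerTimes.first();
        if (registeredTime != Long.MAX_VALUE) {
            registerLowLevelTimer(registeredTime);
        }
    }

    private void registerLowLevelTimer(long time) { /* hand off to StreamTask */ }

    private void fireTrigger(long time) { /* invoke the window trigger */ }
}
{code}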





[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232211#comment-15232211
 ] 

ASF GitHub Bot commented on FLINK-2909:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59027618
  
--- Diff: 
flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/examples/Graph500.java
 ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.examples;
+
+import org.apache.commons.math3.random.JDKRandomGenerator;
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.api.java.io.CsvOutputFormat;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.utils.DataSetUtils;
+import org.apache.flink.api.java.utils.ParameterTool;
+import org.apache.flink.graph.generator.RMatGraph;
+import org.apache.flink.graph.generator.random.JDKRandomGeneratorFactory;
+import org.apache.flink.graph.generator.random.RandomGenerableFactory;
+import org.apache.flink.types.LongValue;
+
+import java.text.NumberFormat;
+
+/**
+ * Generate an RMat graph for Graph 500.
+ *
+ * Note that this does not yet implement permutation of vertex labels or edges.
+ *
+ * @see <a href="http://www.graph500.org/specifications">Graph 500</a>
+ */
--- End diff --

Can you add a usage description? What are the input parameters?


> Gelly Graph Generators
> --
>
> Key: FLINK-2909
> URL: https://issues.apache.org/jira/browse/FLINK-2909
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> Include a selection of graph generators in Gelly. Generated graphs will be 
> useful for performing scalability, stress, and regression testing as well as 
> benchmarking and comparing algorithms, for both Flink users and developers. 
> Generated data is infinitely scalable yet described by a few simple 
> parameters and can often substitute for user data or sharing large files when 
> reporting issues.
> There are multiple categories of graphs as documented by 
> [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html]
>  and elsewhere.
> Graphs may be well-defined, e.g. the [Chvátal 
> graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be 
> sufficiently small to populate locally.
> Graphs may be scalable, e.g. complete and star graphs. These should use 
> Flink's distributed parallelism.
> Graphs may be stochastic, e.g. [RMat 
> graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf]. 
> A key consideration is that the graphs should source randomness from a 
> seedable PRNG and generate the same Graph regardless of parallelism.





[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232212#comment-15232212
 ] 

ASF GitHub Bot commented on FLINK-2909:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59027687
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/generator/AbstractGraphGenerator.java
 ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.generator;
+
+public abstract class AbstractGraphGenerator<K, VV, EV>
--- End diff --

Why do we need both this class and the `GraphGenerator` interface?


> Gelly Graph Generators
> --
>
> Key: FLINK-2909
> URL: https://issues.apache.org/jira/browse/FLINK-2909
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> Include a selection of graph generators in Gelly. Generated graphs will be 
> useful for performing scalability, stress, and regression testing as well as 
> benchmarking and comparing algorithms, for both Flink users and developers. 
> Generated data is infinitely scalable yet described by a few simple 
> parameters and can often substitute for user data or sharing large files when 
> reporting issues.
> There are multiple categories of graphs as documented by 
> [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html]
>  and elsewhere.
> Graphs may be well-defined, e.g. the [Chvátal 
> graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be 
> sufficiently small to populate locally.
> Graphs may be scalable, e.g. complete and star graphs. These should use 
> Flink's distributed parallelism.
> Graphs may be stochastic, e.g. [RMat 
> graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf]. 
> A key consideration is that the graphs should source randomness from a 
> seedable PRNG and generate the same Graph regardless of parallelism.





[GitHub] flink pull request: [FLINK-3444] env.fromElements relies on the fi...

2016-04-08 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1857#issuecomment-207442123
  
+1




[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232216#comment-15232216
 ] 

ASF GitHub Bot commented on FLINK-2909:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1807#issuecomment-207442760
  
Really amazing job @greghogan! I left a few suggestions on improving the 
docs / usage of the generators. Otherwise, I think it's good to merge. This 
will be a great addition to Gelly :) Do you have any idea on how the generators 
scale?


> Gelly Graph Generators
> --
>
> Key: FLINK-2909
> URL: https://issues.apache.org/jira/browse/FLINK-2909
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> Include a selection of graph generators in Gelly. Generated graphs will be 
> useful for performing scalability, stress, and regression testing as well as 
> benchmarking and comparing algorithms, for both Flink users and developers. 
> Generated data is infinitely scalable yet described by a few simple 
> parameters and can often substitute for user data or sharing large files when 
> reporting issues.
> There are multiple categories of graphs as documented by 
> [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html]
>  and elsewhere.
> Graphs may be well-defined, e.g. the [Chvátal 
> graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be 
> sufficiently small to populate locally.
> Graphs may be scalable, e.g. complete and star graphs. These should use 
> Flink's distributed parallelism.
> Graphs may be stochastic, e.g. [RMat 
> graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf]. 
> A key consideration is that the graphs should source randomness from a 
> seedable PRNG and generate the same Graph regardless of parallelism.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators

2016-04-08 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1807#issuecomment-207442760
  
Really amazing job @greghogan! I left a few suggestions on improving the 
docs / usage of the generators. Otherwise, I think it's good to merge. This 
will be a great addition to Gelly :) Do you have any idea on how the generators 
scale?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3444) env.fromElements relies on the first input element for determining the DataSet/DataStream type

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232213#comment-15232213
 ] 

ASF GitHub Bot commented on FLINK-3444:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1857#issuecomment-207442123
  
+1


> env.fromElements relies on the first input element for determining the 
> DataSet/DataStream type
> --
>
> Key: FLINK-3444
> URL: https://issues.apache.org/jira/browse/FLINK-3444
> Project: Flink
>  Issue Type: Bug
>  Components: DataSet API, DataStream API
>Affects Versions: 0.10.0, 1.0.0
>Reporter: Vasia Kalavri
>
> The {{fromElements}} method of the {{ExecutionEnvironment}} and 
> {{StreamExecutionEnvironment}} determines the DataSet/DataStream type by 
> extracting the type of the first input element.
> This is problematic if the first element is a subtype of another element in 
> the collection.
> For example, the following
> {code}
> DataStream<Event> input = env.fromElements(new Event(1, "a"), new SubEvent(2, "b"));
> {code}
> succeeds, while the following
> {code}
> DataStream<Event> input = env.fromElements(new SubEvent(1, "a"), new Event(2, "b"));
> {code}
> fails with "java.lang.IllegalArgumentException: The elements in the 
> collection are not all subclasses of SubEvent".
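A possible workaround until the fix lands, sketched under the assumption that the desired element type is the common supertype. The nested {{Event}}/{{SubEvent}} classes below are minimal stand-ins for those in the description, and {{fromCollection}} with explicit {{TypeInformation}} is used instead of {{fromElements}}:

{code}
import java.util.Arrays;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FromElementsWorkaround {

    // Minimal stand-ins for the classes in the issue description.
    public static class Event {
        public int id;
        public String name;
        public Event() {}
        public Event(int id, String name) { this.id = id; this.name = name; }
    }

    public static class SubEvent extends Event {
        public SubEvent() {}
        public SubEvent(int id, String name) { super(id, name); }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // fromCollection with explicit TypeInformation avoids extracting the
        // type from the first element, so element order no longer matters.
        DataStream<Event> input = env.fromCollection(
                Arrays.asList(new SubEvent(1, "a"), new Event(2, "b")),
                TypeInformation.of(Event.class));

        input.print();
        env.execute("fromElements workaround");
    }
}
{code}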



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators

2016-04-08 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59027687
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/generator/AbstractGraphGenerator.java
 ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.generator;
+
+public abstract class AbstractGraphGenerator<K, VV, EV> implements GraphGenerator<K, VV, EV>
--- End diff --

Why do we need both this class and the `GraphGenerator` interface?
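
(Context for the question: a common pattern, sketched below as an assumption rather than this PR's actual design, keeps a small interface for the public contract and an abstract base class for shared plumbing such as the parallelism setting.)

```java
// Illustrative pattern only: the interface defines the contract, the
// abstract class hosts state shared by all concrete generators.
interface GraphGenerator<G> {
    G generate();
    GraphGenerator<G> setParallelism(int parallelism);
}

abstract class AbstractGraphGenerator<G> implements GraphGenerator<G> {
    protected int parallelism = -1; // -1: defer to the environment default

    @Override
    public GraphGenerator<G> setParallelism(int parallelism) {
        this.parallelism = parallelism;
        return this;
    }
}
```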


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators

2016-04-08 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59027618
  
--- Diff: 
flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/examples/Graph500.java
 ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.examples;
+
+import org.apache.commons.math3.random.JDKRandomGenerator;
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.api.java.io.CsvOutputFormat;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.utils.DataSetUtils;
+import org.apache.flink.api.java.utils.ParameterTool;
+import org.apache.flink.graph.generator.RMatGraph;
+import org.apache.flink.graph.generator.random.JDKRandomGeneratorFactory;
+import org.apache.flink.graph.generator.random.RandomGenerableFactory;
+import org.apache.flink.types.LongValue;
+
+import java.text.NumberFormat;
+
+/**
+ * Generate an RMat graph for Graph 500.
+ *
+ * Note that this does not yet implement permutation of vertex labels or edges.
+ *
+ * @see <a href="http://www.graph500.org/specifications">Graph 500</a>
+ */
--- End diff --

Can you add a usage description? What are the input parameters?
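
(For comparison, a hedged sketch of how such a driver typically reads its inputs with `ParameterTool`; the flag names and defaults below are illustrative, not necessarily this example's actual parameters.)

```java
import org.apache.flink.api.java.utils.ParameterTool;

// Hypothetical parameter handling for an RMat/Graph 500 driver.
public class Graph500UsageSketch {
    public static void main(String[] args) {
        ParameterTool params = ParameterTool.fromArgs(args);

        int scale = params.getInt("scale", 10);           // 2^scale vertices
        int edgeFactor = params.getInt("edgeFactor", 16); // edges per vertex
        String output = params.get("output", "print");    // sink selector

        System.out.printf("scale=%d edgeFactor=%d output=%s%n",
                scale, edgeFactor, output);
    }
}
```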


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators

2016-04-08 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59027469
  
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -1734,3 +1734,547 @@ vertex represents a group of vertices and each edge represents a group of edges
 vertex and edge in the output graph stores the common group value and the number of represented elements.
 
 {% top %}
+
+Graph Generators
+---
+
+Gelly provides a collection of scalable graph generators. Each generator is
+
+* parallelizable, in order to create large datasets
+* scale-free, generating the same graph regardless of parallelism
+* thrifty, using as few operators as possible
+
+### Complete Graph
+
+An undirected graph connecting every distinct pair of vertices.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CompleteGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CompleteGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CompleteGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: complete graph on vertices 0-4]
+
+### Cycle Graph
+
+An undirected graph where all edges form a single cycle.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CycleGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CycleGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CycleGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: cycle graph on vertices 0-4]
+
+### Empty Graph
+
+The graph containing no edges.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new EmptyGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.EmptyGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new EmptyGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: empty graph on vertices 0-4]
+
+### Grid Graph
+
+An undirected graph connecting vertices in a regular tiling in one or more dimensions.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new GridGraph(env)
+    .addDimension(2, false)
+    .addDimension(4, false)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.GridGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new GridGraph(env.getJavaEnv).addDimension(2, false).addDimension(4, false).generate()
+{% endhighlight %}
+
+[Figure: 2x4 grid graph on vertices 0-7]
+
+### Hypercube Graph
+
+An undirected graph where edges form an n-dimensional hypercube.
+
+{% highlight java %}

[jira] [Commented] (FLINK-3688) ClassCastException in StreamRecordSerializer when WindowOperator.trigger() is called and TimeCharacteristic = ProcessingTime

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232208#comment-15232208
 ] 

ASF GitHub Bot commented on FLINK-3688:
---

Github user knaufk closed the pull request at:

https://github.com/apache/flink/pull/1861


> ClassCastException in StreamRecordSerializer when WindowOperator.trigger() is 
> called and TimeCharacteristic = ProcessingTime
> 
>
> Key: FLINK-3688
> URL: https://issues.apache.org/jira/browse/FLINK-3688
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Konstantin Knauf
>Assignee: Konstantin Knauf
>Priority: Critical
> Fix For: 1.0.2
>
>
> Hi,
> when using {{TimeCharacteristic.ProcessingTime}}, a ClassCastException is 
> thrown in {{StreamRecordSerializer}} when 
> {{WindowOperator.processWatermark()}} is called from 
> {{WindowOperator.trigger()}}, i.e. whenever a ProcessingTimeTimer is 
> triggered. 
> The problem seems to be that {{processWatermark()}} is also called in 
> {{trigger()}} when the time characteristic is ProcessingTime, but in 
> {{RecordWriterOutput}} {{enableWatermarkMultiplexing}} is {{false}} and the 
> {{TypeSerializer}} is a {{StreamRecordSerializer}}, which ultimately leads to 
> the ClassCastException. Do you agree?
> If this is indeed a bug, there are several possible solutions.
> # Only calling {{processWatermark()}} in {{trigger()}} when the 
> TimeCharacteristic is EventTime
> # Not calling {{processWatermark()}} in {{trigger()}} at all, and instead waiting 
> for the next watermark to trigger the EventTimeTimers with a timestamp behind 
> the current watermark. This is, of course, a trade-off. 
> # Using {{MultiplexingStreamRecordSerializer}} all the time, but I have no 
> idea what the side effects of this change would be. I assume there is a reason 
> for the existence of the StreamRecordSerializer ;)
> StackTrace: 
> {quote}
> TimerException\{java.lang.RuntimeException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord\}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:716)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:93)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.emitWatermark(OperatorChain.java:370)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processWatermark(WindowOperator.java:293)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.trigger(WindowOperator.java:323)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:710)
>   ... 7 more
> Caused by: java.lang.ClassCastException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecordSerializer.serialize(StreamRecordSerializer.java:42)
>   at 
> org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:56)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:79)
>   at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.broadcastEmit(RecordWriter.java:109)
>   at 
> org.apache.flink.streaming.runtime.io.StreamRecordWriter.broadcastEmit(StreamRecordWriter.java:95)
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:90)
>   ... 11 more
> {quote}
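For reference, a rough sketch of proposed solution #1 (illustrative only, not the committed fix): forward watermarks from {{trigger()}} only under event time, so a non-multiplexing output never sees a {{Watermark}}. The {{TriggerGuardSketch}} class below is a hypothetical stand-in for the {{WindowOperator}} internals.

{code}
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.watermark.Watermark;

// Illustrative guard: forward watermarks from trigger() only in event time.
public class TriggerGuardSketch {
    private final TimeCharacteristic timeCharacteristic;

    public TriggerGuardSketch(TimeCharacteristic timeCharacteristic) {
        this.timeCharacteristic = timeCharacteristic;
    }

    void trigger(long time) {
        // ... fire the timers registered for `time` here ...
        if (timeCharacteristic == TimeCharacteristic.EventTime) {
            processWatermark(new Watermark(time));
        }
    }

    void processWatermark(Watermark mark) {
        // Stand-in for WindowOperator.processWatermark().
        System.out.println("forwarding " + mark);
    }

    public static void main(String[] args) {
        new TriggerGuardSketch(TimeCharacteristic.ProcessingTime).trigger(42L); // no watermark emitted
        new TriggerGuardSketch(TimeCharacteristic.EventTime).trigger(42L);      // watermark forwarded
    }
}
{code}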



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232210#comment-15232210
 ] 

ASF GitHub Bot commented on FLINK-2909:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59027469
  
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -1734,3 +1734,547 @@ vertex represents a group of vertices and each edge represents a group of edges
 vertex and edge in the output graph stores the common group value and the number of represented elements.
 
 {% top %}
+
+Graph Generators
+---
+
+Gelly provides a collection of scalable graph generators. Each generator is
+
+* parallelizable, in order to create large datasets
+* scale-free, generating the same graph regardless of parallelism
+* thrifty, using as few operators as possible
+
+### Complete Graph
+
+An undirected graph connecting every distinct pair of vertices.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CompleteGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CompleteGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CompleteGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: complete graph on vertices 0-4]
+
+### Cycle Graph
+
+An undirected graph where all edges form a single cycle.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CycleGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CycleGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CycleGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: cycle graph on vertices 0-4]
+
+### Empty Graph
+
+The graph containing no edges.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new EmptyGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.EmptyGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new EmptyGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: empty graph on vertices 0-4]
+
+### Grid Graph
+
+An undirected graph connecting vertices in a regular tiling in one or more dimensions.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new GridGraph(env)
+    .addDimension(2, false)
+    .addDimension(4, false)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.GridGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new GridGraph(env.getJavaEnv).addDimension(2, false).addDimension(4, false).generate()
+{% endhighlight %}
+
+[Figure: 2x4 grid graph]
  

[GitHub] flink pull request: [FLINK-3688] WindowOperator.trigger() does not...

2016-04-08 Thread knaufk
Github user knaufk closed the pull request at:

https://github.com/apache/flink/pull/1861


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232207#comment-15232207
 ] 

ASF GitHub Bot commented on FLINK-2909:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59027355
  
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -1734,3 +1734,547 @@ vertex represents a group of vertices and each edge represents a group of edges
 vertex and edge in the output graph stores the common group value and the number of represented elements.
 
 {% top %}
+
+Graph Generators
+---
+
+Gelly provides a collection of scalable graph generators. Each generator is
+
+* parallelizable, in order to create large datasets
+* scale-free, generating the same graph regardless of parallelism
+* thrifty, using as few operators as possible
+
+### Complete Graph
+
+An undirected graph connecting every distinct pair of vertices.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CompleteGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CompleteGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CompleteGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: complete graph on vertices 0-4]
+
+### Cycle Graph
+
+An undirected graph where all edges form a single cycle.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CycleGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CycleGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CycleGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: cycle graph on vertices 0-4]
+
+### Empty Graph
+
+The graph containing no edges.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new EmptyGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.EmptyGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new EmptyGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: empty graph on vertices 0-4]
+
+### Grid Graph
+
+An undirected graph connecting vertices in a regular tiling in one or more dimensions.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new GridGraph(env)
+    .addDimension(2, false)
+    .addDimension(4, false)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.GridGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new GridGraph(env.getJavaEnv).addDimension(2, false).addDimension(4, false).generate()
+{% endhighlight %}
+
+[Figure: 2x4 grid graph]
  

[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232206#comment-15232206
 ] 

ASF GitHub Bot commented on FLINK-2909:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59027233
  
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -1734,3 +1734,547 @@ vertex represents a group of vertices and each edge represents a group of edges
 vertex and edge in the output graph stores the common group value and the number of represented elements.
 
 {% top %}
+
+Graph Generators
+---
+
+Gelly provides a collection of scalable graph generators. Each generator is
+
+* parallelizable, in order to create large datasets
+* scale-free, generating the same graph regardless of parallelism
+* thrifty, using as few operators as possible
+
+### Complete Graph
+
+An undirected graph connecting every distinct pair of vertices.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CompleteGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CompleteGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CompleteGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: complete graph on vertices 0-4]
+
+### Cycle Graph
+
+An undirected graph where all edges form a single cycle.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CycleGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CycleGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CycleGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: cycle graph on vertices 0-4]
+
+### Empty Graph
+
+The graph containing no edges.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new EmptyGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.EmptyGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new EmptyGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: empty graph on vertices 0-4]
+
+### Grid Graph
+
+An undirected graph connecting vertices in a regular tiling in one or more dimensions.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new GridGraph(env)
+    .addDimension(2, false)
+    .addDimension(4, false)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.GridGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new GridGraph(env.getJavaEnv).addDimension(2, false).addDimension(4, false).generate()
+{% endhighlight %}
+
+[Figure: 2x4 grid graph]
  

[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators

2016-04-08 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59027355
  
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -1734,3 +1734,547 @@ vertex represents a group of vertices and each edge represents a group of edges
 vertex and edge in the output graph stores the common group value and the number of represented elements.
 
 {% top %}
+
+Graph Generators
+---
+
+Gelly provides a collection of scalable graph generators. Each generator is
+
+* parallelizable, in order to create large datasets
+* scale-free, generating the same graph regardless of parallelism
+* thrifty, using as few operators as possible
+
+### Complete Graph
+
+An undirected graph connecting every distinct pair of vertices.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CompleteGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CompleteGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CompleteGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: complete graph on vertices 0-4]
+
+### Cycle Graph
+
+An undirected graph where all edges form a single cycle.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CycleGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CycleGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CycleGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: cycle graph on vertices 0-4]
+
+### Empty Graph
+
+The graph containing no edges.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new EmptyGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.EmptyGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new EmptyGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: empty graph on vertices 0-4]
+
+### Grid Graph
+
+An undirected graph connecting vertices in a regular tiling in one or more dimensions.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new GridGraph(env)
+    .addDimension(2, false)
+    .addDimension(4, false)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.GridGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new GridGraph(env.getJavaEnv).addDimension(2, false).addDimension(4, false).generate()
+{% endhighlight %}
+
+[Figure: 2x4 grid graph on vertices 0-7]
+
+### Hypercube Graph
+
+An undirected graph where edges form an n-dimensional hypercube.
+
+{% highlight java %}
   

[GitHub] flink pull request: [FLINK-3688] WindowOperator.trigger() does not...

2016-04-08 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1861#issuecomment-207440713
  
Could you please close it if GitHub does not do so automatically.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators

2016-04-08 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59027233
  
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -1734,3 +1734,547 @@ vertex represents a group of vertices and each edge represents a group of edges
 vertex and edge in the output graph stores the common group value and the number of represented elements.
 
 {% top %}
+
+Graph Generators
+---
+
+Gelly provides a collection of scalable graph generators. Each generator is
+
+* parallelizable, in order to create large datasets
+* scale-free, generating the same graph regardless of parallelism
+* thrifty, using as few operators as possible
+
+### Complete Graph
+
+An undirected graph connecting every distinct pair of vertices.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CompleteGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CompleteGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CompleteGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: complete graph on vertices 0-4]
+
+### Cycle Graph
+
+An undirected graph where all edges form a single cycle.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CycleGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CycleGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CycleGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: cycle graph on vertices 0-4]
+
+### Empty Graph
+
+The graph containing no edges.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new EmptyGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.EmptyGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new EmptyGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: empty graph on vertices 0-4]
+
+### Grid Graph
+
+An undirected graph connecting vertices in a regular tiling in one or more dimensions.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new GridGraph(env)
+    .addDimension(2, false)
+    .addDimension(4, false)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.GridGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new GridGraph(env.getJavaEnv).addDimension(2, false).addDimension(4, false).generate()
+{% endhighlight %}
+
+[Figure: 2x4 grid graph on vertices 0-7]
+
+### Hypercube Graph
+
+An undirected graph where edges form an n-dimensional hypercube.
+
+{% highlight java %}
   

[jira] [Commented] (FLINK-3688) ClassCastException in StreamRecordSerializer when WindowOperator.trigger() is called and TimeCharacteristic = ProcessingTime

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232204#comment-15232204
 ] 

ASF GitHub Bot commented on FLINK-3688:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1861#issuecomment-207440713
  
Could you please close it if GitHub does not do so automatically.


> ClassCastException in StreamRecordSerializer when WindowOperator.trigger() is 
> called and TimeCharacteristic = ProcessingTime
> 
>
> Key: FLINK-3688
> URL: https://issues.apache.org/jira/browse/FLINK-3688
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Konstantin Knauf
>Assignee: Konstantin Knauf
>Priority: Critical
> Fix For: 1.0.2
>
>
> Hi,
> when using {{TimeCharacteristic.ProcessingTime}}, a ClassCastException is 
> thrown in {{StreamRecordSerializer}} when 
> {{WindowOperator.processWatermark()}} is called from 
> {{WindowOperator.trigger()}}, i.e. whenever a ProcessingTimeTimer is 
> triggered. 
> The problem seems to be that {{processWatermark()}} is also called in 
> {{trigger()}} when the time characteristic is ProcessingTime, but in 
> {{RecordWriterOutput}} {{enableWatermarkMultiplexing}} is {{false}} and the 
> {{TypeSerializer}} is a {{StreamRecordSerializer}}, which ultimately leads to 
> the ClassCastException. Do you agree?
> If this is indeed a bug, there are several possible solutions.
> # Only calling {{processWatermark()}} in {{trigger()}} when the 
> TimeCharacteristic is EventTime
> # Not calling {{processWatermark()}} in {{trigger()}} at all, and instead waiting 
> for the next watermark to trigger the EventTimeTimers with a timestamp behind 
> the current watermark. This is, of course, a trade-off. 
> # Using {{MultiplexingStreamRecordSerializer}} all the time, but I have no 
> idea what the side effects of this change would be. I assume there is a reason 
> for the existence of the StreamRecordSerializer ;)
> StackTrace: 
> {quote}
> TimerException\{java.lang.RuntimeException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord\}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:716)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:93)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.emitWatermark(OperatorChain.java:370)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processWatermark(WindowOperator.java:293)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.trigger(WindowOperator.java:323)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:710)
>   ... 7 more
> Caused by: java.lang.ClassCastException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecordSerializer.serialize(StreamRecordSerializer.java:42)
>   at 
> org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:56)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:79)
>   at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.broadcastEmit(RecordWriter.java:109)
>   at 
> org.apache.flink.streaming.runtime.io.StreamRecordWriter.broadcastEmit(StreamRecordWriter.java:95)
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:90)
>   ... 11 more
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3688) ClassCastException in StreamRecordSerializer when WindowOperator.trigger() is called and TimeCharacteristic = ProcessingTime

2016-04-08 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved FLINK-3688.
-
   Resolution: Fixed
Fix Version/s: 1.0.2

Fixed in 
https://github.com/apache/flink/commit/a234719a25836f8c125aee27f1235caf42943b3b

Also copied to release-1.0 for eventual 1.0.2 release

> ClassCastException in StreamRecordSerializer when WindowOperator.trigger() is 
> called and TimeCharacteristic = ProcessingTime
> 
>
> Key: FLINK-3688
> URL: https://issues.apache.org/jira/browse/FLINK-3688
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Konstantin Knauf
>Assignee: Konstantin Knauf
>Priority: Critical
> Fix For: 1.0.2
>
>
> Hi,
> when using {{TimeCharacteristic.ProcessingTime}}, a ClassCastException is 
> thrown in {{StreamRecordSerializer}} when 
> {{WindowOperator.processWatermark()}} is called from 
> {{WindowOperator.trigger()}}, i.e. whenever a ProcessingTimeTimer is 
> triggered. 
> The problem seems to be that {{processWatermark()}} is also called in 
> {{trigger()}} when the time characteristic is ProcessingTime, but in 
> {{RecordWriterOutput}} {{enableWatermarkMultiplexing}} is {{false}} and the 
> {{TypeSerializer}} is a {{StreamRecordSerializer}}, which ultimately leads to 
> the ClassCastException. Do you agree?
> If this is indeed a bug, there are several possible solutions.
> # Only calling {{processWatermark()}} in {{trigger()}} when the 
> TimeCharacteristic is EventTime
> # Not calling {{processWatermark()}} in {{trigger()}} at all, and instead waiting 
> for the next watermark to trigger the EventTimeTimers with a timestamp behind 
> the current watermark. This is, of course, a trade-off. 
> # Using {{MultiplexingStreamRecordSerializer}} all the time, but I have no 
> idea what the side effects of this change would be. I assume there is a reason 
> for the existence of the StreamRecordSerializer ;)
> StackTrace: 
> {quote}
> TimerException\{java.lang.RuntimeException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord\}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:716)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:93)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.emitWatermark(OperatorChain.java:370)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processWatermark(WindowOperator.java:293)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.trigger(WindowOperator.java:323)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:710)
>   ... 7 more
> Caused by: java.lang.ClassCastException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecordSerializer.serialize(StreamRecordSerializer.java:42)
>   at 
> org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:56)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:79)
>   at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.broadcastEmit(RecordWriter.java:109)
>   at 
> org.apache.flink.streaming.runtime.io.StreamRecordWriter.broadcastEmit(StreamRecordWriter.java:95)
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:90)
>   ... 11 more
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators

2016-04-08 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59027023
  
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -1734,3 +1734,547 @@ vertex represents a group of vertices and each edge represents a group of edges
 vertex and edge in the output graph stores the common group value and the number of represented elements.
 
 {% top %}
+
+Graph Generators
+---
+
+Gelly provides a collection of scalable graph generators. Each generator is
+
+* parallelizable, in order to create large datasets
+* scale-free, generating the same graph regardless of parallelism
+* thrifty, using as few operators as possible
+
+### Complete Graph
+
+An undirected graph connecting every distinct pair of vertices.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CompleteGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CompleteGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CompleteGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: complete graph on vertices 0-4]
+
+### Cycle Graph
+
+An undirected graph where all edges form a single cycle.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CycleGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CycleGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CycleGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: cycle graph on vertices 0-4]
+
+### Empty Graph
+
+The graph containing no edges.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new EmptyGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.EmptyGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new EmptyGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: empty graph on vertices 0-4]
+
+### Grid Graph
+
+An undirected graph connecting vertices in a regular tiling in one or more dimensions.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new GridGraph(env)
+    .addDimension(2, false)
+    .addDimension(4, false)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.GridGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new GridGraph(env.getJavaEnv).addDimension(2, false).addDimension(4, false).generate()
+{% endhighlight %}
+
+[Figure: 2x4 grid graph on vertices 0-7]
+
+### Hypercube Graph
+
+An undirected graph where edges form an n-dimensional hypercube.
+
+{% highlight java %}
   

[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232200#comment-15232200
 ] 

ASF GitHub Bot commented on FLINK-2909:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59026954
  
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -1734,3 +1734,547 @@ vertex represents a group of vertices and each edge represents a group of edges
 vertex and edge in the output graph stores the common group value and the number of represented elements.
 
 {% top %}
+
+Graph Generators
+---
+
+Gelly provides a collection of scalable graph generators. Each generator is
+
+* parallelizable, in order to create large datasets
+* scale-free, generating the same graph regardless of parallelism
+* thrifty, using as few operators as possible
+
+### Complete Graph
+
+An undirected graph connecting every distinct pair of vertices.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CompleteGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CompleteGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CompleteGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: complete graph on vertices 0-4]
+
+### Cycle Graph
+
+An undirected graph where all edges form a single cycle.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CycleGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CycleGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CycleGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: cycle graph on vertices 0-4]
+
+### Empty Graph
+
+The graph containing no edges.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new EmptyGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.EmptyGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new EmptyGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: empty graph on vertices 0-4]
+
+### Grid Graph
+
+An undirected graph connecting vertices in a regular tiling in one or more dimensions.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new GridGraph(env)
+    .addDimension(2, false)
--- End diff --

I think we should explain what `addDimension` does and what its arguments 
represent.


> Gelly Graph Generators
> --
>
> Key: FLINK-2909
> URL: https://issues.apache.org/jira/browse/FLINK-2909
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> Include a selection of graph generators in Gelly. Generated graphs will be 
> useful for performing scalability, stress, and regression testing as well as 
> benchmarking and comparing algorithms, for both Flink users and developers. 
> Generated data is infinitely scalable yet 

[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232201#comment-15232201
 ] 

ASF GitHub Bot commented on FLINK-2909:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59027023
  
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -1734,3 +1734,547 @@ vertex represents a group of vertices and each edge represents a group of edges
 vertex and edge in the output graph stores the common group value and the number of represented elements.
 
 {% top %}
+
+Graph Generators
+---
+
+Gelly provides a collection of scalable graph generators. Each generator is
+
+* parallelizable, in order to create large datasets
+* scale-free, generating the same graph regardless of parallelism
+* thrifty, using as few operators as possible
+
+### Complete Graph
+
+An undirected graph connecting every distinct pair of vertices.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CompleteGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CompleteGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CompleteGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: complete graph on vertices 0-4]
+
+### Cycle Graph
+
+An undirected graph where all edges form a single cycle.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CycleGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CycleGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CycleGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: cycle graph on vertices 0-4]
+
+### Empty Graph
+
+The graph containing no edges.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new EmptyGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.EmptyGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new EmptyGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+[Figure: empty graph on vertices 0-4]
+
+### Grid Graph
+
+An undirected graph connecting vertices in a regular tiling in one or more dimensions.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new GridGraph(env)
+    .addDimension(2, false)
+    .addDimension(4, false)
+    .generate();
+{% endhighlight %}
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.GridGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new GridGraph(env.getJavaEnv).addDimension(2, false).addDimension(4, false).generate()
+{% endhighlight %}
+
+[Figure: 2x4 grid graph]
  

[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators

2016-04-08 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59026954
  
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -1734,3 +1734,547 @@ vertex represents a group of vertices and each edge 
represents a group of edges
 vertex and edge in the output graph stores the common group value and the 
number of represented elements.
 
 {% top %}
+
+Graph Generators
+---
+
+Gelly provides a collection of scalable graph generators. Each generator is
+
+* parallelizable, in order to create large datasets
+* scale-free, generating the same graph regardless of parallelism
+* thrifty, using as few operators as possible
+
+### Complete Graph
+
+An undirected graph connecting every distinct pair of vertices.
+
+
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CompleteGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CompleteGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CompleteGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+
+
+[SVG figure: complete graph on five vertices]
+
+### Cycle Graph
+
+An undirected graph where all edges form a single cycle.
+
+
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CycleGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.CycleGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new CycleGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+
+
+[SVG figure: cycle graph on five vertices]
+
+### Empty Graph
+
+The graph containing no edges.
+
+
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new EmptyGraph(env, 5)
+    .generate();
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.graph.generator.EmptyGraph
+
+val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+
+val graph = new EmptyGraph(env.getJavaEnv, 5).generate()
+{% endhighlight %}
+
+
+
+[SVG figure: empty graph on five vertices]
+
+### Grid Graph
+
+An undirected graph connecting vertices in a regular tiling in one or more dimensions.
+
+
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new GridGraph(env)
+    .addDimension(2, false)
--- End diff --

I think we should explain what `addDimension` does and what its arguments 
represent.
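
For reference, a minimal sketch of how such an explanation could read, assuming the signature `addDimension(long size, boolean wrapEndpoints)` from the generator code under review:

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.generator.GridGraph;
import org.apache.flink.types.LongValue;
import org.apache.flink.types.NullValue;

public class GridGraphExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Each addDimension(size, wrapEndpoints) call adds one axis to the grid:
        //   size          - number of vertices along this dimension
        //   wrapEndpoints - if true, also connect the first and last vertices
        //                   of the dimension (turning the path into a cycle)
        Graph<LongValue, NullValue, NullValue> graph = new GridGraph(env)
            .addDimension(2, false)   // 2 rows, no wrap-around
            .addDimension(4, false)   // 4 columns, no wrap-around
            .generate();              // a 2x4 grid with 8 vertices

        System.out.println(graph.numberOfVertices());
    }
}
```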


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232198#comment-15232198
 ] 

ASF GitHub Bot commented on FLINK-2909:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59026858
  
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -1734,3 +1734,547 @@ vertex represents a group of vertices and each edge 
represents a group of edges
 vertex and edge in the output graph stores the common group value and the 
number of represented elements.
 
 {% top %}
+
+Graph Generators
+---
+
+Gelly provides a collection of scalable graph generators. Each generator is
+
+* parallelizable, in order to create large datasets
+* scale-free, generating the same graph regardless of parallelism
+* thrifty, using as few operators as possible
+
+### Complete Graph
+
+An undirected graph connecting every distinct pair of vertices.
+
+
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<LongValue, NullValue, NullValue> graph = new CompleteGraph(env, 5)
--- End diff --

Could you explain what the parameter 5 is for?
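
For what it's worth, a hedged sketch of how the parameter could be documented; this assumes the second constructor argument is the vertex count, as in the generator code under review:

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.generator.CompleteGraph;
import org.apache.flink.types.LongValue;
import org.apache.flink.types.NullValue;

public class CompleteGraphExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // vertexCount: the number of vertices; the complete graph connects
        // every distinct pair, so n vertices yield n*(n-1)/2 undirected edges.
        long vertexCount = 5;
        Graph<LongValue, NullValue, NullValue> graph =
            new CompleteGraph(env, vertexCount).generate();
    }
}
```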


> Gelly Graph Generators
> --
>
> Key: FLINK-2909
> URL: https://issues.apache.org/jira/browse/FLINK-2909
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> Include a selection of graph generators in Gelly. Generated graphs will be 
> useful for performing scalability, stress, and regression testing as well as 
> benchmarking and comparing algorithms, for both Flink users and developers. 
> Generated data is infinitely scalable yet described by a few simple 
> parameters and can often substitute for user data or sharing large files when 
> reporting issues.
> There are multiple categories of graphs as documented by 
> [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html]
>  and elsewhere.
> Graphs may be well-defined, e.g. the [Chvátal 
> graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be 
> sufficiently small to populate locally.
> Graphs may be scalable, e.g. complete and star graphs. These should use 
> Flink's distributed parallelism.
> Graphs may be stochastic, e.g. [RMat 
> graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf].
> A key consideration is that the graphs should source randomness from a 
> seedable PRNG and generate the same Graph regardless of parallelism.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators

2016-04-08 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1807#discussion_r59026824
  
--- Diff: docs/apis/batch/libs/gelly.md ---
@@ -1734,3 +1734,547 @@ vertex represents a group of vertices and each edge 
represents a group of edges
 vertex and edge in the output graph stores the common group value and the 
number of represented elements.
 
 {% top %}
+
+Graph Generators
+---
+
+Gelly provides a collection of scalable graph generators. Each generator is
+
+* parallelizable, in order to create large datasets
+* scale-free, generating the same graph regardless of parallelism
+* thrifty, using as few operators as possible
+
--- End diff --

Could we add an overview of how graph generators can be created and used? 
e.g. that you pass the parameters to the specific generator and then call 
`generate()` to get the graph?
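
For instance, a minimal sketch of that common pattern: pass the environment and the generator's parameters to the constructor, then call `generate()` to obtain the `Graph`:

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.generator.CompleteGraph;
import org.apache.flink.graph.generator.CycleGraph;
import org.apache.flink.types.LongValue;
import org.apache.flink.types.NullValue;

public class GeneratorPattern {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // The pattern is the same for every generator: construct it with its
        // parameters, then call generate() to get the Graph.
        Graph<LongValue, NullValue, NullValue> complete =
            new CompleteGraph(env, 5).generate();
        Graph<LongValue, NullValue, NullValue> cycle =
            new CycleGraph(env, 5).generate();
    }
}
```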


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---



[jira] [Updated] (FLINK-3717) Add functionality to get the current offset of a source with a FileInputFormat

2016-04-08 Thread Kostas Kloudas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kostas Kloudas updated FLINK-3717:
--
Description: This is the first step in order to make the File Sources 
fault-tolerant. We have to be able to get the latest read offset in the file 
despite any caching performed during reading. This will guarantee that the task 
that will take over the execution of the failed one will be able to start from 
the correct point in the file.

> Add functionality to get the current offset of a source with a FileInputFormat
> --
>
> Key: FLINK-3717
> URL: https://issues.apache.org/jira/browse/FLINK-3717
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Kostas Kloudas
>Assignee: Kostas Kloudas
>
> This is the first step in order to make the File Sources fault-tolerant. We 
> have to be able to get the latest read offset in the file despite any caching 
> performed during reading. This will guarantee that the task that will take 
> over the execution of the failed one will be able to start from the correct 
> point in the file.
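
To make the idea concrete, here is a minimal sketch of an offset-tracking source, assuming the `Checkpointed` interface of the streaming API; this is an illustration of the mechanism, not the planned implementation:

```java
import java.io.RandomAccessFile;
import org.apache.flink.streaming.api.checkpoint.Checkpointed;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext;

/** Sketch: a file source whose checkpointed state is its read offset. */
public class OffsetTrackingFileSource extends RichSourceFunction<String>
        implements Checkpointed<Long> {

    private final String path;
    private volatile boolean running = true;
    private long offset;  // bytes consumed so far; this is the state to checkpoint

    public OffsetTrackingFileSource(String path) {
        this.path = path;
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        RandomAccessFile file = new RandomAccessFile(path, "r");
        file.seek(offset);  // resume from the restored offset after a failure
        String line;
        while (running && (line = file.readLine()) != null) {
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(line);
                offset = file.getFilePointer();  // offset after the emitted record
            }
        }
        file.close();
    }

    @Override
    public void cancel() {
        running = false;
    }

    @Override
    public Long snapshotState(long checkpointId, long checkpointTimestamp) {
        return offset;
    }

    @Override
    public void restoreState(Long state) {
        offset = state;
    }
}
```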



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3640] Add support for SQL in DataSet pr...

2016-04-08 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1862#issuecomment-207438727
  
Thanks for the PR. I had a few minor comments but otherwise it looks really 
good. 

There are a few follow-up issues, IMO:
- Check if we can somehow get around the `EnumerableToLogicalTableScan`. 
Maybe the Calcite community can help. I will open a JIRA for this once the PR 
is merged.
- Check how we can exclude unsupported SQL features such as outer joins, 
intersection, etc. Here, too, the Calcite community should be able to help. I 
will open a JIRA for this once the PR is merged.
- Refactor `TranslationContext` and `TableEnvironment` to prevent the 
same planner from being used several times. I'll start a discussion about this soon.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---



[GitHub] flink pull request: [FLINK-3640] Add support for SQL in DataSet pr...

2016-04-08 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1862#discussion_r59025508
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/sql/test/JoinITCase.scala
 ---
@@ -0,0 +1,239 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.scala.sql.test
+
+import org.apache.calcite.tools.ValidationException
+import org.apache.flink.api.scala._
+import org.apache.flink.api.scala.table._
+import org.apache.flink.api.scala.util.CollectionDataSets
+import org.apache.flink.api.table.{TableException, Row}
+import org.apache.flink.api.table.plan.TranslationContext
+import org.apache.flink.api.table.test.utils.TableProgramsTestBase
+import org.apache.flink.api.table.test.utils.TableProgramsTestBase.TableConfigMode
+import org.apache.flink.test.util.MultipleProgramsTestBase.TestExecutionMode
+import org.apache.flink.test.util.TestBaseUtils
+import org.junit._
+import org.junit.runner.RunWith
+import org.junit.runners.Parameterized
+
+import scala.collection.JavaConverters._
+
+@RunWith(classOf[Parameterized])
+class JoinITCase(
+mode: TestExecutionMode,
+configMode: TableConfigMode)
+  extends TableProgramsTestBase(mode, configMode) {
+
+  @Test
+  def testJoin(): Unit = {
+
+val env = ExecutionEnvironment.getExecutionEnvironment
+val tEnv = getScalaTableEnvironment
+TranslationContext.reset()
+
+val sqlQuery = "SELECT c, g FROM Table3, Table5 WHERE b = e"
+
+val ds1 = CollectionDataSets.getSmall3TupleDataSet(env).toTable.as('a, 'b, 'c)
+val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable.as('d, 'e, 'f, 'g, 'h)
+tEnv.registerTable("Table3", ds1)
+tEnv.registerTable("Table5", ds2)
+
+val result = tEnv.sql(sqlQuery)
+
+val expected = "Hi,Hallo\n" + "Hello,Hallo Welt\n" + "Hello 
world,Hallo Welt\n"
+val results = result.toDataSet[Row](getConfig).collect()
+TestBaseUtils.compareResultAsText(results.asJava, expected)
+  }
+
+  @Test
+  def testJoinWithFilter(): Unit = {
+
+val env = ExecutionEnvironment.getExecutionEnvironment
+val tEnv = getScalaTableEnvironment
+TranslationContext.reset()
+
+val sqlQuery = "SELECT c, g FROM Table3, Table5 WHERE b = e AND b < 2"
+
+val ds1 = CollectionDataSets.getSmall3TupleDataSet(env).toTable.as('a, 'b, 'c)
+val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable.as('d, 'e, 'f, 'g, 'h)
+tEnv.registerTable("Table3", ds1)
+tEnv.registerTable("Table5", ds2)
+
+val result = tEnv.sql(sqlQuery)
+
+val expected = "Hi,Hallo\n"
+val results = result.toDataSet[Row](getConfig).collect()
+TestBaseUtils.compareResultAsText(results.asJava, expected)
+  }
+
+  @Test
+  def testJoinWithJoinFilter(): Unit = {
+
+val env = ExecutionEnvironment.getExecutionEnvironment
+val tEnv = getScalaTableEnvironment
+TranslationContext.reset()
+
+val sqlQuery = "SELECT c, g FROM Table3, Table5 WHERE b = e AND a < 6 
AND h < b"
+
+val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable.as('a, 'b, 'c)
+val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable.as('d, 'e, 'f, 'g, 'h)
+tEnv.registerTable("Table3", ds1)
+tEnv.registerTable("Table5", ds2)
+
+val result = tEnv.sql(sqlQuery)
+
+val expected = "Hello world, how are you?,Hallo Welt wie\n" +
+  "I am fine.,Hallo Welt wie\n"
+val results = result.toDataSet[Row](getConfig).collect()
+TestBaseUtils.compareResultAsText(results.asJava, expected)
+  }
+
+  @Test
+  def testJoinWithMultipleKeys(): Unit = {
+
+val env = ExecutionEnvironment.getExecutionEnvironment
+val tEnv = getScalaTableEnvironment
+TranslationContext.reset()
+
+val sqlQuery = "SELECT c, g 

[jira] [Commented] (FLINK-3640) Add support for SQL queries in DataSet programs

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232179#comment-15232179
 ] 

ASF GitHub Bot commented on FLINK-3640:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1862#discussion_r59025219
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/sql/test/AggregationsITCase.scala
 ---
@@ -0,0 +1,264 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.scala.sql.test
+
+import org.apache.flink.api.scala._
+import org.apache.flink.api.scala.table._
+import org.apache.flink.api.scala.util.CollectionDataSets
+import org.apache.flink.api.table.Row
+import org.apache.flink.api.table.plan.TranslationContext
+import org.apache.flink.api.table.test.utils.TableProgramsTestBase
+import org.apache.flink.api.table.test.utils.TableProgramsTestBase.TableConfigMode
+import org.apache.flink.test.util.MultipleProgramsTestBase.TestExecutionMode
+import org.apache.flink.test.util.TestBaseUtils
+import org.junit._
+import org.junit.runner.RunWith
+import org.junit.runners.Parameterized
+
+import scala.collection.JavaConverters._
+
+@RunWith(classOf[Parameterized])
+class AggregationsITCase(
--- End diff --

Do we need each test for `DataSet` and `Table`? Wouldn't it be sufficient 
to test for `Table` and have one or two tests for `DataSet`?


> Add support for SQL queries in DataSet programs
> ---
>
> Key: FLINK-3640
> URL: https://issues.apache.org/jira/browse/FLINK-3640
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> This issue covers the task of supporting SQL queries embedded in DataSet 
> programs. In this mode, the input and output of a SQL query is a Table. For 
> this issue, we need to make the following additions to the Table API:
> - add a {{tEnv.sql(query: String): Table}} method for converting a query 
> result into a Table
> - integrate Calcite's SQL parser into the batch Table API translation process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Commented] (FLINK-3311) Add a connector for streaming data into Cassandra

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232142#comment-15232142
 ] 

ASF GitHub Bot commented on FLINK-3311:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1771#issuecomment-207421723
  
I've tested the change again on a cluster and locally. The mode without WAL 
works fine; the WAL variant fails on (or quickly after) recovery.
It seems to be an issue with the `FsStateBackend`, not with this PR. Still, 
I'll hold off on merging until we understand the issue.


![image](https://cloud.githubusercontent.com/assets/89049/14383013/6ed277de-fd92-11e5-8387-4e09575b9cce.png)


![image](https://cloud.githubusercontent.com/assets/89049/14383018/776e0368-fd92-11e5-9bfe-cce5fd9d0277.png)




> Add a connector for streaming data into Cassandra
> -
>
> Key: FLINK-3311
> URL: https://issues.apache.org/jira/browse/FLINK-3311
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Robert Metzger
>Assignee: Andrea Sella
>
> We had users in the past asking for a Flink+Cassandra integration.
> It seems that there is a well-developed java client for connecting to 
> Cassandra: https://github.com/datastax/java-driver (ASL 2.0)
> There are also tutorials out there on how to start a local cassandra instance 
> (for the tests): 
> http://prettyprint.me/prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/index.html
> For the data types, I think we should support TupleX types, and map standard 
> java types to the respective cassandra types.
> In addition, it seems that there is an object mapper from datastax to store 
> POJOs in Cassandra (there are annotations for defining the primary key and 
> types)
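
For reference, a minimal sketch against the datastax java-driver linked above; the contact point is a placeholder and the keyspace/table are assumed to exist:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraDriverSketch {
    public static void main(String[] args) {
        // Placeholder contact point; a connector would get this from its config.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Assumes: CREATE TABLE test.words (word text PRIMARY KEY, cnt bigint);
        session.execute("INSERT INTO test.words (word, cnt) VALUES ('flink', 1);");
        ResultSet rs = session.execute("SELECT word, cnt FROM test.words;");
        for (Row row : rs) {
            System.out.println(row.getString("word") + " -> " + row.getLong("cnt"));
        }

        session.close();
        cluster.close();
    }
}
```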



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Commented] (FLINK-3713) DisposeSavepoint message uses system classloader to discard state

2016-04-08 Thread Konstantin Knauf (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232116#comment-15232116
 ] 

Konstantin Knauf commented on FLINK-3713:
-

Works as expected.

> DisposeSavepoint message uses system classloader to discard state
> -
>
> Key: FLINK-3713
> URL: https://issues.apache.org/jira/browse/FLINK-3713
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Reporter: Robert Metzger
>
> The {{DisposeSavepoint}} message in the JobManager is using the system 
> classloader to discard the state:
> {code}
> val savepoint = savepointStore.getState(savepointPath)
> log.debug(s"$savepoint")
> // Discard the associated checkpoint
> savepoint.discard(getClass.getClassLoader)
> // Dispose the savepoint
> savepointStore.disposeState(savepointPath)
> {code}
> Which leads to issues when the state contains user classes:
> {code}
> 2016-04-07 03:02:12,225 INFO  org.apache.flink.yarn.YarnJobManager
>   - Disposing savepoint at 
> 'hdfs:/bigdata/flink/savepoints/savepoint-7610540d089f'.
> 2016-04-07 03:02:12,233 WARN  
> org.apache.flink.runtime.checkpoint.StateForTask  - Failed to 
> discard checkpoint state: StateForTask eed5cc5a12dc2e0672848ba81bd8fa6d-0 : 
> SerializedValue
> java.lang.ClassNotFoundException: 
> .MetricsProcessor$CombinedKeysFoldFunction
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:270)
>   at 
> org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:64)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at java.util.HashMap.readObject(HashMap.java:1184)
>   at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at 
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:290)
>   at 
> 

[jira] [Commented] (FLINK-3719) WebInterface: Moving the barrier between graph and stats

2016-04-08 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232072#comment-15232072
 ] 

Niels Basjes commented on FLINK-3719:
-

Originally asked in FLINK-3584.

> WebInterface: Moving the barrier between graph and stats
> 
>
> Key: FLINK-3719
> URL: https://issues.apache.org/jira/browse/FLINK-3719
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Reporter: Niels Basjes
>
> It would be really useful if the separator between the graphical view of a 
> job topology at the top and the textual overview of the counters at the 
> bottom could be moved up/down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3584) WebInterface: Broken graphical display in Chrome

2016-04-08 Thread Niels Basjes (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niels Basjes closed FLINK-3584.
---
Resolution: Fixed

I double checked: This no longer occurs in 1.0.0.
I opened a new ticket for the feature request of moving the separator: 
FLINK-3719

> WebInterface: Broken graphical display in Chrome
> 
>
> Key: FLINK-3584
> URL: https://issues.apache.org/jira/browse/FLINK-3584
> Project: Flink
>  Issue Type: Bug
>  Components: Webfrontend
>Reporter: Niels Basjes
>
> When running a Flink (streaming) application there is a very nice graphical 
> representation of the flow in the jobtracker web interface.
> We found that when using Firefox we can move and scale this graphic very 
> nicely, and there are also arrows between the boxes that show which direction 
> the data flows.
> When using Google Chrome to look at the exact same data we find that we can 
> scale the image but not move it around. And the arrows are missing.
> Something that doesn't work in either case (but which we expect should work) 
> is the option to move the separator between the graphical view at the top 
> and the textual overview at the bottom up/down. Or is this simply a missing feature?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3719) WebInterface: Moving the barrier between graph and stats

2016-04-08 Thread Niels Basjes (JIRA)
Niels Basjes created FLINK-3719:
---

 Summary: WebInterface: Moving the barrier between graph and stats
 Key: FLINK-3719
 URL: https://issues.apache.org/jira/browse/FLINK-3719
 Project: Flink
  Issue Type: Improvement
  Components: Webfrontend
Reporter: Niels Basjes


It would be really useful if the separator between the graphical view of a job 
topology at the top and the textual overview of the counters at the bottom 
could be moved up/down.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3718) Add Option For Completely Async Backup in RocksDB State Backend

2016-04-08 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-3718:
---

 Summary: Add Option For Completely Async Backup in RocksDB State 
Backend
 Key: FLINK-3718
 URL: https://issues.apache.org/jira/browse/FLINK-3718
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek


Right now, the snapshotting for RocksDB has a synchronous part, where a backup 
of the RocksDB database is taken, and an asynchronous part, where this backup is 
written to HDFS.

We should add an option that uses the snapshot feature of RocksDB to get an 
iterator over all keys at a set point in time. The iterator can be used to 
store everything to HDFS. Normal operation can continue while we store the 
keys. This makes the snapshot completely asynchronous.
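
Roughly, using the RocksJava snapshot API (a sketch of the idea only; `KeyValueWriter` is a stand-in for whatever streams the pairs to HDFS, and the close() methods follow current RocksJava naming, where older versions use dispose()):

```java
import org.rocksdb.ReadOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksIterator;
import org.rocksdb.Snapshot;

public class AsyncSnapshotSketch {

    /** Stand-in for the code that writes key/value pairs out, e.g. to HDFS. */
    interface KeyValueWriter {
        void write(byte[] key, byte[] value);
    }

    /** Iterate a point-in-time snapshot; normal writes can continue meanwhile. */
    static void backup(RocksDB db, KeyValueWriter writer) {
        Snapshot snapshot = db.getSnapshot();  // cheap, consistent point-in-time view
        try {
            ReadOptions readOptions = new ReadOptions().setSnapshot(snapshot);
            RocksIterator iterator = db.newIterator(readOptions);
            // This loop can run on a background thread: the iterator only sees
            // the snapshot, so concurrent updates do not affect the backup.
            for (iterator.seekToFirst(); iterator.isValid(); iterator.next()) {
                writer.write(iterator.key(), iterator.value());
            }
            iterator.close();
            readOptions.close();
        } finally {
            db.releaseSnapshot(snapshot);
        }
    }
}
```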



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3670) Kerberos: Improving long-running streaming jobs

2016-04-08 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232054#comment-15232054
 ] 

Niels Basjes commented on FLINK-3670:
-

A while ago I found that part of the problem lies in the upstream tools that are 
used.
See a similar bug report for Spark (SPARK-11182) and HDFS-9276, which looks like 
an important blocker to really fixing this.

> Kerberos: Improving long-running streaming jobs
> ---
>
> Key: FLINK-3670
> URL: https://issues.apache.org/jira/browse/FLINK-3670
> Project: Flink
>  Issue Type: Improvement
>  Components: Command-line client, Local Runtime
>Reporter: Maximilian Michels
>
> We have seen in the past that Hadoop's delegation tokens are subject to a 
> number of subtle token renewal bugs. In addition, they have a maximum 
> lifetime that can be worked around but is very inconvenient for the user.
> As per [mailing list 
> discussion|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Kerberos-for-Streaming-amp-Kafka-td10906.html],
>  a way to work around the maximum lifetime of DelegationTokens would be to 
> pass the Kerberos principal and keytab upon job submission. A daemon could 
> then periodically renew the ticket.
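
For illustration, a minimal sketch of such a renewal daemon using Hadoop's `UserGroupInformation`; the principal and keytab path are placeholders, and the interval is arbitrary:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabRenewalSketch {
    public static void main(String[] args) throws Exception {
        // The principal and keytab would be shipped with the job submission.
        UserGroupInformation.loginUserFromKeytab(
            "flink@EXAMPLE.COM", "/path/to/flink.keytab");

        ScheduledExecutorService renewer =
            Executors.newSingleThreadScheduledExecutor();
        renewer.scheduleAtFixedRate(() -> {
            try {
                // Re-login from the keytab if the TGT is close to expiring.
                UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 1, 1, TimeUnit.HOURS);
    }
}
```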



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3640) Add support for SQL queries in DataSet programs

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231995#comment-15231995
 ] 

ASF GitHub Bot commented on FLINK-3640:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1862#discussion_r59007631
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/AbstractTableEnvironment.scala
 ---
@@ -83,4 +84,17 @@ class AbstractTableEnvironment {
 )
 TranslationContext.registerTable(dataSetTable, name)
   }
+
+  /**
+   * Execute a SQL query on a batch [[Table]].
+   * The [[Table]] has to be registered in the tables registry
--- End diff --

We can query more than one table (all tables need to be registered). Tables 
should be registered with the `TableEnvironment`. I think the term `tables 
registry` is not defined.


> Add support for SQL queries in DataSet programs
> ---
>
> Key: FLINK-3640
> URL: https://issues.apache.org/jira/browse/FLINK-3640
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> This issue covers the task of supporting SQL queries embedded in DataSet 
> programs. In this mode, the input and output of a SQL query is a Table. For 
> this issue, we need to make the following additions to the Table API:
> - add a {{tEnv.sql(query: String): Table}} method for converting a query 
> result into a Table
> - integrate Calcite's SQL parser into the batch Table API translation process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Commented] (FLINK-3640) Add support for SQL queries in DataSet programs

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231990#comment-15231990
 ] 

ASF GitHub Bot commented on FLINK-3640:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1862#discussion_r59007483
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/AbstractTableEnvironment.scala
 ---
@@ -83,4 +84,17 @@ class AbstractTableEnvironment {
 )
 TranslationContext.registerTable(dataSetTable, name)
   }
+
+  /**
+   * Execute a SQL query on a batch [[Table]].
--- End diff --

Keep the description of this method more generic as it will be the entry 
point for stream SQL queries as well.


> Add support for SQL queries in DataSet programs
> ---
>
> Key: FLINK-3640
> URL: https://issues.apache.org/jira/browse/FLINK-3640
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> This issue covers the task of supporting SQL queries embedded in DataSet 
> programs. In this mode, the input and output of a SQL query is a Table. For 
> this issue, we need to make the following additions to the Table API:
> - add a {{tEnv.sql(query: String): Table}} method for converting a query 
> result into a Table
> - integrate Calcite's SQL parser into the batch Table API translation process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Created] (FLINK-3717) Add functionality to get the current offset of a source with a FileInputFormat

2016-04-08 Thread Kostas Kloudas (JIRA)
Kostas Kloudas created FLINK-3717:
-

 Summary: Add functionality to get the current offset of a source 
with a FileInputFormat
 Key: FLINK-3717
 URL: https://issues.apache.org/jira/browse/FLINK-3717
 Project: Flink
  Issue Type: Sub-task
Reporter: Kostas Kloudas
Assignee: Kostas Kloudas






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-2314) Make Streaming File Sources Persistent

2016-04-08 Thread Kostas Kloudas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kostas Kloudas updated FLINK-2314:
--
Labels:   (was: easyfix starter)

> Make Streaming File Sources Persistent
> --
>
> Key: FLINK-2314
> URL: https://issues.apache.org/jira/browse/FLINK-2314
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>
> Streaming File sources should participate in the checkpointing. They should 
> track the bytes they read from the file and checkpoint it.
> One can look at the sequence generating source function for an example of a 
> checkpointed source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3311) Add a connector for streaming data into Cassandra

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231954#comment-15231954
 ] 

ASF GitHub Bot commented on FLINK-3311:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/1771#discussion_r59004103
  
--- Diff: 
flink-streaming-connectors/flink-connector-cassandra/src/main/java/org/apache/flink/streaming/connectors/cassandra/CassandraCommitter.java
 ---
@@ -0,0 +1,126 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.streaming.connectors.cassandra;
+
+import com.datastax.driver.core.Cluster;
+import com.datastax.driver.core.PreparedStatement;
+import com.datastax.driver.core.Session;
+import org.apache.flink.api.java.ClosureCleaner;
+import org.apache.flink.streaming.runtime.operators.CheckpointCommitter;
+
+/**
+ * CheckpointCommitter that saves information about completed checkpoints within a separate table in a cassandra
+ * database.
+ * 
+ * Entries are in the form |operator_id | subtask_id | last_completed_checkpoint|
+ */
+public class CassandraCommitter extends CheckpointCommitter {
+   private ClusterBuilder builder;
+   private transient Cluster cluster;
+   private transient Session session;
+
+   private static final String KEYSPACE = "flink_auxiliary";
+   private String TABLE = "checkpoints_";
+
+   private transient PreparedStatement deleteStatement;
+   private transient PreparedStatement updateStatement;
+   private transient PreparedStatement selectStatement;
+
+   public CassandraCommitter(ClusterBuilder builder) {
+   this.builder = builder;
+   ClosureCleaner.clean(builder, true);
+   }
+
+   /**
+* Internally used to set the job ID after instantiation.
+*
+* @param id
+* @throws Exception
+*/
+   public void setJobId(String id) throws Exception {
+   super.setJobId(id);
+   TABLE += id;
+   }
+
+   /**
+* Generates the necessary tables to store information.
+*
+* @return
+* @throws Exception
+*/
+   @Override
+   public void createResource() throws Exception {
+   cluster = builder.getCluster();
+   session = cluster.connect();
+
+   session.execute(String.format("CREATE KEYSPACE IF NOT EXISTS %s 
with replication={'class':'SimpleStrategy', 'replication_factor':3};", 
KEYSPACE));
+   session.execute(String.format("CREATE TABLE IF NOT EXISTS %s.%s 
(sink_id text, sub_id int, checkpoint_id bigint, PRIMARY KEY (sink_id, 
sub_id));", KEYSPACE, TABLE));
--- End diff --

fixed


> Add a connector for streaming data into Cassandra
> -
>
> Key: FLINK-3311
> URL: https://issues.apache.org/jira/browse/FLINK-3311
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Robert Metzger
>Assignee: Andrea Sella
>
> We had users in the past asking for a Flink+Cassandra integration.
> It seems that there is a well-developed java client for connecting to 
> Cassandra: https://github.com/datastax/java-driver (ASL 2.0)
> There are also tutorials out there on how to start a local cassandra instance 
> (for the tests): 
> http://prettyprint.me/prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/index.html
> For the data types, I think we should support TupleX types, and map standard 
> java types to the respective cassandra types.
> In addition, it seems that there is an object mapper from datastax to store 
> POJOs in Cassandra (there are annotations for defining the primary key and 
> types)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Commented] (FLINK-3311) Add a connector for streaming data into Cassandra

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231910#comment-15231910
 ] 

ASF GitHub Bot commented on FLINK-3311:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/1771#discussion_r58999599
  
--- Diff: flink-streaming-connectors/flink-connector-cassandra/src/main/java/org/apache/flink/streaming/connectors/cassandra/CassandraCommitter.java ---
@@ -0,0 +1,126 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.streaming.connectors.cassandra;
+
+import com.datastax.driver.core.Cluster;
+import com.datastax.driver.core.PreparedStatement;
+import com.datastax.driver.core.Session;
+import org.apache.flink.api.java.ClosureCleaner;
+import org.apache.flink.streaming.runtime.operators.CheckpointCommitter;
+
+/**
+ * CheckpointCommitter that saves information about completed checkpoints within a separate table in a cassandra
+ * database.
+ * 
+ * Entries are in the form |operator_id | subtask_id | last_completed_checkpoint|
+ */
+public class CassandraCommitter extends CheckpointCommitter {
+   private ClusterBuilder builder;
+   private transient Cluster cluster;
+   private transient Session session;
+
+   private static final String KEYSPACE = "flink_auxiliary";
+   private String TABLE = "checkpoints_";
+
+   private transient PreparedStatement deleteStatement;
+   private transient PreparedStatement updateStatement;
+   private transient PreparedStatement selectStatement;
+
+   public CassandraCommitter(ClusterBuilder builder) {
+   this.builder = builder;
+   ClosureCleaner.clean(builder, true);
+   }
+
+   /**
+* Internally used to set the job ID after instantiation.
+*
+* @param id
+* @throws Exception
+*/
+   public void setJobId(String id) throws Exception {
+   super.setJobId(id);
+   TABLE += id;
+   }
+
+   /**
+* Generates the necessary tables to store information.
+*
+* @return
+* @throws Exception
+*/
+   @Override
+   public void createResource() throws Exception {
+   cluster = builder.getCluster();
+   session = cluster.connect();
+
+   session.execute(String.format("CREATE KEYSPACE IF NOT EXISTS %s with replication={'class':'SimpleStrategy', 'replication_factor':3};", KEYSPACE));
+   session.execute(String.format("CREATE TABLE IF NOT EXISTS %s.%s (sink_id text, sub_id int, checkpoint_id bigint, PRIMARY KEY (sink_id, sub_id));", KEYSPACE, TABLE));
--- End diff --

Users still can not pass custom keyspaces
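
For illustration, one way the keyspace could be made configurable is an extra constructor parameter. This is only a sketch under that assumption; the `keyspace` parameter and the overloaded constructor are hypothetical and not part of this PR:
```java
public class CassandraCommitter extends CheckpointCommitter {
    private final ClusterBuilder builder;
    private final String keyspace;  // hypothetical: user-defined keyspace

    // keeps the current default behaviour
    public CassandraCommitter(ClusterBuilder builder) {
        this(builder, "flink_auxiliary");
    }

    // hypothetical overload that lets users pass a custom keyspace
    public CassandraCommitter(ClusterBuilder builder, String keyspace) {
        this.builder = builder;
        this.keyspace = keyspace;
        ClosureCleaner.clean(builder, true);
    }
}
```
The CREATE KEYSPACE / CREATE TABLE statements in createResource() would then interpolate the `keyspace` field instead of the hardcoded KEYSPACE constant.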


> Add a connector for streaming data into Cassandra
> -
>
> Key: FLINK-3311
> URL: https://issues.apache.org/jira/browse/FLINK-3311
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Robert Metzger
>Assignee: Andrea Sella
>
> We had users in the past asking for a Flink+Cassandra integration.
> It seems that there is a well-developed java client for connecting into 
> Cassandra: https://github.com/datastax/java-driver (ASL 2.0)
> There are also tutorials out there on how to start a local Cassandra instance 
> (for the tests): 
> http://prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/index.html
> For the data types, I think we should support TupleX types, and map standard 
> java types to the respective cassandra types.
> In addition, it seems that there is a object mapper from datastax to store 
> POJOs in Cassandra (there are annotations for defining the primary key and 
> types)
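
As a concrete illustration of the datastax object mapper mentioned above, a mapped POJO could look like the following sketch. The keyspace, table and fields here are invented for the example; only the annotations and the MappingManager API come from the datastax java-driver:
{code}
import com.datastax.driver.core.Session;
import com.datastax.driver.mapping.Mapper;
import com.datastax.driver.mapping.MappingManager;
import com.datastax.driver.mapping.annotations.Column;
import com.datastax.driver.mapping.annotations.PartitionKey;
import com.datastax.driver.mapping.annotations.Table;

// hypothetical POJO; assumes a keyspace "demo" and table "events" already exist
@Table(keyspace = "demo", name = "events")
public class Event {
    @PartitionKey
    @Column(name = "id")
    private long id;

    @Column(name = "payload")
    private String payload;

    public Event() {}  // the mapper requires a no-arg constructor

    public long getId() { return id; }
    public void setId(long id) { this.id = id; }
    public String getPayload() { return payload; }
    public void setPayload(String payload) { this.payload = payload; }
}

// usage sketch: persist a POJO through an existing driver session
class EventDao {
    static void save(Session session, Event event) {
        Mapper<Event> mapper = new MappingManager(session).mapper(Event.class);
        mapper.save(event);
    }
}
{code}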



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (FLINK-3709) [streaming] Graph event rates over time

2016-04-08 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231898#comment-15231898
 ] 

Robert Metzger commented on FLINK-3709:
---

Hi Nick. I agree that showing rates instead of counts is better for streaming 
jobs.
I'm currently thinking about extending our metrics / monitoring system (with 
[~Zentol]) and improving the web interface for that. We'll also change the 
count to a rate on the overview page, and keep the count available on a 
separate page.
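
As a sketch of the rate computation itself (illustrative only, not the actual web frontend code):
{code}
// derive a records-per-second rate from two successive count samples
long prevCount = 1_000_000L; long prevTimeMs = 60_000L;  // sample at t1
long currCount = 1_150_000L; long currTimeMs = 70_000L;  // sample at t2

double recordsPerSecond =
    (currCount - prevCount) * 1000.0 / (currTimeMs - prevTimeMs);
// => 15000.0; the chart would plot this value once per sampling interval
{code}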

> [streaming] Graph event rates over time
> ---
>
> Key: FLINK-3709
> URL: https://issues.apache.org/jira/browse/FLINK-3709
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Reporter: Nick Dimiduk
>
> The streaming server job page displays bytes and records sent and received, 
> which answers the question "is data moving?" The next obvious question is "is 
> data moving over time?" That could be answered by a chart displaying 
> bytes/events rates. This would be a great chart to add to this display.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3697) keyBy() with nested POJO computes invalid field position indexes

2016-04-08 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231878#comment-15231878
 ] 

Robert Metzger commented on FLINK-3697:
---

Merged for 1.0.2: http://git-wip-us.apache.org/repos/asf/flink/commit/43093e3b

> keyBy() with nested POJO computes invalid field position indexes
> 
>
> Key: FLINK-3697
> URL: https://issues.apache.org/jira/browse/FLINK-3697
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API
>Affects Versions: 1.0.0
> Environment: MacOS X 10.10
>Reporter: Ron Crocker
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: pojo
> Fix For: 1.1.0
>
>
> Using named keys in keyBy() for nested POJO types results in failure. The 
> indexes for named key fields are used inconsistently with nested POJO types. 
> In particular, {{PojoTypeInfo.getFlatFields()}} returns the field's position 
> after (apparently) flattening the structure, but that position is then used to 
> index the unflattened version of the POJO type via {{PojoTypeInfo.getTypeAt()}}.
> In the example below, getFlatFields() returns positions 0, 1, and 14. These 
> positions appear correct in the flattened structure of the Data class. 
> However, in {{KeySelector getSelectorForKeys(Keys keys, 
> TypeInformation typeInfo, ExecutionConfig executionConfig)}}, a call to 
> {{compositeType.getTypeAt(logicalKeyPositions[i])}} for the third key results 
> in {{PojoTypeInfo.getTypeAt()}} declaring it out of range, as it compares the 
> position against the number of directly named fields of the object rather than 
> the length of the flattened version of that type.
> Concrete Example:
> Consider this graph:
> {code}
> DataStream<NativeDataFormat> dataStream = see.addSource(new 
> FlinkKafkaConsumer08<>(timesliceConstants.topic, new DataDeserialzer(), 
> kafkaConsumerProperties));
> dataStream
>   .flatMap(new DataMapper())
>   .keyBy("aaa", "abc", "wxyz")
> {code}
> {{DataDeserialzer}} returns a "NativeDataFormat" object; {{DataMapper}} takes 
> this NativeDataFormat object and extracts individual Data objects: {code}
> public class Data {
> public int aaa;
> public int abc;
> public long wxyz;
> public int t1;
> public int t2;
> public Policy policy;
> public Stats stats;
> public Data() {}
> }
> {code}
> A {{Policy}} object is an instance of this class:
> {code}
> public class Policy {
> public short a;
> public short b;
> public boolean c;
> public boolean d;
> public Policy() {}
> }
> {code}
> A {{Stats}} object is an instance of this class:
> {code}
> public class Stats {
> public long count;
> public float a;
> public float b;
> public float c;
> public float d;
> public float e;
> public Stats() {}
> }
> {code}
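
For reference, the position mismatch can be made concrete. Flink sorts POJO fields alphabetically before flattening, so the flattened view of {{Data}} works out as follows (a reconstruction for illustration; it matches the reported positions 0, 1 and 14):
{code}
// flattened, alphabetically sorted key positions of Data:
//  0 aaa
//  1 abc
//  2 policy.a    3 policy.b    4 policy.c    5 policy.d
//  6 stats.a     7 stats.b     8 stats.c     9 stats.count
// 10 stats.d    11 stats.e
// 12 t1         13 t2
// 14 wxyz
//
// getFlatFields("wxyz") therefore yields position 14, while getTypeAt()
// indexes the unflattened Data type, which has only 7 direct fields
// (valid positions 0..6) -- hence the out-of-range error for the third key.
{code}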



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3688) ClassCastException in StreamRecordSerializer when WindowOperator.trigger() is called and TimeCharacteristic = ProcessingTime

2016-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231813#comment-15231813
 ] 

ASF GitHub Bot commented on FLINK-3688:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1861#issuecomment-207287141
  
Thanks for your work! I'll merge it today.


> ClassCastException in StreamRecordSerializer when WindowOperator.trigger() is 
> called and TimeCharacteristic = ProcessingTime
> 
>
> Key: FLINK-3688
> URL: https://issues.apache.org/jira/browse/FLINK-3688
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Konstantin Knauf
>Assignee: Konstantin Knauf
>Priority: Critical
>
> Hi,
> when using {{TimeCharacteristics.ProcessingTime}} a ClassCastException is 
> thrown in {{StreamRecordSerializer}} when 
> {{WindowOperator.processWatermark()}} is called from 
> {{WindowOperator.trigger()}}, i.e. whenever a ProcessingTimeTimer is 
> triggered. 
> The problem seems to be that {{processWatermark()}} is also called in 
> {{trigger()}}, when the time characteristic is ProcessingTime, but in 
> {{RecordWriterOutput}} {{enableWatermarkMultiplexing}} is {{false}} and the 
> {{TypeSerializer}} is a {{StreamRecordSerializer}}, which ultimately leads to 
> the ClassCastException. Do you agree?
> If this is indeed a bug, there are several possible solutions.
> # Only calling {{processWatermark()}} in {{trigger()}} when the 
> TimeCharacteristic is EventTime (see the sketch below)
> # Not calling {{processWatermark()}} in {{trigger()}} at all, and instead waiting 
> for the next watermark to trigger the EventTimeTimers with a timestamp behind 
> the current watermark. This is, of course, a trade-off. 
> # Using {{MultiplexingStreamRecordSerializer}} all the time, but I have no 
> idea what the side effects of this change would be. I assume there is a reason 
> for the existence of the StreamRecordSerializer ;)
> StackTrace: 
> {quote}
> TimerException\{java.lang.RuntimeException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord\}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:716)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:93)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.emitWatermark(OperatorChain.java:370)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processWatermark(WindowOperator.java:293)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.trigger(WindowOperator.java:323)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:710)
>   ... 7 more
> Caused by: java.lang.ClassCastException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecordSerializer.serialize(StreamRecordSerializer.java:42)
>   at 
> org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:56)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:79)
>   at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.broadcastEmit(RecordWriter.java:109)
>   at 
> org.apache.flink.streaming.runtime.io.StreamRecordWriter.broadcastEmit(StreamRecordWriter.java:95)
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:90)
>   ... 11 more
> {quote}
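
A sketch of option 1, i.e. guarding the {{processWatermark()}} call by the time characteristic. Names are approximate; it assumes a {{timeCharacteristic}} field is available in the operator, which may differ from the actual WindowOperator code:
{code}
// illustrative only: fire the watermark path from trigger() solely on event time
private void trigger(long time) throws Exception {
    // ... fire the registered processing-time triggers for `time` ...
    if (timeCharacteristic == TimeCharacteristic.EventTime) {
        processWatermark(new Watermark(time));
    }
}
{code}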



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

