[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-13 Thread Chiwan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283386#comment-15283386
 ] 

Chiwan Park commented on FLINK-1873:


[~chobeat] [~till.rohrmann] How about splitting this issue into several issues? For 
example, one issue that covers the row-based matrix implementation and another 
issue that covers the block-based matrix implementation. This approach would make 
reviewing and tracking the work easier.

> Distributed matrix implementation
> -
>
> Key: FLINK-1873
> URL: https://issues.apache.org/jira/browse/FLINK-1873
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: liaoyuxi
>Assignee: Simone Robutti
>  Labels: ML
>
> It would help to implement machine learning algorithms more quickly and 
> concisely if Flink provided support for storing data and performing computations 
> in a distributed matrix. The design of the implementation is attached.
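
An editorial sketch for illustration only (not the attached design): a row-based distributed matrix along the lines discussed above could look roughly like this on the Scala DataSet API. The class, method, and field names are assumptions made for this example.

{code}
import org.apache.flink.api.scala._

// One matrix row, addressed by its index; values is a dense row vector.
case class IndexedRow(rowIndex: Int, values: Array[Double])

// A row-partitioned matrix: each row lives somewhere in the DataSet.
class DistributedRowMatrix(val rows: DataSet[IndexedRow],
                           val numRows: Int,
                           val numCols: Int) {

  // Element-wise sum with a matrix of the same shape, joining rows by index.
  def add(other: DistributedRowMatrix): DistributedRowMatrix = {
    require(numRows == other.numRows && numCols == other.numCols)
    val summed = rows.join(other.rows).where(_.rowIndex).equalTo(_.rowIndex) {
      (l, r) => IndexedRow(l.rowIndex, (l.values, r.values).zipped.map(_ + _))
    }
    new DistributedRowMatrix(summed, numRows, numCols)
  }
}
{code}

A block-based variant would instead key sub-matrix blocks by (blockRow, blockCol), which is one reason splitting the two implementations into separate issues would keep each review small.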



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3912] [docs] Fix errors in Batch Scala ...

2016-05-13 Thread houcros
Github user houcros commented on the pull request:

https://github.com/apache/flink/pull/1991#issuecomment-219182774
  
Happy to help :) 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-3912) Typos in Batch Scala API Documentation

2016-05-13 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-3912.

   Resolution: Fixed
Fix Version/s: 1.0.4
   1.1.0

Fixed for 1.0.4 with 496dbc881b6cbec3851b523db9b5b35dbe0313c6
Fixed for 1.1.0 with a9fc71d850528b472dee3541ae141e121bfc9b5f

Thanks for the fix!

> Typos in Batch Scala API Documentation
> --
>
> Key: FLINK-3912
> URL: https://issues.apache.org/jira/browse/FLINK-3912
> Project: Flink
>  Issue Type: Improvement
>  Components: Batch, Documentation, Scala API
>Reporter: Ignacio N. Lucero Ascencio
>Priority: Trivial
>  Labels: typo
> Fix For: 1.1.0, 1.0.4
>
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> In the Batch Guide Documentation, in the Join section there are some small 
> typos/errors for the Scala API.
> In particular, in the section: Join with Flat-Join Function, "left" is used 
> as "rating", and "right" is used as "weight".
> Also a parenthesis is missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3912] [docs] Fix errors in Batch Scala ...

2016-05-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1991


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3911) Sort operation before a group reduce doesn't seem to be implemented on 1.0.2

2016-05-13 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283238#comment-15283238
 ] 

Fabian Hueske commented on FLINK-3911:
--

In the JIRA description it says that you run the job on a standalone cluster. 

Just to clarify: when you submit a Flink 0.10.2 job with a 0.10.2 client to a 0.10.2 
cluster it works, but it fails when you submit a Flink 1.0.2 (or 1.0.3) job with a 
1.0.2 client to a 1.0.2 cluster, right?

I did not observe the problem when I tried to reproduce it in an IDE.
Have you tried to run the job locally in an IDE? 

> Sort operation before a group reduce doesn't seem to be implemented on 1.0.2
> 
>
> Key: FLINK-3911
> URL: https://issues.apache.org/jira/browse/FLINK-3911
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.2
> Environment: Linux Ubuntu, standalone cluster
>Reporter: Patrice Freydiere
>  Labels: newbie
>
> I have this piece of code: 
>  // group by id and sort on field order
> DataSet> waysGeometry = 
> joinedWaysWithPoints.groupBy(0).sortGroup(1, Order.ASCENDING)
> .reduceGroup(new 
> GroupReduceFunction, Tuple2 byte[]>>() {
> @Override
> public void 
> reduce(Iterable> values,
> 
> Collector> out) throws Exception {
> long id = -1;
> and this exception when executing:
> java.lang.Exception: The data preparation for task 'GroupReduce (GroupReduce 
> at constructOSMStreams(ProcessOSM.java:112))' , caused an error: Unrecognized 
> driver strategy for GroupReduce driver: SORTED_GROUP_COMBINE
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:456)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Unrecognized driver strategy for GroupReduce 
> driver: SORTED_GROUP_COMBINE
>   at 
> org.apache.flink.runtime.operators.GroupReduceDriver.prepare(GroupReduceDriver.java:90)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:450)
>   ... 3 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [Flink-2971][table] Add outer joins to the Tab...

2016-05-13 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1981#issuecomment-219172614
  
Hi @dawidwys, 
the runtime code, rule, API methods, and the tests look very good. :-)

PR #1958 should be ready to be merged (waiting for @twalthr to give his 
OK). I think you can rebase your code on top of #1958. 

One last comment with respect to style fixes. Unfortunately, we do not have 
a strict code style in place that is automatically enforced, and contributors 
sometimes follow different styles. We try to keep style changes in PRs to a 
minimum. Some changes absolutely make sense, but other changes might be 
reverted by the next person going over the file. Also, style changes can 
distract from the important changes. There is no hard rule for what to change, but 
a good rule of thumb is to leave code as it is when in doubt. 

Thanks, Fabian


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3911) Sort operation before a group reduce doesn't seem to be implemented on 1.0.2

2016-05-13 Thread Patrice Freydiere (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283226#comment-15283226
 ] 

Patrice Freydiere commented on FLINK-3911:
--

For 1.0.2 and 1.0.3 I downloaded the binaries from the website; the job used the
same version from the Maven repository.

The same code works on 0.10.2 (which I am currently using).


Patrice
On Fri., May 13, 2016 at 22:57, Fabian Hueske (JIRA) wrote:



> Sort operation before a group reduce doesn't seem to be implemented on 1.0.2
> 
>
> Key: FLINK-3911
> URL: https://issues.apache.org/jira/browse/FLINK-3911
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.2
> Environment: Linux Ubuntu, standalone cluster
>Reporter: Patrice Freydiere
>  Labels: newbie
>
> I have this piece of code: 
>  // group by id and sort on field order
> DataSet> waysGeometry = 
> joinedWaysWithPoints.groupBy(0).sortGroup(1, Order.ASCENDING)
> .reduceGroup(new 
> GroupReduceFunction, Tuple2 byte[]>>() {
> @Override
> public void 
> reduce(Iterable> values,
> 
> Collector> out) throws Exception {
> long id = -1;
> and this exception when executing:
> java.lang.Exception: The data preparation for task 'GroupReduce (GroupReduce 
> at constructOSMStreams(ProcessOSM.java:112))' , caused an error: Unrecognized 
> driver strategy for GroupReduce driver: SORTED_GROUP_COMBINE
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:456)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Unrecognized driver strategy for GroupReduce 
> driver: SORTED_GROUP_COMBINE
>   at 
> org.apache.flink.runtime.operators.GroupReduceDriver.prepare(GroupReduceDriver.java:90)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:450)
>   ... 3 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [Flink-2971][table] Add outer joins to the Tab...

2016-05-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1981#discussion_r63257292
  
--- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/batch/table/JoinITCase.scala ---
@@ -271,4 +270,86 @@ class JoinITCase(mode: TestExecutionMode) extends MultipleProgramsTestBase(mode)
     ds1.join(ds2).where('b === 'e).select('c, 'g)
   }
 
+  @Test
+  def testLeftJoinWithMultipleKeys(): Unit = {
+    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+    val tEnv = TableEnvironment.getTableEnvironment(env)
+    tEnv.getConfig.setNullCheck(true)
+
+    val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c)
+    val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, 'f, 'g, 'h)
+
+    val joinT = ds1.leftOuterJoin(ds2, 'a === 'd && 'b === 'h).select('c, 'g)
+
+    val expected = "Hi,Hallo\n" + "Hello,Hallo Welt\n" + "Hello world,Hallo Welt wie gehts?\n" +
+      "Hello world,ABC\n" + "Hello world, how are you?,null\n" + "I am fine.,HIJ\n" +
+      "I am fine.,IJK\n" + "Luke Skywalker,null\n" + "Comment#1,null\n" + "Comment#2,null\n" +
+      "Comment#3,null\n" + "Comment#4,null\n" + "Comment#5,null\n" + "Comment#6,null\n" +
+      "Comment#7,null\n" + "Comment#8,null\n" + "Comment#9,null\n" + "Comment#10,null\n" +
+      "Comment#11,null\n" + "Comment#12,null\n" + "Comment#13,null\n" + "Comment#14,null\n" +
+      "Comment#15,null\n"
+    val results = joinT.toDataSet[Row].collect()
+    TestBaseUtils.compareResultAsText(results.asJava, expected)
+  }
+
+  @Test
+  def testLeftJoinWithFilterInJoinCondition(): Unit = {
+    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+    val tEnv = TableEnvironment.getTableEnvironment(env)
+    tEnv.getConfig.setNullCheck(true)
+
+    val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c)
+    val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, 'f, 'g, 'h)
+
+    val joinT = ds1.leftOuterJoin(ds2, 'a < 3 && 'b === 'd.cast(TypeInformation.of(classOf[Long])))
--- End diff --

Can you change the query such that the result has a null field?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [Flink-2971][table] Add outer joins to the Tab...

2016-05-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1981#discussion_r63257088
  
--- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/batch/table/JoinITCase.scala ---
@@ -106,9 +107,9 @@ class JoinITCase(mode: TestExecutionMode) extends MultipleProgramsTestBase(mode)
     val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, 'f, 'g, 'h)
 
     ds1.join(ds2)
-  // must fail. Field 'foo does not exist
--- End diff --

I think this indentation was intended. Can you undo the change?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [Flink-2971][table] Add outer joins to the Tab...

2016-05-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1981#discussion_r63256282
  
--- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/batch/sql/JoinITCase.scala ---
@@ -259,14 +260,21 @@ class JoinITCase(
     tEnv.registerTable("Table3", ds1)
     tEnv.registerTable("Table5", ds2)
 
-    tEnv.sql(sqlQuery).toDataSet[Row].collect()
+    val expected = "Hi,Hallo\n" + "Hello,Hallo Welt\n" + "Hello world,Hallo Welt\n" +
+      "null,Hallo Welt wie\n" + "null,Hallo Welt wie gehts?\n" + "null,ABC\n" + "null,BCD\n" +
+      "null,CDE\n" + "null,DEF\n" + "null,EFG\n" + "null,FGH\n" + "null,GHI\n" + "null,HIJ\n" +
+      "null,IJK\n" + "null,JKL\n" + "null,KLM"
+
+    val results = tEnv.sql(sqlQuery).toDataSet[Row].collect()
+    TestBaseUtils.compareResultAsText(results.asJava, expected)
   }
 
-  @Test(expected = classOf[PlanGenException])
+  @Test
   def testLeftOuterJoin(): Unit = {
 
     val env = ExecutionEnvironment.getExecutionEnvironment
     val tEnv = TableEnvironment.getTableEnvironment(env, config)
+    tEnv.getConfig.setNullCheck(true)
 
     val sqlQuery = "SELECT c, g FROM Table3 LEFT OUTER JOIN Table5 ON b = e"
--- End diff --

Switch Table3 and Table5 so that the result also contains null values.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3912] [docs] Fix errors in Batch Scala ...

2016-05-13 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1991#issuecomment-219167409
  
Thanks for the fix! 
Will merge this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [Flink-2971][table] Add outer joins to the Tab...

2016-05-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1981#discussion_r63255457
  
--- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/table.scala ---
@@ -334,6 +335,156 @@ class Table(
     * }}}
     */
   def join(right: Table): Table = {
+    join(right, new Literal(true, TypeInformation.of(classOf[Boolean])), JoinType.INNER)
+  }
+
+  /**
+    * Joins two [[Table]]s. Similar to an SQL join. The fields of the two joined
+    * operations must not overlap, use [[as]] to rename fields if necessary.
+    *
+    * Note: Both tables must be bound to the same [[TableEnvironment]].
+    *
+    * Example:
+    *
+    * {{{
+    *   left.join(right, "a = b && c > 3")
+    * }}}
+    */
+  def join(right: Table, joinPredicate: String): Table = {
+    join(right, joinPredicate, JoinType.INNER)
+  }
+
+  /**
+    * Joins two [[Table]]s. Similar to an SQL join. The fields of the two joined
+    * operations must not overlap, use [[as]] to rename fields if necessary.
+    *
+    * Note: Both tables must be bound to the same [[TableEnvironment]].
+    *
+    * Example:
+    *
+    * {{{
+    *   left.join(right, 'a === 'b && 'c > 3).select('a, 'b, 'd)
+    * }}}
+    */
+  def join(right: Table, joinPredicate: Expression): Table = {
+    join(right, joinPredicate, JoinType.INNER)
+  }
+
+  /**
+    * Joins two [[Table]]s. Similar to an SQL left outer join. The fields of the two joined
+    * operations must not overlap, use [[as]] to rename fields if necessary.
+    *
+    * Note: Both tables must be bound to the same [[TableEnvironment]] and its [[TableConfig]] must
+    * have nullCheck enables.
+    *
+    * Example:
+    *
+    * {{{
+    *   left.leftOuterJoin(right, "a = b && c > 3").select('a, 'b, 'd)
+    * }}}
+    */
+  def leftOuterJoin(right: Table, joinPredicate: String): Table = {
+    join(right, joinPredicate, JoinType.LEFT_OUTER)
+  }
+
+  /**
+    * Joins two [[Table]]s. Similar to an SQL left outer join. The fields of the two joined
+    * operations must not overlap, use [[as]] to rename fields if necessary.
+    *
+    * Note: Both tables must be bound to the same [[TableEnvironment]] and its [[TableConfig]] must
+    * have nullCheck enables.
+    *
+    * Example:
+    *
+    * {{{
+    *   left.leftOuterJoin(right, 'a === 'b && 'c > 3).select('a, 'b, 'd)
+    * }}}
+    */
+  def leftOuterJoin(right: Table, joinPredicate: Expression): Table = {
+    join(right, joinPredicate, JoinType.LEFT_OUTER)
+  }
+
+  /**
+    * Joins two [[Table]]s. Similar to an SQL right outer join. The fields of the two joined
+    * operations must not overlap, use [[as]] to rename fields if necessary.
+    *
+    * Note: Both tables must be bound to the same [[TableEnvironment]] and its [[TableConfig]] must
+    * have nullCheck enables.
+    *
+    * Example:
+    *
+    * {{{
+    *   left.rightOuterJoin(right, "a = b && c > 3").select('a, 'b, 'd)
+    * }}}
+    */
+  def rightOuterJoin(right: Table, joinPredicate: String): Table = {
+    join(right, joinPredicate, JoinType.RIGHT_OUTER)
+  }
+
+  /**
+    * Joins two [[Table]]s. Similar to an SQL right outer join. The fields of the two joined
+    * operations must not overlap, use [[as]] to rename fields if necessary.
+    *
+    * Note: Both tables must be bound to the same [[TableEnvironment]] and its [[TableConfig]] must
+    * have nullCheck enables.
+    *
+    * Example:
+    *
+    * {{{
+    *   left.rightOuterJoin(right, 'a === 'b && 'c > 3).select('a, 'b, 'd)
+    * }}}
+    */
+  def rightOuterJoin(right: Table, joinPredicate: Expression): Table = {
+    join(right, joinPredicate, JoinType.RIGHT_OUTER)
+  }
+
+  /**
+    * Joins two [[Table]]s. Similar to an SQL full outer join. The fields of the two joined
+    * operations must not overlap, use [[as]] to rename fields if necessary.
+    *
+    * Note: Both tables must be bound to the same [[TableEnvironment]] and its [[TableConfig]] must
+    * have nullCheck enables.
+    *
+    * Example:
+    *
+    * {{{
+    *   left.fullOuterJoin(right, "a = b && c > 3").select('a, 'b, 'd)
+    * }}}
+    */
+  def fullOuterJoin(right: Table, joinPredicate: String): Table = {
+    join(right, joinPredicate, JoinType.FULL_OUTER)
+  }
+
+  /**
+    * Joins two [[Table]]s. Similar to an SQL full outer join. The fields of the two joined
+    * operations must not overlap, use [[as]] to rename fields if 

[GitHub] flink pull request: [FLINK-3912] [docs] Fix errors in Batch Scala ...

2016-05-13 Thread houcros
GitHub user houcros opened a pull request:

https://github.com/apache/flink/pull/1991

[FLINK-3912] [docs] Fix errors in Batch Scala API Documentation, Join 
section

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed

… section

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/houcros/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1991.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1991


commit b9274b3874a5e9e393f9e975671cd329bb210f82
Author: Ignacio N. Lucero Ascencio 
Date:   2016-05-13T21:41:25Z

[FLINK-3912] [docs] Fix errors in Batch Scala API Documentation, Join 
section




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [Flink-2971][table] Add outer joins to the Tab...

2016-05-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1981#discussion_r63255146
  
--- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/table.scala ---
@@ -334,6 +335,156 @@ class Table(
     * }}}
     */
   def join(right: Table): Table = {
+    join(right, new Literal(true, TypeInformation.of(classOf[Boolean])), JoinType.INNER)
+  }
+
+  /**
+    * Joins two [[Table]]s. Similar to an SQL join. The fields of the two joined
+    * operations must not overlap, use [[as]] to rename fields if necessary.
+    *
+    * Note: Both tables must be bound to the same [[TableEnvironment]].
+    *
+    * Example:
+    *
+    * {{{
+    *   left.join(right, "a = b && c > 3")
+    * }}}
+    */
+  def join(right: Table, joinPredicate: String): Table = {
+    join(right, joinPredicate, JoinType.INNER)
+  }
+
+  /**
+    * Joins two [[Table]]s. Similar to an SQL join. The fields of the two joined
+    * operations must not overlap, use [[as]] to rename fields if necessary.
+    *
+    * Note: Both tables must be bound to the same [[TableEnvironment]].
+    *
+    * Example:
+    *
+    * {{{
+    *   left.join(right, 'a === 'b && 'c > 3).select('a, 'b, 'd)
+    * }}}
+    */
+  def join(right: Table, joinPredicate: Expression): Table = {
+    join(right, joinPredicate, JoinType.INNER)
+  }
+
+  /**
+    * Joins two [[Table]]s. Similar to an SQL left outer join. The fields of the two joined
+    * operations must not overlap, use [[as]] to rename fields if necessary.
+    *
+    * Note: Both tables must be bound to the same [[TableEnvironment]] and its [[TableConfig]] must
+    * have nullCheck enables.
--- End diff --

should be "... must have nullCheck enableD"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3912) Typos in Batch Scala API Documentation

2016-05-13 Thread Ignacio N. Lucero Ascencio (JIRA)
Ignacio N. Lucero Ascencio created FLINK-3912:
-

 Summary: Typos in Batch Scala API Documentation
 Key: FLINK-3912
 URL: https://issues.apache.org/jira/browse/FLINK-3912
 Project: Flink
  Issue Type: Improvement
  Components: Batch, Documentation, Scala API
Reporter: Ignacio N. Lucero Ascencio
Priority: Trivial


In the Batch Guide Documentation, in the Join section there are some small 
typos/errors for the Scala API.

In particular, in the section: Join with Flat-Join Function, "left" is used as 
"rating", and "right" is used as "weight".

Also a parenthesis is missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1827) Move test classes in test folders and fix scope of test dependencies

2016-05-13 Thread josh gruenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283180#comment-15283180
 ] 

josh gruenberg commented on FLINK-1827:
---

Hi all,

This change just tripped up my team: we have some tests that depend on 
flink-test-utils, and we're working with 1.1-SNAPSHOT to try out the 
SessionWindows feature. Our build suddenly stopped working last week when this 
change was deployed, until rmetzger helped us get unstuck (thanks! :) by adding 
the "test-jar" type to the pom dependency.

I'll offer that I think this may not ultimately be a great technique for 
achieving your goal of streamlining your dev-builds. Here's why:

- specifying "test-jar" is error-prone boilerplate for all 
consumers of this dependency; forgetting to include this unusual requirement 
leads to confusing error-messages
- problems arise with transitive dependencies when depending on test-jars: the 
transitive dependencies do not get propagated to the consuming projects, 
requiring all consumers to add direct references to all dependencies 
themselves. This problem can snowball later if more dependencies are added to 
the test-jar. (See https://issues.apache.org/jira/browse/MNG-1378)

I don't think it's correct to assume that all "test-related" code should always 
be in src/test: the purpose of this particular artifact is just to provide 
test-utilities; that is its only reason for existing, and so that code should 
be in its src/main. If there were tests FOR the test-utils, then _those_ would 
appropriately reside in src/test (and would thus correctly be omitted from the 
assembly of the released jar).

There's more on this here: 
https://maven.apache.org/plugins/maven-jar-plugin/examples/create-test-jar.html

If your intent is to omit this artifact from a streamlined build, then this 
might be better approached via alternative maven configurations. There are 
probably many ways to achieve this; one possibility is via a profile in the 
flink-test-utils pom that disables its build (eg, by disabling the compiler 
plugin, as described in the second answer here: 
http://stackoverflow.com/questions/14614446/how-do-i-disable-the-maven-compiler-plugin).


> Move test classes in test folders and fix scope of test dependencies
> 
>
> Key: FLINK-1827
> URL: https://issues.apache.org/jira/browse/FLINK-1827
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 0.9
>Reporter: Flavio Pompermaier
>Priority: Minor
>  Labels: test-compile
> Fix For: 1.1.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Right now it is not possible to avoid compilation of test classes 
> (-Dmaven.test.skip=true) because some project (e.g. flink-test-utils) 
> requires test classes in non-test sources (e.g. 
> scalatest_${scala.binary.version})
> Test classes should be moved to src/main/test (if Java) and src/test/scala 
> (if scala) and use scope=test for test dependencies



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [Flink-2971][table] Add outer joins to the Tab...

2016-05-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1981#discussion_r63253874
  
--- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/plan/nodes/dataset/DataSetJoin.scala ---
@@ -159,9 +158,20 @@ class DataSetJoin(
     val leftDataSet = left.asInstanceOf[DataSetRel].translateToPlan(tableEnv)
     val rightDataSet = right.asInstanceOf[DataSetRel].translateToPlan(tableEnv)
 
+    val (joinOperator, nullCheck) = joinType match {
+      case JoinRelType.INNER => (leftDataSet.join(rightDataSet), false)
+      case JoinRelType.LEFT => (leftDataSet.leftOuterJoin(rightDataSet), true)
+      case JoinRelType.RIGHT => (leftDataSet.rightOuterJoin(rightDataSet), true)
+      case JoinRelType.FULL => (leftDataSet.fullOuterJoin(rightDataSet), true)
+    }
+
+    if (nullCheck && !config.getNullCheck) {
+      throw new TableException("Null check in TableEnvironment must be enabled for outer joins.")
--- End diff --

TableEnvironment should be TableConfig


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3911) Sort operation before a group reduce doesn't seem to be implemented on 1.0.2

2016-05-13 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283144#comment-15283144
 ] 

Fabian Hueske commented on FLINK-3911:
--

Is it possible that the Flink versions of the submitting client and the cluster 
do not match? 
Strategies are passed as enums, and the index does not match if client and cluster 
run different versions and a new strategy was added between those versions.
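
To illustrate the kind of mismatch described above, here is a small self-contained Scala sketch; the two enumerations are hypothetical stand-ins for an older and a newer set of driver strategies, not Flink's actual {{DriverStrategy}} constants.

{code}
// Hypothetical stand-ins, for illustration only.
object OldStrategies extends Enumeration {
  val SORTED_GROUP_REDUCE, ALL_GROUP_REDUCE = Value              // ids 0, 1
}
object NewStrategies extends Enumeration {
  val SORTED_GROUP_REDUCE, ALL_GROUP_REDUCE,                     // ids 0, 1
      SORTED_GROUP_COMBINE = Value                               // id 2, added in the newer version
}

object EnumIndexMismatch {
  def main(args: Array[String]): Unit = {
    // The newer client picks a strategy and ships its index to the cluster.
    val shippedIndex = NewStrategies.SORTED_GROUP_COMBINE.id     // 2
    // An older cluster has no constant with that index.
    OldStrategies.values.find(_.id == shippedIndex) match {
      case Some(s) => println(s"cluster resolved strategy $s")
      case None    => println(s"Unrecognized driver strategy for index $shippedIndex")
    }
  }
}
{code}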

> Sort operation before a group reduce doesn't seem to be implemented on 1.0.2
> 
>
> Key: FLINK-3911
> URL: https://issues.apache.org/jira/browse/FLINK-3911
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.2
> Environment: Linux Ubuntu, standalone cluster
>Reporter: Patrice Freydiere
>  Labels: newbie
>
> I have this piece of code: 
>  // group by id and sort on field order
> DataSet> waysGeometry = 
> joinedWaysWithPoints.groupBy(0).sortGroup(1, Order.ASCENDING)
> .reduceGroup(new 
> GroupReduceFunction, Tuple2 byte[]>>() {
> @Override
> public void 
> reduce(Iterable> values,
> 
> Collector> out) throws Exception {
> long id = -1;
> and this exception when executing:
> java.lang.Exception: The data preparation for task 'GroupReduce (GroupReduce 
> at constructOSMStreams(ProcessOSM.java:112))' , caused an error: Unrecognized 
> driver strategy for GroupReduce driver: SORTED_GROUP_COMBINE
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:456)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Unrecognized driver strategy for GroupReduce 
> driver: SORTED_GROUP_COMBINE
>   at 
> org.apache.flink.runtime.operators.GroupReduceDriver.prepare(GroupReduceDriver.java:90)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:450)
>   ... 3 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3911) Sort operation before a group reduce doesn't seem to be implemented on 1.0.2

2016-05-13 Thread Patrice Freydiere (JIRA)
Patrice Freydiere created FLINK-3911:


 Summary: Sort operation before a group reduce doesn't seem to be 
implemented on 1.0.2
 Key: FLINK-3911
 URL: https://issues.apache.org/jira/browse/FLINK-3911
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.0.2
 Environment: Linux Ubuntu, standalone cluster
Reporter: Patrice Freydiere


I have this piece of code: 

 // group by id and sort on field order
DataSet> waysGeometry = 
joinedWaysWithPoints.groupBy(0).sortGroup(1, Order.ASCENDING)
.reduceGroup(new 
GroupReduceFunction, Tuple2>() {
@Override
public void 
reduce(Iterable> values,
Collector> out) throws Exception {
long id = -1;


and this exception when executing:

java.lang.Exception: The data preparation for task 'GroupReduce (GroupReduce at 
constructOSMStreams(ProcessOSM.java:112))' , caused an error: Unrecognized 
driver strategy for GroupReduce driver: SORTED_GROUP_COMBINE
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:456)
at 
org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Exception: Unrecognized driver strategy for GroupReduce 
driver: SORTED_GROUP_COMBINE
at 
org.apache.flink.runtime.operators.GroupReduceDriver.prepare(GroupReduceDriver.java:90)
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:450)
... 3 more
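
For readers trying to reproduce this: the snippet above lost its generic type parameters in the mail archive. Below is an editorial re-sketch of the same groupBy/sortGroup/reduceGroup pattern in the Scala DataSet API, with made-up tuple types and toy data; it is not the reporter's original code.

{code}
import org.apache.flink.api.common.operators.Order
import org.apache.flink.api.scala._
import org.apache.flink.util.Collector

object SortedGroupReduceExample {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // (wayId, order, geometryFragment) -- illustrative stand-in for joinedWaysWithPoints
    val joinedWaysWithPoints: DataSet[(Long, Int, Array[Byte])] = env.fromElements(
      (1L, 2, Array[Byte](2)), (1L, 1, Array[Byte](1)), (2L, 1, Array[Byte](3)))

    // group by id and sort on the order field, then assemble one geometry per way
    val waysGeometry: DataSet[(Long, Array[Byte])] = joinedWaysWithPoints
      .groupBy(0)
      .sortGroup(1, Order.ASCENDING)
      .reduceGroup {
        (values: Iterator[(Long, Int, Array[Byte])], out: Collector[(Long, Array[Byte])]) =>
          val sorted = values.toSeq
          out.collect((sorted.head._1, sorted.flatMap(_._3).toArray))
      }

    waysGeometry.print()
  }
}
{code}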






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2259][ml] Add Train-Testing Splitters

2016-05-13 Thread rawkintrevo
Github user rawkintrevo commented on the pull request:

https://github.com/apache/flink/pull/1898#issuecomment-219120765
  
bump?  failing on flaky test, can someone restart/verify/etc?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3910) New self-join operator

2016-05-13 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3910:
-

 Summary: New self-join operator
 Key: FLINK-3910
 URL: https://issues.apache.org/jira/browse/FLINK-3910
 Project: Flink
  Issue Type: New Feature
  Components: DataSet API, Java API, Scala API
Affects Versions: 1.1.0
Reporter: Greg Hogan
Assignee: Greg Hogan


Flink currently provides inner- and outer-joins as well as cogroup and the 
non-keyed cross. {{JoinOperator}} hints at future support for semi- and 
anti-joins.

Many Gelly algorithms perform a self-join [0]. Still pending reviews, 
FLINK-3768 performs a self-join on non-skewed data in TriangleListing.java and 
FLINK-3780 performs a self-join on skewed data in JaccardSimilarity.java. A 
{{SelfJoinHint}} will select between skewed and non-skewed implementations.

The object-reuse-disabled case can be simply handled with a new {{Operator}}. 
The object-reuse-enabled case requires either {{CopyableValue}} types (as in 
the code above) or a custom driver which has access to the serializer (or 
making the serializer accessible to rich functions, and I think there be 
dragons).

If the idea of a self-join is agreeable, I'd like to work out a rough 
implementation and go from there.

[0] https://en.wikipedia.org/wiki/Join_%28SQL%29#Self-join
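
As a rough, editorial illustration of what the DataSet API offers today (not the proposed operator): a self-join is currently expressed by joining a data set with itself on a key field, similar to what the Gelly algorithms above do when enumerating neighbor pairs. The edge data and field choices are made up for this sketch.

{code}
import org.apache.flink.api.scala._

object SelfJoinToday {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // (sourceId, targetId) -- a tiny illustrative edge list
    val edges = env.fromElements((1L, 2L), (1L, 3L), (2L, 3L))

    // Joining the edge list with itself on the source vertex enumerates
    // neighbor pairs (open triads), which is the shape of self-join the issue describes.
    val neighborPairs = edges
      .join(edges)
      .where(0)
      .equalTo(0)
      .filter(pair => pair._1._2 < pair._2._2) // keep each unordered pair once

    neighborPairs.print()
  }
}
{code}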



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3889] Make File Monitoring Function che...

2016-05-13 Thread kl0u
Github user kl0u commented on the pull request:

https://github.com/apache/flink/pull/1984#issuecomment-219117647
  
After the discussion we had today with @StephanEwen and @aljoscha , I also 
added the PROCESS_ONCE watchType which processes the current (when invoked) 
content of a file/directory and exits. This is to be able to accommodate 
bounded file sources (a la batch).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3856) Create types for java.sql.Date/Time/Timestamp

2016-05-13 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282854#comment-15282854
 ] 

Fabian Hueske commented on FLINK-3856:
--

Thanks for taking care of this Max!

> Create types for java.sql.Date/Time/Timestamp
> -
>
> Key: FLINK-3856
> URL: https://issues.apache.org/jira/browse/FLINK-3856
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Timo Walther
>Assignee: Timo Walther
> Fix For: 1.1.0
>
>
> At the moment there is only the {{Date}} type which is not sufficient for 
> most use cases about time.
> The Table API would also benefit from having different types as output result.
> I would propose to add the three {{java.sql.}} types either as {{BasicTypes}} 
> or in an additional class {{TimeTypes}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3655] Multiple File Paths for InputFile...

2016-05-13 Thread gna-phetsarath
GitHub user gna-phetsarath opened a pull request:

https://github.com/apache/flink/pull/1990

[FLINK-3655] Multiple File Paths for InputFileFormat.

I had to create a new PR, because I messed up my branches.

This addresses [FLINK-3655] Multiple File Paths for InputFileFormat but 
does not implement file name globbing.

Also, this branch does not use guava.

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-3655] 
Multiple File Paths for InputFileFormat.")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed

Removed Guava.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gna-phetsarath/flink 
FLINK-3655-mulitple_directories_for_FileInputFormat_2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1990.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1990


commit 435a48339d0730714c90f61cfc4d435425e159e7
Author: Phetsarath, Sourigna 
Date:   2016-05-13T16:24:46Z

[FLINK-3655] Multiple File Paths for InputFileFormat.
Removed Guava.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: FLINK-3901 - Added CsvRowInputFormat

2016-05-13 Thread fpompermaier
GitHub user fpompermaier opened a pull request:

https://github.com/apache/flink/pull/1989

FLINK-3901 - Added CsvRowInputFormat

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [x] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fpompermaier/flink FLINK-3901

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1989.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1989


commit 4f8288f205629340aa46cad50ae232b9c2fc7439
Author: Flavio Pompermaier 
Date:   2016-05-13T16:19:51Z

FLINK-3901 - Added CsvRowInputFormat




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3655] Multiple File Paths for InputFile...

2016-05-13 Thread gna-phetsarath
Github user gna-phetsarath closed the pull request at:

https://github.com/apache/flink/pull/1987


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3701] reuse serializer lists in Executi...

2016-05-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1913


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3909) Maven Failsafe plugin may report SUCCESS on failed tests

2016-05-13 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-3909:
-

 Summary: Maven Failsafe plugin may report SUCCESS on failed tests
 Key: FLINK-3909
 URL: https://issues.apache.org/jira/browse/FLINK-3909
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.1.0
Reporter: Maximilian Michels
 Fix For: 1.1.0


The following build completed successfully on Travis but there are actually 
test failures: https://travis-ci.org/apache/flink/jobs/129943398#L5402



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3856) Create types for java.sql.Date/Time/Timestamp

2016-05-13 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282843#comment-15282843
 ] 

Maximilian Michels commented on FLINK-3856:
---

There is something wrong with our Maven configuration. You can see from the 
test output that the test failed earlier, but Maven still reported SUCCESS: 
https://travis-ci.org/apache/flink/jobs/129943398#L5402

> Create types for java.sql.Date/Time/Timestamp
> -
>
> Key: FLINK-3856
> URL: https://issues.apache.org/jira/browse/FLINK-3856
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Timo Walther
>Assignee: Timo Walther
> Fix For: 1.1.0
>
>
> At the moment there is only the {{Date}} type which is not sufficient for 
> most use cases about time.
> The Table API would also benefit from having different types as output result.
> I would propose to add the three {{java.sql.}} types either as {{BasicTypes}} 
> or in an additional class {{TimeTypes}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3856) Create types for java.sql.Date/Time/Timestamp

2016-05-13 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282839#comment-15282839
 ] 

Maximilian Michels commented on FLINK-3856:
---

Additional fix with 96b353d98f6b6d441ebedf69ec12cfa333a1d7c9.

> Create types for java.sql.Date/Time/Timestamp
> -
>
> Key: FLINK-3856
> URL: https://issues.apache.org/jira/browse/FLINK-3856
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Timo Walther
>Assignee: Timo Walther
> Fix For: 1.1.0
>
>
> At the moment there is only the {{Date}} type which is not sufficient for 
> most use cases about time.
> The Table API would also benefit from having different types as output result.
> I would propose to add the three {{java.sql.}} types either as {{BasicTypes}} 
> or in an additional class {{TimeTypes}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3754][Table]Add a validation phase befo...

2016-05-13 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1958#issuecomment-219085233
  
Hi @yjshen, thanks a lot for the update! This PR is good to merge, IMO. 
@twalthr, let me know if you want to have another look as well. Otherwise, 
I'll merge this PR in the next days. 
Very happy to have this improvement :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3754][Table]Add a validation phase befo...

2016-05-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1958#discussion_r63206959
  
--- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/comparison.scala ---
@@ -74,6 +109,8 @@ case class IsNull(child: Expression) extends UnaryExpression {
   override def toRexNode(implicit relBuilder: RelBuilder): RexNode = {
     relBuilder.isNull(child.toRexNode)
   }
+
+  override def resultType = BOOLEAN_TYPE_INFO
--- End diff --

Oh yes, sure. You're right. I did not expand the code (GitHub had folded 
some lines), so I thought this belonged to `GreaterThan`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3761] Introduction of key groups

2016-05-13 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1988#issuecomment-219083052
  
@aljoscha would be great to get some feedback from you :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3761] Introduction of key groups

2016-05-13 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1988#issuecomment-219082963
  
No, I haven't run performance benchmarks yet. We should definitely do that as 
a follow-up. 

You're right that the behaviour of RocksDB is particularly interesting. It 
might make sense in the future to implement a specialised RocksDB 
`KeyGroupStateBackend` implementation which uses only a single RocksDB 
instance for all key groups.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-3908) FieldParsers error state is not reset correctly to NONE

2016-05-13 Thread Flavio Pompermaier (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Pompermaier updated FLINK-3908:
--
Priority: Major  (was: Blocker)

> FieldParsers error state is not reset correctly to NONE
> ---
>
> Key: FLINK-3908
> URL: https://issues.apache.org/jira/browse/FLINK-3908
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.2
>Reporter: Flavio Pompermaier
>  Labels: parser
>
> If a parse error occurs while parsing a CSV file (for example, when an 
> integer column contains non-integer values), the errorState is not reset 
> correctly in the next parseField call. A simple fix would be to add a call to 
> {{setErrorState(ParseErrorState.NONE)}} as the first statement of the 
> {{parseField()}} function, but this is something that should be 
> handled better (by default) for every subclass of {{FieldParser}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3908) FieldParsers error state is not reset correctly to NONE

2016-05-13 Thread Flavio Pompermaier (JIRA)
Flavio Pompermaier created FLINK-3908:
-

 Summary: FieldParsers error state is not reset correctly to NONE
 Key: FLINK-3908
 URL: https://issues.apache.org/jira/browse/FLINK-3908
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.2
Reporter: Flavio Pompermaier
Priority: Blocker


If a parse error occurs while parsing a CSV file (for example, when an 
integer column contains non-integer values), the errorState is not reset correctly 
in the next parseField call. A simple fix would be to add a call to 
{{setErrorState(ParseErrorState.NONE)}} as the first statement of the 
{{parseField()}} function, but this is something that should be 
handled better (by default) for every subclass of {{FieldParser}}.
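
As a hedged sketch of the workaround described above (not the eventual fix): a subclass could reset the error state at the start of {{parseField()}}. This assumes the 1.0.x {{FieldParser}}/{{IntParser}} API and that {{IntParser}} can be subclassed.

{code}
import org.apache.flink.types.parser.{FieldParser, IntParser}

// Resets the parser's error state to NONE before each field, so an error left over
// from a previous record does not leak into the next parseField() call.
class ResettingIntParser extends IntParser {
  override def parseField(bytes: Array[Byte], startPos: Int, limit: Int,
                          delimiter: Array[Byte], reuse: Integer): Int = {
    setErrorState(FieldParser.ParseErrorState.NONE)
    super.parseField(bytes, startPos, limit, delimiter, reuse)
  }
}
{code}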



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3761] Introduction of key groups

2016-05-13 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1988#issuecomment-219064791
  
Very cool stuff! I was wondering, did you run any benchmarks for the 
performance impact of this change? For instance, it would be good to know how 
well RocksDB behaves with a large number of instances, etc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3907) Directed Clustering Coefficient

2016-05-13 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3907:
-

 Summary: Directed Clustering Coefficient
 Key: FLINK-3907
 URL: https://issues.apache.org/jira/browse/FLINK-3907
 Project: Flink
  Issue Type: New Feature
  Components: Gelly
Affects Versions: 1.1.0
Reporter: Greg Hogan
Assignee: Greg Hogan


A directed clustering coefficient algorithm can be implemented using an 
efficient triangle listing implementation which emits not only the three vertex 
IDs forming the triangle but also a bitmask indicating which edges form the 
triangle. A triangle can be formed with a minimum of three or maximum of six 
directed edges. Directed clustering coefficient can then shatter the triangles 
and emit a score of either 1 or 2 for each vertex.
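
As an editorial illustration of the bitmask encoding mentioned above (the bit positions are an arbitrary choice for this sketch, not Gelly's actual layout):

{code}
object TriangleEdgeBitmask {
  // Six possible directed edges between the triangle's vertices u, v, w.
  val UV = 1 << 0; val VU = 1 << 1
  val UW = 1 << 2; val WU = 1 << 3
  val VW = 1 << 4; val WV = 1 << 5

  // Number of directed edges forming the triangle: between 3 and 6.
  def directedEdgeCount(mask: Int): Int = Integer.bitCount(mask & 0x3f)

  def main(args: Array[String]): Unit = {
    // u->v, v->u, v->w, w->u: a triangle formed by four directed edges
    val mask = UV | VU | VW | WU
    println(directedEdgeCount(mask)) // 4
  }
}
{code}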



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-2765) Upgrade hbase version for hadoop-2 to 1.2 release

2016-05-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-2765:
--
Description: 
Currently 0.98.11 is used:
{code}
0.98.11-hadoop2
{code}

Stable release for hadoop-2 is 1.1.x line
We should upgrade to 1.2

  was:
Currently 0.98.11 is used:
{code}
0.98.11-hadoop2
{code}
Stable release for hadoop-2 is 1.1.x line
We should upgrade to 1.2


> Upgrade hbase version for hadoop-2 to 1.2 release
> -
>
> Key: FLINK-2765
> URL: https://issues.apache.org/jira/browse/FLINK-2765
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ted Yu
>Priority: Minor
>
> Currently 0.98.11 is used:
> {code}
> 0.98.11-hadoop2
> {code}
> Stable release for hadoop-2 is 1.1.x line
> We should upgrade to 1.2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3222) Incorrect shift amount in OperatorCheckpointStats#hashCode()

2016-05-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-3222:
--
Description: 
Here is related code:
{code}
result = 31 * result + (int) (subTaskStats.length ^ (subTaskStats.length 
>>> 32));
{code}
subTaskStats.length is an int.


The shift amount is greater than 31 bits.

  was:
Here is related code:
{code}
result = 31 * result + (int) (subTaskStats.length ^ (subTaskStats.length 
>>> 32));
{code}
subTaskStats.length is an int.

The shift amount is greater than 31 bits.


> Incorrect shift amount in OperatorCheckpointStats#hashCode()
> 
>
> Key: FLINK-3222
> URL: https://issues.apache.org/jira/browse/FLINK-3222
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Here is related code:
> {code}
> result = 31 * result + (int) (subTaskStats.length ^ (subTaskStats.length 
> >>> 32));
> {code}
> subTaskStats.length is an int.
> The shift amount is greater than 31 bits.
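
An editorial aside on why the shift amount matters on the JVM: for a 32-bit int the shift distance is taken modulo 32, so {{>>> 32}} is a no-op and the XOR term contributes nothing to the hash. A minimal sketch:

{code}
object ShiftAmountDemo {
  def main(args: Array[String]): Unit = {
    val length: Int = 7
    // Only the low 5 bits of the shift distance are used for Int, so >>> 32 is >>> 0.
    println(length >>> 32)             // prints 7, not 0
    println(length ^ (length >>> 32))  // prints 0 -- nothing is mixed into the hash
    // The idiom is meant for Long values, where it folds the high bits into the low bits:
    val l: Long = 12345678901L
    println((l ^ (l >>> 32)).toInt)
  }
}
{code}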



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3753) KillerWatchDog should not use kill on toKill thread

2016-05-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-3753:
--
Description: 
{code}
// this is harsh, but this watchdog is a last resort
if (toKill.isAlive()) {
  toKill.stop();
}
{code}
stop() is deprecated.


See:
https://www.securecoding.cert.org/confluence/display/java/THI05-J.+Do+not+use+Thread.stop()+to+terminate+threads

  was:
{code}
// this is harsh, but this watchdog is a last resort
if (toKill.isAlive()) {
  toKill.stop();
}
{code}
stop() is deprecated.

See:
https://www.securecoding.cert.org/confluence/display/java/THI05-J.+Do+not+use+Thread.stop()+to+terminate+threads


> KillerWatchDog should not use kill on toKill thread
> ---
>
> Key: FLINK-3753
> URL: https://issues.apache.org/jira/browse/FLINK-3753
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> // this is harsh, but this watchdog is a last resort
> if (toKill.isAlive()) {
>   toKill.stop();
> }
> {code}
> stop() is deprecated.
> See:
> https://www.securecoding.cert.org/confluence/display/java/THI05-J.+Do+not+use+Thread.stop()+to+terminate+threads
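
For reference, an editorial sketch of the interruption-based cancellation that the linked THI05-J rule recommends over {{Thread.stop()}}; this is illustrative only and not a proposed patch for the watchdog:

{code}
object InterruptInsteadOfStop {
  def main(args: Array[String]): Unit = {
    val worker = new Thread(new Runnable {
      override def run(): Unit = {
        try {
          // Cooperative cancellation: the loop checks the interrupt flag.
          while (!Thread.currentThread().isInterrupted) {
            Thread.sleep(100) // blocking calls abort early with InterruptedException
          }
        } catch {
          case _: InterruptedException => // exit the worker loop
        }
      }
    })
    worker.start()
    worker.interrupt() // request termination instead of calling stop()
    worker.join(1000)
    println(s"worker alive after interrupt: ${worker.isAlive}")
  }
}
{code}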



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3734) Unclosed DataInputView in AbstractAlignedProcessingTimeWindowOperator#restoreState()

2016-05-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-3734:
--
Description: 
{code}
DataInputView in = inputState.getState(getUserCodeClassloader());

final long nextEvaluationTime = in.readLong();
final long nextSlideTime = in.readLong();

AbstractKeyedTimePanes panes = 
createPanes(keySelector, function);
panes.readFromInput(in, keySerializer, stateTypeSerializer);

restoredState = new RestoredState<>(panes, nextEvaluationTime, 
nextSlideTime);
  }
{code}
DataInputView in is not closed upon return.

  was:
{code}
DataInputView in = inputState.getState(getUserCodeClassloader());

final long nextEvaluationTime = in.readLong();
final long nextSlideTime = in.readLong();

AbstractKeyedTimePanes panes = 
createPanes(keySelector, function);
panes.readFromInput(in, keySerializer, stateTypeSerializer);

restoredState = new RestoredState<>(panes, nextEvaluationTime, 
nextSlideTime);
  }
{code}

DataInputView in is not closed upon return.


> Unclosed DataInputView in 
> AbstractAlignedProcessingTimeWindowOperator#restoreState()
> 
>
> Key: FLINK-3734
> URL: https://issues.apache.org/jira/browse/FLINK-3734
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> DataInputView in = inputState.getState(getUserCodeClassloader());
> final long nextEvaluationTime = in.readLong();
> final long nextSlideTime = in.readLong();
> AbstractKeyedTimePanes panes = 
> createPanes(keySelector, function);
> panes.readFromInput(in, keySerializer, stateTypeSerializer);
> restoredState = new RestoredState<>(panes, nextEvaluationTime, 
> nextSlideTime);
>   }
> {code}
> DataInputView in is not closed upon return.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3801) Upgrade Joda-Time library to 2.9.3

2016-05-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-3801:
--
Description: 
Currently joda-time 2.5 is used, which is very old.


We should upgrade to 2.9.3

  was:
Currently joda-time 2.5 is used, which is very old.

We should upgrade to 2.9.3


> Upgrade Joda-Time library to 2.9.3
> --
>
> Key: FLINK-3801
> URL: https://issues.apache.org/jira/browse/FLINK-3801
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ted Yu
>Priority: Minor
>
> Currently joda-time 2.5 is used, which is very old.
> We should upgrade to 2.9.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3856) Create types for java.sql.Date/Time/Timestamp

2016-05-13 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282704#comment-15282704
 ] 

Fabian Hueske commented on FLINK-3856:
--

This is weird. 
I ran Travis before committing and all five builds succeeded. 
However, I can also reproduce the problem locally. :-/

> Create types for java.sql.Date/Time/Timestamp
> -
>
> Key: FLINK-3856
> URL: https://issues.apache.org/jira/browse/FLINK-3856
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Timo Walther
>Assignee: Timo Walther
> Fix For: 1.1.0
>
>
> At the moment there is only the {{Date}} type which is not sufficient for 
> most use cases about time.
> The Table API would also benefit from having different types as output result.
> I would propose to add the three {{java.sql.}} types either as {{BasicTypes}} 
> or in an additional class {{TimeTypes}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3856) Create types for java.sql.Date/Time/Timestamp

2016-05-13 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282696#comment-15282696
 ] 

Maximilian Michels commented on FLINK-3856:
---

This breaks {{GroupReduceITCase.testGroupByGenericType}} because it checks for
{{Assert.assertTrue(ec.getRegisteredKryoTypes().contains(java.sql.Date.class));}}.

Fixing this while merging FLINK-3701.

> Create types for java.sql.Date/Time/Timestamp
> -
>
> Key: FLINK-3856
> URL: https://issues.apache.org/jira/browse/FLINK-3856
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Timo Walther
>Assignee: Timo Walther
> Fix For: 1.1.0
>
>
> At the moment there is only the {{Date}} type which is not sufficient for 
> most use cases about time.
> The Table API would also benefit from having different types as output result.
> I would propose to add the three {{java.sql.}} types either as {{BasicTypes}} 
> or in an additional class {{TimeTypes}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3229] Flink streaming consumer for AWS ...

2016-05-13 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1911#issuecomment-219043000
  
As discussed in the JIRA, I'm going to follow the "relocation approach" for 
fixing the protobuf issue. But we won't release the kinesis connector to mvn 
central.
In the meantime, we'll try to come up with a better solution regarding the 
protobuf issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1873) Distributed matrix implementation

2016-05-13 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282684#comment-15282684
 ] 

Simone Robutti commented on FLINK-1873:
---

Hello Till,

I worked full time on this issue this week and I almost have a draft for a PR.

I would like to submit it with the following features:

Two matrix formats:
* row-based distribution
* block-based distribution

Conversions:
* from block-based to row-based
* from row-based to block-based

Operations on block-based matrices:
* per-block operations on two matrices
* sum
* sub
* multiplication

Row-based builders:
* from COO

Row-based collectors:
* to a local SparseMatrix
* to a local DenseMatrix
* to a local Seq of COO entries

There are many basic features that are actually simpler than the ones I have 
already implemented, and many others that may have a rather high priority 
(SVD?), but before proceeding I would like to get a review of what is already 
done in order to stabilize the structures I'm working on. Also, this is my 
first open source contribution, so I would appreciate feedback on the technical 
and stylistic aspects to avoid repeating the same errors in the work yet to be 
done.

If you think there are other core features to consider for this first 
iteration, please let me know. Otherwise I plan to open a PR next week. 
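
A loose sketch of what the row-based format and COO entries could look like (class and field names are assumptions for illustration, not the contents of the upcoming PR):
{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;

// assumed layout, for illustration only
public class DistributedMatrixSketch {

  /** One COO entry: (rowIndex, columnIndex, value). */
  public static class CooEntry extends Tuple3<Integer, Integer, Double> {
    public CooEntry() {}
    public CooEntry(int row, int col, double value) { super(row, col, value); }
  }

  /** Row-based format: every element is (rowIndex, dense row values). */
  public static class RowMatrix {
    public final DataSet<Tuple2<Integer, double[]>> rows;
    public final int numRows;
    public final int numCols;

    public RowMatrix(DataSet<Tuple2<Integer, double[]>> rows, int numRows, int numCols) {
      this.rows = rows;
      this.numRows = numRows;
      this.numCols = numCols;
    }
  }
}
{code}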


> Distributed matrix implementation
> -
>
> Key: FLINK-1873
> URL: https://issues.apache.org/jira/browse/FLINK-1873
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: liaoyuxi
>Assignee: Simone Robutti
>  Labels: ML
>
> It would help to implement machine learning algorithm more quickly and 
> concise if Flink would provide support for storing data and computation in 
> distributed matrix. The design of the implementation is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3754][Table]Add a validation phase befo...

2016-05-13 Thread yjshen
Github user yjshen commented on a diff in the pull request:

https://github.com/apache/flink/pull/1958#discussion_r63180341
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/comparison.scala
 ---
@@ -74,6 +109,8 @@ case class IsNull(child: Expression) extends 
UnaryExpression {
   override def toRexNode(implicit relBuilder: RelBuilder): RexNode = {
 relBuilder.isNull(child.toRexNode)
   }
+
+  override def resultType = BOOLEAN_TYPE_INFO
--- End diff --

`IsNull` accepts columns of any type and returns whether the current cell is 
`null`; therefore, the `resultType` is `Boolean`. `IsNull` extends 
`UnaryExpression`, which does not have a default `resultType` implementation, so 
we need the override.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3754][Table]Add a validation phase befo...

2016-05-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1958#discussion_r63177787
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/comparison.scala
 ---
@@ -74,6 +109,8 @@ case class IsNull(child: Expression) extends 
UnaryExpression {
   override def toRexNode(implicit relBuilder: RelBuilder): RexNode = {
 relBuilder.isNull(child.toRexNode)
   }
+
+  override def resultType = BOOLEAN_TYPE_INFO
--- End diff --

I think we do not need to override here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3655] Multiple File Paths for InputFile...

2016-05-13 Thread gna-phetsarath
Github user gna-phetsarath commented on the pull request:

https://github.com/apache/flink/pull/1987#issuecomment-219033497
  
I'll remove Guava later today or early tomorrow.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-3447) Package Gelly algorithms by framework

2016-05-13 Thread Greg Hogan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Hogan closed FLINK-3447.
-
Resolution: Won't Fix

As the number of algorithms in Gelly continues to grow, I think we are better 
off namespacing by the algorithm flavor (similarity, clustering, etc.) than by 
the implementation method. There has also been a noted desire to keep one or 
more "best" implementations in the library and have other implementations as 
Gelly examples.

> Package Gelly algorithms by framework
> -
>
> Key: FLINK-3447
> URL: https://issues.apache.org/jira/browse/FLINK-3447
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
>
> Currently algorithms in the Gelly library are collected in the 
> {{org.apache.flink.graph.library}} package. The gather-sum-apply class names 
> are prefixed by "GSA". Gelly contains multiple frameworks as named in 
> FLINK-3208.
> Since algorithms can be (and are) duplicated across the multiple frameworks, 
> we can move the algorithms into subpackages by the name of the framework.
> - vertex-centric model: {{org.apache.flink.graph.library.pregel}}
> - scatter-gather model: {{org.apache.flink.graph.library.spargel}}
> - gather-sum-apply model: {{org.apache.flink.graph.library.gsa}}
> - native methods: {{org.apache.flink.graph.library.asm}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Flink 3750 fixed

2016-05-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1941#discussion_r63174853
  
--- Diff: 
flink-batch-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCInputFormat.java
 ---
@@ -157,14 +216,25 @@ public boolean reachedEnd() throws IOException {
 * @throws java.io.IOException
 */
@Override
-   public OUT nextRecord(OUT tuple) throws IOException {
+   public Row nextRecord(Row row) throws IOException {
try {
-   resultSet.next();
-   if (columnTypes == null) {
-   extractTypes(tuple);
+   hasNext = resultSet.next();
+   if (!hasNext) {
+   return null;
+   }
+   try {
+   //This throws a NPE when the TypeInfo is not 
passed to the InputFormat,
+   //i.e. KryoSerializer used to generate the 
passed row
+   row.productArity();
--- End diff --

OK, I see. How about we extend the InputFormat to implement the 
`ResultTypeQueryable` interface and let users either specify all field types or 
at least the number of result attributes via the `JDBCInputFormatBuilder`?
Then we do not have to fall back to the KryoSerializer that creates 
corrupt `Row` objects, and users do not have to specify types in the 
`env.createInput()` method.
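
A rough sketch of that direction (illustrative only, not the final code; the setter stands in for whatever the builder would actually do):
```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
import org.apache.flink.api.table.Row;

// illustrative sketch: the format reports its produced type, so callers need
// no explicit type in env.createInput() and Kryo is never used for Row
public class TypedJDBCInputFormatSketch implements ResultTypeQueryable<Row> {

  private TypeInformation<Row> rowTypeInfo; // would be set by the builder

  public void setRowTypeInfo(TypeInformation<Row> rowTypeInfo) {
    this.rowTypeInfo = rowTypeInfo;
  }

  @Override
  public TypeInformation<Row> getProducedType() {
    return rowTypeInfo;
  }
}
```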


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Flink 3750 fixed

2016-05-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1941#discussion_r63173122
  
--- Diff: 
flink-batch-connectors/flink-jdbc/src/test/java/org/apache/flink/api/java/io/jdbc/example/JDBCFullTest.java
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.java.io.jdbc.example;
+
+import java.sql.Types;
+
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
+import 
org.apache.flink.api.java.io.jdbc.JDBCInputFormat.JDBCInputFormatBuilder;
+import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;
+import org.apache.flink.api.java.io.jdbc.JDBCTestBase;
+import 
org.apache.flink.api.java.io.jdbc.split.NumericBetweenParametersProvider;
+import org.apache.flink.api.table.Row;
+import org.apache.flink.api.table.typeutils.RowTypeInfo;
+import org.junit.Test;
+
+public class JDBCFullTest extends JDBCTestBase {
+   
+   @Test
+   public void test() throws Exception {
--- End diff --

Yes, previously this was an example and not a test. Not sure why it ended 
up in the `test` directory. 
It is a good idea to have an end-to-end test, but we should also check that 
the job does what it should do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-13 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1956#issuecomment-219023387
  
Thank you for the update @gallenvara. I'll take a look soon!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Flink 3750 fixed

2016-05-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1941#discussion_r63162186
  
--- Diff: 
flink-batch-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCOutputFormat.java
 ---
@@ -94,31 +94,106 @@ private void establishConnection() throws 
SQLException, ClassNotFoundException {
dbConn = DriverManager.getConnection(dbURL, username, 
password);
}
}
-
+   
/**
 * Adds a record to the prepared statement.
 * 
 * When this method is called, the output format is guaranteed to be 
opened.
+* 
 * 
-* WARNING: this may fail if the JDBC driver doesn't handle null 
correctly and no column types specified in the SqlRow
+* WARNING: this may fail when no column types specified (because a 
best effort approach is attempted in order to
+* insert a null value but it's not guaranteed that the JDBC driver 
handles PreparedStatement.setObject(pos, null))
 *
-* @param tuple The records to add to the output.
+* @param row The records to add to the output.
+* @see PreparedStatement
 * @throws IOException Thrown, if the records could not be added due to 
an I/O problem.
 */
@Override
-   public void writeRecord(Row tuple) throws IOException {
+   public void writeRecord(Row row) throws IOException {
+   if (typesArray != null && typesArray.length > 0 && 
typesArray.length == row.productArity()) {
+   LOG.warn("Column SQL types array doesn't match arity of 
passed Row! Check the passed array...");
+   } 
try {
-   for (int index = 0; index < tuple.productArity(); 
index++) {
-   if (tuple.productElement(index) == null && 
typesArray != null && typesArray.length > 0) {
-   if (typesArray.length == 
tuple.productArity()) {
-   upload.setNull(index + 1, 
typesArray[index]);
-   } else {
-   LOG.warn("Column SQL types 
array doesn't match arity of SqlRow! Check the passed array...");
-   }
+   for (int index = 0; index < row.productArity(); 
index++) {
+   if (typesArray == null ) {
--- End diff --

I would move this check out of the loop, i.e.,
```
if (typesArray == null) {
  for (...) {
//...
   upload.setObject(...)
  }
} else {
  for (...) {
// ...
  }
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Flink 3750 fixed

2016-05-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1941#discussion_r63161970
  
--- Diff: 
flink-batch-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCOutputFormat.java
 ---
@@ -94,31 +94,106 @@ private void establishConnection() throws 
SQLException, ClassNotFoundException {
dbConn = DriverManager.getConnection(dbURL, username, 
password);
}
}
-
+   
/**
 * Adds a record to the prepared statement.
 * 
 * When this method is called, the output format is guaranteed to be 
opened.
+* 
 * 
-* WARNING: this may fail if the JDBC driver doesn't handle null 
correctly and no column types specified in the SqlRow
+* WARNING: this may fail when no column types specified (because a 
best effort approach is attempted in order to
--- End diff --

This warning should also go into the JavaDocs of the `JdbcOutputFormat` 
class.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Flink 3750 fixed

2016-05-13 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1941#discussion_r63161244
  
--- Diff: 
flink-batch-connectors/flink-jdbc/src/test/java/org/apache/flink/api/java/io/jdbc/JDBCTestBase.java
 ---
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.java.io.jdbc;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.SQLException;
+import java.sql.Statement;
+
+import org.junit.After;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+
+/**
+ * Base test class for JDBC Input and Output formats
+ */
+public class JDBCTestBase {
+   
+   public static final String DRIVER_CLASS = 
"org.apache.derby.jdbc.EmbeddedDriver";
+   public static final String DB_URL = "jdbc:derby:memory:ebookshop";
+   public static final String INPUT_TABLE = "books";
+   public static final String OUTPUT_TABLE = "newbooks";
+   public static final String SELECT_ALL_BOOKS = "select * from " + 
INPUT_TABLE;
+   public static final String SELECT_ALL_NEWBOOKS = "select * from " + 
OUTPUT_TABLE;
+   public static final String SELECT_EMPTY = "select * from books WHERE 
QTY < 0";
+   public static final String INSERT_TEMPLATE = "insert into %s (id, 
title, author, price, qty) values (?,?,?,?,?)";
+   public static final String SELECT_ALL_BOOKS_SPLIT_BY_ID = 
JDBCTestBase.SELECT_ALL_BOOKS + " WHERE id BETWEEN ? AND ?";
+   public static final String SELECT_ALL_BOOKS_SPLIT_BY_AUTHOR = 
JDBCTestBase.SELECT_ALL_BOOKS + " WHERE author = ?";
+   
+   protected JDBCInputFormat jdbcInputFormat;
+   protected JDBCOutputFormat jdbcOutputFormat;
+
+   protected static Connection conn;
+
+   public static final Object[][] testData = {
--- End diff --

Oh, yes. Sorry :blush: 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3211) Add AWS Kinesis streaming connector

2016-05-13 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282581#comment-15282581
 ] 

Robert Metzger commented on FLINK-3211:
---

I agree with merging the Kinesis connector from the #1911 pull request and then 
using follow-up JIRAs and pull requests to further enhance it.

We can even release Flink 1.1 without providing a binary for Kinesis.

> Add AWS Kinesis streaming connector
> ---
>
> Key: FLINK-3211
> URL: https://issues.apache.org/jira/browse/FLINK-3211
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> AWS Kinesis is a widely adopted message queue used by AWS users, much like a 
> cloud service version of Apache Kafka. Support for AWS Kinesis will be a 
> great addition to the handful of Flink's streaming connectors to external 
> systems and a great reach out to the AWS community.
> AWS supports two different ways to consume Kinesis data: with the low-level 
> AWS SDK [1], or with the high-level KCL (Kinesis Client Library) [2]. AWS SDK 
> can be used to consume Kinesis data, including stream read beginning from a 
> specific offset (or "record sequence number" in Kinesis terminology). On the 
> other hand, AWS officially recommends using KCL, which offers a higher-level 
> of abstraction that also comes with checkpointing and failure recovery by 
> using a KCL-managed AWS DynamoDB "leash table" as the checkpoint state 
> storage.
> However, KCL is essentially a stream processing library that wraps all the 
> partition-to-task (or "shard" in Kinesis terminology) determination and 
> checkpointing to allow the user to focus only on streaming application logic. 
> This leads to the understanding that we can not use the KCL to implement the 
> Kinesis streaming connector if we are aiming for a deep integration of Flink 
> with Kinesis that provides exactly once guarantees (KCL promises only 
> at-least-once). Therefore, AWS SDK will be the way to go for the 
> implementation of this feature.
> With the ability to read from specific offsets, and also the fact that 
> Kinesis and Kafka share a lot of similarities, the basic principles of the 
> implementation of Flink's Kinesis streaming connector will very much resemble 
> the Kafka connector. We can basically follow the outlines described in 
> [~StephanEwen]'s description [3] and [~rmetzger]'s Kafka connector 
> implementation [4]. A few tweaks due to some of Kinesis v.s. Kafka 
> differences is described as following:
> 1. While the Kafka connector can support reading from multiple topics, I 
> currently don't think this is a good idea for Kinesis streams (a Kinesis 
> Stream is logically equivalent to a Kafka topic). Kinesis streams can exist 
> in different AWS regions, and each Kinesis stream under the same AWS user 
> account may have completely independent access settings with different 
> authorization keys. Overall, a Kinesis stream feels like a much more 
> consolidated resource compared to Kafka topics. It would be great to hear 
> more thoughts on this part.
> 2. While Kafka has brokers that can hold multiple partitions, the only 
> partitioning abstraction for AWS Kinesis is "shards". Therefore, in contrast 
> to the Kafka connector having per broker connections where the connections 
> can handle multiple Kafka partitions, the Kinesis connector will only need to 
> have simple per shard connections.
> 3. Kinesis itself does not support committing offsets back to Kinesis. If we 
> were to implement this feature like the Kafka connector with Kafka / ZK to 
> sync outside view of progress, we probably could use ZK or DynamoDB like the 
> way KCL works. More thoughts on this part will be very helpful too.
> As for the Kinesis Sink, it should be possible to use the AWS KPL (Kinesis 
> Producer Library) [5]. However, for higher code consistency with the proposed 
> Kinesis Consumer, I think it will be better to stick with the AWS SDK for the 
> implementation. The implementation should be straight forward, being almost 
> if not completely the same as the Kafka sink.
> References:
> [1] 
> http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-sdk.html
> [2] 
> http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-kcl.html
> [3] 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-Connector-td2872.html
> [4] http://data-artisans.com/kafka-flink-a-practical-how-to/
> [5] 
> http://docs.aws.amazon.com//kinesis/latest/dev/developing-producers-with-kpl.html#d0e4998



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3211) Add AWS Kinesis streaming connector

2016-05-13 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282577#comment-15282577
 ] 

Robert Metzger commented on FLINK-3211:
---

I think we have to support record deaggregation. It can happen that we are 
consuming a stream that was produced by a system using the KPL.

I agree that the out-of-sync configuration is not nice currently. I think we 
should change the producer and make it like the consumer. Then, this is also 
similar to our kafka connector code.

> Add AWS Kinesis streaming connector
> ---
>
> Key: FLINK-3211
> URL: https://issues.apache.org/jira/browse/FLINK-3211
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> AWS Kinesis is a widely adopted message queue used by AWS users, much like a 
> cloud service version of Apache Kafka. Support for AWS Kinesis will be a 
> great addition to the handful of Flink's streaming connectors to external 
> systems and a great reach out to the AWS community.
> AWS supports two different ways to consume Kinesis data: with the low-level 
> AWS SDK [1], or with the high-level KCL (Kinesis Client Library) [2]. AWS SDK 
> can be used to consume Kinesis data, including stream read beginning from a 
> specific offset (or "record sequence number" in Kinesis terminology). On the 
> other hand, AWS officially recommends using KCL, which offers a higher-level 
> of abstraction that also comes with checkpointing and failure recovery by 
> using a KCL-managed AWS DynamoDB "leash table" as the checkpoint state 
> storage.
> However, KCL is essentially a stream processing library that wraps all the 
> partition-to-task (or "shard" in Kinesis terminology) determination and 
> checkpointing to allow the user to focus only on streaming application logic. 
> This leads to the understanding that we can not use the KCL to implement the 
> Kinesis streaming connector if we are aiming for a deep integration of Flink 
> with Kinesis that provides exactly once guarantees (KCL promises only 
> at-least-once). Therefore, AWS SDK will be the way to go for the 
> implementation of this feature.
> With the ability to read from specific offsets, and also the fact that 
> Kinesis and Kafka share a lot of similarities, the basic principles of the 
> implementation of Flink's Kinesis streaming connector will very much resemble 
> the Kafka connector. We can basically follow the outlines described in 
> [~StephanEwen]'s description [3] and [~rmetzger]'s Kafka connector 
> implementation [4]. A few tweaks due to some of Kinesis v.s. Kafka 
> differences is described as following:
> 1. While the Kafka connector can support reading from multiple topics, I 
> currently don't think this is a good idea for Kinesis streams (a Kinesis 
> Stream is logically equivalent to a Kafka topic). Kinesis streams can exist 
> in different AWS regions, and each Kinesis stream under the same AWS user 
> account may have completely independent access settings with different 
> authorization keys. Overall, a Kinesis stream feels like a much more 
> consolidated resource compared to Kafka topics. It would be great to hear 
> more thoughts on this part.
> 2. While Kafka has brokers that can hold multiple partitions, the only 
> partitioning abstraction for AWS Kinesis is "shards". Therefore, in contrast 
> to the Kafka connector having per broker connections where the connections 
> can handle multiple Kafka partitions, the Kinesis connector will only need to 
> have simple per shard connections.
> 3. Kinesis itself does not support committing offsets back to Kinesis. If we 
> were to implement this feature like the Kafka connector with Kafka / ZK to 
> sync outside view of progress, we probably could use ZK or DynamoDB like the 
> way KCL works. More thoughts on this part will be very helpful too.
> As for the Kinesis Sink, it should be possible to use the AWS KPL (Kinesis 
> Producer Library) [5]. However, for higher code consistency with the proposed 
> Kinesis Consumer, I think it will be better to stick with the AWS SDK for the 
> implementation. The implementation should be straight forward, being almost 
> if not completely the same as the Kafka sink.
> References:
> [1] 
> http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-sdk.html
> [2] 
> http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-kcl.html
> [3] 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-Connector-td2872.html
> [4] http://data-artisans.com/kafka-flink-a-practical-how-to/
> [5] 
> http://docs.aws.amazon.com//kinesis/latest/dev/developing-producers-with-kpl.html#d0e4998



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (FLINK-3211) Add AWS Kinesis streaming connector

2016-05-13 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282573#comment-15282573
 ] 

Robert Metzger commented on FLINK-3211:
---

I use this code here: https://github.com/rmetzger/flink-kinesis-test
I build the Kinesis connector, then the flink-kinesis-test project, and then I 
start the data consumer and the data generator.


> Add AWS Kinesis streaming connector
> ---
>
> Key: FLINK-3211
> URL: https://issues.apache.org/jira/browse/FLINK-3211
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> AWS Kinesis is a widely adopted message queue used by AWS users, much like a 
> cloud service version of Apache Kafka. Support for AWS Kinesis will be a 
> great addition to the handful of Flink's streaming connectors to external 
> systems and a great reach out to the AWS community.
> AWS supports two different ways to consume Kinesis data: with the low-level 
> AWS SDK [1], or with the high-level KCL (Kinesis Client Library) [2]. AWS SDK 
> can be used to consume Kinesis data, including stream read beginning from a 
> specific offset (or "record sequence number" in Kinesis terminology). On the 
> other hand, AWS officially recommends using KCL, which offers a higher-level 
> of abstraction that also comes with checkpointing and failure recovery by 
> using a KCL-managed AWS DynamoDB "leash table" as the checkpoint state 
> storage.
> However, KCL is essentially a stream processing library that wraps all the 
> partition-to-task (or "shard" in Kinesis terminology) determination and 
> checkpointing to allow the user to focus only on streaming application logic. 
> This leads to the understanding that we can not use the KCL to implement the 
> Kinesis streaming connector if we are aiming for a deep integration of Flink 
> with Kinesis that provides exactly once guarantees (KCL promises only 
> at-least-once). Therefore, AWS SDK will be the way to go for the 
> implementation of this feature.
> With the ability to read from specific offsets, and also the fact that 
> Kinesis and Kafka share a lot of similarities, the basic principles of the 
> implementation of Flink's Kinesis streaming connector will very much resemble 
> the Kafka connector. We can basically follow the outlines described in 
> [~StephanEwen]'s description [3] and [~rmetzger]'s Kafka connector 
> implementation [4]. A few tweaks due to some of Kinesis v.s. Kafka 
> differences is described as following:
> 1. While the Kafka connector can support reading from multiple topics, I 
> currently don't think this is a good idea for Kinesis streams (a Kinesis 
> Stream is logically equivalent to a Kafka topic). Kinesis streams can exist 
> in different AWS regions, and each Kinesis stream under the same AWS user 
> account may have completely independent access settings with different 
> authorization keys. Overall, a Kinesis stream feels like a much more 
> consolidated resource compared to Kafka topics. It would be great to hear 
> more thoughts on this part.
> 2. While Kafka has brokers that can hold multiple partitions, the only 
> partitioning abstraction for AWS Kinesis is "shards". Therefore, in contrast 
> to the Kafka connector having per broker connections where the connections 
> can handle multiple Kafka partitions, the Kinesis connector will only need to 
> have simple per shard connections.
> 3. Kinesis itself does not support committing offsets back to Kinesis. If we 
> were to implement this feature like the Kafka connector with Kafka / ZK to 
> sync outside view of progress, we probably could use ZK or DynamoDB like the 
> way KCL works. More thoughts on this part will be very helpful too.
> As for the Kinesis Sink, it should be possible to use the AWS KPL (Kinesis 
> Producer Library) [5]. However, for higher code consistency with the proposed 
> Kinesis Consumer, I think it will be better to stick with the AWS SDK for the 
> implementation. The implementation should be straight forward, being almost 
> if not completely the same as the Kafka sink.
> References:
> [1] 
> http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-sdk.html
> [2] 
> http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-kcl.html
> [3] 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-Connector-td2872.html
> [4] http://data-artisans.com/kafka-flink-a-practical-how-to/
> [5] 
> http://docs.aws.amazon.com//kinesis/latest/dev/developing-producers-with-kpl.html#d0e4998



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3905) Add KafkaOutputFormat (DataSet API)

2016-05-13 Thread Maximilian Bode (JIRA)
Maximilian Bode created FLINK-3905:
--

 Summary: Add KafkaOutputFormat (DataSet API)
 Key: FLINK-3905
 URL: https://issues.apache.org/jira/browse/FLINK-3905
 Project: Flink
  Issue Type: New Feature
  Components: Kafka Connector
Affects Versions: 1.0.3
Reporter: Maximilian Bode
Assignee: Maximilian Bode


Right now, Flink can ingest records from and write records to Kafka in the 
DataStream API, via the {{FlinkKafkaConsumer08}} and {{FlinkKafkaProducer08}} 
and the corresponding classes for Kafka 0.9. In Flink batch jobs, interaction 
with Kafka is currently not supported.

If there is an easy way to create an inverse to the OutputFormatSinkFunction, 
something like a SinkFunctionOutputFormat, this might be the way to go?
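
A rough sketch of such a wrapper (hypothetical class, not existing API; handling of rich functions and their lifecycle is omitted):
{code}
import java.io.IOException;
import org.apache.flink.api.common.io.OutputFormat;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

// hypothetical wrapper, for illustration only: reuse a streaming SinkFunction
// as a DataSet OutputFormat
public class SinkFunctionOutputFormat<T> implements OutputFormat<T> {

  private final SinkFunction<T> sink;

  public SinkFunctionOutputFormat(SinkFunction<T> sink) {
    this.sink = sink;
  }

  @Override
  public void configure(Configuration parameters) {}

  @Override
  public void open(int taskNumber, int numTasks) throws IOException {}

  @Override
  public void writeRecord(T record) throws IOException {
    try {
      sink.invoke(record); // delegate each record to the wrapped sink
    } catch (Exception e) {
      throw new IOException(e);
    }
  }

  @Override
  public void close() throws IOException {}
}
{code}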



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3776] Flink Scala shell does not allow ...

2016-05-13 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1945#issuecomment-218993072
  
Thanks for the PR!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3887) Improve dependency management for building docs

2016-05-13 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282554#comment-15282554
 ] 

Maximilian Michels commented on FLINK-3887:
---

The related Infra issue

> Improve dependency management for building docs
> ---
>
> Key: FLINK-3887
> URL: https://issues.apache.org/jira/browse/FLINK-3887
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>
> Our nightly docs builds currently fail: 
> https://ci.apache.org/builders/flink-docs-master/
> I will file an issue with JIRA to fix it. The root cause is that we rely on a 
> couple of dependencies to be installed. We could circumvent this by providing 
> a Ruby Gemfile that we can then use to load necessary dependencies. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3776] Flink Scala shell does not allow ...

2016-05-13 Thread eastcirclek
Github user eastcirclek commented on the pull request:

https://github.com/apache/flink/pull/1945#issuecomment-218992214
  
Okay, I got the idea. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3904) GlobalConfiguration doesn't ensure config has been loaded

2016-05-13 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-3904:
-

 Summary: GlobalConfiguration doesn't ensure config has been loaded
 Key: FLINK-3904
 URL: https://issues.apache.org/jira/browse/FLINK-3904
 Project: Flink
  Issue Type: Improvement
Reporter: Maximilian Michels
Assignee: Maximilian Michels
Priority: Minor
 Fix For: 1.1.0


By default, {{GlobalConfiguration}} returns an empty Configuration. Instead, a 
call to {{get()}} should fail if the config hasn't been loaded explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3776) Flink Scala shell does not allow to set configuration for local execution

2016-05-13 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-3776.
---
   Resolution: Fixed
 Assignee: Dongwon Kim
Fix Version/s: 1.1.0

Fixed with 099fdfa0c5789f509242f83e8f808d552e63ee8d

> Flink Scala shell does not allow to set configuration for local execution
> -
>
> Key: FLINK-3776
> URL: https://issues.apache.org/jira/browse/FLINK-3776
> Project: Flink
>  Issue Type: Improvement
>  Components: Scala Shell
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Dongwon Kim
>Priority: Minor
> Fix For: 1.1.0
>
>
> Flink's Scala shell starts a {{LocalFlinkMiniCluster}} with an empty 
> configuration when the shell is started in local mode. In order to allow the 
> user to configure the mini cluster, e.g., number of slots, size of memory, it 
> would be good to forward a user specified configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3776] Flink Scala shell does not allow ...

2016-05-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1945


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3776] Flink Scala shell does not allow ...

2016-05-13 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1945#issuecomment-218987472
  
`GlobalConfiguration` doesn't ensure that the config has been loaded when 
you call `get()`. It will give you an empty `Configuration` if you do not call 
`loadConfiguration` explicitly. If you pass the config after you called the 
load method, it is clear that the config has been loaded.

Your code works. I'll just open a follow-up issue to make 
GlobalConfiguration more explicit, i.e. fail on `get()` if the config hasn't 
been loaded explicitly.
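
A simplified, hypothetical sketch of what that follow-up could look like (not the actual class):
```java
import org.apache.flink.configuration.Configuration;

// hypothetical simplification: fail fast instead of silently returning an
// empty Configuration when nothing has been loaded
public final class StrictGlobalConfiguration {

  private static Configuration configuration; // set by loadConfiguration()

  public static void loadConfiguration(Configuration loaded) {
    configuration = loaded;
  }

  public static Configuration get() {
    if (configuration == null) {
      throw new IllegalStateException(
          "No configuration loaded; call loadConfiguration() first.");
    }
    return configuration;
  }
}
```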


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-3852) Use a StreamExecutionEnvironment in the quickstart job skeleton

2016-05-13 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-3852:
--
Assignee: Mark Reddy

> Use a StreamExecutionEnvironment in the quickstart job skeleton
> ---
>
> Key: FLINK-3852
> URL: https://issues.apache.org/jira/browse/FLINK-3852
> Project: Flink
>  Issue Type: Task
>  Components: Quickstarts
>Reporter: Robert Metzger
>Assignee: Mark Reddy
>  Labels: starter
>
> The Job skeleton created by the maven archetype "quickstart" is still setting 
> up an ExecutionEnvironment, not a StreamExecutionEnvironment.
> These days, most users are using Flink for streaming, so we should reflect 
> that in the quickstart as well.
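
For illustration, a minimal streaming skeleton the archetype could generate instead (hedged sketch, not the archetype's actual content):
{code}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// hedged sketch of a streaming quickstart skeleton
public class StreamingJob {
  public static void main(String[] args) throws Exception {
    final StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
    // ... add sources, transformations and sinks here ...
    env.execute("Flink Streaming Job");
  }
}
{code}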



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-13 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1947#issuecomment-218983234
  
The tests should now pass.

I've fixed the following issues:
* several examples had non-transient Metric fields
* removed one failing example (CollectionInputFormat; accessed Context in 
constructor)
* IOMetrics are not granular, effectively gathering the same metrics we 
gather currently
* Resolved several test issues
  * General:
    * Introduced several DummyMetricGroups for use in tests
    * mocked Context/Environments now return DummyMetricGroups
  * Core:
    * RuntimeUDFContext constructor modified to take a MetricGroup argument
    * Integrated metrics into the CollectionExecutor
  * Table API: asterisks (introduced by select *) were not properly removed 
by the JMXReporter
  * Streaming: StreamingConfig ChainIndex default is now 0 instead of -1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Refactor StreamSourceContext

2016-05-13 Thread zentol
Github user zentol closed the pull request at:

https://github.com/apache/flink/pull/1886


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Refactor MockStreamSourceContext

2016-05-13 Thread zentol
Github user zentol closed the pull request at:

https://github.com/apache/flink/pull/1889


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3655] Multiple File Paths for InputFile...

2016-05-13 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1987#issuecomment-218971863
  
Hi @gna-phetsarath, thanks for the PR! I haven't had a detailed look at it 
yet, but I noticed that you added a Guava dependency to `flink-core`. We are 
currently trying to reduce Flink's dependencies on Guava (and want to get rid 
of it completely eventually) because other libraries we depend on use different 
versions of Guava.
Can you remove the Guava dependency? 

I will try to do a more detailed review in the next days. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-3878) File cache doesn't support multiple duplicate temp directories

2016-05-13 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-3878.

   Resolution: Fixed
 Assignee: Ken Krugler
Fix Version/s: 1.1.0
   1.0.4

Fixed for 1.0.4 with 4ae55cfd74d5b43eb4ab6e3a36fa6d8ca15d665d
Fixed for 1.1.0 with 3c90d3654c84f2a5f58564d2243f6a6e83da3fba

Thanks for the fix!

> File cache doesn't support multiple duplicate temp directories
> --
>
> Key: FLINK-3878
> URL: https://issues.apache.org/jira/browse/FLINK-3878
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime, Local Runtime
>Affects Versions: 1.1.0, 1.0.2
>Reporter: Ken Krugler
>Assignee: Ken Krugler
> Fix For: 1.0.4, 1.1.0
>
> Attachments: FLINK-3878.patch
>
>
> Based on 
> https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html, you 
> should be able to specify the same temp directory name multiple times. This 
> works for some of the Flink infrastructure (e.g. the TaskManager's temp file 
> directory), but not for FileCache.
> The problem is that the FileCache() constructor tries to use the same random 
> directory name for each of the specified temp dir locations, so after the 
> first directory is created, the second create fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3855) Upgrade Jackson version

2016-05-13 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-3855.

   Resolution: Done
Fix Version/s: 1.1.0

Done with 780c7f317931b11f8d8e3c7c857f5d8611603a8a

Thanks for the contribution!

> Upgrade Jackson version
> ---
>
> Key: FLINK-3855
> URL: https://issues.apache.org/jira/browse/FLINK-3855
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.3
>Reporter: Tatu Saloranta
>Priority: Minor
> Fix For: 1.1.0
>
>
> Jackson version in use (2.4.2) is rather old (and not even the latest patch 
> from minor version), so it'd be make sense to upgrade to bit newer. Latest 
> would be 2.7.4, but at first I propose going to 2.5.5.
> All tests pass, but if there are issues I'd be happy to help; I'm author of 
> Jackson project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3856) Create types for java.sql.Date/Time/Timestamp

2016-05-13 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-3856.

   Resolution: Implemented
Fix Version/s: 1.1.0

Implemented with bbd02d24bc7547e2c9384d713b20f86682cac08c

> Create types for java.sql.Date/Time/Timestamp
> -
>
> Key: FLINK-3856
> URL: https://issues.apache.org/jira/browse/FLINK-3856
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Timo Walther
>Assignee: Timo Walther
> Fix For: 1.1.0
>
>
> At the moment there is only the {{Date}} type which is not sufficient for 
> most use cases about time.
> The Table API would also benefit from having different types as output result.
> I would propose to add the three {{java.sql.}} types either as {{BasicTypes}} 
> or in an additional class {{TimeTypes}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Jackson version upgrade: default from 2.4.2 to...

2016-05-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1952


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3856] [core] Create types for java.sql....

2016-05-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1959


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: FLINK-3878: File cache doesn't support multipl...

2016-05-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1965


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: FLINK-3878: File cache doesn't support multipl...

2016-05-13 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1965#issuecomment-218968027
  
Merging


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Jackson version upgrade: default from 2.4.2 to...

2016-05-13 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1952#issuecomment-218967944
  
Merging


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3856] [core] Create types for java.sql....

2016-05-13 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1959#issuecomment-218967984
  
Merging


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---