[jira] [Updated] (FLINK-3680) Remove or improve (not set) text in the Job Plan UI

2016-03-29 Thread Jamie Grier (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jamie Grier updated FLINK-3680:
---
Attachment: Screen Shot 2016-03-29 at 8.12.17 PM.png
Screen Shot 2016-03-29 at 8.13.12 PM.png

> Remove or improve (not set) text in the Job Plan UI
> ---
>
> Key: FLINK-3680
> URL: https://issues.apache.org/jira/browse/FLINK-3680
> Project: Flink
>  Issue Type: Bug
>  Components: Webfrontend
>Reporter: Jamie Grier
> Attachments: Screen Shot 2016-03-29 at 8.12.17 PM.png, Screen Shot 
> 2016-03-29 at 8.13.12 PM.png
>
>
> When running streaming jobs, the UI displays (not set) in a few different 
> places.  This is not the case for batch jobs.
> To illustrate, I've included screenshots of the UI for the batch and 
> streaming WordCount examples.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3680) Remove or improve (not set) text in the Job Plan UI

2016-03-29 Thread Jamie Grier (JIRA)
Jamie Grier created FLINK-3680:
--

 Summary: Remove or improve (not set) text in the Job Plan UI
 Key: FLINK-3680
 URL: https://issues.apache.org/jira/browse/FLINK-3680
 Project: Flink
  Issue Type: Bug
  Components: Webfrontend
Reporter: Jamie Grier


When running streaming jobs, the UI displays (not set) in a few different 
places.  This is not the case for batch jobs.

To illustrate, I've included screenshots of the UI for the batch and streaming 
WordCount examples.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2998) Support range partition comparison for multi input nodes.

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217301#comment-15217301
 ] 

ASF GitHub Bot commented on FLINK-2998:
---

Github user gallenvara commented on the pull request:

https://github.com/apache/flink/pull/1838#issuecomment-203216796
  
@fhueske @ChengXiangLi Can you please help with review? :)


> Support range partition comparison for multi input nodes.
> -
>
> Key: FLINK-2998
> URL: https://issues.apache.org/jira/browse/FLINK-2998
> Project: Flink
>  Issue Type: New Feature
>  Components: Optimizer
>Reporter: Chengxiang Li
>Priority: Minor
>
> The optimizer may have an opportunity to optimize the DAG when it finds that 
> two input range partitionings are equivalent; we do not support this 
> comparison yet.
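
As an illustration of the comparison the optimizer would need, here is a 
minimal sketch in Java, assuming both inputs expose {{DataDistribution}}-style 
bucket boundaries and use the same number of buckets (the helper name and the 
comparison logic are hypothetical, not Flink's actual optimizer code):

{code}
import java.util.Arrays;

import org.apache.flink.api.common.distributions.DataDistribution;

public class RangePartitionEquivalence {

    // Two user-supplied range partitionings are equivalent if they split the
    // key space at the same bucket boundaries; in that case the optimizer
    // could reuse the existing partitioning instead of re-partitioning.
    static boolean equivalent(DataDistribution a, DataDistribution b, int numBuckets) {
        if (a.getNumberOfFields() != b.getNumberOfFields()) {
            return false;
        }
        for (int bucket = 0; bucket < numBuckets - 1; bucket++) {
            if (!Arrays.equals(a.getBucketBoundary(bucket, numBuckets),
                               b.getBucketBoundary(bucket, numBuckets))) {
                return false;
            }
        }
        return true;
    }
}
{code}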



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2998] Support range partition compariso...

2016-03-29 Thread gallenvara
Github user gallenvara commented on the pull request:

https://github.com/apache/flink/pull/1838#issuecomment-203216796
  
@fhueske @ChengXiangLi Can you please help with review? :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2998) Support range partition comparison for multi input nodes.

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217300#comment-15217300
 ] 

ASF GitHub Bot commented on FLINK-2998:
---

GitHub user gallenvara opened a pull request:

https://github.com/apache/flink/pull/1838

[FLINK-2998] Support range partition comparison for multi input nodes.

The PR implements range partition comparison in operations such as join and 
cogroup for multiple inputs. The optimizer can now optimize the DAG to avoid 
re-partitioning if it finds that the user-supplied data distributions are equal.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gallenvara/flink flink-2998

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1838.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1838


commit 37e6147a829e50ba8a45c26f225e16e7695f6489
Author: gallenvara 
Date:   2016-03-29T14:36:21Z

Support range partition comparison for multi input nodes.




> Support range partition comparison for multi input nodes.
> -
>
> Key: FLINK-2998
> URL: https://issues.apache.org/jira/browse/FLINK-2998
> Project: Flink
>  Issue Type: New Feature
>  Components: Optimizer
>Reporter: Chengxiang Li
>Priority: Minor
>
> The optimizer may have an opportunity to optimize the DAG when it finds that 
> two input range partitionings are equivalent; we do not support this 
> comparison yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2998] Support range partition compariso...

2016-03-29 Thread gallenvara
GitHub user gallenvara opened a pull request:

https://github.com/apache/flink/pull/1838

[FLINK-2998] Support range partition comparison for multi input nodes.

The PR implements range partition comparison in operations such as join and 
cogroup for multiple inputs. The optimizer can now optimize the DAG to avoid 
re-partitioning if it finds that the user-supplied data distributions are equal.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gallenvara/flink flink-2998

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1838.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1838


commit 37e6147a829e50ba8a45c26f225e16e7695f6489
Author: gallenvara 
Date:   2016-03-29T14:36:21Z

Support range partition comparison for multi input nodes.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Comment Edited] (FLINK-3670) Kerberos: Improving long-running streaming jobs

2016-03-29 Thread Eron Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217092#comment-15217092
 ] 

Eron Wright  edited comment on FLINK-3670 at 3/29/16 11:57 PM:
---

Another possibility worth considering is to leverage Hadoop's 'proxy user' 
functionality.
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html

In this approach, the JobManager impersonates the job submitter when accessing 
HDFS, HBase, or Hive.  Those servers would be configured to treat the 
JobManager principal as a superuser.

Note that the above solution isn't general, since Kafka (for example) doesn't 
provide proxy user functionality. Maybe both options could be provided.


was (Author: eronwright):
Another possibility worth considering is to leverage Hadoop's 'proxy user' 
functionality.
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html

In this approach, the JobManager impersonates the job submitter when accessing 
HDFS, HBase, or Hive.  Those servers would be configured to treat the 
JobManager principal as a proxy user.

Note that the above solution isn't general, since Kafka (for example) doesn't 
provide proxy user functionality. Maybe both options could be provided.

> Kerberos: Improving long-running streaming jobs
> ---
>
> Key: FLINK-3670
> URL: https://issues.apache.org/jira/browse/FLINK-3670
> Project: Flink
>  Issue Type: Improvement
>  Components: Command-line client, Local Runtime
>Reporter: Maximilian Michels
>
> We have seen in the past, that Hadoop's delegation tokens are subject to a 
> number of subtle token renewal bugs. In addition, they have a maximum life 
> time that can be worked around but is very inconvenient for the user.
> As per [mailing list 
> discussion|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Kerberos-for-Streaming-amp-Kafka-td10906.html],
>  a way to work around the maximum life time of DelegationTokens would be to 
> pass the Kerberos principal and key tab upon job submission. A daemon could 
> then periodically renew the ticket. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3670) Kerberos: Improving long-running streaming jobs

2016-03-29 Thread Eron Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217092#comment-15217092
 ] 

Eron Wright  commented on FLINK-3670:
-

Another possibility worth considering is to leverage Hadoop's 'proxy user' 
functionality.
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html

In this approach, the JobManager impersonates the job submitter when accessing 
HDFS, HBase, or Hive.  Those servers would be configured to treat the 
JobManager principal as a proxy user.

Note that the above solution isn't general, since Kafka (for example) doesn't 
provide proxy user functionality. Maybe both options could be provided.
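
For illustration, a minimal sketch of the impersonation step using Hadoop's 
{{UserGroupInformation}} proxy-user API (the submitter name "alice" and the 
HDFS call are placeholders; this is not a proposed Flink implementation):

{code}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserSketch {
    public static void main(String[] args) throws Exception {
        // The JobManager runs with its own (superuser) credentials ...
        UserGroupInformation jobManagerUser = UserGroupInformation.getLoginUser();

        // ... and impersonates the job submitter.
        UserGroupInformation submitter =
            UserGroupInformation.createProxyUser("alice", jobManagerUser);

        submitter.doAs((PrivilegedExceptionAction<Void>) () -> {
            // HDFS sees this access as coming from "alice", provided the
            // NameNode is configured to let the JobManager principal proxy.
            FileSystem fs = FileSystem.get(new Configuration());
            fs.listStatus(new Path("/user/alice"));
            return null;
        });
    }
}
{code}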

> Kerberos: Improving long-running streaming jobs
> ---
>
> Key: FLINK-3670
> URL: https://issues.apache.org/jira/browse/FLINK-3670
> Project: Flink
>  Issue Type: Improvement
>  Components: Command-line client, Local Runtime
>Reporter: Maximilian Michels
>
> We have seen in the past, that Hadoop's delegation tokens are subject to a 
> number of subtle token renewal bugs. In addition, they have a maximum life 
> time that can be worked around but is very inconvenient for the user.
> As per [mailing list 
> discussion|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Kerberos-for-Streaming-amp-Kafka-td10906.html],
>  a way to work around the maximum life time of DelegationTokens would be to 
> pass the Kerberos principal and key tab upon job submission. A daemon could 
> then periodically renew the ticket. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3579) Improve String concatenation

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217029#comment-15217029
 ] 

ASF GitHub Bot commented on FLINK-3579:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1821#discussion_r57815729
  
--- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala ---
@@ -116,7 +118,18 @@ object ExpressionParser extends JavaTokenParsers with PackratParsers {
 atom <~ ".cast(BOOL)" ^^ { e => Cast(e, BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
 atom <~ ".cast(BOOLEAN)" ^^ { e => Cast(e, BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
 atom <~ ".cast(STRING)" ^^ { e => Cast(e, BasicTypeInfo.STRING_TYPE_INFO) } |
-atom <~ ".cast(DATE)" ^^ { e => Cast(e, BasicTypeInfo.DATE_TYPE_INFO) }
+atom <~ ".cast(DATE)" ^^ { e => Cast(e, BasicTypeInfo.DATE_TYPE_INFO) } |
+// When an integer is directly casted.
+atom <~ "cast(BYTE)" ^^ { e => Cast(e, BasicTypeInfo.BYTE_TYPE_INFO) } |
+atom <~ "cast(SHORT)" ^^ { e => Cast(e, BasicTypeInfo.SHORT_TYPE_INFO) } |
+atom <~ "cast(INT)" ^^ { e => Cast(e, BasicTypeInfo.INT_TYPE_INFO) } |
+atom <~ "cast(LONG)" ^^ { e => Cast(e, BasicTypeInfo.LONG_TYPE_INFO) } |
+atom <~ "cast(FLOAT)" ^^ { e => Cast(e, BasicTypeInfo.FLOAT_TYPE_INFO) } |
+atom <~ "cast(DOUBLE)" ^^ { e => Cast(e, BasicTypeInfo.DOUBLE_TYPE_INFO) } |
+atom <~ "cast(BOOL)" ^^ { e => Cast(e, BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
+atom <~ "cast(BOOLEAN)" ^^ { e => Cast(e, BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
--- End diff --

This is about the Table API. SQL will be parsed by Calcite. So it is up to 
us what we accept.


> Improve String concatenation
> 
>
> Key: FLINK-3579
> URL: https://issues.apache.org/jira/browse/FLINK-3579
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Timo Walther
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
>
> Concatenation of a String and non-String does not work properly.
> e.g. {{f0 + 42}} leads to a RelBuilder exception.
> ExpressionParser does not like {{f0 + 42.cast(STRING)}} either.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3579 Improve String concatenation (Ram)

2016-03-29 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1821#discussion_r57815729
  
--- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala ---
@@ -116,7 +118,18 @@ object ExpressionParser extends JavaTokenParsers with PackratParsers {
 atom <~ ".cast(BOOL)" ^^ { e => Cast(e, BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
 atom <~ ".cast(BOOLEAN)" ^^ { e => Cast(e, BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
 atom <~ ".cast(STRING)" ^^ { e => Cast(e, BasicTypeInfo.STRING_TYPE_INFO) } |
-atom <~ ".cast(DATE)" ^^ { e => Cast(e, BasicTypeInfo.DATE_TYPE_INFO) }
+atom <~ ".cast(DATE)" ^^ { e => Cast(e, BasicTypeInfo.DATE_TYPE_INFO) } |
+// When an integer is directly casted.
+atom <~ "cast(BYTE)" ^^ { e => Cast(e, BasicTypeInfo.BYTE_TYPE_INFO) } |
+atom <~ "cast(SHORT)" ^^ { e => Cast(e, BasicTypeInfo.SHORT_TYPE_INFO) } |
+atom <~ "cast(INT)" ^^ { e => Cast(e, BasicTypeInfo.INT_TYPE_INFO) } |
+atom <~ "cast(LONG)" ^^ { e => Cast(e, BasicTypeInfo.LONG_TYPE_INFO) } |
+atom <~ "cast(FLOAT)" ^^ { e => Cast(e, BasicTypeInfo.FLOAT_TYPE_INFO) } |
+atom <~ "cast(DOUBLE)" ^^ { e => Cast(e, BasicTypeInfo.DOUBLE_TYPE_INFO) } |
+atom <~ "cast(BOOL)" ^^ { e => Cast(e, BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
+atom <~ "cast(BOOLEAN)" ^^ { e => Cast(e, BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
--- End diff --

This is about the Table API. SQL will be parsed by Calcite. So it is up to 
us what we accept.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3477) Add hash-based combine strategy for ReduceFunction

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216998#comment-15216998
 ] 

ASF GitHub Bot commented on FLINK-3477:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-203143579
  
Thanks for doing these experiments! The results are quite convincing. I'm 
currently on vacation and will be back in about a week. 


> Add hash-based combine strategy for ReduceFunction
> --
>
> Key: FLINK-3477
> URL: https://issues.apache.org/jira/browse/FLINK-3477
> Project: Flink
>  Issue Type: Sub-task
>  Components: Local Runtime
>Reporter: Fabian Hueske
>
> This issue is about adding a hash-based combine strategy for ReduceFunctions.
> The interface of the {{reduce()}} method is as follows:
> {code}
> public T reduce(T v1, T v2)
> {code}
> Input type and output type are identical and the function returns only a 
> single value. A Reduce function is incrementally applied to compute a final 
> aggregated value. This allows holding the pre-aggregated value in a hash 
> table and updating it with each function call. 
> The hash-based strategy requires a special implementation of an in-memory 
> hash table. The hash table should support in-place updates of elements (if 
> the new value has the same binary length as the old value) but also appending 
> updates with invalidation of the old value (if the binary length of the new 
> value differs). The hash table needs to be able to evict and emit all 
> elements if it runs out of memory.
> We should also add {{HASH}} and {{SORT}} compiler hints to 
> {{DataSet.reduce()}} and {{Grouping.reduce()}} to allow users to pick the 
> execution strategy.
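
To make the strategy concrete, a condensed sketch of the combine step, using a 
plain heap {{HashMap}} in place of Flink's managed-memory hash table and 
omitting eviction (all names are illustrative):

{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.functions.ReduceFunction;

public class HashCombineSketch<K, T> {

    private final Map<K, T> table = new HashMap<>();
    private final ReduceFunction<T> reducer;

    public HashCombineSketch(ReduceFunction<T> reducer) {
        this.reducer = reducer;
    }

    // Incrementally fold each incoming value into the pre-aggregated value,
    // updating the table entry in place.
    public void combine(K key, T value) throws Exception {
        T previous = table.get(key);
        table.put(key, previous == null ? value : reducer.reduce(previous, value));
    }

    // The real hash table would evict and emit all elements here when it
    // runs out of managed memory.
    public Iterable<Map.Entry<K, T>> emitAll() {
        return table.entrySet();
    }
}
{code}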



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3477] [runtime] Add hash-based combine ...

2016-03-29 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-203143579
  
Thanks for doing these experiments! The results are quite convincing. I'm 
currently on vacation and will be back in about a week. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3673) Annotations for code generation

2016-03-29 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216988#comment-15216988
 ] 

Fabian Hueske commented on FLINK-3673:
--

Hi [~horvgab], can you add more detail about the annotations you are proposing 
with this issue? 
What would the semantics be and how could they be used to generate more 
efficient code for serializers? Thanks

> Annotations for code generation
> ---
>
> Key: FLINK-3673
> URL: https://issues.apache.org/jira/browse/FLINK-3673
> Project: Flink
>  Issue Type: Sub-task
>  Components: Type Serialization System
>Reporter: Gabor Horvath
>Assignee: Gabor Horvath
>  Labels: gsoc2016
>
> Annotations should be utilized to generate more efficient serialization code.
> The very same annotations can be used to make the getLength method much 
> smarter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3679) DeserializationSchema should handle zero or more outputs for every input

2016-03-29 Thread Jamie Grier (JIRA)
Jamie Grier created FLINK-3679:
--

 Summary: DeserializationSchema should handle zero or more outputs 
for every input
 Key: FLINK-3679
 URL: https://issues.apache.org/jira/browse/FLINK-3679
 Project: Flink
  Issue Type: Bug
  Components: DataStream API
Reporter: Jamie Grier


There are a couple of issues with the DeserializationSchema API that I think 
should be addressed.  This request has come to me via an existing Flink user.

The main issue is simply that the API assumes that there is a one-to-one 
mapping between inputs and outputs.  In reality there are scenarios where one 
input message (say from Kafka) might actually map to zero or more logical 
elements in the pipeline.

Particularly important here is the case where you receive a message from a 
source (such as Kafka) and say the raw bytes don't deserialize properly.  Right 
now the only recourse is to throw IOException and therefore fail the job.  

This is definitely not good since bad data is a reality and failing the job is 
not the right option.  If the job fails we'll just end up replaying the bad 
data and the whole thing will start again.

Instead in this case it would be best if the user could just return the empty 
set.

The other case is where one input message should logically be multiple output 
messages.  This case is probably less important since there are other ways to 
do this but in general it might be good to make the 
DeserializationSchema.deserialize() method return a collection rather than a 
single element.

Maybe we need to support a DeserializationSchema variant that has semantics 
more like that of FlatMap.
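
One possible shape for such a variant, sketched with Flink's 
{{org.apache.flink.util.Collector}} (the interface name and the example schema 
are hypothetical, not an agreed design):

{code}
import java.io.IOException;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

import org.apache.flink.util.Collector;

// FlatMap-style schema: zero or more outputs per input message.
interface FlatDeserializationSchema<T> extends Serializable {
    void deserialize(byte[] message, Collector<T> out) throws IOException;
}

// Example: skip records that fail to deserialize instead of failing the job.
class TolerantStringSchema implements FlatDeserializationSchema<String> {
    @Override
    public void deserialize(byte[] message, Collector<String> out) {
        try {
            out.collect(new String(message, StandardCharsets.UTF_8));
        } catch (RuntimeException ignored) {
            // Bad data: emit the empty set rather than throwing IOException.
        }
    }
}
{code}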







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2157) Create evaluation framework for ML library

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216749#comment-15216749
 ] 

ASF GitHub Bot commented on FLINK-2157:
---

Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/871#issuecomment-203075630
  
@rawkintrevo  AFAIK it's lack of time from a committer to review it. If 
@tillrohrmann can find some time to review this, I'll refactor it to get rid of 
the conflicts and hopefully we can merge this and move on to #891 and #902 


> Create evaluation framework for ML library
> --
>
> Key: FLINK-2157
> URL: https://issues.apache.org/jira/browse/FLINK-2157
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Theodore Vasiloudis
>  Labels: ML
> Fix For: 1.0.0
>
>
> Currently, FlinkML lacks means to evaluate the performance of trained models. 
> It would be great to add some {{Evaluators}} which can calculate some score 
> based on the information about true and predicted labels. This could also be 
> used for cross-validation to choose the right hyperparameters.
> Possible scores could be F score [1], zero-one-loss score, etc.
> Resources
> [1] [http://en.wikipedia.org/wiki/F1_score]
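
For reference, a minimal sketch of one such score (F1) computed from 
true/false positive and negative counts; this is plain Java, not FlinkML's 
eventual {{Evaluators}} API:

{code}
public class F1ScoreSketch {

    // F1 is the harmonic mean of precision and recall.
    static double f1(long truePositives, long falsePositives, long falseNegatives) {
        double precision = truePositives / (double) (truePositives + falsePositives);
        double recall = truePositives / (double) (truePositives + falseNegatives);
        return 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // precision = 0.9, recall = 0.75 -> F1 ~ 0.818
        System.out.println(f1(90, 10, 30));
    }
}
{code}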



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2157] [ml] Create evaluation framework ...

2016-03-29 Thread thvasilo
Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/871#issuecomment-203075630
  
@rawkintrevo  AFAIK it's lack of time from a committer to review it. If 
@tillrohrmann can find some time to review this, I'll refactor it to get rid of 
the conflicts and hopefully we can merge this and move on to #891 and #902 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2157) Create evaluation framework for ML library

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216721#comment-15216721
 ] 

ASF GitHub Bot commented on FLINK-2157:
---

Github user rawkintrevo commented on the pull request:

https://github.com/apache/flink/pull/871#issuecomment-203067582
  
Continued from mailing list:  Till already mentioned that having R-squared 
built into MLR was just a convenience method; it's not good for a number of 
reasons in practice.  

Also, what is the holdup on this PR? What needs to be done, and what are the 
remaining things to decide? Having some model scoring would be very handy. 


> Create evaluation framework for ML library
> --
>
> Key: FLINK-2157
> URL: https://issues.apache.org/jira/browse/FLINK-2157
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Theodore Vasiloudis
>  Labels: ML
> Fix For: 1.0.0
>
>
> Currently, FlinkML lacks means to evaluate the performance of trained models. 
> It would be great to add some {{Evaluators}} which can calculate some score 
> based on the information about true and predicted labels. This could also be 
> used for cross-validation to choose the right hyperparameters.
> Possible scores could be F score [1], zero-one-loss score, etc.
> Resources
> [1] [http://en.wikipedia.org/wiki/F1_score]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2157] [ml] Create evaluation framework ...

2016-03-29 Thread rawkintrevo
Github user rawkintrevo commented on the pull request:

https://github.com/apache/flink/pull/871#issuecomment-203067582
  
Continued from mailing list:  Till already mentioned that having R-squared 
built into MLR was just a convenience method; it's not good for a number of 
reasons in practice.  

Also, what is the holdup on this PR? What needs to be done, and what are the 
remaining things to decide? Having some model scoring would be very handy. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3678) Make Flink logs directory configurable

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216397#comment-15216397
 ] 

ASF GitHub Bot commented on FLINK-3678:
---

GitHub user stefanobaghino opened a pull request:

https://github.com/apache/flink/pull/1837

[FLINK-3678] Make Flink logs directory configurable

* Edit config.sh
* Document the newly defined log directory configuration key

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/radicalbit/flink 3678

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1837.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1837


commit 05002c01345425d6fe9814ea7f669630fa5514b3
Author: Stefano Baghino 
Date:   2016-03-29T17:10:46Z

[FLINK-3678] Make Flink logs directory configurable

* Edit config.sh
* Document the newly defined log directory configuration key




> Make Flink logs directory configurable
> --
>
> Key: FLINK-3678
> URL: https://issues.apache.org/jira/browse/FLINK-3678
> Project: Flink
>  Issue Type: Improvement
>  Components: Start-Stop Scripts
>Affects Versions: 1.0.0
>Reporter: Stefano Baghino
>Assignee: Stefano Baghino
>Priority: Minor
> Fix For: 1.0.1
>
>
> Currently Flink logs are stored under {{$FLINK_HOME/log}} and the user cannot 
> configure an alternative storage location. It would be nice to add a 
> configuration key in the {{flink-conf.yaml}} and edit the {{bin/flink}} 
> launch script accordingly to use the value if present, or default to the 
> current behavior if no value is provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3678] Make Flink logs directory configu...

2016-03-29 Thread stefanobaghino
GitHub user stefanobaghino opened a pull request:

https://github.com/apache/flink/pull/1837

[FLINK-3678] Make Flink logs directory configurable

* Edit config.sh
* Document the newly defined log directory configuration key

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/radicalbit/flink 3678

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1837.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1837


commit 05002c01345425d6fe9814ea7f669630fa5514b3
Author: Stefano Baghino 
Date:   2016-03-29T17:10:46Z

[FLINK-3678] Make Flink logs directory configurable

* Edit config.sh
* Document the newly defined log directory configuration key




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3651) Fix faulty RollingSink Restore

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216361#comment-15216361
 ] 

ASF GitHub Bot commented on FLINK-3651:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1830


> Fix faulty RollingSink Restore
> --
>
> Key: FLINK-3651
> URL: https://issues.apache.org/jira/browse/FLINK-3651
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 1.1.0, 1.0.1
>
>
> The RollingSink restore logic has a bug where the sink for subtask index 1 
> also removes files for subtask index 11 because the regex that checks for the 
> file name also matches that one. Adding the suffix to the regex should solve 
> the problem because then the regex for 1 will only match files for subtask 
> index 1.
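
The effect can be illustrated with a small regex example (the file name scheme 
below is simplified, not RollingSink's exact naming):

{code}
import java.util.regex.Pattern;

public class SubtaskRegexSketch {
    public static void main(String[] args) {
        String fileOfSubtask1  = "part-1-0";
        String fileOfSubtask11 = "part-11-0";

        // Buggy: matching only on the subtask prefix lets subtask 1 claim
        // subtask 11's files.
        Pattern withoutSuffix = Pattern.compile("part-1.*");
        System.out.println(withoutSuffix.matcher(fileOfSubtask11).matches()); // true

        // Fixed: requiring the part-counter suffix disambiguates the index.
        Pattern withSuffix = Pattern.compile("part-1-\\d+");
        System.out.println(withSuffix.matcher(fileOfSubtask1).matches());  // true
        System.out.println(withSuffix.matcher(fileOfSubtask11).matches()); // false
    }
}
{code}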



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3651) Fix faulty RollingSink Restore

2016-03-29 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3651.
--
   Resolution: Fixed
Fix Version/s: 1.0.1
   1.1.0

Fixed in 580a177 (master), 2089029 (release-1.0)

> Fix faulty RollingSink Restore
> --
>
> Key: FLINK-3651
> URL: https://issues.apache.org/jira/browse/FLINK-3651
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 1.1.0, 1.0.1
>
>
> The RollingSink restore logic has a bug where the sink for subtask index 1 
> also removes files for subtask index 11 because the regex that checks for the 
> file name also matches that one. Adding the suffix to the regex should solve 
> the problem because then the regex for 1 will only match files for subtask 
> index 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3651] Fix faulty RollingSink Restore

2016-03-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1830


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3651) Fix faulty RollingSink Restore

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216351#comment-15216351
 ] 

ASF GitHub Bot commented on FLINK-3651:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1830#issuecomment-203004909
  
I'm going to merge this to `master` and `release-1.0`.


> Fix faulty RollingSink Restore
> --
>
> Key: FLINK-3651
> URL: https://issues.apache.org/jira/browse/FLINK-3651
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> The RollingSink restore logic has a bug where the sink for subtask index 1 
> also removes files for subtask index 11 because the regex that checks for the 
> file name also matches that one. Adding the suffix to the regex should solve 
> the problem because then the regex for 1 will only match files for subtask 
> index 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3651] Fix faulty RollingSink Restore

2016-03-29 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1830#issuecomment-203004909
  
I'm going to merge this to `master` and `release-1.0`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3678) Make Flink logs directory configurable

2016-03-29 Thread Stefano Baghino (JIRA)
Stefano Baghino created FLINK-3678:
--

 Summary: Make Flink logs directory configurable
 Key: FLINK-3678
 URL: https://issues.apache.org/jira/browse/FLINK-3678
 Project: Flink
  Issue Type: Improvement
  Components: Start-Stop Scripts
Affects Versions: 1.0.0
Reporter: Stefano Baghino
Assignee: Stefano Baghino
Priority: Minor
 Fix For: 1.0.1


Currently Flink logs are stored under {{$FLINK_HOME/log}} and the user cannot 
configure an alternative storage location. It would be nice to add a 
configuration key in the {{flink-conf.yaml}} and edit the {{bin/flink}} launch 
script accordingly to use the value if present, or default to the current 
behavior if no value is provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3677) FileInputFormat: Allow to specify include/exclude file name patterns

2016-03-29 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216289#comment-15216289
 ] 

Maximilian Michels commented on FLINK-3677:
---

Linked FLINK-3655, which is related to this issue. 

> FileInputFormat: Allow to specify include/exclude file name patterns
> 
>
> Key: FLINK-3677
> URL: https://issues.apache.org/jira/browse/FLINK-3677
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Maximilian Michels
>Priority: Minor
>  Labels: starter
>
> It would be nice to be able to specify a regular expression to filter files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3677) FileInputFormat: Allow to specify include/exclude file name patterns

2016-03-29 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-3677:
-

 Summary: FileInputFormat: Allow to specify include/exclude file 
name patterns
 Key: FLINK-3677
 URL: https://issues.apache.org/jira/browse/FLINK-3677
 Project: Flink
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.0.0
Reporter: Maximilian Michels
Priority: Minor


It would be nice to be able to specify a regular expression to filter files.
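
A minimal sketch of what such include/exclude matching could look like (class 
and method names are hypothetical, not FileInputFormat's actual API):

{code}
import java.util.regex.Pattern;

public class FilePatternFilter {

    private final Pattern include;
    private final Pattern exclude;

    public FilePatternFilter(String includeRegex, String excludeRegex) {
        this.include = Pattern.compile(includeRegex);
        this.exclude = Pattern.compile(excludeRegex);
    }

    // A file is read only if it matches the include pattern and not the
    // exclude pattern.
    public boolean accept(String fileName) {
        return include.matcher(fileName).matches()
            && !exclude.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        FilePatternFilter filter = new FilePatternFilter(".*\\.csv", "_.*");
        System.out.println(filter.accept("data-2016.csv")); // true
        System.out.println(filter.accept("_SUCCESS.csv"));  // false, excluded
    }
}
{code}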



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3667) Generalize client<->cluster communication

2016-03-29 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216222#comment-15216222
 ] 

Maximilian Michels commented on FLINK-3667:
---

+1 Makes sense to convert the {{Client}} class to a base class.

> Generalize client<->cluster communication
> -
>
> Key: FLINK-3667
> URL: https://issues.apache.org/jira/browse/FLINK-3667
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>
> Here are some notes I took when inspecting the client<->cluster classes with 
> regard to future integration of other resource management frameworks in 
> addition to Yarn (e.g. Mesos).
> {noformat}
> 1 Cluster Client Abstraction
> 
> 1.1 Status Quo
> ──
> 1.1.1 FlinkYarnClient
> ╌
>   • Holds the cluster configuration (Flink-specific and Yarn-specific)
>   • Contains the deploy() method to deploy the cluster
>   • Creates the Hadoop Yarn client
>   • Receives the initial job manager address
>   • Bootstraps the FlinkYarnCluster
> 1.1.2 FlinkYarnCluster
> ╌╌
>   • Wrapper around the Hadoop Yarn client
>   • Queries cluster for status updates
>   • Lifetime methods to start and shut down the cluster
>   • Flink specific features like shutdown after job completion
> 1.1.3 ApplicationClient
> ╌╌╌
>   • Acts as a middle-man for asynchronous cluster communication
>   • Designed to communicate with Yarn, not used in Standalone mode
> 1.1.4 CliFrontend
> ╌
>   • Deeply integrated with FlinkYarnClient and FlinkYarnCluster
>   • Constantly distinguishes between Yarn and Standalone mode
>   • Would be nice to have a general abstraction in place
> 1.1.5 Client
> 
>   • Job submission and Job related actions, agnostic of resource framework
> 1.2 Proposal
> 
> 1.2.1 ClusterConfig (before: AbstractFlinkYarnClient)
> ╌
>   • Extensible cluster-agnostic config
>   • May be extended by specific cluster, e.g. YarnClusterConfig
> 1.2.2 ClusterClient (before: AbstractFlinkYarnClient)
> ╌
>   • Deals with cluster (RM) specific communication
>   • Exposes framework agnostic information
>   • YarnClusterClient, MesosClusterClient, StandaloneClusterClient
> 1.2.3 FlinkCluster (before: AbstractFlinkYarnCluster)
> ╌
>   • Basic interface to communicate with a running cluster
>   • Receives the ClusterClient for cluster-specific communication
>   • Should not have to care about the specific implementations of the
> client
> 1.2.4 ApplicationClient
> ╌╌╌
>   • Can be changed to work cluster-agnostic (first steps already in
> FLINK-3543)
> 1.2.5 CliFrontend
> ╌
>   • CliFrontend never has to differentiate between different
> cluster types after it has determined which cluster class to load.
>   • Base class handles framework agnostic command line arguments
>   • Pluggables for Yarn, Mesos handle specific commands
> {noformat}
> I would like to create/refactor the affected classes to set us up for a more 
> flexible client side resource management abstraction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3524] [kafka] Add JSONDeserializationSc...

2016-03-29 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1834#issuecomment-202964440
  
Looks good.
Is this a Kafka-only util?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3524) Provide a JSONDeserialisationSchema in the kafka connector package

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216193#comment-15216193
 ] 

ASF GitHub Bot commented on FLINK-3524:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1834#issuecomment-202964440
  
Looks good.
Is this a Kafka-only util?


> Provide a JSONDeserialisationSchema in the kafka connector package
> --
>
> Key: FLINK-3524
> URL: https://issues.apache.org/jira/browse/FLINK-3524
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>  Labels: starter
>
> (I don't want to include this in 1.0.0)
> Currently, there is no standardized way of parsing JSON data from a Kafka 
> stream. I see a lot of users using JSON in their topics. It would make things 
> easier for our users to provide a deserializer for them.
> I suggest using the Jackson library because we already have it as a 
> dependency in Flink and it allows parsing from a byte[].
> I would suggest providing the following classes:
>  - JSONDeserializationSchema()
>  - JSONDeKeyValueSerializationSchema(bool includeMetadata)
> The second variant should produce a record like this:
> {code}
> {"key": "keydata",
>  "value": "valuedata",
>  "metadata": {"offset": 123, "topic": "", "partition": 2 }}
> {code}
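
For the first variant, a rough sketch of the Jackson-based parsing (the class 
shape is hypothetical; only the {{ObjectMapper}} calls are Jackson's actual 
API):

{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JSONDeserializationSketch {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Jackson can parse directly from a byte[], as noted above.
    public static JsonNode deserialize(byte[] message) throws IOException {
        return MAPPER.readTree(message);
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "{\"key\": \"keydata\", \"value\": \"valuedata\"}"
            .getBytes(StandardCharsets.UTF_8);
        System.out.println(deserialize(payload).get("key").asText()); // keydata
    }
}
{code}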



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3676) WebClient hasn't been removed from the docs

2016-03-29 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3676.
-
Resolution: Fixed

master: 875cb448d7407687402eec15c77c24683c2d5c56
release-1.0: 875cb448d7407687402eec15c77c24683c2d5c56

> WebClient hasn't been removed from the docs
> ---
>
> Key: FLINK-3676
> URL: https://issues.apache.org/jira/browse/FLINK-3676
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0, 1.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3676) WebClient hasn't been removed from the docs

2016-03-29 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-3676:
-

 Summary: WebClient hasn't been removed from the docs
 Key: FLINK-3676
 URL: https://issues.apache.org/jira/browse/FLINK-3676
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.0.0, 1.1.0
Reporter: Maximilian Michels
Assignee: Maximilian Michels
 Fix For: 1.1.0, 1.0.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3675) YARN ship folder inconsistent behavior

2016-03-29 Thread Stefano Baghino (JIRA)
Stefano Baghino created FLINK-3675:
--

 Summary: YARN ship folder inconsistent behavior
 Key: FLINK-3675
 URL: https://issues.apache.org/jira/browse/FLINK-3675
 Project: Flink
  Issue Type: Bug
  Components: YARN Client
Affects Versions: 1.0.0
Reporter: Stefano Baghino


After [some discussion on the user mailing 
list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html]
 it came up that the {{flink/lib}} folder is always supposed to be shipped to 
the YARN cluster so that all the nodes have access to its contents.

Currently, however, the Flink long-running YARN session actually ships the 
folder because it's explicitly specified in the {{yarn-session.sh}} script, 
while running a single job on YARN does not automatically ship it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3477) Add hash-based combine strategy for ReduceFunction

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215928#comment-15215928
 ] 

ASF GitHub Bot commented on FLINK-3477:
---

Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-202868272
  
Overall it seems that the hash-based combiner works better than the 
sort-based one for (a) uniform or normal key distributions, and (b) 
fixed-length records.

For skewed key distributions (like Zipf) the two strategies are practically 
equal, and for variable-length records the extra effort in compacting the 
records offsets the advantages of the hash-based aggregation approach. 


> Add hash-based combine strategy for ReduceFunction
> --
>
> Key: FLINK-3477
> URL: https://issues.apache.org/jira/browse/FLINK-3477
> Project: Flink
>  Issue Type: Sub-task
>  Components: Local Runtime
>Reporter: Fabian Hueske
>
> This issue is about adding a hash-based combine strategy for ReduceFunctions.
> The interface of the {{reduce()}} method is as follows:
> {code}
> public T reduce(T v1, T v2)
> {code}
> Input type and output type are identical and the function returns only a 
> single value. A Reduce function is incrementally applied to compute a final 
> aggregated value. This allows holding the pre-aggregated value in a hash 
> table and updating it with each function call. 
> The hash-based strategy requires a special implementation of an in-memory 
> hash table. The hash table should support in-place updates of elements (if 
> the new value has the same binary length as the old value) but also appending 
> updates with invalidation of the old value (if the binary length of the new 
> value differs). The hash table needs to be able to evict and emit all 
> elements if it runs out of memory.
> We should also add {{HASH}} and {{SORT}} compiler hints to 
> {{DataSet.reduce()}} and {{Grouping.reduce()}} to allow users to pick the 
> execution strategy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3477] [runtime] Add hash-based combine ...

2016-03-29 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-202868272
  
Overall it seems that the hash-based combiner works better than the 
sort-based one for (a) uniform or normal key distributions, and (b) 
fixed-length records.

For skewed key distributions (like Zipf) the two strategies are practically 
equal, and for variable-length records the extra effort in compacting the 
records offsets the advantages of the hash-based aggregation approach. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3667) Generalize client<->cluster communication

2016-03-29 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215927#comment-15215927
 ] 

Stephan Ewen commented on FLINK-3667:
-

Sounds good! Here are some thoughts I had when reading the proposal:

I am wondering if we get too many clients in the end (JobClient, Client, 
ClusterClient), and whether we should make your proposed {{ClusterClient}} and 
the {{Client}} class one thing.

{code}
AbstractClient (common functionality, like the JobManager communication, 
submitting jobs once everything runs)
 |
+-- StandaloneClient (like current Client)
+-- YarnClient
+-- MesosClient
{code}

Similarly, one would have {{FlinkCluster}} (abstract superclass), and 
{{StandaloneCluster}}, {{YarnCluster}}, {{MesosCluster}} that would be created 
from the configs, and would have a {{getClient()}} call that returns the above 
clients.

> Generalize client<->cluster communication
> -
>
> Key: FLINK-3667
> URL: https://issues.apache.org/jira/browse/FLINK-3667
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>
> Here are some notes I took when inspecting the client<->cluster classes with 
> regard to future integration of other resource management frameworks in 
> addition to Yarn (e.g. Mesos).
> {noformat}
> 1 Cluster Client Abstraction
> 
> 1.1 Status Quo
> ──
> 1.1.1 FlinkYarnClient
> ╌
>   • Holds the cluster configuration (Flink-specific and Yarn-specific)
>   • Contains the deploy() method to deploy the cluster
>   • Creates the Hadoop Yarn client
>   • Receives the initial job manager address
>   • Bootstraps the FlinkYarnCluster
> 1.1.2 FlinkYarnCluster
> ╌╌
>   • Wrapper around the Hadoop Yarn client
>   • Queries cluster for status updates
>   • Lifetime methods to start and shut down the cluster
>   • Flink specific features like shutdown after job completion
> 1.1.3 ApplicationClient
> ╌╌╌
>   • Acts as a middle-man for asynchronous cluster communication
>   • Designed to communicate with Yarn, not used in Standalone mode
> 1.1.4 CliFrontend
> ╌
>   • Deeply integrated with FlinkYarnClient and FlinkYarnCluster
>   • Constantly distinguishes between Yarn and Standalone mode
>   • Would be nice to have a general abstraction in place
> 1.1.5 Client
> 
>   • Job submission and Job related actions, agnostic of resource framework
> 1.2 Proposal
> 
> 1.2.1 ClusterConfig (before: AbstractFlinkYarnClient)
> ╌
>   • Extensible cluster-agnostic config
>   • May be extended by specific cluster, e.g. YarnClusterConfig
> 1.2.2 ClusterClient (before: AbstractFlinkYarnClient)
> ╌
>   • Deals with cluster (RM) specific communication
>   • Exposes framework agnostic information
>   • YarnClusterClient, MesosClusterClient, StandaloneClusterClient
> 1.2.3 FlinkCluster (before: AbstractFlinkYarnCluster)
> ╌
>   • Basic interface to communicate with a running cluster
>   • Receives the ClusterClient for cluster-specific communication
>   • Should not have to care about the specific implementations of the
> client
> 1.2.4 ApplicationClient
> ╌╌╌
>   • Can be changed to work cluster-agnostic (first steps already in
> FLINK-3543)
> 1.2.5 CliFrontend
> ╌
>   • CliFrontend never has to differentiate between different
> cluster types after it has determined which cluster class to load.
>   • Base class handles framework agnostic command line arguments
>   • Pluggables for Yarn, Mesos handle specific commands
> {noformat}
> I would like to create/refactor the affected classes to set us up for a more 
> flexible client side resource management abstraction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3477) Add hash-based combine strategy for ReduceFunction

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215916#comment-15215916
 ] 

ASF GitHub Bot commented on FLINK-3477:
---

Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-202864630
  
@fhueske We have used the Easter break to conduct the experiments. A 
preliminary writeup is in the Google Doc. @ggevay will provide the results 
analysis later today. Cheers!  


> Add hash-based combine strategy for ReduceFunction
> --
>
> Key: FLINK-3477
> URL: https://issues.apache.org/jira/browse/FLINK-3477
> Project: Flink
>  Issue Type: Sub-task
>  Components: Local Runtime
>Reporter: Fabian Hueske
>
> This issue is about adding a hash-based combine strategy for ReduceFunctions.
> The interface of the {{reduce()}} method is as follows:
> {code}
> public T reduce(T v1, T v2)
> {code}
> Input type and output type are identical and the function returns only a 
> single value. A Reduce function is incrementally applied to compute a final 
> aggregated value. This allows holding the pre-aggregated value in a hash 
> table and updating it with each function call. 
> The hash-based strategy requires a special implementation of an in-memory 
> hash table. The hash table should support in-place updates of elements (if 
> the new value has the same binary length as the old value) but also appending 
> updates with invalidation of the old value (if the binary length of the new 
> value differs). The hash table needs to be able to evict and emit all 
> elements if it runs out of memory.
> We should also add {{HASH}} and {{SORT}} compiler hints to 
> {{DataSet.reduce()}} and {{Grouping.reduce()}} to allow users to pick the 
> execution strategy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3477] [runtime] Add hash-based combine ...

2016-03-29 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-202864630
  
@fhueske We have used the Easter break to conduct the experiments. A 
preliminary writeup is in the Google Doc. @ggevay will provide the results 
analysis later today. Cheers!  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (FLINK-3547) Add support for streaming projection, selection, and union

2016-03-29 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-3547.
--
   Resolution: Implemented
Fix Version/s: 1.1.0

> Add support for streaming projection, selection, and union
> --
>
> Key: FLINK-3547
> URL: https://issues.apache.org/jira/browse/FLINK-3547
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
> Fix For: 1.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3547) Add support for streaming projection, selection, and union

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215848#comment-15215848
 ] 

ASF GitHub Bot commented on FLINK-3547:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1820


> Add support for streaming projection, selection, and union
> --
>
> Key: FLINK-3547
> URL: https://issues.apache.org/jira/browse/FLINK-3547
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3547] add support for streaming filter,...

2016-03-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1820


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-3545) ResourceManager: YARN integration

2016-03-29 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3545.
-
Resolution: Implemented

Implemented in 4405235e5483d3e4ad94f4ba31627aa852580042.

> ResourceManager: YARN integration
> -
>
> Key: FLINK-3545
> URL: https://issues.apache.org/jira/browse/FLINK-3545
> Project: Flink
>  Issue Type: Sub-task
>  Components: ResourceManager, YARN Client
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> This integrates YARN support with the ResourceManager abstraction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3544) ResourceManager runtime components

2016-03-29 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3544.
-
Resolution: Implemented

Implemented in 92ff2b152cac3ad6a53373c0c022579306051133.

> ResourceManager runtime components
> --
>
> Key: FLINK-3544
> URL: https://issues.apache.org/jira/browse/FLINK-3544
> Project: Flink
>  Issue Type: Sub-task
>  Components: ResourceManager
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3544) ResourceManager runtime components

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215832#comment-15215832
 ] 

ASF GitHub Bot commented on FLINK-3544:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1741


> ResourceManager runtime components
> --
>
> Key: FLINK-3544
> URL: https://issues.apache.org/jira/browse/FLINK-3544
> Project: Flink
>  Issue Type: Sub-task
>  Components: ResourceManager
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3544] Introduce ResourceManager compone...

2016-03-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1741


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3674) Add an interface for EventTime aware User Function

2016-03-29 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-3674:
---

 Summary: Add an interface for EventTime aware User Function
 Key: FLINK-3674
 URL: https://issues.apache.org/jira/browse/FLINK-3674
 Project: Flink
  Issue Type: New Feature
  Components: Streaming
Affects Versions: 1.0.0
Reporter: Stephan Ewen
 Fix For: 1.1.0


I suggest adding an interface that UDFs can implement, which will let them be 
notified of watermark updates.

Example usage:
{code}
public interface EventTimeFunction {
void onWatermark(Watermark watermark);
}

public class MyMapper implements MapFunction<String, String>, EventTimeFunction {

private long currentEventTime = Long.MIN_VALUE;

public String map(String value) {
return value + " @ " + currentEventTime;
}

public void onWatermark(Watermark watermark) {
currentEventTime = watermark.getTimestamp();
}
}
{code}
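
For illustration, a minimal sketch of how the runtime side could dispatch 
watermarks to such a function. The {{WatermarkDispatch}} class and its wiring 
are hypothetical, not part of the proposal:

{code}
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.watermark.Watermark;

// Hypothetical operator-side dispatch (sketch): before forwarding a
// watermark downstream, notify the user function if it opted in by
// implementing the proposed EventTimeFunction interface.
public class WatermarkDispatch<IN, OUT> {

    private final MapFunction<IN, OUT> userFunction;

    public WatermarkDispatch(MapFunction<IN, OUT> userFunction) {
        this.userFunction = userFunction;
    }

    public void processWatermark(Watermark mark) {
        if (userFunction instanceof EventTimeFunction) {
            ((EventTimeFunction) userFunction).onWatermark(mark);
        }
        // ... forward the watermark downstream as usual ...
    }
}
{code}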



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3673) Annotations for code generation

2016-03-29 Thread Gabor Horvath (JIRA)
Gabor Horvath created FLINK-3673:


 Summary: Annotations for code generation
 Key: FLINK-3673
 URL: https://issues.apache.org/jira/browse/FLINK-3673
 Project: Flink
  Issue Type: Sub-task
  Components: Type Serialization System
Reporter: Gabor Horvath
Assignee: Gabor Horvath


Annotations should be utilized to generate more efficient serialization code.
The very same annotations can be used to make the getLength method much smarter.
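
As an illustration only, a field-level annotation carrying a known serialized 
length could look like the following; {{@FixedLength}} is a hypothetical name, 
not an existing Flink annotation:

{code}
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation: marks a field as having a fixed serialized
// length, so generated serializers can skip length headers and a
// smarter getLength() can return a constant instead of "unknown".
@Target(ElementType.FIELD)
@Retention(RetentionPolicy.RUNTIME)
public @interface FixedLength {
    int value(); // serialized length in bytes, e.g. @FixedLength(8) on a double field
}
{code}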



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3257) Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215789#comment-15215789
 ] 

ASF GitHub Bot commented on FLINK-3257:
---

Github user senorcarbone commented on the pull request:

https://github.com/apache/flink/pull/1668#issuecomment-202818056
  
Thanks @StephanEwen and @uce for looking into it! I really appreciate it. 
How about the following:

1. I update this PR with the patch that uses ListState and apply some nice 
refactorings Gyula made
2. I will also address all your comments and then merge this to master
3. We start working on perfecting stream finalization on loops and 
backpressure deadlock elimination in separate PRs right away. These are 
different problems and we need to address them separately, in my view of course.


> Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs
> ---
>
> Key: FLINK-3257
> URL: https://issues.apache.org/jira/browse/FLINK-3257
> Project: Flink
>  Issue Type: Improvement
>Reporter: Paris Carbone
>Assignee: Paris Carbone
>
> The current snapshotting algorithm cannot support cycles in the execution 
> graph. An alternative scheme can potentially include records in-transit 
> through the back-edges of a cyclic execution graph (ABS [1]) to achieve the 
> same guarantees.
> One straightforward implementation of ABS for cyclic graphs can work along 
> the following lines:
> 1) Upon triggering a barrier in an IterationHead from the TaskManager, block 
> output and start upstream backup of all records forwarded from the 
> respective IterationSink.
> 2) The IterationSink should eventually forward the current snapshotting epoch 
> barrier to the IterationSource.
> 3) Upon receiving a barrier from the IterationSink, the IterationSource 
> should finalize the snapshot, unblock its output, emit all records 
> in-transit in FIFO order, and continue the usual execution.
> --
> Upon restart, the IterationSource should emit all records from the injected 
> snapshot first and then continue its usual execution.
> Several optimisations and slight variations are potentially possible, but 
> this can serve as the initial implementation.
> [1] http://arxiv.org/abs/1506.08603
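
For illustration, a compact sketch of the head-side behaviour in steps 1) and 
3) above; the class and method names are illustrative, not the actual Flink 
iteration task API:

{code}
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative only: while a snapshot is in progress, records arriving
// over the back-edge are logged instead of forwarded; once the barrier
// returns through the loop, the log is flushed in FIFO order.
public class IterationHeadSketch<T> {

    private final Queue<T> backEdgeLog = new ArrayDeque<>();
    private boolean snapshotInProgress = false;

    // Step 1: barrier triggered by the TaskManager; start logging.
    public void onBarrier(long checkpointId) {
        snapshotInProgress = true;
    }

    // Records coming back from the IterationSink over the back-edge.
    public void onBackEdgeRecord(T record) {
        if (snapshotInProgress) {
            backEdgeLog.add(record); // upstream backup of in-transit records
        } else {
            forward(record);
        }
    }

    // Step 3: the barrier returned via the IterationSink; finalize the
    // snapshot (it contains backEdgeLog) and replay the logged records.
    public void onBarrierFromSink(long checkpointId) {
        snapshotInProgress = false;
        while (!backEdgeLog.isEmpty()) {
            forward(backEdgeLog.poll()); // FIFO replay
        }
    }

    private void forward(T record) {
        // emit downstream (omitted in this sketch)
    }
}
{code}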



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3579) Improve String concatenation

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215791#comment-15215791
 ] 

ASF GitHub Bot commented on FLINK-3579:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1821#discussion_r57699432
  
--- Diff: 
flink-libraries/flink-table/src/test/java/org/apache/flink/api/java/table/test/StringExpressionsITCase.java
 ---
@@ -121,6 +120,66 @@ public void testNonWorkingSubstring2() throws 
Exception {
resultSet.collect();
}
 
+   @Test
+   public void testStringConcat() throws Exception {
+   ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();
+   TableEnvironment tableEnv = new TableEnvironment();
+
+   DataSet<Tuple2<String, Integer>> ds = env.fromElements(
+   new Tuple2<>("ABCD", 3),
+   new Tuple2<>("ABCD", 2));
+
+   Table in = tableEnv.fromDataSet(ds, "a, b");
+
+   Table result = in
+   .select("a + b + 42");
--- End diff --

What happens if you do `"42 + a"` or even `"42 + b + a"`?


> Improve String concatenation
> 
>
> Key: FLINK-3579
> URL: https://issues.apache.org/jira/browse/FLINK-3579
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Timo Walther
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
>
> Concatenation of a String and a non-String does not work properly.
> E.g. {{f0 + 42}} leads to a RelBuilder exception.
> The ExpressionParser does not like {{f0 + 42.cast(STRING)}} either.
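
For reference, a condensed version of the test case added in the PR under 
review, showing the expressions that should work after the fix (import paths 
are assumed for the flink-table layout of this era):

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.table.TableEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.table.Table;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
TableEnvironment tableEnv = new TableEnvironment();

DataSet<Tuple2<String, Integer>> ds = env.fromElements(
    new Tuple2<>("ABCD", 3),
    new Tuple2<>("ABCD", 2));

Table in = tableEnv.fromDataSet(ds, "a, b");

// Both should produce string concatenation rather than an exception:
Table result = in.select("a + b + 42");
Table castResult = in.select("a + 42.cast(STRING)");
{code}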



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3579 Improve String concatenation (Ram)

2016-03-29 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1821#discussion_r57699432
  
--- Diff: 
flink-libraries/flink-table/src/test/java/org/apache/flink/api/java/table/test/StringExpressionsITCase.java
 ---
@@ -121,6 +120,66 @@ public void testNonWorkingSubstring2() throws 
Exception {
resultSet.collect();
}
 
+   @Test
+   public void testStringConcat() throws Exception {
+   ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();
+   TableEnvironment tableEnv = new TableEnvironment();
+
+   DataSet<Tuple2<String, Integer>> ds = env.fromElements(
+   new Tuple2<>("ABCD", 3),
+   new Tuple2<>("ABCD", 2));
+
+   Table in = tableEnv.fromDataSet(ds, "a, b");
+
+   Table result = in
+   .select("a + b + 42");
--- End diff --

What happens if you do `"42 + a"` or even `"42 + b + a"`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3257] Add Exactly-Once Processing Guara...

2016-03-29 Thread senorcarbone
Github user senorcarbone commented on the pull request:

https://github.com/apache/flink/pull/1668#issuecomment-202818056
  
Thanks @StephanEwen and @uce for looking into it! I really appreciate it. 
How about the following:

1. I update this PR with the patch that uses ListState and apply some nice 
refactorings Gyula made
2. I will also address all your comments and then merge this to master
3. We start working on perfecting stream finalization on loops and 
backpressure deadlock elimination in separate PRs right away. These are 
different problems and we need to address them separately, in my view of course.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3579) Improve String concatenation

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215786#comment-15215786
 ] 

ASF GitHub Bot commented on FLINK-3579:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1821#discussion_r57699238
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/plan/RexNodeTranslator.scala
 ---
@@ -134,7 +136,13 @@ object RexNodeTranslator {
   case Plus(left, right) =>
 val l = toRexNode(left, relBuilder)
 val r = toRexNode(right, relBuilder)
-relBuilder.call(SqlStdOperatorTable.PLUS, l, r)
+if(SqlTypeName.STRING_TYPES.contains(l.getType.getSqlTypeName)) {
--- End diff --

What if `r` is a String type and `l` would have to be cast to `String`?


> Improve String concatenation
> 
>
> Key: FLINK-3579
> URL: https://issues.apache.org/jira/browse/FLINK-3579
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Timo Walther
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
>
> Concatenation of a String and a non-String does not work properly.
> E.g. {{f0 + 42}} leads to a RelBuilder exception.
> The ExpressionParser does not like {{f0 + 42.cast(STRING)}} either.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3579 Improve String concatenation (Ram)

2016-03-29 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1821#discussion_r57699238
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/plan/RexNodeTranslator.scala
 ---
@@ -134,7 +136,13 @@ object RexNodeTranslator {
   case Plus(left, right) =>
 val l = toRexNode(left, relBuilder)
 val r = toRexNode(right, relBuilder)
-relBuilder.call(SqlStdOperatorTable.PLUS, l, r)
+if(SqlTypeName.STRING_TYPES.contains(l.getType.getSqlTypeName)) {
--- End diff --

What if `r` is a String type and `l` would have to be cast to `String`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3547) Add support for streaming projection, selection, and union

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215784#comment-15215784
 ] 

ASF GitHub Bot commented on FLINK-3547:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1820#issuecomment-202816038
  
merging this


> Add support for streaming projection, selection, and union
> --
>
> Key: FLINK-3547
> URL: https://issues.apache.org/jira/browse/FLINK-3547
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3547] add support for streaming filter,...

2016-03-29 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1820#issuecomment-202816038
  
merging this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3579) Improve String concatenation

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215781#comment-15215781
 ] 

ASF GitHub Bot commented on FLINK-3579:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1821#discussion_r57699015
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala
 ---
@@ -116,7 +118,18 @@ object ExpressionParser extends JavaTokenParsers with 
PackratParsers {
 atom <~ ".cast(BOOL)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
 atom <~ ".cast(BOOLEAN)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
 atom <~ ".cast(STRING)" ^^ { e => Cast(e, 
BasicTypeInfo.STRING_TYPE_INFO) } |
-atom <~ ".cast(DATE)" ^^ { e => Cast(e, BasicTypeInfo.DATE_TYPE_INFO) }
+atom <~ ".cast(DATE)" ^^ { e => Cast(e, BasicTypeInfo.DATE_TYPE_INFO) 
} |
+// When an integer is directly casted.
+atom <~ "cast(BYTE)" ^^ { e => Cast(e, BasicTypeInfo.BYTE_TYPE_INFO) } 
|
+atom <~ "cast(SHORT)" ^^ { e => Cast(e, BasicTypeInfo.SHORT_TYPE_INFO) 
} |
+atom <~ "cast(INT)" ^^ { e => Cast(e, BasicTypeInfo.INT_TYPE_INFO) } |
+atom <~ "cast(LONG)" ^^ { e => Cast(e, BasicTypeInfo.LONG_TYPE_INFO) } 
|
+atom <~ "cast(FLOAT)" ^^ { e => Cast(e, BasicTypeInfo.FLOAT_TYPE_INFO) 
} |
+atom <~ "cast(DOUBLE)" ^^ { e => Cast(e, 
BasicTypeInfo.DOUBLE_TYPE_INFO) } |
+atom <~ "cast(BOOL)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
+atom <~ "cast(BOOLEAN)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
--- End diff --

Is it valid SQL to have both `cast(BOOL)` and `cast(BOOLEAN)`?


> Improve String concatenation
> 
>
> Key: FLINK-3579
> URL: https://issues.apache.org/jira/browse/FLINK-3579
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Timo Walther
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
>
> Concatenation of a String and a non-String does not work properly.
> E.g. {{f0 + 42}} leads to a RelBuilder exception.
> The ExpressionParser does not like {{f0 + 42.cast(STRING)}} either.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3579) Improve String concatenation

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215780#comment-15215780
 ] 

ASF GitHub Bot commented on FLINK-3579:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1821#discussion_r57698940
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala
 ---
@@ -116,7 +118,18 @@ object ExpressionParser extends JavaTokenParsers with 
PackratParsers {
 atom <~ ".cast(BOOL)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
 atom <~ ".cast(BOOLEAN)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
 atom <~ ".cast(STRING)" ^^ { e => Cast(e, 
BasicTypeInfo.STRING_TYPE_INFO) } |
-atom <~ ".cast(DATE)" ^^ { e => Cast(e, BasicTypeInfo.DATE_TYPE_INFO) }
+atom <~ ".cast(DATE)" ^^ { e => Cast(e, BasicTypeInfo.DATE_TYPE_INFO) 
} |
+// When an integer is directly casted.
+atom <~ "cast(BYTE)" ^^ { e => Cast(e, BasicTypeInfo.BYTE_TYPE_INFO) } 
|
+atom <~ "cast(SHORT)" ^^ { e => Cast(e, BasicTypeInfo.SHORT_TYPE_INFO) 
} |
+atom <~ "cast(INT)" ^^ { e => Cast(e, BasicTypeInfo.INT_TYPE_INFO) } |
+atom <~ "cast(LONG)" ^^ { e => Cast(e, BasicTypeInfo.LONG_TYPE_INFO) } 
|
+atom <~ "cast(FLOAT)" ^^ { e => Cast(e, BasicTypeInfo.FLOAT_TYPE_INFO) 
} |
+atom <~ "cast(DOUBLE)" ^^ { e => Cast(e, 
BasicTypeInfo.DOUBLE_TYPE_INFO) } |
+atom <~ "cast(BOOL)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
+atom <~ "cast(BOOLEAN)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
+atom <~ "cast(STRING)" ^^ { e => Cast(e, 
BasicTypeInfo.STRING_TYPE_INFO) } |
+atom <~ "cast(DATE)" ^^ { e => Cast(e, BasicTypeInfo.DATE_TYPE_INFO) }
--- End diff --

this rule was already added at the top of the list


> Improve String concatenation
> 
>
> Key: FLINK-3579
> URL: https://issues.apache.org/jira/browse/FLINK-3579
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Timo Walther
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
>
> Concatenation of a String and a non-String does not work properly.
> E.g. {{f0 + 42}} leads to a RelBuilder exception.
> The ExpressionParser does not like {{f0 + 42.cast(STRING)}} either.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3579 Improve String concatenation (Ram)

2016-03-29 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1821#discussion_r57699015
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala
 ---
@@ -116,7 +118,18 @@ object ExpressionParser extends JavaTokenParsers with 
PackratParsers {
 atom <~ ".cast(BOOL)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
 atom <~ ".cast(BOOLEAN)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
 atom <~ ".cast(STRING)" ^^ { e => Cast(e, 
BasicTypeInfo.STRING_TYPE_INFO) } |
-atom <~ ".cast(DATE)" ^^ { e => Cast(e, BasicTypeInfo.DATE_TYPE_INFO) }
+atom <~ ".cast(DATE)" ^^ { e => Cast(e, BasicTypeInfo.DATE_TYPE_INFO) 
} |
+// When an integer is directly casted.
+atom <~ "cast(BYTE)" ^^ { e => Cast(e, BasicTypeInfo.BYTE_TYPE_INFO) } 
|
+atom <~ "cast(SHORT)" ^^ { e => Cast(e, BasicTypeInfo.SHORT_TYPE_INFO) 
} |
+atom <~ "cast(INT)" ^^ { e => Cast(e, BasicTypeInfo.INT_TYPE_INFO) } |
+atom <~ "cast(LONG)" ^^ { e => Cast(e, BasicTypeInfo.LONG_TYPE_INFO) } 
|
+atom <~ "cast(FLOAT)" ^^ { e => Cast(e, BasicTypeInfo.FLOAT_TYPE_INFO) 
} |
+atom <~ "cast(DOUBLE)" ^^ { e => Cast(e, 
BasicTypeInfo.DOUBLE_TYPE_INFO) } |
+atom <~ "cast(BOOL)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
+atom <~ "cast(BOOLEAN)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
--- End diff --

Is it valid SQL to have both `cast(BOOL)` and `cast(BOOLEAN)`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: FLINK-3579 Improve String concatenation (Ram)

2016-03-29 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1821#discussion_r57698940
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala
 ---
@@ -116,7 +118,18 @@ object ExpressionParser extends JavaTokenParsers with 
PackratParsers {
 atom <~ ".cast(BOOL)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
 atom <~ ".cast(BOOLEAN)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
 atom <~ ".cast(STRING)" ^^ { e => Cast(e, 
BasicTypeInfo.STRING_TYPE_INFO) } |
-atom <~ ".cast(DATE)" ^^ { e => Cast(e, BasicTypeInfo.DATE_TYPE_INFO) }
+atom <~ ".cast(DATE)" ^^ { e => Cast(e, BasicTypeInfo.DATE_TYPE_INFO) 
} |
+// When an integer is directly casted.
+atom <~ "cast(BYTE)" ^^ { e => Cast(e, BasicTypeInfo.BYTE_TYPE_INFO) } 
|
+atom <~ "cast(SHORT)" ^^ { e => Cast(e, BasicTypeInfo.SHORT_TYPE_INFO) 
} |
+atom <~ "cast(INT)" ^^ { e => Cast(e, BasicTypeInfo.INT_TYPE_INFO) } |
+atom <~ "cast(LONG)" ^^ { e => Cast(e, BasicTypeInfo.LONG_TYPE_INFO) } 
|
+atom <~ "cast(FLOAT)" ^^ { e => Cast(e, BasicTypeInfo.FLOAT_TYPE_INFO) 
} |
+atom <~ "cast(DOUBLE)" ^^ { e => Cast(e, 
BasicTypeInfo.DOUBLE_TYPE_INFO) } |
+atom <~ "cast(BOOL)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
+atom <~ "cast(BOOLEAN)" ^^ { e => Cast(e, 
BasicTypeInfo.BOOLEAN_TYPE_INFO) } |
+atom <~ "cast(STRING)" ^^ { e => Cast(e, 
BasicTypeInfo.STRING_TYPE_INFO) } |
+atom <~ "cast(DATE)" ^^ { e => Cast(e, BasicTypeInfo.DATE_TYPE_INFO) }
--- End diff --

this rule was already added at the top of the list


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3579) Improve String concatenation

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215779#comment-15215779
 ] 

ASF GitHub Bot commented on FLINK-3579:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1821#discussion_r57698703
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala
 ---
@@ -59,6 +59,8 @@ object ExpressionParser extends JavaTokenParsers with 
PackratParsers {
   Literal(str.toInt)
 } else if (str.endsWith("f") | str.endsWith("F")) {
   Literal(str.toFloat)
+} else if (str.endsWith(".")) {
--- End diff --

Is it valid SQL standard to have a format like `1.`? What happens to `.1`?


> Improve String concatenation
> 
>
> Key: FLINK-3579
> URL: https://issues.apache.org/jira/browse/FLINK-3579
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Timo Walther
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
>
> Concatenation of a String and a non-String does not work properly.
> E.g. {{f0 + 42}} leads to a RelBuilder exception.
> The ExpressionParser does not like {{f0 + 42.cast(STRING)}} either.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3579 Improve String concatenation (Ram)

2016-03-29 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1821#discussion_r57698703
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/ExpressionParser.scala
 ---
@@ -59,6 +59,8 @@ object ExpressionParser extends JavaTokenParsers with 
PackratParsers {
   Literal(str.toInt)
 } else if (str.endsWith("f") | str.endsWith("F")) {
   Literal(str.toFloat)
+} else if (str.endsWith(".")) {
--- End diff --

Is it valid SQL standard to have a format like `1.`? What happens to `.1`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3651) Fix faulty RollingSink Restore

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215721#comment-15215721
 ] 

ASF GitHub Bot commented on FLINK-3651:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1830#issuecomment-202794232
  
OK, depending on whether it is feasible to test this separately, I would go 
ahead and merge it as is, or add a test and then merge. :+1:


> Fix faulty RollingSink Restore
> --
>
> Key: FLINK-3651
> URL: https://issues.apache.org/jira/browse/FLINK-3651
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> The RollingSink restore logic has a bug where the sink for subtask index 1 
> also removes files of subtask index 11, because the regex that checks the 
> file name matches those as well. Adding the suffix to the regex should solve 
> the problem, because the regex for subtask 1 will then only match files for 
> subtask index 1.
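
A self-contained illustration of the prefix-matching problem; the file name 
scheme below is simplified, the real patterns live in the RollingSink restore 
logic:

{code}
import java.util.regex.Pattern;

// Simplified demo: subtask files are named "part-<subtaskIndex>-<count>".
public class RollingSinkRegexDemo {
    public static void main(String[] args) {
        String fileOfSubtask11 = "part-11-0";

        // Buggy check: only anchors the subtask index as a prefix,
        // so the pattern for subtask 1 also matches "part-11-0".
        Pattern buggy = Pattern.compile("^part-1(.*)$");
        System.out.println(buggy.matcher(fileOfSubtask11).matches()); // true

        // Fixed check: requiring the "-<count>" suffix right after the
        // index makes subtask 1 match only its own files.
        Pattern fixed = Pattern.compile("^part-1-(\\d+)(.*)$");
        System.out.println(fixed.matcher(fileOfSubtask11).matches()); // false
    }
}
{code}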



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3651] Fix faulty RollingSink Restore

2016-03-29 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1830#issuecomment-202794232
  
OK, depending on whether it is feasible to test this separately, I would go 
ahead and merge it as is, or add a test and then merge. :+1:


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3670) Kerberos: Improving long-running streaming jobs

2016-03-29 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-3670:
-

 Summary: Kerberos: Improving long-running streaming jobs
 Key: FLINK-3670
 URL: https://issues.apache.org/jira/browse/FLINK-3670
 Project: Flink
  Issue Type: Improvement
  Components: Command-line client, Local Runtime
Reporter: Maximilian Michels


We have seen in the past that Hadoop's delegation tokens are subject to a 
number of subtle token renewal bugs. In addition, they have a maximum lifetime 
that can be worked around but is very inconvenient for the user.

As per the [mailing list 
discussion|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Kerberos-for-Streaming-amp-Kafka-td10906.html],
a way to work around the maximum lifetime of delegation tokens would be to 
pass the Kerberos principal and keytab upon job submission. A daemon could 
then periodically renew the ticket.
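
A minimal sketch of such a renewal daemon based on Hadoop's 
{{UserGroupInformation}}; the principal, keytab path, and renewal interval 
are placeholders:

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.security.UserGroupInformation;

public class KeytabRenewalDaemon {
    public static void main(String[] args) throws Exception {
        // Log in from the keytab passed at job submission
        // (placeholder principal and path).
        UserGroupInformation.loginUserFromKeytab(
            "flink@EXAMPLE.COM", "/path/to/flink.keytab");

        // Periodically re-login before the TGT expires; with the keytab
        // available this avoids the delegation-token lifetime limit.
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 1, 1, TimeUnit.HOURS);
    }
}
{code}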





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1159] Case style anonymous functions no...

2016-03-29 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1704#issuecomment-202778390
  
Sounds great @stefanobaghino. I think you can push your work to this PR as 
well, since it is all related to the partial function support. Looking forward 
to having partial function support :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1159) Case style anonymous functions not supported by Scala API

2016-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215685#comment-15215685
 ] 

ASF GitHub Bot commented on FLINK-1159:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1704#issuecomment-202778390
  
Sounds great @stefanobaghino. I think you can push your work to this PR as 
well, since it is all related to the partial function support. Looking forward 
to having partial function support :-)


> Case style anonymous functions not supported by Scala API
> -
>
> Key: FLINK-1159
> URL: https://issues.apache.org/jira/browse/FLINK-1159
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API
>Reporter: Till Rohrmann
>Assignee: Stefano Baghino
>
> In Scala it is very common to define anonymous functions of the following form:
> {code}
> {
> case foo: Bar => foobar(foo)
> case _ => throw new RuntimeException()
> }
> {code}
> These case-style anonymous functions are not yet supported by the Scala API. 
> Thus, one has to write redundant code to name the function parameter.
> What works is the following pattern, but it is not intuitive for someone 
> coming from Scala:
> {code}
> dataset.map{
>   _ match{
> case foo:Bar => ...
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-03-29 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215684#comment-15215684
 ] 

Till Rohrmann commented on FLINK-3650:
--

Sure, you can work on this [~ram_krish]. It is actually just a matter of 
exposing the {{maxBy/minBy}} Java API calls in the Scala API.
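
For context, a minimal sketch of the existing Java API calls that would be 
mirrored:

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

DataSet<Tuple2<String, Integer>> words = env.fromElements(
    new Tuple2<>("hello", 3),
    new Tuple2<>("world", 7));

// Java DataSet API: keep the element with the max/min value in field 1.
// These are the calls the Scala DataSet API should expose as well.
DataSet<Tuple2<String, Integer>> max = words.maxBy(1);
DataSet<Tuple2<String, Integer>> min = words.minBy(1);
{code}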

> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API and should be added 
> there in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)