[jira] [Commented] (FLINK-8811) Add MiniClusterClient to allow fast MiniCluster operations

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383313#comment-16383313
 ] 

ASF GitHub Bot commented on FLINK-8811:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5600


> Add MiniClusterClient to allow fast MiniCluster operations
> --
>
> Key: FLINK-8811
> URL: https://issues.apache.org/jira/browse/FLINK-8811
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: flip-6
> Fix For: 1.5.0, 1.6.0
>
>
> We should offer a {{ClusterClient}} implementation for the {{MiniCluster}}. 
> That way we would be able to submit jobs and wait for their results directly, 
> without the polling that would be necessary with the {{RestClusterClient}}.
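For illustration, a minimal Scala sketch of the idea (the class and method names below are made up; this is not the merged implementation): a client that talks to the {{MiniCluster}} in-process and returns when the job finishes, so no status polling is needed.

{code:scala}
import org.apache.flink.api.common.JobExecutionResult
import org.apache.flink.runtime.jobgraph.JobGraph
import org.apache.flink.runtime.minicluster.MiniCluster

// Illustrative sketch only: submit directly against the MiniCluster instead of
// polling job status over REST.
class MiniClusterClientSketch(miniCluster: MiniCluster) {

  def submitAndWait(jobGraph: JobGraph): JobExecutionResult =
    // in-process blocking call; returns once the job has finished
    miniCluster.executeJobBlocking(jobGraph)
}
{code}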



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8808) Enable RestClusterClient to submit jobs to local Dispatchers

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383312#comment-16383312
 ] 

ASF GitHub Bot commented on FLINK-8808:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5599


> Enable RestClusterClient to submit jobs to local Dispatchers
> 
>
> Key: FLINK-8808
> URL: https://issues.apache.org/jira/browse/FLINK-8808
> Project: Flink
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Blocker
>  Labels: flip-6
> Fix For: 1.5.0, 1.6.0
>
>
> The {{RestClusterClient}} should be able to submit a job to a {{Dispatcher}} 
> which runs in a local {{ActorSystem}} on the same host as the 
> {{RestClusterClient}}. This is, for example, the setup used in test cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5600: [FLINK-8811] [flip6] Add initial implementation of...

2018-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5600


---


[GitHub] flink pull request #5599: [FLINK-8808] [flip6] Allow RestClusterClient to co...

2018-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5599


---


[jira] [Commented] (FLINK-8794) When using BucketingSink, it happens that one of the files is always in the [.in-progress] state

2018-03-01 Thread Piotr Nowojski (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383295#comment-16383295
 ] 

Piotr Nowojski commented on FLINK-8794:
---

[~Backlight] 

3. What kind of downstream processor is it? Can't it just ignore the files with 
"*pending" or "*in-progress" suffixes and the "_" prefix?

I also don't think this is a bug, but a designed feature ( 
https://issues.apache.org/jira/browse/FLINK-5054 ) of the BucketingSink. On the 
other hand, we could change this behaviour / add an option for the BucketingSink 
to use temporary "in-progress" and "pending" directories instead of prefixes.

Also, [~kkl0u], could you elaborate on why rescaling forces us to keep lingering 
files?

 

> When using BucketingSink, it happens that one of the files is always in the 
> [.in-progress] state
> 
>
> Key: FLINK-8794
> URL: https://issues.apache.org/jira/browse/FLINK-8794
> Project: Flink
>  Issue Type: Bug
>  Components: filesystem-connector
>Affects Versions: 1.4.0, 1.4.1
>Reporter: yanxiaobin
>Priority: Major
>
> When using BucketingSink, it happens that one of the files is always in the 
> [.in-progress] state, and this state never changes afterwards. The 
> underlying storage is S3.
>  
> {code:java}
> // code placeholder
> {code}
> 2018-02-28 11:58:42  147341619 {color:#d04437}_part-28-0.in-progress{color}
> 2018-02-28 12:06:27  147315059 part-0-0
> 2018-02-28 12:06:27  147462359 part-1-0
> 2018-02-28 12:06:27  147316006 part-10-0
> 2018-02-28 12:06:28  147349854 part-100-0
> 2018-02-28 12:06:27  147421625 part-101-0
> 2018-02-28 12:06:27  147443830 part-102-0
> 2018-02-28 12:06:27  147372801 part-103-0
> 2018-02-28 12:06:27  147343670 part-104-0
> ..



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8829) Flink in EMR(YARN) is down due to Akka communication issue

2018-03-01 Thread Aleksandr Filichkin (JIRA)
Aleksandr Filichkin created FLINK-8829:
--

 Summary: Flink in EMR(YARN) is down due to Akka communication issue
 Key: FLINK-8829
 URL: https://issues.apache.org/jira/browse/FLINK-8829
 Project: Flink
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.3.2
Reporter: Aleksandr Filichkin


Hi,

We have a Flink 1.3.2 app running in Amazon EMR. Every week our Flink job goes 
down due to:

_2018-02-16 19:00:04,595 WARN akka.remote.ReliableDeliverySupervisor - 
Association with remote system 
[akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177] 
has failed, address is now gated for [5000] ms. Reason: [Association failed 
with 
[akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177]] 
Caused by: [Connection refused: 
ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com/10.97.34.209:42177] 
2018-02-16 19:00:05,593 WARN akka.remote.RemoteWatcher - Detected unreachable: 
[akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177] 
2018-02-16 19:00:05,596 INFO 
org.apache.flink.runtime.client.JobSubmissionClientActor - Lost connection to 
JobManager 
akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177/user/jobmanager. 
Triggering connection timeout._

Do you have any ideas how to troubleshoot it?

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8829) Flink in EMR(YARN) is down due to Akka communication issue

2018-03-01 Thread Aleksandr Filichkin (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Filichkin updated FLINK-8829:
---
Description: 
Hi,

We have a Flink 1.3.2 app running in Amazon EMR with YARN. Every week our Flink 
job goes down due to:

_2018-02-16 19:00:04,595 WARN akka.remote.ReliableDeliverySupervisor - 
Association with remote system 
[akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177] 
has failed, address is now gated for [5000] ms. Reason: [Association failed 
with 
[akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177]] 
Caused by: [Connection refused: 
ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com/10.97.34.209:42177] 
2018-02-16 19:00:05,593 WARN akka.remote.RemoteWatcher - Detected unreachable: 
[akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177] 
2018-02-16 19:00:05,596 INFO 
org.apache.flink.runtime.client.JobSubmissionClientActor - Lost connection to 
JobManager 
akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177/user/jobmanager. 
Triggering connection timeout._

Do you have any ideas how to troubleshoot it?

 

  was:
Hi,

We have a Flink 1.3.2 app running in Amazon EMR. Every week our Flink job goes 
down due to:

_2018-02-16 19:00:04,595 WARN akka.remote.ReliableDeliverySupervisor - 
Association with remote system 
[akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177] 
has failed, address is now gated for [5000] ms. Reason: [Association failed 
with 
[akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177]] 
Caused by: [Connection refused: 
ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com/10.97.34.209:42177] 
2018-02-16 19:00:05,593 WARN akka.remote.RemoteWatcher - Detected unreachable: 
[akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177] 
2018-02-16 19:00:05,596 INFO 
org.apache.flink.runtime.client.JobSubmissionClientActor - Lost connection to 
JobManager 
akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177/user/jobmanager. 
Triggering connection timeout._

Do you have any ideas how to troubleshoot it?

 


> Flink in EMR(YARN) is down due to Akka communication issue
> --
>
> Key: FLINK-8829
> URL: https://issues.apache.org/jira/browse/FLINK-8829
> Project: Flink
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.3.2
>Reporter: Aleksandr Filichkin
>Priority: Major
>
> Hi,
> We have a Flink 1.3.2 app running in Amazon EMR with YARN. Every week our 
> Flink job goes down due to:
> _2018-02-16 19:00:04,595 WARN akka.remote.ReliableDeliverySupervisor - 
> Association with remote system 
> [akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177] 
> has failed, address is now gated for [5000] ms. Reason: [Association failed 
> with 
> [akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177]] 
> Caused by: [Connection refused: 
> ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com/10.97.34.209:42177] 
> 2018-02-16 19:00:05,593 WARN akka.remote.RemoteWatcher - Detected 
> unreachable: 
> [akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177] 
> 2018-02-16 19:00:05,596 INFO 
> org.apache.flink.runtime.client.JobSubmissionClientActor - Lost connection to 
> JobManager 
> akka.tcp://fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177/user/jobmanager. 
> Triggering connection timeout._
> Do you have any ideas how to troubleshoot it?
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8821) Fix non-terminating decimal error

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383293#comment-16383293
 ] 

ASF GitHub Bot commented on FLINK-8821:
---

Github user twalthr commented on the issue:

https://github.com/apache/flink/pull/5608
  
Yes, I agree that division is a special case here. Thank you @Xpray. The 
changes look good. Will merge...
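For context, the error comes from standard {{java.math.BigDecimal}} behaviour: dividing without a bounded precision throws when the quotient has no finite decimal representation. A quick Scala illustration:

{code:scala}
import java.math.{MathContext, BigDecimal => JBigDecimal}

val one   = new JBigDecimal(1)
val three = new JBigDecimal(3)

// one.divide(three) throws ArithmeticException:
// "Non-terminating decimal expansion; no exact representable decimal result."

// With a bounded MathContext the division terminates:
val avg = one.divide(three, MathContext.DECIMAL128)  // 0.333...
{code}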


> Fix non-terminating decimal error
> -
>
> Key: FLINK-8821
> URL: https://issues.apache.org/jira/browse/FLINK-8821
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Reporter: Ruidong Li
>Assignee: Ruidong Li
>Priority: Major
>
> The DecimalAvgAggFunction lacks precision protection



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5608: [FLINK-8821][TableAPI && SQL] Fix non-terminating decimal...

2018-03-01 Thread twalthr
Github user twalthr commented on the issue:

https://github.com/apache/flink/pull/5608
  
Yes, I agree that division is a special case here. Thank you @Xpray. The 
changes look good. Will merge...


---


[jira] [Updated] (FLINK-8828) Add collect method to DataStream / DataSet scala api

2018-03-01 Thread Jelmer Kuperus (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jelmer Kuperus updated FLINK-8828:
--
Description: 
A collect function is a method that takes a Partial Function as its parameter 
and applies it to all the elements in the collection to create a new collection 
which satisfies the Partial Function.

It can be found on all [core scala collection 
classes|http://www.scala-lang.org/api/2.9.2/scala/collection/TraversableLike.html]
 as well as on spark's [rdd 
interface|https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.rdd.RDD]

To understand its utility, imagine the following scenario:

Given a DataStream that produces events of type _Purchase_ and _View_, 
transform this stream into a stream of purchase amounts over 1000 euros.

Currently an implementation might look like
{noformat}
val x = dataStream
  .filter(_.isInstanceOf[Purchase])
  .map(_.asInstanceOf[Purchase])
  .filter(_.amount > 1000)
  .map(_.amount){noformat}
Or alternatively you could do this
{noformat}
dataStream.flatMap(_ match {
  case p: Purchase if p.amount > 1000 => Some(p.amount)
  case _ => None
}){noformat}
But with collect implemented it could look like
{noformat}
dataStream.collect {
  case p: Purchase if p.amount > 1000 => p.amount
}{noformat}
 

Which is a lot nicer to both read and write

  was:
A collect function is a method that takes a Partial Function as its parameter 
and applies it to all the elements in the collection to create a new collection 
which satisfies the Partial Function.

It can be found on all [core scala collection 
classes|http://www.scala-lang.org/api/2.9.2/scala/collection/TraversableLike.html]
 as well as on spark's [rdd 
interface|https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.rdd.RDD]

To understand its utility imagine the following scenario :

You have a DataStream that produces events of type _Purchase_ and _View_ 
 You would like to transform this stream into a stream of purchase amounts over 
1000 euros.

Currently an implementation might look like
{noformat}
val x = dataStream
  .filter(_.isInstanceOf[Purchase])
  .map(_.asInstanceOf[Purchase])
  .filter(_.amount > 1000)
  .map(_.amount){noformat}
Or alternatively you could do this
{noformat}
dataStream.flatMap(_ match {
  case p: Purchase if p.amount > 1000 => Some(p.amount)
  case _ => None
}){noformat}
But with collect implemented it could look like
{noformat}
dataStream.collect {
  case p: Purchase if p.amount > 1000 => p.amount
}{noformat}
 

Which is a lot nicer to both read and write


> Add collect method to DataStream / DataSet scala api
> 
>
> Key: FLINK-8828
> URL: https://issues.apache.org/jira/browse/FLINK-8828
> Project: Flink
>  Issue Type: Improvement
>  Components: Core, DataSet API, DataStream API, Scala API
>Affects Versions: 1.4.0
>Reporter: Jelmer Kuperus
>Priority: Major
>
> A collect function is a method that takes a Partial Function as its parameter 
> and applies it to all the elements in the collection to create a new 
> collection which satisfies the Partial Function.
> It can be found on all [core scala collection 
> classes|http://www.scala-lang.org/api/2.9.2/scala/collection/TraversableLike.html]
>  as well as on spark's [rdd 
> interface|https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.rdd.RDD]
> To understand its utility imagine the following scenario :
> Given a DataStream that produces events of type _Purchase_ and _View_ 
> Transform this stream into a stream of purchase amounts over 1000 euros.
> Currently an implementation might look like
> {noformat}
> val x = dataStream
>   .filter(_.isInstanceOf[Purchase])
>   .map(_.asInstanceOf[Purchase])
>   .filter(_.amount > 1000)
>   .map(_.amount){noformat}
> Or alternatively you could do this
> {noformat}
> dataStream.flatMap(_ match {
>   case p: Purchase if p.amount > 1000 => Some(p.amount)
>   case _ => None
> }){noformat}
> But with collect implemented it could look like
> {noformat}
> dataStream.collect {
>   case p: Purchase if p.amount > 1000 => p.amount
> }{noformat}
>  
> Which is a lot nicer to both read and write
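For illustration, a rough Scala sketch of how such a method could be layered on top of the existing DataStream API via an implicit class ({{CollectSyntax}} and {{collectWith}} are made-up names, not the proposed API):

{code:scala}
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.streaming.api.scala.DataStream

// Illustrative only: lift the PartialFunction into T => Option[R] and flatMap,
// keeping just the elements the PartialFunction is defined for.
object CollectSyntax {
  implicit class CollectOps[T](stream: DataStream[T]) {
    def collectWith[R: TypeInformation](pf: PartialFunction[T, R]): DataStream[R] =
      stream.flatMap(t => pf.lift(t).toList)
  }
}

// usage:
//   import CollectSyntax._
//   dataStream.collectWith { case p: Purchase if p.amount > 1000 => p.amount }
{code}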



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8828) Add collect method to DataStream / DataSet scala api

2018-03-01 Thread Jelmer Kuperus (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jelmer Kuperus updated FLINK-8828:
--
Description: 
A collect function is a method that takes a Partial Function as its parameter 
and applies it to all the elements in the collection to create a new collection 
which satisfies the Partial Function.

It can be found on all [core scala collection 
classes|http://www.scala-lang.org/api/2.9.2/scala/collection/TraversableLike.html]
 as well as on spark's [rdd 
interface|https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.rdd.RDD]

To understand its utility imagine the following scenario :

You have a DataStream that produces events of type _Purchase_ and _View_ 
 You would like to transform this stream into a stream of purchase amounts over 
1000 euros.

Currently an implementation might look like
{noformat}
val x = dataStream
  .filter(_.isInstanceOf[Purchase])
  .map(_.asInstanceOf[Purchase])
  .filter(_.amount > 1000)
  .map(_.amount){noformat}
Or alternatively you could do this
{noformat}
dataStream.flatMap(_ match {
  case p: Purchase if p.amount > 1000 => Some(p.amount)
  case _ => None
}){noformat}
But with collect implemented it could look like
{noformat}
dataStream.collect {
  case p: Purchase if p.amount > 1000 => p.amount
}{noformat}
 

Which is a lot nicer to both read and write

  was:
A collect function is a method that takes a Partial Function as its parameter 
and applies it to all the elements in the collection to create a new collection 
which satisfies the Partial Function.

It can be found on all [core scala collection 
classes|http://www.scala-lang.org/api/2.9.2/scala/collection/TraversableLike.html]
 as well as on spark's [rdd 
interface|https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.rdd.RDD]

To understand its utility imagine the following scenario :

You have a DataStream that produces events of type _Purchase_ and View 
 You would like to transform this stream into a stream of purchase amounts over 
1000 euros.

Currently an implementation might look like
{noformat}
val x = dataStream
  .filter(_.isInstanceOf[Purchase])
  .map(_.asInstanceOf[Purchase])
  .filter(_.amount > 1000)
  .map(_.amount){noformat}
Or alternatively you could do this
{noformat}
dataStream.flatMap(_ match {
  case p: Purchase if p.amount > 1000 => Some(p.amount)
  case _ => None
}){noformat}
But with collect implemented it could look like
{noformat}
dataStream.collect {
  case p: Purchase if p.amount > 1000 => p.amount
}{noformat}
 

Which is a lot nicer to both read and write


> Add collect method to DataStream / DataSet scala api
> 
>
> Key: FLINK-8828
> URL: https://issues.apache.org/jira/browse/FLINK-8828
> Project: Flink
>  Issue Type: Improvement
>  Components: Core, DataSet API, DataStream API, Scala API
>Affects Versions: 1.4.0
>Reporter: Jelmer Kuperus
>Priority: Major
>
> A collect function is a method that takes a Partial Function as its parameter 
> and applies it to all the elements in the collection to create a new 
> collection which satisfies the Partial Function.
> It can be found on all [core scala collection 
> classes|http://www.scala-lang.org/api/2.9.2/scala/collection/TraversableLike.html]
>  as well as on spark's [rdd 
> interface|https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.rdd.RDD]
> To understand its utility imagine the following scenario :
> You have a DataStream that produces events of type _Purchase_ and _View_ 
>  You would like to transform this stream into a stream of purchase amounts 
> over 1000 euros.
> Currently an implementation might look like
> {noformat}
> val x = dataStream
>   .filter(_.isInstanceOf[Purchase])
>   .map(_.asInstanceOf[Purchase])
>   .filter(_.amount > 1000)
>   .map(_.amount){noformat}
> Or alternatively you could do this
> {noformat}
> dataStream.flatMap(_ match {
>   case p: Purchase if p.amount > 1000 => Some(p.amount)
>   case _ => None
> }){noformat}
> But with collect implemented it could look like
> {noformat}
> dataStream.collect {
>   case p: Purchase if p.amount > 1000 => p.amount
> }{noformat}
>  
> Which is a lot nicer to both read and write



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8191) Add a RoundRobinPartitioner to be shipped with the Kafka connector

2018-03-01 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-8191:
---
Fix Version/s: (was: 1.5.0)
   1.6.0

> Add a RoundRobinPartitioner to be shipped with the Kafka connector
> --
>
> Key: FLINK-8191
> URL: https://issues.apache.org/jira/browse/FLINK-8191
> Project: Flink
>  Issue Type: New Feature
>  Components: Kafka Connector
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Aegeaner
>Priority: Blocker
> Fix For: 1.6.0
>
>
> We should perhaps consider adding a ready-to-use round-robin partitioner to 
> be shipped with the Kafka connector, alongside the already available 
> {{FlinkFixedPartitioner}}.
> See the original discussion here:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/FlinkKafkaProducerXX-td16951.html
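For illustration, a minimal Scala sketch of what such a partitioner could look like against the existing {{FlinkKafkaPartitioner}} interface (class name and details are illustrative, not the implementation tracked here):

{code:scala}
import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkKafkaPartitioner

// Illustrative sketch: cycle through the target partitions regardless of the key.
class RoundRobinPartitionerSketch[T] extends FlinkKafkaPartitioner[T] {

  private var next = 0

  override def partition(record: T, key: Array[Byte], value: Array[Byte],
                         targetTopic: String, partitions: Array[Int]): Int = {
    val p = partitions(next % partitions.length)
    next = (next + 1) % partitions.length
    p
  }
}
{code}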



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-8191) Add a RoundRobinPartitioner to be shipped with the Kafka connector

2018-03-01 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383268#comment-16383268
 ] 

Tzu-Li (Gordon) Tai edited comment on FLINK-8191 at 3/2/18 6:53 AM:


Yes, moving to 1.6.0. This can be included as part of the major connector 
reworks in 1.6.


was (Author: tzulitai):
Yes, moving to 1.6.0.

> Add a RoundRobinPartitioner to be shipped with the Kafka connector
> --
>
> Key: FLINK-8191
> URL: https://issues.apache.org/jira/browse/FLINK-8191
> Project: Flink
>  Issue Type: New Feature
>  Components: Kafka Connector
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Aegeaner
>Priority: Blocker
> Fix For: 1.6.0
>
>
> We should perhaps consider adding a ready-to-use round-robin partitioner to 
> be shipped with the Kafka connector, alongside the already available 
> {{FlinkFixedPartitioner}}.
> See the original discussion here:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/FlinkKafkaProducerXX-td16951.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8191) Add a RoundRobinPartitioner to be shipped with the Kafka connector

2018-03-01 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383268#comment-16383268
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-8191:


Yes, moving to 1.6.0.

> Add a RoundRobinPartitioner to be shipped with the Kafka connector
> --
>
> Key: FLINK-8191
> URL: https://issues.apache.org/jira/browse/FLINK-8191
> Project: Flink
>  Issue Type: New Feature
>  Components: Kafka Connector
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Aegeaner
>Priority: Blocker
> Fix For: 1.6.0
>
>
> We should perhaps consider adding a ready-to-use round-robin partitioner to 
> be shipped with the Kafka connector, alongside the already available 
> {{FlinkFixedPartitioner}}.
> See the original discussion here:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/FlinkKafkaProducerXX-td16951.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-7913) Add support for Kafka default partitioner

2018-03-01 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-7913:
---
Fix Version/s: (was: 1.5.0)
   1.6.0

> Add support for Kafka default partitioner
> -
>
> Key: FLINK-7913
> URL: https://issues.apache.org/jira/browse/FLINK-7913
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.4.0
>Reporter: Konstantin Lalafaryan
>Assignee: Konstantin Lalafaryan
>Priority: Blocker
> Fix For: 1.6.0
>
>
> Currently Apache Flink provides only *FlinkKafkaPartitioner* and just one 
> implementation, *FlinkFixedPartitioner*. 
> In order to be able to use Kafka's default partitioner you have to create a 
> new implementation of *FlinkKafkaPartitioner* and fork the code from Kafka. 
> It would be really good to be able to define the partitioner without 
> implementing a new class.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2018-03-01 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-6764:
---
Fix Version/s: (was: 1.5.0)
   1.6.0

> Deduplicate stateless TypeSerializers when serializing composite 
> TypeSerializers
> 
>
> Key: FLINK-6764
> URL: https://issues.apache.org/jira/browse/FLINK-6764
> Project: Flink
>  Issue Type: Improvement
>  Components: Type Serialization System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Till Rohrmann
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.6.0
>
>
> Composite type serializers, such as the {{PojoSerializer}}, could be improved 
> by deduplicating stateless {{TypeSerializers}} when being serialized. This 
> would decrease their serialization size.
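As an aside, the core idea can be shown with a small Scala sketch (illustrative only, not the Flink change itself): write each distinct serializer once and refer to repeated occurrences by index.

{code:scala}
// Illustrative: serialize each distinct serializer once; later occurrences are
// written as an index into the already-written list, shrinking the snapshot.
def deduplicate[T](serializers: Seq[T]): (Seq[T], Seq[Int]) = {
  val distinct = serializers.distinct          // serializers to actually write
  val indexOf  = distinct.zipWithIndex.toMap   // lookup for reference indices
  (distinct, serializers.map(indexOf))         // per-field index into `distinct`
}
{code}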



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2018-03-01 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383266#comment-16383266
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-6764:


Yes, moving to 1.6.0.

> Deduplicate stateless TypeSerializers when serializing composite 
> TypeSerializers
> 
>
> Key: FLINK-6764
> URL: https://issues.apache.org/jira/browse/FLINK-6764
> Project: Flink
>  Issue Type: Improvement
>  Components: Type Serialization System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Till Rohrmann
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.6.0
>
>
> Composite type serializers, such as the {{PojoSerializer}}, could be improved 
> by deduplicating stateless {{TypeSerializers}} when being serialized. This 
> would decrease their serialization size.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-7913) Add support for Kafka default partitioner

2018-03-01 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383267#comment-16383267
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-7913:


Yes, moving to 1.6.0.

> Add support for Kafka default partitioner
> -
>
> Key: FLINK-7913
> URL: https://issues.apache.org/jira/browse/FLINK-7913
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.4.0
>Reporter: Konstantin Lalafaryan
>Assignee: Konstantin Lalafaryan
>Priority: Blocker
> Fix For: 1.6.0
>
>
> Currently Apache Flink provides only *FlinkKafkaPartitioner* and just one 
> implementation, *FlinkFixedPartitioner*. 
> In order to be able to use Kafka's default partitioner you have to create a 
> new implementation of *FlinkKafkaPartitioner* and fork the code from Kafka. 
> It would be really good to be able to define the partitioner without 
> implementing a new class.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-6944) Introduce DefaultTypeSerializerConfigSnapshot as a base implementation for serializer compatibility checks

2018-03-01 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-6944:
---
Fix Version/s: 1.6.0

> Introduce DefaultTypeSerializerConfigSnapshot as a base implementation for 
> serializer compatibility checks
> --
>
> Key: FLINK-6944
> URL: https://issues.apache.org/jira/browse/FLINK-6944
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing, Type Serialization System
>Affects Versions: 1.3.0, 1.3.1
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.5.0, 1.6.0
>
>
> Currently, we store both the {{TypeSerializer}} and its corresponding 
> {{TypeSerializerConfigSnapshot}} in checkpoints of managed state. This, in 
> most cases, is actually duplicate information.
> This JIRA proposes to change this by only storing the 
> {{TypeSerializerConfigSnapshot}}, while at the same time, letting 
> {{TypeSerializer.snapshotConfiguration}} return a default 
> {{DefaultTypeSerializerConfigSnapshot}}.
> This default simply serializes the serializer instance using Java 
> serialization.
> The {{DefaultTypeSerializerConfigSnapshot}} should wrap the serializer bytes, 
> the serialVersionUID of the serializer class, and the serializer class' 
> classname. The latter two will be used to check compatibility in the default 
> implementation of {{TypeSerializer.ensureCompatibility}}. Specifically, if 
> classname / serialVersionUID has changed, the default implementation of 
> {{TypeSerializer.ensureCompatibility}} will simply return 
> {{CompatibilityResult.requiresMigration}} with the deserialized serializer as 
> the convert deserializer.
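To make the intended default check concrete, a small Scala sketch (types and names are illustrative, not the actual Flink classes):

{code:scala}
// Illustrative only: the default compatibility check described above compares
// the stored class name and serialVersionUID against the current serializer.
final case class DefaultSnapshotSketch(serializerBytes: Array[Byte],
                                       serialVersionUID: Long,
                                       serializerClassName: String) {

  def isCompatibleWith(currentClass: Class[_], currentSerialVersionUID: Long): Boolean =
    serializerClassName == currentClass.getName &&
      serialVersionUID == currentSerialVersionUID
  // if this returns false, the default ensureCompatibility would answer
  // CompatibilityResult.requiresMigration with the Java-deserialized serializer
}
{code}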



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-6763) Inefficient PojoSerializerConfigSnapshot serialization format

2018-03-01 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-6763:
---
Fix Version/s: (was: 1.5.0)
   1.6.0

> Inefficient PojoSerializerConfigSnapshot serialization format
> -
>
> Key: FLINK-6763
> URL: https://issues.apache.org/jira/browse/FLINK-6763
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing, Type Serialization System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Till Rohrmann
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.6.0
>
>
> The {{PojoSerializerConfigSnapshot}} stores for each serializer the beginning 
> offset and ending offset in the serialization stream. This information is 
> also written if the serializer serialization is supposed to be ignored. The 
> beginning and ending offsets are stored as a sequence of integers at the 
> beginning of the serialization stream. We store this information to skip 
> broken serializers.
> I think we don't need both offsets. Instead, I would suggest writing the 
> length of the serialized serializer into the serialization stream first and 
> then the serialized serializer itself. This can be done in 
> {{TypeSerializerSerializationUtil.writeSerializer}}. When reading the 
> serializer via {{TypeSerializerSerializationUtil.tryReadSerializer}}, we can 
> try to deserialize the serializer. If this operation fails, we can skip over 
> the serialized serializer's bytes because we know how long it is.
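A short Scala sketch of the suggested length-prefixed layout (illustrative only; the real change would live in {{TypeSerializerSerializationUtil}}):

{code:scala}
import java.io.{DataInputStream, DataOutputStream}

// Write the serializer bytes preceded by their length.
def writeSerializerBytes(out: DataOutputStream, serializerBytes: Array[Byte]): Unit = {
  out.writeInt(serializerBytes.length)   // length prefix replaces begin/end offsets
  out.write(serializerBytes)
}

// Read the bytes back; if deserializing them later fails, the caller can simply
// continue reading after these bytes because their length is known up front.
def readSerializerBytes(in: DataInputStream): Array[Byte] = {
  val len = in.readInt()
  val buf = new Array[Byte](len)
  in.readFully(buf)
  buf
}
{code}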



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-6944) Introduce DefaultTypeSerializerConfigSnapshot as a base implementation for serializer compatibility checks

2018-03-01 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383265#comment-16383265
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-6944:


Yes, moving to 1.6.0.

> Introduce DefaultTypeSerializerConfigSnapshot as a base implementation for 
> serializer compatibility checks
> --
>
> Key: FLINK-6944
> URL: https://issues.apache.org/jira/browse/FLINK-6944
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing, Type Serialization System
>Affects Versions: 1.3.0, 1.3.1
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Currently, we store both the {{TypeSerializer}} and its corresponding 
> {{TypeSerializerConfigSnapshot}} in checkpoints of managed state. This, in 
> most cases, is actually duplicate information.
> This JIRA proposes to change this by only storing the 
> {{TypeSerializerConfigSnapshot}}, while at the same time, letting 
> {{TypeSerializer.snapshotConfiguration}} return a default 
> {{DefaultTypeSerializerConfigSnapshot}}.
> This default simply serializes the serializer instance using Java 
> serialization.
> The {{DefaultTypeSerializerConfigSnapshot}} should wrap the serializer bytes, 
> the serialVersionUID of the serializer class, and the serializer class' 
> classname. The latter two will be used to check compatibility in the default 
> implementation of {{TypeSerializer.ensureCompatibility}}. Specifically, if 
> classname / serialVersionUID has changed, the default implementation of 
> {{TypeSerializer.ensureCompatibility}} will simply return 
> {{CompatibilityResult.requiresMigration}} with the deserialized serializer as 
> the convert deserializer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-6763) Inefficient PojoSerializerConfigSnapshot serialization format

2018-03-01 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383264#comment-16383264
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-6763:


[~aljoscha] yes, moving to 1.6.0.

> Inefficient PojoSerializerConfigSnapshot serialization format
> -
>
> Key: FLINK-6763
> URL: https://issues.apache.org/jira/browse/FLINK-6763
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing, Type Serialization System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Till Rohrmann
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.6.0
>
>
> The {{PojoSerializerConfigSnapshot}} stores for each serializer the beginning 
> offset and ending offset in the serialization stream. This information is 
> also written if the serializer serialization is supposed to be ignored. The 
> beginning and ending offsets are stored as a sequence of integers at the 
> beginning of the serialization stream. We store this information to skip 
> broken serializers.
> I think we don't need both offsets. Instead, I would suggest writing the 
> length of the serialized serializer into the serialization stream first and 
> then the serialized serializer itself. This can be done in 
> {{TypeSerializerSerializationUtil.writeSerializer}}. When reading the 
> serializer via {{TypeSerializerSerializationUtil.tryReadSerializer}}, we can 
> try to deserialize the serializer. If this operation fails, we can skip over 
> the serialized serializer's bytes because we know how long it is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8828) Add collect method to DataStream / DataSet scala api

2018-03-01 Thread Jelmer Kuperus (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jelmer Kuperus updated FLINK-8828:
--
Description: 
A collect function is a method that takes a Partial Function as its parameter 
and applies it to all the elements in the collection to create a new collection 
which satisfies the Partial Function.

It can be found on all [core scala collection 
classes|http://www.scala-lang.org/api/2.9.2/scala/collection/TraversableLike.html]
 as well as on spark's [rdd 
interface|https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.rdd.RDD]

To understand its utility imagine the following scenario :

You have a DataStream that produces events of type _Purchase_ and View 
 You would like to transform this stream into a stream of purchase amounts over 
1000 euros.

Currently an implementation might look like
{noformat}
val x = dataStream
  .filter(_.isInstanceOf[Purchase])
  .map(_.asInstanceOf[Purchase])
  .filter(_.amount > 1000)
  .map(_.amount){noformat}
Or alternatively you could do this
{noformat}
dataStream.flatMap(_ match {
  case p: Purchase if p.amount > 1000 => Some(p.amount)
  case _ => None
}){noformat}
But with collect implemented it could look like
{noformat}
dataStream.collect {
  case p: Purchase if p.amount > 1000 => p.amount
}{noformat}
 

Which is a lot nicer to both read and write

  was:
A collect function is a method that takes a Partial Function as its parameter 
and applies it to all the elements in the collection to create a new collection 
which satisfies the Partial Function.

It can be found on all [core scala collection 
classes|http://www.scala-lang.org/api/2.9.2/scala/collection/TraversableLike.html]
 as well as on spark's [rdd 
interface|https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.rdd.RDD]

To understand its utility imagine the following scenario :(

You have a DataStream that produces events of type _Purchase_ and View 
You would like to transform this stream into a stream of purchase amounts over 
1000 euros. 

Currently an implementation might look like
{noformat}
val x = dataStream
  .filter(_.isInstanceOf[Purchase])
  .map(_.asInstanceOf[Purchase])
  .filter(_.amount > 1000)
  .map(_.amount){noformat}
Or alternatively you could do this
{noformat}
dataStream.flatMap(_ match {
  case p: Purchase if p.amount > 1000 => Some(p.amount)
  case _ => None
}){noformat}
But with collect implemented it could look like
{noformat}
dataStream.collect {
  case p: Purchase if p.amount > 1000 => p.amount
}{noformat}
 

Which is a lot nicer to both read and write


> Add collect method to DataStream / DataSet scala api
> 
>
> Key: FLINK-8828
> URL: https://issues.apache.org/jira/browse/FLINK-8828
> Project: Flink
>  Issue Type: Improvement
>  Components: Core, DataSet API, DataStream API, Scala API
>Affects Versions: 1.4.0
>Reporter: Jelmer Kuperus
>Priority: Major
>
> A collect function is a method that takes a Partial Function as its parameter 
> and applies it to all the elements in the collection to create a new 
> collection which satisfies the Partial Function.
> It can be found on all [core scala collection 
> classes|http://www.scala-lang.org/api/2.9.2/scala/collection/TraversableLike.html]
>  as well as on spark's [rdd 
> interface|https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.rdd.RDD]
> To understand its utility imagine the following scenario :
> You have a DataStream that produces events of type _Purchase_ and View 
>  You would like to transform this stream into a stream of purchase amounts 
> over 1000 euros.
> Currently an implementation might look like
> {noformat}
> val x = dataStream
>   .filter(_.isInstanceOf[Purchase])
>   .map(_.asInstanceOf[Purchase])
>   .filter(_.amount > 1000)
>   .map(_.amount){noformat}
> Or alternatively you could do this
> {noformat}
> dataStream.flatMap(_ match {
>   case p: Purchase if p.amount > 1000 => Some(p.amount)
>   case _ => None
> }){noformat}
> But with collect implemented it could look like
> {noformat}
> dataStream.collect {
>   case p: Purchase if p.amount > 1000 => p.amount
> }{noformat}
>  
> Which is a lot nicer to both read and write



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (FLINK-6352) FlinkKafkaConsumer should support to use timestamp to set up start offset

2018-03-01 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai resolved FLINK-6352.

   Resolution: Fixed
Fix Version/s: 1.6.0

> FlinkKafkaConsumer should support to use timestamp to set up start offset
> -
>
> Key: FLINK-6352
> URL: https://issues.apache.org/jira/browse/FLINK-6352
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Reporter: Fang Yong
>Assignee: Fang Yong
>Priority: Blocker
> Fix For: 1.5.0, 1.6.0
>
>
> Currently "auto.offset.reset" is used to initialize the start offset of 
> FlinkKafkaConsumer, and the value should be earliest/latest/none. This method 
> can only let the job comsume the beginning or the most recent data, but can 
> not specify the specific offset of Kafka began to consume. 
> So, there should be a configuration item (such as 
> "flink.source.start.time" and the format is "-MM-dd HH:mm:ss") that 
> allows user to configure the initial offset of Kafka. The action of 
> "flink.source.start.time" is as follows:
> 1) job start from checkpoint / savepoint
>   a> offset of partition can be restored from checkpoint/savepoint,  
> "flink.source.start.time" will be ignored.
>   b> there's no checkpoint/savepoint for the partition (For example, this 
> partition is newly increased), the "flink.kafka.start.time" will be used to 
> initialize the offset of the partition
> 2) job has no checkpoint / savepoint, the "flink.source.start.time" is used 
> to initialize the offset of the kafka
>   a> the "flink.source.start.time" is valid, use it to set the offset of kafka
>   b> the "flink.source.start.time" is out-of-range, the same as it does 
> currently with no initial offset, get kafka's current offset and start reading
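The feature has been merged (see the commits referenced in the comments). A rough usage sketch in Scala, assuming the timestamp-based startup method that the merged change exposes on the Kafka 0.10 consumer:

{code:scala}
import java.text.SimpleDateFormat
import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092")
props.setProperty("group.id", "my-group")

// start reading from the offsets whose timestamps are >= this point in time
val startMillis =
  new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse("2018-03-01 00:00:00").getTime

val consumer = new FlinkKafkaConsumer010[String]("my-topic", new SimpleStringSchema(), props)
consumer.setStartFromTimestamp(startMillis) // ignored for partitions restored from a checkpoint
{code}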



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-6352) FlinkKafkaConsumer should support to use timestamp to set up start offset

2018-03-01 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383258#comment-16383258
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-6352:


Merged.

1.6.0: f8ca273549aded00c7cd12699cebc1f5bba83153
1.5.0: 1fcd516a0c55f22d06b4ce3d9bc37fb9d03f0e33

> FlinkKafkaConsumer should support to use timestamp to set up start offset
> -
>
> Key: FLINK-6352
> URL: https://issues.apache.org/jira/browse/FLINK-6352
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Reporter: Fang Yong
>Assignee: Fang Yong
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Currently "auto.offset.reset" is used to initialize the start offset of 
> FlinkKafkaConsumer, and the value should be earliest/latest/none. This method 
> can only let the job comsume the beginning or the most recent data, but can 
> not specify the specific offset of Kafka began to consume. 
> So, there should be a configuration item (such as 
> "flink.source.start.time" and the format is "-MM-dd HH:mm:ss") that 
> allows user to configure the initial offset of Kafka. The action of 
> "flink.source.start.time" is as follows:
> 1) job start from checkpoint / savepoint
>   a> offset of partition can be restored from checkpoint/savepoint,  
> "flink.source.start.time" will be ignored.
>   b> there's no checkpoint/savepoint for the partition (For example, this 
> partition is newly increased), the "flink.kafka.start.time" will be used to 
> initialize the offset of the partition
> 2) job has no checkpoint / savepoint, the "flink.source.start.time" is used 
> to initialize the offset of the kafka
>   a> the "flink.source.start.time" is valid, use it to set the offset of kafka
>   b> the "flink.source.start.time" is out-of-range, the same as it does 
> currently with no initial offset, get kafka's current offset and start reading



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-6352) FlinkKafkaConsumer should support to use timestamp to set up start offset

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383255#comment-16383255
 ] 

ASF GitHub Bot commented on FLINK-6352:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5282


> FlinkKafkaConsumer should support to use timestamp to set up start offset
> -
>
> Key: FLINK-6352
> URL: https://issues.apache.org/jira/browse/FLINK-6352
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Reporter: Fang Yong
>Assignee: Fang Yong
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Currently "auto.offset.reset" is used to initialize the start offset of 
> FlinkKafkaConsumer, and the value should be earliest/latest/none. This method 
> can only let the job comsume the beginning or the most recent data, but can 
> not specify the specific offset of Kafka began to consume. 
> So, there should be a configuration item (such as 
> "flink.source.start.time" and the format is "-MM-dd HH:mm:ss") that 
> allows user to configure the initial offset of Kafka. The action of 
> "flink.source.start.time" is as follows:
> 1) job start from checkpoint / savepoint
>   a> offset of partition can be restored from checkpoint/savepoint,  
> "flink.source.start.time" will be ignored.
>   b> there's no checkpoint/savepoint for the partition (For example, this 
> partition is newly increased), the "flink.kafka.start.time" will be used to 
> initialize the offset of the partition
> 2) job has no checkpoint / savepoint, the "flink.source.start.time" is used 
> to initialize the offset of the kafka
>   a> the "flink.source.start.time" is valid, use it to set the offset of kafka
>   b> the "flink.source.start.time" is out-of-range, the same as it does 
> currently with no initial offset, get kafka's current offset and start reading



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5282: [FLINK-6352] [kafka] Timestamp-based offset config...

2018-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5282


---


[jira] [Created] (FLINK-8828) Add collect method to DataStream / DataSet scala api

2018-03-01 Thread Jelmer Kuperus (JIRA)
Jelmer Kuperus created FLINK-8828:
-

 Summary: Add collect method to DataStream / DataSet scala api
 Key: FLINK-8828
 URL: https://issues.apache.org/jira/browse/FLINK-8828
 Project: Flink
  Issue Type: Improvement
  Components: Core, DataSet API, DataStream API, Scala API
Affects Versions: 1.4.0
Reporter: Jelmer Kuperus


A collect function is a method that takes a Partial Function as its parameter 
and applies it to all the elements in the collection to create a new collection 
which satisfies the Partial Function.

It can be found on all [core scala collection 
classes|http://www.scala-lang.org/api/2.9.2/scala/collection/TraversableLike.html]
 as well as on spark's [rdd 
interface|https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.rdd.RDD]

To understand its utility imagine the following scenario :(

You have a DataStream that produces events of type _Purchase_ and View 
You would like to transform this stream into a stream of purchase amounts over 
1000 euros. 

Currently an implementation might look like
{noformat}
val x = dataStream
  .filter(_.isInstanceOf[Purchase])
  .map(_.asInstanceOf[Purchase])
  .filter(_.amount > 1000)
  .map(_.amount){noformat}
Or alternatively you could do this
{noformat}
dataStream.flatMap(_ match {
  case p: Purchase if p.amount > 1000 => Some(p.amount)
  case _ => None
}){noformat}
But with collect implemented it could look like
{noformat}
dataStream.collect {
  case p: Purchase if p.amount > 1000 => p.amount
}{noformat}
 

Which is a lot nicer to both read and write



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8794) When using BucketingSink, it happens that one of the files is always in the [.in-progress] state

2018-03-01 Thread yanxiaobin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383209#comment-16383209
 ] 

yanxiaobin commented on FLINK-8794:
---

Hi [~aljoscha], thank you for your reply!
 
There are the following points:

1. What I described above is a situation that occurs even when there is no 
failure in the job.

2. This also happens when a job has a failure (because one of the TaskManager 
nodes went down) and recovers. Fault tolerance of a node in distributed 
computing is necessary, and this is where the problem shows up.

3. On recovery, the previous in-progress and pending files are not cleared; this 
causes the downstream processor to read excess dirty data.

5. I think we should first place the data in the compute nodes' local files, and 
then upload them to the distributed file system (for example, S3 or HDFS) once 
the local file has been completely written.

We are blocked by this problem at the moment, and because of it we cannot use 
this job.

 

> When using BucketingSink, it happens that one of the files is always in the 
> [.in-progress] state
> 
>
> Key: FLINK-8794
> URL: https://issues.apache.org/jira/browse/FLINK-8794
> Project: Flink
>  Issue Type: Bug
>  Components: filesystem-connector
>Affects Versions: 1.4.0, 1.4.1
>Reporter: yanxiaobin
>Priority: Major
>
> When using BucketingSink, it happens that one of the files is always in the 
> [.in-progress] state, and this state never changes afterwards. The 
> underlying storage is S3.
>  
> {code:java}
> // code placeholder
> {code}
> 2018-02-28 11:58:42  147341619 {color:#d04437}_part-28-0.in-progress{color}
> 2018-02-28 12:06:27  147315059 part-0-0
> 2018-02-28 12:06:27  147462359 part-1-0
> 2018-02-28 12:06:27  147316006 part-10-0
> 2018-02-28 12:06:28  147349854 part-100-0
> 2018-02-28 12:06:27  147421625 part-101-0
> 2018-02-28 12:06:27  147443830 part-102-0
> 2018-02-28 12:06:27  147372801 part-103-0
> 2018-02-28 12:06:27  147343670 part-104-0
> ..



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-6895) Add STR_TO_DATE supported in SQL

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383200#comment-16383200
 ] 

ASF GitHub Bot commented on FLINK-6895:
---

Github user buptljy commented on the issue:

https://github.com/apache/flink/pull/5615
  
Actually, I am not sure whether it is appropriate to return a "string", because 
it should be the inverse of "DATE_FORMAT()". However, if I return 
DATE/TIME/DATETIME as the JIRA issue describes, the type of data the user 
receives will be uncertain (one of DATE/TIME/DATETIME).
Do you have any good ideas? I will optimize it.


> Add STR_TO_DATE supported in SQL
> 
>
> Key: FLINK-6895
> URL: https://issues.apache.org/jira/browse/FLINK-6895
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Affects Versions: 1.4.0
>Reporter: sunjincheng
>Assignee: Aegeaner
>Priority: Major
>
> STR_TO_DATE(str,format) This is the inverse of the DATE_FORMAT() function. It 
> takes a string str and a format string format. STR_TO_DATE() returns a 
> DATETIME value if the format string contains both date and time parts, or a 
> DATE or TIME value if the string contains only date or time parts. If the 
> date, time, or datetime value extracted from str is illegal, STR_TO_DATE() 
> returns NULL and produces a warning.
> * Syntax:
> STR_TO_DATE(str,format) 
> * Arguments
> **str: -
> **format: -
> * Return Types
>   DATETIME/DATE/TIME
> * Example:
>   STR_TO_DATE('01,5,2013','%d,%m,%Y') -> '2013-05-01'
>   SELECT STR_TO_DATE('a09:30:17','a%h:%i:%s') -> '09:30:17'
> * See more:
> ** [MySQL| 
> https://dev.mysql.com/doc/refman/5.7/en/date-and-time-functions.html#function_str-to-date]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5615: [FLINK-6895][table]Add STR_TO_DATE supported in SQL

2018-03-01 Thread buptljy
Github user buptljy commented on the issue:

https://github.com/apache/flink/pull/5615
  
Actually, I am not sure whether it is appropriate to return a "string", because 
it should be the inverse of "DATE_FORMAT()". However, if I return 
DATE/TIME/DATETIME as the JIRA issue describes, the type of data the user 
receives will be uncertain (one of DATE/TIME/DATETIME).
Do you have any good ideas? I will optimize it.


---


[jira] [Commented] (FLINK-6895) Add STR_TO_DATE supported in SQL

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383197#comment-16383197
 ] 

ASF GitHub Bot commented on FLINK-6895:
---

GitHub user buptljy opened a pull request:

https://github.com/apache/flink/pull/5615

[FLINK-6895][table]Add STR_TO_DATE supported in SQL

## What is the purpose of the change
Add support for the STR_TO_DATE function in SQL
## Brief change log
 * STR_TO_DATE(str string, format string)  
\- str is the string that needs to be transformed.
\- format is the pattern of "str"
 * Add tests in ScalarFunctionsTest.scala
 * Add docs in sql.md
## Verifying this change
 * Run unit tests in  ScalarFunctionsTest.scala
## Does this pull request potentially affect one of the following parts:
 * A new sql function
## Documentation
  * Add docs in sql.md

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/buptljy/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5615.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5615


commit 59752143ee438cb11969ae4bdda1fac5fc32813c
Author: Liao Jiayi 
Date:   2018-03-01T11:58:08Z

add str_to_date sql function

commit 63f71e4b3d6378f2114aa04ba4d1128f1ec3bc38
Author: Liao Jiayi 
Date:   2018-03-01T11:58:41Z

Merge branch 'master' of github.com:apache/flink




> Add STR_TO_DATE supported in SQL
> 
>
> Key: FLINK-6895
> URL: https://issues.apache.org/jira/browse/FLINK-6895
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API & SQL
>Affects Versions: 1.4.0
>Reporter: sunjincheng
>Assignee: Aegeaner
>Priority: Major
>
> STR_TO_DATE(str,format) This is the inverse of the DATE_FORMAT() function. It 
> takes a string str and a format string format. STR_TO_DATE() returns a 
> DATETIME value if the format string contains both date and time parts, or a 
> DATE or TIME value if the string contains only date or time parts. If the 
> date, time, or datetime value extracted from str is illegal, STR_TO_DATE() 
> returns NULL and produces a warning.
> * Syntax:
> STR_TO_DATE(str,format) 
> * Arguments
> **str: -
> **format: -
> * Return Types
>   DATETIME/DATE/TIME
> * Example:
>   STR_TO_DATE('01,5,2013','%d,%m,%Y') -> '2013-05-01'
>   SELECT STR_TO_DATE('a09:30:17','a%h:%i:%s') -> '09:30:17'
> * See more:
> ** [MySQL| 
> https://dev.mysql.com/doc/refman/5.7/en/date-and-time-functions.html#function_str-to-date]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5615: [FLINK-6895][table]Add STR_TO_DATE supported in SQ...

2018-03-01 Thread buptljy
GitHub user buptljy opened a pull request:

https://github.com/apache/flink/pull/5615

[FLINK-6895][table]Add STR_TO_DATE supported in SQL

## What is the purpose of the change
Add support for the STR_TO_DATE function in SQL
## Brief change log
 * STR_TO_DATE(str string, format string)  
\- str is the string that needs to be transformed.
\- format is the pattern of "str"
 * Add tests in ScalarFunctionsTest.scala
 * Add docs in sql.md
## Verifying this change
 * Run unit tests in  ScalarFunctionsTest.scala
## Does this pull request potentially affect one of the following parts:
 * A new sql function
## Documentation
  * Add docs in sql.md

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/buptljy/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5615.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5615


commit 59752143ee438cb11969ae4bdda1fac5fc32813c
Author: Liao Jiayi 
Date:   2018-03-01T11:58:08Z

add str_to_date sql function

commit 63f71e4b3d6378f2114aa04ba4d1128f1ec3bc38
Author: Liao Jiayi 
Date:   2018-03-01T11:58:41Z

Merge branch 'master' of github.com:apache/flink




---


[jira] [Commented] (FLINK-8560) add KeyedProcessFunction to expose the key in onTimer() and other methods

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383180#comment-16383180
 ] 

ASF GitHub Bot commented on FLINK-8560:
---

Github user bowenli86 commented on the issue:

https://github.com/apache/flink/pull/5481
  
@kl0u I added the comments for `@deprecated` in the javadoc. Let me know if 
you can merge the two related PRs. Thanks


> add KeyedProcessFunction to expose the key in onTimer() and other methods
> -
>
> Key: FLINK-8560
> URL: https://issues.apache.org/jira/browse/FLINK-8560
> Project: Flink
>  Issue Type: Improvement
>  Components: DataStream API
>Affects Versions: 1.4.0
>Reporter: Jürgen Thomann
>Assignee: Bowen Li
>Priority: Major
> Fix For: 1.5.0
>
>
> Currently it is required to store the key of a keyBy() in the processElement 
> method to have access to it in the OnTimerContext.
> This is not ideal, as the processElement method has to check for every element 
> whether the key is already stored and set it if it is not.
> A possible solution would be adding OnTimerContext#getCurrentKey() or a similar 
> method. Having it in the open() method could work as well.
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Getting-Key-from-keyBy-in-ProcessFunction-tt18126.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
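For context, a minimal sketch of the workaround the ticket describes: since onTimer() does not expose the current key, processElement() first copies it into keyed state. The element type (a String stream keyed by the element itself) and the timer interval are assumptions made only to keep the example self-contained.

{code}
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class KeyRememberingFunction extends ProcessFunction<String, String> {

    private transient ValueState<String> currentKey;

    @Override
    public void open(Configuration parameters) {
        currentKey = getRuntimeContext().getState(
            new ValueStateDescriptor<>("current-key", String.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        // Remember the key for later use in onTimer(); this check-and-set runs for
        // every element, which is exactly the inconvenience the ticket wants to
        // remove via OnTimerContext#getCurrentKey().
        if (currentKey.value() == null) {
            currentKey.update(value);
        }
        ctx.timerService().registerProcessingTimeTimer(
            ctx.timerService().currentProcessingTime() + 1000L);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        out.collect("timer fired for key " + currentKey.value());
    }
}
{code}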


[GitHub] flink issue #5481: [FLINK-8560] Access to the current key in ProcessFunction...

2018-03-01 Thread bowenli86
Github user bowenli86 commented on the issue:

https://github.com/apache/flink/pull/5481
  
@kl0u I added the comments for `@deprecated` in the javadoc. Let me know if 
you can merge the two related PRs. Thanks


---


[jira] [Commented] (FLINK-8821) Fix non-terminating decimal error

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383147#comment-16383147
 ] 

ASF GitHub Bot commented on FLINK-8821:
---

Github user Xpray commented on the issue:

https://github.com/apache/flink/pull/5608
  
@twalthr I've updated the PR. I think the new configuration in 
`TableConfig` should only be responsible for decimal division; other decimal 
operations should use the `UNLIMITED` context.


> Fix non-terminating decimal error
> -
>
> Key: FLINK-8821
> URL: https://issues.apache.org/jira/browse/FLINK-8821
> Project: Flink
>  Issue Type: Bug
>  Components: Table API  SQL
>Reporter: Ruidong Li
>Assignee: Ruidong Li
>Priority: Major
>
> The DecimalAvgAggFunction lacks precision protection



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
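For background on the "non-terminating decimal" error, a minimal Java illustration of the underlying BigDecimal behaviour (this is not Flink's DecimalAvgAggFunction code): division without a MathContext throws when the quotient has an infinite decimal expansion, while an explicit precision bounds the result.

{code}
import java.math.BigDecimal;
import java.math.MathContext;

public class NonTerminatingDecimal {
    public static void main(String[] args) {
        BigDecimal sum = BigDecimal.ONE;
        BigDecimal count = new BigDecimal(3);

        // Unbounded division: 1/3 has no exact decimal representation, so this
        // throws ArithmeticException("Non-terminating decimal expansion ...").
        try {
            System.out.println(sum.divide(count));
        } catch (ArithmeticException e) {
            System.out.println("unbounded divide failed: " + e.getMessage());
        }

        // Division with an explicit precision succeeds (34 significant digits here).
        System.out.println(sum.divide(count, MathContext.DECIMAL128));
    }
}
{code}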


[GitHub] flink issue #5608: [FLINK-8821][TableAPI && SQL] Fix non-terminating decimal...

2018-03-01 Thread Xpray
Github user Xpray commented on the issue:

https://github.com/apache/flink/pull/5608
  
@twalthr I've updated the PR. I think the new configuration in 
`TableConfig` should only be responsible for decimal division; other decimal 
operations should use the `UNLIMITED` context.


---


[jira] [Updated] (FLINK-8821) Fix non-terminating decimal error

2018-03-01 Thread Ruidong Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruidong Li updated FLINK-8821:
--
Summary: Fix non-terminating decimal error  (was: Fix BigDecimal divide in 
AvgAggFunction)

> Fix non-terminating decimal error
> -
>
> Key: FLINK-8821
> URL: https://issues.apache.org/jira/browse/FLINK-8821
> Project: Flink
>  Issue Type: Bug
>  Components: Table API  SQL
>Reporter: Ruidong Li
>Assignee: Ruidong Li
>Priority: Major
>
> The DecimalAvgAggFunction lacks precision protection



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8822) RotateLogFile may not work well when sed version is below 4.2

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383090#comment-16383090
 ] 

ASF GitHub Bot commented on FLINK-8822:
---

Github user BigOneLiu commented on the issue:

https://github.com/apache/flink/pull/5609
  
@zentol thanks for the comment.
I checked my system, SUSE 11 SP4 (which is a widely used version); its sed version 
is 4.1.5, and it does not support '-E'. I don't know how many Linux distributions 
ship sed 4.1.5.

According to the source code, '-E' is exactly the same as '-r', and '-E' is 
undocumented.
So '-r' may be more compatible.
If this doesn't need to be changed, at least note in the docs that "log rotation may 
not work well on SUSE 11".



> RotateLogFile may not work well when sed version is below 4.2
> -
>
> Key: FLINK-8822
> URL: https://issues.apache.org/jira/browse/FLINK-8822
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Xin Liu
>Priority: Major
> Fix For: 1.5.0
>
>
> In bin/config.sh, rotateLogFilesWithPrefix() uses an extended regular expression to 
> process file names with "sed -E", but when the sed version is below 4.2 this yields 
> "sed: invalid option -- 'E'"
> and log file rotation won't work: there will be only one log file no matter 
> what $MAX_LOG_FILE_NUMBER is set to.
> Using sed -r may be more suitable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5609: [FLINK-8822] RotateLogFile may not work well when sed ver...

2018-03-01 Thread BigOneLiu
Github user BigOneLiu commented on the issue:

https://github.com/apache/flink/pull/5609
  
@zentol thanks for the comment.
I checked my system, SUSE 11 SP4 (which is a widely used version); its sed version 
is 4.1.5, and it does not support '-E'. I don't know how many Linux distributions 
ship sed 4.1.5.

According to the source code, '-E' is exactly the same as '-r', and '-E' is 
undocumented.
So '-r' may be more compatible.
If this doesn't need to be changed, at least note in the docs that "log rotation may 
not work well on SUSE 11".



---


[jira] [Commented] (FLINK-8790) Improve performance for recovery from incremental checkpoint

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383075#comment-16383075
 ] 

ASF GitHub Bot commented on FLINK-8790:
---

Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/5582
  
Thanks, looking forward.


> Improve performance for recovery from incremental checkpoint
> 
>
> Key: FLINK-8790
> URL: https://issues.apache.org/jira/browse/FLINK-8790
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Affects Versions: 1.5.0
>Reporter: Sihua Zhou
>Assignee: Sihua Zhou
>Priority: Major
> Fix For: 1.5.0
>
>
> When there are multiple state handles to be restored, we can improve the 
> performance as follows:
> 1. Choose the best state handle to init the target db
> 2. Use the other state handles to create temp dbs, and clip each db according 
> to the target key group range (via rocksdb.deleteRange()); this can help us 
> get rid of the `key group check` in the 
> `data insertion loop` and also helps us avoid traversing useless 
> records.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
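For readers unfamiliar with rocksdb.deleteRange(), a rough sketch of the clipping step from point 2 above, using the RocksJava API directly; the byte-array boundaries stand in for serialized key-group prefixes and are an assumption made for brevity, not Flink's actual key layout.

{code}
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class ClipToKeyGroupRange {

    // Illustrative only: removes everything outside [targetStart, targetEnd) from a
    // temporary DB restored from another state handle, so the later bulk insertion
    // into the target DB no longer needs a per-record key-group check.
    static void clip(RocksDB tempDb, byte[] dbStart, byte[] targetStart,
                     byte[] targetEnd, byte[] dbEnd) throws RocksDBException {
        tempDb.deleteRange(dbStart, targetStart); // drop keys below the target range
        tempDb.deleteRange(targetEnd, dbEnd);     // drop keys above the target range
    }
}
{code}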


[GitHub] flink issue #5582: [FLINK-8790][State] Improve performance for recovery from...

2018-03-01 Thread sihuazhou
Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/5582
  
Thanks, looking forward.


---


[jira] [Commented] (FLINK-8827) When FLINK_CONF_DIR contains spaces, execute zookeeper related scripts failed

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383072#comment-16383072
 ] 

ASF GitHub Bot commented on FLINK-8827:
---

GitHub user davidxdh opened a pull request:

https://github.com/apache/flink/pull/5614

[FLINK-8827] When FLINK_CONF_DIR contains spaces, execute zookeeper r…

…elated scripts failed

When the path of FLINK_CONF_DIR includes spaces, executing the ZooKeeper-related 
scripts fails with the following error message: Expect binary 
expression.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davidxdh/flink develop0302

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5614.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5614


commit 79852f41c25bd891d1fa58952aca983e8a8afeac
Author: davidxdh 
Date:   2018-03-02T02:55:23Z

[FLINK-8827] When FLINK_CONF_DIR contains spaces, execute zookeeper related 
scripts failed




> When FLINK_CONF_DIR contains spaces, execute zookeeper related scripts failed
> -
>
> Key: FLINK-8827
> URL: https://issues.apache.org/jira/browse/FLINK-8827
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.4.0
> Environment: Red Hat Enterprise Linux Server release 6.5 (Santiago)
>Reporter: Donghui Xu
>Priority: Major
>
> When the path of FLINK_CONF_DIR includes spaces, executing the ZooKeeper-related 
> scripts fails with the following error message: Expect binary expression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5614: [FLINK-8827] When FLINK_CONF_DIR contains spaces, ...

2018-03-01 Thread davidxdh
GitHub user davidxdh opened a pull request:

https://github.com/apache/flink/pull/5614

[FLINK-8827] When FLINK_CONF_DIR contains spaces, execute zookeeper r…

…elated scripts failed

When the path of FLINK_CONF_DIR includes spaces, executing the ZooKeeper-related 
scripts fails with the following error message: Expect binary 
expression.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davidxdh/flink develop0302

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5614.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5614


commit 79852f41c25bd891d1fa58952aca983e8a8afeac
Author: davidxdh 
Date:   2018-03-02T02:55:23Z

[FLINK-8827] When FLINK_CONF_DIR contains spaces, execute zookeeper related 
scripts failed




---


[jira] [Commented] (FLINK-8811) Add MiniClusterClient to allow fast MiniCluster operations

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383071#comment-16383071
 ] 

ASF GitHub Bot commented on FLINK-8811:
---

Github user zhangminglei commented on a diff in the pull request:

https://github.com/apache/flink/pull/5600#discussion_r171754724
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/minicluster/MiniCluster.java
 ---
@@ -178,6 +181,13 @@ public URI getRestAddress() {
}
}
 
+   public HighAvailabilityServices getHighAvailabilityServices() {
+   synchronized (lock) {
+   checkState(running, "MiniCluster is not yet running.");
--- End diff --

I am curious about the lock in this place. Isn't the scenario as follows?

Several tests or ITCases run simultaneously during a Maven test run and might 
change this object at the same time. Is that why we should have a lock here?


> Add MiniClusterClient to allow fast MiniCluster operations
> --
>
> Key: FLINK-8811
> URL: https://issues.apache.org/jira/browse/FLINK-8811
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: flip-6
> Fix For: 1.5.0, 1.6.0
>
>
> We should offer a {{ClusterClient}} implementation for the {{MiniCluster}}. 
> That way we would be able to submit and wait for result without polling how 
> it would be the case by using the {{RestClusterClient}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
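For readers puzzling over the same question, a generic sketch of the check-then-read pattern the synchronized block guards; the field names are invented for this sketch and are not MiniCluster's actual members.

{code}
// Illustrative only: without the lock, another thread could flip `running` (or
// swap the services reference) between the state check and the read, so both
// happen under the same monitor.
public class GuardedAccessor {

    private final Object lock = new Object();
    private boolean running;
    private Object services; // stands in for HighAvailabilityServices

    public void start(Object newServices) {
        synchronized (lock) {
            services = newServices;
            running = true;
        }
    }

    public Object getServices() {
        synchronized (lock) {
            if (!running) {
                throw new IllegalStateException("not yet running");
            }
            return services;
        }
    }
}
{code}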


[GitHub] flink pull request #5600: [FLINK-8811] [flip6] Add initial implementation of...

2018-03-01 Thread zhangminglei
Github user zhangminglei commented on a diff in the pull request:

https://github.com/apache/flink/pull/5600#discussion_r171754724
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/minicluster/MiniCluster.java
 ---
@@ -178,6 +181,13 @@ public URI getRestAddress() {
}
}
 
+   public HighAvailabilityServices getHighAvailabilityServices() {
+   synchronized (lock) {
+   checkState(running, "MiniCluster is not yet running.");
--- End diff --

I am curious about the lock in this place. Isn't the scenario as follows?

Several tests or ITCases run simultaneously during a Maven test run and might 
change this object at the same time. Is that why we should have a lock here?


---


[jira] [Created] (FLINK-8827) When FLINK_CONF_DIR contains spaces, execute zookeeper related scripts failed

2018-03-01 Thread Donghui Xu (JIRA)
Donghui Xu created FLINK-8827:
-

 Summary: When FLINK_CONF_DIR contains spaces, execute zookeeper 
related scripts failed
 Key: FLINK-8827
 URL: https://issues.apache.org/jira/browse/FLINK-8827
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.4.0
 Environment: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Reporter: Donghui Xu


When the path of FLINK_CONF_DIR includes spaces, executing the ZooKeeper-related 
scripts fails with the following error message: Expect binary expression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-8824) In Kafka Consumers, replace 'getCanonicalName()' with 'getClassName()'

2018-03-01 Thread mingleizhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingleizhang reassigned FLINK-8824:
---

Assignee: mingleizhang

> In Kafka Consumers, replace 'getCanonicalName()' with 'getClassName()'
> --
>
> Key: FLINK-8824
> URL: https://issues.apache.org/jira/browse/FLINK-8824
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Reporter: Stephan Ewen
>Assignee: mingleizhang
>Priority: Major
> Fix For: 1.5.0
>
>
> The connector uses {{getCanonicalName()}} in all places, rather than 
> {{getClassName()}}.
> {{getCanonicalName()}}'s intention is to normalize class names for arrays, 
> etc, but is problematic when instantiating classes from class names.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
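A small, self-contained Java illustration of the difference that motivates this ticket (independent of the Kafka connector code): for nested classes, getCanonicalName() returns a dotted name that Class.forName() cannot load, while getName() returns the binary name that can.

{code}
public class ClassNameDemo {

    // A nested class, similar in spirit to a deserialization schema declared as
    // an inner class and passed to the consumer by name.
    public static class NestedSchema {}

    public static void main(String[] args) throws Exception {
        String canonical = NestedSchema.class.getCanonicalName(); // ClassNameDemo.NestedSchema
        String binary = NestedSchema.class.getName();             // ClassNameDemo$NestedSchema

        System.out.println(canonical);
        System.out.println(binary);

        Class.forName(binary);        // works
        try {
            Class.forName(canonical); // fails with ClassNotFoundException
        } catch (ClassNotFoundException e) {
            System.out.println("canonical name cannot be loaded: " + e.getMessage());
        }
    }
}
{code}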


[jira] [Updated] (FLINK-8824) In Kafka Consumers, replace 'getCanonicalName()' with 'getClassName()'

2018-03-01 Thread mingleizhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingleizhang updated FLINK-8824:

Description: 
The connector uses {{getCanonicalName()}} in all places, rather than 
{{getClassName()}}.

{{getCanonicalName()}}'s intention is to normalize class names for arrays, etc, 
but is problematic when instantiating classes from class names.



  was:
The connector uses {{getCanonicalClassName()}} in all places, gather than 
{{getClassName()}}.

{{getCanonicalClassName()}}'s intention is to normalize class names for arrays, 
etc, but is problematic when instantiating classes from class names.




> In Kafka Consumers, replace 'getCanonicalName()' with 'getClassName()'
> --
>
> Key: FLINK-8824
> URL: https://issues.apache.org/jira/browse/FLINK-8824
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Reporter: Stephan Ewen
>Priority: Major
> Fix For: 1.5.0
>
>
> The connector uses {{getCanonicalName()}} in all places, rather than 
> {{getClassName()}}.
> {{getCanonicalName()}}'s intention is to normalize class names for arrays, 
> etc, but is problematic when instantiating classes from class names.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8824) In Kafka Consumers, replace 'getCanonicalName()' with 'getClassName()'

2018-03-01 Thread mingleizhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingleizhang updated FLINK-8824:

Summary: In Kafka Consumers, replace 'getCanonicalName()' with 
'getClassName()'  (was: In Kafka Consumers, replace 'getCanonicalClassName()' 
with 'getClassName()')

> In Kafka Consumers, replace 'getCanonicalName()' with 'getClassName()'
> --
>
> Key: FLINK-8824
> URL: https://issues.apache.org/jira/browse/FLINK-8824
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Reporter: Stephan Ewen
>Priority: Major
> Fix For: 1.5.0
>
>
> The connector uses {{getCanonicalClassName()}} in all places, rather than 
> {{getClassName()}}.
> {{getCanonicalClassName()}}'s intention is to normalize class names for 
> arrays, etc, but is problematic when instantiating classes from class names.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-7488) TaskManagerHeapSizeCalculationJavaBashTest sometimes fails

2018-03-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-7488:
--
Description: 
{code}
compareNetworkBufShellScriptWithJava(org.apache.flink.dist.TaskManagerHeapSizeCalculationJavaBashTest)
  Time elapsed: 0.239 sec  <<< FAILURE!
org.junit.ComparisonFailure: Different network buffer memory sizes with 
configuration: {taskmanager.network.memory.fraction=0.1, 
taskmanager.memory.off-heap=false, taskmanager.memory.fraction=0.7, 
taskmanager.memory.size=-1, taskmanager.network.memory.max=1073741824, 
taskmanager.heap.mb=1000, taskmanager.network.memory.min=67108864} 
expected:<[]104857600> but was:<[Setting HADOOP_CONF_DIR=/etc/hadoop/conf 
because no HADOOP_CONF_DIR was set.Using the result of 'hadoop classpath' to 
augment the Hadoop classpath: 
/usr/hdp/2.5.0.0-1245/hadoop/conf:/usr/hdp/2.5.0.0-1245/hadoop/lib/*:/usr/hdp/2.5.0.0-1245/hadoop/.//*:/usr/hdp/2.5.0.0-1245/hadoop-hdfs/./:/usr/hdp/2.5.0.0-1245/hadoop-hdfs/lib/*:/usr/hdp/2.5.0.0-1245/hadoop-hdfs/.//*:/usr/hdp/2.5.0.0-1245/hadoop-yarn/lib/*:/usr/hdp/2.5.0.0-1245/hadoop-yarn/.//*:/usr/hdp/2.5.0.0-1245/hadoop-mapreduce/lib/*:/usr/hdp/2.5.0.0-1245/hadoop-mapreduce/.//*:/usr/hdp/2.5.0.0-1245/tez/*:/usr/hdp/2.5.0.0-1245/tez/lib/*:/usr/hdp/2.5.0.0-1245/tez/conf]104857600>
  at org.junit.Assert.assertEquals(Assert.java:115)
  at 
org.apache.flink.dist.TaskManagerHeapSizeCalculationJavaBashTest.compareNetworkBufJavaVsScript(TaskManagerHeapSizeCalculationJavaBashTest.java:235)
  at 
org.apache.flink.dist.TaskManagerHeapSizeCalculationJavaBashTest.compareNetworkBufShellScriptWithJava(TaskManagerHeapSizeCalculationJavaBashTest.java:81)

compareHeapSizeShellScriptWithJava(org.apache.flink.dist.TaskManagerHeapSizeCalculationJavaBashTest)
  Time elapsed: 0.16 sec  <<< FAILURE!
org.junit.ComparisonFailure: Different heap sizes with configuration: 
{taskmanager.network.memory.fraction=0.1, taskmanager.memory.off-heap=false, 
taskmanager.memory.fraction=0.7, taskmanager.memory.size=-1, 
taskmanager.network.memory.max=1073741824, taskmanager.heap.mb=1000, 
taskmanager.network.memory.min=67108864} expected:<[]1000> but was:<[Setting 
HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.Using the 
result of 'hadoop classpath' to augment the Hadoop classpath: 
/usr/hdp/2.5.0.0-1245/hadoop/conf:/usr/hdp/2.5.0.0-1245/hadoop/lib/*:/usr/hdp/2.5.0.0-1245/hadoop/.//*:/usr/hdp/2.5.0.0-1245/hadoop-hdfs/./:/usr/hdp/2.5.0.0-1245/hadoop-hdfs/lib/*:/usr/hdp/2.5.0.0-1245/hadoop-hdfs/.//*:/usr/hdp/2.5.0.0-1245/hadoop-yarn/lib/*:/usr/hdp/2.5.0.0-1245/hadoop-yarn/.//*:/usr/hdp/2.5.0.0-1245/hadoop-mapreduce/lib/*:/usr/hdp/2.5.0.0-1245/hadoop-mapreduce/.//*:/usr/hdp/2.5.0.0-1245/tez/*:/usr/hdp/2.5.0.0-1245/tez/lib/*:/usr/hdp/2.5.0.0-1245/tez/conf]1000>
  at org.junit.Assert.assertEquals(Assert.java:115)
  at 
org.apache.flink.dist.TaskManagerHeapSizeCalculationJavaBashTest.compareHeapSizeJavaVsScript(TaskManagerHeapSizeCalculationJavaBashTest.java:275)
  at 
org.apache.flink.dist.TaskManagerHeapSizeCalculationJavaBashTest.compareHeapSizeShellScriptWithJava(TaskManagerHeapSizeCalculationJavaBashTest.java:110)
{code}

$HADOOP_CONF_DIR was not set prior to running the test.

  was:
{code}
compareNetworkBufShellScriptWithJava(org.apache.flink.dist.TaskManagerHeapSizeCalculationJavaBashTest)
  Time elapsed: 0.239 sec  <<< FAILURE!
org.junit.ComparisonFailure: Different network buffer memory sizes with 
configuration: {taskmanager.network.memory.fraction=0.1, 
taskmanager.memory.off-heap=false, taskmanager.memory.fraction=0.7, 
taskmanager.memory.size=-1, taskmanager.network.memory.max=1073741824, 
taskmanager.heap.mb=1000, taskmanager.network.memory.min=67108864} 
expected:<[]104857600> but was:<[Setting HADOOP_CONF_DIR=/etc/hadoop/conf 
because no HADOOP_CONF_DIR was set.Using the result of 'hadoop classpath' to 
augment the Hadoop classpath: 
/usr/hdp/2.5.0.0-1245/hadoop/conf:/usr/hdp/2.5.0.0-1245/hadoop/lib/*:/usr/hdp/2.5.0.0-1245/hadoop/.//*:/usr/hdp/2.5.0.0-1245/hadoop-hdfs/./:/usr/hdp/2.5.0.0-1245/hadoop-hdfs/lib/*:/usr/hdp/2.5.0.0-1245/hadoop-hdfs/.//*:/usr/hdp/2.5.0.0-1245/hadoop-yarn/lib/*:/usr/hdp/2.5.0.0-1245/hadoop-yarn/.//*:/usr/hdp/2.5.0.0-1245/hadoop-mapreduce/lib/*:/usr/hdp/2.5.0.0-1245/hadoop-mapreduce/.//*:/usr/hdp/2.5.0.0-1245/tez/*:/usr/hdp/2.5.0.0-1245/tez/lib/*:/usr/hdp/2.5.0.0-1245/tez/conf]104857600>
  at org.junit.Assert.assertEquals(Assert.java:115)
  at 
org.apache.flink.dist.TaskManagerHeapSizeCalculationJavaBashTest.compareNetworkBufJavaVsScript(TaskManagerHeapSizeCalculationJavaBashTest.java:235)
  at 
org.apache.flink.dist.TaskManagerHeapSizeCalculationJavaBashTest.compareNetworkBufShellScriptWithJava(TaskManagerHeapSizeCalculationJavaBashTest.java:81)

compareHeapSizeShellScriptWithJava(org.apache.flink.dist.TaskManagerHeapSizeCalculationJavaBashTest)
  Time elapsed: 0.16 sec  <<< FAILURE!
org.junit.ComparisonFailure: Different 

[jira] [Assigned] (FLINK-8826) In Flip6 mode, when starting yarn cluster, configured taskmanager.heap.mb is ignored

2018-03-01 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-8826:


Assignee: Till Rohrmann

> In Flip6 mode, when starting yarn cluster, configured taskmanager.heap.mb is 
> ignored
> 
>
> Key: FLINK-8826
> URL: https://issues.apache.org/jira/browse/FLINK-8826
> Project: Flink
>  Issue Type: Bug
>  Components: ResourceManager, YARN
>Affects Versions: 1.5.0
>Reporter: Piotr Nowojski
>Assignee: Till Rohrmann
>Priority: Blocker
>
> When I tried running some job on the cluster, despite setting 
> taskmanager.heap.mb = 3072
> taskmanager.network.memory.fraction: 0.4
> and reported in the console
> {code:java}
> Cluster specification: ClusterSpecification{masterMemoryMB=768, 
> taskManagerMemoryMB=3072, numberTaskManagers=92, slotsPerTaskManager=1}{code}
> The actual settings were:
> {noformat}
>  
> 2018-03-01 14:53:18,918 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               - 
> 
> 2018-03-01 14:53:18,921 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -  Starting YARN TaskExecutor runner (Version: 1.5-SNAPSHOT, 
> Rev:e92eb39, Date:28.02.2018 @ 17:43:39 UTC)
> 2018-03-01 14:53:18,921 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -  OS current user: yarn
> 2018-03-01 14:53:19,780 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -  Current Hadoop/Kerberos user: hadoop
> 2018-03-01 14:53:19,781 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 
> 1.8/25.161-b14
> 2018-03-01 14:53:19,781 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -  Maximum heap size: 245 MiBytes
> 2018-03-01 14:53:19,781 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -  JAVA_HOME: /usr/lib/jvm/java-openjdk
> 2018-03-01 14:53:19,783 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -  Hadoop version: 2.4.1
> 2018-03-01 14:53:19,783 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -  JVM Options:
> 2018-03-01 14:53:19,783 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -     -Xms255m
> 2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -     -Xmx255m
> 2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -     -XX:MaxDirectMemorySize=769m
> 2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -     
> -Dlog.file=/var/log/hadoop-yarn/containers/application_1516373731080_1150/container_1516373731080_1150_01_000105/taskmanager.log
> 2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -     -Dlogback.configurationFile=file:./logback.xml
> 2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -     -Dlog4j.configuration=file:./log4j.properties
> 2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -  Program Arguments:
> 2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner    
>               -     --configDir{noformat}
> Heap was set to 255 MB, while with the default cut-offs it should be 1383 MB. 255 MB 
> seems to come from the default taskmanager.heap.mb value of 1024.
> When starting in non flip6 everything works as expected:
> {noformat}
>  
> 2018-03-01 14:04:49,650 INFO  
> org.apache.flink.yarn.YarnTaskManagerRunnerFactory            - 
> 
> 2018-03-01 14:04:49,700 INFO  
> org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  Starting 
> YARN TaskManager (Version: 1.5-SNAPSHOT, Rev:e92eb39, Date:28.02.2018 @ 
> 17:43:39 UTC)
> 2018-03-01 14:04:49,700 INFO  
> org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  OS current 
> user: yarn
> 2018-03-01 14:04:53,277 INFO  
> org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  Current 
> Hadoop/Kerberos user: hadoop
> 2018-03-01 14:04:53,278 INFO  
> org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  JVM: OpenJDK 
> 64-Bit Server VM - Oracle Corporation - 1.8/25.161-b14
> 2018-03-01 14:04:53,279 INFO  
> org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  Maximum heap 
> size: 1326 MiBytes
> 2018-03-01 14:04:53,279 INFO  
> org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  JAVA_HOME: 
> /usr/lib/jvm/java-openjdk
> 2018-03-01 14:04:53,282 INFO  
> org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  

[jira] [Updated] (FLINK-8415) Unprotected access to recordsToSend in LongRecordWriterThread#shutdown()

2018-03-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-8415:
--
Description: 
{code}
  public void shutdown() {
running = false;
recordsToSend.complete(0L);
{code}
In other methods, access to recordsToSend is protected by synchronized keyword.

shutdown() should do the same.

  was:
{code}
  public void shutdown() {
running = false;
recordsToSend.complete(0L);
{code}

In other methods, access to recordsToSend is protected by synchronized keyword.

shutdown() should do the same.


> Unprotected access to recordsToSend in LongRecordWriterThread#shutdown()
> 
>
> Key: FLINK-8415
> URL: https://issues.apache.org/jira/browse/FLINK-8415
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
>   public void shutdown() {
> running = false;
> recordsToSend.complete(0L);
> {code}
> In other methods, access to recordsToSend is protected by synchronized 
> keyword.
> shutdown() should do the same.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
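Both this ticket and FLINK-8394 below describe the same pattern: a shutdown() method touching a field that every other accessor guards with the synchronized keyword. A minimal sketch of the suggested fix follows, trimmed down to the essentials; it is not the actual LongRecordWriterThread.

{code}
import java.util.concurrent.CompletableFuture;

// Illustrative only: shutdown() is made synchronized like the other methods
// that touch recordsToSend, so the complete() call cannot interleave with a
// concurrent reset of the future.
public class WriterThreadSketch extends Thread {

    private volatile boolean running = true;
    private CompletableFuture<Long> recordsToSend = new CompletableFuture<>();

    public synchronized CompletableFuture<Long> getRecordsToSend() {
        return recordsToSend;
    }

    public synchronized void resetRecordsToSend() {
        recordsToSend = new CompletableFuture<>();
    }

    public synchronized void shutdown() {
        running = false;
        recordsToSend.complete(0L);
    }

    @Override
    public void run() {
        while (running) {
            Thread.yield(); // record-producing loop omitted in this sketch
        }
    }
}
{code}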


[jira] [Updated] (FLINK-8394) Lack of synchronization accessing expectedRecord in ReceiverThread#shutdown

2018-03-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-8394:
--
Description: 
{code}
  public void shutdown() {
running = false;
interrupt();
expectedRecord.complete(0L);
{code}

Access to expectedRecord should be protected by synchronization, as done on 
other methods.

  was:
{code}
  public void shutdown() {
running = false;
interrupt();
expectedRecord.complete(0L);
{code}
Access to expectedRecord should be protected by synchronization, as done on 
other methods.


> Lack of synchronization accessing expectedRecord in ReceiverThread#shutdown
> ---
>
> Key: FLINK-8394
> URL: https://issues.apache.org/jira/browse/FLINK-8394
> Project: Flink
>  Issue Type: Test
>  Components: Streaming
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
>   public void shutdown() {
> running = false;
> interrupt();
> expectedRecord.complete(0L);
> {code}
> Access to expectedRecord should be protected by synchronization, as done on 
> other methods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5607: [hotfix][docs] Drop the incorrect parallel remark in wind...

2018-03-01 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/5607
  
merging.


---


[jira] [Commented] (FLINK-8822) RotateLogFile may not work well when sed version is below 4.2

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382631#comment-16382631
 ] 

ASF GitHub Bot commented on FLINK-8822:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/5609
  
Not sure about this one, after all `4.2` is already _9 years old_.

According to 
https://stackoverflow.com/questions/3139126/whats-the-difference-between-sed-e-and-sed-e
 and https://www.gnu.org/software/sed/manual/sed.html it also looks like `-E` 
is more portable.


> RotateLogFile may not work well when sed version is below 4.2
> -
>
> Key: FLINK-8822
> URL: https://issues.apache.org/jira/browse/FLINK-8822
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Xin Liu
>Priority: Major
> Fix For: 1.5.0
>
>
> In bin/config.sh, rotateLogFilesWithPrefix() uses an extended regular expression to 
> process file names with "sed -E", but when the sed version is below 4.2 this yields 
> "sed: invalid option -- 'E'"
> and log file rotation won't work: there will be only one log file no matter 
> what $MAX_LOG_FILE_NUMBER is set to.
> Using sed -r may be more suitable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5609: [FLINK-8822] RotateLogFile may not work well when sed ver...

2018-03-01 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/5609
  
Not sure about this one, after all `4.2` is already _9 years old_.

According to 
https://stackoverflow.com/questions/3139126/whats-the-difference-between-sed-e-and-sed-e
 and https://www.gnu.org/software/sed/manual/sed.html it also looks like `-E` 
is more portable.


---


[jira] [Commented] (FLINK-8819) Rework travis script to use build stages

2018-03-01 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382609#comment-16382609
 ] 

Chesnay Schepler commented on FLINK-8819:
-

A nearly finished prototype can be found 
[here|https://github.com/zentol/flink/tree/travis_stage_prototype], and an 
example build [here|https://travis-ci.org/zentol/flink/builds/347908225].

> Rework travis script to use build stages
> 
>
> Key: FLINK-8819
> URL: https://issues.apache.org/jira/browse/FLINK-8819
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System, Travis
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Trivial
>
> This issue is for tracking efforts to rework our Travis scripts to use 
> [stages|https://docs.travis-ci.com/user/build-stages/].
> This feature allows us to define a sequence of jobs that are run one after 
> another. This implies that we can define dependencies between jobs, in 
> contrast to our existing jobs that have to be self-contained.
> As an example, we could have a compile stage, and a test stage with multiple 
> jobs.
> The main benefit here is that we no longer have to compile modules multiple 
> times, which would reduce our build times.
> The major issue here however is that there is no _proper_ support for passing 
> build-artifacts from one stage to the next. According to this 
> [issue|https://github.com/travis-ci/beta-features/issues/28] it is on their 
> to-do-list however.
> In the mean-time we could manually transfer the artifacts between stages by 
> either using the Travis cache or some other external storage. The cache 
> solution would work by setting up a cached directory (just like the mvn 
> cache) and creating build-scope directories within containing the artifacts 
> (I have a prototype that works like this).
> The major concern here is that of cleaning up the cache/storage.
>  We can clean things up if
>  * our script fails
>  * the last stage succeeds.
> We can *not* clean things up if
>  * the build is canceled
>  * travis fails the build due to a timeout or similar
> as apparently there is [no way to run a script at the end of a 
> build|https://github.com/travis-ci/travis-ci/issues/4221].
> Thus we would either have to periodically clear the cache, or encode more 
> information into the cached files that would allow _other_ builds to clean up 
> stale data (for example, the build number or date).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8274) Fix Java 64K method compiling limitation for CommonCalc

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382427#comment-16382427
 ] 

ASF GitHub Bot commented on FLINK-8274:
---

Github user twalthr commented on the issue:

https://github.com/apache/flink/pull/5613
  
@Xpray what do you think about this approach?


> Fix Java 64K method compiling limitation for CommonCalc
> ---
>
> Key: FLINK-8274
> URL: https://issues.apache.org/jira/browse/FLINK-8274
> Project: Flink
>  Issue Type: Bug
>  Components: Table API  SQL
>Affects Versions: 1.5.0
>Reporter: Ruidong Li
>Assignee: Ruidong Li
>Priority: Critical
>
> For complex SQL queries, the generated code for {code}DataStreamCalc{code}, 
> {code}DataSetCalc{code} may exceed Java's 64 KB method length limitation.
>  
> This issue will split long methods into several sub-method calls.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
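To make the limitation concrete: the JVM rejects any single method whose bytecode exceeds 64 KB, so a calc node projecting many fields can produce an uncompilable method. A hand-written sketch of the splitting idea follows; the real change lives in the Table API code generator, so the class, method names, and expressions here are illustrative only.

{code}
// Illustrative only: instead of evaluating every projected field inline in one
// huge method (which can exceed the 64 KB bytecode-per-method limit for wide
// rows), the generated code delegates each field to its own small method.
public class SplitCalcSketch {

    public Object[] processRecord(Object[] in) {
        Object[] out = new Object[3];
        out[0] = evalField0(in);
        out[1] = evalField1(in);
        out[2] = evalField2(in);
        return out;
    }

    private Object evalField0(Object[] in) {
        return ((Number) in[0]).longValue() + 1L; // stands in for a generated expression
    }

    private Object evalField1(Object[] in) {
        return String.valueOf(in[1]).toUpperCase();
    }

    private Object evalField2(Object[] in) {
        return ((Number) in[2]).doubleValue() * 0.5;
    }
}
{code}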


[GitHub] flink issue #5613: [FLINK-8274] [table] Split generated methods for preventi...

2018-03-01 Thread twalthr
Github user twalthr commented on the issue:

https://github.com/apache/flink/pull/5613
  
@Xpray what do you think about this approach?


---


[jira] [Resolved] (FLINK-5281) Extend KafkaJsonTableSources to support nested data

2018-03-01 Thread Timo Walther (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther resolved FLINK-5281.
-
   Resolution: Fixed
Fix Version/s: 1.5.0

Fixed in FLINK-8630.

> Extend KafkaJsonTableSources to support nested data
> ---
>
> Key: FLINK-5281
> URL: https://issues.apache.org/jira/browse/FLINK-5281
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API  SQL
>Affects Versions: 1.2.0
>Reporter: Fabian Hueske
>Assignee: Ivan Mushketyk
>Priority: Major
> Fix For: 1.5.0
>
>
> The {{TableSource}} does currently not support nested data. 
> Once FLINK-5280 is fixed, the KafkaJsonTableSources should be extended to 
> support nested input data. The nested data should be produced as {{Row}}s 
> nested in {{Row}}s.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8826) In Flip6 mode, when starting yarn cluster, configured taskmanager.heap.mb is ignored

2018-03-01 Thread Piotr Nowojski (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-8826:
--
Description: 
When I tried running some job on the cluster, despite setting 

taskmanager.heap.mb = 3072

taskmanager.network.memory.fraction: 0.4

and reported in the console
{code:java}
Cluster specification: ClusterSpecification{masterMemoryMB=768, 
taskManagerMemoryMB=3072, numberTaskManagers=92, slotsPerTaskManager=1}{code}
The actual settings were:
{noformat}
 

2018-03-01 14:53:18,918 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            - 


2018-03-01 14:53:18,921 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  Starting YARN TaskExecutor runner (Version: 1.5-SNAPSHOT, 
Rev:e92eb39, Date:28.02.2018 @ 17:43:39 UTC)

2018-03-01 14:53:18,921 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  OS current user: yarn

2018-03-01 14:53:19,780 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  Current Hadoop/Kerberos user: hadoop

2018-03-01 14:53:19,781 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 
1.8/25.161-b14

2018-03-01 14:53:19,781 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  Maximum heap size: 245 MiBytes

2018-03-01 14:53:19,781 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  JAVA_HOME: /usr/lib/jvm/java-openjdk

2018-03-01 14:53:19,783 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  Hadoop version: 2.4.1

2018-03-01 14:53:19,783 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  JVM Options:

2018-03-01 14:53:19,783 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     -Xms255m

2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     -Xmx255m

2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     -XX:MaxDirectMemorySize=769m

2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     
-Dlog.file=/var/log/hadoop-yarn/containers/application_1516373731080_1150/container_1516373731080_1150_01_000105/taskmanager.log

2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     -Dlogback.configurationFile=file:./logback.xml

2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     -Dlog4j.configuration=file:./log4j.properties

2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  Program Arguments:

2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     --configDir{noformat}
Heap was set to 255 MB, while with the default cut-offs it should be 1383 MB. 255 MB seems 
to come from the default taskmanager.heap.mb value of 1024.

When starting in non flip6 everything works as expected:
{noformat}
 

2018-03-01 14:04:49,650 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            - 


2018-03-01 14:04:49,700 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  Starting YARN 
TaskManager (Version: 1.5-SNAPSHOT, Rev:e92eb39, Date:28.02.2018 @ 17:43:39 UTC)

2018-03-01 14:04:49,700 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  OS current 
user: yarn

2018-03-01 14:04:53,277 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  Current 
Hadoop/Kerberos user: hadoop

2018-03-01 14:04:53,278 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  JVM: OpenJDK 
64-Bit Server VM - Oracle Corporation - 1.8/25.161-b14

2018-03-01 14:04:53,279 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  Maximum heap 
size: 1326 MiBytes

2018-03-01 14:04:53,279 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  JAVA_HOME: 
/usr/lib/jvm/java-openjdk

2018-03-01 14:04:53,282 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  Hadoop 
version: 2.4.1

2018-03-01 14:04:53,284 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  JVM Options:

2018-03-01 14:04:53,284 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -     -Xms1383m

2018-03-01 14:04:53,284 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -     -Xmx1383m

2018-03-01 14:04:53,284 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -     
-XX:MaxDirectMemorySize=1689m

2018-03-01 14:04:53,284 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -     

[jira] [Commented] (FLINK-8537) Add a Kafka table source factory with Avro format support

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382230#comment-16382230
 ] 

ASF GitHub Bot commented on FLINK-8537:
---

Github user xccui commented on the issue:

https://github.com/apache/flink/pull/5610
  
Hi @twalthr @fhueske, I wonder if you could help review this PR when it is 
convenient for you.

Thanks, Xingcan 


> Add a Kafka table source factory with Avro format support
> -
>
> Key: FLINK-8537
> URL: https://issues.apache.org/jira/browse/FLINK-8537
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API  SQL
>Reporter: Timo Walther
>Assignee: Xingcan Cui
>Priority: Major
>
> Similar to {{CSVTableSourceFactory}} a Kafka table source factory should be 
> added. This issue includes creating a {{Avro}} descriptor with validation 
> that can be used for other connectors as well. It is up for discussion if we 
> want to split the KafkaAvroTableSource into connector and format such that we 
> can reuse the format for other table sources as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5610: [FLINK-8537][table]Add a Kafka table source factory with ...

2018-03-01 Thread xccui
Github user xccui commented on the issue:

https://github.com/apache/flink/pull/5610
  
Hi @twalthr @fhueske, I wonder if you could help review this PR when it is 
convenient for you.

Thanks, Xingcan 


---


[jira] [Created] (FLINK-8826) In Flip6 mode, when starting yarn cluster, configured taskmanager.heap.mb is ignored

2018-03-01 Thread Piotr Nowojski (JIRA)
Piotr Nowojski created FLINK-8826:
-

 Summary: In Flip6 mode, when starting yarn cluster, configured 
taskmanager.heap.mb is ignored
 Key: FLINK-8826
 URL: https://issues.apache.org/jira/browse/FLINK-8826
 Project: Flink
  Issue Type: Bug
  Components: ResourceManager, YARN
Affects Versions: 1.5.0
Reporter: Piotr Nowojski


When I tried running some job on the cluster, despite setting 

taskmanager.heap.mb = 3072

taskmanager.network.memory.fraction: 0.4

and reported in the console

{{

Cluster specification: ClusterSpecification\{masterMemoryMB=768, 
taskManagerMemoryMB=3072, numberTaskManagers=92, slotsPerTaskManager=1}

}}

The actual settings were:

{{

2018-03-01 14:53:18,918 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            - 


2018-03-01 14:53:18,921 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  Starting YARN TaskExecutor runner (Version: 1.5-SNAPSHOT, 
Rev:e92eb39, Date:28.02.2018 @ 17:43:39 UTC)

2018-03-01 14:53:18,921 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  OS current user: yarn

2018-03-01 14:53:19,780 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  Current Hadoop/Kerberos user: hadoop

2018-03-01 14:53:19,781 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 
1.8/25.161-b14

2018-03-01 14:53:19,781 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  Maximum heap size: 245 MiBytes

2018-03-01 14:53:19,781 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  JAVA_HOME: /usr/lib/jvm/java-openjdk

2018-03-01 14:53:19,783 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  Hadoop version: 2.4.1

2018-03-01 14:53:19,783 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  JVM Options:

2018-03-01 14:53:19,783 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     -Xms255m

2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     -Xmx255m

2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     -XX:MaxDirectMemorySize=769m

2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     
-Dlog.file=/var/log/hadoop-yarn/containers/application_1516373731080_1150/container_1516373731080_1150_01_000105/taskmanager.log

2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     -Dlogback.configurationFile=file:./logback.xml

2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     -Dlog4j.configuration=file:./log4j.properties

2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -  Program Arguments:

2018-03-01 14:53:19,784 INFO  org.apache.flink.yarn.YarnTaskExecutorRunner      
            -     --configDir

}}

Heap was set to 255 MB, while with the default cut-offs it should be 1383 MB. 255 MB seems 
to come from the default taskmanager.heap.mb value of 1024.

 

When starting in non flip6 everything works as expected:

{{

2018-03-01 14:04:49,650 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            - 


2018-03-01 14:04:49,700 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  Starting YARN 
TaskManager (Version: 1.5-SNAPSHOT, Rev:e92eb39, Date:28.02.2018 @ 17:43:39 UTC)

2018-03-01 14:04:49,700 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  OS current 
user: yarn

2018-03-01 14:04:53,277 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  Current 
Hadoop/Kerberos user: hadoop

2018-03-01 14:04:53,278 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  JVM: OpenJDK 
64-Bit Server VM - Oracle Corporation - 1.8/25.161-b14

2018-03-01 14:04:53,279 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  Maximum heap 
size: 1326 MiBytes

2018-03-01 14:04:53,279 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  JAVA_HOME: 
/usr/lib/jvm/java-openjdk

2018-03-01 14:04:53,282 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  Hadoop 
version: 2.4.1

2018-03-01 14:04:53,284 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -  JVM Options:

2018-03-01 14:04:53,284 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -     -Xms1383m

2018-03-01 14:04:53,284 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -     -Xmx1383m

2018-03-01 14:04:53,284 INFO  
org.apache.flink.yarn.YarnTaskManagerRunnerFactory            -     

[jira] [Commented] (FLINK-8274) Fix Java 64K method compiling limitation for CommonCalc

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382229#comment-16382229
 ] 

ASF GitHub Bot commented on FLINK-8274:
---

GitHub user twalthr opened a pull request:

https://github.com/apache/flink/pull/5613

[FLINK-8274] [table] Split generated methods for preventing compiler 
exceptions

## What is the purpose of the change

This PR splits the evaluation of a result record into multiple methods, each evaluating 
a single field. This prevents compiler exceptions such as the one mentioned in FLINK-8274. In 
comparison to #5174, this PR prevents the exception for all generated functions 
and also covers corner cases such as timestamps.

## Brief change log

- Splitting added to code generation classes

## Verifying this change

See added unit tests.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): no
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
  - The serializers: no
  - The runtime per-record code paths (performance sensitive): yes
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  - The S3 file system connector: no

## Documentation

  - Does this pull request introduce a new feature? no
  - If yes, how is the feature documented? JavaDocs


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/twalthr/flink FLINK-8274_3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5613.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5613


commit 5e967692887b403a7d86805b21956773ed9c6f02
Author: Timo Walther 
Date:   2018-03-01T15:26:21Z

[FLINK-8274] [table] Split generated methods for preventing compiler 
exceptions




> Fix Java 64K method compiling limitation for CommonCalc
> ---
>
> Key: FLINK-8274
> URL: https://issues.apache.org/jira/browse/FLINK-8274
> Project: Flink
>  Issue Type: Bug
>  Components: Table API  SQL
>Affects Versions: 1.5.0
>Reporter: Ruidong Li
>Assignee: Ruidong Li
>Priority: Critical
>
> For complex SQL queries, the generated code for {code}DataStreamCalc{code}, 
> {code}DataSetCalc{code} may exceed Java's 64 KB method length limitation.
>  
> This issue will split long methods into several sub-method calls.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5613: [FLINK-8274] [table] Split generated methods for p...

2018-03-01 Thread twalthr
GitHub user twalthr opened a pull request:

https://github.com/apache/flink/pull/5613

[FLINK-8274] [table] Split generated methods for preventing compiler 
exceptions

## What is the purpose of the change

This PR splits the evaluation of a result record into multiple methods, each evaluating 
a single field. This prevents compiler exceptions such as the one mentioned in FLINK-8274. In 
comparison to #5174, this PR prevents the exception for all generated functions 
and also covers corner cases such as timestamps.

## Brief change log

- Splitting added to code generation classes

## Verifying this change

See added unit tests.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): no
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
  - The serializers: no
  - The runtime per-record code paths (performance sensitive): yes
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  - The S3 file system connector: no

## Documentation

  - Does this pull request introduce a new feature? no
  - If yes, how is the feature documented? JavaDocs


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/twalthr/flink FLINK-8274_3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5613.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5613


commit 5e967692887b403a7d86805b21956773ed9c6f02
Author: Timo Walther 
Date:   2018-03-01T15:26:21Z

[FLINK-8274] [table] Split generated methods for preventing compiler 
exceptions




---


[jira] [Updated] (FLINK-8771) Upgrade scalastyle to 1.0.0

2018-03-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-8771:
--
Description: 
scalastyle 1.0.0 fixes issues with import order, explicit types for public 
methods, the line length limitation, and comment validation.

We should upgrade to scalastyle 1.0.0.

  was:scalastyle 1.0.0 fixes issue with import order, explicit type for public 
methods, line length limitation and comment validation.


> Upgrade scalastyle to 1.0.0
> ---
>
> Key: FLINK-8771
> URL: https://issues.apache.org/jira/browse/FLINK-8771
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Ted Yu
>Priority: Major
>
> scalastyle 1.0.0 fixes issues with import order, explicit types for public 
> methods, the line length limitation, and comment validation.
> We should upgrade to scalastyle 1.0.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8814) Control over the extension of part files created by BucketingSink

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382187#comment-16382187
 ] 

ASF GitHub Bot commented on FLINK-8814:
---

Github user jelmerk closed the pull request at:

https://github.com/apache/flink/pull/5603


> Control over the extension of part files created by BucketingSink
> -
>
> Key: FLINK-8814
> URL: https://issues.apache.org/jira/browse/FLINK-8814
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Affects Versions: 1.4.0
>Reporter: Jelmer Kuperus
>Priority: Major
> Fix For: 1.5.0
>
>
> BucketingSink creates files with the following pattern
> {noformat}
> partPrefix + "-" + subtaskIndex + "-" + bucketState.partCounter{noformat}
> When using checkpointing you have no control over the extension of the final 
> files generated. This is inconvenient when you are, for instance, writing files 
> in the Avro format because
>  # [Hue|http://gethue.com/] will not be able to render the files as Avro. See 
> this 
> [file|https://github.com/cloudera/hue/blob/master/apps/filebrowser/src/filebrowser/views.py#L730]
>  # [Spark avro|https://github.com/databricks/spark-avro/] will not be able to 
> read the files unless you set a special property. See [this 
> ticket|https://github.com/databricks/spark-avro/issues/203]
> It would be good if we had the ability to customize the extension of the 
> files created
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8814) Control over the extension of part files created by BucketingSink

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382188#comment-16382188
 ] 

ASF GitHub Bot commented on FLINK-8814:
---

Github user jelmerk commented on the issue:

https://github.com/apache/flink/pull/5603
  
Thanks!


> Control over the extension of part files created by BucketingSink
> -
>
> Key: FLINK-8814
> URL: https://issues.apache.org/jira/browse/FLINK-8814
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Affects Versions: 1.4.0
>Reporter: Jelmer Kuperus
>Priority: Major
> Fix For: 1.5.0
>
>
> BucketingSink creates files with the following pattern
> {noformat}
> partPrefix + "-" + subtaskIndex + "-" + bucketState.partCounter{noformat}
> When using checkpointing you have no control over the extension of the final 
> files generated. This is inconvenient when you are, for instance, writing files 
> in the Avro format because
>  # [Hue|http://gethue.com/] will not be able to render the files as Avro. See 
> this 
> [file|https://github.com/cloudera/hue/blob/master/apps/filebrowser/src/filebrowser/views.py#L730]
>  # [Spark avro|https://github.com/databricks/spark-avro/] will not be able to 
> read the files unless you set a special property. See [this 
> ticket|https://github.com/databricks/spark-avro/issues/203]
> It would be good if we had the ability to customize the extension of the 
> files created
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8825) Disallow new String() without charset in checkstyle

2018-03-01 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-8825:
---

 Summary: Disallow new String() without charset in checkstyle
 Key: FLINK-8825
 URL: https://issues.apache.org/jira/browse/FLINK-8825
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Aljoscha Krettek
 Fix For: 1.5.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5603: [FLINK-8814] Control over the extension of part files cre...

2018-03-01 Thread jelmerk
Github user jelmerk commented on the issue:

https://github.com/apache/flink/pull/5603
  
Thanks!


---


[GitHub] flink pull request #5603: [FLINK-8814] Control over the extension of part fi...

2018-03-01 Thread jelmerk
Github user jelmerk closed the pull request at:

https://github.com/apache/flink/pull/5603


---


[jira] [Commented] (FLINK-6763) Inefficient PojoSerializerConfigSnapshot serialization format

2018-03-01 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382142#comment-16382142
 ] 

Aljoscha Krettek commented on FLINK-6763:
-

[~tzulitai] Did we decide to move this to 1.6.0? Or at least make it 
non-blocking?

> Inefficient PojoSerializerConfigSnapshot serialization format
> -
>
> Key: FLINK-6763
> URL: https://issues.apache.org/jira/browse/FLINK-6763
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing, Type Serialization System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Till Rohrmann
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.5.0
>
>
> The {{PojoSerializerConfigSnapshot}} stores for each serializer the beginning 
> offset and ending offset in the serialization stream. This information is 
> also written if the serializer serialization is supposed to be ignored. The 
> beginning and ending offsets are stored as a sequence of integers at the 
> beginning of the serialization stream. We store this information to skip 
> broken serializers.
> I think we don't need both offsets. Instead, I would suggest writing the 
> length of the serialized serializer first into the serialization stream and 
> then the serialized serializer. This can be done in 
> {{TypeSerializerSerializationUtil.writeSerializer}}. When reading the 
> serializer via {{TypeSerializerSerializationUtil.tryReadSerializer}}, we can 
> try to deserialize the serializer. If this operation fails, we can skip the 
> bytes of the serialized serializer because we know how long it was.
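To make the proposed format concrete, here is a hedged sketch in plain Java I/O (not Flink's actual TypeSerializerSerializationUtil code) of writing the length first and skipping a broken serializer on read:

{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;

public final class LengthPrefixedSerializerFormat {

    /** Write the serialized serializer preceded by its length. */
    public static void write(DataOutputStream out, byte[] serializedSerializer) throws IOException {
        out.writeInt(serializedSerializer.length);
        out.write(serializedSerializer);
    }

    /**
     * Try to read and deserialize a serializer. Because the length is known up front,
     * a broken serializer can simply be skipped and null returned; the stream position
     * is already past its bytes.
     */
    public static Object tryRead(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes); // stream position is now past the serializer, whatever happens next
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (ClassNotFoundException | IOException | RuntimeException broken) {
            return null; // broken serializer, but the surrounding stream stays readable
        }
    }
}
{code}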



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2018-03-01 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382140#comment-16382140
 ] 

Aljoscha Krettek commented on FLINK-6764:
-

[~tzulitai] Did we decide to move this to 1.6.0? Or at least make it 
non-blocking?

> Deduplicate stateless TypeSerializers when serializing composite 
> TypeSerializers
> 
>
> Key: FLINK-6764
> URL: https://issues.apache.org/jira/browse/FLINK-6764
> Project: Flink
>  Issue Type: Improvement
>  Components: Type Serialization System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Till Rohrmann
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Composite type serializer, such as the {{PojoSerializer}}, could be improved 
> by deduplicating stateless {{TypeSerializer}} when being serialized. This 
> would decrease their serialization size.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-6944) Introduce DefaultTypeSerializerConfigSnapshot as a base implementation for serializer compatibility checks

2018-03-01 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382139#comment-16382139
 ] 

Aljoscha Krettek commented on FLINK-6944:
-

[~tzulitai] Did we decide to move this to 1.6.0? Or at least make it 
non-blocking?

> Introduce DefaultTypeSerializerConfigSnapshot as a base implementation for 
> serializer compatibility checks
> --
>
> Key: FLINK-6944
> URL: https://issues.apache.org/jira/browse/FLINK-6944
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing, Type Serialization System
>Affects Versions: 1.3.0, 1.3.1
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Currently, we store both the {{TypeSerializer}} and its corresponding 
> {{TypeSerializerConfigSnapshot}} in checkpoints of managed state. This, in 
> most cases, is actually duplicate information.
> This JIRA proposes to change this by only storing the 
> {{TypeSerializerConfigSnapshot}}, while at the same time, letting 
> {{TypeSerializer.snapshotConfiguration}} return a default 
> {{DefaultTypeSerializerConfigSnapshot}}.
> This default simply serializes the serializer instance using Java 
> serialization.
> The {{DefaultTypeSerializerConfigSnapshot}} should wrap the serializer bytes, 
> the serialVersionUID of the serializer class, and the serializer class' 
> classname. The latter two will be used to check compatibility in the default 
> implementation of {{TypeSerializer.ensureCompatibility}}. Specifically, if 
> classname / serialVersionUID has changed, the default implementation of 
> {{TypeSerializer.ensureCompatibility}} will simply return 
> {{CompatibilityResult.requiresMigration}} with the deserialized serializer as 
> the convert deserializer.
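For illustration only, a sketch of the shape such a default snapshot could have (not Flink's actual class): it wraps the serializer bytes and uses the class name plus serialVersionUID for the default compatibility decision.

{code}
import java.io.Serializable;

public class DefaultSnapshotSketch implements Serializable {

    private final byte[] serializedSerializer;  // Java-serialized serializer instance
    private final long serialVersionUid;        // serialVersionUID of the serializer class
    private final String serializerClassName;   // class name of the serializer

    public DefaultSnapshotSketch(byte[] serializedSerializer, long serialVersionUid, String serializerClassName) {
        this.serializedSerializer = serializedSerializer;
        this.serialVersionUid = serialVersionUid;
        this.serializerClassName = serializerClassName;
    }

    /** Default rule: require migration if the class name or serialVersionUID changed. */
    public boolean requiresMigration(String currentClassName, long currentSerialVersionUid) {
        return !serializerClassName.equals(currentClassName)
                || serialVersionUid != currentSerialVersionUid;
    }
}
{code}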



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-7733) test instability in Kafka end-to-end test (RetriableCommitFailedException)

2018-03-01 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed FLINK-7733.
---
   Resolution: Cannot Reproduce
Fix Version/s: (was: 1.5.0)

Closing for now since this hasn't reappeared in a while.

> test instability in Kafka end-to-end test (RetriableCommitFailedException)
> --
>
> Key: FLINK-7733
> URL: https://issues.apache.org/jira/browse/FLINK-7733
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Tests
>Affects Versions: 1.4.0
>Reporter: Nico Kruber
>Priority: Blocker
>  Labels: test-stability
>
> In a branch with unrelated changes, the Kafka end-to-end tests fails with the 
> following strange entries in the log.
> {code}
> 2017-09-27 17:50:36,777 INFO  org.apache.kafka.common.utils.AppInfoParser 
>   - Kafka version : 0.10.2.1
> 2017-09-27 17:50:36,778 INFO  org.apache.kafka.common.utils.AppInfoParser 
>   - Kafka commitId : e89bffd6b2eff799
> 2017-09-27 17:50:38,492 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Discovered 
> coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 
> 2147483647 rack: null) for group myconsumer.
> 2017-09-27 17:50:38,511 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Marking 
> the coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 
> 2147483647 rack: null) dead for group myconsumer
> 2017-09-27 17:50:38,618 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Discovered 
> coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 
> 2147483647 rack: null) for group myconsumer.
> 2017-09-27 17:50:41,525 INFO  
> org.apache.flink.runtime.state.DefaultOperatorStateBackend- 
> DefaultOperatorStateBackend snapshot (In-Memory Stream Factory, synchronous 
> part) in thread Thread[Async calls on Source: Custom Source -> Map -> Sink: 
> Unnamed (1/1),5,Flink Task Threads] took 481 ms.
> 2017-09-27 17:50:41,598 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Marking 
> the coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 
> 2147483647 rack: null) dead for group myconsumer
> 2017-09-27 17:50:41,600 WARN  
> org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher  - 
> Committing offsets to Kafka failed. This does not compromise Flink's 
> checkpoints.
> org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset 
> commit failed with a retriable exception. You should retry committing offsets.
> Caused by: org.apache.kafka.common.errors.DisconnectException
> 2017-09-27 17:50:41,608 ERROR 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase  - Async 
> Kafka commit failed.
> org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset 
> commit failed with a retriable exception. You should retry committing offsets.
> Caused by: org.apache.kafka.common.errors.DisconnectException
> {code}
> https://travis-ci.org/NicoK/flink/jobs/280477275



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8191) Add a RoundRobinPartitioner to be shipped with the Kafka connector

2018-03-01 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382134#comment-16382134
 ] 

Aljoscha Krettek commented on FLINK-8191:
-

[~tzulitai] Did we decide to move this to 1.6.0? Or at least make it 
non-blocking?

> Add a RoundRobinPartitioner to be shipped with the Kafka connector
> --
>
> Key: FLINK-8191
> URL: https://issues.apache.org/jira/browse/FLINK-8191
> Project: Flink
>  Issue Type: New Feature
>  Components: Kafka Connector
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Aegeaner
>Priority: Blocker
> Fix For: 1.5.0
>
>
> We should perhaps consider adding a round-robin partitioner ready for use to 
> be shipped with the Kafka connector, alongside the already available 
> {{FlinkFixedPartitioner}}.
> See the original discussion here:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/FlinkKafkaProducerXX-td16951.html
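A minimal sketch of what such a partitioner could look like, based on the FlinkKafkaPartitioner base class shipped with the connector (this is not the implementation from the linked discussion):

{code}
import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkKafkaPartitioner;

public class RoundRobinPartitioner<T> extends FlinkKafkaPartitioner<T> {

    private int nextPartition = -1;

    @Override
    public int partition(T record, byte[] key, byte[] value, String targetTopic, int[] partitions) {
        // cycle through the partitions of the target topic, one record at a time
        nextPartition = (nextPartition + 1) % partitions.length;
        return partitions[nextPartition];
    }
}
{code}

Like {{FlinkFixedPartitioner}}, it would presumably be passed to the producer constructor that accepts a custom partitioner.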



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-7913) Add support for Kafka default partitioner

2018-03-01 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382135#comment-16382135
 ] 

Aljoscha Krettek commented on FLINK-7913:
-

[~tzulitai] Did we decide to move this to 1.6.0? Or at least make it 
non-blocking?

> Add support for Kafka default partitioner
> -
>
> Key: FLINK-7913
> URL: https://issues.apache.org/jira/browse/FLINK-7913
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.4.0
>Reporter: Konstantin Lalafaryan
>Assignee: Konstantin Lalafaryan
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Currently, Apache Flink only provides the *FlinkKafkaPartitioner* interface 
> and a single implementation, *FlinkFixedPartitioner*. 
> In order to use Kafka's default partitioner, you have to create a new 
> implementation of *FlinkKafkaPartitioner* and fork the code from Kafka. 
> It would be really good to be able to define the partitioner without 
> implementing a new class.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8500) Get the timestamp of the Kafka message from kafka consumer(Kafka010Fetcher)

2018-03-01 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8500:

Priority: Major  (was: Blocker)

> Get the timestamp of the Kafka message from kafka consumer(Kafka010Fetcher)
> ---
>
> Key: FLINK-8500
> URL: https://issues.apache.org/jira/browse/FLINK-8500
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.4.0
>Reporter: yanxiaobin
>Priority: Major
> Fix For: 1.6.0
>
> Attachments: image-2018-01-30-14-58-58-167.png, 
> image-2018-01-31-10-48-59-633.png
>
>
> The deserialize method of KeyedDeserializationSchema needs a 'Kafka message 
> timestamp' parameter (from ConsumerRecord). In some business scenarios, 
> this is useful!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8824) In Kafka Consumers, replace 'getCanonicalClassName()' with 'getClassName()'

2018-03-01 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8824:
---

 Summary: In Kafka Consumers, replace 'getCanonicalClassName()' 
with 'getClassName()'
 Key: FLINK-8824
 URL: https://issues.apache.org/jira/browse/FLINK-8824
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector
Reporter: Stephan Ewen
 Fix For: 1.5.0


The connector uses {{getCanonicalClassName()}} in all places, rather than 
{{getClassName()}}.

{{getCanonicalClassName()}}'s intention is to normalize class names for arrays, 
etc., but it is problematic when instantiating classes from class names.
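Assuming the ticket refers to {{java.lang.Class#getCanonicalName()}} and {{#getName()}}, the difference matters for nested classes: only the binary name (with '$') can be passed back to {{Class.forName()}}. A small self-contained example:

{code}
public class NameDemo {

    static class Inner {}

    public static void main(String[] args) throws Exception {
        System.out.println(Inner.class.getCanonicalName()); // NameDemo.Inner  (dots)
        System.out.println(Inner.class.getName());          // NameDemo$Inner  (binary name)

        Class.forName(Inner.class.getName());               // works
        try {
            Class.forName(Inner.class.getCanonicalName());  // fails for nested classes
        } catch (ClassNotFoundException e) {
            System.out.println("canonical name cannot be loaded: " + e.getMessage());
        }
    }
}
{code}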





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-8810) Move end-to-end test scripts to end-to-end module

2018-03-01 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed FLINK-8810.
---
Resolution: Fixed

Implemented on release-1.5 in
122e2eb4606afd0cdf20ab3bf82bb524bb9c3673

Implemented on master in
a0336f2e822b738e7843bd2fb69cd0347d9fb757

> Move end-to-end test scripts to end-to-end module
> -
>
> Key: FLINK-8810
> URL: https://issues.apache.org/jira/browse/FLINK-8810
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Major
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8810) Move end-to-end test scripts to end-to-end module

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382111#comment-16382111
 ] 

ASF GitHub Bot commented on FLINK-8810:
---

Github user aljoscha closed the pull request at:

https://github.com/apache/flink/pull/5612


> Move end-to-end test scripts to end-to-end module
> -
>
> Key: FLINK-8810
> URL: https://issues.apache.org/jira/browse/FLINK-8810
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Major
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5612: [FLINK-8810] Move end-to-end test scripts to end-t...

2018-03-01 Thread aljoscha
Github user aljoscha closed the pull request at:

https://github.com/apache/flink/pull/5612


---


[jira] [Commented] (FLINK-8810) Move end-to-end test scripts to end-to-end module

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382105#comment-16382105
 ] 

ASF GitHub Bot commented on FLINK-8810:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/5612
  
Thanks @zentol! I'm fixing the indentation and merging.

Btw, I'm using spaces in `run-pre-commit-tests.sh` because all other shell 
scripts in that folder also use spaces. Inconsistent, yes, but what can you do. 
I think at some point we have to normalise all those and enforce.


> Move end-to-end test scripts to end-to-end module
> -
>
> Key: FLINK-8810
> URL: https://issues.apache.org/jira/browse/FLINK-8810
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Major
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5612: [FLINK-8810] Move end-to-end test scripts to end-to-end m...

2018-03-01 Thread aljoscha
Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/5612
  
Thanks @zentol! I'm fixing the indentation and merging.

Btw, I'm using spaces in `run-pre-commit-tests.sh` because all other shell 
scripts in that folder also use spaces. Inconsistent, yes, but what can you do. 
I think at some point we have to normalise all those and enforce.


---


[jira] [Closed] (FLINK-8814) Control over the extension of part files created by BucketingSink

2018-03-01 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed FLINK-8814.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

Implemented on release-1.5 in
06b05cd204bd9a12884ad12805a61005ef40fbe7

Implemented on master in
f152542468b37783932fc2c7725a3a5871b7a701

> Control over the extension of part files created by BucketingSink
> -
>
> Key: FLINK-8814
> URL: https://issues.apache.org/jira/browse/FLINK-8814
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Affects Versions: 1.4.0
>Reporter: Jelmer Kuperus
>Priority: Major
> Fix For: 1.5.0
>
>
> BucketingSink creates files with the following pattern
> {noformat}
> partPrefix + "-" + subtaskIndex + "-" + bucketState.partCounter{noformat}
> When using checkpointing you have no control over the extension of the final 
> files generated. This is inconvenient when you are, for instance, writing 
> files in the Avro format because
>  # [Hue|http://gethue.com/] will not be able to render the files as Avro. See 
> this 
> [file|https://github.com/cloudera/hue/blob/master/apps/filebrowser/src/filebrowser/views.py#L730]
>  # [Spark avro|https://github.com/databricks/spark-avro/] will not be able to 
> read the files unless you set a special property. See [this 
> ticket|https://github.com/databricks/spark-avro/issues/203]
> It would be good if we had the ability to customize the extension of the 
> files created.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8814) Control over the extension of part files created by BucketingSink

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382089#comment-16382089
 ] 

ASF GitHub Bot commented on FLINK-8814:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/5603
  
Merged.  Could you please close the PR?


> Control over the extension of part files created by BucketingSink
> -
>
> Key: FLINK-8814
> URL: https://issues.apache.org/jira/browse/FLINK-8814
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Affects Versions: 1.4.0
>Reporter: Jelmer Kuperus
>Priority: Major
>
> BucketingSink creates files with the following pattern
> {noformat}
> partPrefix + "-" + subtaskIndex + "-" + bucketState.partCounter{noformat}
> When using checkpointing you have no control over the extension of the final 
> files generated. This is inconvenient when you are, for instance, writing 
> files in the Avro format because
>  # [Hue|http://gethue.com/] will not be able to render the files as Avro. See 
> this 
> [file|https://github.com/cloudera/hue/blob/master/apps/filebrowser/src/filebrowser/views.py#L730]
>  # [Spark avro|https://github.com/databricks/spark-avro/] will not be able to 
> read the files unless you set a special property. See [this 
> ticket|https://github.com/databricks/spark-avro/issues/203]
> It would be good if we had the ability to customize the extension of the 
> files created.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5603: [FLINK-8814] Control over the extension of part files cre...

2018-03-01 Thread aljoscha
Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/5603
  
Merged. 👍 Could you please close the PR?


---


[jira] [Commented] (FLINK-8458) Add the switch for keeping both the old mode and the new credit-based mode

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382064#comment-16382064
 ] 

ASF GitHub Bot commented on FLINK-8458:
---

Github user zhijiangW commented on a diff in the pull request:

https://github.com/apache/flink/pull/5317#discussion_r171575083
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/configuration/TaskManagerOptions.java 
---
@@ -269,15 +269,21 @@
public static final ConfigOption NETWORK_BUFFERS_PER_CHANNEL =
key("taskmanager.network.memory.buffers-per-channel")
.defaultValue(2)
-   .withDescription("Number of network buffers to use for 
each outgoing/incoming channel (subpartition/input channel).");
+   .withDescription("Number of network buffers to use for 
each outgoing/incoming channel (subpartition/input channel)." +
+   "In credit-based flow control mode, this 
indicates how many credits are exclusive in each input channel. It should be" +
+   " configured at least 2 for good performance. 1 
buffer is for receiving in-flight data in the subpartition and 1 buffer is" +
+   " for parallel serialization.");
 
/**
 * Number of extra network buffers to use for each outgoing/incoming 
gate (result partition/input gate).
 */
public static final ConfigOption 
NETWORK_EXTRA_BUFFERS_PER_GATE =

key("taskmanager.network.memory.floating-buffers-per-gate")
.defaultValue(8)
-   .withDescription("Number of extra network buffers to 
use for each outgoing/incoming gate (result partition/input gate).");
+   .withDescription("Number of extra network buffers to 
use for each outgoing/incoming gate (result partition/input gate)." +
+   " In credit-based flow control mode, this 
indicates how many floating credits are shared among all the input channels." +
+   " The floating buffers are distributed based on 
backlog (real-time output buffers in the subpartition) feedback, and can" +
+   " help relieve back-pressure caused by 
unbalanced data distribution among the subpartitions.");
--- End diff --

Yeah, I already added it.


> Add the switch for keeping both the old mode and the new credit-based mode
> --
>
> Key: FLINK-8458
> URL: https://issues.apache.org/jira/browse/FLINK-8458
> Project: Flink
>  Issue Type: Sub-task
>  Components: Network
>Reporter: zhijiang
>Assignee: zhijiang
>Priority: Major
> Fix For: 1.5.0
>
>
> After the whole feature of credit-based flow control is done, we should add a 
> config parameter to switch the new credit-based mode on/off. That way, we can 
> roll back to the old network mode in case of unexpected issues.
> The parameter is defined as 
> {{taskmanager.network.credit-based-flow-control.enabled}} and the default 
> value is true. This switch may be removed after the next release.
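For illustration, this is how the switch described above could be flipped off programmatically, e.g. in a test setup; the key is taken verbatim from this ticket, and in a standalone cluster the same key would go into flink-conf.yaml:

{code}
import org.apache.flink.configuration.Configuration;

public class DisableCreditBasedFlowControl {

    public static Configuration config() {
        Configuration config = new Configuration();
        // default is true (credit-based mode on); false rolls back to the old network mode
        config.setBoolean("taskmanager.network.credit-based-flow-control.enabled", false);
        return config;
    }
}
{code}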



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5317: [FLINK-8458] Add the switch for keeping both the o...

2018-03-01 Thread zhijiangW
Github user zhijiangW commented on a diff in the pull request:

https://github.com/apache/flink/pull/5317#discussion_r171575083
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/configuration/TaskManagerOptions.java 
---
@@ -269,15 +269,21 @@
public static final ConfigOption NETWORK_BUFFERS_PER_CHANNEL =
key("taskmanager.network.memory.buffers-per-channel")
.defaultValue(2)
-   .withDescription("Number of network buffers to use for 
each outgoing/incoming channel (subpartition/input channel).");
+   .withDescription("Number of network buffers to use for 
each outgoing/incoming channel (subpartition/input channel)." +
+   "In credit-based flow control mode, this 
indicates how many credits are exclusive in each input channel. It should be" +
+   " configured at least 2 for good performance. 1 
buffer is for receiving in-flight data in the subpartition and 1 buffer is" +
+   " for parallel serialization.");
 
/**
 * Number of extra network buffers to use for each outgoing/incoming 
gate (result partition/input gate).
 */
public static final ConfigOption 
NETWORK_EXTRA_BUFFERS_PER_GATE =

key("taskmanager.network.memory.floating-buffers-per-gate")
.defaultValue(8)
-   .withDescription("Number of extra network buffers to 
use for each outgoing/incoming gate (result partition/input gate).");
+   .withDescription("Number of extra network buffers to 
use for each outgoing/incoming gate (result partition/input gate)." +
+   " In credit-based flow control mode, this 
indicates how many floating credits are shared among all the input channels." +
+   " The floating buffers are distributed based on 
backlog (real-time output buffers in the subpartition) feedback, and can" +
+   " help relieve back-pressure caused by 
unbalanced data distribution among the subpartitions.");
--- End diff --

Yeah, I already added it.


---


[GitHub] flink pull request #5612: [FLINK-8810] Move end-to-end test scripts to end-t...

2018-03-01 Thread zentol
Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/5612#discussion_r171571281
  
--- Diff: tools/travis_mvn_watchdog.sh ---
@@ -576,61 +576,9 @@ case $TEST in
printf "Running end-to-end tests\n"
printf 
"==\n"
 
-   if [ $EXIT_CODE == 0 ]; then
-   printf 
"\n==\n"
-   printf "Running Wordcount end-to-end test\n"
-   printf 
"==\n"
-   FLINK_DIR=build-target 
test-infra/end-to-end-test/test_batch_wordcount.sh
-   EXIT_CODE=$?
-   fi
-
-   if [ $EXIT_CODE == 0 ]; then
-   printf 
"\n==\n"
-   printf "Running Kafka end-to-end test\n"
-   printf 
"==\n"
-   FLINK_DIR=build-target 
test-infra/end-to-end-test/test_streaming_kafka010.sh
-   EXIT_CODE=$?
-   fi
-
-   if [ $EXIT_CODE == 0 ]; then
-   printf 
"\n==\n"
-   printf "Running class loading end-to-end test\n"
-   printf 
"==\n"
-   FLINK_DIR=build-target 
test-infra/end-to-end-test/test_streaming_classloader.sh
-   EXIT_CODE=$?
-   fi
-
-   if [ $EXIT_CODE == 0 ]; then
-   printf 
"\n==\n"
-   printf "Running Shaded Hadoop S3A end-to-end 
test\n"
-   printf 
"==\n"
-   FLINK_DIR=build-target 
test-infra/end-to-end-test/test_shaded_hadoop_s3a.sh
-   EXIT_CODE=$?
-   fi
-
-   if [ $EXIT_CODE == 0 ]; then
-   printf 
"\n==\n"
-   printf "Running Shaded Presto S3 end-to-end 
test\n"
-   printf 
"==\n"
-   FLINK_DIR=build-target 
test-infra/end-to-end-test/test_shaded_presto_s3.sh
-   EXIT_CODE=$?
-   fi
-
-   if [ $EXIT_CODE == 0 ]; then
-   printf 
"\n==\n"
-   printf "Running Hadoop-free Wordcount 
end-to-end test\n"
-   printf 
"==\n"
-   FLINK_DIR=build-target CLUSTER_MODE=cluster 
test-infra/end-to-end-test/test_hadoop_free.sh
-   EXIT_CODE=$?
-   fi
-
-   if [ $EXIT_CODE == 0 ]; then
-   printf 
"\n==\n"
-   printf "Running Streaming Python Wordcount 
end-to-end test\n"
-   printf 
"==\n"
-   FLINK_DIR=build-target 
test-infra/end-to-end-test/test_streaming_python_wordcount.sh
-   EXIT_CODE=$?
-   fi
+FLINK_DIR=build-target 
flink-end-to-end-tests/run-pre-commit-tests.sh
--- End diff --

indentation is off as it uses spaces as opposed to tabs


---


[jira] [Commented] (FLINK-8810) Move end-to-end test scripts to end-to-end module

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382054#comment-16382054
 ] 

ASF GitHub Bot commented on FLINK-8810:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/5612#discussion_r171572132
  
--- Diff: flink-end-to-end-tests/run-pre-commit-tests.sh ---
@@ -0,0 +1,98 @@
+#!/usr/bin/env bash

+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.

+
+
+END_TO_END_DIR="`dirname \"$0\"`"  # relative
+END_TO_END_DIR="`( cd \"$END_TO_END_DIR\" && pwd )`"   # absolutized 
and normalized
+if [ -z "$END_TO_END_DIR" ] ; then
+   # error; for some reason, the path is not accessible
+   # to the script (e.g. permissions re-evaled after suid)
--- End diff --

inconsistent indentation (tabs vs spaces)


> Move end-to-end test scripts to end-to-end module
> -
>
> Key: FLINK-8810
> URL: https://issues.apache.org/jira/browse/FLINK-8810
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Major
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8810) Move end-to-end test scripts to end-to-end module

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382053#comment-16382053
 ] 

ASF GitHub Bot commented on FLINK-8810:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/5612#discussion_r171571281
  
--- Diff: tools/travis_mvn_watchdog.sh ---
@@ -576,61 +576,9 @@ case $TEST in
printf "Running end-to-end tests\n"
printf 
"==\n"
 
-   if [ $EXIT_CODE == 0 ]; then
-   printf 
"\n==\n"
-   printf "Running Wordcount end-to-end test\n"
-   printf 
"==\n"
-   FLINK_DIR=build-target 
test-infra/end-to-end-test/test_batch_wordcount.sh
-   EXIT_CODE=$?
-   fi
-
-   if [ $EXIT_CODE == 0 ]; then
-   printf 
"\n==\n"
-   printf "Running Kafka end-to-end test\n"
-   printf 
"==\n"
-   FLINK_DIR=build-target 
test-infra/end-to-end-test/test_streaming_kafka010.sh
-   EXIT_CODE=$?
-   fi
-
-   if [ $EXIT_CODE == 0 ]; then
-   printf 
"\n==\n"
-   printf "Running class loading end-to-end test\n"
-   printf 
"==\n"
-   FLINK_DIR=build-target 
test-infra/end-to-end-test/test_streaming_classloader.sh
-   EXIT_CODE=$?
-   fi
-
-   if [ $EXIT_CODE == 0 ]; then
-   printf 
"\n==\n"
-   printf "Running Shaded Hadoop S3A end-to-end 
test\n"
-   printf 
"==\n"
-   FLINK_DIR=build-target 
test-infra/end-to-end-test/test_shaded_hadoop_s3a.sh
-   EXIT_CODE=$?
-   fi
-
-   if [ $EXIT_CODE == 0 ]; then
-   printf 
"\n==\n"
-   printf "Running Shaded Presto S3 end-to-end 
test\n"
-   printf 
"==\n"
-   FLINK_DIR=build-target 
test-infra/end-to-end-test/test_shaded_presto_s3.sh
-   EXIT_CODE=$?
-   fi
-
-   if [ $EXIT_CODE == 0 ]; then
-   printf 
"\n==\n"
-   printf "Running Hadoop-free Wordcount 
end-to-end test\n"
-   printf 
"==\n"
-   FLINK_DIR=build-target CLUSTER_MODE=cluster 
test-infra/end-to-end-test/test_hadoop_free.sh
-   EXIT_CODE=$?
-   fi
-
-   if [ $EXIT_CODE == 0 ]; then
-   printf 
"\n==\n"
-   printf "Running Streaming Python Wordcount 
end-to-end test\n"
-   printf 
"==\n"
-   FLINK_DIR=build-target 
test-infra/end-to-end-test/test_streaming_python_wordcount.sh
-   EXIT_CODE=$?
-   fi
+FLINK_DIR=build-target 
flink-end-to-end-tests/run-pre-commit-tests.sh
--- End diff --

indentation is off as it uses spaces as opposed to tabs


> Move end-to-end test scripts to end-to-end module
> -
>
> Key: FLINK-8810
> URL: 

[GitHub] flink pull request #5612: [FLINK-8810] Move end-to-end test scripts to end-t...

2018-03-01 Thread zentol
Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/5612#discussion_r171572132
  
--- Diff: flink-end-to-end-tests/run-pre-commit-tests.sh ---
@@ -0,0 +1,98 @@
+#!/usr/bin/env bash

+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.

+
+
+END_TO_END_DIR="`dirname \"$0\"`"  # relative
+END_TO_END_DIR="`( cd \"$END_TO_END_DIR\" && pwd )`"   # absolutized 
and normalized
+if [ -z "$END_TO_END_DIR" ] ; then
+   # error; for some reason, the path is not accessible
+   # to the script (e.g. permissions re-evaled after suid)
--- End diff --

inconsistent indentation (tabs vs spaces)


---


[jira] [Commented] (FLINK-8667) expose key in KeyedBroadcastProcessFunction#onTimer()

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382043#comment-16382043
 ] 

ASF GitHub Bot commented on FLINK-8667:
---

Github user kl0u commented on the issue:

https://github.com/apache/flink/pull/5500
  
LGTM! Thanks for the work @bowenli86 !


> expose key in KeyedBroadcastProcessFunction#onTimer()
> -
>
> Key: FLINK-8667
> URL: https://issues.apache.org/jira/browse/FLINK-8667
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.5.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Major
> Fix For: 1.5.0, 1.6.0
>
>
> [~aljoscha] [~pnowojski]  
> Since KeyedBroadcastProcessFunction is about to get out of the door, I think 
> it would be great to expose the timer's key in KeyedBroadcastProcessFunction 
> too. If we don't do it now, it will be much more difficult to add the feature 
> later because of user-app compatibility issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5500: [FLINK-8667] expose key in KeyedBroadcastProcessFunction#...

2018-03-01 Thread kl0u
Github user kl0u commented on the issue:

https://github.com/apache/flink/pull/5500
  
LGTM! Thanks for the work @bowenli86 !


---


[jira] [Commented] (FLINK-8560) add KeyedProcessFunction to expose the key in onTimer() and other methods

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382032#comment-16382032
 ] 

ASF GitHub Bot commented on FLINK-8560:
---

Github user kl0u commented on a diff in the pull request:

https://github.com/apache/flink/pull/5481#discussion_r171566773
  
--- Diff: 
flink-streaming-scala/src/main/scala/org/apache/flink/streaming/api/scala/KeyedStream.scala
 ---
@@ -66,9 +66,9 @@ class KeyedStream[T, K](javaStream: KeyedJavaStream[T, 
K]) extends DataStream[T]
 * function, this function can also query the time and set timers. When 
reacting to the firing
 * of set timers the function can directly emit elements and/or 
register yet more timers.
 *
-* @param processFunction The [[ProcessFunction]] that is called for 
each element
-*   in the stream.
+* @param processFunction The [[ProcessFunction]] that is called for 
each element in the stream.
--- End diff --

Please also add that the user now should use the new `KeyedProcessFunction` 
instead.


> add KeyedProcessFunction to expose the key in onTimer() and other methods
> -
>
> Key: FLINK-8560
> URL: https://issues.apache.org/jira/browse/FLINK-8560
> Project: Flink
>  Issue Type: Improvement
>  Components: DataStream API
>Affects Versions: 1.4.0
>Reporter: Jürgen Thomann
>Assignee: Bowen Li
>Priority: Major
> Fix For: 1.5.0
>
>
> Currently it is required to store the key of a keyBy() in the processElement 
> method to have access to it in the OnTimerContext.
> This is not ideal, as the processElement method has to check for every element 
> whether the key is already stored and set it if it is not.
> A possible solution would be adding OnTimerContext#getCurrentKey() or a similar 
> method. Having it available in the open() method might work as well.
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Getting-Key-from-keyBy-in-ProcessFunction-tt18126.html
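A minimal sketch of what this looks like with the KeyedProcessFunction added for this ticket, assuming an OnTimerContext#getCurrentKey() as proposed: the timer callback can read the key directly instead of stashing it in state.

{code}
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class TimerWithKey extends KeyedProcessFunction<String, String, String> {

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        // register a timer without storing the current key in state
        long fireAt = ctx.timerService().currentProcessingTime() + 60_000L;
        ctx.timerService().registerProcessingTimeTimer(fireAt);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        // the key the firing timer belongs to is available directly
        out.collect("timer fired for key " + ctx.getCurrentKey());
    }
}
{code}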



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5481: [FLINK-8560] Access to the current key in ProcessF...

2018-03-01 Thread kl0u
Github user kl0u commented on a diff in the pull request:

https://github.com/apache/flink/pull/5481#discussion_r171566773
  
--- Diff: 
flink-streaming-scala/src/main/scala/org/apache/flink/streaming/api/scala/KeyedStream.scala
 ---
@@ -66,9 +66,9 @@ class KeyedStream[T, K](javaStream: KeyedJavaStream[T, 
K]) extends DataStream[T]
 * function, this function can also query the time and set timers. When 
reacting to the firing
 * of set timers the function can directly emit elements and/or 
register yet more timers.
 *
-* @param processFunction The [[ProcessFunction]] that is called for 
each element
-*   in the stream.
+* @param processFunction The [[ProcessFunction]] that is called for 
each element in the stream.
--- End diff --

Please also add that the user now should use the new `KeyedProcessFunction` 
instead.


---


[jira] [Updated] (FLINK-8820) FlinkKafkaConsumer010 reads too many bytes

2018-03-01 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8820:

Priority: Blocker  (was: Critical)

> FlinkKafkaConsumer010 reads too many bytes
> --
>
> Key: FLINK-8820
> URL: https://issues.apache.org/jira/browse/FLINK-8820
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.4.0
>Reporter: Fabian Hueske
>Priority: Blocker
> Fix For: 1.5.0
>
>
> A user reported that the FlinkKafkaConsumer010 very rarely consumes too many 
> bytes, i.e., the returned message is too large. The application is running 
> for about a year and the problem started to occur after upgrading to Flink 
> 1.4.0.
> The user made a good effort in debugging the problem but was not able to 
> reproduce it in a controlled environment. It seems that the data is correctly 
> stored in Kafka.
> Here's the thread on the user mailing list for a detailed 
> description of the problem and analysis so far: 
> https://lists.apache.org/thread.html/1d62f616d275e9e23a5215ddf7f5466051be7ea96897d827232fcb4e@%3Cuser.flink.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

