[jira] [Commented] (FLINK-2832) Failing test: RandomSamplerTest.testReservoirSamplerWithReplacement

2016-02-09 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138691#comment-15138691
 ] 

Robert Metzger commented on FLINK-2832:
---

Another instance: 
https://s3.amazonaws.com/archive.travis-ci.org/jobs/107973266/log.txt


{code}
Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.914 sec <<< 
FAILURE! - in org.apache.flink.api.java.sampling.RandomSamplerTest
testReservoirSamplerWithReplacement(org.apache.flink.api.java.sampling.RandomSamplerTest)
  Time elapsed: 1.282 sec  <<< FAILURE!
java.lang.AssertionError: KS test result with p value(0.034000), d 
value(0.032600)
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.verifyKSTest(RandomSamplerTest.java:342)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.verifyRandomSamplerWithSampleSize(RandomSamplerTest.java:330)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.verifyReservoirSamplerWithReplacement(RandomSamplerTest.java:289)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.testReservoirSamplerWithReplacement(RandomSamplerTest.java:194)

{code}
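For reference, the failing assertion is a one-sample Kolmogorov-Smirnov goodness-of-fit check, so even a correct sampler fails whenever the sampled D statistic happens to land beyond the chosen threshold. A minimal standalone sketch of the D statistic against the uniform CDF (class name and threshold are illustrative, not the test's actual code):

```java
import java.util.Arrays;
import java.util.Random;

public class KsSketch {
    /** One-sample Kolmogorov-Smirnov D statistic against the U(0,1) CDF. */
    static double ksStatisticUniform(double[] sample) {
        double[] s = sample.clone();
        Arrays.sort(s);
        int n = s.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            // The empirical CDF jumps from i/n to (i+1)/n at s[i]; the uniform CDF is x itself.
            d = Math.max(d, Math.max(s[i] - (double) i / n, (double) (i + 1) / n - s[i]));
        }
        return d;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[] sample = new double[10000];
        for (int i = 0; i < sample.length; i++) {
            sample[i] = rnd.nextDouble();
        }
        // D is itself a random variable, so any fixed cutoff (like the one in
        // verifyKSTest) is crossed with some nonzero probability per run.
        System.out.println("D = " + ksStatisticUniform(sample));
    }
}
```

Because D never shrinks to exactly zero for a finite sample, a fixed significance threshold makes such tests intermittently flaky by construction.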

> Failing test: RandomSamplerTest.testReservoirSamplerWithReplacement
> ---
>
> Key: FLINK-2832
> URL: https://issues.apache.org/jira/browse/FLINK-2832
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.10.0
>Reporter: Vasia Kalavri
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.0.0
>
>
> Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 19.133 sec 
> <<< FAILURE! - in org.apache.flink.api.java.sampling.RandomSamplerTest
> testReservoirSamplerWithReplacement(org.apache.flink.api.java.sampling.RandomSamplerTest)
>   Time elapsed: 2.534 sec  <<< FAILURE!
> java.lang.AssertionError: KS test result with p value(0.11), d 
> value(0.103090)
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.flink.api.java.sampling.RandomSamplerTest.verifyKSTest(RandomSamplerTest.java:342)
>   at 
> org.apache.flink.api.java.sampling.RandomSamplerTest.verifyRandomSamplerWithSampleSize(RandomSamplerTest.java:330)
>   at 
> org.apache.flink.api.java.sampling.RandomSamplerTest.verifyReservoirSamplerWithReplacement(RandomSamplerTest.java:289)
>   at 
> org.apache.flink.api.java.sampling.RandomSamplerTest.testReservoirSamplerWithReplacement(RandomSamplerTest.java:192)
> Results :
> Failed tests: 
>   
> RandomSamplerTest.testReservoirSamplerWithReplacement:192->verifyReservoirSamplerWithReplacement:289->verifyRandomSamplerWithSampleSize:330->verifyKSTest:342
>  KS test result with p value(0.11), d value(0.103090)
> Full log [here|https://travis-ci.org/apache/flink/jobs/84120131].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138686#comment-15138686
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chobeat commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181790876
  
I agree with @sachingoel0101 on the import complexity but, from our point 
of view, Flink is the perfect platform to evaluate models in streaming and we 
are using it that way in our architecture. Why do you think it wouldn't be 
suitable? 


> Add support for predictive model markup language (PMML)
> ---
>
> Key: FLINK-1966
> URL: https://issues.apache.org/jira/browse/FLINK-1966
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Sachin Goel
>Priority: Minor
>  Labels: ML
>
> The predictive model markup language (PMML) [1] is a widely used language to 
> describe predictive and descriptive models as well as pre- and 
> post-processing steps. That way, it provides an easy way to export models to 
> and import models from other ML tools.
> Resources:
> [1] 
> http://journal.r-project.org/archive/2009-1/RJournal_2009-1_Guazzelli+et+al.pdf





[GitHub] flink pull request: [FLINK-1966][ml]Add support for Predictive Mod...

2016-02-09 Thread chobeat
Github user chobeat commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181790876
  
I agree with @sachingoel0101 on the import complexity but, from our point 
of view, Flink is the perfect platform to evaluate models in streaming and we 
are using it that way in our architecture. Why do you think it wouldn't be 
suitable? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3066) Kafka producer fails on leader change

2016-02-09 Thread Gyula Fora (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138677#comment-15138677
 ] 

Gyula Fora commented on FLINK-3066:
---

Thank you Robert for the help, it is a good catch :)

> Kafka producer fails on leader change
> -
>
> Key: FLINK-3066
> URL: https://issues.apache.org/jira/browse/FLINK-3066
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Streaming Connectors
>Affects Versions: 0.10.0, 1.0.0
>Reporter: Gyula Fora
>
> I got the following exception during my streaming job:
> {code}
> 16:44:50,637 INFO  org.apache.flink.runtime.jobmanager.JobManager 
>- Status of job 4d3f9443df4822e875f1400244a6e8dd (deduplo!) changed to 
> FAILING.
> java.lang.Exception: Failed to send data to Kafka: This server is not the 
> leader for that topic-partition.
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.checkErroneous(FlinkKafkaProducer.java:275)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:246)
>   at 
> org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:37)
>   at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:166)
>   at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:221)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:588)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.common.errors.NotLeaderForPartitionException: 
> This server is not the leader for that topic-partition.
> {code}
> And then the job crashed and recovered. This should probably be something 
> that we handle without crashing.
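Since {{NotLeaderForPartitionException}} is a retriable Kafka error, one way to handle this without crashing the job is a bounded retry around the send. The sketch below is a hypothetical standalone illustration (type names and retry policy are stand-ins, not the actual connector fix):

```java
/**
 * Standalone sketch: instead of failing the job on a transient error such as a
 * leader change, re-attempt the send a bounded number of times. Names and the
 * retry policy are illustrative, not the real Flink Kafka producer code.
 */
public class RetrySketch {
    /** Stand-in for a transient, retriable error like a leader change. */
    static class TransientSendException extends RuntimeException {}

    interface Send { void run() throws TransientSendException; }

    static int sendWithRetries(Send send, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                send.run();
                return attempt;            // number of attempts that were needed
            } catch (TransientSendException e) {
                if (attempt >= maxAttempts) {
                    throw e;               // give up: surface the failure
                }
                // A real connector would also refresh metadata / back off here.
            }
        }
    }

    public static void main(String[] args) {
        int[] failures = {2};              // fail the first two attempts, then succeed
        int attempts = sendWithRetries(() -> {
            if (failures[0]-- > 0) throw new TransientSendException();
        }, 5);
        System.out.println(attempts);      // prints 3
    }
}
```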





[jira] [Commented] (FLINK-3374) CEPITCase testSimplePatternEventTime fails

2016-02-09 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138626#comment-15138626
 ] 

Till Rohrmann commented on FLINK-3374:
--

Probably because it uses {{WriteMode.OVERWRITE}}.

> CEPITCase testSimplePatternEventTime fails
> --
>
> Key: FLINK-3374
> URL: https://issues.apache.org/jira/browse/FLINK-3374
> Project: Flink
>  Issue Type: Test
>  Components: Tests
>Reporter: Ufuk Celebi
>Assignee: Till Rohrmann
>Priority: Minor
>
> {code}
> testSimplePatternEventTime(org.apache.flink.cep.CEPITCase)  Time elapsed: 
> 1.68 sec  <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:659)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:605)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:605)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.io.FileNotFoundException: 
> /tmp/junit8551910958461988945/junit1329117229388301975.tmp/2 (No such file or 
> directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>   at 
> org.apache.flink.core.fs.local.LocalDataOutputStream.<init>(LocalDataOutputStream.java:56)
>   at 
> org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:256)
>   at 
> org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:263)
>   at 
> org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:248)
>   at 
> org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:76)
>   at 
> org.apache.flink.streaming.api.functions.sink.FileSinkFunction.open(FileSinkFunction.java:66)
>   at 
> org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
>   at 
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:89)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:308)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:208)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
>   at java.lang.Thread.run(Thread.java:745)
> testSimplePatternEventTime(org.apache.flink.cep.CEPITCase)  Time elapsed: 
> 1.68 sec  <<< FAILURE!
> java.lang.AssertionError: Different number of lines in expected and obtained 
> result. expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:306)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:292)
>   at org.apache.flink.cep.CEPITCase.after(CEPITCase.java:56)
> {code}
> https://s3.amazonaws.com/flink-logs-us/travis-artifacts/uce/flink/894/894.2.tar.gz
> {code}
> 04:53:46,840 INFO  org.apache.flink.runtime.client.JobClientActor 
>- 02/09/2016 04:53:46Map -> Sink: Unnamed(2/4) switched to FAILED 
> java.io.FileNotFoundException: 
> /tmp/junit8551910958461988945/junit1329117229388301975.tmp/2 (No such file or 
> directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>   at 
> org.apache.flink.core.fs.local.LocalDataOutputStream.<init>(LocalDataOutputStream.java:56)
>   at 
> org.

[jira] [Assigned] (FLINK-3374) CEPITCase testSimplePatternEventTime fails

2016-02-09 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-3374:


Assignee: Till Rohrmann

> CEPITCase testSimplePatternEventTime fails
> --
>
> Key: FLINK-3374
> URL: https://issues.apache.org/jira/browse/FLINK-3374
> Project: Flink
>  Issue Type: Test
>  Components: Tests
>Reporter: Ufuk Celebi
>Assignee: Till Rohrmann
>Priority: Minor
>
> {code}
> testSimplePatternEventTime(org.apache.flink.cep.CEPITCase)  Time elapsed: 
> 1.68 sec  <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:659)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:605)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:605)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.io.FileNotFoundException: 
> /tmp/junit8551910958461988945/junit1329117229388301975.tmp/2 (No such file or 
> directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>   at 
> org.apache.flink.core.fs.local.LocalDataOutputStream.<init>(LocalDataOutputStream.java:56)
>   at 
> org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:256)
>   at 
> org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:263)
>   at 
> org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:248)
>   at 
> org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:76)
>   at 
> org.apache.flink.streaming.api.functions.sink.FileSinkFunction.open(FileSinkFunction.java:66)
>   at 
> org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
>   at 
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:89)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:308)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:208)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
>   at java.lang.Thread.run(Thread.java:745)
> testSimplePatternEventTime(org.apache.flink.cep.CEPITCase)  Time elapsed: 
> 1.68 sec  <<< FAILURE!
> java.lang.AssertionError: Different number of lines in expected and obtained 
> result. expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:306)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:292)
>   at org.apache.flink.cep.CEPITCase.after(CEPITCase.java:56)
> {code}
> https://s3.amazonaws.com/flink-logs-us/travis-artifacts/uce/flink/894/894.2.tar.gz
> {code}
> 04:53:46,840 INFO  org.apache.flink.runtime.client.JobClientActor 
>- 02/09/2016 04:53:46Map -> Sink: Unnamed(2/4) switched to FAILED 
> java.io.FileNotFoundException: 
> /tmp/junit8551910958461988945/junit1329117229388301975.tmp/2 (No such file or 
> directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>   at 
> org.apache.flink.core.fs.local.LocalDataOutputStream.<init>(LocalDataOutputStream.java:56)
>   at 
> org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:25

[jira] [Commented] (FLINK-3243) Fix Interplay of TimeCharacteristic and Time Windows

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138616#comment-15138616
 ] 

ASF GitHub Bot commented on FLINK-3243:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1513#issuecomment-181777122
  
You're right, I'm changing it. But it was also me who didn't notice when we 
put it in initially :sweat_smile: 


> Fix Interplay of TimeCharacteristic and Time Windows
> 
>
> Key: FLINK-3243
> URL: https://issues.apache.org/jira/browse/FLINK-3243
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Blocker
>
> As per the discussion on the Dev ML: 
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Time-Behavior-in-Streaming-Jobs-Event-time-processing-time-td9616.html.
> The discussion seems to have converged on option 2):
> - Add dedicated WindowAssigners for processing time and event time
> - {{timeWindow()}} and {{timeWindowAll()}} respect the set 
> {{TimeCharacteristic}}. 
> This will make the easy stuff easy, i.e. using time windows and quickly 
> switching the time characteristic. Users will then have the flexibility to 
> mix different kinds of window assigners in their job.
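A minimal sketch of the dispatch that option 2) implies, with {{timeWindow()}} selecting a dedicated assigner from the set {{TimeCharacteristic}} (class names mirror Flink's, but this is a simplified illustration, not the actual API):

```java
/**
 * Standalone sketch of option 2): timeWindow() picks a dedicated window
 * assigner based on the stream's TimeCharacteristic. A simplified
 * illustration, not Flink's real classes.
 */
public class TimeWindowDispatchSketch {
    enum TimeCharacteristic { ProcessingTime, EventTime, IngestionTime }

    /** Returns the name of the assigner timeWindow() would choose. */
    static String assignerFor(TimeCharacteristic characteristic) {
        switch (characteristic) {
            case ProcessingTime:
                return "TumblingProcessingTimeWindows";
            case EventTime:
            case IngestionTime:   // ingestion time behaves like event time downstream
                return "TumblingEventTimeWindows";
            default:
                throw new IllegalArgumentException("Unknown characteristic");
        }
    }

    public static void main(String[] args) {
        // Switching the time characteristic switches the assigner, while users
        // can still mix dedicated assigners explicitly in one job.
        System.out.println(assignerFor(TimeCharacteristic.EventTime));
        System.out.println(assignerFor(TimeCharacteristic.ProcessingTime));
    }
}
```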





[jira] [Commented] (FLINK-3374) CEPITCase testSimplePatternEventTime fails

2016-02-09 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138617#comment-15138617
 ] 

Till Rohrmann commented on FLINK-3374:
--

Hmm I just saw that the CEPITCase creates a file under the parent path
instead of a folder. I'm wondering how this could pass before.
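That observation can be reproduced in isolation: opening `parent/2` fails with exactly this FileNotFoundException when `parent` exists as a regular file rather than a directory. A minimal standalone sketch (names are illustrative):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;

/**
 * Standalone illustration: if the sink's output path was created as a regular
 * file, opening "path/2" for a parallel subtask fails, because the parent is
 * not a directory.
 */
public class ParentPathSketch {
    static boolean canOpenChild(File parent) {
        try (FileOutputStream out = new FileOutputStream(new File(parent, "2"))) {
            return true;
        } catch (IOException e) {
            return false;   // e.g. FileNotFoundException: .../2 (No such file or directory)
        }
    }

    public static void main(String[] args) throws IOException {
        File asFile = Files.createTempFile("cep", ".tmp").toFile();   // plain file, like the test's output path
        File asDir = Files.createTempDirectory("cep").toFile();       // what the parallel sink actually needs
        System.out.println(canOpenChild(asFile)); // false: parent is a file
        System.out.println(canOpenChild(asDir));  // true: parent is a directory
    }
}
```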


> CEPITCase testSimplePatternEventTime fails
> --
>
> Key: FLINK-3374
> URL: https://issues.apache.org/jira/browse/FLINK-3374
> Project: Flink
>  Issue Type: Test
>  Components: Tests
>Reporter: Ufuk Celebi
>Priority: Minor
>
> {code}
> testSimplePatternEventTime(org.apache.flink.cep.CEPITCase)  Time elapsed: 
> 1.68 sec  <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:659)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:605)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:605)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.io.FileNotFoundException: 
> /tmp/junit8551910958461988945/junit1329117229388301975.tmp/2 (No such file or 
> directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>   at 
> org.apache.flink.core.fs.local.LocalDataOutputStream.<init>(LocalDataOutputStream.java:56)
>   at 
> org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:256)
>   at 
> org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:263)
>   at 
> org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:248)
>   at 
> org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:76)
>   at 
> org.apache.flink.streaming.api.functions.sink.FileSinkFunction.open(FileSinkFunction.java:66)
>   at 
> org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
>   at 
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:89)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:308)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:208)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
>   at java.lang.Thread.run(Thread.java:745)
> testSimplePatternEventTime(org.apache.flink.cep.CEPITCase)  Time elapsed: 
> 1.68 sec  <<< FAILURE!
> java.lang.AssertionError: Different number of lines in expected and obtained 
> result. expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:306)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:292)
>   at org.apache.flink.cep.CEPITCase.after(CEPITCase.java:56)
> {code}
> https://s3.amazonaws.com/flink-logs-us/travis-artifacts/uce/flink/894/894.2.tar.gz
> {code}
> 04:53:46,840 INFO  org.apache.flink.runtime.client.JobClientActor 
>- 02/09/2016 04:53:46Map -> Sink: Unnamed(2/4) switched to FAILED 
> java.io.FileNotFoundException: 
> /tmp/junit8551910958461988945/junit1329117229388301975.tmp/2 (No such file or 
> directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>   at 
>

[GitHub] flink pull request: [FLINK-3243] Fix Interplay of TimeCharacterist...

2016-02-09 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1513#issuecomment-181777122
  
You're right, I'm changing it. But it was also me who didn't notice when we 
put it in initially :sweat_smile: 




[jira] [Commented] (FLINK-3243) Fix Interplay of TimeCharacteristic and Time Windows

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138610#comment-15138610
 ] 

ASF GitHub Bot commented on FLINK-3243:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1513#issuecomment-181776065
  
For new classes, it makes sense. It was a mistake on my end to name them 
like this in the first place.

But in my experience, users that adopted this draw more satisfaction from 
stable code than from a style nuance.


> Fix Interplay of TimeCharacteristic and Time Windows
> 
>
> Key: FLINK-3243
> URL: https://issues.apache.org/jira/browse/FLINK-3243
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Blocker
>
> As per the discussion on the Dev ML: 
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Time-Behavior-in-Streaming-Jobs-Event-time-processing-time-td9616.html.
> The discussion seems to have converged on option 2):
> - Add dedicated WindowAssigners for processing time and event time
> - {{timeWindow()}} and {{timeWindowAll()}} respect the set 
> {{TimeCharacteristic}}. 
> This will make the easy stuff easy, i.e. using time windows and quickly 
> switching the time characteristic. Users will then have the flexibility to 
> mix different kinds of window assigners in their job.





[GitHub] flink pull request: [FLINK-3243] Fix Interplay of TimeCharacterist...

2016-02-09 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1513#issuecomment-181776065
  
For new classes, it makes sense. It was a mistake on my end to name them 
like this in the first place.

But in my experience, users that adopted this draw more satisfaction from 
stable code than from a style nuance.




[GitHub] flink pull request: [FLINK-3234] [dataSet] Add KeySelector support...

2016-02-09 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1585#discussion_r52282118
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/operators/SortPartitionOperator.java
 ---
@@ -36,27 +40,58 @@
  */
 public class SortPartitionOperator<T> extends SingleInputOperator<T, T, SortPartitionOperator<T>> {
 
-	private int[] sortKeyPositions;
+	private List<Keys<T>> keys;
 
-	private Order[] sortOrders;
+	private List<Order> orders;
 
 	private final String sortLocationName;
 
+	private boolean useKeySelector;
 
-	public SortPartitionOperator(DataSet<T> dataSet, int sortField, Order sortOrder, String sortLocationName) {
+	private SortPartitionOperator(DataSet<T> dataSet, String sortLocationName) {
 		super(dataSet, dataSet.getType());
+
+		keys = new ArrayList<>();
+		orders = new ArrayList<>();
 		this.sortLocationName = sortLocationName;
+	}
+
+
+	public SortPartitionOperator(DataSet<T> dataSet, int sortField, Order sortOrder, String sortLocationName) {
+		this(dataSet, sortLocationName);
+		this.useKeySelector = false;
+
+		ensureSortableKey(sortField);
 
-		int[] flatOrderKeys = getFlatFields(sortField);
-		this.appendSorting(flatOrderKeys, sortOrder);
+		keys.add(new Keys.ExpressionKeys<>(sortField, getType()));
+		orders.add(sortOrder);
 	}
 
 	public SortPartitionOperator(DataSet<T> dataSet, String sortField, Order sortOrder, String sortLocationName) {
-		super(dataSet, dataSet.getType());
-		this.sortLocationName = sortLocationName;
+		this(dataSet, sortLocationName);
+		this.useKeySelector = false;
+
+		ensureSortableKey(sortField);
+
+		keys.add(new Keys.ExpressionKeys<>(sortField, getType()));
+		orders.add(sortOrder);
 	}
 
+	public SortPartitionOperator(DataSet<T> dataSet, Keys<T> sortKey, Order sortOrder, String sortLocationName) {
--- End diff --

Change the `sortKey` parameter type to `SelectorFunctionKeys` (or accept 
the `KeySelector` and create the `SelectorFunctionKeys` in the constructor).
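A standalone sketch of that second variant, with minimal stand-in types for `KeySelector` and `SelectorFunctionKeys` (hypothetical simplifications, not Flink's real classes):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the review suggestion: accept the KeySelector directly and wrap
 * it in a selector-function key inside the operator. The types below are
 * minimal stand-ins for Flink's Keys.SelectorFunctionKeys, not the real API.
 */
public class SortKeySketch {
    interface KeySelector<IN, KEY> { KEY getKey(IN value); }

    /** Minimal stand-in for Keys.SelectorFunctionKeys. */
    static class SelectorFunctionKeys<IN, KEY> {
        final KeySelector<IN, KEY> selector;
        SelectorFunctionKeys(KeySelector<IN, KEY> selector) { this.selector = selector; }
    }

    final List<SelectorFunctionKeys<String, ?>> keys = new ArrayList<>();
    boolean useKeySelector;

    /** The operator takes the KeySelector and does the wrapping itself. */
    <K> SortKeySketch sortPartition(KeySelector<String, K> selector) {
        keys.add(new SelectorFunctionKeys<>(selector));
        useKeySelector = true;
        return this;
    }

    public static void main(String[] args) {
        SortKeySketch op = new SortKeySketch().sortPartition(s -> s.length());
        System.out.println(op.useKeySelector);   // prints true
        System.out.println(op.keys.size());      // prints 1
    }
}
```

Keeping the wrapping inside the operator means callers never construct key objects themselves, which is the point of the review comment.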




[jira] [Commented] (FLINK-3374) CEPITCase testSimplePatternEventTime fails

2016-02-09 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138600#comment-15138600
 ] 

Stephan Ewen commented on FLINK-3374:
-

My guess is that the parent path does not exist. Maybe an issue in the 
{{FileOutputFormat}}.

> CEPITCase testSimplePatternEventTime fails
> --
>
> Key: FLINK-3374
> URL: https://issues.apache.org/jira/browse/FLINK-3374
> Project: Flink
>  Issue Type: Test
>  Components: Tests
>Reporter: Ufuk Celebi
>Priority: Minor
>
> {code}
> testSimplePatternEventTime(org.apache.flink.cep.CEPITCase)  Time elapsed: 
> 1.68 sec  <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:659)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:605)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:605)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.io.FileNotFoundException: 
> /tmp/junit8551910958461988945/junit1329117229388301975.tmp/2 (No such file or 
> directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>   at 
> org.apache.flink.core.fs.local.LocalDataOutputStream.<init>(LocalDataOutputStream.java:56)
>   at 
> org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:256)
>   at 
> org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:263)
>   at 
> org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:248)
>   at 
> org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:76)
>   at 
> org.apache.flink.streaming.api.functions.sink.FileSinkFunction.open(FileSinkFunction.java:66)
>   at 
> org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
>   at 
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:89)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:308)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:208)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
>   at java.lang.Thread.run(Thread.java:745)
> testSimplePatternEventTime(org.apache.flink.cep.CEPITCase)  Time elapsed: 
> 1.68 sec  <<< FAILURE!
> java.lang.AssertionError: Different number of lines in expected and obtained 
> result. expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:306)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:292)
>   at org.apache.flink.cep.CEPITCase.after(CEPITCase.java:56)
> {code}
> https://s3.amazonaws.com/flink-logs-us/travis-artifacts/uce/flink/894/894.2.tar.gz
> {code}
> 04:53:46,840 INFO  org.apache.flink.runtime.client.JobClientActor 
>- 02/09/2016 04:53:46Map -> Sink: Unnamed(2/4) switched to FAILED 
> java.io.FileNotFoundException: 
> /tmp/junit8551910958461988945/junit1329117229388301975.tmp/2 (No such file or 
> directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>   at 
> org.apache.flink.core.fs.local.LocalDataOutputStream.<init>(LocalDataOutputStream.java:56)
>   at 
> org.

[jira] [Commented] (FLINK-3234) SortPartition does not support KeySelectorFunctions

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138598#comment-15138598
 ] 

ASF GitHub Bot commented on FLINK-3234:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1585#issuecomment-181773503
  
The refactoring looks good, @chiwanpark.
I have just a few minor remarks. The PR can be resolved after these have 
been addressed.


> SortPartition does not support KeySelectorFunctions
> ---
>
> Key: FLINK-3234
> URL: https://issues.apache.org/jira/browse/FLINK-3234
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0, 0.10.1
>Reporter: Fabian Hueske
>Assignee: Chiwan Park
> Fix For: 1.0.0
>
>
> The following is not supported by the DataSet API:
> {code}
> DataSet<MyObject> data = ...
> DataSet<MyObject> data.sortPartition(
>   new KeySelector<MyObject, Long>() {
>     public Long getKey(MyObject v) {
>       ...
>     }
>   }, 
>   Order.ASCENDING);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
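The unsupported call quoted above boils down to sorting each partition by an extracted key. As a plain-JDK analogy (no Flink dependency; `Event`, `sortPartition`, and `sortedNames` are hypothetical names for illustration, not Flink API), the requested behaviour is a sort of a partition's elements by a key-selector function:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SortByExtractedKey {

    static class Event {
        final String name;
        final long timestamp;
        Event(String name, long timestamp) { this.name = name; this.timestamp = timestamp; }
    }

    // Sorts one "partition" (here just a List) by an extracted key, which is
    // what the requested sortPartition(KeySelector, Order) overload asks for.
    static <T, K extends Comparable<K>> List<T> sortPartition(List<T> partition,
                                                              Function<T, K> keyExtractor) {
        List<T> copy = new ArrayList<>(partition);
        copy.sort(Comparator.comparing(keyExtractor));
        return copy;
    }

    static String sortedNames() {
        List<Event> partition = List.of(
                new Event("b", 3L), new Event("a", 1L), new Event("c", 2L));
        // Ascending by the extracted timestamp key.
        return sortPartition(partition, e -> e.timestamp).stream()
                .map(e -> e.name)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(sortedNames()); // prints "a,c,b"
    }
}
```

In the DataSet API the partition is a parallel slice of the data set rather than a `List`, but the key-extraction contract is the same.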



[jira] [Commented] (FLINK-3234) SortPartition does not support KeySelectorFunctions

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138594#comment-15138594
 ] 

ASF GitHub Bot commented on FLINK-3234:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1585#discussion_r52282336
  
--- Diff: 
flink-tests/src/test/java/org/apache/flink/test/javaApiOperators/SortPartitionITCase.java
 ---
@@ -197,6 +198,58 @@ public void testSortPartitionParallelismChange() 
throws Exception {
compareResultAsText(result, expected);
}
 
+   @Test
+   public void testSortPartitionWithKeySelector1() throws Exception {
+   /*
+* Test sort partition on an extracted key
+*/
+
+   final ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();
+   env.setParallelism(4);
+
+   DataSet<Tuple3<Integer, Long, String>> ds = 
CollectionDataSets.get3TupleDataSet(env);
+   List<Tuple3<Integer, Long, String>> result = ds
+   .map(new IdMapper<Tuple3<Integer, Long, String>>()).setParallelism(4) // parallelize input
+   .sortPartition(new KeySelector<Tuple3<Integer, Long, String>, Long>() {
+   @Override
+   public Long getKey(Tuple3<Integer, Long, String> value) throws Exception {
+   return value.f1;
+   }
+   }, Order.DESCENDING)
--- End diff --

Change sort order to `ASCENDING` (or in the other test).


> SortPartition does not support KeySelectorFunctions
> ---
>
> Key: FLINK-3234
> URL: https://issues.apache.org/jira/browse/FLINK-3234
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0, 0.10.1
>Reporter: Fabian Hueske
>Assignee: Chiwan Park
> Fix For: 1.0.0
>
>
> The following is not supported by the DataSet API:
> {code}
> DataSet<MyObject> data = ...
> DataSet<MyObject> data.sortPartition(
>   new KeySelector<MyObject, Long>() {
>     public Long getKey(MyObject v) {
>       ...
>     }
>   }, 
>   Order.ASCENDING);
> {code}





[jira] [Commented] (FLINK-3234) SortPartition does not support KeySelectorFunctions

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138595#comment-15138595
 ] 

ASF GitHub Bot commented on FLINK-3234:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1585#discussion_r52282391
  
--- Diff: 
flink-tests/src/test/scala/org/apache/flink/api/scala/operators/SortPartitionITCase.scala
 ---
@@ -166,6 +167,58 @@ class SortPartitionITCase(mode: TestExecutionMode) 
extends MultipleProgramsTestB
 TestBaseUtils.compareResultAsText(result.asJava, expected)
   }
 
+  @Test
+  def testSortPartitionWithKeySelector1(): Unit = {
+val env = ExecutionEnvironment.getExecutionEnvironment
+env.setParallelism(4)
+val ds = CollectionDataSets.get3TupleDataSet(env)
+
+val result = ds
+  .map { x => x }.setParallelism(4)
+  .sortPartition(_._2, Order.DESCENDING)
--- End diff --

Change sort order to `ASCENDING` (or in the other test).


> SortPartition does not support KeySelectorFunctions
> ---
>
> Key: FLINK-3234
> URL: https://issues.apache.org/jira/browse/FLINK-3234
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0, 0.10.1
>Reporter: Fabian Hueske
>Assignee: Chiwan Park
> Fix For: 1.0.0
>
>
> The following is not supported by the DataSet API:
> {code}
> DataSet<MyObject> data = ...
> DataSet<MyObject> data.sortPartition(
>   new KeySelector<MyObject, Long>() {
>     public Long getKey(MyObject v) {
>       ...
>     }
>   }, 
>   Order.ASCENDING);
> {code}





[jira] [Assigned] (FLINK-3359) Make RocksDB file copies asynchronous

2016-02-09 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek reassigned FLINK-3359:
---

Assignee: Aljoscha Krettek

> Make RocksDB file copies asynchronous
> -
>
> Key: FLINK-3359
> URL: https://issues.apache.org/jira/browse/FLINK-3359
> Project: Flink
>  Issue Type: Bug
>  Components: state backends
>Affects Versions: 1.0.0
>Reporter: Stephan Ewen
>Assignee: Aljoscha Krettek
> Fix For: 1.0.0
>
>
> While the incremental backup of the RocksDB files needs to be synchronous, 
> the copying of that file to the backup file system can be fully asynchronous.







[jira] [Commented] (FLINK-3234) SortPartition does not support KeySelectorFunctions

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138591#comment-15138591
 ] 

ASF GitHub Bot commented on FLINK-3234:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1585#discussion_r52282275
  
--- Diff: 
flink-scala/src/main/scala/org/apache/flink/api/scala/DataSet.scala ---
@@ -1508,6 +1508,31 @@ class DataSet[T: ClassTag](set: JavaDataSet[T]) {
   new SortPartitionOperator[T](javaSet, field, order, 
getCallLocationName()))
   }
 
+  /**
+* Locally sorts the partitions of the DataSet on the specified field 
in the specified order.
+* The DataSet can be sorted on multiple fields by chaining 
sortPartition() calls.
+*
+* Note that any key extraction methods cannot be chained with the 
KeySelector. To sort the
+* partition by multiple values using KeySelector, the KeySelector must 
return a tuple
+* consisting of the values.
+*/
+  def sortPartition[K: TypeInformation](fun: T => K, order: Order): 
DataSet[T] ={
--- End diff --

Copy the method docs from the `DataSet.java`. 


> SortPartition does not support KeySelectorFunctions
> ---
>
> Key: FLINK-3234
> URL: https://issues.apache.org/jira/browse/FLINK-3234
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0, 0.10.1
>Reporter: Fabian Hueske
>Assignee: Chiwan Park
> Fix For: 1.0.0
>
>
> The following is not supported by the DataSet API:
> {code}
> DataSet<MyObject> data = ...
> DataSet<MyObject> data.sortPartition(
>   new KeySelector<MyObject, Long>() {
>     public Long getKey(MyObject v) {
>       ...
>     }
>   }, 
>   Order.ASCENDING);
> {code}
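The Scala doc under review notes that sorting on multiple values with a KeySelector requires the selector to return a tuple of those values. A tuple key effectively encodes a composite ordering, which in plain Java (hypothetical `Record` type and `sortedKeys` helper, not Flink code) corresponds to chaining comparators:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CompositeKeySort {

    static class Record {
        final String group;
        final int rank;
        Record(String group, int rank) { this.group = group; this.rank = rank; }
    }

    // Primary key: group; secondary key: rank. A tuple-returning KeySelector
    // (group, rank) would impose exactly this lexicographic order.
    static String sortedKeys() {
        List<Record> partition = new ArrayList<>(List.of(
                new Record("b", 1), new Record("a", 2), new Record("a", 1)));
        partition.sort(Comparator.<Record, String>comparing(r -> r.group)
                .thenComparingInt(r -> r.rank));
        StringBuilder sb = new StringBuilder();
        for (Record r : partition) {
            if (sb.length() > 0) sb.append(',');
            sb.append(r.group).append(r.rank);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sortedKeys()); // prints "a1,a2,b1"
    }
}
```

Chaining `sortPartition()` calls in the DataSet API composes orders the same way, which is why a single KeySelector must bundle its sort values into one tuple.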







[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138593#comment-15138593
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181772422
  
That said, just for a comparison purpose, spark has its own model export 
and import feature, along with pmml export. Hoping to fully support pmml import 
in a framework like flink or spark is a next to impossible thing which requires 
changes to the entire way our pipelines and datasets are represented. 


> Add support for predictive model markup language (PMML)
> ---
>
> Key: FLINK-1966
> URL: https://issues.apache.org/jira/browse/FLINK-1966
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Sachin Goel
>Priority: Minor
>  Labels: ML
>
> The predictive model markup language (PMML) [1] is a widely used language to 
> describe predictive and descriptive models as well as pre- and 
> post-processing steps. That way it allows an easy way to export models to and 
> import models from other ML tools.
> Resources:
> [1] 
> http://journal.r-project.org/archive/2009-1/RJournal_2009-1_Guazzelli+et+al.pdf









[jira] [Commented] (FLINK-3234) SortPartition does not support KeySelectorFunctions

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138589#comment-15138589
 ] 

ASF GitHub Bot commented on FLINK-3234:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1585#discussion_r52282174
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/operators/SortPartitionOperator.java
 ---
@@ -79,58 +119,41 @@ public SortPartitionOperator(DataSet dataSet, 
String sortField, Order sortOrd
 * local partition sorting of the DataSet.
 *
 * @param field The field expression referring to the field of the 
additional sort order of
-* the local partition sorting.
-* @param order The order  of the additional sort order of the local 
partition sorting.
+*  the local partition sorting.
+* @param order The order of the additional sort order of the local 
partition sorting.
 * @return The DataSet with sorted local partitions.
 */
public SortPartitionOperator<T> sortPartition(String field, Order 
order) {
-   int[] flatOrderKeys = getFlatFields(field);
-   this.appendSorting(flatOrderKeys, order);
+   if (useKeySelector) {
+   throw new InvalidProgramException("Expression keys 
cannot be appended after a KeySelector");
+   }
+
+   ensureSortableKey(field);
+   keys.add(new Keys.ExpressionKeys<>(field, getType()));
+   orders.add(order);
+
return this;
}
 
-   // 

-   //  Key Extraction
-   // 

-
-   private int[] getFlatFields(int field) {
+   public <K> SortPartitionOperator<T> sortPartition(KeySelector<T, K> 
keyExtractor, Order order) {
+   throw new InvalidProgramException("KeySelector cannot be 
chained.");
+   }
 
-   if (!Keys.ExpressionKeys.isSortKey(field, super.getType())) {
+   private void ensureSortableKey(int field) throws 
InvalidProgramException {
+   if (!Keys.ExpressionKeys.isSortKey(field, getType())) {
throw new InvalidProgramException("Selected sort key is 
not a sortable type");
}
-
-   Keys.ExpressionKeys ek = new Keys.ExpressionKeys<>(field, 
super.getType());
-   return ek.computeLogicalKeyPositions();
}
 
-   private int[] getFlatFields(String fields) {
-
-   if (!Keys.ExpressionKeys.isSortKey(fields, super.getType())) {
+   private void ensureSortableKey(String field) throws 
InvalidProgramException {
+   if (!Keys.ExpressionKeys.isSortKey(field, getType())) {
throw new InvalidProgramException("Selected sort key is 
not a sortable type");
}
-
-   Keys.ExpressionKeys ek = new Keys.ExpressionKeys<>(fields, 
super.getType());
-   return ek.computeLogicalKeyPositions();
}
 
-   private void appendSorting(int[] flatOrderFields, Order order) {
-
-   if(this.sortKeyPositions == null) {
-   // set sorting info
-   this.sortKeyPositions = flatOrderFields;
-   this.sortOrders = new Order[flatOrderFields.length];
-   Arrays.fill(this.sortOrders, order);
-   } else {
-   // append sorting info to exising info
-   int oldLength = this.sortKeyPositions.length;
-   int newLength = oldLength + flatOrderFields.length;
-   this.sortKeyPositions = 
Arrays.copyOf(this.sortKeyPositions, newLength);
-   this.sortOrders = Arrays.copyOf(this.sortOrders, 
newLength);
-
-   for(int i=0; i sortKey) {
--- End diff --

Change the `sortKey` parameter type to `SelectorFunctionKeys` and remove 
the type check + cast.


> SortPartition does not support KeySelectorFunctions
> ---
>
> Key: FLINK-3234
> URL: https://issues.apache.org/jira/browse/FLINK-3234
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0, 0.10.1
>Reporter: Fabian Hueske
>Assignee: Chiwan Park
> Fix For: 1.0.0
>
>
> The following is not supported by the DataSet API:
> {code}
> DataSet<MyObject> data = ...
> DataSet<MyObject> data.sortPartition(
>   new KeySelector<MyObject, Long>() {
>     public Long getKey(MyObject v) {
>       ...
>     }
>   }, 
>   Order.ASCENDING);
> {code}
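The review comments on this diff concern the guard that expression keys and a KeySelector must not be mixed (the `useKeySelector` flag, plus the overload that throws "KeySelector cannot be chained."). A minimal standalone sketch of that builder-style guard (hypothetical `SortKeyBuilder`, not the actual Flink class):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the operator discussed in the review: expression keys may be
// chained freely, but nothing may be appended after a KeySelector, and a
// KeySelector may not be appended to anything.
public class SortKeyBuilder {
    private final List<String> fields = new ArrayList<>();
    private boolean useKeySelector;

    SortKeyBuilder sortPartition(String field) {
        if (useKeySelector) {
            throw new IllegalStateException(
                    "Expression keys cannot be appended after a KeySelector");
        }
        fields.add(field);
        return this;
    }

    SortKeyBuilder sortPartitionByKeySelector() {
        if (!fields.isEmpty() || useKeySelector) {
            throw new IllegalStateException("KeySelector cannot be chained.");
        }
        useKeySelector = true;
        return this;
    }

    // Verifies that appending an expression key after a KeySelector is rejected.
    static boolean chainingRejected() {
        try {
            new SortKeyBuilder().sortPartitionByKeySelector().sortPartition("f1");
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(chainingRejected()); // prints "true"
    }
}
```

Tracking the mode with a single flag keeps the invalid combinations detectable at call time, which is what the reviewed `sortPartition` overloads do with `InvalidProgramException`.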



--



[jira] [Commented] (FLINK-3234) SortPartition does not support KeySelectorFunctions

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138577#comment-15138577
 ] 

ASF GitHub Bot commented on FLINK-3234:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1585#discussion_r52282118
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/operators/SortPartitionOperator.java
 ---
@@ -36,27 +40,58 @@
  */
 public class SortPartitionOperator<T> extends SingleInputOperator<T, T, SortPartitionOperator<T>> {
 
-   private int[] sortKeyPositions;
+   private List<Keys<T>> keys;
 
-   private Order[] sortOrders;
+   private List<Order> orders;
 
private final String sortLocationName;
 
+   private boolean useKeySelector;
 
-   public SortPartitionOperator(DataSet dataSet, int sortField, Order 
sortOrder, String sortLocationName) {
+   private SortPartitionOperator(DataSet dataSet, String 
sortLocationName) {
super(dataSet, dataSet.getType());
+
+   keys = new ArrayList<>();
+   orders = new ArrayList<>();
this.sortLocationName = sortLocationName;
+   }
+
+
+   public SortPartitionOperator(DataSet dataSet, int sortField, Order 
sortOrder, String sortLocationName) {
+   this(dataSet, sortLocationName);
+   this.useKeySelector = false;
+
+   ensureSortableKey(sortField);
 
-   int[] flatOrderKeys = getFlatFields(sortField);
-   this.appendSorting(flatOrderKeys, sortOrder);
+   keys.add(new Keys.ExpressionKeys<>(sortField, getType()));
+   orders.add(sortOrder);
}
 
public SortPartitionOperator(DataSet dataSet, String sortField, 
Order sortOrder, String sortLocationName) {
-   super(dataSet, dataSet.getType());
-   this.sortLocationName = sortLocationName;
+   this(dataSet, sortLocationName);
+   this.useKeySelector = false;
+
+   ensureSortableKey(sortField);
+
+   keys.add(new Keys.ExpressionKeys<>(sortField, getType()));
+   orders.add(sortOrder);
+   }
+
+   public SortPartitionOperator(DataSet dataSet, Keys sortKey, Order 
sortOrder, String sortLocationName) {
--- End diff --

Change the `sortKey` parameter type to `SelectorFunctionKeys` (or accept 
the `KeySelector` and create the `SelectorFunctionKeys` in the constructor).


> SortPartition does not support KeySelectorFunctions
> ---
>
> Key: FLINK-3234
> URL: https://issues.apache.org/jira/browse/FLINK-3234
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0, 0.10.1
>Reporter: Fabian Hueske
>Assignee: Chiwan Park
> Fix For: 1.0.0
>
>
> The following is not supported by the DataSet API:
> {code}
> DataSet<MyObject> data = ...
> DataSet<MyObject> data.sortPartition(
>   new KeySelector<MyObject, Long>() {
>     public Long getKey(MyObject v) {
>       ...
>     }
>   }, 
>   Order.ASCENDING);
> {code}





[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138576#comment-15138576
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181771679
  
As the original author of this PR, I'd say this:
I tried implementing the import features but they aren't worth it. You have 
to discard most of the valid PMML models because they don't fit in with the 
Flink framework. 
Further, in my opinion, the use of Flink is to train the model. Once we 
export that model in PMML, you can use it pretty much anywhere, say R or 
MATLAB, which support complete PMML import and export functionality. The 
exported model is in most cases going to be used for testing, evaluation and 
prediction purposes, for which Flink isn't a good platform to use anyway. This 
can be accomplished anywhere. 


> Add support for predictive model markup language (PMML)
> ---
>
> Key: FLINK-1966
> URL: https://issues.apache.org/jira/browse/FLINK-1966
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Sachin Goel
>Priority: Minor
>  Labels: ML
>
> The predictive model markup language (PMML) [1] is a widely used language to 
> describe predictive and descriptive models as well as pre- and 
> post-processing steps. That way it allows an easy way to export models to and 
> import models from other ML tools.
> Resources:
> [1] 
> http://journal.r-project.org/archive/2009-1/RJournal_2009-1_Guazzelli+et+al.pdf





[jira] [Commented] (FLINK-3374) CEPITCase testSimplePatternEventTime fails

2016-02-09 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138575#comment-15138575
 ] 

Till Rohrmann commented on FLINK-3374:
--

Is this reproducible? Looks to me as if the testing file could not be created 
by the test. This might be simply a problem of the Travis machine.
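The quoted stack trace shows `new FileOutputStream(...)` failing because the parent temporary directory no longer exists, consistent with the JUnit temporary folder being cleaned up before the sink opened its output file. That failure mode can be reproduced with plain `java.io` (no Flink involved; class and method names here are illustrative only):

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

public class MissingParentDirRepro {

    // Returns true if opening a file inside a missing directory fails with
    // FileNotFoundException, as in the CEPITCase stack trace.
    static boolean openInDeletedDirFails() {
        try {
            // Create a temp path, then remove it so the parent of the sink
            // file is gone, mimicking a TemporaryFolder cleaned up too early.
            File dir = File.createTempFile("junit", ".tmp");
            if (!dir.delete()) {
                return false; // could not set up the scenario
            }
            File sinkFile = new File(dir, "2");
            try (FileOutputStream out = new FileOutputStream(sinkFile)) {
                return false; // unexpectedly succeeded
            } catch (FileNotFoundException expected) {
                return true; // "No such file or directory", as in the report
            }
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(openInDeletedDirFails()); // prints "true"
    }
}
```

This supports the reading that the test's directory lifecycle, rather than the operator code, is the likely culprit on the Travis machine.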

> CEPITCase testSimplePatternEventTime fails
> --
>
> Key: FLINK-3374
> URL: https://issues.apache.org/jira/browse/FLINK-3374
> Project: Flink
>  Issue Type: Test
>  Components: Tests
>Reporter: Ufuk Celebi
>Priority: Minor
>
> {code}
> testSimplePatternEventTime(org.apache.flink.cep.CEPITCase)  Time elapsed: 
> 1.68 sec  <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:659)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:605)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:605)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.io.FileNotFoundException: 
> /tmp/junit8551910958461988945/junit1329117229388301975.tmp/2 (No such file or 
> directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>   at 
> org.apache.flink.core.fs.local.LocalDataOutputStream.<init>(LocalDataOutputStream.java:56)
>   at 
> org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:256)
>   at 
> org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:263)
>   at 
> org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:248)
>   at 
> org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:76)
>   at 
> org.apache.flink.streaming.api.functions.sink.FileSinkFunction.open(FileSinkFunction.java:66)
>   at 
> org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
>   at 
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:89)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:308)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:208)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
>   at java.lang.Thread.run(Thread.java:745)
> testSimplePatternEventTime(org.apache.flink.cep.CEPITCase)  Time elapsed: 
> 1.68 sec  <<< FAILURE!
> java.lang.AssertionError: Different number of lines in expected and obtained 
> result. expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:306)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:292)
>   at org.apache.flink.cep.CEPITCase.after(CEPITCase.java:56)
> {code}
> https://s3.amazonaws.com/flink-logs-us/travis-artifacts/uce/flink/894/894.2.tar.gz
> {code}
> 04:53:46,840 INFO  org.apache.flink.runtime.client.JobClientActor 
>- 02/09/2016 04:53:46Map -> Sink: Unnamed(2/4) switched to FAILED 
> java.io.FileNotFoundException: 
> /tmp/junit8551910958461988945/junit1329117229388301975.tmp/2 (No such file or 
> directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>   at 
> org.apache.flink.core.fs.local.LocalDataOutp

[GitHub] flink pull request: [FLINK-1966][ml]Add support for Predictive Mod...

2016-02-09 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181771679
  
As the original author of this PR, I'd say this:
I tried implementing the import features, but they aren't worth it. You have 
to discard most valid PMML models because they don't fit the Flink framework. 
Further, in my opinion, the role of Flink is to train the model. Once we 
export that model as PMML, you can use it pretty much anywhere, say in R or 
MATLAB, which support complete PMML import and export functionality. The 
exported model is in most cases going to be used for testing, evaluation, and 
prediction, for which Flink isn't a good platform anyway. That can be 
accomplished elsewhere. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3234) SortPartition does not support KeySelectorFunctions

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138573#comment-15138573
 ] 

ASF GitHub Bot commented on FLINK-3234:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1585#discussion_r52281965
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java 
---
@@ -1377,6 +1377,24 @@ public long count() throws Exception {
return new SortPartitionOperator<>(this, field, order, 
Utils.getCallLocationName());
}
 
+   /**
+* Locally sorts the partitions of the DataSet on the an extracted key 
in the specified order.
+* DataSet can be sorted on multiple values by returning a tuple from 
the KeySelector.
+*
+* Note that any key extraction methods cannot be chained with the 
KeySelector. To sort the
--- End diff --

"Note that any key extraction methods cannot be ..." -> "Note that no 
additional sort keys can be appended to a KeySelector."


> SortPartition does not support KeySelectorFunctions
> ---
>
> Key: FLINK-3234
> URL: https://issues.apache.org/jira/browse/FLINK-3234
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0, 0.10.1
>Reporter: Fabian Hueske
>Assignee: Chiwan Park
> Fix For: 1.0.0
>
>
> The following is not supported by the DataSet API:
> {code}
> DataSet<MyObject> data = ...
> data.sortPartition(
>   new KeySelector<MyObject, Long>() {
>     public Long getKey(MyObject v) {
>       ...
>     }
>   }, 
>   Order.ASCENDING);
> {code}
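For readers unfamiliar with the pattern, the semantics being requested — sort 
the elements of a partition by a key extracted from each element — can be 
sketched in plain Java with no Flink dependency. `MyObject` and its `id` field 
here are hypothetical stand-ins for the types elided in the snippet above:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

public class SortByExtractedKey {
    // Hypothetical element type standing in for MyObject in the snippet above.
    static class MyObject {
        final long id;
        MyObject(long id) { this.id = id; }
    }

    // Plain-Java analog of sortPartition(keySelector, Order.ASCENDING):
    // sort one partition's elements by the extracted key.
    static List<MyObject> sortPartition(List<MyObject> partition,
                                        Function<MyObject, Long> keySelector) {
        List<MyObject> sorted = new ArrayList<>(partition);
        sorted.sort(Comparator.comparing(keySelector));
        return sorted;
    }

    public static void main(String[] args) {
        List<MyObject> part = List.of(new MyObject(3), new MyObject(1), new MyObject(2));
        List<MyObject> sorted = sortPartition(part, v -> v.id);
        System.out.println(sorted.get(0).id + "," + sorted.get(1).id + "," + sorted.get(2).id);
        // prints 1,2,3
    }
}
```

The PR under discussion adds exactly this key-selector overload to the real 
`DataSet.sortPartition`, which previously accepted only field positions and names.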



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3234] [dataSet] Add KeySelector support...

2016-02-09 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1585#discussion_r52281965
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java 
---
@@ -1377,6 +1377,24 @@ public long count() throws Exception {
return new SortPartitionOperator<>(this, field, order, 
Utils.getCallLocationName());
}
 
+   /**
+* Locally sorts the partitions of the DataSet on the an extracted key 
in the specified order.
+* DataSet can be sorted on multiple values by returning a tuple from 
the KeySelector.
+*
+* Note that any key extraction methods cannot be chained with the 
KeySelector. To sort the
--- End diff --

"Note that any key extraction methods cannot be ..." -> "Note that no 
additional sort keys can be appended to a KeySelector."




[jira] [Commented] (FLINK-3234) SortPartition does not support KeySelectorFunctions

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138571#comment-15138571
 ] 

ASF GitHub Bot commented on FLINK-3234:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1585#discussion_r52281750
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java 
---
@@ -1377,6 +1377,24 @@ public long count() throws Exception {
return new SortPartitionOperator<>(this, field, order, 
Utils.getCallLocationName());
}
 
+   /**
+* Locally sorts the partitions of the DataSet on the an extracted key 
in the specified order.
+* DataSet can be sorted on multiple values by returning a tuple from 
the KeySelector.
--- End diff --

"The DataSet can be ...", add "The"


> SortPartition does not support KeySelectorFunctions
> ---
>
> Key: FLINK-3234
> URL: https://issues.apache.org/jira/browse/FLINK-3234
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0, 0.10.1
>Reporter: Fabian Hueske
>Assignee: Chiwan Park
> Fix For: 1.0.0
>
>
> The following is not supported by the DataSet API:
> {code}
> DataSet<MyObject> data = ...
> data.sortPartition(
>   new KeySelector<MyObject, Long>() {
>     public Long getKey(MyObject v) {
>       ...
>     }
>   }, 
>   Order.ASCENDING);
> {code}





[GitHub] flink pull request: [FLINK-3234] [dataSet] Add KeySelector support...

2016-02-09 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1585#discussion_r52281750
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java 
---
@@ -1377,6 +1377,24 @@ public long count() throws Exception {
return new SortPartitionOperator<>(this, field, order, 
Utils.getCallLocationName());
}
 
+   /**
+* Locally sorts the partitions of the DataSet on the an extracted key 
in the specified order.
+* DataSet can be sorted on multiple values by returning a tuple from 
the KeySelector.
--- End diff --

"The DataSet can be ...", add "The"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3234] [dataSet] Add KeySelector support...

2016-02-09 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1585#discussion_r52281721
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java 
---
@@ -1377,6 +1377,24 @@ public long count() throws Exception {
return new SortPartitionOperator<>(this, field, order, 
Utils.getCallLocationName());
}
 
+   /**
+* Locally sorts the partitions of the DataSet on the an extracted key 
in the specified order.
--- End diff --

"...the DataSet on the **an** extracted key...", remove "an"




[jira] [Commented] (FLINK-3234) SortPartition does not support KeySelectorFunctions

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138570#comment-15138570
 ] 

ASF GitHub Bot commented on FLINK-3234:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1585#discussion_r52281721
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java 
---
@@ -1377,6 +1377,24 @@ public long count() throws Exception {
return new SortPartitionOperator<>(this, field, order, 
Utils.getCallLocationName());
}
 
+   /**
+* Locally sorts the partitions of the DataSet on the an extracted key 
in the specified order.
--- End diff --

"...the DataSet on the **an** extracted key...", remove "an"


> SortPartition does not support KeySelectorFunctions
> ---
>
> Key: FLINK-3234
> URL: https://issues.apache.org/jira/browse/FLINK-3234
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0, 0.10.1
>Reporter: Fabian Hueske
>Assignee: Chiwan Park
> Fix For: 1.0.0
>
>
> The following is not supported by the DataSet API:
> {code}
> DataSet<MyObject> data = ...
> data.sortPartition(
>   new KeySelector<MyObject, Long>() {
>     public Long getKey(MyObject v) {
>       ...
>     }
>   }, 
>   Order.ASCENDING);
> {code}





[jira] [Created] (FLINK-3374) CEPITCase testSimplePatternEventTime fails

2016-02-09 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-3374:
--

 Summary: CEPITCase testSimplePatternEventTime fails
 Key: FLINK-3374
 URL: https://issues.apache.org/jira/browse/FLINK-3374
 Project: Flink
  Issue Type: Test
  Components: Tests
Reporter: Ufuk Celebi
Priority: Minor


{code}
testSimplePatternEventTime(org.apache.flink.cep.CEPITCase)  Time elapsed: 1.68 
sec  <<< ERROR!
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:659)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:605)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:605)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.FileNotFoundException: 
/tmp/junit8551910958461988945/junit1329117229388301975.tmp/2 (No such file or 
directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
at 
org.apache.flink.core.fs.local.LocalDataOutputStream.<init>(LocalDataOutputStream.java:56)
at 
org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:256)
at 
org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:263)
at 
org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:248)
at 
org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:76)
at 
org.apache.flink.streaming.api.functions.sink.FileSinkFunction.open(FileSinkFunction.java:66)
at 
org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at 
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:89)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:308)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:208)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
at java.lang.Thread.run(Thread.java:745)

testSimplePatternEventTime(org.apache.flink.cep.CEPITCase)  Time elapsed: 1.68 
sec  <<< FAILURE!
java.lang.AssertionError: Different number of lines in expected and obtained 
result. expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at 
org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:306)
at 
org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:292)
at org.apache.flink.cep.CEPITCase.after(CEPITCase.java:56)
{code}

https://s3.amazonaws.com/flink-logs-us/travis-artifacts/uce/flink/894/894.2.tar.gz
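One plausible reading of the FileNotFoundException above — an assumption, not 
something the ticket confirms — is that the sink opens its output file after 
the JUnit temporary directory has already been deleted, so the parent 
directory of the target path no longer exists. That failure mode reproduces in 
plain Java:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MissingParentDir {
    // Returns true when opening the file for writing fails.
    static boolean openFails(File f) {
        try (FileOutputStream out = new FileOutputStream(f)) {
            return false;
        } catch (FileNotFoundException e) {
            // Same error as in the stack trace: "(No such file or directory)"
            // because the parent directory is missing.
            return true;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("junit-like");
        File f = dir.resolve("sub").resolve("2").toFile();
        System.out.println("missing parent fails: " + openFails(f)); // true
        // Creating the parent directory first makes the open succeed.
        Files.createDirectories(f.toPath().getParent());
        System.out.println("after mkdirs fails:   " + openFails(f)); // false
    }
}
```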

{code}
04:53:46,840 INFO  org.apache.flink.runtime.client.JobClientActor   
 - 02/09/2016 04:53:46  Map -> Sink: Unnamed(2/4) switched to FAILED 
java.io.FileNotFoundException: 
/tmp/junit8551910958461988945/junit1329117229388301975.tmp/2 (No such file or 
directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
at 
org.apache.flink.core.fs.local.LocalDataOutputStream.<init>(LocalDataOutputStream.java:56)
at 
org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:256)
at 
org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:263)
at 
org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:248)
at 
org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:76)
at 
org.apache.flink.streaming.api.funct

[jira] [Created] (FLINK-3373) Using a newer library of Apache HttpClient than 4.2.6 will get class loading problems

2016-02-09 Thread Jakob Sultan Ericsson (JIRA)
Jakob Sultan Ericsson created FLINK-3373:


 Summary: Using a newer library of Apache HttpClient than 4.2.6 
will get class loading problems
 Key: FLINK-3373
 URL: https://issues.apache.org/jira/browse/FLINK-3373
 Project: Flink
  Issue Type: Bug
 Environment: Latest Flink snapshot 1.0
Reporter: Jakob Sultan Ericsson


When I try to use Apache HttpClient 4.5.1 in my Flink job, it crashes with 
NoClassDefFound.
This happens because it loads some classes from the httpclient 4.2.5/6 that 
core Flink provides.

{noformat}
17:05:56,193 INFO  org.apache.flink.runtime.taskmanager.Task
 - DuplicateFilter -> InstallKeyLookup (11/16) switched to FAILED with 
exception.
java.lang.NoSuchFieldError: INSTANCE
at 
org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)
at 
org.apache.http.impl.conn.PoolingHttpClientConnectionManager.getDefaultRegistry(PoolingHttpClientConnectionManager.java:109)
at 
org.apache.http.impl.conn.PoolingHttpClientConnectionManager.<init>(PoolingHttpClientConnectionManager.java:116)
...
at 
org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at 
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:89)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:305)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:227)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
at java.lang.Thread.run(Thread.java:745)
{noformat}

The static initializer of SSLConnectionSocketFactory finds an earlier version 
of AllowAllHostnameVerifier that does not have the INSTANCE field (the field 
was probably added in 4.3).

{noformat}
jar tvf lib/flink-dist-1.0-SNAPSHOT.jar |grep AllowAllHostnameVerifier  
   791 Thu Dec 17 09:55:46 CET 2015 
org/apache/http/conn/ssl/AllowAllHostnameVerifier.class
{noformat}

Solutions would be:
- Fix the classloader so that my custom job does not conflict with internal 
flink-core classes... pretty hard
- Remove the dependency somehow.
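Before choosing either solution, it helps to confirm which artifact the 
conflicting class is actually resolved from. A standard 
`ClassLoader.getResource` lookup does this with nothing beyond the JDK; the 
class name below is the one from the `jar tvf` output above:

```java
import java.net.URL;

public class WhichJar {
    // Returns the location (jar, directory, or jrt module) a class would be
    // loaded from, or null if it is not visible to this class loader at all.
    static URL locate(String className) {
        return WhichJar.class.getClassLoader()
                .getResource(className.replace('.', '/') + ".class");
    }

    public static void main(String[] args) {
        // A JDK class always resolves; the httpclient class resolves only if
        // some jar on the classpath bundles it (e.g. flink-dist, as shown by
        // the jar listing above).
        System.out.println(locate("java.lang.Object"));
        System.out.println(locate("org.apache.http.conn.ssl.AllowAllHostnameVerifier"));
    }
}
```

Running this inside the job (or against the flink-dist classpath) shows 
directly whether the 4.2.x class shaded into flink-dist wins over the 4.5.1 
jar submitted with the job.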





[jira] [Commented] (FLINK-1966) Add support for predictive model markup language (PMML)

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138528#comment-15138528
 ] 

ASF GitHub Bot commented on FLINK-1966:
---

Github user chobeat commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181757426
  
Well, that wouldn't be a problem for the export: you will create, and 
therefore export, only models that have `double` as the datatype for 
parameters, but that's not an issue. 

This would be a problem for import, though, because PMML supports a wider set 
of data types and model types, and you can't really achieve a satisfying 
degree of PMML support in a platform like Flink; that's why everyone uses 
JPMML for evaluation. You will only be able to import models with compatible 
data fields. This would require a simple validation at runtime of the model 
type and of the fields' data types.
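The runtime validation described here could look roughly like the following; 
the supported sets, class name, and method names are illustrative assumptions, 
not an existing Flink or JPMML API:

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the import-time check: accept a PMML model only if
// its model type and all of its field data types are ones we can represent.
public class PmmlImportCheck {
    static final Set<String> SUPPORTED_MODELS = Set.of("RegressionModel");
    static final Set<String> SUPPORTED_TYPES  = Set.of("double");

    static boolean importable(String modelType, Map<String, String> fieldTypes) {
        return SUPPORTED_MODELS.contains(modelType)
            && fieldTypes.values().stream().allMatch(SUPPORTED_TYPES::contains);
    }

    public static void main(String[] args) {
        System.out.println(importable("RegressionModel", Map.of("x", "double"))); // true
        System.out.println(importable("TreeModel", Map.of("x", "string")));       // false
    }
}
```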


> Add support for predictive model markup language (PMML)
> ---
>
> Key: FLINK-1966
> URL: https://issues.apache.org/jira/browse/FLINK-1966
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Sachin Goel
>Priority: Minor
>  Labels: ML
>
> The predictive model markup language (PMML) [1] is a widely used language to 
> describe predictive and descriptive models as well as pre- and 
> post-processing steps. That way it provides an easy way to export models to 
> and import models from other ML tools.
> Resources:
> [1] 
> http://journal.r-project.org/archive/2009-1/RJournal_2009-1_Guazzelli+et+al.pdf





[GitHub] flink pull request: [FLINK-1966][ml]Add support for Predictive Mod...

2016-02-09 Thread chobeat
Github user chobeat commented on the pull request:

https://github.com/apache/flink/pull/1186#issuecomment-181757426
  
Well, that wouldn't be a problem for the export: you will create, and 
therefore export, only models that have `double` as the datatype for 
parameters, but that's not an issue. 

This would be a problem for import, though, because PMML supports a wider set 
of data types and model types, and you can't really achieve a satisfying 
degree of PMML support in a platform like Flink; that's why everyone uses 
JPMML for evaluation. You will only be able to import models with compatible 
data fields. This would require a simple validation at runtime of the model 
type and of the fields' data types.



