[jira] [Resolved] (FLINK-1457) RAT check fails on Windows

2015-01-27 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske resolved FLINK-1457.
--
Resolution: Fixed

Fixed in 06c2c35a2c165b6a612b0be6d00d99f287330dc7

Thanks for the patch!

> RAT check fails on Windows
> --
>
> Key: FLINK-1457
> URL: https://issues.apache.org/jira/browse/FLINK-1457
> Project: Flink
>  Issue Type: Bug
>Reporter: Sebastian Kruse
>Assignee: Sebastian Kruse
>Priority: Trivial
>
> On (my) Windows 7 (Maven 3.2.2), the RAT check fails as 
> flink-addons/flink-avro/src/test/resources/testdata.avro has no approved 
> license. Not being an actual code file, it should be excluded from the RAT 
> check so that verification also passes on Windows.
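
For reference, an exclusion of this kind lives in the Maven build: a minimal
sketch of an apache-rat-plugin entry, assuming it goes into the root pom.xml
(the surrounding plugin configuration is an assumption, not quoted from
Flink's actual pom):

{code}
<!-- Sketch: keep the binary Avro test fixture out of the license check. -->
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <configuration>
    <excludes>
      <!-- Binary test data; it cannot carry a license header. -->
      <exclude>flink-addons/flink-avro/src/test/resources/testdata.avro</exclude>
    </excludes>
  </configuration>
</plugin>
{code}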





[jira] [Commented] (FLINK-1457) RAT check fails on Windows

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294569#comment-14294569
 ] 

ASF GitHub Bot commented on FLINK-1457:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/345


> RAT check fails on Windows
> --
>
> Key: FLINK-1457
> URL: https://issues.apache.org/jira/browse/FLINK-1457
> Project: Flink
>  Issue Type: Bug
>Reporter: Sebastian Kruse
>Assignee: Sebastian Kruse
>Priority: Trivial
>
> On (my) Windows 7 (Maven 3.2.2), the RAT check fails as 
> flink-addons/flink-avro/src/test/resources/testdata.avro has no approved 
> license. Not being an actual code file, it should be excluded from the RAT 
> check so that verification also passes on Windows.





[GitHub] flink pull request: [Typo] Delete DiscardingOuputFormat

2015-01-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/343




[GitHub] flink pull request: [FLINK-1457] exclude avro test file from RAT c...

2015-01-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/345




[jira] [Resolved] (FLINK-1329) Enable constant field definitions for Pojo DataTypes

2015-01-27 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske resolved FLINK-1329.
--
Resolution: Implemented

Implemented in de8e066ccbd0a31e5746bc0bee524a48bba3a552

> Enable constant field definitions for Pojo DataTypes
> 
>
> Key: FLINK-1329
> URL: https://issues.apache.org/jira/browse/FLINK-1329
> Project: Flink
>  Issue Type: Sub-task
>  Components: Java API, Optimizer, Scala API
>Affects Versions: 0.7.0-incubating
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> Enable constant field annotations also for Pojo data types.





[GitHub] flink pull request: [Typo] Delete DiscardingOuputFormat

2015-01-27 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/343#issuecomment-71765382
  
+1




[jira] [Resolved] (FLINK-1328) Rework Constant Field Annotations

2015-01-27 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske resolved FLINK-1328.
--
Resolution: Fixed

Fixed in de8e066ccbd0a31e5746bc0bee524a48bba3a552

> Rework Constant Field Annotations
> -
>
> Key: FLINK-1328
> URL: https://issues.apache.org/jira/browse/FLINK-1328
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Optimizer, Scala API
>Affects Versions: 0.7.0-incubating
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> Constant field annotations are used by the optimizer to determine whether 
> physical data properties such as sorting or partitioning are retained by user 
> defined functions.
> The current implementation is limited and can be extended in several ways:
> - Fields that are copied to other positions
> - Field definitions for non-tuple data types (Pojos)
> There is a pull request (#83) that goes in this direction and can be 
> extended.





[jira] [Resolved] (FLINK-846) Semantic Annotations - Inconsistencies

2015-01-27 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske resolved FLINK-846.
-
Resolution: Implemented

Implemented in de8e066ccbd0a31e5746bc0bee524a48bba3a552

> Semantic Annotations - Inconsistencies
> --
>
> Key: FLINK-846
> URL: https://issues.apache.org/jira/browse/FLINK-846
> Project: Flink
>  Issue Type: Sub-task
>Reporter: GitHub Import
>Assignee: Fabian Hueske
>  Labels: github-import
> Fix For: pre-apache
>
>
> This is basically a thread for discussion. There are still a few 
> inconsistencies in the semantic annotations. The following examples 
> illustrate that:
> _Constant fields_:
> `"1 -> 0,1" "3->4"`  means field 1 is copied unmodified to 0 and 1, while 3 
> becomes 4.
> `"1 ->0,1 ; 3->4"`   means the same thing, expressed in a single string 
> (semicolon is the delimiter between statements about fields).
> `"1 , 3"` means that 1 and 3 remain constant as 1 and 3. Note that comma is 
> the delimiter here.
> _Read Fields_:
> `"0 , 2"` means that fields 0 and 2 are read. Note that here, the delimiter 
> is the comma, like with the standalone constant fields, unlike the constant 
> fields with arrow notation.
> I find that a bit inconsistent, especially the mixed use of comma and 
> semicolon for the constant fields.
>   - Do we want to keep it that way?
>   - Or use the  semicolon for the standalone constant fields and read fields?
> - Or use the comma in the constant fields (finding an alternative delimiter 
> for the right-hand side of the arrow)?
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/846
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: 
> Created at: Wed May 21 20:54:26 CEST 2014
> State: open





[jira] [Closed] (FLINK-1156) Bug in Constant Fields Annotation (Semantic Properties)

2015-01-27 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-1156.

Resolution: Fixed

Fixed in de8e066ccbd0a31e5746bc0bee524a48bba3a552

> Bug in Constant Fields Annotation (Semantic Properties) 
> 
>
> Key: FLINK-1156
> URL: https://issues.apache.org/jira/browse/FLINK-1156
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aljoscha Krettek
>Assignee: Fabian Hueske
>
> When I change the first test in SemanticPropUtilTest.java to this:
> {code}
> @Test
> public void testConstantWithArrowIndividualStrings() {
>   String[] constantFields = { "0->0,1", "1->3" };
>   TypeInformation<?> type = new TupleTypeInfo<Tuple3<Integer, Integer, Integer>>(
>       BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO);
>   TypeInformation<?> outType = new TupleTypeInfo<Tuple4<Integer, Integer, Integer, Integer>>(
>       BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO,
>       BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO);
>   SingleInputSemanticProperties sp = SemanticPropUtil.getSemanticPropsSingleFromString(
>       constantFields, null, null, type, outType);
>   FieldSet fs = sp.getForwardedField(0);
>   Assert.assertTrue(fs.size() == 2);
>   Assert.assertTrue(fs.contains(0));
>   Assert.assertTrue(fs.contains(1));
>   fs = sp.getForwardedField(1);
>   Assert.assertTrue(fs.size() == 1);
>   Assert.assertTrue(fs.contains(2));
> }
> {code}
> It fails. It seems the check of whether a tuple field is in range is performed 
> on the input type, while the output type is ignored.
> Also, I'm not sure this whole thing works anymore with [~rmetzger]'s recent 
> POJO rework and nested tuple field selection rework.





[jira] [Commented] (FLINK-1328) Rework Constant Field Annotations

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294549#comment-14294549
 ] 

ASF GitHub Bot commented on FLINK-1328:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/311


> Rework Constant Field Annotations
> -
>
> Key: FLINK-1328
> URL: https://issues.apache.org/jira/browse/FLINK-1328
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Optimizer, Scala API
>Affects Versions: 0.7.0-incubating
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> Constant field annotations are used by the optimizer to determine whether 
> physical data properties such as sorting or partitioning are retained by user 
> defined functions.
> The current implementation is limited and can be extended in several ways:
> - Fields that are copied to other positions
> - Field definitions for non-tuple data types (Pojos)
> There is a pull request (#83) that goes in this direction and can be 
> extended.





[GitHub] flink pull request: integrated forwarded fields

2015-01-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/83




[GitHub] flink pull request: [FLINK-1328] Reworked semantic annotations

2015-01-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/311




[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.

2015-01-27 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294517#comment-14294517
 ] 

Fabian Hueske commented on FLINK-1396:
--

+1 for having direct API methods for HadoopInputFormats. If you do it manually, 
it's quite a few lines of ugly boilerplate code.
I am also +1 for moving the hadoop-compat code to flink-java. Are we talking 
about all wrappers or only IFs and OFs, btw? 
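
To illustrate the boilerplate: a sketch of the manual route, assuming the
mapred-API wrapper classes of the hadoop-compatibility module (package and
constructor from memory, not from this thread):

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.mapred.HadoopInputFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class ManualHadoopInput {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Configure the Hadoop side by hand...
    JobConf jobConf = new JobConf();
    FileInputFormat.addInputPath(jobConf, new Path("hdfs:///input"));

    // ...wrap the input format together with its key/value classes...
    HadoopInputFormat<LongWritable, Text> hadoopIF =
        new HadoopInputFormat<LongWritable, Text>(
            new TextInputFormat(), LongWritable.class, Text.class, jobConf);

    // ...and only then obtain a DataSet; a direct API method would collapse
    // all of this into a single call.
    DataSet<Tuple2<LongWritable, Text>> input = env.createInput(hadoopIF);

    // 'input' can now be used like any other DataSet.
  }
}
{code}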

> Add hadoop input formats directly to the user API.
> --
>
> Key: FLINK-1396
> URL: https://issues.apache.org/jira/browse/FLINK-1396
> Project: Flink
>  Issue Type: Bug
>Reporter: Robert Metzger
>Assignee: Aljoscha Krettek
>






[GitHub] flink pull request: [Typo] Delete DiscardingOuputFormat

2015-01-27 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/343#issuecomment-71760881
  
+1 good to merge




[jira] [Commented] (FLINK-1450) Add Fold operator to the Streaming api

2015-01-27 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294506#comment-14294506
 ] 

Fabian Hueske commented on FLINK-1450:
--

So Reduce is kind of a special case of Fold where the input and output types 
happen to be the same and the first element is a "neutral" element?
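
For intuition, the relationship in collection terms (a plain-Java sketch, not 
the Flink API):

{code}
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

public class FoldVsReduce {

    // Fold: the result type R may differ from the element type T,
    // and an explicit initial value seeds the accumulation.
    static <T, R> R fold(List<T> xs, R initial, BiFunction<R, T, R> f) {
        R acc = initial;
        for (T x : xs) {
            acc = f.apply(acc, x);
        }
        return acc;
    }

    // Reduce: input and output types coincide, and the first element
    // plays the role of the initial ("neutral") value.
    static <T> T reduce(List<T> xs, BinaryOperator<T> f) {
        return fold(xs.subList(1, xs.size()), xs.get(0), f);
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        System.out.println(fold(xs, "", (s, x) -> s + x)); // "1234": type changes
        System.out.println(reduce(xs, Integer::sum));      // 10: type preserved
    }
}
{code}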

> Add Fold operator to the Streaming api
> --
>
> Key: FLINK-1450
> URL: https://issues.apache.org/jira/browse/FLINK-1450
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Gyula Fora
>Priority: Minor
>  Labels: starter
>
> The streaming API currently doesn't support a fold operator.
> This operator would work like the foldLeft method in Scala. This would allow 
> effective implementations in a lot of cases where the simple reduce is 
> inappropriate due to different return types.





[jira] [Commented] (FLINK-1318) Make quoted String parsing optional and configurable for CSVInputFormats

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294501#comment-14294501
 ] 

ASF GitHub Bot commented on FLINK-1318:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/265#issuecomment-71760224
  
Yes, that's still left to do... ;-)


> Make quoted String parsing optional and configurable for CSVInputFormats
> 
>
> Key: FLINK-1318
> URL: https://issues.apache.org/jira/browse/FLINK-1318
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 0.8
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>Priority: Minor
>
> With the current implementation of the CSVInputFormat, quoted string parsing 
> kicks in if the first non-whitespace character of a field is a double quote. 
> I see two issues with this implementation:
> 1. Quoted String parsing cannot be disabled
> 2. The quoting character is fixed to double quotes (")
> I propose to add parameters to disable quoted String parsing and set the 
> quote character.
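
A sketch of what the proposal could look like at the field level (illustrative 
only; the method and parameter names are not taken from the actual 
CsvInputFormat):

{code}
// Illustrative field scanner: quoted parsing can be switched off entirely,
// and the quote character is a parameter instead of a hard-coded '"'.
static String parseField(String rawField, boolean quotedParsingEnabled, char quoteChar) {
    String trimmed = rawField.trim();
    if (quotedParsingEnabled && trimmed.length() >= 2
            && trimmed.charAt(0) == quoteChar
            && trimmed.charAt(trimmed.length() - 1) == quoteChar) {
        // Quoted parsing kicks in only when enabled: strip the configured quotes.
        return trimmed.substring(1, trimmed.length() - 1);
    }
    // Disabled, or field not quoted: take the raw field verbatim.
    return rawField;
}
{code}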





[GitHub] flink pull request: [FLINK-1318] CsvInputFormat: Made quoted strin...

2015-01-27 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/265#issuecomment-71760224
  
Yes, that's still left to do... ;-)




[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294433#comment-14294433
 ] 

ASF GitHub Bot commented on FLINK-1419:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71754470
  
You're right Chesnay. I assume that the faulty DC wasn't noticed because it 
was probably never really used ;-) 

Your solution should make the DC work properly. We could even get rid of 
the second counter by simply decrementing the counter upon deletion. If the 
counter is 0, then the file can be deleted. Nice illustrations btw.
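
A minimal sketch of that single-counter variant (assumed names; not the actual 
runtime code):

{code}
// One reference counter per cached file: increment on copy request,
// decrement on delete request, remove the file when the count hits zero.
class RefCountedCache {
    private final java.util.Map<String, Integer> refCounts = new java.util.HashMap<>();

    synchronized void requestCopy(String localPath) {
        int count = refCounts.getOrDefault(localPath, 0);
        if (count == 0) {
            // First user: materialize the local copy here.
        }
        refCounts.put(localPath, count + 1);
    }

    synchronized void requestDelete(String localPath) {
        int count = refCounts.getOrDefault(localPath, 0) - 1;
        if (count <= 0) {
            refCounts.remove(localPath);
            new java.io.File(localPath).delete(); // last user is gone
        } else {
            refCounts.put(localPath, count);
        }
    }
}
{code}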


> DistributedCache doesn't preserve files for subsequent operations
> --
>
> Key: FLINK-1419
> URL: https://issues.apache.org/jira/browse/FLINK-1419
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.8, 0.9
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>
> When subsequent operations want to access the same files in the DC, it 
> frequently happens that the files are not created for the following operation.
> This is fairly odd, since the DC is supposed to either a) preserve files when 
> another operation kicks in within a certain time window, or b) just recreate 
> the deleted files. Neither happens.
> Increasing the time window had no effect.
> I'd like to use this issue as a starting point for a more general discussion 
> about the DistributedCache. 
> Currently:
> 1. all files reside in a common job-specific directory
> 2. files are deleted during the job.
>  
> One thing that was brought up about Trait 1 is that it basically forbids 
> modification of the files, concurrent access and all. Personally I'm not sure 
> if this is a problem. Changing it to a task-specific place solved the issue 
> though.
> I'm more concerned about Trait #2. Besides the mentioned issue, the deletion 
> is realized with the scheduler, which adds a lot of complexity to the current 
> code. (It really is a pain to work on...) 
> If we moved the deletion to the end of the job, it could be done as a clean-up 
> step in the TaskManager. With this we could reduce the DC to a 
> cacheFile(String source) method, the delete method in the TM, and throw out 
> everything else.
> Also, the current implementation implies that big files may be copied 
> multiple times. This may be undesired, depending on how big the files are.
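
The reduced interface proposed above could look roughly like this (a 
hypothetical sketch, not an API that exists):

{code}
// Hypothetical shape of the slimmed-down DistributedCache: one entry point
// for callers, plus a TaskManager-side clean-up at the end of the job.
interface ReducedDistributedCache {

    /** Copy the file once and return the local copy (blocking until available). */
    java.io.File cacheFile(String source);

    /** Called by the TaskManager when the job ends: delete all cached files. */
    void cleanupJob(String jobId);
}
{code}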





[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...

2015-01-27 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71754470
  
You're right Chesnay. I assume that the faulty DC wasn't noticed because it 
was probably never really used ;-) 

Your solution should make the DC work properly. We could even get rid of 
the second counter by simply decrementing the counter upon deletion. If the 
counter is 0, then the file can be deleted. Nice illustrations btw.




[jira] [Resolved] (FLINK-1401) Add plan visualiser support for Streaming programs

2015-01-27 Thread Gyula Fora (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora resolved FLINK-1401.
---
Resolution: Fixed

> Add plan visualiser support for Streaming programs
> --
>
> Key: FLINK-1401
> URL: https://issues.apache.org/jira/browse/FLINK-1401
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Minor
>
> The streaming api currently does not generate visualisable program plans as 
> the batch api does. This feature needs to be added to help users.





[jira] [Resolved] (FLINK-1434) Web interface cannot be used to run streaming programs

2015-01-27 Thread Gyula Fora (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora resolved FLINK-1434.
---
Resolution: Fixed

> Web interface cannot be used to run streaming programs
> --
>
> Key: FLINK-1434
> URL: https://issues.apache.org/jira/browse/FLINK-1434
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming, Webfrontend
>Affects Versions: 0.9
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>
> Flink streaming programs currently cannot be submitted through the web 
> client. When you try to run the jar you get a ProgramInvocationException.
> The reason for this might be that streaming programs completely bypass the 
> use of Plans for job execution and the streaming execution environment 
> directly submits the jobgraph to the client.





[jira] [Commented] (FLINK-1434) Web interface cannot be used to run streaming programs

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294410#comment-14294410
 ] 

ASF GitHub Bot commented on FLINK-1434:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/334


> Web interface cannot be used to run streaming programs
> --
>
> Key: FLINK-1434
> URL: https://issues.apache.org/jira/browse/FLINK-1434
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming, Webfrontend
>Affects Versions: 0.9
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>
> Flink streaming programs currently cannot be submitted through the web 
> client. When you try to run the jar you get a ProgramInvocationException.
> The reason for this might be that streaming programs completely bypass the 
> use of Plans for job execution and the streaming execution environment 
> directly submits the jobgraph to the client.





[GitHub] flink pull request: [FLINK-1434] [FLINK-1401] Streaming support ad...

2015-01-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/334




[jira] [Commented] (FLINK-1458) Interfaces and abstract classes are not valid types

2015-01-27 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294191#comment-14294191
 ] 

Stephan Ewen commented on FLINK-1458:
-

John, this is a limitation that has been fixed and is about to be merged.
[~aljoscha] can give you the details.

In general, using abstract types is a bit less efficient than using concrete 
types (since subclass information has to be carried with the code), but it will 
work.

> Interfaces and abstract classes are not valid types
> ---
>
> Key: FLINK-1458
> URL: https://issues.apache.org/jira/browse/FLINK-1458
> Project: Flink
>  Issue Type: Bug
>Reporter: John Sandiford
>
> I don't know whether this is by design or is a bug, but I am having trouble 
> working with DataSet and traits in Scala, which is a major limitation. A 
> simple example is shown below.  
> Compile time warning is 'Type Main.SimpleTrait has no fields that are visible 
> from Scala Type analysis. Falling back to Java Type Analysis...'
> Run time error is 'Interfaces and abstract classes are not valid types: 
> interface Main$SimpleTrait'
> Regards, John
>  val env = ExecutionEnvironment.getExecutionEnvironment
>   trait SimpleTrait {
> def contains(x: String): Boolean
>   }
>   class SimpleClass extends SimpleTrait {
> def contains(x: String) = true
>   }
>   val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)
>   def f(data: DataSet[Double]): DataSet[SimpleTrait] = {
> data.mapPartition(iterator => {
>   Iterator(new SimpleClass)
> })
>   }
>   val g = f(data)
>   g.print()
>   env.execute("Simple example")





[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294187#comment-14294187
 ] 

ASF GitHub Bot commented on FLINK-377:
--

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-71727262
  
@dan-blanchard  That is fine. Thank you for the pointer to Storm's 
multilang protocol. We'll have a look at it and see whether we can make 
something similar work with Flink.


> Create a general purpose framework for language bindings
> 
>
> Key: FLINK-377
> URL: https://issues.apache.org/jira/browse/FLINK-377
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>  Labels: github-import
> Fix For: pre-apache
>
>
> A general purpose API to run operators with arbitrary binaries. 
> This will allow running Stratosphere programs written in Python, JavaScript, 
> Ruby, Go or whatever you like. 
> We suggest using Google Protocol Buffers for data serialization. This is the 
> list of languages that currently support ProtoBuf: 
> https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns 
> Very early prototype with python: 
> https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing 
> protobuf)
> For Ruby: https://github.com/infochimps-labs/wukong
> Two new students working at Stratosphere (@skunert and @filiphaase) are 
> working on this.
> The reference language binding will be Python, but other bindings are 
> very welcome.
> The best name for this so far is "stratosphere-lang-bindings".
> I created this issue to track the progress (and give everybody a chance to 
> comment on this)
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/377
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: enhancement, 
> Assignee: [filiphaase|https://github.com/filiphaase]
> Created at: Tue Jan 07 19:47:20 CET 2014
> State: open





[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...

2015-01-27 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-71727262
  
@dan-blanchard  That is fine. Thank you for the pointer to Storm's 
multilang protocol. We'll have a look at it and see whether we can make 
something similar work with Flink.




[jira] [Resolved] (FLINK-1378) could not find implicit value for evidence parameter of type TypeInformation

2015-01-27 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1378.
-
    Resolution: Fixed
    Fix Version/s: 0.8, 0.9

Fixed via 935e316a38367cab513dfe2b010129e5d47b7b68

> could not find implicit value for evidence parameter of type TypeInformation
> 
>
> Key: FLINK-1378
> URL: https://issues.apache.org/jira/browse/FLINK-1378
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API
>Affects Versions: 0.7.0-incubating
>Reporter: John Sandiford
>Assignee: Aljoscha Krettek
> Fix For: 0.9, 0.8
>
>
> This is an example of one of many cases that I cannot get to compile with the 
> Scala API. I have tried using T : TypeInformation and : ClassTag but still 
> cannot get it to work.
> //libraryDependencies += "org.apache.flink" % "flink-scala" % 
> "0.7.0-incubating"
> //
> //libraryDependencies += "org.apache.flink" % "flink-clients" % 
> "0.7.0-incubating"
> import org.apache.flink.api.scala._
> import scala.util.{Success, Try}
> object Main extends App {
>   val env = ExecutionEnvironment.getExecutionEnvironment
>   val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)
>   def f[T](data: DataSet[T]): DataSet[(T, Try[Seq[Double]])] = {
> data.mapPartition((iterator: Iterator[T]) => {
>   val first = iterator.next()
>   val second = iterator.next()
>   Iterator((first, Success(Seq(2.0, 3.0))), (second, Success(Seq(3.0, 
> 1.0
> })
>   }
>   val g = f(data)
>   g.print()
>   env.execute("Flink Test")
> }
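
With that fix in place, the generic helper should compile once the evidence is 
declared as context bounds; a sketch (the signature shape is the standard Scala 
API requirement, the body mirrors the snippet above):

{code}
import org.apache.flink.api.scala._
import org.apache.flink.api.common.typeinfo.TypeInformation
import scala.reflect.ClassTag
import scala.util.{Success, Try}

// Context bounds push the implicit TypeInformation/ClassTag lookup to the
// call site, where T is concrete.
def f[T: TypeInformation: ClassTag](data: DataSet[T]): DataSet[(T, Try[Seq[Double]])] = {
  data.mapPartition((iterator: Iterator[T]) => {
    val first = iterator.next()
    val second = iterator.next()
    Iterator((first, Success(Seq(2.0, 3.0))), (second, Success(Seq(3.0, 1.0))))
  })
}
{code}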





[jira] [Resolved] (FLINK-1458) Interfaces and abstract classes are not valid types

2015-01-27 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske resolved FLINK-1458.
--
Resolution: Duplicate

Duplicate of FLINK-1369

> Interfaces and abstract classes are not valid types
> ---
>
> Key: FLINK-1458
> URL: https://issues.apache.org/jira/browse/FLINK-1458
> Project: Flink
>  Issue Type: Bug
>Reporter: John Sandiford
>
> I don't know whether this is by design or is a bug, but I am having trouble 
> working with DataSet and traits in Scala, which is a major limitation. A 
> simple example is shown below.  
> Compile time warning is 'Type Main.SimpleTrait has no fields that are visible 
> from Scala Type analysis. Falling back to Java Type Analysis...'
> Run time error is 'Interfaces and abstract classes are not valid types: 
> interface Main$SimpleTrait'
> Regards, John
>  val env = ExecutionEnvironment.getExecutionEnvironment
>   trait SimpleTrait {
> def contains(x: String): Boolean
>   }
>   class SimpleClass extends SimpleTrait {
> def contains(x: String) = true
>   }
>   val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)
>   def f(data: DataSet[Double]): DataSet[SimpleTrait] = {
> data.mapPartition(iterator => {
>   Iterator(new SimpleClass)
> })
>   }
>   val g = f(data)
>   g.print()
>   env.execute("Simple example")





[jira] [Commented] (FLINK-1458) Interfaces and abstract classes are not valid types

2015-01-27 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294182#comment-14294182
 ] 

Fabian Hueske commented on FLINK-1458:
--

Yes, this is a limitation at the moment, but the good news is that there is 
already a pull request which should solve this issue 
(https://github.com/apache/flink/pull/316). :-)

I will close this issue as it is a duplicate of FLINK-1369.

> Interfaces and abstract classes are not valid types
> ---
>
> Key: FLINK-1458
> URL: https://issues.apache.org/jira/browse/FLINK-1458
> Project: Flink
>  Issue Type: Bug
>Reporter: John Sandiford
>
> I don't know whether this is by design or is a bug, but I am having trouble 
> working with DataSet and traits in Scala, which is a major limitation. A 
> simple example is shown below.  
> Compile time warning is 'Type Main.SimpleTrait has no fields that are visible 
> from Scala Type analysis. Falling back to Java Type Analysis...'
> Run time error is 'Interfaces and abstract classes are not valid types: 
> interface Main$SimpleTrait'
> Regards, John
>  val env = ExecutionEnvironment.getExecutionEnvironment
>   trait SimpleTrait {
> def contains(x: String): Boolean
>   }
>   class SimpleClass extends SimpleTrait {
> def contains(x: String) = true
>   }
>   val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)
>   def f(data: DataSet[Double]): DataSet[SimpleTrait] = {
> data.mapPartition(iterator => {
>   Iterator(new SimpleClass)
> })
>   }
>   val g = f(data)
>   g.print()
>   env.execute("Simple example")





[GitHub] flink pull request: Implement the convenience methods count and co...

2015-01-27 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/210#issuecomment-71726730
  
@zentol  You are right that, for the time being, this results in partly 
repeated execution. While not totally avoidable in all cases, the code going 
in soon for caching intermediate results will help there big time.




[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...

2015-01-27 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71722351
  
We can bind the `unlink` to the `pre-clean` phase and see if that helps.

All in all, if it does not work, it does not work. This is a nice utility, 
but by no means crucial enough to spend huge amounts of time on it...




[jira] [Commented] (FLINK-1330) Restructure directory layout

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294137#comment-14294137
 ] 

ASF GitHub Bot commented on FLINK-1330:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71722351
  
We can bind the `unlink` to the `pre-clean` phase and see if that helps.

All in all, if it does not work, it does not work. This is a nice utility, 
but by no means crucial enough to spend huge amounts of time on it...


> Restructure directory layout
> 
>
> Key: FLINK-1330
> URL: https://issues.apache.org/jira/browse/FLINK-1330
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System, Documentation
>Reporter: Max Michels
>Priority: Minor
>  Labels: usability
>
> When building Flink, the build results can currently be found under 
> "flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/".
> I think we could improve the directory layout with the following:
> - provide the bin folder in the root by default
> - let the start-up and submission scripts in bin assemble the class path
> - in case the project hasn't been built yet, inform the user
> The changes would make it easier to work with Flink from source.





[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...

2015-01-27 Thread dan-blanchard
Github user dan-blanchard commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-71705775
  
@zentol Thanks for the info. I'm sorry to derail the PR conversation here 
for a bit!




[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293973#comment-14293973
 ] 

ASF GitHub Bot commented on FLINK-377:
--

Github user dan-blanchard commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-71705775
  
@zentol Thanks for the info. I'm sorry to derail the PR conversation here 
for a bit!


> Create a general purpose framework for language bindings
> 
>
> Key: FLINK-377
> URL: https://issues.apache.org/jira/browse/FLINK-377
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>  Labels: github-import
> Fix For: pre-apache
>
>
> A general purpose API to run operators with arbitrary binaries. 
> This will allow running Stratosphere programs written in Python, JavaScript, 
> Ruby, Go or whatever you like. 
> We suggest using Google Protocol Buffers for data serialization. This is the 
> list of languages that currently support ProtoBuf: 
> https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns 
> Very early prototype with python: 
> https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing 
> protobuf)
> For Ruby: https://github.com/infochimps-labs/wukong
> Two new students working at Stratosphere (@skunert and @filiphaase) are 
> working on this.
> The reference language binding will be Python, but other bindings are 
> very welcome.
> The best name for this so far is "stratosphere-lang-bindings".
> I created this issue to track the progress (and give everybody a chance to 
> comment on this)
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/377
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: enhancement, 
> Assignee: [filiphaase|https://github.com/filiphaase]
> Created at: Tue Jan 07 19:47:20 CET 2014
> State: open





[jira] [Updated] (FLINK-1458) Interfaces and abstract classes are not valid types

2015-01-27 Thread John Sandiford (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sandiford updated FLINK-1458:
--
Description: 
I don't know whether this is by design or is a bug, but I am having trouble 
working with DataSet and traits in Scala, which is a major limitation. A simple 
example is shown below.  

Compile time warning is 'Type Main.SimpleTrait has no fields that are visible 
from Scala Type analysis. Falling back to Java Type Analysis...'

Run time error is 'Interfaces and abstract classes are not valid types: 
interface Main$SimpleTrait'

Regards, John


 val env = ExecutionEnvironment.getExecutionEnvironment

  trait SimpleTrait {
def contains(x: String): Boolean
  }

  class SimpleClass extends SimpleTrait {
def contains(x: String) = true
  }

  val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)

  def f(data: DataSet[Double]): DataSet[SimpleTrait] = {

data.mapPartition(iterator => {
  Iterator(new SimpleClass)
})
  }


  val g = f(data)
  g.print()


  env.execute("Simple example")

  was:
I don't know whether this is by design or is a bug, but I am having trouble 
working with DataSet and traits in Scala, which is a major limitation. A simple 
example is shown below.  

Compile time warning is 'Type Main.SimpleTrait has no fields that are visible 
from Scala Type analysis. Falling back to Java Type Analysis...'

Run time error is 'Interfaces and abstract classes are not valid types: 
interface Main$SimpleTrait'

Regards, John


 val env = ExecutionEnvironment.getExecutionEnvironment

  trait SimpleTrait {
def contains(x: String): Boolean
  }

  class SimpleClass extends SimpleTrait {
def contains(x: String) = true
  }

  val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)

  def f(data: DataSet[Double]): DataSet[SimpleTrait] = {

data.mapPartition(iterator => {
  Iterator(new SimpleClass)
})
  }


  val g = f(data)
  g.print()


> Interfaces and abstract classes are not valid types
> ---
>
> Key: FLINK-1458
> URL: https://issues.apache.org/jira/browse/FLINK-1458
> Project: Flink
>  Issue Type: Bug
>Reporter: John Sandiford
>
> I don't know whether this is by design or is a bug, but I am having trouble 
> working with DataSet and traits in Scala, which is a major limitation. A 
> simple example is shown below.  
> Compile time warning is 'Type Main.SimpleTrait has no fields that are visible 
> from Scala Type analysis. Falling back to Java Type Analysis...'
> Run time error is 'Interfaces and abstract classes are not valid types: 
> interface Main$SimpleTrait'
> Regards, John
>  val env = ExecutionEnvironment.getExecutionEnvironment
>   trait SimpleTrait {
> def contains(x: String): Boolean
>   }
>   class SimpleClass extends SimpleTrait {
> def contains(x: String) = true
>   }
>   val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)
>   def f(data: DataSet[Double]): DataSet[SimpleTrait] = {
> data.mapPartition(iterator => {
>   Iterator(new SimpleClass)
> })
>   }
>   val g = f(data)
>   g.print()
>   env.execute("Simple example")





[jira] [Created] (FLINK-1458) Interfaces and abstract classes are not valid types

2015-01-27 Thread John Sandiford (JIRA)
John Sandiford created FLINK-1458:
-

 Summary: Interfaces and abstract classes are not valid types
 Key: FLINK-1458
 URL: https://issues.apache.org/jira/browse/FLINK-1458
 Project: Flink
  Issue Type: Bug
Reporter: John Sandiford


I don't know whether this is by design or is a bug, but I am having trouble 
working with DataSet and traits in Scala, which is a major limitation. A simple 
example is shown below.  

Compile time warning is 'Type Main.SimpleTrait has no fields that are visible 
from Scala Type analysis. Falling back to Java Type Analysis...'

Run time error is 'Interfaces and abstract classes are not valid types: 
interface Main$SimpleTrait'

Regards, John


 val env = ExecutionEnvironment.getExecutionEnvironment

  trait SimpleTrait {
def contains(x: String): Boolean
  }

  class SimpleClass extends SimpleTrait {
def contains(x: String) = true
  }

  val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)

  def f(data: DataSet[Double]): DataSet[SimpleTrait] = {

data.mapPartition(iterator => {
  Iterator(new SimpleClass)
})
  }


  val g = f(data)
  g.print()





[jira] [Commented] (FLINK-1378) could not find implicit value for evidence parameter of type TypeInformation

2015-01-27 Thread John Sandiford (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293869#comment-14293869
 ] 

John Sandiford commented on FLINK-1378:
---

Yes, the fix works.  Thank you.

> could not find implicit value for evidence parameter of type TypeInformation
> 
>
> Key: FLINK-1378
> URL: https://issues.apache.org/jira/browse/FLINK-1378
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API
>Affects Versions: 0.7.0-incubating
>Reporter: John Sandiford
>Assignee: Aljoscha Krettek
>
> This is an example of one of many cases that I cannot get to compile with the 
> Scala API. I have tried using T : TypeInformation and : ClassTag but still 
> cannot get it to work.
> //libraryDependencies += "org.apache.flink" % "flink-scala" % 
> "0.7.0-incubating"
> //
> //libraryDependencies += "org.apache.flink" % "flink-clients" % 
> "0.7.0-incubating"
> import org.apache.flink.api.scala._
> import scala.util.{Success, Try}
> object Main extends App {
>   val env = ExecutionEnvironment.getExecutionEnvironment
>   val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)
>   def f[T](data: DataSet[T]): DataSet[(T, Try[Seq[Double]])] = {
> data.mapPartition((iterator: Iterator[T]) => {
>   val first = iterator.next()
>   val second = iterator.next()
>   Iterator((first, Success(Seq(2.0, 3.0))), (second, Success(Seq(3.0, 
> 1.0
> })
>   }
>   val g = f(data)
>   g.print()
>   env.execute("Flink Test")
> }





[GitHub] flink pull request: Implement the convenience methods count and co...

2015-01-27 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/210#issuecomment-71689597
  
Looks like this is now ready to merge. 

@zentol I understand your concern. However, I think that it is much easier 
to execute in this way. Most of the time, the user probably wants just one 
accumulator result and not multiple. This is supposed to be a convenience 
function.
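
For illustration, the kind of plumbing such a convenience method hides (a 
sketch with assumed names, not the implementation in this PR):

```
import org.apache.flink.api.common.accumulators.LongCounter;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.DiscardingOutputFormat;
import org.apache.flink.configuration.Configuration;

public class CountHelper {
    // count() built on an accumulator: attach a LongCounter, run the job once,
    // read the result back. One extra job execution per convenience call.
    public static <T> long count(ExecutionEnvironment env, DataSet<T> data) throws Exception {
        final String accName = "count-helper"; // hypothetical accumulator id
        data.map(new RichMapFunction<T, T>() {
            private final LongCounter counter = new LongCounter();
            @Override
            public void open(Configuration parameters) {
                getRuntimeContext().addAccumulator(accName, counter);
            }
            @Override
            public T map(T value) {
                counter.add(1L);
                return value;
            }
        }).output(new DiscardingOutputFormat<T>());
        return (Long) env.execute("count").getAccumulatorResult(accName);
    }
}
```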




[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293833#comment-14293833
 ] 

ASF GitHub Bot commented on FLINK-377:
--

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-71688094
  
@dan-blanchard Generally, how easy it is will depend on what you're going 
for. 

You can implement the functionality to create a plan in the given language, 
or just leave that out and focus on UDFs (this means writing plans in Java 
though!). 

For UDFs you can decide whether you want to create a complete framework 
with different operations and driver strategies (map, cogroup, reduce etc.), or 
just provide the ability to receive/send values.

The only common core is the data exchange between the given language and 
Java, which for example in Python takes roughly 300 lines of code (the data 
exchange itself is ~70, the rest is serialization).
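
As a rough idea of that common core (illustrative only, not the actual binding 
code): a pipe between the JVM and the child process, with each serialized 
record length-prefixed.

{code}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Minimal shape of the JVM side of the data exchange: write records to the
// foreign process, read its results back, both as length-prefixed byte blobs.
class MultilangChannel {
    private final DataOutputStream toChild;
    private final DataInputStream fromChild;

    MultilangChannel(Process child) {
        this.toChild = new DataOutputStream(child.getOutputStream());
        this.fromChild = new DataInputStream(child.getInputStream());
    }

    void send(String record) throws IOException {
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        toChild.writeInt(bytes.length); // 4-byte length prefix
        toChild.write(bytes);
        toChild.flush();
    }

    String receive() throws IOException {
        byte[] bytes = new byte[fromChild.readInt()];
        fromChild.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
{code}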


> Create a general purpose framework for language bindings
> 
>
> Key: FLINK-377
> URL: https://issues.apache.org/jira/browse/FLINK-377
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>  Labels: github-import
> Fix For: pre-apache
>
>
> A general purpose API to run operators with arbitrary binaries. 
> This will allow running Stratosphere programs written in Python, JavaScript, 
> Ruby, Go or whatever you like. 
> We suggest using Google Protocol Buffers for data serialization. This is the 
> list of languages that currently support ProtoBuf: 
> https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns 
> Very early prototype with python: 
> https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing 
> protobuf)
> For Ruby: https://github.com/infochimps-labs/wukong
> Two new students working at Stratosphere (@skunert and @filiphaase) are 
> working on this.
> The reference language binding will be Python, but other bindings are 
> very welcome.
> The best name for this so far is "stratosphere-lang-bindings".
> I created this issue to track the progress (and give everybody a chance to 
> comment on this)
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/377
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: enhancement, 
> Assignee: [filiphaase|https://github.com/filiphaase]
> Created at: Tue Jan 07 19:47:20 CET 2014
> State: open





[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...

2015-01-27 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-71688094
  
@dan-blanchard Generally, how easy it is will depend on what you're going 
for. 

You can implement the functionality to create a plan in the given language, 
or just leave that out and focus on UDFs (this means writing plans in Java 
though!). 

For UDFs you can decide whether you want to create a complete framework 
with different operations and driver strategies (map, cogroup, reduce etc.), or 
just provide the ability to receive/send values.

The only common core is the data exchange between the given language and 
Java, which for example in Python takes roughly 300 lines of code (the data 
exchange itself is ~70, the rest is serialization).




[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...

2015-01-27 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71672497
  
Whenever I look more closely at the DC I'm always left wondering how it can 
work at all.

About your first point, I don't think that's enough. There is a more 
fundamental flaw: we need another counter for delete processes.

Consider the following 2 scenarios with 2 tasks distributing the same file.
C denotes the creation of a copy process, D denotes a delete process. # 
denotes the count variable, O the oldCount variable.

```
1):   I   II  III  IV
T1:---CD
T2:---CD---
# 1   22   2
O  2   2

2)I   II  III  IV
T1:---CD---
T2:---CD
# 1   22   2
O  2   2
```

In both scenarios, D at III should not delete the file, but all D's have 
the very same information.

Instead, I propose having 2 counters: one counting the # of copy 
operations and one counting the # of delete operations, with the current value 
(at process creation) stored in the process. When executing, if the stored 
value is equal to the copy count, files may be deleted, since this means that 
this delete process was the last to be started.

Let's make another fancy schema to illustrate the point:
```
1):   I   II  III  IV
T1:---CD
T2:---CD---
# 1   22   2
O  1   2

2)I   II  III  IV
T1:---CD---
T2:---CD
# 1   22   2
O  1   2
```
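
A sketch of that two-counter bookkeeping (assumed names):

```
// Two counters per file, as proposed: copies and deletes are counted
// separately, and each delete process stores the copy count as of its
// creation. Only the delete whose stored value still equals the copy
// count at execution time may remove the file.
class TwoCounterEntry {
    private int copyCount = 0;   // # of copy processes created
    private int deleteCount = 0; // # of delete processes created

    synchronized void copyCreated() {
        copyCount++;
    }

    // Returns the value the delete process stores at creation time.
    synchronized int deleteCreated() {
        deleteCount++;
        return copyCount;
    }

    // Executed later by the delete process.
    synchronized boolean mayDelete(int storedCopyCount) {
        return storedCopyCount == copyCount;
    }
}
```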




[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293723#comment-14293723
 ] 

ASF GitHub Bot commented on FLINK-1419:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71672497
  
Whenever I look more closely at the DC I'm always left wondering how it can 
work at all.

About your first point, I don't think that's enough. There is a more 
fundamental flaw: we need another counter for delete processes.

Consider the following 2 scenarios with 2 tasks distributing the same file.
C denotes the creation of a copy process, D denotes a delete process. # 
denotes the count variable, O the oldCount variable.

```
1):   I   II  III  IV
T1:---CD
T2:---CD---
# 1   22   2
O  2   2

2)I   II  III  IV
T1:---CD---
T2:---CD
# 1   22   2
O  2   2
```

In both scenarios, D at III should not delete the file, but all D's have 
the very same information.

Instead, I propose having 2 counters: one counting the # of copy 
operations and one counting the # of delete operations, with the current value 
(at process creation) stored in the process. When executing, if the stored 
value is equal to the copy count, files may be deleted, since this means that 
this delete process was the last to be started.

Let's make another fancy schema to illustrate the point:
```
1):   I   II  III  IV
T1:---CD
T2:---CD---
# 1   22   2
O  1   2

2)I   II  III  IV
T1:---CD---
T2:---CD
# 1   22   2
O  1   2
```


> DistributedCache doesn't preserve files for subsequent operations
> --
>
> Key: FLINK-1419
> URL: https://issues.apache.org/jira/browse/FLINK-1419
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.8, 0.9
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>
> When subsequent operations want to access the same files in the DC, it 
> frequently happens that the files are not created for the following operation.
> This is fairly odd, since the DC is supposed to either a) preserve files when 
> another operation kicks in within a certain time window, or b) just recreate 
> the deleted files. Neither happens.
> Increasing the time window had no effect.
> I'd like to use this issue as a starting point for a more general discussion 
> about the DistributedCache. 
> Currently:
> 1. all files reside in a common job-specific directory
> 2. files are deleted during the job.
>  
> One thing that was brought up about Trait 1 is that it basically forbids 
> modification of the files, concurrent access and all. Personally I'm not sure 
> if this is a problem. Changing it to a task-specific place solved the issue 
> though.
> I'm more concerned about Trait #2. Besides the mentioned issue, the deletion 
> is realized with the scheduler, which adds a lot of complexity to the current 
> code. (It really is a pain to work on...) 
> If we moved the deletion to the end of the job, it could be done as a clean-up 
> step in the TaskManager. With this we could reduce the DC to a 
> cacheFile(String source) method, the delete method in the TM, and throw out 
> everything else.
> Also, the current implementation implies that big files may be copied 
> multiple times. This may be undesired, depending on how big the files are.





[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293612#comment-14293612
 ] 

ASF GitHub Bot commented on FLINK-938:
--

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/248#discussion_r23611202
  
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -519,6 +519,12 @@ object JobManager {
       configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..")
     }
 
+    if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null) == null) {
+      configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        InetAddress.getLocalHost.getHostName)
+    }
+
     val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null)
--- End diff --

Oh yes, you're right. Then it's perfectly fine.


> Change start-cluster.sh script so that users don't have to configure the 
> JobManager address
> ---
>
> Key: FLINK-938
> URL: https://issues.apache.org/jira/browse/FLINK-938
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Robert Metzger
>Assignee: Mingliang Qi
>Priority: Minor
> Fix For: 0.9
>
>
> To improve the user experience, Flink should not require users to configure 
> the JobManager's address on a cluster.
> In combination with FLINK-934, this would allow running Flink with decent 
> performance on a cluster without setting a single configuration value.





[GitHub] flink pull request: [FLINK-938] Auomatically configure the jobmana...

2015-01-27 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/248#discussion_r23611202
  
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -519,6 +519,12 @@ object JobManager {
       configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..")
     }
 
+    if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null) == null) {
+      configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        InetAddress.getLocalHost.getHostName)
+    }
+
     val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null)
--- End diff --

Oh yes, you're right. Then it's perfectly fine.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1457] exclude avro test file from RAT c...

2015-01-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/345#issuecomment-71656804
  
Good to merge


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1457) RAT check fails on Windows

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293610#comment-14293610
 ] 

ASF GitHub Bot commented on FLINK-1457:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/345#issuecomment-71656804
  
Good to merge


> RAT check fails on Windows
> --
>
> Key: FLINK-1457
> URL: https://issues.apache.org/jira/browse/FLINK-1457
> Project: Flink
>  Issue Type: Bug
>Reporter: Sebastian Kruse
>Assignee: Sebastian Kruse
>Priority: Trivial
>
> On (my) Windows 7 (Maven 3.2.2), the RAT check fails as 
> flink-addons/flink-avro/src/test/resources/testdata.avro has no approved 
> license. Not being an actual code file, it should be excluded from the RAT 
> check so that verification also passes on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1457] exclude avro test file from RAT c...

2015-01-27 Thread sekruse
GitHub user sekruse opened a pull request:

https://github.com/apache/flink/pull/345

[FLINK-1457] exclude avro test file from RAT check



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sekruse/flink FLINK-1475

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/345.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #345


commit 1a1adcc5c9d372d22c50167525aaa0f1d3eb4d84
Author: Sebastian Kruse 
Date:   2015-01-27T14:24:01Z

[FLINK-1457] exclude avro test file from RAT check




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1457) RAT check fails on Windows

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293590#comment-14293590
 ] 

ASF GitHub Bot commented on FLINK-1457:
---

GitHub user sekruse opened a pull request:

https://github.com/apache/flink/pull/345

[FLINK-1457] exclude avro test file from RAT check



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sekruse/flink FLINK-1475

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/345.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #345


commit 1a1adcc5c9d372d22c50167525aaa0f1d3eb4d84
Author: Sebastian Kruse 
Date:   2015-01-27T14:24:01Z

[FLINK-1457] exclude avro test file from RAT check




> RAT check fails on Windows
> --
>
> Key: FLINK-1457
> URL: https://issues.apache.org/jira/browse/FLINK-1457
> Project: Flink
>  Issue Type: Bug
>Reporter: Sebastian Kruse
>Assignee: Sebastian Kruse
>Priority: Trivial
>
> On (my) Windows 7 (Maven 3.2.2), the RAT check fails as 
> flink-addons/flink-avro/src/test/resources/testdata.avro has no approved 
> license. Not being an actual code file, it should be excluded from the RAT 
> check so that verification also passes on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293588#comment-14293588
 ] 

ASF GitHub Bot commented on FLINK-938:
--

Github user qmlmoon commented on a diff in the pull request:

https://github.com/apache/flink/pull/248#discussion_r23610509
  
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -519,6 +519,12 @@ object JobManager {
       configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..")
     }
 
+    if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        null) == null) {
+      configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        InetAddress.getLocalHost.getHostName)
+    }
+
     val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null)
--- End diff --

In local mode, the JobManager starts an internal TaskManager and calls 
`TaskManager.parseConfiguration`. For the TaskManager, we can replace the null 
default value with `config.defaultJobManagerAdd`, but not with 
`InetAddress.getLocalHost.getHostName`. I think the question is whether we add 
an additional parameter to `parseConfiguration` for the default JobManager 
address or put it in the configuration.
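
To make the trade-off concrete, here is a minimal sketch of the parameter-based 
variant; the object and method names and the `defaultJobManagerAddress` 
parameter are illustrative assumptions, not the actual Flink code:

    import java.net.InetAddress
    import org.apache.flink.configuration.{ConfigConstants, Configuration}

    object HostnameResolutionSketch {
      // Resolve the JobManager hostname from (1) the configuration, (2) an
      // explicit default supplied by the caller, (3) the local hostname.
      def resolve(configuration: Configuration,
                  defaultJobManagerAddress: Option[String]): String =
        Option(configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null))
          .orElse(defaultJobManagerAddress)
          .getOrElse(InetAddress.getLocalHost.getHostName)
    }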


> Change start-cluster.sh script so that users don't have to configure the 
> JobManager address
> ---
>
> Key: FLINK-938
> URL: https://issues.apache.org/jira/browse/FLINK-938
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Robert Metzger
>Assignee: Mingliang Qi
>Priority: Minor
> Fix For: 0.9
>
>
> To improve the user experience, Flink should not require users to configure 
> the JobManager's address on a cluster.
> In combination with FLINK-934, this would allow running Flink with decent 
> performance on a cluster without setting a single configuration value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...

2015-01-27 Thread qmlmoon
Github user qmlmoon commented on a diff in the pull request:

https://github.com/apache/flink/pull/248#discussion_r23610509
  
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -519,6 +519,12 @@ object JobManager {
       configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..")
     }
 
+    if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        null) == null) {
+      configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        InetAddress.getLocalHost.getHostName)
+    }
+
     val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null)
--- End diff --

In local mode, the JobManager starts an internal TaskManager and calls 
`TaskManager.parseConfiguration`. For the TaskManager, we can replace the null 
default value with `config.defaultJobManagerAdd`, but not with 
`InetAddress.getLocalHost.getHostName`. I think the question is whether we add 
an additional parameter to `parseConfiguration` for the default JobManager 
address or put it in the configuration.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...

2015-01-27 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71654749
  
both good points. I'll address them after lunch!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-1457) RAT check fails on Windows

2015-01-27 Thread Sebastian Kruse (JIRA)
Sebastian Kruse created FLINK-1457:
--

 Summary: RAT check fails on Windows
 Key: FLINK-1457
 URL: https://issues.apache.org/jira/browse/FLINK-1457
 Project: Flink
  Issue Type: Bug
Reporter: Sebastian Kruse
Assignee: Sebastian Kruse
Priority: Trivial


On (my) Windows 7 (Maven 3.2.2), the RAT check fails as 
flink-addons/flink-avro/src/test/resources/testdata.avro has no approved 
license. Not being an actual code file, it should be excluded from the RAT 
check so that verification also passes on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293584#comment-14293584
 ] 

ASF GitHub Bot commented on FLINK-1419:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71654749
  
both good points. I'll address them after lunch!


> DistributedCache doesn't preserve files for subsequent operations
> --
>
> Key: FLINK-1419
> URL: https://issues.apache.org/jira/browse/FLINK-1419
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.8, 0.9
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>
> When subsequent operations want to access the same files in the DC it 
> frequently happens that the files are not created for the following operation.
> This is fairly odd, since the DC is supposed to either a) preserve files when 
> another operation kicks in within a certain time window, or b) just recreate 
> the deleted files. Neither happens.
> Increasing the time window had no effect.
> I'd like to use this issue as a starting point for a more general discussion 
> about the DistributedCache. 
> Currently:
> 1. all files reside in a common job-specific directory
> 2. are deleted during the job.
>  
> One thing that was brought up about Trait 1 is that it basically forbids 
> modification of the files, concurrent access and all. Personally I'm not sure 
> if this is a problem. Changing it to a task-specific place solved the issue 
> though.
> I'm more concerned about Trait #2. Besides the mentioned issue, the deletion 
> is realized with the scheduler, which adds a lot of complexity to the current 
> code. (It really is a pain to work on...) 
> If we moved the deletion to the end of the job it could be done as a clean-up 
> step in the TaskManager. With this we could reduce the DC to a 
> cacheFile(String source) method, the delete method in the TM, and throw out 
> everything else.
> Also, the current implementation implies that big files may be copied 
> multiple times. This may be undesired, depending on how big the files are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293561#comment-14293561
 ] 

ASF GitHub Bot commented on FLINK-938:
--

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/248#discussion_r23609808
  
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -519,6 +519,12 @@ object JobManager {
       configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..")
     }
 
+    if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        null) == null) {
+      configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        InetAddress.getLocalHost.getHostName)
+    }
+
     val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null)
--- End diff --

That is true for the TaskManager but not for the JobManager. For the 
TaskManager, we could replace the null default value with 
```InetAddress.getLocalHost.getHostName``` at both places.
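
In code, that suggestion is just collapsing the if block into the lookup's 
default value (a sketch only, not the merged change):

    import java.net.InetAddress
    // Fall back to the local hostname directly via the default value.
    val hostname = configuration.getString(
      ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
      InetAddress.getLocalHost.getHostName)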


> Change start-cluster.sh script so that users don't have to configure the 
> JobManager address
> ---
>
> Key: FLINK-938
> URL: https://issues.apache.org/jira/browse/FLINK-938
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Robert Metzger
>Assignee: Mingliang Qi
>Priority: Minor
> Fix For: 0.9
>
>
> To improve the user experience, Flink should not require users to configure 
> the JobManager's address on a cluster.
> In combination with FLINK-934, this would allow running Flink with decent 
> performance on a cluster without setting a single configuration value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...

2015-01-27 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/248#discussion_r23609808
  
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -519,6 +519,12 @@ object JobManager {
       configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..")
     }
 
+    if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        null) == null) {
+      configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        InetAddress.getLocalHost.getHostName)
+    }
+
     val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null)
--- End diff --

That is true for the TaskManager but not for the JobManager. For the 
TaskManager, we could replace the null default value with 
```InetAddress.getLocalHost.getHostName``` at both places.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293556#comment-14293556
 ] 

ASF GitHub Bot commented on FLINK-1419:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71652771
  
I'm wondering whether the count hash map update should rather happen in the 
copy process, because otherwise the following interleaving could occur:

1. You register a new temp file "foobar" for task B --> creating a copy 
task and incrementing the file counter
2. You delete the temp file "foobar" for task A because it is finished --> 
creating a delete process with the incremented counter
3. You execute the copy process
4. You execute the delete process

Then the file "foobar" does not exist for task B.

Another thing is that the DeleteProcess tries to delete the whole directory 
below the jobID when a single file is to be deleted. I don't know whether that 
is the right behaviour.
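
For illustration, a minimal self-contained sketch of that ordering, where the 
reference count is only updated inside the synchronized copy step; all names 
below are made up for the example and do not match Flink's actual 
DistributedCache classes:

    import scala.collection.mutable

    object RefCountedCacheSketch {
      private val counts = mutable.Map.empty[String, Int].withDefaultValue(0)
      private val files  = mutable.Set.empty[String]

      // Copy process: materialize the file (if needed) and bump the count in
      // one synchronized step, so a delete scheduled in between cannot win.
      def copy(name: String): Unit = synchronized {
        files += name          // stands in for the actual file copy
        counts(name) += 1
      }

      // Delete process: only remove the file once no task references it.
      def delete(name: String): Unit = synchronized {
        counts(name) -= 1
        if (counts(name) <= 0) { files -= name; counts.remove(name) }
      }

      def exists(name: String): Boolean = synchronized { files.contains(name) }
    }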


> DistributedCache doesn't preserve files for subsequent operations
> --
>
> Key: FLINK-1419
> URL: https://issues.apache.org/jira/browse/FLINK-1419
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.8, 0.9
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>
> When subsequent operations want to access the same files in the DC it 
> frequently happens that the files are not created for the following operation.
> This is fairly odd, since the DC is supposed to either a) preserve files when 
> another operation kicks in within a certain time window, or b) just recreate 
> the deleted files. Neither happens.
> Increasing the time window had no effect.
> I'd like to use this issue as a starting point for a more general discussion 
> about the DistributedCache. 
> Currently:
> 1. all files reside in a common job-specific directory
> 2. are deleted during the job.
>  
> One thing that was brought up about Trait 1 is that it basically forbids 
> modification of the files, concurrent access and all. Personally I'm not sure 
> if this is a problem. Changing it to a task-specific place solved the issue 
> though.
> I'm more concerned about Trait #2. Besides the mentioned issue, the deletion 
> is realized with the scheduler, which adds a lot of complexity to the current 
> code. (It really is a pain to work on...) 
> If we moved the deletion to the end of the job it could be done as a clean-up 
> step in the TaskManager. With this we could reduce the DC to a 
> cacheFile(String source) method, the delete method in the TM, and throw out 
> everything else.
> Also, the current implementation implies that big files may be copied 
> multiple times. This may be undesired, depending on how big the files are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...

2015-01-27 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71652771
  
I'm wondering whether the count hash map update should rather happen in the 
copy process, because otherwise the following interleaving could occur:

1. You register a new temp file "foobar" for task B --> creating a copy 
task and incrementing the file counter
2. You delete the temp file "foobar" for task A because it is finished --> 
creating a delete process with the incremented counter
3. You execute the copy process
4. You execute the delete process

Then the file "foobar" does not exist for task B.

Another thing is that the DeleteProcess tries to delete the whole directory 
below the jobID when a single file is to be deleted. I don't know whether that 
is the right behaviour.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-1352) Buggy registration from TaskManager to JobManager

2015-01-27 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-1352.

Resolution: Fixed

Fixed in 730e056a2a2ea028495637b633396392c31337e3

> Buggy registration from TaskManager to JobManager
> -
>
> Key: FLINK-1352
> URL: https://issues.apache.org/jira/browse/FLINK-1352
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Till Rohrmann
> Fix For: 0.9
>
>
> The JobManager's InstanceManager may refuse the registration attempt from a 
> TaskManager, because it has this TaskManager already connected or, in the 
> future, because the TaskManager has been blacklisted as unreliable.
> Upon refused registration, the instance ID is null to signal the refusal. 
> The TaskManager reacts incorrectly to such messages, assuming successful 
> registration.
> Possible solution: the JobManager sends back a dedicated "RegistrationRefused" 
> message if the instance manager returns null as the registration result. If 
> the TaskManager receives that before being registered, it knows that the 
> registration response was lost (which should not happen on TCP and would 
> indicate a corrupt connection).
> Follow-up question: does it make sense to have the TaskManager try 
> indefinitely to connect to the JobManager, with an increasing interval (from 
> seconds to minutes)?
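
To make the proposed handshake concrete, here is a small sketch of the message 
types and a growing retry interval; the names follow the wording above and are 
assumptions, not the actual Flink actor messages:

    sealed trait RegistrationMessage
    case class RegisterTaskManager(taskManagerId: String) extends RegistrationMessage
    case class AcknowledgeRegistration(instanceId: String) extends RegistrationMessage
    case class RefuseRegistration(reason: String) extends RegistrationMessage

    // Retry with an interval that doubles per attempt, capped at five minutes.
    def retryInterval(attempt: Int): Long =
      math.min(1000L << math.min(attempt, 8), 5 * 60 * 1000L)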



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1352) Buggy registration from TaskManager to JobManager

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293536#comment-14293536
 ] 

ASF GitHub Bot commented on FLINK-1352:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/328


> Buggy registration from TaskManager to JobManager
> -
>
> Key: FLINK-1352
> URL: https://issues.apache.org/jira/browse/FLINK-1352
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Till Rohrmann
> Fix For: 0.9
>
>
> The JobManager's InstanceManager may refuse the registration attempt from a 
> TaskManager, because it has this TaskManager already connected or, in the 
> future, because the TaskManager has been blacklisted as unreliable.
> Upon refused registration, the instance ID is null to signal the refusal. 
> The TaskManager reacts incorrectly to such messages, assuming successful 
> registration.
> Possible solution: the JobManager sends back a dedicated "RegistrationRefused" 
> message if the instance manager returns null as the registration result. If 
> the TaskManager receives that before being registered, it knows that the 
> registration response was lost (which should not happen on TCP and would 
> indicate a corrupt connection).
> Follow-up question: does it make sense to have the TaskManager try 
> indefinitely to connect to the JobManager, with an increasing interval (from 
> seconds to minutes)?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...

2015-01-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/328


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1436] refactor CliFrontend to provide m...

2015-01-27 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/331#issuecomment-71649409
  
Thanks for your feedback, @rmetzger. The error message is at the bottom 
because that way it is most easily identifiable by the user (no scrolling 
necessary). Before, we printed the error and then the help, which let the help 
text shadow the error message.

I changed the error reporting in case the user didn't specify an action.

Concerning the printing of the help message, you're probably right. Let's 
just print the help if the user asks for it. Now it prints:
> "./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar" is not a 
valid action.
> Valid actions are "run", "list", "info", or "cancel".

Additionally, let's 
* change `info` to print the plan by default
* change `cancel` to accept the job id as a parameter instead of an option
* change `list` to print scheduled and running jobs by default
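
A minimal sketch of the action dispatch described above; `runAction` and the 
object name are illustrative, not the actual CliFrontend code:

    object CliDispatchSketch {
      private val validActions = Set("run", "list", "info", "cancel")

      private def runAction(action: String, args: Array[String]): Int = 0 // stub

      def dispatch(args: Array[String]): Int = args.headOption match {
        case Some(action) if validActions(action) => runAction(action, args.tail)
        case Some(other) =>
          // Short, targeted error instead of dumping the whole help text.
          println(s""""$other" is not a valid action.""")
          println("""Valid actions are "run", "list", "info", or "cancel".""")
          1
        case None =>
          println("Please specify an action.")
          1
      }
    }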


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293527#comment-14293527
 ] 

ASF GitHub Bot commented on FLINK-938:
--

Github user qmlmoon commented on a diff in the pull request:

https://github.com/apache/flink/pull/248#discussion_r23607815
  
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -519,6 +519,12 @@ object JobManager {
       configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..")
     }
 
+    if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        null) == null) {
+      configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        InetAddress.getLocalHost.getHostName)
+    }
+
     val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null)
--- End diff --

That was my first thought too, but another part of the 
`TaskManager.parseConfiguration` method also needs to read the JobManager 
address from the configuration.


> Change start-cluster.sh script so that users don't have to configure the 
> JobManager address
> ---
>
> Key: FLINK-938
> URL: https://issues.apache.org/jira/browse/FLINK-938
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Robert Metzger
>Assignee: Mingliang Qi
>Priority: Minor
> Fix For: 0.9
>
>
> To improve the user experience, Flink should not require users to configure 
> the JobManager's address on a cluster.
> In combination with FLINK-934, this would allow running Flink with decent 
> performance on a cluster without setting a single configuration value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...

2015-01-27 Thread qmlmoon
Github user qmlmoon commented on a diff in the pull request:

https://github.com/apache/flink/pull/248#discussion_r23607815
  
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -519,6 +519,12 @@ object JobManager {
       configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..")
     }
 
+    if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        null) == null) {
+      configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        InetAddress.getLocalHost.getHostName)
+    }
+
     val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null)
--- End diff --

That was my first thought too, but another part of the 
`TaskManager.parseConfiguration` method also needs to read the JobManager 
address from the configuration.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293525#comment-14293525
 ] 

ASF GitHub Bot commented on FLINK-938:
--

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/248#discussion_r23607703
  
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala ---
@@ -596,6 +596,10 @@ object TaskManager {
       opt[String]("tempDir") optional() action { (x, c) =>
         c.copy(tmpDir = x)
       } text ("Specify temporary directory.")
+
+      opt[String]("defaultJobManagerAdd") optional() action { (x, c) =>
--- End diff --

I'm slightly in favour of writing the option completely out: 
defaultJobManagerAddress
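
Written out, the option would then read roughly as follows (keeping the 
existing `defaultJobManagerAdd` field of the config class; only the option 
name changes in this sketch):

    opt[String]("defaultJobManagerAddress") optional() action { (x, c) =>
      c.copy(defaultJobManagerAdd = x)
    } text ("Specify the default address of the JobManager.")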


> Change start-cluster.sh script so that users don't have to configure the 
> JobManager address
> ---
>
> Key: FLINK-938
> URL: https://issues.apache.org/jira/browse/FLINK-938
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Robert Metzger
>Assignee: Mingliang Qi
>Priority: Minor
> Fix For: 0.9
>
>
> To improve the user experience, Flink should not require users to configure 
> the JobManager's address on a cluster.
> In combination with FLINK-934, this would allow running Flink with decent 
> performance on a cluster without setting a single configuration value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...

2015-01-27 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/248#discussion_r23607703
  
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala ---
@@ -596,6 +596,10 @@ object TaskManager {
       opt[String]("tempDir") optional() action { (x, c) =>
         c.copy(tmpDir = x)
       } text ("Specify temporary directory.")
+
+      opt[String]("defaultJobManagerAdd") optional() action { (x, c) =>
--- End diff --

I'm slightly in favour of writing the option completely out: 
defaultJobManagerAddress


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293523#comment-14293523
 ] 

ASF GitHub Bot commented on FLINK-938:
--

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/248#discussion_r23607635
  
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala ---
@@ -611,6 +615,13 @@ object TaskManager {
       configuration.setString(ConfigConstants.TASK_MANAGER_TMP_DIR_KEY, config.tmpDir)
     }
 
+    if (config.defaultJobManagerAdd != null &&
+        GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        null) == null) {
+      configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        config.defaultJobManagerAdd)
+    }
+
     val jobManagerHostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null)
--- End diff --

The same question as above.


> Change start-cluster.sh script so that users don't have to configure the 
> JobManager address
> ---
>
> Key: FLINK-938
> URL: https://issues.apache.org/jira/browse/FLINK-938
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Robert Metzger
>Assignee: Mingliang Qi
>Priority: Minor
> Fix For: 0.9
>
>
> To improve the user experience, Flink should not require users to configure 
> the JobManager's address on a cluster.
> In combination with FLINK-934, this would allow running Flink with decent 
> performance on a cluster without setting a single configuration value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...

2015-01-27 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/248#discussion_r23607635
  
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala ---
@@ -611,6 +615,13 @@ object TaskManager {
       configuration.setString(ConfigConstants.TASK_MANAGER_TMP_DIR_KEY, config.tmpDir)
     }
 
+    if (config.defaultJobManagerAdd != null &&
+        GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        null) == null) {
+      configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        config.defaultJobManagerAdd)
+    }
+
     val jobManagerHostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null)
--- End diff --

The same question as above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293521#comment-14293521
 ] 

ASF GitHub Bot commented on FLINK-377:
--

Github user dan-blanchard commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-71648291
  
@rmetzger I'm really more curious than anything at this point. I recently 
worked on a fairly large Storm topology that has parts written in Java, Python, 
and Perl. As part of that, I ended up taking over as the maintainer of 
IO::Storm, the Perl library for interfacing with Storm via their [Multilang 
protocol](https://storm.apache.org/documentation/Multilang-protocol.html). 
Multilang makes it incredibly easy to add support for other languages, so I 
just wanted to know if you were going for something that simple or not.


> Create a general purpose framework for language bindings
> 
>
> Key: FLINK-377
> URL: https://issues.apache.org/jira/browse/FLINK-377
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>  Labels: github-import
> Fix For: pre-apache
>
>
> A general purpose API to run operators with arbitrary binaries. 
> This will allow running Stratosphere programs written in Python, JavaScript, 
> Ruby, Go, or whatever you like. 
> We suggest using Google Protocol Buffers for data serialization. This is the 
> list of languages that currently support ProtoBuf: 
> https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns 
> Very early prototype with python: 
> https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing 
> protobuf)
> For Ruby: https://github.com/infochimps-labs/wukong
> Two new students working at Stratosphere (@skunert and @filiphaase) are 
> working on this.
> The reference binding language will be for Python, but other bindings are 
> very welcome.
> The best name for this so far is "stratosphere-lang-bindings".
> I created this issue to track the progress (and give everybody a chance to 
> comment on this)
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/377
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: enhancement, 
> Assignee: [filiphaase|https://github.com/filiphaase]
> Created at: Tue Jan 07 19:47:20 CET 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...

2015-01-27 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/248#discussion_r23607471
  
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -519,6 +519,12 @@ object JobManager {
       configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..")
     }
 
+    if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        null) == null) {
+      configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        InetAddress.getLocalHost.getHostName)
+    }
+
     val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null)
--- End diff --

Can't we just set InetAddress.getLocalHost.getHostName as the default value 
here instead of having the if block above?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...

2015-01-27 Thread dan-blanchard
Github user dan-blanchard commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-71648291
  
@rmetzger I'm really more curious than anything at this point. I recently 
worked on a fairly large Storm topology that has parts written in Java, Python, 
and Perl. As part of that, I ended up taking over as the maintainer of 
IO::Storm, the Perl library for interfacing with Storm via their [Multilang 
protocol](https://storm.apache.org/documentation/Multilang-protocol.html). 
Multilang makes it incredibly easy to add support for other languages, so I 
just wanted to know if you were going for something that simple or not.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293520#comment-14293520
 ] 

ASF GitHub Bot commented on FLINK-938:
--

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/248#discussion_r23607471
  
--- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
@@ -519,6 +519,12 @@ object JobManager {
       configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..")
     }
 
+    if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        null) == null) {
+      configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY,
+        InetAddress.getLocalHost.getHostName)
+    }
+
     val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null)
--- End diff --

Can't we just set InetAddress.getLocalHost.getHostName as the default value 
here instead of having the if block above?


> Change start-cluster.sh script so that users don't have to configure the 
> JobManager address
> ---
>
> Key: FLINK-938
> URL: https://issues.apache.org/jira/browse/FLINK-938
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Robert Metzger
>Assignee: Mingliang Qi
>Priority: Minor
> Fix For: 0.9
>
>
> To improve the user experience, Flink should not require users to configure 
> the JobManager's address on a cluster.
> In combination with FLINK-934, this would allow running Flink with decent 
> performance on a cluster without setting a single configuration value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293508#comment-14293508
 ] 

ASF GitHub Bot commented on FLINK-938:
--

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/248#issuecomment-71647940
  
Great, thanks.


> Change start-cluster.sh script so that users don't have to configure the 
> JobManager address
> ---
>
> Key: FLINK-938
> URL: https://issues.apache.org/jira/browse/FLINK-938
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Robert Metzger
>Assignee: Mingliang Qi
>Priority: Minor
> Fix For: 0.9
>
>
> To improve the user experience, Flink should not require users to configure 
> the JobManager's address on a cluster.
> In combination with FLINK-934, this would allow running Flink with decent 
> performance on a cluster without setting a single configuration value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...

2015-01-27 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/248#issuecomment-71647940
  
Great, thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1352) Buggy registration from TaskManager to JobManager

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293502#comment-14293502
 ] 

ASF GitHub Bot commented on FLINK-1352:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/328#issuecomment-71647585
  
I'll merge it.


> Buggy registration from TaskManager to JobManager
> -
>
> Key: FLINK-1352
> URL: https://issues.apache.org/jira/browse/FLINK-1352
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Till Rohrmann
> Fix For: 0.9
>
>
> The JobManager's InstanceManager may refuse the registration attempt from a 
> TaskManager, because it has this TaskManager already connected or, in the 
> future, because the TaskManager has been blacklisted as unreliable.
> Upon refused registration, the instance ID is null to signal the refusal. 
> The TaskManager reacts incorrectly to such messages, assuming successful 
> registration.
> Possible solution: the JobManager sends back a dedicated "RegistrationRefused" 
> message if the instance manager returns null as the registration result. If 
> the TaskManager receives that before being registered, it knows that the 
> registration response was lost (which should not happen on TCP and would 
> indicate a corrupt connection).
> Follow-up question: does it make sense to have the TaskManager try 
> indefinitely to connect to the JobManager, with an increasing interval (from 
> seconds to minutes)?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...

2015-01-27 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/328#issuecomment-71647585
  
I'll merge it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293481#comment-14293481
 ] 

ASF GitHub Bot commented on FLINK-938:
--

Github user qmlmoon commented on the pull request:

https://github.com/apache/flink/pull/248#issuecomment-71645508
  
OK, I rebased the PR and reworked it as a Scala implementation.


> Change start-cluster.sh script so that users don't have to configure the 
> JobManager address
> ---
>
> Key: FLINK-938
> URL: https://issues.apache.org/jira/browse/FLINK-938
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Robert Metzger
>Assignee: Mingliang Qi
>Priority: Minor
> Fix For: 0.9
>
>
> To improve the user experience, Flink should not require users to configure 
> the JobManager's address on a cluster.
> In combination with FLINK-934, this would allow running Flink with decent 
> performance on a cluster without setting a single configuration value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...

2015-01-27 Thread qmlmoon
Github user qmlmoon commented on the pull request:

https://github.com/apache/flink/pull/248#issuecomment-71645508
  
OK, I rebased the PR and reworked it as a Scala implementation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.

2015-01-27 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293458#comment-14293458
 ] 

Aljoscha Krettek commented on FLINK-1396:
-

For this to work, I have to move the Hadoop formats from addons to the java 
(resp. scala) packages. Should I move the input formats and leave the rest 
intact, or should I duplicate the input formats and leave the addons package as 
it is?

Also, I should probably add direct methods for both the old and the new API.

What are your thoughts on this?
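
For reference, one possible shape for such a direct method on the user API, 
sketched as a usage example; the method name `readHadoopFile` and its 
signature are assumptions here, not settled API:

    import org.apache.flink.api.scala._
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat

    object DirectHadoopInputSketch {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        // Hypothetical convenience method: input format, key/value classes,
        // and path in a single call on the environment.
        val lines = env.readHadoopFile(new TextInputFormat,
          classOf[LongWritable], classOf[Text], "hdfs:///path/to/input")
        lines.map(_._2.toString).first(10).print()
      }
    }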

> Add hadoop input formats directly to the user API.
> --
>
> Key: FLINK-1396
> URL: https://issues.apache.org/jira/browse/FLINK-1396
> Project: Flink
>  Issue Type: Bug
>Reporter: Robert Metzger
>Assignee: Aljoscha Krettek
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1303) HadoopInputFormat does not work with Scala API

2015-01-27 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-1303:

Issue Type: Sub-task  (was: Bug)
Parent: FLINK-1396

> HadoopInputFormat does not work with Scala API
> --
>
> Key: FLINK-1303
> URL: https://issues.apache.org/jira/browse/FLINK-1303
> Project: Flink
>  Issue Type: Sub-task
>  Components: Scala API
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 0.9
>
>
> It fails because the HadoopInputFormat uses the Flink Tuple2 type. For this, 
> type extraction fails at runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293416#comment-14293416
 ] 

ASF GitHub Bot commented on FLINK-1419:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71638812
  
updated to include discussed changes


> DistributedCache doesn't preserve files for subsequent operations
> --
>
> Key: FLINK-1419
> URL: https://issues.apache.org/jira/browse/FLINK-1419
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.8, 0.9
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>
> When subsequent operations want to access the same files in the DC it 
> frequently happens that the files are not created for the following operation.
> This is fairly odd, since the DC is supposed to either a) preserve files when 
> another operation kicks in within a certain time window, or b) just recreate 
> the deleted files. Neither happens.
> Increasing the time window had no effect.
> I'd like to use this issue as a starting point for a more general discussion 
> about the DistributedCache. 
> Currently:
> 1. all files reside in a common job-specific directory
> 2. are deleted during the job.
>  
> One thing that was brought up about Trait 1 is that it basically forbids 
> modification of the files, concurrent access and all. Personally I'm not sure 
> if this is a problem. Changing it to a task-specific place solved the issue 
> though.
> I'm more concerned about Trait #2. Besides the mentioned issue, the deletion 
> is realized with the scheduler, which adds a lot of complexity to the current 
> code. (It really is a pain to work on...) 
> If we moved the deletion to the end of the job it could be done as a clean-up 
> step in the TaskManager. With this we could reduce the DC to a 
> cacheFile(String source) method, the delete method in the TM, and throw out 
> everything else.
> Also, the current implementation implies that big files may be copied 
> multiple times. This may be undesired, depending on how big the files are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...

2015-01-27 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-71638812
  
updated to include discussed changes


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...

2015-01-27 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71633168
  
So either we somehow change the execution order of the clean goal, or we 
fix this in the junction plugin.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1330) Restructure directory layout

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293390#comment-14293390
 ] 

ASF GitHub Bot commented on FLINK-1330:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/333#issuecomment-71633168
  
So either we somehow change the execution order of the clean goal, or we 
fix this in the junction plugin.


> Restructure directory layout
> 
>
> Key: FLINK-1330
> URL: https://issues.apache.org/jira/browse/FLINK-1330
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System, Documentation
>Reporter: Max Michels
>Priority: Minor
>  Labels: usability
>
> When building Flink, the build results can currently be found under 
> "flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/".
> I think we could improve the directory layout with the following:
> - provide the bin folder in the root by default
> - let the startup and submission scripts in bin assemble the class path
> - in case the project hasn't been built yet, inform the user
> The changes would make it easier to work with Flink from source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1433) Add HADOOP_CLASSPATH to start scripts

2015-01-27 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-1433.
---
   Resolution: Fixed
Fix Version/s: 0.9
 Assignee: Robert Metzger

Fixed for 0.9 in http://git-wip-us.apache.org/repos/asf/flink/commit/a5150a90
Fixed for 0.8.1 in http://git-wip-us.apache.org/repos/asf/flink/commit/2387a08e

> Add HADOOP_CLASSPATH to start scripts
> -
>
> Key: FLINK-1433
> URL: https://issues.apache.org/jira/browse/FLINK-1433
> Project: Flink
>  Issue Type: Improvement
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 0.9, 0.8.1
>
>
> With the Hadoop file system wrapper, it's important to have access to the 
> Hadoop filesystem classes.
> The HADOOP_CLASSPATH seems to be a standard environment variable used by 
> Hadoop for such libraries.
> Deployments like Google Compute Cloud set this variable to contain the 
> "Google Cloud Storage Hadoop Wrapper". So if users want to use the Cloud 
> Storage in a non-YARN environment, we need to address this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1318) Make quoted String parsing optional and configurable for CSVInputFormats

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293379#comment-14293379
 ] 

ASF GitHub Bot commented on FLINK-1318:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/265#issuecomment-71631281
  
Looks good to me. How about adding some documentation for this feature? 
Maybe under http://flink.apache.org/docs/0.8/programming_guide.html#data-sources
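
For the docs, a usage sketch along these lines could work; the 
`quoteCharacter` parameter name is an assumption based on this PR, not 
confirmed API:

    import org.apache.flink.api.scala._

    object QuotedCsvSketch {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        // Quoted-string parsing stays off unless a quote character is given.
        val records = env.readCsvFile[(String, Int)](
          "file:///tmp/input.csv",
          quoteCharacter = '"')
        records.print()
      }
    }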


> Make quoted String parsing optional and configurable for CSVInputFormats
> 
>
> Key: FLINK-1318
> URL: https://issues.apache.org/jira/browse/FLINK-1318
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 0.8
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>Priority: Minor
>
> With the current implementation of the CSVInputFormat, quoted string parsing 
> kicks in if the first non-whitespace character of a field is a double quote. 
> I see two issues with this implementation:
> 1. Quoted String parsing cannot be disabled
> 2. The quoting character is fixed to double quotes (")
> I propose to add parameters to disable quoted String parsing and set the 
> quote character.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1318] CsvInputFormat: Made quoted strin...

2015-01-27 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/265#issuecomment-71631281
  
Looks good to me. How about adding some documentation for this feature? 
Maybe under http://flink.apache.org/docs/0.8/programming_guide.html#data-sources


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1433) Add HADOOP_CLASSPATH to start scripts

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293377#comment-14293377
 ] 

ASF GitHub Bot commented on FLINK-1433:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/337


> Add HADOOP_CLASSPATH to start scripts
> -
>
> Key: FLINK-1433
> URL: https://issues.apache.org/jira/browse/FLINK-1433
> Project: Flink
>  Issue Type: Improvement
>Reporter: Robert Metzger
> Fix For: 0.8.1
>
>
> With the Hadoop file system wrapper, it's important to have access to the 
> Hadoop filesystem classes.
> The HADOOP_CLASSPATH seems to be a standard environment variable used by 
> Hadoop for such libraries.
> Deployments like Google Compute Cloud set this variable to contain the 
> "Google Cloud Storage Hadoop Wrapper". So if users want to use the Cloud 
> Storage in a non-YARN environment, we need to address this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1433] Add HADOOP_CLASSPATH to start scr...

2015-01-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/337


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1433) Add HADOOP_CLASSPATH to start scripts

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293375#comment-14293375
 ] 

ASF GitHub Bot commented on FLINK-1433:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/337#issuecomment-71631049
  
Merging it.


> Add HADOOP_CLASSPATH to start scripts
> -
>
> Key: FLINK-1433
> URL: https://issues.apache.org/jira/browse/FLINK-1433
> Project: Flink
>  Issue Type: Improvement
>Reporter: Robert Metzger
> Fix For: 0.8.1
>
>
> With the Hadoop file system wrapper, it's important to have access to the 
> Hadoop filesystem classes.
> The HADOOP_CLASSPATH seems to be a standard environment variable used by 
> Hadoop for such libraries.
> Deployments like Google Compute Cloud set this variable to include the 
> "Google Cloud Storage Hadoop Wrapper". So if users want to use Cloud 
> Storage in a non-YARN environment, we need to address this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1433] Add HADOOP_CLASSPATH to start scr...

2015-01-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/337#issuecomment-71631049
  
Merging it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1433) Add HADOOP_CLASSPATH to start scripts

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293267#comment-14293267
 ] 

ASF GitHub Bot commented on FLINK-1433:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/337#issuecomment-71618846
  
Looks good to merge.

Like Robert said, the `HADOOP_CLASSPATH` is used to add third-party 
libraries. From `hadoop-env.sh`:

    # Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
    for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
      if [ "$HADOOP_CLASSPATH" ]; then
        export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
      else
        export HADOOP_CLASSPATH=$f
      fi
    done


> Add HADOOP_CLASSPATH to start scripts
> -
>
> Key: FLINK-1433
> URL: https://issues.apache.org/jira/browse/FLINK-1433
> Project: Flink
>  Issue Type: Improvement
>Reporter: Robert Metzger
> Fix For: 0.8.1
>
>
> With the Hadoop file system wrapper, it's important to have access to the 
> Hadoop filesystem classes.
> The HADOOP_CLASSPATH seems to be a standard environment variable used by 
> Hadoop for such libraries.
> Deployments like Google Compute Cloud set this variable to include the 
> "Google Cloud Storage Hadoop Wrapper". So if users want to use Cloud 
> Storage in a non-YARN environment, we need to address this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1433] Add HADOOP_CLASSPATH to start scr...

2015-01-27 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/337#issuecomment-71618846
  
Looks good to merge.

Like Robert said, the `HADOOP_CLASSPATH` is used to add third-party 
libraries. From `hadoop-env.sh`:

    # Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
    for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
      if [ "$HADOOP_CLASSPATH" ]; then
        export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
      else
        export HADOOP_CLASSPATH=$f
      fi
    done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1442) Archived Execution Graph consumes too much memory

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293258#comment-14293258
 ] 

ASF GitHub Bot commented on FLINK-1442:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/344#issuecomment-71616952
  
@StephanEwen Yes, that was on purpose. The previous two data structures 
(`HashMap` and `Queue`) are now replaced by the `LinkedHashMap`, which 
provides the same functionality. It might not be obvious, but the 
`LinkedHashMap` preserves the insertion order of its entries. From 
`scala.collection.mutable.LinkedHashMap`:

> This class implements mutable maps using a hashtable.
> The iterator and all traversal methods of this class visit elements in 
> the order they were inserted.

That's why `graphs.iterator.next()` always returns the least recently 
inserted item.
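
A minimal Scala sketch of that behavior, using illustrative names (this is 
not the actual JobManager code):

    import scala.collection.mutable.LinkedHashMap

    // A bounded archive backed by a LinkedHashMap: iteration follows
    // insertion order, so the head is always the oldest entry and is
    // evicted first once the capacity is exceeded.
    class BoundedArchive[K, V](maxEntries: Int) {
      private val graphs = LinkedHashMap[K, V]()

      def put(key: K, graph: V): Unit = {
        graphs += key -> graph
        while (graphs.size > maxEntries) {
          graphs -= graphs.head._1 // drop the least recently inserted graph
        }
      }

      def get(key: K): Option[V] = graphs.get(key)
    }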



> Archived Execution Graph consumes too much memory
> -
>
> Key: FLINK-1442
> URL: https://issues.apache.org/jira/browse/FLINK-1442
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Max Michels
>
> The JobManager archives the execution graphs for later analysis of jobs. These 
> graphs may consume a lot of memory.
> The execution edges of all2all connection patterns, in particular, are 
> extremely numerous and dominate the memory consumption.
> The execution edges connect all parallel tasks. So for an all2all pattern 
> between n and m tasks, there are n*m edges. For parallelisms of several 
> hundred tasks, this can easily reach 100k objects and more, each with a set 
> of metadata.
> I propose the following to solve that:
> 1. Clear all execution edges from the graph (the majority of the memory 
> consumers) when it is given to the archiver.
> 2. Put the map/list of the archived graphs behind a soft reference, so it 
> will be removed under memory pressure before the JVM crashes. That may drop 
> graphs from the history early, but that is much preferable to the JVM 
> crashing, in which case the graphs are lost as well.
> 3. Long term: the graphs should be archived somewhere else. Something like 
> the history server used by Hadoop and Hive would be a good idea.
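
A minimal Scala sketch of proposal 2, with illustrative names (not the actual 
JobManager code): each archived graph sits behind a SoftReference, so the 
garbage collector may reclaim it under memory pressure instead of letting the 
JVM crash.

    import java.lang.ref.SoftReference
    import scala.collection.mutable.LinkedHashMap

    // Lookups return None if the graph was never archived or was already
    // reclaimed by the garbage collector.
    class SoftArchive[K, V <: AnyRef] {
      private val graphs = LinkedHashMap[K, SoftReference[V]]()

      def put(key: K, graph: V): Unit =
        graphs += key -> new SoftReference(graph)

      def get(key: K): Option[V] =
        graphs.get(key).flatMap(ref => Option(ref.get()))
    }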



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1442] Reduce memory consumption of arch...

2015-01-27 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/344#issuecomment-71616952
  
@StephanEwen Yes, that was on purpose. The previous two data structures 
(`HashMap` and `Queue`) are now replaced by the `LinkedHashMap`, which 
provides the same functionality. It might not be obvious, but the 
`LinkedHashMap` preserves the insertion order of its entries. From 
`scala.collection.mutable.LinkedHashMap`:

> This class implements mutable maps using a hashtable.
> The iterator and all traversal methods of this class visit elements in 
> the order they were inserted.

That's why `graphs.iterator.next()` always returns the least recently 
inserted item.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1452) Add "flink-contrib" maven module and README.md with the rules

2015-01-27 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293240#comment-14293240
 ] 

Robert Metzger commented on FLINK-1452:
---

In my opinion:
{{flink-addons}}:
- are usually part of {{flink-dist}}
- are maintained by the core committers
- are documented in the main documentation
- are eventually moved out of "flink-addons" once they are "stable" (for 
example {{flink-yarn}} recently, and probably {{flink-streaming}} soon)

{{flink-contrib}}:
- Not part of {{flink-dist}}
- documentation should live somewhere in the code, not in the main repo

We could rename {{flink-contrib}} to {{flink-user-contrib}}, but that would 
make the name pretty long.
Also, we could rename {{flink-addons}} to {{flink-unstable}}, but that's 
probably a bad name ;) 

I see {{flink-addons}} as our internal incubator. Other potential names 
could be {{flink-beta}} or {{flink-incubator}}.

> Add "flink-contrib" maven module and README.md with the rules
> -
>
> Key: FLINK-1452
> URL: https://issues.apache.org/jira/browse/FLINK-1452
> Project: Flink
>  Issue Type: New Feature
>  Components: flink-contrib
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>
> I'll also create a JIRA component



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1328] Reworked semantic annotations

2015-01-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/311#issuecomment-71613018
  
+1 for merging it

What's the plan with the documentation?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1328) Rework Constant Field Annotations

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293227#comment-14293227
 ] 

ASF GitHub Bot commented on FLINK-1328:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/311#issuecomment-71613018
  
+1 for merging it

What's the plan with the documentation?


> Rework Constant Field Annotations
> -
>
> Key: FLINK-1328
> URL: https://issues.apache.org/jira/browse/FLINK-1328
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Optimizer, Scala API
>Affects Versions: 0.7.0-incubating
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> Constant field annotations are used by the optimizer to determine whether 
> physical data properties such as sorting or partitioning are retained by 
> user-defined functions.
> The current implementation is limited and can be extended in several ways:
> - Fields that are copied to other positions
> - Field definitions for non-tuple data types (Pojos)
> There is a pull request (#83) that goes in this direction and can be 
> extended.
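
A plain-Scala sketch of the underlying idea (conceptual only, not the Flink 
annotation API): if a function provably copies a field unchanged, any 
partitioning or sort order on that field still holds on the function's output.

    object ForwardedFieldSketch {
      case class Record(key: Int, value: String)

      // This function copies `key` verbatim, so a partitioning or sort order
      // on `key` survives it. A constant field annotation would declare
      // exactly that to the optimizer, which could then skip a re-partition.
      val upperCase: Record => Record =
        r => r.copy(value = r.value.toUpperCase)
    }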



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings

2015-01-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293223#comment-14293223
 ] 

ASF GitHub Bot commented on FLINK-377:
--

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-71612618
  
@dan-blanchard What non-JVM language are you looking for?
Maybe we can do a little prototype with that language to see how well it 
works. Maybe you or somebody else from the community would be interested in 
making the prototype production-ready?



> Create a general purpose framework for language bindings
> 
>
> Key: FLINK-377
> URL: https://issues.apache.org/jira/browse/FLINK-377
> Project: Flink
>  Issue Type: Improvement
>Reporter: GitHub Import
>  Labels: github-import
> Fix For: pre-apache
>
>
> A general-purpose API to run operators with arbitrary binaries. 
> This will allow running Stratosphere programs written in Python, JavaScript, 
> Ruby, Go, or whatever you like. 
> We suggest using Google Protocol Buffers for data serialization. This is the 
> list of languages that currently support ProtoBuf: 
> https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns 
> Very early prototype with Python: 
> https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing 
> protobuf)
> For Ruby: https://github.com/infochimps-labs/wukong
> Two new students working at Stratosphere (@skunert and @filiphaase) are 
> working on this.
> The reference binding language will be Python, but other bindings are 
> very welcome.
> The best name for this so far is "stratosphere-lang-bindings".
> I created this issue to track the progress (and give everybody a chance to 
> comment on this)
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/377
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: enhancement, 
> Assignee: [filiphaase|https://github.com/filiphaase]
> Created at: Tue Jan 07 19:47:20 CET 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1105) Add support for locally sorted output

2015-01-27 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293225#comment-14293225
 ] 

Stephan Ewen commented on FLINK-1105:
-

Adding to my comment above: that would mean that we make sorting not just a 
property of the sink, but an operator of its own. To achieve any efficiency, 
this operator would need to be fused with its successors.

> Add support for locally sorted output
> -
>
> Key: FLINK-1105
> URL: https://issues.apache.org/jira/browse/FLINK-1105
> Project: Flink
>  Issue Type: Sub-task
>  Components: Java API
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>Priority: Minor
>
> This feature will make it possible to sort the output which is sent to an 
> OutputFormat to obtain a locally sorted result.
> This feature was available in the "old" Java API and has not been ported to 
> the new Java API yet. Hence, the optimizer and runtime should already support 
> this feature. However, the API and job generation parts are missing.
> It is also a subfeature of FLINK-598, which will also provide globally sorted 
> results.
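
A plain-Scala sketch of the distinction (illustrative only, not the Flink 
API): a locally sorted result orders each parallel partition independently, 
while the global sort of FLINK-598 establishes one total order across 
partitions.

    // Two "partitions" of a parallel result.
    val partitions = Seq(Seq(3, 9, 2), Seq(1, 7, 4))

    val locallySorted = partitions.map(_.sorted)
    // -> Seq(Seq(2, 3, 9), Seq(1, 4, 7)): each partition sorted on its own

    val globallySorted = partitions.flatten.sorted
    // -> Seq(1, 2, 3, 4, 7, 9): one total order across partitions (FLINK-598)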



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...

2015-01-27 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/202#issuecomment-71612618
  
@dan-blanchard What non-JVM language are you looking for?
Maybe we can do a little prototype with that language to see how well it 
works. Maybe you or somebody else from the community would be interested in 
making the prototype production-ready?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

