FOSDEM 2017 HPC, Bigdata and Data Science DevRoom CFP is closing soon

2016-11-23 Thread Roman Shaposhnik
Hi! Apologies for the extra-wide distribution (this exhausts my once-a-year ASF mail-to-all-bigdata-projects quota ;-)), but I wanted to suggest that all of you consider submitting talks to the FOSDEM 2017 HPC, Bigdata and Data Science DevRoom: https://hpc-bigdata-fosdem17.github.io/ It was

Re: [DISCUSS] @Public libraries

2016-11-23 Thread Theodore Vasiloudis
What Till said is true for FlinkML, until all the moving parts are in place there's not much point in annotating any as Public. The Spark project has the @Experimental tag IIRC, that would fit our case better. On Wed, Nov 23, 2016 at 4:09 PM, Till Rohrmann wrote: > I think in general annotating

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-23 Thread Fabian Hueske
Hi Greg, yes, we could, and as Aljoscha pointed out, the proper solution would be to serialize all objects (as is done in the DataSet API) and not hold them as objects on the heap. This would be a major effort, though, and I am rather looking for a 'quick' workaround that does not have major side effects

[jira] [Created] (FLINK-5151) Add discussion about object mutations to heap-based state backend docs.

2016-11-23 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-5151: Summary: Add discussion about object mutations to heap-based state backend docs. Key: FLINK-5151 URL: https://issues.apache.org/jira/browse/FLINK-5151 Project: Flink

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-23 Thread Greg Hogan
On Wed, Nov 23, 2016 at 8:34 AM, Fabian Hueske wrote: > The ReduceFunction must be used in the right way and it is easy to get > wrong. > I'm likely highlighting my ignorance here, but if object reuse works properly for ReduceFunction in the batch API, can we do the same in the streaming API?

[jira] [Created] (FLINK-5150) WebUI metric-related resource leak

2016-11-23 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-5150: Summary: WebUI metric-related resource leak Key: FLINK-5150 URL: https://issues.apache.org/jira/browse/FLINK-5150 Project: Flink Issue Type: Bug

Re: Very wide csv files

2016-11-23 Thread Flavio Pompermaier
I usually use Apache Commons CSV for that, as you can see here (inside the *parseWithApacheCommonsCsv* branch of the if): https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/datalinks/batch/flink/datasourcemanager/importers/Csv2RowExample.java I hope this helps! Flavi

Re: [DISCUSS] @Public libraries

2016-11-23 Thread Till Rohrmann
I think in general annotating library methods/classes is a good idea. The question is just which APIs are going to be marked stable. In the past we've seen that we might have marked some of Flink's APIs stable too early. As a consequence we have to carry them along for quite some time (at the very

[jira] [Created] (FLINK-5149) ContinuousEventTimeTrigger doesn't fire at the end of the window

2016-11-23 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-5149: Summary: ContinuousEventTimeTrigger doesn't fire at the end of the window Key: FLINK-5149 URL: https://issues.apache.org/jira/browse/FLINK-5149 Project: Fli

[jira] [Created] (FLINK-5148) LocalFileSystem#delete can fail with NullpointerException

2016-11-23 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-5148: Summary: LocalFileSystem#delete can fail with NullpointerException Key: FLINK-5148 URL: https://issues.apache.org/jira/browse/FLINK-5148 Project: Flink

Very wide csv files

2016-11-23 Thread Anton Solovev
Hi, I'm working on https://issues.apache.org/jira/browse/FLINK-2186. As I understand it, Flink cannot read wide-column files into a tuple, only into a POJO. So far we must create that POJO manually, which is convenient when there are not many columns. When there are over a thousand, it hardly seems possible. To solve this i

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-23 Thread Fabian Hueske
Yes, you're right. This is not a principled solution but rather a workaround for a specific use case. The ReduceFunction must be used in the right way and it is easy to get wrong. (OTOH, there is currently no way to get object reuse right, so I think the change would not worsen the current state.

[jira] [Created] (FLINK-5147) StreamingOperatorsITCase.testGroupedFoldOperation failed on Travis

2016-11-23 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-5147: Summary: StreamingOperatorsITCase.testGroupedFoldOperation failed on Travis Key: FLINK-5147 URL: https://issues.apache.org/jira/browse/FLINK-5147 Project: Flink

Re: [DISCUSS] @Public libraries

2016-11-23 Thread Aljoscha Krettek
I would be for also annotating library methods/classes. Maybe Robert has a stronger opinion on this because he introduced these annotations. On Tue, 22 Nov 2016 at 18:56 Greg Hogan wrote: > Hi all, > > Should stable APIs in Flink's CEP, ML, and Gelly libraries be annotated > @Public or restricte

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-23 Thread Aljoscha Krettek
You can go ahead and do the change. I just think that this is quite fragile. For example, this depends on the reduce function returning the right object for reuse. If we hand in the copied object as the first input and the ReduceFunction reuses the second input then we again have a reference to the

[jira] [Created] (FLINK-5146) Improved resource cleanup in RocksDB keyed state backend

2016-11-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-5146: Summary: Improved resource cleanup in RocksDB keyed state backend Key: FLINK-5146 URL: https://issues.apache.org/jira/browse/FLINK-5146 Project: Flink Issu

[jira] [Created] (FLINK-5145) WebInterface to aggressive in pulling metrics

2016-11-23 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-5145: Summary: WebInterface to aggressive in pulling metrics Key: FLINK-5145 URL: https://issues.apache.org/jira/browse/FLINK-5145 Project: Flink Issue Type:

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-23 Thread Fabian Hueske
Hi Aljoscha, sure, there are many issues with holding the state as objects on the heap. However, I think we don't have to solve all the problems related to that in order to add a small fix that solves one specific issue. I would not explicitly expose the fix to users, but it would be nice if we could imple

[jira] [Created] (FLINK-5144) Error while applying rule AggregateJoinTransposeRule

2016-11-23 Thread Timo Walther (JIRA)
Timo Walther created FLINK-5144: Summary: Error while applying rule AggregateJoinTransposeRule Key: FLINK-5144 URL: https://issues.apache.org/jira/browse/FLINK-5144 Project: Flink Issue Type:

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-23 Thread sjk
Hi, Fabian Hueske, sorry for my mistake regarding the whole PR #2792. > On Nov 23, 2016, at 17:10, Fabian Hueske wrote: > > Hi, > > Why do you think that this means "much code changes"? > I think it would actually be a pretty lightweight change in > HeapReducingState. > > The proposal is to copy the *fir

Re: [DISCUSS] deprecated function need more detail

2016-11-23 Thread Paris Carbone
+1 This should always be the norm, especially for user-facing code. While we are at it, perhaps when someone deprecates functionality, its usages should also be replaced with the new alternative right away. E.g. Checkpointed is deprecated, but all state management tests are actually still using it. chee

Re: [DISCUSS] deprecated function need more detail

2016-11-23 Thread Kostas Kloudas
+1 and we should apply the same to all deprecated interfaces/abstract classes. > On Nov 23, 2016, at 11:13 AM, Aljoscha Krettek wrote: > > +1 That sounds excellent. > > On Wed, 23 Nov 2016 at 11:04 Till Rohrmann wrote: > >> +1 for your proposal. >> >> Cheers, >> Till >> >> On Wed, Nov 23, 2

Re: [DISCUSS] deprecated function need more detail

2016-11-23 Thread Aljoscha Krettek
+1 That sounds excellent. On Wed, 23 Nov 2016 at 11:04 Till Rohrmann wrote: > +1 for your proposal. > > Cheers, > Till > > On Wed, Nov 23, 2016 at 9:33 AM, Fabian Hueske wrote: > > > I agree on this one. > > Whenever we deprecate a method or a feature we should add a comment that > > explains t

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-23 Thread Aljoscha Krettek
I think it's not a good idea to introduce this special case fix for one specific use case because this can have implications for other parts of the code. We should push for keeping the data in serialised form. There's also other problems, for example, a ListState allows modifying the returned Itera

Re: [DISCUSS] deprecated function need more detail

2016-11-23 Thread Till Rohrmann
+1 for your proposal. Cheers, Till On Wed, Nov 23, 2016 at 9:33 AM, Fabian Hueske wrote: > I agree on this one. > Whenever we deprecate a method or a feature we should add a comment that > explains the new API or why the feature was removed without replacement. > > Enforcing this information th

[jira] [Created] (FLINK-5143) Add EXISTS to list of supported operators

2016-11-23 Thread Timo Walther (JIRA)
Timo Walther created FLINK-5143: Summary: Add EXISTS to list of supported operators Key: FLINK-5143 URL: https://issues.apache.org/jira/browse/FLINK-5143 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-5142) Resource leak in CheckpointCoordinator

2016-11-23 Thread Frank Lauterwald (JIRA)
Frank Lauterwald created FLINK-5142: Summary: Resource leak in CheckpointCoordinator Key: FLINK-5142 URL: https://issues.apache.org/jira/browse/FLINK-5142 Project: Flink Issue Type: Bug

Re: [DISCUSS] Hold copies in HeapStateBackend

2016-11-23 Thread Fabian Hueske
Hi, Why do you think that this means "much code changes"? I think it would actually be a pretty lightweight change in HeapReducingState. The proposal is to copy the *first* value that goes into a ReducingState. The copy would be done by a TypeSerializer and hence be a deep copy. This will allow t

[jira] [Created] (FLINK-5141) Implement MiniClusterStreamEnvironment to run new mini cluster in flip-6 branch

2016-11-23 Thread Biao Liu (JIRA)
Biao Liu created FLINK-5141: Summary: Implement MiniClusterStreamEnvironment to run new mini cluster in flip-6 branch Key: FLINK-5141 URL: https://issues.apache.org/jira/browse/FLINK-5141 Project: Flink

Re: [DISCUSS] deprecated function need more detail

2016-11-23 Thread Fabian Hueske
I agree on this one. Whenever we deprecate a method or a feature we should add a comment that explains the new API or why the feature was removed without replacement. Enforcing this information through checkstyle makes sense as well, IMO. Cheers, Fabian 2016-11-23 4:42 GMT+01:00 sjk : > Hi, all