[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge
[ https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413521#comment-15413521 ]

ASF GitHub Bot commented on FLINK-2090:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2323

> toString of CollectionInputFormat takes long time when the collection is huge
> -
>
> Key: FLINK-2090
> URL: https://issues.apache.org/jira/browse/FLINK-2090
> Project: Flink
> Issue Type: Improvement
> Reporter: Till Rohrmann
> Assignee: Ivan Mushketyk
> Priority: Minor
> Fix For: 1.2.0
>
>
> The {{toString}} method of {{CollectionInputFormat}} calls {{toString}} on
> its underlying {{Collection}}. Thus, {{toString}} is called for each element
> of the collection. If the {{Collection}} contains many elements or the
> individual {{toString}} calls for each element take a long time, then the
> string generation can take a considerable amount of time. [~mikiobraun]
> noticed that when he inserted several jBLAS matrices into Flink.
> The {{toString}} method is mainly used for logging statements in
> {{DataSourceNode}}'s {{computeOperatorSpecificDefaultEstimates}} method and
> in {{JobGraphGenerator.getDescriptionForUserCode}}. I'm wondering whether it
> is necessary to print the complete content of the underlying {{Collection}}
> or if it's not enough to print only the first 3 elements in the {{toString}}
> method.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
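The cost described in the issue can be reproduced outside Flink in a few lines. The sketch below is a standalone illustration, not Flink code: `ToStringCostDemo`, `CountingElement`, and `firstElements` are hypothetical names. It relies only on the JDK behavior that `ArrayList` inherits `AbstractCollection.toString()`, which renders every element, and compares that with a variant that stops after the first 3 elements, as the issue suggests.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ToStringCostDemo {

    // Hypothetical stand-in for an element that is expensive to print
    // (e.g. a large matrix); it only counts how often toString() runs.
    static final class CountingElement {
        static int toStringCalls = 0;
        private final int id;

        CountingElement(int id) {
            this.id = id;
        }

        @Override
        public String toString() {
            toStringCalls++;
            return "E" + id;
        }
    }

    // Hypothetical helper: renders at most `limit` elements and appends
    // "..." when the collection has more, so later elements are never printed.
    static String firstElements(Iterable<?> collection, int limit) {
        StringBuilder sb = new StringBuilder("[");
        Iterator<?> it = collection.iterator();
        int printed = 0;
        while (it.hasNext() && printed < limit) {
            if (printed > 0) {
                sb.append(", ");
            }
            sb.append(it.next()); // StringBuilder.append(Object) calls toString()
            printed++;
        }
        if (it.hasNext()) {
            sb.append(", ...");
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<CountingElement> data = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            data.add(new CountingElement(i));
        }

        // ArrayList inherits AbstractCollection.toString(), which renders every element.
        int fullLength = data.toString().length();
        System.out.println("full: " + fullLength + " chars, "
                + CountingElement.toStringCalls + " toString calls");

        CountingElement.toStringCalls = 0;
        String bounded = firstElements(data, 3);
        System.out.println("bounded: " + bounded + ", "
                + CountingElement.toStringCalls + " toString calls");
    }
}
```

Running `main` reports 10000 `toString` calls for the full rendering and 3 for the bounded one. With an element type whose `toString` is expensive, the first number is what dominates the cost of the log statements mentioned in the issue.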
[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge
[ https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413358#comment-15413358 ]

ASF GitHub Bot commented on FLINK-2090:
---

Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2323

Awesome! Thank you.
[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge
[ https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413355#comment-15413355 ]

ASF GitHub Bot commented on FLINK-2090:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2323

Looks good, merging this...
[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge
[ https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412528#comment-15412528 ]

ASF GitHub Bot commented on FLINK-2090:
---

Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2323

@StephanEwen I've updated the PR according to your review.
[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge
[ https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411965#comment-15411965 ]

ASF GitHub Bot commented on FLINK-2090:
---

Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2323

@StephanEwen Sorry, somehow I missed your comment. I'll update the PR today.
[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge
[ https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411840#comment-15411840 ]

ASF GitHub Bot commented on FLINK-2090:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2323

@mushketyk Are you going to update this pull request?
[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge
[ https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403709#comment-15403709 ]

ASF GitHub Bot commented on FLINK-2090:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2323

Looks good. Can you remove the Guava dependency, though? We try to avoid Guava as much as possible, because it causes too many dependency issues...
[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge
[ https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402762#comment-15402762 ]

ASF GitHub Bot commented on FLINK-2090:
---

Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2323

Set a maximum limit for the toString result, as suggested by Stephan here: https://issues.apache.org/jira/browse/FLINK-2090
[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge
[ https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402759#comment-15402759 ]

ASF GitHub Bot commented on FLINK-2090:
---

GitHub user mushketyk opened a pull request:

https://github.com/apache/flink/pull/2323

[FLINK-2090] toString of CollectionInputFormat takes long time when t…

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html). In addition to going through the list, please provide a meaningful description of your changes.

- [x] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the JIRA id)

- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed

…he collection is huge

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mushketyk/flink fast-to-string

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2323.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2323

commit 76c5b7dd1cf12b17b7601b2d1c8ea7cc475a031c
Author: Ivan Mushketyk
Date: 2016-08-01T19:39:17Z

    [FLINK-2090] toString of CollectionInputFormat takes long time when the collection is huge
[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge
[ https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401363#comment-15401363 ]

Ivan Mushketyk commented on FLINK-2090:
---

I will fix this.
[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge
[ https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14559083#comment-14559083 ]

Stephan Ewen commented on FLINK-2090:
-

How about we print at most N (= 3 or 10) elements? After each element, we check whether the string buffer has more than 100 characters. If it does, we stop there.
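A minimal sketch of Stephan's proposal, assuming N = 10 and the 100-character threshold. `BoundedToString`, `boundedToString`, `MAX_ELEMENTS`, and `MAX_CHARS` are hypothetical names, and this is not necessarily the code merged in PR #2323. It uses only a plain `Iterator` and `StringBuilder` from the JDK, so it needs no Guava.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;

public final class BoundedToString {

    // Limits taken from the proposal (N = 10, 100 characters);
    // the constant names are hypothetical.
    static final int MAX_ELEMENTS = 10;
    static final int MAX_CHARS = 100;

    // Renders at most MAX_ELEMENTS elements. Before each further element it
    // checks whether the buffer already exceeds MAX_CHARS and, if so, stops.
    static String boundedToString(Collection<?> collection) {
        StringBuilder sb = new StringBuilder("[");
        Iterator<?> it = collection.iterator();
        int count = 0;
        while (it.hasNext()) {
            if (count == MAX_ELEMENTS || sb.length() > MAX_CHARS) {
                sb.append(", ..."); // more elements exist but are not rendered
                break;
            }
            if (count > 0) {
                sb.append(", ");
            }
            sb.append(it.next());
            count++;
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        String longElement = String.join("", Collections.nCopies(80, "x"));
        System.out.println(boundedToString(Arrays.asList(1, 2, 3)));
        System.out.println(boundedToString(Collections.nCopies(1_000_000, 7)));
        System.out.println(boundedToString(Collections.nCopies(5, longElement)));
    }
}
```

The first call prints `[1, 2, 3]`, the second stops after ten elements (`[7, 7, 7, 7, 7, 7, 7, 7, 7, 7, ...]`), and the third stops after two 80-character elements because the buffer has passed 100 characters. Since the length check runs between elements, a single long element can still push the result past 100 characters; the check only guarantees that no further element's `toString` is invoked once the threshold is crossed.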