[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge

2016-08-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413521#comment-15413521
 ] 

ASF GitHub Bot commented on FLINK-2090:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2323


> toString of CollectionInputFormat takes long time when the collection is huge
> -
>
> Key: FLINK-2090
> URL: https://issues.apache.org/jira/browse/FLINK-2090
> Project: Flink
> Issue Type: Improvement
> Reporter: Till Rohrmann
> Assignee: Ivan Mushketyk
> Priority: Minor
> Fix For: 1.2.0
>
>
> The {{toString}} method of {{CollectionInputFormat}} calls {{toString}} on
> its underlying {{Collection}}. Thus, {{toString}} is called for each element
> of the collection. If the {{Collection}} contains many elements or the
> individual {{toString}} calls take a long time, then the string generation
> can take a considerable amount of time. [~mikiobraun] noticed this when he
> inserted several jBLAS matrices into Flink.
> The {{toString}} method is mainly used for logging statements in
> {{DataSourceNode}}'s {{computeOperatorSpecificDefaultEstimates}} method and
> in {{JobGraphGenerator.getDescriptionForUserCode}}. I'm wondering whether it
> is necessary to print the complete content of the underlying {{Collection}},
> or whether it would be enough to print only the first 3 elements in the
> {{toString}} method.
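
The cost described in the issue can be reproduced outside Flink. The following is only a minimal sketch: `ToStringCost`, `Counted` and `callsFor` are hypothetical names made up for illustration, and a plain `ArrayList` stands in for the collection held by `CollectionInputFormat`. It counts how many element-level `toString` calls a single `toString` on the collection triggers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ToStringCost {

    // Counts every element-level toString() invocation.
    static final AtomicInteger CALLS = new AtomicInteger();

    /** Hypothetical element type that records each time it is rendered. */
    static final class Counted {
        @Override
        public String toString() {
            CALLS.incrementAndGet();
            return "x";
        }
    }

    /** Returns how many element toString() calls one collection toString() causes. */
    static int callsFor(int size) {
        List<Counted> data = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            data.add(new Counted());
        }
        CALLS.set(0);
        // ArrayList inherits AbstractCollection.toString(), which renders every element.
        data.toString();
        return CALLS.get();
    }

    public static void main(String[] args) {
        System.out.println(callsFor(100_000));
    }
}
```

The number of calls grows linearly with the collection size, so an expensive element `toString` (such as a large matrix) is paid once per element on every log statement.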



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge

2016-08-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413358#comment-15413358
 ] 

ASF GitHub Bot commented on FLINK-2090:
---

Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2323
  
Awesome! Thank you.




[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge

2016-08-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413355#comment-15413355
 ] 

ASF GitHub Bot commented on FLINK-2090:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2323
  
Looks good, merging this...




[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge

2016-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412528#comment-15412528
 ] 

ASF GitHub Bot commented on FLINK-2090:
---

Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2323
  
@StephanEwen I've updated the PR according to your review.




[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge

2016-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411965#comment-15411965
 ] 

ASF GitHub Bot commented on FLINK-2090:
---

Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2323
  
@StephanEwen Sorry, somehow I missed your comment. I'll update the PR today.




[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge

2016-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411840#comment-15411840
 ] 

ASF GitHub Bot commented on FLINK-2090:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2323
  
@mushketyk Are you going to update this pull request?




[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge

2016-08-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403709#comment-15403709
 ] 

ASF GitHub Bot commented on FLINK-2090:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2323
  
Looks good. Can you remove the Guava dependency, though? We try to avoid 
Guava as much as possible, because it causes too many dependency issues...




[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge

2016-08-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402762#comment-15402762
 ] 

ASF GitHub Bot commented on FLINK-2090:
---

Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2323
  
Set maximum limit for the toString result, as suggested by Stephan here: 
https://issues.apache.org/jira/browse/FLINK-2090 




[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge

2016-08-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402759#comment-15402759
 ] 

ASF GitHub Bot commented on FLINK-2090:
---

GitHub user mushketyk opened a pull request:

https://github.com/apache/flink/pull/2323

[FLINK-2090] toString of CollectionInputFormat takes long time when t…

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful description of your changes.

- [x] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the JIRA id)

- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed

…he collection is huge

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mushketyk/flink fast-to-string

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2323.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2323


commit 76c5b7dd1cf12b17b7601b2d1c8ea7cc475a031c
Author: Ivan Mushketyk 
Date:   2016-08-01T19:39:17Z

[FLINK-2090] toString of CollectionInputFormat takes long time when the 
collection is huge






[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge

2016-07-31 Thread Ivan Mushketyk (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401363#comment-15401363
 ] 

Ivan Mushketyk commented on FLINK-2090:
---

 I will fix this.



[jira] [Commented] (FLINK-2090) toString of CollectionInputFormat takes long time when the collection is huge

2015-05-26 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14559083#comment-14559083
 ] 

Stephan Ewen commented on FLINK-2090:
-

How about we print at most N (= 3 or 10) elements? After each element, we check 
whether the string buffer has more than 100 characters. If yes, we abort there.
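
The suggestion above can be sketched roughly as follows. This is an illustration only, not the code that was merged in the pull request: the class name `BoundedToString`, the method `describe`, and the constants `MAX_ELEMENTS` and `MAX_LENGTH` are assumptions, with the limits taken from the numbers floated in the comment (N = 10, 100 characters). It uses plain JDK classes only.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

public class BoundedToString {

    // Hypothetical limits, based on the values mentioned in the comment.
    static final int MAX_ELEMENTS = 10;
    static final int MAX_LENGTH = 100;

    /**
     * Renders at most MAX_ELEMENTS elements and stops early as soon as the
     * buffer holds more than MAX_LENGTH characters.
     */
    static String describe(Collection<?> data) {
        StringBuilder sb = new StringBuilder("[");
        Iterator<?> it = data.iterator();
        int printed = 0;
        while (it.hasNext() && printed < MAX_ELEMENTS) {
            if (printed > 0) {
                sb.append(", ");
            }
            sb.append(it.next());
            printed++;
            // Check the buffer size after each element and abort if it is too long.
            if (sb.length() > MAX_LENGTH) {
                break;
            }
        }
        // Mark the output as truncated when elements were left out.
        if (it.hasNext()) {
            sb.append(", ...");
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<Integer> big = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            big.add(i);
        }
        System.out.println(describe(big)); // prints "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...]"
    }
}
```

With this shape the number of element `toString` calls is bounded by `MAX_ELEMENTS` regardless of the collection size. A single very large element is still rendered in full, because the length check only runs after each element has been appended.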
