[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199434#comment-16199434 ]

Haibo Chen commented on HADOOP-12436:

[~aw] [~mattpaduano] This seems an incompatible change given GlobFilter and RegexFilter are Public Evolving. Hence, I have added an incompatible tag. Feel free to remove it if you disagree

> GlobPattern regex library has performance issues with wildcard characters
>
> Key: HADOOP-12436
> URL: https://issues.apache.org/jira/browse/HADOOP-12436
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Affects Versions: 2.2.0, 2.7.1
> Reporter: Matthew Paduano
> Assignee: Matthew Paduano
> Fix For: 3.0.0-alpha1
>
> Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, HADOOP-12436.03.patch, HADOOP-12436.04.patch, HADOOP-12436.05.patch
>
> java.util.regex classes have performance problems with certain wildcard patterns. Namely, consecutive * characters in a file name (not properly escaped as literals) will cause commands such as "hadoop fs -ls file**name" to consume 100% CPU and probably never return in a reasonable time (time scales with number of *'s).
> Here is an example:
> {noformat}
> hadoop fs -touchz /user/mattp/job_1429571161900_4222-1430338332599-tda%2D%2D\\\+\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\+\\\+\\\+...%270%27%28Stage-1430338580443-39-2000-SUCCEEDED-production%2Dhigh-1430338340360.jhist
> hadoop fs -ls /user/mattp/job_1429571161900_4222-1430338332599-tda%2D%2D+**+++...%270%27%28Stage-1430338580443-39-2000-SUCCEEDED-production%2Dhigh-1430338340360.jhist
> {noformat}
> causes:
> {noformat}
> PID COMMAND %CPU TIME
> 14526 java 100.0 01:18.85
> {noformat}
> Not every string of *'s causes this, but the above filename reproduces this reliably.
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
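The slowdown described in the issue can be illustrated with a small sketch. When a glob is translated to a java.util.regex pattern, each unescaped * typically becomes its own ".*" group, and a backtracking matcher may retry every way of splitting the input among those groups when the overall match fails. The helper below is hypothetical: the class and method names are invented for illustration, it handles only * and ?, and it is not the change that was committed for this issue. It shows one way to sidestep the problem for the simple case by collapsing a run of consecutive * characters into a single ".*" before compiling.

```java
import java.util.regex.Pattern;

class GlobStarSketch {

    // Hypothetical helper (not a Hadoop API): translates a glob that uses only
    // '*' and '?' into a java.util.regex pattern string. A run of consecutive
    // '*' characters is emitted as a single ".*" instead of one ".*" per star.
    static String globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        boolean previousWasStar = false;
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*' || c == '?') {
                flushLiteral(literal, regex);
                if (c == '?') {
                    regex.append('.');
                } else if (!previousWasStar) {
                    regex.append(".*");
                }
                previousWasStar = (c == '*');
            } else {
                literal.append(c);
                previousWasStar = false;
            }
        }
        flushLiteral(literal, regex);
        return regex.toString();
    }

    // Quotes any pending literal text so regex metacharacters such as '+'
    // in a file name are matched verbatim.
    static void flushLiteral(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        String regex = globToRegex("file**name");
        System.out.println(regex);
        System.out.println(Pattern.matches(regex, "file-2015-name"));
    }
}
```

Collapsing stars only removes the redundant groups; a pattern with many separated wildcards can still backtrack heavily, which is why replacing the matching engine is the more general remedy.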
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255017#comment-15255017 ]

Harsh J commented on HADOOP-12436:

This change subtly fixes the issue described in HADOOP-13051 (test-case added there for regression's sake)
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969608#comment-14969608 ]

Hudson commented on HADOOP-12436:

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #569 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/569/])
HADOOP-12436. GlobPattern regex library has performance issues with (aw: rev 4c0bae240bea9a475e8ee9a0b081bfce6d1cd1e5)
* LICENSE.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java
* hadoop-common-project/hadoop-common/CHANGES.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobFilter.java
* hadoop-project/pom.xml
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java
* hadoop-common-project/hadoop-common/pom.xml
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969505#comment-14969505 ]

Hudson commented on HADOOP-12436:

FAILURE: Integrated in Hadoop-Yarn-trunk #1305 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1305/])
HADOOP-12436. GlobPattern regex library has performance issues with (aw: rev 4c0bae240bea9a475e8ee9a0b081bfce6d1cd1e5)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java
* hadoop-project/pom.xml
* LICENSE.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java
* hadoop-common-project/hadoop-common/CHANGES.txt
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobFilter.java
* hadoop-common-project/hadoop-common/pom.xml
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969975#comment-14969975 ]

Hudson commented on HADOOP-12436:

FAILURE: Integrated in Hadoop-Hdfs-trunk #2464 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2464/])
HADOOP-12436. GlobPattern regex library has performance issues with (aw: rev 4c0bae240bea9a475e8ee9a0b081bfce6d1cd1e5)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java
* LICENSE.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java
* hadoop-common-project/hadoop-common/CHANGES.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobFilter.java
* hadoop-common-project/hadoop-common/pom.xml
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java
* hadoop-project/pom.xml
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969865#comment-14969865 ]

Hudson commented on HADOOP-12436:

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #527 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/527/])
HADOOP-12436. GlobPattern regex library has performance issues with (aw: rev 4c0bae240bea9a475e8ee9a0b081bfce6d1cd1e5)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobFilter.java
* hadoop-project/pom.xml
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java
* LICENSE.txt
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java
* hadoop-common-project/hadoop-common/CHANGES.txt
* hadoop-common-project/hadoop-common/pom.xml
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969873#comment-14969873 ]

Hudson commented on HADOOP-12436:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2516 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2516/])
HADOOP-12436. GlobPattern regex library has performance issues with (aw: rev 4c0bae240bea9a475e8ee9a0b081bfce6d1cd1e5)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobFilter.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java
* hadoop-common-project/hadoop-common/CHANGES.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java
* LICENSE.txt
* hadoop-project/pom.xml
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java
* hadoop-common-project/hadoop-common/pom.xml
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969174#comment-14969174 ]

Hadoop QA commented on HADOOP-12436:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 10s | docker + precommit patch detected. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 3m 20s | trunk passed |
| +1 | compile | 4m 20s | trunk passed with JDK v1.8.0_60 |
| +1 | compile | 4m 3s | trunk passed with JDK v1.7.0_79 |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 33s | trunk passed |
| +1 | javadoc | 0m 50s | trunk passed with JDK v1.8.0_60 |
| +1 | javadoc | 1m 1s | trunk passed with JDK v1.7.0_79 |
| +1 | mvninstall | 1m 44s | the patch passed |
| +1 | compile | 4m 11s | the patch passed with JDK v1.8.0_60 |
| +1 | javac | 4m 11s | the patch passed |
| +1 | compile | 4m 3s | the patch passed with JDK v1.7.0_79 |
| +1 | javac | 4m 3s | the patch passed |
| +1 | checkstyle | 0m 14s | the patch passed |
| +1 | mvneclipse | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | findbugs | 1m 42s | the patch passed |
| +1 | javadoc | 0m 50s | the patch passed with JDK v1.8.0_60 |
| +1 | javadoc | 1m 1s | the patch passed with JDK v1.7.0_79 |
| +1 | unit | 7m 17s | hadoop-common in the patch passed with JDK v1.8.0_60. |
| +1 | unit | 8m 0s | hadoop-common in the patch passed with JDK v1.7.0_79. |
| +1 | asflicense | 0m 23s | Patch does not generate ASF License warnings. |
| | | 46m 30s | |

|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-10-22 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12767470/HADOOP-12436.05.patch |
| JIRA Issue | HADOOP-12436 |
| Optional Tests | asflicense javac javadoc mvninstall unit xml findbugs checkstyle compile |
| uname | Linux c46774885f7c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HADOOP-Build/patchprocess/apache-yetus-28a3a3d/dev-support/personality/hadoop.sh |
| git revision | trunk / 381610d |
| Default Java | 1.7.0_79 |
| Multi-JDK versions |
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969308#comment-14969308 ]

Hudson commented on HADOOP-12436:

FAILURE: Integrated in Hadoop-trunk-Commit #8689 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8689/])
HADOOP-12436. GlobPattern regex library has performance issues with (aw: rev 4c0bae240bea9a475e8ee9a0b081bfce6d1cd1e5)
* hadoop-project/pom.xml
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobFilter.java
* LICENSE.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java
* hadoop-common-project/hadoop-common/CHANGES.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java
* hadoop-common-project/hadoop-common/pom.xml
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969379#comment-14969379 ]

Hudson commented on HADOOP-12436:

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #584 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/584/])
HADOOP-12436. GlobPattern regex library has performance issues with (aw: rev 4c0bae240bea9a475e8ee9a0b081bfce6d1cd1e5)
* hadoop-common-project/hadoop-common/pom.xml
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java
* hadoop-common-project/hadoop-common/CHANGES.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java
* LICENSE.txt
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobFilter.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java
* hadoop-project/pom.xml
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959151#comment-14959151 ]

Hudson commented on HADOOP-12436:

FAILURE: Integrated in Hadoop-Yarn-trunk #1273 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1273/])
Revert "HADOOP-12436. GlobPattern regex library has performance issues (aw: rev dc45a7a7c4920a60424d60aca07a72a9eb909fe2)
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java
* hadoop-common-project/hadoop-common/CHANGES.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java
* LICENSE.txt
* hadoop-project/pom.xml
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java
* hadoop-common-project/hadoop-common/pom.xml
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959087#comment-14959087 ] Hudson commented on HADOOP-12436: - FAILURE: Integrated in Hadoop-trunk-Commit #8644 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8644/]) Revert "HADOOP-12436. GlobPattern regex library has performance issues (aw: rev dc45a7a7c4920a60424d60aca07a72a9eb909fe2) * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java * hadoop-project/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java * hadoop-common-project/hadoop-common/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java * LICENSE.txt > GlobPattern regex library has performance issues with wildcard characters > - > > Key: HADOOP-12436 > URL: https://issues.apache.org/jira/browse/HADOOP-12436 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.2.0, 2.7.1 >Reporter: Matthew Paduano >Assignee: Matthew Paduano > Fix For: 3.0.0 > > Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, > HADOOP-12436.03.patch, HADOOP-12436.04.patch > > > java.util.regex classes have performance problems with certain wildcard > patterns. Namely, consecutive * characters in a file name (not properly > escaped as literals) will cause commands such as "hadoop fs -ls > file**name" to consume 100% CPU and probably never return in a reasonable > time (time scales with number of *'s). 
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959111#comment-14959111 ] Hudson commented on HADOOP-12436: - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #550 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/550/]) Revert "HADOOP-12436. GlobPattern regex library has performance issues (aw: rev dc45a7a7c4920a60424d60aca07a72a9eb909fe2) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java * LICENSE.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java * hadoop-common-project/hadoop-common/pom.xml * hadoop-project/pom.xml * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java * hadoop-common-project/hadoop-common/CHANGES.txt > GlobPattern regex library has performance issues with wildcard characters > - > > Key: HADOOP-12436 > URL: https://issues.apache.org/jira/browse/HADOOP-12436 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.2.0, 2.7.1 >Reporter: Matthew Paduano >Assignee: Matthew Paduano > Fix For: 3.0.0 > > Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, > HADOOP-12436.03.patch, HADOOP-12436.04.patch > > > java.util.regex classes have performance problems with certain wildcard > patterns. Namely, consecutive * characters in a file name (not properly > escaped as literals) will cause commands such as "hadoop fs -ls > file**name" to consume 100% CPU and probably never return in a reasonable > time (time scales with number of *'s). 
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959085#comment-14959085 ]

Allen Wittenauer commented on HADOOP-12436:
---

So a few things:

a) we are clearly missing test coverage in common, since this issue wasn't detected there. Those tests should probably be either moved or at least replicated over in common for better, more complete testing.

b) we're hitting a (documented!) incompatibility between com.google.re2j.PatternSyntaxException and java.util.regex.PatternSyntaxException.

c) GlobPattern is Private, Evolving. GlobFilter is Public, Evolving, but it converts the PatternSyntaxException to IOException, so even though this is an incompatibility, no deprecation should be required.

That said, we should definitely scan the source for any other calls into GlobPattern to see if they are processing PatternSyntaxException.

> GlobPattern regex library has performance issues with wildcard characters
> -
>
> Key: HADOOP-12436
> URL: https://issues.apache.org/jira/browse/HADOOP-12436
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Affects Versions: 2.2.0, 2.7.1
> Reporter: Matthew Paduano
> Assignee: Matthew Paduano
> Fix For: 3.0.0
>
> Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch,
> HADOOP-12436.03.patch, HADOOP-12436.04.patch
>
>
> java.util.regex classes have performance problems with certain wildcard
> patterns. Namely, consecutive * characters in a file name (not properly
> escaped as literals) will cause commands such as "hadoop fs -ls
> file**name" to consume 100% CPU and probably never return in a reasonable
> time (time scales with number of *'s).
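To make point (c) of the comment above concrete, here is a minimal sketch of the wrapping pattern it describes: a caller compiles a pattern and turns a syntax error into an IOException, so downstream code never sees the regex library's own exception type. This is not the Hadoop source. The class name PatternCompileSketch and the method compileOrIOException are hypothetical, and the sketch uses only java.util.regex; it assumes nothing about the re2j API beyond what the comment states, namely that its PatternSyntaxException is a different class.

```java
import java.io.IOException;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not Hadoop code: it only illustrates the
// exception-wrapping idea from point (c) of the comment.
class PatternCompileSketch {

  // Compile a regex and report syntax errors as a checked IOException,
  // so callers do not depend on the regex library's exception class.
  static Pattern compileOrIOException(String regex) throws IOException {
    try {
      return Pattern.compile(regex);
    } catch (PatternSyntaxException e) {
      // This clause names java.util.regex.PatternSyntaxException. A class
      // with the same simple name from another package is a different type,
      // so swapping the regex library means revisiting every such clause.
      throw new IOException("Invalid pattern: " + regex, e);
    }
  }

  public static void main(String[] args) {
    try {
      compileOrIOException("[abc"); // unclosed character class
    } catch (IOException e) {
      System.out.println("caught: " + e.getCause().getClass().getName());
    }
  }
}
```

Callers of a wrapper like this only ever handle IOException, which is why the comment argues that no deprecation is needed for the public filter class. Code that catches the library exception directly is what the suggested source scan would need to find.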
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959576#comment-14959576 ] Hudson commented on HADOOP-12436: - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #502 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/502/]) Revert "HADOOP-12436. GlobPattern regex library has performance issues (aw: rev dc45a7a7c4920a60424d60aca07a72a9eb909fe2) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java * hadoop-common-project/hadoop-common/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-project/pom.xml * LICENSE.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java > GlobPattern regex library has performance issues with wildcard characters > - > > Key: HADOOP-12436 > URL: https://issues.apache.org/jira/browse/HADOOP-12436 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.2.0, 2.7.1 >Reporter: Matthew Paduano >Assignee: Matthew Paduano > Fix For: 3.0.0 > > Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, > HADOOP-12436.03.patch, HADOOP-12436.04.patch > > > java.util.regex classes have performance problems with certain wildcard > patterns. Namely, consecutive * characters in a file name (not properly > escaped as literals) will cause commands such as "hadoop fs -ls > file**name" to consume 100% CPU and probably never return in a reasonable > time (time scales with number of *'s). 
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959159#comment-14959159 ] Hudson commented on HADOOP-12436: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #2486 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2486/]) Revert "HADOOP-12436. GlobPattern regex library has performance issues (aw: rev dc45a7a7c4920a60424d60aca07a72a9eb909fe2) * hadoop-project/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java * hadoop-common-project/hadoop-common/pom.xml * LICENSE.txt * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java > GlobPattern regex library has performance issues with wildcard characters > - > > Key: HADOOP-12436 > URL: https://issues.apache.org/jira/browse/HADOOP-12436 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.2.0, 2.7.1 >Reporter: Matthew Paduano >Assignee: Matthew Paduano > Fix For: 3.0.0 > > Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, > HADOOP-12436.03.patch, HADOOP-12436.04.patch > > > java.util.regex classes have performance problems with certain wildcard > patterns. Namely, consecutive * characters in a file name (not properly > escaped as literals) will cause commands such as "hadoop fs -ls > file**name" to consume 100% CPU and probably never return in a reasonable > time (time scales with number of *'s). 
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959184#comment-14959184 ] Hudson commented on HADOOP-12436: - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #537 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/537/]) Revert "HADOOP-12436. GlobPattern regex library has performance issues (aw: rev dc45a7a7c4920a60424d60aca07a72a9eb909fe2) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java * hadoop-common-project/hadoop-common/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-project/pom.xml * LICENSE.txt > GlobPattern regex library has performance issues with wildcard characters > - > > Key: HADOOP-12436 > URL: https://issues.apache.org/jira/browse/HADOOP-12436 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.2.0, 2.7.1 >Reporter: Matthew Paduano >Assignee: Matthew Paduano > Fix For: 3.0.0 > > Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, > HADOOP-12436.03.patch, HADOOP-12436.04.patch > > > java.util.regex classes have performance problems with certain wildcard > patterns. Namely, consecutive * characters in a file name (not properly > escaped as literals) will cause commands such as "hadoop fs -ls > file**name" to consume 100% CPU and probably never return in a reasonable > time (time scales with number of *'s). 
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959450#comment-14959450 ] Hudson commented on HADOOP-12436: - SUCCESS: Integrated in Hadoop-Hdfs-trunk #2439 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2439/]) Revert "HADOOP-12436. GlobPattern regex library has performance issues (aw: rev dc45a7a7c4920a60424d60aca07a72a9eb909fe2) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java * LICENSE.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java * hadoop-common-project/hadoop-common/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-project/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java > GlobPattern regex library has performance issues with wildcard characters > - > > Key: HADOOP-12436 > URL: https://issues.apache.org/jira/browse/HADOOP-12436 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.2.0, 2.7.1 >Reporter: Matthew Paduano >Assignee: Matthew Paduano > Fix For: 3.0.0 > > Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, > HADOOP-12436.03.patch, HADOOP-12436.04.patch > > > java.util.regex classes have performance problems with certain wildcard > patterns. Namely, consecutive * characters in a file name (not properly > escaped as literals) will cause commands such as "hadoop fs -ls > file**name" to consume 100% CPU and probably never return in a reasonable > time (time scales with number of *'s). 
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957143#comment-14957143 ] Hudson commented on HADOOP-12436: - FAILURE: Integrated in Hadoop-trunk-Commit #8632 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8632/]) HADOOP-12436. GlobPattern regex library has performance issues with (aw: rev 0d77e85f0aa503fdb826886d867fe61c9e984073) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java * hadoop-project/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java * LICENSE.txt * hadoop-common-project/hadoop-common/pom.xml > GlobPattern regex library has performance issues with wildcard characters > - > > Key: HADOOP-12436 > URL: https://issues.apache.org/jira/browse/HADOOP-12436 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.2.0, 2.7.1 >Reporter: Matthew Paduano >Assignee: Matthew Paduano > Fix For: 3.0.0 > > Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, > HADOOP-12436.03.patch, HADOOP-12436.04.patch > > > java.util.regex classes have performance problems with certain wildcard > patterns. Namely, consecutive * characters in a file name (not properly > escaped as literals) will cause commands such as "hadoop fs -ls > file**name" to consume 100% CPU and probably never return in a reasonable > time (time scales with number of *'s). 
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957478#comment-14957478 ] Hudson commented on HADOOP-12436: - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #529 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/529/]) HADOOP-12436. GlobPattern regex library has performance issues with (aw: rev 0d77e85f0aa503fdb826886d867fe61c9e984073) * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java * hadoop-project/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java * LICENSE.txt * hadoop-common-project/hadoop-common/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java > GlobPattern regex library has performance issues with wildcard characters > - > > Key: HADOOP-12436 > URL: https://issues.apache.org/jira/browse/HADOOP-12436 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.2.0, 2.7.1 >Reporter: Matthew Paduano >Assignee: Matthew Paduano > Fix For: 3.0.0 > > Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, > HADOOP-12436.03.patch, HADOOP-12436.04.patch > > > java.util.regex classes have performance problems with certain wildcard > patterns. Namely, consecutive * characters in a file name (not properly > escaped as literals) will cause commands such as "hadoop fs -ls > file**name" to consume 100% CPU and probably never return in a reasonable > time (time scales with number of *'s). 
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957376#comment-14957376 ] Hudson commented on HADOOP-12436: - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #541 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/541/]) HADOOP-12436. GlobPattern regex library has performance issues with (aw: rev 0d77e85f0aa503fdb826886d867fe61c9e984073) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java * hadoop-project/pom.xml * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java * LICENSE.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java > GlobPattern regex library has performance issues with wildcard characters > - > > Key: HADOOP-12436 > URL: https://issues.apache.org/jira/browse/HADOOP-12436 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.2.0, 2.7.1 >Reporter: Matthew Paduano >Assignee: Matthew Paduano > Fix For: 3.0.0 > > Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, > HADOOP-12436.03.patch, HADOOP-12436.04.patch > > > java.util.regex classes have performance problems with certain wildcard > patterns. Namely, consecutive * characters in a file name (not properly > escaped as literals) will cause commands such as "hadoop fs -ls > file**name" to consume 100% CPU and probably never return in a reasonable > time (time scales with number of *'s). 
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957296#comment-14957296 ] Hudson commented on HADOOP-12436: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #2477 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2477/]) HADOOP-12436. GlobPattern regex library has performance issues with (aw: rev 0d77e85f0aa503fdb826886d867fe61c9e984073) * hadoop-project/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java * LICENSE.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java * hadoop-common-project/hadoop-common/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java * hadoop-common-project/hadoop-common/CHANGES.txt > GlobPattern regex library has performance issues with wildcard characters > - > > Key: HADOOP-12436 > URL: https://issues.apache.org/jira/browse/HADOOP-12436 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.2.0, 2.7.1 >Reporter: Matthew Paduano >Assignee: Matthew Paduano > Fix For: 3.0.0 > > Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, > HADOOP-12436.03.patch, HADOOP-12436.04.patch > > > java.util.regex classes have performance problems with certain wildcard > patterns. Namely, consecutive * characters in a file name (not properly > escaped as literals) will cause commands such as "hadoop fs -ls > file**name" to consume 100% CPU and probably never return in a reasonable > time (time scales with number of *'s). 
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957512#comment-14957512 ] Hudson commented on HADOOP-12436: - SUCCESS: Integrated in Hadoop-Yarn-trunk #1265 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1265/]) HADOOP-12436. GlobPattern regex library has performance issues with (aw: rev 0d77e85f0aa503fdb826886d867fe61c9e984073) * hadoop-project/pom.xml * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java * hadoop-common-project/hadoop-common/pom.xml * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java * LICENSE.txt > GlobPattern regex library has performance issues with wildcard characters > - > > Key: HADOOP-12436 > URL: https://issues.apache.org/jira/browse/HADOOP-12436 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.2.0, 2.7.1 >Reporter: Matthew Paduano >Assignee: Matthew Paduano > Fix For: 3.0.0 > > Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, > HADOOP-12436.03.patch, HADOOP-12436.04.patch > > > java.util.regex classes have performance problems with certain wildcard > patterns. Namely, consecutive * characters in a file name (not properly > escaped as literals) will cause commands such as "hadoop fs -ls > file**name" to consume 100% CPU and probably never return in a reasonable > time (time scales with number of *'s). 
> Here is an example:
> {noformat}
> hadoop fs -touchz /user/mattp/job_1429571161900_4222-1430338332599-tda%2D%2D\\\+\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\+\\\+\\\+...%270%27%28Stage-1430338580443-39-2000-SUCCEEDED-production%2Dhigh-1430338340360.jhist
> hadoop fs -ls /user/mattp/job_1429571161900_4222-1430338332599-tda%2D%2D+**+++...%270%27%28Stage-1430338580443-39-2000-SUCCEEDED-production%2Dhigh-1430338340360.jhist
> {noformat}
> causes:
> {noformat}
> PID    COMMAND  %CPU   TIME
> 14526  java     100.0  01:18.85
> {noformat}
> Not every string of *'s causes this, but the above filename reproduces this
> reliably.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
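The remark that time scales with the number of *'s can be illustrated with a toy model. The sketch below is not java.util.regex and not Hadoop's GlobPattern; `BacktrackCount` and its methods are hypothetical names introduced only for this illustration. It assumes a plain backtracking matcher in which each `*` may match any run of characters, and it counts recursive calls for a glob of the form `a***...b` against an input that has no trailing `b`, so every way of splitting the input among the stars has to be tried before the match fails.

```java
// Toy backtracking glob matcher (hypothetical; not Hadoop or java.util.regex code).
final class BacktrackCount {
  // Number of recursive calls made since the counter was last reset.
  static long steps;

  // Returns true if glob[g..] matches s[i..]; '*' matches any run of characters.
  static boolean match(String glob, int g, String s, int i) {
    steps++;
    if (g == glob.length()) {
      return i == s.length();
    }
    if (glob.charAt(g) == '*') {
      // Greedy: try the longest run first, then back off one character at a time.
      for (int end = s.length(); end >= i; end--) {
        if (match(glob, g + 1, s, end)) {
          return true;
        }
      }
      return false;
    }
    return i < s.length()
        && glob.charAt(g) == s.charAt(i)
        && match(glob, g + 1, s, i + 1);
  }

  // Builds a string of n copies of c (avoids String.repeat, which needs Java 11).
  static String repeat(char c, int n) {
    StringBuilder sb = new StringBuilder(n);
    for (int k = 0; k < n; k++) {
      sb.append(c);
    }
    return sb.toString();
  }

  // Counts calls for "a" + stars * '*' + "b" against "a" + inputLength * 'x'.
  static long countSteps(int stars, int inputLength) {
    String glob = "a" + repeat('*', stars) + "b";
    String input = "a" + repeat('x', inputLength); // no 'b', so the match must fail
    steps = 0;
    match(glob, 0, input, 0);
    return steps;
  }

  public static void main(String[] args) {
    for (int stars = 1; stars <= 5; stars++) {
      System.out.println(stars + " star(s): " + countSteps(stars, 20) + " steps");
    }
  }
}
```

In this model each additional `*` multiplies the number of failed attempts, which is consistent with the report that a long run of wildcards can keep a process at 100% CPU. How closely a real regex engine follows this growth depends on its implementation.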
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957660#comment-14957660 ]

Hudson commented on HADOOP-12436:
---------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #2433 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2433/])
HADOOP-12436. GlobPattern regex library has performance issues with (aw: rev 0d77e85f0aa503fdb826886d867fe61c9e984073)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java
* hadoop-common-project/hadoop-common/pom.xml
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java
* hadoop-common-project/hadoop-common/CHANGES.txt
* LICENSE.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java
* hadoop-project/pom.xml


> GlobPattern regex library has performance issues with wildcard characters
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-12436
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12436
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.2.0, 2.7.1
>            Reporter: Matthew Paduano
>            Assignee: Matthew Paduano
>             Fix For: 3.0.0
>
>         Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, HADOOP-12436.03.patch, HADOOP-12436.04.patch
>
>
> java.util.regex classes have performance problems with certain wildcard
> patterns. Namely, consecutive * characters in a file name (not properly
> escaped as literals) will cause commands such as "hadoop fs -ls
> file**name" to consume 100% CPU and probably never return in a reasonable
> time (time scales with number of *'s).
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957928#comment-14957928 ]

Hudson commented on HADOOP-12436:
---------------------------------

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #496 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/496/])
HADOOP-12436. GlobPattern regex library has performance issues with (aw: rev 0d77e85f0aa503fdb826886d867fe61c9e984073)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/AbstractPatternFilter.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java
* hadoop-project/pom.xml
* LICENSE.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/RegexFilter.java
* hadoop-common-project/hadoop-common/CHANGES.txt
* hadoop-common-project/hadoop-common/pom.xml
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestGlobPattern.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/filter/GlobFilter.java


> GlobPattern regex library has performance issues with wildcard characters
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-12436
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12436
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.2.0, 2.7.1
>            Reporter: Matthew Paduano
>            Assignee: Matthew Paduano
>             Fix For: 3.0.0
>
>         Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, HADOOP-12436.03.patch, HADOOP-12436.04.patch
>
>
> java.util.regex classes have performance problems with certain wildcard
> patterns. Namely, consecutive * characters in a file name (not properly
> escaped as literals) will cause commands such as "hadoop fs -ls
> file**name" to consume 100% CPU and probably never return in a reasonable
> time (time scales with number of *'s).
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956164#comment-14956164 ]

Hadoop QA commented on HADOOP-12436:
------------------------------------

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 21m 19s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 9m 3s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 12m 46s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 30s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 28s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 2m 4s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 49s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 21s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 8m 46s | Tests passed in hadoop-common. |
| | | 59m 10s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12766441/HADOOP-12436.04.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 40cac59 |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HADOOP-Build/7807/artifact/patchprocess/testrun_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/7807/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/7807/console |


This message was automatically generated.

> GlobPattern regex library has performance issues with wildcard characters
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-12436
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12436
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.2.0, 2.7.1
>            Reporter: Matthew Paduano
>            Assignee: Matthew Paduano
>         Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, HADOOP-12436.03.patch, HADOOP-12436.04.patch
>
>
> java.util.regex classes have performance problems with certain wildcard
> patterns. Namely, consecutive * characters in a file name (not properly
> escaped as literals) will cause commands such as "hadoop fs -ls
> file**name" to consume 100% CPU and probably never return in a reasonable
> time (time scales with number of *'s).
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955873#comment-14955873 ]

Allen Wittenauer commented on HADOOP-12436:
-------------------------------------------

*sigh* Need a rebase'd patch. :(

> GlobPattern regex library has performance issues with wildcard characters
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-12436
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12436
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.2.0, 2.7.1
>            Reporter: Matthew Paduano
>            Assignee: Matthew Paduano
>         Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, HADOOP-12436.03.patch
>
>
> java.util.regex classes have performance problems with certain wildcard
> patterns. Namely, consecutive * characters in a file name (not properly
> escaped as literals) will cause commands such as "hadoop fs -ls
> file**name" to consume 100% CPU and probably never return in a reasonable
> time (time scales with number of *'s).
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951464#comment-14951464 ]

Hadoop QA commented on HADOOP-12436:
------------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 20m 27s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 8m 17s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 31s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 0m 18s | The applied patch generated 1 release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 10s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 53s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 6m 36s | Tests failed in hadoop-common. |
| | | 51m 28s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.net.TestDNS |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12765930/HADOOP-12436.03.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / def374e |
| Release Audit | https://builds.apache.org/job/PreCommit-HADOOP-Build/7790/artifact/patchprocess/patchReleaseAuditProblems.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HADOOP-Build/7790/artifact/patchprocess/testrun_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/7790/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/7790/console |


This message was automatically generated.

> GlobPattern regex library has performance issues with wildcard characters
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-12436
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12436
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.2.0, 2.7.1
>            Reporter: Matthew Paduano
>            Assignee: Matthew Paduano
>         Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, HADOOP-12436.03.patch
>
>
> java.util.regex classes have performance problems with certain wildcard
> patterns. Namely, consecutive * characters in a file name (not properly
> escaped as literals) will cause commands such as "hadoop fs -ls
> file**name" to consume 100% CPU and probably never return in a reasonable
> time (time scales with number of *'s).
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951296#comment-14951296 ]

Allen Wittenauer commented on HADOOP-12436:
-------------------------------------------

Argh. I forgot to mention the 'paperwork' component. The re2 license should be added to the LICENSE file at the root of the tree. There are some examples there for other, non-Apache projects.

> GlobPattern regex library has performance issues with wildcard characters
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-12436
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12436
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.2.0, 2.7.1
>            Reporter: Matthew Paduano
>            Assignee: Matthew Paduano
>         Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch
>
>
> java.util.regex classes have performance problems with certain wildcard
> patterns. Namely, consecutive * characters in a file name (not properly
> escaped as literals) will cause commands such as "hadoop fs -ls
> file**name" to consume 100% CPU and probably never return in a reasonable
> time (time scales with number of *'s).
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951564#comment-14951564 ]

Hadoop QA commented on HADOOP-12436:
------------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 41s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 8m 4s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 24s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 0m 20s | The applied patch generated 1 release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 6s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 52s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 6m 54s | Tests passed in hadoop-common. |
| | | 48m 27s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12765930/HADOOP-12436.03.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / def374e |
| Release Audit | https://builds.apache.org/job/PreCommit-HADOOP-Build/7792/artifact/patchprocess/patchReleaseAuditProblems.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HADOOP-Build/7792/artifact/patchprocess/testrun_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/7792/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/7792/console |


This message was automatically generated.

> GlobPattern regex library has performance issues with wildcard characters
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-12436
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12436
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.2.0, 2.7.1
>            Reporter: Matthew Paduano
>            Assignee: Matthew Paduano
>         Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch, HADOOP-12436.03.patch
>
>
> java.util.regex classes have performance problems with certain wildcard
> patterns. Namely, consecutive * characters in a file name (not properly
> escaped as literals) will cause commands such as "hadoop fs -ls
> file**name" to consume 100% CPU and probably never return in a reasonable
> time (time scales with number of *'s).
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951021#comment-14951021 ]

Allen Wittenauer commented on HADOOP-12436:
-------------------------------------------

bq. + 1.0

This should be parameterized from hadoop-project/pom.xml.

> GlobPattern regex library has performance issues with wildcard characters
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-12436
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12436
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.2.0, 2.7.1
>            Reporter: Matthew Paduano
>            Assignee: Matthew Paduano
>         Attachments: HADOOP-12436.01.patch
>
>
> java.util.regex classes have performance problems with certain wildcard
> patterns. Namely, consecutive * characters in a file name (not properly
> escaped as literals) will cause commands such as "hadoop fs -ls
> file**name" to consume 100% CPU and probably never return in a reasonable
> time (time scales with number of *'s).
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943888#comment-14943888 ]

Colin Patrick McCabe commented on HADOOP-12436:
-----------------------------------------------

Thanks for this, [~mattpaduano]. I don't think there are any API issues since {{GlobPattern}} has {{InterfaceAnnotation private}}. I do think adding a new dependency could be messy and we should consider shading it, since it seems like a small utility library.

> GlobPattern regex library has performance issues with wildcard characters
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-12436
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12436
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.2.0, 2.7.1
>            Reporter: Matthew Paduano
>            Assignee: Matthew Paduano
>         Attachments: HADOOP-12436.01.patch
>
>
> java.util.regex classes have performance problems with certain wildcard
> patterns. Namely, consecutive * characters in a file name (not properly
> escaped as literals) will cause commands such as "hadoop fs -ls
> file**name" to consume 100% CPU and probably never return in a reasonable
> time (time scales with number of *'s).
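As a point of comparison for the dependency concern, here is a minimal sketch of a mitigation that needs no new library: collapse each run of unescaped `*` into a single `*` before translating the glob to a regex. This is not what the attached patches do (they switch to com.google.re2j), and it is not Hadoop's GlobPattern code. The class `GlobStarSketch` and both methods are hypothetical, and the sketch assumes a simplified glob dialect in which `**` means the same as `*`, `?` matches one character, and a backslash escapes the next character.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not Hadoop's GlobPattern.
final class GlobStarSketch {

  // Collapses each run of unescaped '*' into one '*'; escaped characters are copied as-is.
  static String collapseStars(String glob) {
    StringBuilder out = new StringBuilder(glob.length());
    boolean prevStar = false;
    for (int i = 0; i < glob.length(); i++) {
      char c = glob.charAt(i);
      if (c == '\\' && i + 1 < glob.length()) {
        // Escaped character: keep the backslash and the literal that follows it.
        out.append(c).append(glob.charAt(++i));
        prevStar = false;
      } else if (c == '*') {
        if (!prevStar) {
          out.append(c);
        }
        prevStar = true;
      } else {
        out.append(c);
        prevStar = false;
      }
    }
    return out.toString();
  }

  // Assumed naive translation: '*' becomes ".*", '?' becomes ".", all else is quoted literally.
  static String toRegex(String glob) {
    StringBuilder regex = new StringBuilder();
    StringBuilder literal = new StringBuilder();
    for (int i = 0; i < glob.length(); i++) {
      char c = glob.charAt(i);
      if (c == '\\' && i + 1 < glob.length()) {
        literal.append(glob.charAt(++i));
      } else if (c == '*' || c == '?') {
        flushLiteral(regex, literal);
        regex.append(c == '*' ? ".*" : ".");
      } else {
        literal.append(c);
      }
    }
    flushLiteral(regex, literal);
    return regex.toString();
  }

  // Appends the pending literal text, quoted so regex metacharacters lose their meaning.
  private static void flushLiteral(StringBuilder regex, StringBuilder literal) {
    if (literal.length() > 0) {
      regex.append(Pattern.quote(literal.toString()));
      literal.setLength(0);
    }
  }

  public static void main(String[] args) {
    String glob = "file**name";
    Pattern p = Pattern.compile(toRegex(collapseStars(glob)), Pattern.DOTALL);
    System.out.println(collapseStars(glob));
    System.out.println(p.matcher("file123name").matches());
  }
}
```

This only removes the adjacent-wildcard case described in the report. A pattern with wildcards separated by literals can still backtrack heavily in a backtracking engine, which is one reason replacing the regex library was proposed instead.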
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935157#comment-14935157 ]

Daniel Templeton commented on HADOOP-12436:
-------------------------------------------

The tests passed, so that's a good sign. Have you tried spinning up a cluster with the patch and banging on it a bit?

> GlobPattern regex library has performance issues with wildcard characters
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-12436
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12436
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.2.0, 2.7.1
>            Reporter: Matthew Paduano
>            Assignee: Matthew Paduano
>         Attachments: HADOOP-12436.01.patch
>
>
> java.util.regex classes have performance problems with certain wildcard
> patterns. Namely, consecutive * characters in a file name (not properly
> escaped as literals) will cause commands such as "hadoop fs -ls
> file**name" to consume 100% CPU and probably never return in a reasonable
> time (time scales with number of *'s).
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905231#comment-14905231 ]

Matthew Paduano commented on HADOOP-12436:
------------------------------------------

Propose switching the java.util.regex library for com.google.re2j.

One possible concern: The public interface of GlobPattern does permit users to obtain a reference to the Pattern objects. re2j does not claim to be a drop in replacement, so this might break something somewhere.

Please find proposed patchfile attached.

> GlobPattern regex library has performance issues with wildcard characters
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-12436
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12436
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.2.0, 2.7.1
>            Reporter: Matthew Paduano
>            Assignee: Matthew Paduano
>
> java.util.regex classes have performance problems with certain wildcard
> patterns. Namely, consecutive * characters in a file name (not properly
> escaped as literals) will cause commands such as "hadoop fs -ls
> file**name" to consume 100% CPU and probably never return in a reasonable
> time (time scales with number of *'s).
[jira] [Commented] (HADOOP-12436) GlobPattern regex library has performance issues with wildcard characters
[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905557#comment-14905557 ]

Hadoop QA commented on HADOOP-12436:
------------------------------------

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 24s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 53s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 4s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 11s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 51s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 23m 21s | Tests passed in hadoop-common. |
| | | 64m 18s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12761973/HADOOP-12436.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1f707ec |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HADOOP-Build/7695/artifact/patchprocess/testrun_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/7695/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/7695/console |


This message was automatically generated.

> GlobPattern regex library has performance issues with wildcard characters
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-12436
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12436
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.2.0, 2.7.1
>            Reporter: Matthew Paduano
>            Assignee: Matthew Paduano
>         Attachments: HADOOP-12436.01.patch
>
>
> java.util.regex classes have performance problems with certain wildcard
> patterns. Namely, consecutive * characters in a file name (not properly
> escaped as literals) will cause commands such as "hadoop fs -ls
> file**name" to consume 100% CPU and probably never return in a reasonable
> time (time scales with number of *'s).