[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaiyao Ke updated TIKA-4254:
----------------------------
    Description: 
### Brief Description of the Bug

The test `TestMimeTypes#testJavaRegex` is non-idempotent: it passes in the 
first run but fails in the second run in the same environment. Each test 
execution creates a new media type (`MimeType`) instance `testType` (the same 
applies to `testType2`), and every execution registers the same name pattern 
`"rtg_sst_grb_0\\.5\\.\\d{8}"` for its instance. In the second execution, the 
line `this.repo.addPattern(testType, pattern, true);` therefore throws an 
exception, because the pattern is already registered to the `testType` 
instance created by the first execution. Specifically, the `addGlob()` method 
of the `Patterns` class detects the conflicting pattern and throws a 
`MimeTypeException` (line 123 in `Patterns.java`).
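
The mechanism can be illustrated with a minimal, self-contained sketch. It does 
not use Tika itself: `FakeType` and `FakeRegistry` are hypothetical stand-ins 
for `MimeType` and the shared repository, and the conflict rule (a pattern may 
only be re-registered by the instance that already owns it) is an assumption 
inferred from the behaviour described above, not a copy of the Tika code.

```java
import java.util.HashMap;
import java.util.Map;

class NonIdempotentSketch {

    /** Hypothetical stand-in for MimeType; only object identity matters here. */
    static final class FakeType {
        final String name;
        FakeType(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for the repository that outlives a single test run. */
    static final class FakeRegistry {
        private final Map<String, FakeType> owners = new HashMap<>();

        void addPattern(FakeType type, String pattern) {
            FakeType previous = owners.get(pattern);
            if (previous == null) {
                owners.put(pattern, type);          // first registration wins
            } else if (previous != type) {
                // assumed rule: a different instance may not reuse the pattern
                throw new IllegalStateException("Conflicting glob pattern: " + pattern);
            }
            // previous == type: already registered, nothing to do
        }
    }

    static final String PATTERN = "rtg_sst_grb_0\\.5\\.\\d{8}";

    /** Mimics the test body: a fresh type instance is created on every run. */
    static void testBody(FakeRegistry repo) {
        FakeType testType = new FakeType("example/type"); // hypothetical name
        repo.addPattern(testType, PATTERN);
    }

    /** Runs the test body once and reports whether it hit a conflict. */
    static boolean runConflicts(FakeRegistry repo) {
        try {
            testBody(repo);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        FakeRegistry shared = new FakeRegistry();   // state shared across runs
        System.out.println("first run conflicts:  " + runConflicts(shared));
        System.out.println("second run conflicts: " + runConflicts(shared));
    }
}
```

In this sketch the first run registers the pattern, and the second run fails 
because its freshly created instance is not the one that owns the pattern.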

### Failure Message in the 2nd Test Run:
```
org.apache.tika.mime.MimeTypeException: Conflicting glob pattern: rtg_sst_grb_0\.5\.\d{8}
        at org.apache.tika.mime.Patterns.addGlob(Patterns.java:123)
        at org.apache.tika.mime.Patterns.add(Patterns.java:71)
        at org.apache.tika.mime.MimeTypes.addPattern(MimeTypes.java:450)
        at org.apache.tika.mime.TestMimeTypes.testJavaRegex(TestMimeTypes.java:851)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
```

### Reproduce

Use the `NIOInspector` plugin that supports rerunning individual tests in the 
same environment:
```
cd tika-parsers/tika-parsers-standard/tika-parsers-standard-package
mvn edu.illinois:NIOInspector:rerun -Dtest=org.apache.tika.mime.TestMimeTypes#testJavaRegex
```

### Proposed Fix

Declare `testType` and `testType2` as static variables and initialize them at 
class loading time, so that repeated runs of `testJavaRegex()` reuse the same 
instances and do not conflict with each other. All tests pass and are 
idempotent after the fix.
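
A minimal sketch of the idea follows. It is not the actual patch: `FakeType` 
and `FakeRegistry` are hypothetical stand-ins for `MimeType` and the shared 
repository, and the sketch assumes the registry accepts a repeated 
registration when it comes from the instance that already owns the pattern.

```java
import java.util.HashMap;
import java.util.Map;

class StaticFixtureSketch {

    /** Hypothetical stand-in for MimeType; only object identity matters here. */
    static final class FakeType {
        final String name;
        FakeType(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for the repository that outlives a single test run. */
    static final class FakeRegistry {
        private final Map<String, FakeType> owners = new HashMap<>();

        void addPattern(FakeType type, String pattern) {
            FakeType previous = owners.get(pattern);
            if (previous == null) {
                owners.put(pattern, type);
            } else if (previous != type) {
                // assumed rule: a different instance may not reuse the pattern
                throw new IllegalStateException("Conflicting glob pattern: " + pattern);
            }
            // previous == type: repeated registration by the owner is a no-op
        }
    }

    static final String PATTERN = "rtg_sst_grb_0\\.5\\.\\d{8}";

    // Created once at class-loading time and shared by every run of the test body.
    static final FakeType TEST_TYPE = new FakeType("example/type"); // hypothetical name

    /** Mimics the fixed test body: it always registers the same static instance. */
    static void testBody(FakeRegistry repo) {
        repo.addPattern(TEST_TYPE, PATTERN);
    }

    public static void main(String[] args) {
        FakeRegistry shared = new FakeRegistry();   // state shared across runs
        testBody(shared);                           // first run registers the pattern
        testBody(shared);                           // second run reuses the same owner
        System.out.println("both runs completed without a conflict");
    }
}
```

Under that assumption, the static field makes the second run a repeat 
registration by the same owner rather than a competing one.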

### Necessity of Fix

A fix is recommended because unit tests should be idempotent, and state 
pollution should be avoided so that newly introduced tests do not fail in the 
future due to polluted shared state.



> The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the 
> first run and fails in repeated runs in the same environment. 
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-4254
>                 URL: https://issues.apache.org/jira/browse/TIKA-4254
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Kaiyao Ke
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
