Re: What happened to our document repository for type detection tests?

2024-06-22 Thread Dave Fisher



> On Jun 22, 2024, at 6:59 AM, Marcus wrote:
> 
> On 22.06.24 at 14:53, Bidouille wrote:
>>> I remember from the old days that the QA team at Sun/Oracle really had
>>> a lot of documents for general and special testing.
>>> 
>>> These were not part of the code repository and were loaded from their
>>> own test software. Maybe this is the link to the storage outside of
>>> the project.
>> If you have a URL, you can try to get them from the Wayback Machine:
>> https://wayback-api.archive.org/
> 
> They were stored on an internal server.

The Apache Tika and Apache POI projects make use of Common Crawl to create a 
large corpus for regression tests.

https://commoncrawl.org

Perhaps we can start to do the same? We can ask for help from Tika at 
[email protected] or POI at [email protected]

Best,
Dave

> 
> Marcus
> 
> 
> -
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 



Re: What happened to our document repository for type detection tests?

2024-06-22 Thread Marcus

On 22.06.24 at 14:53, Bidouille wrote:

>> I remember from the old days that the QA team at Sun/Oracle really had
>> a lot of documents for general and special testing.
>>
>> These were not part of the code repository and were loaded from their
>> own test software. Maybe this is the link to the storage outside of
>> the project.
>
> If you have a URL, you can try to get them from the Wayback Machine:
> https://wayback-api.archive.org/


They were stored on an internal server.

Marcus



Re: What happened to our document repository for type detection tests?

2024-06-22 Thread Pedro Lino
Hi Damjan

I believe Carl Marcum was working on automated testing related to file types 
many moons ago...

Maybe he has a backup copy or knows where these documents might be?

Best,
Pedro

> On 06/22/2024 2:56 AM WEST Damjan Jovanovic wrote:
> 
>  
> Hi
> 
> While doing some analysis and refactoring of our unit tests, I found a
> really interesting - and important - test.
> 
> It's in main/filter/qa/complex/filter/detection/typeDetection, and it
> iterates through a repository of documents, trying to get OpenOffice to
> detect the type of each document, and verifying it guessed correctly. This
> is important for preventing (and helping fix) bugs such as 126270, where
> some regression caused us to stop opening single-file OpenDocument files.
> 
> In that directory, TypeDetection.props has the path to the documents:
> # UNIX:
> #TestDocumentPath=file:///net/margritte/usr/qaapi/dev/cws/filtercfg/docTypes
> # WINDOWS
> TestDocumentPath=//margritte/qaapi/dev/cws/filtercfg/docTypes
> 
> and files.csv has their filenames, including:
> Writer/AoE2a.rtf
> Writer/Text_DOS.txt
> Writer/Word2000.doc
> Writer/Word2000_template.dot
> and many more.
> 
> Those documents are not in Git, and were never in trunk.
> 
> It would be good if we could get those documents back and continue testing
> them.
> 
> Does anyone know what happened to them?
> 
> Regards
> Damjan



Re: What happened to our document repository for type detection tests?

2024-06-22 Thread Bidouille




> I remember from the old days that the QA team at Sun/Oracle really had
> a lot of documents for general and special testing.
> 
> These were not part of the code repository and were loaded from their
> own test software. Maybe this is the link to the storage outside of
> the project.

If you have a URL, you can try to get them from the Wayback Machine:
https://wayback-api.archive.org/



Re: What happened to our document repository for type detection tests?

2024-06-22 Thread Marcus

On 22.06.24 at 03:56, Damjan Jovanovic wrote:

...

> Those documents are not in Git, and were never in trunk.
>
> It would be good if we could get those documents back and continue testing
> them.
>
> Does anyone know what happened to them?


I remember from the old days that the QA team at Sun/Oracle really had a
lot of documents for general and special testing.


These were not part of the code repository and were loaded from their 
own test software. Maybe this is the link to the storage outside of the 
project.


If these documents were not part of the handover to the ASF, then I 
think they are gone.


Marcus



What happened to our document repository for type detection tests?

2024-06-21 Thread Damjan Jovanovic
Hi

While doing some analysis and refactoring of our unit tests, I found a
really interesting - and important - test.

It's in main/filter/qa/complex/filter/detection/typeDetection, and it
iterates through a repository of documents, trying to get OpenOffice to
detect the type of each document, and verifying it guessed correctly. This
is important for preventing (and helping fix) bugs such as 126270, where
some regression caused us to stop opening single-file OpenDocument files.

In that directory, TypeDetection.props has the path to the documents:
# UNIX:
#TestDocumentPath=file:///net/margritte/usr/qaapi/dev/cws/filtercfg/docTypes
# WINDOWS
TestDocumentPath=//margritte/qaapi/dev/cws/filtercfg/docTypes

and files.csv has their filenames, including:
Writer/AoE2a.rtf
Writer/Text_DOS.txt
Writer/Word2000.doc
Writer/Word2000_template.dot
and many more.

Those documents are not in Git, and were never in trunk.

It would be good if we could get those documents back and continue testing
them.

Does anyone know what happened to them?

Regards
Damjan
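
For anyone who ends up with a partial copy of the documents, the setup
described above (a TestDocumentPath entry in TypeDetection.props plus a list
of relative filenames in files.csv) can be checked with a small script. This
is only a rough sketch in Python, not the actual test code: the function
names read_props and missing_documents are made up for illustration, and it
assumes that TestDocumentPath points at a plain local directory and that the
first column of files.csv holds the relative filename, with ";" as the
separator. Adjust both assumptions to the real file layout.

```python
import csv
from pathlib import Path


def read_props(props_path):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in Path(props_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_documents(props_path, csv_path, delimiter=";"):
    """Return the filenames listed in the CSV that are not found on disk.

    Assumptions (not confirmed by the real files): TestDocumentPath is a
    local directory, and column 0 of each CSV row is a path relative to it.
    """
    root = Path(read_props(props_path)["TestDocumentPath"])
    missing = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if not row or not row[0].strip():
                continue  # skip empty rows
            relative = row[0].strip()
            if not (root / relative).is_file():
                missing.append(relative)
    return missing
```

Calling missing_documents("TypeDetection.props", "files.csv") would then
return the list of documents that still need to be recovered.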