kept at: https://corpora.tika.apache.org/base/packaged/pdfs/pdfs_202011/
I think copies of the archived NYS Regents exams should be included as well: https://www.nysl.nysed.gov/regentsexams.htm

From that page, click the link in the one-liner "Browse all available Regents Exams": https://nysl.ptfs.com/knowvation/app/consolidatedSearch/#search/v=list,c=1,q=qs%3D%5B*%5D%2Cfacet-fields%3D%5Bbrowse1_ss%3A%22All%20Government%20Collections%22%3E%3Ebrowse2_ss%3A%22New%20York%20State%20Government%20Documents%22%3E%3Ebrowse3_ss%3A%22Education%20Department%22%3E%3Ebrowse4_ss%3A%22Office%20of%20Elementary%2C%20Middle%2C%20Secondary%20and%20Continuing%20Education%22%3E%3Ebrowse5_ss%3A%22Office%20of%20Standards%2C%20Assessment%20and%20Reporting%22%3E%3Ebrowse6_ss%3A%22Regents%20high%20school%20examinations%22%5D%2CqueryType%3D%5B16%5D,sm=s,b=t,bs=ALPH%3AASC,sb=1%3Atitle%3AASC,l=library1_lib

And why not include the more recent versions of the Regents exams (nysedregents.org) as well? Legally, they are in the public domain.

As part of my own research I am interested in corpora of multi-encoded texts containing not only "natural language", but also formulas, graphs, descriptive pictures, structural formulas of chemical compounds, ...

The nysl ptfs site obfuscates links behind a JavaScript wall. I think those links, or the links' content, should be more descriptive.

Who else would like to work on that?

lbrtchx
