These PDFs aren't meant for building an indexed database on a topic; the point is to have a bulk collection of widely different PDFs from many sources/creators.

Tilman

On 29.06.2022 at 02:20, Albretch Mueller wrote:
  kept at: https://corpora.tika.apache.org/base/packaged/pdfs/pdfs_202011/

  I think copies of the archived NYS Regents exams should be included:

  https://www.nysl.nysed.gov/regentsexams.htm

  then click the one-line link "Browse all available Regents Exams":

  
https://nysl.ptfs.com/knowvation/app/consolidatedSearch/#search/v=list,c=1,q=qs%3D%5B*%5D%2Cfacet-fields%3D%5Bbrowse1_ss%3A%22All%20Government%20Collections%22%3E%3Ebrowse2_ss%3A%22New%20York%20State%20Government%20Documents%22%3E%3Ebrowse3_ss%3A%22Education%20Department%22%3E%3Ebrowse4_ss%3A%22Office%20of%20Elementary%2C%20Middle%2C%20Secondary%20and%20Continuing%20Education%22%3E%3Ebrowse5_ss%3A%22Office%20of%20Standards%2C%20Assessment%20and%20Reporting%22%3E%3Ebrowse6_ss%3A%22Regents%20high%20school%20examinations%22%5D%2CqueryType%3D%5B16%5D,sm=s,b=t,bs=ALPH%3AASC,sb=1%3Atitle%3AASC,l=library1_lib

  and, why not, more recent versions of the Regents exams
(nysedregents.org) should be included as well.
Legally, they are public domain.

  As part of my own research I am interested in corpora of
multi-encoded texts containing not only "natural language" but also
formulas, graphs, descriptive pictures, structural formulas of
chemical compounds, and so on. The nysl.ptfs.com site hides its links
behind JavaScript. I think those links, or the content behind them,
should be more descriptive. Who else would like to work on that?
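The browse link quoted above is long mainly because its facet path is percent-encoded inside the URL fragment (the part after `#`, which the page's JavaScript reads). A minimal sketch under that assumption shows one way to turn such a link into a readable description. It uses Python's standard `urllib.parse` and `re`; `browse_path` and `REGENTS_BROWSE_URL` are hypothetical names used only for illustration. It assumes the `browseN_ss:"Label"` layout seen in this particular link and is not a documented interface of the site.

```python
import re
from urllib.parse import urlsplit, unquote

# The browse link quoted in the message above, copied verbatim (split for readability).
REGENTS_BROWSE_URL = (
    "https://nysl.ptfs.com/knowvation/app/consolidatedSearch/"
    "#search/v=list,c=1,q=qs%3D%5B*%5D%2Cfacet-fields%3D%5B"
    "browse1_ss%3A%22All%20Government%20Collections%22%3E%3E"
    "browse2_ss%3A%22New%20York%20State%20Government%20Documents%22%3E%3E"
    "browse3_ss%3A%22Education%20Department%22%3E%3E"
    "browse4_ss%3A%22Office%20of%20Elementary%2C%20Middle%2C%20Secondary"
    "%20and%20Continuing%20Education%22%3E%3E"
    "browse5_ss%3A%22Office%20of%20Standards%2C%20Assessment%20and%20Reporting%22%3E%3E"
    "browse6_ss%3A%22Regents%20high%20school%20examinations%22%5D%2C"
    "queryType%3D%5B16%5D,sm=s,b=t,bs=ALPH%3AASC,sb=1%3Atitle%3AASC,l=library1_lib"
)


def browse_path(url: str) -> list[str]:
    """Return the facet labels encoded in the URL fragment, top level first.

    Assumes (from this one example) that each level appears as
    browseN_ss:"Label" once the fragment is percent-decoded.
    """
    # The fragment is client-side state; the server never sees it.
    fragment = urlsplit(url).fragment
    decoded = unquote(fragment)  # e.g. %22 -> '"', %3E -> '>', %2C -> ','
    return re.findall(r'browse\d+_ss:"([^"]*)"', decoded)


if __name__ == "__main__":
    # Prints the collection hierarchy as one human-readable line.
    print(" > ".join(browse_path(REGENTS_BROWSE_URL)))
```

A decoded path like this could serve as a descriptive label next to each link. Fetching the PDFs themselves would still require going through the site's JavaScript front end.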

  lbrtchx

