Remaining rdflib-jsonld work:

- Content negotiation, so that e.g. @context: https://schema.org/ is correctly resolved by following the Link: header
- Only access external resources if RDFLIB_CONFIG or rdflibconfig['allow_access_external_resources'] is set, instead of by default, which is what 6.0 currently does:
  "URLInputSource can be abused to retrieve arbitrary documents if used naïvely"
  https://github.com/RDFLib/rdflib/issues/1369
- Should RDFlib cache @contexts with requests-level caching, using requests-cache or CacheControl or something else?
- Should RDFlib at least cache the contexts in the JSON-LD Recommended Context [so that @context: https://schema.org/ works out of the box]?
  https://github.com/w3c/json-ld-rc/blob/main/context.jsonld

On Sat, Jul 31, 2021 at 8:03 AM Wes Turner <wes.tur...@gmail.com> wrote:

> On Sat, Jul 31, 2021 at 6:45 AM Wes Turner <wes.tur...@gmail.com> wrote:
>
>> On Wed, Jul 28, 2021 at 4:49 AM Miel Vander Sande <miel.vandersa...@meemoo.be> wrote:
>>
>>> Hi Nick,
>>>
>>> TBH, it's pretty much a function that converts a Dict or a JSON file in a streaming fashion:
>>> https://github.com/viaacode/construction-site/blob/main/construction_site/parse_functions.py.
>>> I think it's a stand-alone thing; I don't plan anything extra on that specifically, with maybe the exception of a cmd interface (hence the proposed refactoring of csv2rdf)
>>
>> Profiling / [comparative] benchmarks with e.g. Scalene [1][2] and/or perfplot [3] (%timeit) [4][5] could be worthwhile.
>>
>> [1] https://awesomeopensource.com/project/plasma-umass/scalene?categoryPage=26
>> [2] https://github.com/plasma-umass/scalene
>> [3] https://github.com/nschloe/perfplot
>> [4] https://docs.python.org/3/library/timeit.html#timeit-command-line-interface
>> [5] https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit
>
> From https://twitter.com/westurner/status/1274571490688630785 plus further references:
>
> Other methods for CSV + transforms => RDF?
>> - #rdflib csv2rdf
>
>> - COW
>
> - url: https://github.com/CLARIAH/COW
> - desc: Integrated CSV to RDF converter, using CSVW and nanopublications
>
>> - #CSVW https://github.com/cldf/csvw/blob/master/README.md#see-also
>
> Could https://github.com/cldf/csvw/ be modified to support (1) json-ld-streaming; and (2) alternate csv parsers?
>
>> - @kidehen sponger / rdfm_yq_parse_csv()?
>
> - http://docs.openlinksw.com/virtuoso/rdfspongerprogrammerguide/#virtuosospongeroverviewcartarch
> - https://github.com/openlink/Virtuoso-RDFIzer-Mapper-Scripts/blob/master/rdf_mappers.sql
>
>> - #tarql
>
> - Web: https://tarql.github.io/
> - Src: https://github.com/tarql/tarql
> - ProgrammingLanguage: Java
>
>> - #csv2rdf GH topic: https://github.com/topics/csv2rdf
>
> #csv2rdf
> https://github.com/topics/csv2rdf
>
>> Adding columnar & dataset-level metadata *with URIs* is the value add here, IMHO #LR
>
> "7 metadata header rows (column label, property URI path, DataType, unit, accuracy, precision, significant figures)"
> https://wrdrd.github.io/docs/consulting/linkedreproducibility#csv-csvw-and-metadata-rows
>
> Example Table A with 7 metadata header rows:
> https://wrdrd.github.io/docs/consulting/linkedreproducibility#id4
>
> The csv2rdf tool would need to optionally read this additional metadata from either additional header rows or an external 'header' file.
>
>> ijson [6] looks like it has some interesting features; iterative, asyncio, push. How does the performance compare?
>>
>> [6] https://pypi.org/project/ijson/
>>
>>> I do plan to develop more components that assist scalable ETL, data-to-rdf like tasks. This includes a plugin for Apache Airflow ("provider"), which would be good as an RDFLib family repository.
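On the "7 metadata header rows" idea quoted above: here is a minimal sketch of how a reader could peel those rows off before the data, using only the standard library. The function name, the field names in `HEADER_FIELDS`, and the sample values are my own assumptions for illustration; this is not how csv2rdf works today.

```python
import csv
import io
from itertools import islice

# Assumed row order, following the quoted description of the 7 header rows.
HEADER_FIELDS = (
    "label", "property_uri", "datatype", "unit",
    "accuracy", "precision", "significant_figures",
)


def read_with_metadata_rows(fileobj, header_fields=HEADER_FIELDS):
    """Split a CSV stream into per-column metadata dicts and the data rows."""
    reader = csv.reader(fileobj)
    # Take exactly one leading row per metadata field.
    header_rows = list(islice(reader, len(header_fields)))
    if len(header_rows) != len(header_fields):
        raise ValueError("fewer header rows than metadata fields")
    # Transpose the header rows so that each column gets its own dict.
    columns = [dict(zip(header_fields, cells)) for cells in zip(*header_rows)]
    return columns, list(reader)


# Made-up sample data; the example.org URIs are placeholders.
SAMPLE = """\
elapsed,distance
http://example.org/elapsed,http://example.org/distance
xsd:double,xsd:double
unit:SecondTime,unit:Meter
0.1,0.5
2,2
2,2
1.5,3.0
2.5,4.1
"""

columns, rows = read_with_metadata_rows(io.StringIO(SAMPLE))
print(columns[0]["unit"], rows)
```

An external 'header' file could feed the same function: read the metadata rows from the sidecar and the data rows from the CSV itself.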
>>
>> - The datasette and dogsheep projects have a bunch of *-to-sqlite utils and an interface that a number of projects on PyPI have implemented:
>>   https://datasette.io/tools
>>   https://github.com/dogsheep
>>   https://pypi.org/search/?q=to-sqlite
>>   - https://datasette.io/tools/csvs-to-sqlite
>>   - https://github.com/simonw/csvs-to-sqlite/blob/a8a37a016790dc93270c74e32d0a5051bc5a0f4d/tests/test_csvs_to_sqlite.py#L417-L446
>>     - parse datetimes in CSVs
>>     - xsd:dateTime (and schema.org/Date and schema.org/dateCreated and schema.org/dateModified) specify that times are given in ISO 8601 formats
>>
>> What are the solutions for generating RDFS schema from CSVs and SQL tables?
>>
>> - https://pypi.org/project/tablib/
>>   https://github.com/jazzband/tablib/blob/master/tests/test_tablib.py
>>   - doesn't do anything with datatypes FWICS
>>
>> - https://sqlite-utils.datasette.io/en/stable/cli.html#showing-the-schema
>>   https://github.com/simonw/sqlite-utils/blob/main/sqlite_utils/db.py
>>
>>   def suggest_column_types:
>>   https://github.com/simonw/sqlite-utils/blob/c7e8d72be9fe8fe0811f685a18eebc637662d41b/sqlite_utils/utils.py#L29-L58
>>
>> - https://en.wikipedia.org/wiki/Knowledge_extraction
>>   https://en.wikipedia.org/wiki/Knowledge_extraction#Relational_databases_to_RDF
>>
>> - https://pypi.org/project/rdb2rdf/
>>   https://github.com/nisavid/pyrdb2rdf/blob/master/rdb2rdf/stores.py
>>   > PyRDB2RDF provides RDFLib with an interface to relational databases as RDF stores. The underlying data is accessed via SQLAlchemy. It is mapped to RDF according to the specifications of RDB2RDF. The corresponding RDF graph is represented as an RDFLib graph.
>>   >
>>   > Translating from relational data to RDF via direct mapping is currently supported. Translating in the other direction and mapping with R2RML are planned but not yet implemented.
>>
>> - https://pypi.org/project/rdfizer/
>>
>> - https://github.com/RDFLib/pyTARQL
>>   - Does this handle datetimes?
>>
>> - Generate JSONschema from JSON and SHACL from JSON-Schema:
>>   https://stackoverflow.com/questions/7341537/tool-to-generate-json-schema-from-json-data/30294535#30294535
>>   - https://pypi.org/project/genson/ has been recently updated
>>     - Src: https://github.com/wolverdude/genson/
>>   - https://github.com/mulesoft-labs/json-ld-schema
>>     https://github.com/mulesoft-labs/json-ld-schema#how-does-it-work
>>     > JSON-LD Schema defines a simple 'semantics' JSON-Schema vocabulary (effectively a JSON-Schema meta-schema) that reuses the official JSON Schema for JSON-LD to provide definitions for @context and @type properties. These annotations can be used to provide JSON-LD context for a JSON-Schema document. Provided this JSON-LD context, constraints over named 'properties' in a JSON Schema document can be understood as constraints over CURIES of JSON-LD documents following the context rules defined in the JSON-LD specification.
>>
>> ## CSVW: CSV on the Web
>>
>> - Homepage: https://w3c.github.io/csvw/
>> - Standard: https://www.w3.org/TR/tabular-data-model/
>> - Standard: https://www.w3.org/TR/tabular-metadata/
>> - Standard: https://www.w3.org/TR/csv2json/
>> - Standard: https://www.w3.org/TR/csv2rdf/
>> - Namespace: http://www.w3.org/ns/csvw#
>> - xmlns: `@prefix csvw: <http://www.w3.org/ns/csvw#> .`
>> - @context: https://www.w3.org/ns/csvw.jsonld
>>
>> CSVW (*CSV on the Web*) is a set of relatively new standards for representing :ref:`CSV` rows and columns as :ref:`RDF` (and :ref:`JSON` / :ref:`JSON-LD`) along with *metadata*.
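On the question above about generating schema information from CSVs: a minimal sketch of per-column datatype suggestion that returns XSD datatype URIs, standard library only. It is written in the spirit of the column-type sniffing linked above, not copied from it; `suggest_xsd_type` and `CANDIDATES` are hypothetical names, and the parsers are deliberately naive (for example, `int()` also accepts surrounding whitespace).

```python
from datetime import date, datetime

XSD = "http://www.w3.org/2001/XMLSchema#"

# Candidate datatypes, narrowest first; each parser raises ValueError on a mismatch.
CANDIDATES = (
    ("integer", int),
    ("double", float),
    ("date", date.fromisoformat),
    ("dateTime", datetime.fromisoformat),
)


def suggest_xsd_type(values):
    """Return the first candidate XSD datatype URI that every non-empty value parses as."""
    values = [v for v in values if v != ""]
    if not values:
        return XSD + "string"
    for name, parse in CANDIDATES:
        try:
            for value in values:
                parse(value)
        except ValueError:
            continue  # at least one value did not fit; try the next datatype
        return XSD + name
    return XSD + "string"  # fallback when nothing narrower fits


print(suggest_xsd_type(["1", "2", ""]))
print(suggest_xsd_type(["2021-07-31T08:03:00"]))
```

Sniffing like this is exactly the guesswork that explicit columnar metadata would make unnecessary, so it is probably best treated as a fallback for CSVs that arrive without any.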
>>
>> * URIs for datatypes (XSD)
>
> FWIU, there is not yet a vocabulary for physical units like meters**2 in the JSON-LD Recommended Context:
> https://w3c.github.io/json-ld-rc/context.jsonld
> https://github.com/w3c/json-ld-rc
> https://github.com/w3c/json-ld-rc/blob/main/context.jsonld
>
> QUDT is one such vocabulary:
> ```turtle
> qudt-quantity:Time
>   rdf:type qudt:SpaceAndTimeQuantityKind ;
>   rdfs:label "Time"^^xsd:string ;
>   qudt:description "Time is a basic component of the measuring system used to sequence events, to compare the durations of events and the intervals between them, and to quantify the motions of objects."^^xsd:string ;
>   qudt:symbol "T"^^xsd:string ;
>   skos:exactMatch <http://dbpedia.org/resource/Time> .
>
> # ...
> unit:SecondTime
>   rdf:type qudt:SIBaseUnit , qudt:TimeUnit ;
>   rdfs:label "Second"^^xsd:string ;
>   qudt:abbreviation "s"^^xsd:string ;
>   qudt:code "1615"^^xsd:string ;
>   qudt:conversionMultiplier "1"^^xsd:double ;
>   qudt:conversionOffset "0.0"^^xsd:double ;
>   qudt:symbol "s"^^xsd:string ;
>   skos:exactMatch <http://dbpedia.org/resource/Second> .
> # ...
> ```
> http://www.qudt.org/qudt/owl/1.0.0/unit/Instances.html#SecondTime
>
> ... We must be able to say that the numbers in a column have a physical unit with a URI, i.e. to specify columnar metadata so that downstream tools don't need to sniff and cast between datatypes, or lossily drop units from strings in column names:
>
> - (_datatype_, _physical_unit_):
>   - (float64, "unit:SecondTime",)
>   - (float64, unit["SecondTime"],)
>
>> * URIs for columns (RDF)
>> * Document Metadata
>> * CSV -> JSON (-> JSON-LD -> RDF)
>> * CSV -> RDF
>>
>> Could there be a file naming convention for specifying the extra CSVW header to apply_to or transform zero or more CSV files with?
>>
>> filename.csv
>> filename.csv.csvw
>> filename.csv.csvwheader.jsonld.json
>> filename.csv.csvw.jsonld.json
>
> ```python
> from pathlib import Path
>
> uri = 'filename.csv'
> # read_csvw() and read_csv() are placeholder readers; args and kwargs are the caller's.
> if Path(uri + '.csvw.jsonld.json').exists():  # sidecar metadata file next to the CSV
>     read_csvw(uri, *args, **kwargs)
> else:
>     read_csv(uri, *args, **kwargs)
> ```
>
>> https://www.w3.org/TR/tabular-data-primer/
>>
>>> Best,
>>>
>>> Miel
>>>
>>> On Wed, 28 Jul 2021 at 06:09, Nicholas Car <nicholas....@surroundaustralia.com> wrote:
>>>
>>>> Hi Miel,
>>>>
>>>> Yes, all offers of contribution are of interest! The CSV 2 RDF stuff is very old and many tools related to it, such as pyTARQL (https://github.com/RDFLib/pyTARQL), are missing. Are you planning on presenting JSON2RDF as a new plugin to RDFlib? That may be an option; however, remember that another option is just to present your tool's repository within RDFlib's family of repositories (i.e. within https://github.com/RDFLib). The choice will depend on how stable the tool is and how you see its future development going.
>>>>
>>>> But perhaps you have other things in mind? Whatever the case, we'd love to hear your plans.
>>>>
>>>> Cheers,
>>>>
>>>> Nick
>>>>
>>>> On Tue, Jul 27, 2021 at 5:56 PM Miel Vander Sande <miel.vandersa...@meemoo.be> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> A little late to the party, but what a great effort this is! Congrats on the release and thank you; this library is essential to my work and it makes RDF usable in ways other libraries can't.
>>>>>
>>>>> Sidenote: I have a streaming direct json-to-rdf mapping implementation (port of https://github.com/AtomGraph/JSON2RDF) that I'd like to contribute, possibly in combination with a refactoring of
>>>>> https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.tools.html#rdflib.tools.csv2rdf.CSV2RDF.
>>>>> Would that be of interest?
>>
>> Does JSON2RDF [need to] implement the w3c json-ld-streaming spec [7]?
>> [7] https://w3c.github.io/json-ld-streaming/
>>
>> - https://w3c.github.io/json-ld-streaming/#streaming-document-form
>> - https://w3c.github.io/json-ld-streaming/#streaming-rdf-form
>>
>>>>> Best,
>>>>>
>>>>> Miel
>>>>>
>>>>> On Tue, 20 Jul 2021 at 21:58, Natanael Arndt <arn...@gmail.com> wrote:
>>>>>
>>>>>> I've retweeted the tweet by jarven. But I don't use reddit or Hacker News; I think the semantic web mailing list would also be a good idea.
>>>>>>
>>>>>> If you'd like to post something in the channels, please do so.
>>>>>>
>>>>>> Natanael
>>>>>>
>>>>>> On 20 July 2021 at 20:14:09 CEST, Wes Turner <wes.tur...@gmail.com> wrote:
>>>>>> >Congrats and thanks!
>>>>>> >
>>>>>> >From the release notes on the Release:
>>>>>> >https://github.com/RDFLib/rdflib/releases/tag/6.0.0
>>>>>> >
>>>>>> >```
>>>>>> >6.0.0 is a major stable release that drops support for Python 2 and Python 3 < 3.7. Type hinting is now present in much of the toolkit as a result.
>>>>>> >
>>>>>> >It includes the formerly independent JSON-LD parser/serializer, improvements to Namespaces that allow for IDE namespace prompting, simplified use of g.serialize() (turtle default, no need to decode()) and many other updates to documentation, store backends and so on.
>>>>>> >
>>>>>> >Performance of the in-memory store has also improved since Python 3.6 dictionary improvements.
>>>>>> >
>>>>>> >There are numerous supplementary improvements to the toolkit too, such as:
>>>>>> >
>>>>>> >- inclusion of Docker files for easier CI/CD
>>>>>> >- black config files for standardised code formatting
>>>>>> >- improved testing with mock SPARQL stores, rather than a reliance on DBPedia etc
>>>>>> >```
>>>>>> >
>>>>>> >Have there been ANN posts to e.g. Hacker News and e.g. /r/semanticweb?
>>>>>> >
>>>>>> >On Tue, Jul 20, 2021, 10:23 Florent Georges <fgeor...@gmail.com> wrote:
>>>>>> >
>>>>>> >> Congratulations, and thank you all for the hard work!
>>>>>> >>
>>>>>> >> --
>>>>>> >> Florent Georges
>>>>>> >> H2O Consulting
>>>>>> >> http://h2o.consulting/
>>>>>> >>
>>>>>> >> On Tue, Jul 20, 2021, 16:00 Nicholas Car <nicholas....@surroundaustralia.com> wrote:
>>>>>> >>
>>>>>> >>> Hi all,
>>>>>> >>>
>>>>>> >>> Yes, 6.0.0 is out:
>>>>>> >>>
>>>>>> >>> - https://pypi.org/project/rdflib/6.0.0/
>>>>>> >>> - https://github.com/RDFLib/rdflib/releases/tag/6.0.0
>>>>>> >>>
>>>>>> >>> Please publicise this release: it has a lot of stuff since 5.0.0 in April last year.
>>>>>> >>>
>>>>>> >>> Thank you very much to all of you who contributed, in particular my co-maintainers, Ashley & Natanael and Edmond, Iwan, Tom, Remi, Harold and all the PR and Issue creators. Thanks also to the institutions that provided time for their staff to contribute.
>>>>>> >>>
>>>>>> >>> If you see issues, please let the co-maintainers know straight away: we are keen to get a 6.0.1 release out shortly (like weeks to a month) to speed up the RDFlib release cycle.
>>>>>> >>>
>>>>>> >>> Cheers,
>>>>>> >>>
>>>>>> >>> Nick
>>>>>> >>>
>>>>>> >>> --
>>>>>> >>> kind regards
>>>>>> >>> Dr Nicholas Car
>>>>>> >>> Data Systems Architect
>>>>>> >>>
>>>>>> >>> SURROUND Australia Pty Ltd and
>>>>>> >>> SURROUND NZ Limited
>>>>>> >>>
>>>>>> >>> Address Level 9, Nishi Building,
>>>>>> >>> 2 Phillip Law Street
>>>>>> >>> New Acton Canberra 2601
>>>>>> >>> Mobile +61 477 560 177
>>>>>> >>> Email nicholas....@surroundaustralia.com
>>>>>> >>> Website https://www.surroundaustralia.com
>>>>>> >>>
>>>>>> >>> Enhancing Intelligence Within Organisations
>>>>>> >>> delivering evidence that connects decisions to outcomes
>>>>>> >>>
>>>>>> >>> Dr Nicholas Car
>>>>>> >>> Adjunct Senior Lecturer
>>>>>> >>>
>>>>>> >>> Research School of Computer Science
>>>>>> >>>
>>>>>> >>> The Australian National University,
>>>>>> >>> Canberra ACT Australia
>>>>>> >>> +61 477 560 177
>>>>>> >>> nicholas....@anu.edu.au
>>>>>> >>> https://cs.anu.edu.au/people/nicholas-car
>>>>>> >>> https://orcid.org/0000-0002-8742-7730
>>>>>> >>>
>>>>>> >>> --
>>>>>> >>> http://github.com/RDFLib
>>>>>> >>> ---
>>>>>> >>> You received this message because you are subscribed to the Google Groups "rdflib-dev" group.
>>>>>> >>> To unsubscribe from this group and stop receiving emails from it, send an email to rdflib-dev+unsubscr...@googlegroups.com.
>>>>>> >>> To view this discussion on the web visit
>>>>>> >>> https://groups.google.com/d/msgid/rdflib-dev/CAP7nqh19yjpwB8EoHVqs5QzKug_rSq1X%2BfFHfnFtOJBdZ1RwYg%40mail.gmail.com
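PS, on the opt-in for external resources and the bundled contexts proposed at the top of this thread: a minimal sketch of what such a gate could look like, standard library only. The `rdflibconfig` dict, `BUNDLED_CONTEXTS` and `load_context` are hypothetical names for illustration (only the key name comes from the proposal); none of this is an existing RDFLib API, and the schema.org entry is an abbreviated stand-in rather than the real context document. It also does not follow Link: headers.

```python
import json
import urllib.request

# Hypothetical settings dict; only the key name comes from the proposal in this thread.
rdflibconfig = {"allow_access_external_resources": False}

# Hypothetical bundled contexts; the schema.org entry is an abbreviated stand-in.
BUNDLED_CONTEXTS = {
    "https://schema.org/": {"@context": {"@vocab": "http://schema.org/"}},
}


def load_context(url, config=rdflibconfig, bundled=BUNDLED_CONTEXTS):
    """Return a context document, touching the network only when explicitly allowed."""
    if url in bundled:
        # Bundled contexts work out of the box, with no network access.
        return bundled[url]
    if not config.get("allow_access_external_resources", False):
        raise PermissionError(f"external access is disabled; not fetching {url}")
    # Ask for JSON-LD explicitly when fetching is allowed.
    request = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


print(load_context("https://schema.org/"))
```

With the default config, any URL outside the bundled set raises PermissionError instead of being fetched, which is the behaviour the issue linked at the top asks for.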