On Sat, Jul 31, 2021 at 6:45 AM Wes Turner <wes.tur...@gmail.com> wrote:
> On Wed, Jul 28, 2021 at 4:49 AM Miel Vander Sande <miel.vandersa...@meemoo.be> wrote:
>
>> Hi Nick,
>>
>> TBH, it's pretty much a function that converts a Dict or a JSON file in a
>> streaming fashion:
>> https://github.com/viaacode/construction-site/blob/main/construction_site/parse_functions.py.
>> I think it's a stand-alone thing; I don't plan anything extra on that
>> specifically, with maybe the exception of a cmd interface (hence the
>> proposed refactoring of csv2rdf)
>>
>
> Profiling / [comparative] benchmarks with e.g. Scalene [1][2] and/or
> perfplot [3] (%timeit) [4][5] could be worthwhile.
>
> [1] https://awesomeopensource.com/project/plasma-umass/scalene?categoryPage=26
> [2] https://github.com/plasma-umass/scalene
> [3] https://github.com/nschloe/perfplot
> [4] https://docs.python.org/3/library/timeit.html#timeit-command-line-interface
> [5] https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit
>

From https://twitter.com/westurner/status/1274571490688630785 plus further references:

Other methods for CSV + transforms => RDF?

> - #rdflib csv2rdf
>
> - COW
  - url: https://github.com/CLARIAH/COW
  - desc: Integrated CSV to RDF converter, using CSVW and nanopublications
- #CSVW https://github.com/cldf/csvw/blob/master/README.md#see-also
> Could https://github.com/cldf/csvw/ be modified to support (1) json-ld-streaming; and (2) alternate csv parsers?
- @kidehen sponger / rdfm_yq_parse_csv()?
  - http://docs.openlinksw.com/virtuoso/rdfspongerprogrammerguide/#virtuosospongeroverviewcartarch
  - https://github.com/openlink/Virtuoso-RDFIzer-Mapper-Scripts/blob/master/rdf_mappers.sql
- #tarql
> - Web: https://tarql.github.io/
  - Src: https://github.com/tarql/tarql
  - ProgrammingLanguage: Java
- #csv2rdf GH topic: https://github.com/topics/csv2rdf

> #csv2rdf https://github.com/topics/csv2rdf
> Adding columnar & dataset-level metadata *with URIs* is the value add
> here, IMHO #LR

"7 metadata header rows (column label, property URI path, DataType, unit, accuracy, precision, significant figures)"
https://wrdrd.github.io/docs/consulting/linkedreproducibility#csv-csvw-and-metadata-rows

Example Table A with 7 metadata header rows:
https://wrdrd.github.io/docs/consulting/linkedreproducibility#id4

The csv2rdf tool would need to optionally read this additional metadata from either additional header rows or an external 'header' file.

> ijson [6] looks like it has some interesting features: iterative, asyncio,
> push. How does the performance compare?
>
> [6] https://pypi.org/project/ijson/
>
>> I do plan to develop more components that assist scalable ETL, data-to-rdf
>> like tasks. This includes a plugin for Apache Airflow ("provider"), which
>> would be good as a RDFLib family repository.
>>
>
> - The datasette and dogsheep projects have a bunch of *-to-sqlite utils
>   and an interface that a number of projects on PyPI have implemented:
>   https://datasette.io/tools
>   https://github.com/dogsheep
>
>   https://pypi.org/search/?q=to-sqlite
>   - https://datasette.io/tools/csvs-to-sqlite
>     - https://github.com/simonw/csvs-to-sqlite/blob/a8a37a016790dc93270c74e32d0a5051bc5a0f4d/tests/test_csvs_to_sqlite.py#L417-L446
>     - parse datetimes in CSVs
>     - xsd:dateTime (and schema.org/Date and schema.org/dateCreated
>       and schema.org/dateModified) specifies that time will be specified in
>       ISO 8601 formats
>
> What are the solutions for generating RDFS schema from CSVs and SQL tables?
>
> - https://pypi.org/project/tablib/
>   https://github.com/jazzband/tablib/blob/master/tests/test_tablib.py
>   - doesn't do anything with datatypes FWICS
>
> - https://sqlite-utils.datasette.io/en/stable/cli.html#showing-the-schema
>   https://github.com/simonw/sqlite-utils/blob/main/sqlite_utils/db.py
>
>   def suggest_column_types:
>   https://github.com/simonw/sqlite-utils/blob/c7e8d72be9fe8fe0811f685a18eebc637662d41b/sqlite_utils/utils.py#L29-L58
>
> - https://en.wikipedia.org/wiki/Knowledge_extraction
>   https://en.wikipedia.org/wiki/Knowledge_extraction#Relational_databases_to_RDF
>
> - https://pypi.org/project/rdb2rdf/
>   https://github.com/nisavid/pyrdb2rdf/blob/master/rdb2rdf/stores.py
>
>   > PyRDB2RDF provides RDFLib with an interface to relational databases as RDF stores. The underlying data is accessed via SQLAlchemy. It is mapped to RDF according to the specifications of RDB2RDF. The corresponding RDF graph is represented as an RDFLib graph.
>   >
>   > Translating from relational data to RDF via direct mapping is currently supported. Translating in the other direction and mapping with R2RML are planned but not yet implemented.
>
> - https://pypi.org/project/rdfizer/
>
> - https://github.com/RDFLib/pyTARQL
>   - Does this handle datetimes?
>
> - Generate JSONschema from JSON and SHACL from JSON-Schema:
>   https://stackoverflow.com/questions/7341537/tool-to-generate-json-schema-from-json-data/30294535#30294535
>   - https://pypi.org/project/genson/ has been recently updated
>     - Src: https://github.com/wolverdude/genson/
>   - https://github.com/mulesoft-labs/json-ld-schema
>     https://github.com/mulesoft-labs/json-ld-schema#how-does-it-work
>
>     > JSON-LD Schema defines a simple 'semantics' JSON-Schema vocabulary (effectively a JSON-Schema meta-schema) that reuses the official JSON Schema for JSON-LD to provide definitions for @context and @type properties. These annotations can be used to provide JSON-LD context for a JSON-Schema document.
>     > Provided this JSON-LD context, constraints over named 'properties' in a JSON Schema document can be understood as constraints over CURIES of JSON-LD documents following the context rules defined in the JSON-LD specification.
>
> ## CSVW: CSV on the Web
>
> - Homepage: https://w3c.github.io/csvw/
> - Standard: https://www.w3.org/TR/tabular-data-model/
> - Standard: https://www.w3.org/TR/tabular-metadata/
> - Standard: https://www.w3.org/TR/csv2json/
> - Standard: https://www.w3.org/TR/csv2rdf/
> - Namespace: http://www.w3.org/ns/csvw#
> - xmlns: `@prefix csvw: <http://www.w3.org/ns/csvw#> .`
> - @context: https://www.w3.org/ns/csvw.jsonld
>
> CSVW (*CSV on the Web*) is a set of relatively new standards
> for representing :ref:`CSV` rows and columns
> as :ref:`RDF` (and :ref:`JSON` / :ref:`JSON-LD`)
> along with *metadata*.
>
> * URIs for datatypes (XSD)
>

FWIU, there is not yet a vocabulary for physical units like meters**2 in the JSON-LD Recommended Context:
https://w3c.github.io/json-ld-rc/context.jsonld
https://github.com/w3c/json-ld-rc
https://github.com/w3c/json-ld-rc/blob/main/context.jsonld

QUDT is one such vocabulary for physical units:

```turtle
qudt-quantity:Time
  rdf:type qudt:SpaceAndTimeQuantityKind ;
  rdfs:label "Time"^^xsd:string ;
  qudt:description "Time is a basic component of the measuring system used to sequence events, to compare the durations of events and the intervals between them, and to quantify the motions of objects."^^xsd:string ;
  qudt:symbol "T"^^xsd:string ;
  skos:exactMatch <http://dbpedia.org/resource/Time> .

# ...

unit:SecondTime
  rdf:type qudt:SIBaseUnit , qudt:TimeUnit ;
  rdfs:label "Second"^^xsd:string ;
  qudt:abbreviation "s"^^xsd:string ;
  qudt:code "1615"^^xsd:string ;
  qudt:conversionMultiplier "1"^^xsd:double ;
  qudt:conversionOffset "0.0"^^xsd:double ;
  qudt:symbol "s"^^xsd:string ;
  skos:exactMatch <http://dbpedia.org/resource/Second> .

# ...
```

http://www.qudt.org/qudt/owl/1.0.0/unit/Instances.html#SecondTime

...
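As a rough sketch of how a unit identifier such as `unit:SecondTime` above might travel with a column's datatype rather than being buried in a column name: `ColumnSpec` and `cast_rows` are hypothetical names made up for illustration (they are not part of rdflib or any CSVW library), and the unit is kept as a plain CURIE string rather than resolved to a full URI.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical per-column metadata: label, datatype, optional unit CURIE."""
    label: str
    datatype: Callable[[str], object]
    unit: Optional[str] = None


def cast_rows(text: str, specs: List[ColumnSpec]) -> Iterator[Dict[str, object]]:
    """Cast each CSV cell with the declared datatype instead of sniffing it."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {spec.label: spec.datatype(row[spec.label]) for spec in specs}


# Assumed example data: one float column measured in seconds.
specs = [ColumnSpec("elapsed", float, "unit:SecondTime")]
rows = list(cast_rows("elapsed\n1.5\n2\n", specs))
units = {spec.label: spec.unit for spec in specs}
```

Keeping the unit next to the datatype means a downstream converter could emit it as column metadata without parsing it out of a header string.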
We must be able to say that the numbers in a column have a physical unit with a URI, and to specify columnar metadata so that downstream tools don't need to sniff and cast between datatypes or lossily drop units from strings in column names:

- (_datatype_, _physical_unit_):
  - (float64, "unit:SecondTime",)
  - (float64, unit["SecondTime"],)

> * URIs for columns (RDF)
> * Document Metadata
> * CSV -> JSON (-> JSON-LD -> RDF)
> * CSV -> RDF
>
> Could there be a file naming convention for specifying the extra CSVW
> header to apply_to or transform zero or more CSV files with?
>
> filename.csv
> filename.csv.csvw
> filename.csv.csvwheader.jsonld.json
> filename.csv.csvw.jsonld.json
>

```python
from pathlib import Path

uri = 'filename.csv'
# read_csvw and read_csv are placeholder names, not existing functions
if Path(uri + '.csvw.jsonld.json').exists():
    read_csvw(uri, *args, **kwargs)
else:
    read_csv(uri, *args, **kwargs)
```

>
> https://www.w3.org/TR/tabular-data-primer/
>
>
>> Best,
>>
>> Miel
>>
>> On Wed, Jul 28, 2021 at 06:09, Nicholas Car <nicholas....@surroundaustralia.com> wrote:
>>
>>> Hi Miel,
>>>
>>> Yes, all offers of contribution are of interest! The CSV 2 RDF stuff is
>>> very old and many tools related to it, such as pyTARQL
>>> (https://github.com/RDFLib/pyTARQL), are missing. Are you planning on
>>> presenting JSON2RDF as a new plugin to RDFlib? That may be an option;
>>> however, remember that another option is also just to present your tool's
>>> repository within RDFlib's family of repositories (i.e. within
>>> https://github.com/RDFLib), and the choice will depend on how stable the
>>> tool is and how you see its future development going.
>>>
>>> But perhaps you have other things in mind? Whatever the case, we'd love
>>> to hear your plans.
>>>
>>> Cheers,
>>>
>>> Nick
>>>
>>> On Tue, Jul 27, 2021 at 5:56 PM Miel Vander Sande <miel.vandersa...@meemoo.be> wrote:
>>>
>>>> Hi all,
>>>>
>>>> little late to the party, but what a great effort this is!
>>>> Congrats with the release and thank you; this library is super essential to my work
>>>> and it makes RDF usable in ways other libraries can't.
>>>>
>>>> Sidenote: I have a streaming direct json-to-rdf mapping implementation
>>>> (port of https://github.com/AtomGraph/JSON2RDF) that I'd like to
>>>> contribute, possibly in combination with a refactoring of
>>>> https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.tools.html#rdflib.tools.csv2rdf.CSV2RDF.
>>>> Would that be of interest?
>>>>
>>>
> Does JSON2RDF [need to] implement the w3c json-ld-streaming spec [7]?
> [7] https://w3c.github.io/json-ld-streaming/
>
> - https://w3c.github.io/json-ld-streaming/#streaming-document-form
> - https://w3c.github.io/json-ld-streaming/#streaming-rdf-form
>
>
>>
>>>> Best,
>>>>
>>>> Miel
>>>>
>>>> On Tue, Jul 20, 2021 at 21:58, Natanael Arndt <arn...@gmail.com> wrote:
>>>>
>>>>> I've retweeted the tweet by jarven. But I don't use reddit or hacker
>>>>> news. I think the semantic web mailing list would also be a good idea.
>>>>>
>>>>> If you'd like to post something in the channels, please do so.
>>>>>
>>>>> Natanael
>>>>>
>>>>> On July 20, 2021 at 20:14:09 CEST, Wes Turner <wes.tur...@gmail.com> wrote:
>>>>> >Congrats and thanks!
>>>>> >
>>>>> >From the release notes on the Release:
>>>>> >https://github.com/RDFLib/rdflib/releases/tag/6.0.0
>>>>> >
>>>>> >```
>>>>> >6.0.0 is a major stable release that drops support for Python 2 and Python 3 < 3.7. Type hinting is now present in much of the toolkit as a result.
>>>>> >
>>>>> >It includes the formerly independent JSON-LD parser/serializer, improvements to Namespaces that allow for IDE namespace prompting, simplified use of g.serialize() (turtle default, no need to decode()) and many other updates to documentation, store backends and so on.
>>>>> >
>>>>> >Performance of the in-memory store has also improved since Python 3.6 dictionary improvements.
>>>>> >
>>>>> >There are numerous supplementary improvements to the toolkit too, such as:
>>>>> >
>>>>> >- inclusion of Docker files for easier CI/CD
>>>>> >- black config files for standardised code formatting
>>>>> >- improved testing with mock SPARQL stores, rather than a reliance on DBPedia etc
>>>>> >```
>>>>> >
>>>>> >Have there been ANN posts to e.g. Hacker news and e.g. /r/semanticweb?
>>>>> >
>>>>> >On Tue, Jul 20, 2021, 10:23 Florent Georges <fgeor...@gmail.com> wrote:
>>>>> >
>>>>> >> Congratulations, and thank you all for the hard work!
>>>>> >>
>>>>> >> --
>>>>> >> Florent Georges
>>>>> >> H2O Consulting
>>>>> >> http://h2o.consulting/
>>>>> >>
>>>>> >> On Tue, Jul 20, 2021, 16:00 Nicholas Car <nicholas....@surroundaustralia.com> wrote:
>>>>> >>
>>>>> >>> Hi all,
>>>>> >>>
>>>>> >>> Yes, 6.0.0 is out:
>>>>> >>>
>>>>> >>> - https://pypi.org/project/rdflib/6.0.0/
>>>>> >>> - https://github.com/RDFLib/rdflib/releases/tag/6.0.0
>>>>> >>>
>>>>> >>> Please publicise this release: it has a lot of stuff since 5.0.0 in April
>>>>> >>> last year.
>>>>> >>>
>>>>> >>> Thank you very much to all of you who contributed, in particular my
>>>>> >>> co-maintainers, Ashley & Natanael and Edmond, Iwan, Tom, Remi, Harold and
>>>>> >>> all the PR and Issue creators. Thanks also to the institutions that
>>>>> >>> provided time for their staff to contribute.
>>>>> >>>
>>>>> >>> If you see issues, please let the co-maintainers know straight away: we
>>>>> >>> are keen to get a 6.0.1 release out shortly (like weeks to a month) to speed up
>>>>> >>> the RDFlib release cycle.
>>>>> >>>
>>>>> >>> Cheers,
>>>>> >>>
>>>>> >>> Nick
>>>>> >>>
>>>>> >>> --
>>>>> >>> kind regards
>>>>> >>> Dr Nicholas Car
>>>>> >>> Data Systems Architect
>>>>> >>>
>>>>> >>> SURROUND Australia Pty Ltd and
>>>>> >>> SURROUND NZ Limited
>>>>> >>>
>>>>> >>> Address Level 9, Nishi Building,
>>>>> >>> 2 Phillip Law Street
>>>>> >>> New Acton Canberra 2601
>>>>> >>> Mobile +61 477 560 177
>>>>> >>> Email nicholas....@surroundaustralia.com
>>>>> >>> Website https://www.surroundaustralia.com
>>>>> >>>
>>>>> >>> Enhancing Intelligence Within Organisations
>>>>> >>> delivering evidence that connects decisions to outcomes
>>>>> >>>
>>>>> >>> Dr Nicholas Car
>>>>> >>> Adjunct Senior Lecturer
>>>>> >>>
>>>>> >>> Research School of Computer Science
>>>>> >>>
>>>>> >>> The Australian National University,
>>>>> >>> Canberra ACT Australia
>>>>> >>> +61 477 560 177
>>>>> >>> nicholas....@anu.edu.au
>>>>> >>> https://cs.anu.edu.au/people/nicholas-car
>>>>> >>> https://orcid.org/0000-0002-8742-7730
>>>>> >>>
>>>>> >>> --
>>>>> >>> http://github.com/RDFLib
>>>>> >>> ---
>>>>> >>> You received this message because you are subscribed to the Google Groups "rdflib-dev" group.
>>>>> >>> To unsubscribe from this group and stop receiving emails from it, send an email to rdflib-dev+unsubscr...@googlegroups.com.
>>>>> >>> To view this discussion on the web visit https://groups.google.com/d/msgid/rdflib-dev/CAP7nqh19yjpwB8EoHVqs5QzKug_rSq1X%2BfFHfnFtOJBdZ1RwYg%40mail.gmail.com.