On Sat, Jul 31, 2021 at 6:45 AM Wes Turner <wes.tur...@gmail.com> wrote:

>
>
> On Wed, Jul 28, 2021 at 4:49 AM Miel Vander Sande <
> miel.vandersa...@meemoo.be> wrote:
>
>> Hi Nick,
>>
>> TBH, it's pretty much a function that converts a Dict or a JSON file in a
>> streaming fashion:
>> https://github.com/viaacode/construction-site/blob/main/construction_site/parse_functions.py.
>> I think it's a stand-alone thing; I don't plan anything extra on that
>> specifically, with maybe the exception of a cmd interface (hence the
>> proposed refactoring of csv2rdf)
>>
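
To make "direct, streaming dict-to-RDF" concrete, here is a minimal stdlib-only sketch. It is not the construction-site or AtomGraph JSON2RDF implementation; the `dict_to_triples` name, the `http://example.org/vocab#` base, and the choice to emit every scalar as a plain string literal are assumptions for illustration.

```python
import itertools
import json
from typing import Iterator, Tuple
from urllib.parse import quote

Triple = Tuple[str, str, str]


def dict_to_triples(obj: dict, base: str = "http://example.org/vocab#") -> Iterator[Triple]:
    """Lazily yield N-Triples-style (subject, predicate, object) terms.

    Assumptions: keys become properties under `base`, nested dicts become
    blank nodes, list items repeat the property, None values are skipped.
    """
    counter = itertools.count()

    def walk(node: dict, subject: str) -> Iterator[Triple]:
        for key, value in node.items():
            # Percent-encode the key so it is safe inside an IRI.
            predicate = f"<{base}{quote(str(key))}>"
            values = value if isinstance(value, list) else [value]
            for item in values:
                if isinstance(item, dict):
                    child = f"_:b{next(counter)}"
                    yield (subject, predicate, child)
                    yield from walk(item, child)
                elif item is not None:
                    # json.dumps gives a quoted, escaped string literal.
                    yield (subject, predicate, json.dumps(str(item)))

    yield from walk(obj, f"_:b{next(counter)}")


if __name__ == "__main__":
    doc = {"name": "Ada", "address": {"city": "Ghent"}, "tags": ["a", "b"]}
    for s, p, o in dict_to_triples(doc):
        print(s, p, o, ".")
```

Because it is a generator, nothing is materialised until the consumer iterates, which is the property a streaming converter would want to preserve.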
>
> Profiling / [comparative] benchmarks with e.g. Scalene [1][2] and/or
> perfplot [3] (%timeit) [4][5] could be worthwhile.
>
> [1]
> https://awesomeopensource.com/project/plasma-umass/scalene?categoryPage=26
> [2] https://github.com/plasma-umass/scalene
> [3] https://github.com/nschloe/perfplot
> [4]
> https://docs.python.org/3/library/timeit.html#timeit-command-line-interface
> [5]
> https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit
>
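
A rough sketch of what such a comparative benchmark could look like with just the stdlib `timeit` module. The two parse functions and the synthetic records are stand-ins I made up; swap in the real converters being compared.

```python
import json
import timeit


def parse_all_at_once(text):
    """Parse one big JSON array in a single call."""
    return [row["value"] for row in json.loads(text)]


def parse_line_by_line(lines):
    """Parse newline-delimited JSON one record at a time."""
    return [json.loads(line)["value"] for line in lines]


# Synthetic input data (an assumption; use representative files in practice).
records = [{"id": i, "value": i * 2} for i in range(1000)]
as_array = json.dumps(records)
as_lines = [json.dumps(r) for r in records]


def best_time(func, arg, repeat=3, number=20):
    """Return the best per-call time in seconds over `repeat` runs."""
    timings = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return min(timings) / number


results = {
    "all_at_once": best_time(parse_all_at_once, as_array),
    "line_by_line": best_time(parse_line_by_line, as_lines),
}

for name, seconds in results.items():
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```

Taking the minimum over several repeats is the usual `timeit` advice, since the slower runs mostly measure noise from the rest of the system.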

From https://twitter.com/westurner/status/1274571490688630785 plus further
references:

Other methods for CSV + transforms => RDF?
> - #rdflib csv2rdf
>


> - COW

- url: https://github.com/CLARIAH/COW
- desc: Integrated CSV to RDF converter, using CSVW and nanopublications

- #CSVW  https://github.com/cldf/csvw/blob/master/README.md#see-also
>

Could https://github.com/cldf/csvw/ be modified to support (1)
json-ld-streaming; and (2) alternate csv parsers?

- @kidehen sponger / rdfm_yq_parse_csv()?

-
http://docs.openlinksw.com/virtuoso/rdfspongerprogrammerguide/#virtuosospongeroverviewcartarch
-
https://github.com/openlink/Virtuoso-RDFIzer-Mapper-Scripts/blob/master/rdf_mappers.sql

- #tarql
>
- Web: https://tarql.github.io/
- Src: https://github.com/tarql/tarql
- ProgrammingLanguage: Java

- #csv2rdf GH topic: https://github.com/topics/csv2rdf
>



> Adding columnar & dataset-level metadata *with URIs* is the value add
> here, IMHO #LR


"7 metadata header rows (column label, property URI path, DataType, unit,
accuracy, precision, significant figures)"
https://wrdrd.github.io/docs/consulting/linkedreproducibility#csv-csvw-and-metadata-rows

Example Table A with 7 metadata header rows:
https://wrdrd.github.io/docs/consulting/linkedreproducibility#id4

The csv2rdf tool would need to optionally read this additional metadata
from either additional header rows or an external 'header' file.
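
As a sketch of the "additional header rows" variant: the reader below peels off the seven metadata rows and leaves the data rows as a lazy iterator. The function name, the field names, and the sample values (including the example.org property URIs) are assumptions for illustration, not an existing csv2rdf option.

```python
import csv
import io
import itertools

# Order follows the 7 metadata header rows listed above.
METADATA_FIELDS = (
    "label", "property_uri", "datatype", "unit",
    "accuracy", "precision", "significant_figures",
)


def read_with_metadata_rows(text, fields=METADATA_FIELDS):
    """Return (per-column metadata dicts, iterator over remaining data rows)."""
    reader = csv.reader(io.StringIO(text))
    header_rows = list(itertools.islice(reader, len(fields)))
    if len(header_rows) < len(fields):
        raise ValueError(f"expected {len(fields)} metadata header rows")
    n_columns = len(header_rows[0])
    columns = [
        {field: header_rows[i][col] for i, field in enumerate(fields)}
        for col in range(n_columns)
    ]
    return columns, reader  # data rows are consumed lazily


SAMPLE = """\
time,distance
http://example.org/time,http://example.org/distance
xsd:double,xsd:double
unit:SecondTime,unit:Meter
0.1,0.5
1,1
2,2
0.0,0.0
1.5,3.0
"""

if __name__ == "__main__":
    columns, data_rows = read_with_metadata_rows(SAMPLE)
    print(columns[0]["unit"], list(data_rows))
```

The external 'header' file variant could reuse the same column dicts, just loaded from a sidecar instead of from the first rows.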


> ijson [6] looks like it has some interesting features; iterative, asyncio,
> push. How does the performance compare?
>
> [6] https://pypi.org/project/ijson/
>
> I do plan to develop more components that assist scalable ETL, data-to-rdf
>> like tasks. This includes a plugin for Apache Airflow ("provider"), which
>> would be good as a RDFLib family repository.
>>
>
> - The datasette and dogsheep projects have a bunch of *-to-sqlite utils
> and an interface that a number of projects on PyPI have implemented:
>   https://datasette.io/tools
>   https://github.com/dogsheep
>
>   https://pypi.org/search/?q=to-sqlite
>   - https://datasette.io/tools/csvs-to-sqlite
>     -
> https://github.com/simonw/csvs-to-sqlite/blob/a8a37a016790dc93270c74e32d0a5051bc5a0f4d/tests/test_csvs_to_sqlite.py#L417-L446
>       - parse datetimes in CSVs
>         - xsd:dateTime (and schema.org/Date, schema.org/dateCreated,
> and schema.org/dateModified) require that times be given in
> ISO 8601 format
>
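
For the datetime point above, a small stdlib sketch of normalising a CSV timestamp into an ISO 8601 `xsd:dateTime` literal. The input format and the decision to treat naive timestamps as UTC are assumptions; real data needs an explicit timezone policy.

```python
from datetime import datetime, timezone

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def to_xsd_datetime(value: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Parse `value` with `fmt` and return an N-Triples typed literal.

    Assumption: timestamps without a timezone are interpreted as UTC.
    """
    parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return f'"{parsed.isoformat()}"^^<{XSD_DATETIME}>'


if __name__ == "__main__":
    print(to_xsd_datetime("2021-07-31 06:45:00"))
```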
> What are the solutions for generating RDFS schema from CSVs and SQL tables?
>
> - https://pypi.org/project/tablib/
>   https://github.com/jazzband/tablib/blob/master/tests/test_tablib.py
>   - doesn't do anything with datatypes FWICS
>
> - https://sqlite-utils.datasette.io/en/stable/cli.html#showing-the-schema
>   https://github.com/simonw/sqlite-utils/blob/main/sqlite_utils/db.py
>
>   def suggest_column_types:
>
> https://github.com/simonw/sqlite-utils/blob/c7e8d72be9fe8fe0811f685a18eebc637662d41b/sqlite_utils/utils.py#L29-L58
>
> - https://en.wikipedia.org/wiki/Knowledge_extraction
>
> https://en.wikipedia.org/wiki/Knowledge_extraction#Relational_databases_to_RDF
>
> - https://pypi.org/project/rdb2rdf/
>   https://github.com/nisavid/pyrdb2rdf/blob/master/rdb2rdf/stores.py
>   > PyRDB2RDF provides RDFLib with an interface to relational databases as
> RDF stores. The underlying data is accessed via SQLAlchemy. It is mapped to
> RDF according to the specifications of RDB2RDF. The corresponding RDF graph
> is represented as an RDFLib graph.
>   >
>   > Translating from relational data to RDF via direct mapping is
> currently supported. Translating in the other direction and mapping with
> R2RML are planned but not yet implemented.
>
> - https://pypi.org/project/rdfizer/
>
> - https://github.com/RDFLib/pyTARQL
>   - Does this handle datetimes?
>
> - Generate JSONschema from JSON and SHACL from JSON-Schema:
>
> https://stackoverflow.com/questions/7341537/tool-to-generate-json-schema-from-json-data/30294535#30294535
>   - https://pypi.org/project/genson/ has been recently updated
>     - Src: https://github.com/wolverdude/genson/
>   - https://github.com/mulesoft-labs/json-ld-schema
>     https://github.com/mulesoft-labs/json-ld-schema#how-does-it-work
>     > JSON-LD Schema defines a simple 'semantics' JSON-Schema vocabulary
> (effectively a JSON-Schema meta-schema) that reuses the official JSON
> Schema for JSON-LD to provide definitions for @context and @type
> properties. These annotations can be used to provide JSON-LD context for a
> JSON-Schema document. Provided this JSON-LD context, constraints over named
> 'properties' in a JSON Schema document can be understood as constraints
> over CURIES of JSON-LD documents following the context rules defined in the
> JSON-LD specification.
>
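
On the question of generating a schema from CSVs: one simple approach is to try progressively looser casts per column and map the first one that fits to an XSD datatype URI. This is my own sketch in the spirit of column-type sniffing, not the sqlite-utils code linked above; the function names and the integer/double/dateTime/string ladder are assumptions.

```python
from datetime import datetime

XSD = "http://www.w3.org/2001/XMLSchema#"

# Ordered from strictest to loosest; the first cast that accepts every value wins.
CHECKS = (
    ("integer", int),
    ("double", float),
    ("dateTime", datetime.fromisoformat),
)


def _parses(caster, value):
    try:
        caster(value)
        return True
    except ValueError:
        return False


def suggest_xsd_type(values):
    """Suggest an XSD datatype URI for a column of strings (empty cells ignored)."""
    non_empty = [v for v in values if v != ""]
    for local_name, caster in CHECKS:
        if non_empty and all(_parses(caster, v) for v in non_empty):
            return XSD + local_name
    return XSD + "string"


def suggest_schema(rows):
    """Map each column name in a list of dict rows to a suggested datatype."""
    names = rows[0].keys() if rows else []
    return {name: suggest_xsd_type([row[name] for row in rows]) for name in names}


if __name__ == "__main__":
    rows = [
        {"id": "1", "height": "1.5", "seen": "2021-07-31T06:45:00", "name": "a"},
        {"id": "2", "height": "2", "seen": "2021-08-01T00:00:00", "name": ""},
    ]
    for column, datatype in suggest_schema(rows).items():
        print(column, datatype)
```

Sniffing like this is exactly the lossy guesswork that explicit columnar metadata would make unnecessary, so treat the result as a suggestion to review, not as a schema.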
> ## CSVW: CSV on the Web
>
> - Homepage: https://w3c.github.io/csvw/
> - Standard: https://www.w3.org/TR/tabular-data-model/
> - Standard: https://www.w3.org/TR/tabular-metadata/
> - Standard: https://www.w3.org/TR/csv2json/
> - Standard: https://www.w3.org/TR/csv2rdf/
> - Namespace: http://www.w3.org/ns/csvw#
> - xmlns: `@prefix csvw: <http://www.w3.org/ns/csvw#> .`
> - @context: https://www.w3.org/ns/csvw.jsonld
>
> CSVW (*CSV on the Web*) is a set of relatively new standards
> for representing :ref:`CSV` rows and columns
> as :ref:`RDF` (and :ref:`JSON` / :ref:`JSON-LD`)
> along with *metadata*.
>
> * URIs for datatypes (XSD)
>

FWIU, there is not yet a vocabulary for physical units like meters**2 in
the JSON-LD Recommended Context:
https://w3c.github.io/json-ld-rc/context.jsonld
https://github.com/w3c/json-ld-rc
https://github.com/w3c/json-ld-rc/blob/main/context.jsonld

QUDT is one such vocabulary:
```turtle
qudt-quantity:Time
    rdf:type qudt:SpaceAndTimeQuantityKind ;
    rdfs:label "Time"^^xsd:string ;
    qudt:description "Time is a basic component of the measuring system
used to sequence events, to compare the durations of events and the
intervals between them, and to quantify the motions of
objects."^^xsd:string ;
    qudt:symbol "T"^^xsd:string ;
    skos:exactMatch <http://dbpedia.org/resource/Time> .

# ...
unit:SecondTime
      rdf:type qudt:SIBaseUnit , qudt:TimeUnit ;
      rdfs:label "Second"^^xsd:string ;
      qudt:abbreviation "s"^^xsd:string ;
      qudt:code "1615"^^xsd:string ;
      qudt:conversionMultiplier
              "1"^^xsd:double ;
      qudt:conversionOffset
              "0.0"^^xsd:double ;
      qudt:symbol "s"^^xsd:string ;
      skos:exactMatch <http://dbpedia.org/resource/Second> .
# ...
```
http://www.qudt.org/qudt/owl/1.0.0/unit/Instances.html#SecondTime

... We must be able to say that the numbers in a column have a physical
unit with URI; to specify columnar metadata so that downstream tools don't
need to try to sniff and cast between datatypes and lossily drop units from
strings in column names:

- (_datatype_ _physical_unit_):
- (float64, "unit:SecondTime",)
- (float64, unit["SecondTime"],)
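
The (datatype, unit) pairs above could be carried as per-column metadata along these lines. `ColumnMeta` and `expand_curie` are hypothetical names, and the `unit:` namespace shown is an assumption that should be checked against the QUDT release actually in use.

```python
from typing import NamedTuple

# Prefix map; the QUDT unit namespace here is an assumption to verify.
PREFIXES = {
    "unit": "http://qudt.org/vocab/unit#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


class ColumnMeta(NamedTuple):
    name: str      # column label as it appears in the CSV
    datatype: str  # CURIE for the value datatype
    unit: str      # CURIE for the physical unit


def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI using the prefix map."""
    prefix, _, local = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + local


columns = [ColumnMeta("elapsed", "xsd:double", "unit:SecondTime")]

if __name__ == "__main__":
    for column in columns:
        print(column.name, expand_curie(column.datatype), expand_curie(column.unit))
```

Keeping the unit as a URI next to the datatype means a downstream tool never has to parse something like "elapsed (s)" out of a column name.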



> * URIs for columns (RDF)
> * Document Metadata
> * CSV -> JSON (-> JSON-LD -> RDF)
> * CSV -> RDF
>
> Could there be a file naming convention for specifying the extra CSVW
> header to apply_to or transform zero or more CSV files with?
>
> filename.csv
> filename.csv.csvw
> filename.csv.csvwheader.jsonld.json
> filename.csv.csvw.jsonld.json
>
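```python
from pathlib import Path

uri = 'filename.csv'
# read_csvw() and read_csv() are hypothetical readers, not an existing API.
# Note the leading dot, so that we look for 'filename.csv.csvw.jsonld.json'.
if Path(uri + '.csvw.jsonld.json').exists():
    read_csvw(uri, *args, **kwargs)
else:
    read_csv(uri, *args, **kwargs)
```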
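
For reference, a sketch of what such a sidecar file might contain, using properties from the CSVW tabular metadata vocabulary (`url`, `tableSchema`, `columns`, `name`, `titles`, `datatype`, `propertyUrl`). The builder function, the column values, and the example.org property URL are assumptions for illustration.

```python
import json


def build_csvw_metadata(csv_name, columns):
    """Build a minimal CSVW table description for one CSV file.

    `columns` is a list of (name, title, datatype, property_url) tuples.
    """
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_name,
        "tableSchema": {
            "columns": [
                {
                    "name": name,
                    "titles": title,
                    "datatype": datatype,
                    "propertyUrl": property_url,
                }
                for name, title, datatype, property_url in columns
            ]
        },
    }


metadata = build_csvw_metadata(
    "filename.csv",
    [("elapsed", "Elapsed time", "double", "http://example.org/elapsed")],
)

if __name__ == "__main__":
    # This text would be written to e.g. filename.csv.csvw.jsonld.json
    print(json.dumps(metadata, indent=2))
```

FWIU the CSVW spec already has its own default lookup names for a metadata file alongside the CSV, so any new naming convention would be in addition to that.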

>
> https://www.w3.org/TR/tabular-data-primer/
>
>
>> Best,
>>
>> Miel
>>
>> On Wed, Jul 28, 2021 at 06:09, Nicholas Car <
>> nicholas....@surroundaustralia.com> wrote:
>>
>>> Hi Miel,
>>>
>>> Yes, all offers of contribution are of interest! The CSV 2 RDF stuff is
>>> very old and many tools related to it, such as pyTARQL (
>>> https://github.com/RDFLib/pyTARQL), are missing. Are you planning on
>>> presenting JSON2RDF as a new plugin to RDFLib? That may be an option;
>>> however, remember that another option is simply to host your tool's
>>> repository within RDFLib's family of repositories (i.e. within
>>> https://github.com/RDFLib). The choice will depend on how stable the
>>> tool is and how you see its future development going.
>>>
>>> But perhaps you have other things in mind? Whatever the case, we'd love
>>> to hear your plans.
>>>
>>> Cheers,
>>>
>>> Nick
>>>
>>> On Tue, Jul 27, 2021 at 5:56 PM Miel Vander Sande <
>>> miel.vandersa...@meemoo.be> wrote:
>>>
>>>> Hi all,
>>>>
>>>> A little late to the party, but what a great effort this is! Congrats
>>>> on the release and thank you; this library is super essential to my work
>>>> and it makes RDF usable in ways other libraries can't.
>>>>
>>>> Sidenote: I have a streaming direct json-to-rdf mapping implementation
>>>> (port of https://github.com/AtomGraph/JSON2RDF) that I'd like to
>>>> contribute, possibly in combination with a refactoring of
>>>> https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.tools.html#rdflib.tools.csv2rdf.CSV2RDF.
>>>> Would that be of interest?
>>>>
>>>
> Does JSON2RDF [need to] implement the w3c json-ld-streaming spec [7]?
> [7] https://w3c.github.io/json-ld-streaming/
>
> - https://w3c.github.io/json-ld-streaming/#streaming-document-form
> - https://w3c.github.io/json-ld-streaming/#streaming-rdf-form
>
>
>>
>>>> Best,
>>>>
>>>> Miel
>>>>
>>>> On Tue, Jul 20, 2021 at 21:58, Natanael Arndt <arn...@gmail.com> wrote:
>>>>
>>>>> I've retweeted the tweet by jarven. But I don't use reddit or Hacker
>>>>> News. I think the semantic web mailing list would also be a good idea.
>>>>>
>>>>> If you'd like to post something in the channels, please do so.
>>>>>
>>>>> Natanael
>>>>>
>>>>> On July 20, 2021 at 20:14:09 CEST, Wes Turner <
>>>>> wes.tur...@gmail.com> wrote:
>>>>> >Congrats and thanks!
>>>>> >
>>>>> >From the release notes on the Release:
>>>>> >https://github.com/RDFLib/rdflib/releases/tag/6.0.0
>>>>> >
>>>>> >```
>>>>> >6.0.0 is a major stable release that drops support for Python 2 and
>>>>> >Python
>>>>> >3 < 3.7. Type hinting is now present in much
>>>>> >of the toolkit as a result.
>>>>> >
>>>>> >It includes the formerly independent JSON-LD parser/serializer,
>>>>> >improvements to Namespaces that allow for IDE namespace
>>>>> >prompting, simplified use of g.serialize() (turtle default, no need to
>>>>> >decode()) and many other updates to
>>>>> >documentation, store backends and so on.
>>>>> >
>>>>> >Performance of the in-memory store has also improved since Python 3.6
>>>>> >dictionary improvements.
>>>>> >
>>>>> >There are numerous supplementary improvements to the toolkit too, such
>>>>> >as:
>>>>> >
>>>>> >- inclusion of Docker files for easier CI/CD
>>>>> >- black config files for standardised code formatting
>>>>> >- improved testing with mock SPARQL stores, rather than a reliance on
>>>>> >DBPedia etc
>>>>> >```
>>>>> >
>>>>> >Have there been ANN posts to e.g. Hacker news and e.g. /r/semanticweb?
>>>>> >
>>>>> >On Tue, Jul 20, 2021, 10:23 Florent Georges <fgeor...@gmail.com> wrote:
>>>>> >
>>>>> >> Congratulations, and thank you all for the hard work!
>>>>> >>
>>>>> >> --
>>>>> >> Florent Georges
>>>>> >> H2O Consulting
>>>>> >> http://h2o.consulting/
>>>>> >>
>>>>> >> On Tue, Jul 20, 2021, 16:00 Nicholas Car <
>>>>> >> nicholas....@surroundaustralia.com> wrote:
>>>>> >>
>>>>> >>> Hi all,
>>>>> >>>
>>>>> >>> Yes, 6.0.0 is out:
>>>>> >>>
>>>>> >>>    - https://pypi.org/project/rdflib/6.0.0/
>>>>> >>>    - https://github.com/RDFLib/rdflib/releases/tag/6.0.0
>>>>> >>>
>>>>> >>> Please publicise this release: it has a lot of stuff since 5.0.0 in
>>>>> >>> April last year.
>>>>> >>>
>>>>> >>> Thank you very much to all of you who contributed, in particular my
>>>>> >>> co-maintainers, Ashley & Natanael and Edmond, Iwan, Tom, Remi, Harold
>>>>> >>> and all the PR and Issue creators. Thanks also to the institutions that
>>>>> >>> provided time for their staff to contribute.
>>>>> >>>
>>>>> >>> If you see issues, please let the co-maintainers know straight away:
>>>>> >>> we're keen to get a 6.0.1 release out shortly (within weeks to a month)
>>>>> >>> to speed up the RDFLib release cycle.
>>>>> >>>
>>>>> >>> Cheers,
>>>>> >>>
>>>>> >>> Nick
>>>>> >>>
>>>>> >>> --
>>>>> >>> kind regards
>>>>> >>> Dr Nicholas Car
>>>>> >>> Data Systems Architect
>>>>> >>>
>>>>> >>> SURROUND Australia Pty Ltd and
>>>>> >>> SURROUND NZ Limited
>>>>> >>>
>>>>> >>> Address Level 9, Nishi Building,
>>>>> >>> 2 Phillip Law Street
>>>>> >>> New Acton Canberra 2601
>>>>> >>> Mobile +61 477 560 177
>>>>> >>> Email nicholas....@surroundaustralia.com
>>>>> >>> Website https://www.surroundaustralia.com
>>>>> >>>
>>>>> >>> Enhancing Intelligence Within Organisations
>>>>> >>> delivering evidence that connects decisions to outcomes
>>>>> >>>
>>>>> >>> Dr Nicholas Car
>>>>> >>> Adjunct Senior Lecturer
>>>>> >>>
>>>>> >>> Research School of Computer Science
>>>>> >>>
>>>>> >>> The Australian National University,
>>>>> >>> Canberra ACT Australia
>>>>> >>> +61 477 560 177
>>>>> >>> nicholas....@anu.edu.au
>>>>> >>> https://cs.anu.edu.au/people/nicholas-car
>>>>> >>> https://orcid.org/0000-0002-8742-7730
>>>>> >>>
>>>>> >>>
>>>>> >>> --
>>>>> >>> http://github.com/RDFLib
>>>>> >>> ---
>>>>> >>> You received this message because you are subscribed to the Google
>>>>> >Groups
>>>>> >>> "rdflib-dev" group.
>>>>> >>> To unsubscribe from this group and stop receiving emails from it,
>>>>> >send an
>>>>> >>> email to rdflib-dev+unsubscr...@googlegroups.com.
>>>>> >>> To view this discussion on the web visit
>>>>> >>>
>>>>> >
>>>>> https://groups.google.com/d/msgid/rdflib-dev/CAP7nqh19yjpwB8EoHVqs5QzKug_rSq1X%2BfFHfnFtOJBdZ1RwYg%40mail.gmail.com
>>>>> >>>
>>>>> ><
>>>>> https://groups.google.com/d/msgid/rdflib-dev/CAP7nqh19yjpwB8EoHVqs5QzKug_rSq1X%2BfFHfnFtOJBdZ1RwYg%40mail.gmail.com?utm_medium=email&utm_source=footer
>>>>> >
>>>>> >>> .
>>>>> >>>
>>>>>
>>>>> --
>>>>> This message was sent from my Android device with K-9 Mail.
>>>>>
>

-- 
http://github.com/RDFLib
--- 
You received this message because you are subscribed to the Google Groups 
"rdflib-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to rdflib-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/rdflib-dev/CACfEFw9Yr0f_qjirC1Y_gEOumsnhyYROgEnf1Ge9w9e5C9U-xw%40mail.gmail.com.
