True, but producing an HDT file will likely require a lot more effort than
an n-triples file. Brendan, what's generating the RDF/XML file in
the first place?

On Sat, Jan 8, 2022 at 11:58 AM Wes Turner <wes.tur...@gmail.com> wrote:

> RDFHDT is fast for *reads*; probably faster than n-triples
> https://github.com/RDFLib/rdflib-hdt
>
> On Fri, Jan 7, 2022 at 8:55 PM Nicholas Car <
> nicholas....@surroundaustralia.com> wrote:
>
>> I guess it depends on how you are producing the RDF/XML file in the first
>> place. If you do have control over that, and loading times are really an
>> issue, produce an n-triples file as this will load the fastest!
>>
>> On Sat, Jan 8, 2022 at 5:06 AM Wes Turner <wes.tur...@gmail.com> wrote:
>>
>>> Out of curiosity, does performance differ with defusedxml in there?
>>>
>>> (RDF)XML parser complexity really is unnecessary compared to e.g. N3,
>>> JSONLD, or RDFHDT.
>>>
>>> Does performance differ after transforming to a non-XML format?
>>>
>>> defusedxml should probably be an install_requires dependency because of
>>> the XML parser vulnerabilities that it patches:
>>> https://pypi.org/project/defusedxml/
>>>
>>> On Fri, Jan 7, 2022, 9:58 AM Brendan McMahon <brendan.mcma...@tempus.com>
>>> wrote:
>>>
>>>> Hi Nick, thanks for the response! Yes, the idea you mention is what I
>>>> was considering trying next, but I thought I'd ask in here to see if there
>>>> were any other ideas about handling this with what the library has built
>>>> in. I will report back here with what I do if it works out! Also, the file
>>>> is about half a gig. It takes ~25 minutes to parse.
>>>>
>>>> Thanks,
>>>> Brendan
>>>>
>>>> On Friday, January 7, 2022 at 6:38:43 AM UTC-5
>>>> nichol...@surroundaustralia.com wrote:
>>>>
>>>>> Hi Brendan,
>>>>>
>>>>> This is an interesting issue! No, I haven't encountered it, but then I
>>>>> never use large RDF/XML graphs. How large is your graph, by the way?
>>>>>
>>>>> If you really think the issue is the getting or testing of elements in
>>>>> the RDF DefinedNamespace, couldn't you just clone rdfxml.py and replace all
>>>>> references to the RDF DefinedNamespace with references to a hard-coded set
>>>>> of URIRefs? You could try using that in place of the current rdfxml.py and
>>>>> see if there is a speedup. The file's only ~600 lines long, so a
>>>>> find-and-replace shouldn't be too hard.
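To get a feel for why hoisting lookups out of the hot path can matter, here is a stdlib-only sketch. `CheckedRDF` is a hypothetical, simplified stand-in for a DefinedNamespace-style class, not rdflib's actual implementation: every attribute access and `in` test goes through a metaclass method with string checks. The module-level constants resolve the terms once at import time, which is the spirit of the hard-coded URIRefs suggested above. In a real patch of rdfxml.py you would use rdflib `URIRef` objects rather than plain strings.

```python
import timeit

RDF_BASE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class _CheckedNamespaceMeta(type):
    """Hypothetical, simplified stand-in for a DefinedNamespace-style metaclass."""

    def __getattr__(cls, name):
        # Only called when normal class attribute lookup fails.
        if name.startswith("__"):
            raise AttributeError(name)
        if name not in cls._terms:
            raise AttributeError(f"{name!r} is not a defined term")
        return RDF_BASE + name  # a new string is built on every access

    def __contains__(cls, item):
        item = str(item)
        return item.startswith(RDF_BASE) and item[len(RDF_BASE):] in cls._terms


class CheckedRDF(metaclass=_CheckedNamespaceMeta):
    _terms = frozenset({"Description", "about", "resource", "nodeID", "type", "li"})


# Resolve the terms once, at import time, instead of on every parser event.
RDF_DESCRIPTION = CheckedRDF.Description
RDF_ABOUT = CheckedRDF.about
RDF_RESOURCE = CheckedRDF.resource
RDF_SYNTAX_TERMS = frozenset({RDF_DESCRIPTION, RDF_ABOUT, RDF_RESOURCE})


def compare_lookup_cost(number=200_000):
    """Return (namespace_seconds, constant_seconds) for `number` lookup-plus-membership tests."""
    via_namespace = timeit.timeit(
        lambda: CheckedRDF.Description in CheckedRDF, number=number
    )
    via_constants = timeit.timeit(
        lambda: RDF_DESCRIPTION in RDF_SYNTAX_TERMS, number=number
    )
    return via_namespace, via_constants
```

`print(compare_lookup_cost())` prints two timings. I would expect the first to be noticeably larger on a typical CPython build, but the ratio depends on the machine and says nothing definitive about rdflib itself.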
>>>>>
>>>>> I would love to know how you go with this, if you try it. If it
>>>>> overcomes the problem, we may consider doing such a replacement within
>>>>> internal RDFLib files to improve performance and then providing the
>>>>> DefinedNamespaces for external use only, i.e. when people define RDFLib
>>>>> graphs with g.add() and use FOAF.givenName to represent URIs.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Nick
>>>>>
>>>>> On Thu, Dec 23, 2021 at 3:32 AM Brendan McMahon <brendan...@tempus.com>
>>>>> wrote:
>>>>>
>>>>>> Dear rdflib contributors and maintainers,
>>>>>>
>>>>>> I have recently been trying to update rdflib from version 4.2.2 to 6.
>>>>>> Since doing so, a process I normally run, which uses rdflib to load a large
>>>>>> RDF/XML file into a graph, has a significantly larger memory profile and
>>>>>> latency (for my large file, parsing takes about 1.5x as long).
>>>>>>
>>>>>> I've traced the issue back to the graph.parse method. More
>>>>>> specifically, by profiling graph.parse with versions 6.1.1 and 4.2.2, I
>>>>>> can see that calls to access members of the RDF class (mostly occurring in
>>>>>> the node_element_start method here
>>>>>> <https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/parsers/rdfxml.py#L299>
>>>>>> as well as the property_element_start method) take significantly
>>>>>> longer, because the class is now a DefinedNamespace with
>>>>>> overridden __getitem__ and __contains__
>>>>>> <https://github.com/RDFLib/rdflib/blob/2011a6dd85518642e0800b2ee010a5565e16e5cc/rdflib/namespace/__init__.py#L190>
>>>>>> methods with added string checks.
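A generic way to reproduce that kind of comparison is to wrap the call in `cProfile` and print the top entries by cumulative time, running the same script once in an environment with rdflib 4.2.2 and once with 6.1.1. `profile_call` is a hypothetical helper built only on the standard library.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, sort="cumulative", **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report is the text of the `top`
    entries sorted by `sort` (any key accepted by pstats.Stats.sort_stats).
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.strip_dirs().sort_stats(sort).print_stats(top)
    return result, buf.getvalue()
```

For example, with rdflib installed: `g = Graph(); _, report = profile_call(g.parse, "big.rdf", format="xml"); print(report)`, where `big.rdf` is a placeholder path. Sorting by `"tottime"` instead of `"cumulative"` tends to surface small, frequently called methods like namespace lookups more directly.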
>>>>>>
>>>>>> Has anyone else experienced this issue? I have been trying to find
>>>>>> ways to work around/with the library to lower the latency, but haven't been
>>>>>> able to find anything yet.
>>>>>>
>>>>>> Thanks,
>>>>>> Brendan
>>>>>>
>>>>>> ------------------------------
>>>>>> This email and any attachments may contain confidential and/or
>>>>>> privileged information. If you are not the intended recipient of this
>>>>>> message or their agent, or if this message has been addressed to you in
>>>>>> error, please immediately alert the sender by reply email and then delete
>>>>>> this message and any attachments. If you are not the intended recipient,
>>>>>> you are hereby notified that any use, dissemination, copying, or storage 
>>>>>> of
>>>>>> this message or its attachments is strictly prohibited.
>>>>>>
>>>>>> --
>>>>>> http://github.com/RDFLib
>>>>>> ---
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "rdflib-dev" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to rdflib-dev+...@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/rdflib-dev/1f616ab0-187b-4af0-aca3-13d436c6dd71n%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/rdflib-dev/1f616ab0-187b-4af0-aca3-13d436c6dd71n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>>
>>>>
>>>
>>
>

