True, but producing an HDT file will likely take a lot more effort than producing an N-Triples file. Brendan, what is generating the RDF/XML file in the first place?
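If the producer can't be changed, a one-off conversion might look something like the sketch below. It assumes rdflib is installed and that paying the slow RDF/XML parse once is acceptable. The helper names (`nt_cache_path`, `convert_to_ntriples`, `load_fast`) are made up for illustration and are not part of rdflib.

```python
from pathlib import Path


def nt_cache_path(xml_path: Path) -> Path:
    """Return the sibling path used for the N-Triples copy (hypothetical naming)."""
    return xml_path.with_suffix(".nt")


def convert_to_ntriples(xml_path: Path) -> Path:
    """Parse an RDF/XML file once and write it back out as N-Triples."""
    # Imported lazily so the path helper works even without rdflib installed.
    from rdflib import Graph

    graph = Graph()
    graph.parse(str(xml_path), format="xml")  # the slow step, paid once
    out_path = nt_cache_path(xml_path)
    graph.serialize(destination=str(out_path), format="nt")
    return out_path


def load_fast(xml_path: Path):
    """Load the N-Triples copy, creating it first if it does not exist yet."""
    from rdflib import Graph

    nt_path = nt_cache_path(xml_path)
    if not nt_path.exists():
        convert_to_ntriples(xml_path)
    graph = Graph()
    graph.parse(str(nt_path), format="nt")
    return graph
```

Note that this sketch never refreshes the `.nt` copy, so it would need to be deleted or regenerated whenever the RDF/XML source changes.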
On Sat, Jan 8, 2022 at 11:58 AM Wes Turner <wes.tur...@gmail.com> wrote:

> RDFHDT is fast for *reads*; probably faster than n-triples
> https://github.com/RDFLib/rdflib-hdt
>
> On Fri, Jan 7, 2022 at 8:55 PM Nicholas Car <nicholas....@surroundaustralia.com> wrote:
>
>> I guess it depends on how you are producing the RDF/XML file in the first
>> place. If you do have control over that, and loading times are really an
>> issue, produce an n-triples file as this will load the fastest!
>>
>> On Sat, Jan 8, 2022 at 5:06 AM Wes Turner <wes.tur...@gmail.com> wrote:
>>
>>> Out of curiosity, does performance differ with defusedxml in there?
>>>
>>> (RDF)XML parser complexity really is unnecessary compared to e.g. N3,
>>> JSONLD, or RDFHDT.
>>>
>>> Does performance differ after transforming to a non-XML format?
>>>
>>> defusedxml should probably be an install_requires dependency because of
>>> the XML parser vulnerabilities that it patches:
>>> https://pypi.org/project/defusedxml/
>>>
>>> On Fri, Jan 7, 2022, 9:58 AM Brendan McMahon <brendan.mcma...@tempus.com>
>>> wrote:
>>>
>>>> Hi Nick, thanks for the response! Yes, the idea you mention is what I
>>>> was considering trying next, but I thought I'd ask in here to see if there
>>>> were any other ideas about handling this with what the library has built
>>>> in. I will report back here with what I do if it works out! Also, the file
>>>> is about half a gig. It takes ~25 minutes to parse.
>>>>
>>>> Thanks,
>>>> Brendan
>>>>
>>>> On Friday, January 7, 2022 at 6:38:43 AM UTC-5
>>>> nichol...@surroundaustralia.com wrote:
>>>>
>>>>> Hi Brendan,
>>>>>
>>>>> This is an interesting issue! No, I haven't encountered it, but then I
>>>>> never use large RDF/XML graphs. How large is your graph, by the way?
>>>>>
>>>>> If you really think the issue is the getting or testing of elements in
>>>>> the RDF DefinedNamespace, couldn't you just clone rdfxml.py and replace
>>>>> all references to the RDF DefinedNamespace with references to a hard-coded
>>>>> set of URIRefs? You could try using that in place of the current rdfxml.py
>>>>> and see if there is a speedup. The file's only ~600 lines long, so a
>>>>> find 'n replace shouldn't be too difficult.
>>>>>
>>>>> I would love to know how you go with this, if you try it. If it
>>>>> overcomes the problem, we may consider doing such a replacement within
>>>>> internal RDFLib files to improve performance and then providing the
>>>>> DefinedNamespaces for external use only, i.e. when people define RDFLib
>>>>> graphs with g.add() and use FOAF.givenName to represent URIs.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Nick
>>>>>
>>>>> On Thu, Dec 23, 2021 at 3:32 AM Brendan McMahon <brendan...@tempus.com>
>>>>> wrote:
>>>>>
>>>>>> Dear rdflib contributors and maintainers,
>>>>>>
>>>>>> I have recently been trying to update rdflib to version 6 from 4.2.2.
>>>>>> Upon doing so, a process I normally run, which uses rdflib to load a
>>>>>> large RDF/XML file into a graph, has a significantly larger memory
>>>>>> profile and latency (for my large file, parsing is taking about 1.5x as
>>>>>> much time).
>>>>>>
>>>>>> I've traced the issue back to the graph.parse method.
>>>>>> More specifically, by profiling graph.parse with versions 6.1.1 and
>>>>>> 4.2.2, I can see that calls to access members of the RDF class (mostly
>>>>>> occurring in the node_element_start method here
>>>>>> <https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/parsers/rdfxml.py#L299>
>>>>>> as well as the property_element_start method) seem to be taking
>>>>>> significantly longer, as the class is now a DefinedNamespace with
>>>>>> overridden __getitem__ and __contains__
>>>>>> <https://github.com/RDFLib/rdflib/blob/2011a6dd85518642e0800b2ee010a5565e16e5cc/rdflib/namespace/__init__.py#L190>
>>>>>> methods with added string checks.
>>>>>>
>>>>>> Has anyone else experienced this issue? I have been trying to find
>>>>>> ways to work around/with the library to lower the latency, but haven't
>>>>>> been able to find anything yet.
>>>>>>
>>>>>> Thanks,
>>>>>> Brendan
>>>>>>
>>>>>> ------------------------------
>>>>>> This email and any attachments may contain confidential and/or
>>>>>> privileged information. If you are not the intended recipient of this
>>>>>> message or their agent, or if this message has been addressed to you in
>>>>>> error, please immediately alert the sender by reply email and then delete
>>>>>> this message and any attachments. If you are not the intended recipient,
>>>>>> you are hereby notified that any use, dissemination, copying, or storage
>>>>>> of this message or its attachments is strictly prohibited.
>>>>>>
>>>>>> --
>>>>>> http://github.com/RDFLib
>>>>>> ---
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "rdflib-dev" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to rdflib-dev+...@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/rdflib-dev/1f616ab0-187b-4af0-aca3-13d436c6dd71n%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/rdflib-dev/1f616ab0-187b-4af0-aca3-13d436c6dd71n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
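As a rough illustration of the workaround suggested earlier in this thread (hard-coded terms instead of namespace lookups that run string checks on every access), here is a toy timing comparison. `CheckedRDF` and its metaclass are made-up stand-ins, not rdflib's `DefinedNamespace`, so the numbers only show the general shape of the overhead and say nothing definite about rdflib itself.

```python
import timeit

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class CheckedNamespaceMeta(type):
    """Toy metaclass: every attribute, item, or membership lookup runs string checks."""

    _terms = frozenset({"type", "about", "resource", "Description", "ID", "nodeID"})

    def __getitem__(cls, name):
        if name.startswith("__") or name not in cls._terms:
            raise KeyError(name)
        return RDF_NS + name

    def __getattr__(cls, name):
        # Only reached when normal class attribute lookup fails.
        try:
            return cls[name]
        except KeyError:
            raise AttributeError(name) from None

    def __contains__(cls, item):
        return item.startswith(RDF_NS) and item[len(RDF_NS):] in cls._terms


class CheckedRDF(metaclass=CheckedNamespaceMeta):
    """Hypothetical namespace class; NOT rdflib's DefinedNamespace."""


# Hard-coded alternative: plain module-level constants and a precomputed set.
RDF_TYPE = RDF_NS + "type"
RDF_ABOUT = RDF_NS + "about"
KNOWN_TERMS = frozenset(RDF_NS + term for term in CheckedNamespaceMeta._terms)


def lookup_checked():
    """Resolve two terms and test membership through the checked namespace."""
    return CheckedRDF.type, CheckedRDF.about, RDF_TYPE in CheckedRDF


def lookup_constant():
    """Same results, using the hard-coded constants."""
    return RDF_TYPE, RDF_ABOUT, RDF_TYPE in KNOWN_TERMS


if __name__ == "__main__":
    n = 200_000
    print("checked :", timeit.timeit(lookup_checked, number=n))
    print("constant:", timeit.timeit(lookup_constant, number=n))
```

Both functions return the same values; only the cost per lookup differs. A parser that does such lookups for every XML element would multiply that difference across a half-gigabyte file.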