Hi Brendan,

This is an interesting issue! No, I haven't encountered it, but then I never use large RDF/XML graphs. How large is your graph, by the way?
If you really think the issue is the getting or testing of elements in the RDF DefinedNamespace, couldn't you just clone rdfxml.py and replace all references to the RDF DefinedNamespace with references to a hard-coded set of URIRefs? You could try using that in place of the current rdfxml.py and see if there is a speedup. The file's only ~600 lines long, so a find-and-replace shouldn't be too difficult.

I would love to know how you go with this, if you try it. If it overcomes the problem, we may consider doing such a replacement within internal RDFLib files to improve performance and then providing the DefinedNamespaces for external use only, i.e. when people define RDFLib graphs with g.add() and use FOAF.givenName to represent URIs.

Cheers,
Nick

On Thu, Dec 23, 2021 at 3:32 AM Brendan McMahon <brendan.mcma...@tempus.com> wrote:

> Dear rdflib contributors and maintainers,
>
> I have recently been trying to update rdflib to version 6 from 4.2.2. Upon
> doing so, a process I normally run, which uses rdflib to load a large xml
> RDF file into a graph, has a significantly larger memory profile and
> latency (for my large file, parsing is taking about 1.5x as much time).
>
> I've traced the issue back to the graph.parse method. More specifically,
> by profiling the graph.parse with versions 6.1.1 and 4.2.2, I can see that
> calls to access members of the RDF class (mostly occurring in the
> node_element_start method here
> <https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/parsers/rdfxml.py#L299>
> as well as the property_element_start method) seem to be taking up a
> significantly longer time, as the class is now a DefinedNamespace with
> overridden __getitem__ and __contains__
> <https://github.com/RDFLib/rdflib/blob/2011a6dd85518642e0800b2ee010a5565e16e5cc/rdflib/namespace/__init__.py#L190>
> methods with added string checks.
>
> Has anyone else experienced this issue?
> I have been trying to find ways to
> work around/with the library to lower the latency, but haven't been able to
> find anything yet.
>
> Thanks,
> Brendan
>
> --
> http://github.com/RDFLib
> ---
> You received this message because you are subscribed to the Google Groups
> "rdflib-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to rdflib-dev+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/rdflib-dev/1f616ab0-187b-4af0-aca3-13d436c6dd71n%40googlegroups.com.

--
http://github.com/RDFLib
---
You received this message because you are subscribed to the Google Groups "rdflib-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rdflib-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rdflib-dev/CAP7nqh1HCozpam6%2B%2BGr1C3829T0e%3D%2BnkFrBm%3DES2ZCjC1SgKgg%40mail.gmail.com.
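
For anyone following the thread, here is a rough sketch of the comparison discussed in this message: namespace members resolved through overridden lookup methods on every access, versus a hard-coded set of constants. It deliberately does not import rdflib. `RDFLike`, `CheckedNamespaceMeta`, `count_dynamic` and `count_hardcoded` are hypothetical stand-ins invented for illustration, plain strings stand in for URIRefs, and the string checks are only an assumption about the general shape of the overhead, not a copy of RDFLib's DefinedNamespace. Timings will differ on a real parse, so treat it as a way to reason about the idea rather than a measurement of rdflib itself.

```python
import timeit

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class CheckedNamespaceMeta(type):
    """Hypothetical stand-in: every member access goes through Python-level checks."""

    def __getitem__(cls, name):
        # Assumed shape of the overhead: coerce to str and validate on each lookup.
        name = str(name)
        if name not in cls._terms:
            raise KeyError(name)
        return cls._base + name

    def __getattr__(cls, name):
        # Reached only when normal attribute lookup fails, i.e. for every term.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return cls[name]
        except KeyError:
            raise AttributeError(name) from None

    def __contains__(cls, item):
        # Membership test with string checks, evaluated on every call.
        item = str(item)
        return item.startswith(cls._base) and item[len(cls._base):] in cls._terms


class RDFLike(metaclass=CheckedNamespaceMeta):
    _base = RDF_NS
    _terms = frozenset({"type", "about", "resource", "ID", "nodeID", "Description"})


# The "find-and-replace" alternative: constants computed once at import time.
RDF_TYPE = RDF_NS + "type"
RDF_ABOUT = RDF_NS + "about"
HARD_CODED_TERMS = frozenset({RDF_TYPE, RDF_ABOUT})


def count_dynamic(n):
    """Resolve and test a term through the checked namespace n times."""
    hits = 0
    for _ in range(n):
        if RDFLike.type in RDFLike:
            hits += 1
    return hits


def count_hardcoded(n):
    """Do the same work against the precomputed constants."""
    hits = 0
    for _ in range(n):
        if RDF_TYPE in HARD_CODED_TERMS:
            hits += 1
    return hits


if __name__ == "__main__":
    n = 200_000
    print("dynamic   :", timeit.timeit(lambda: count_dynamic(n), number=1))
    print("hard-coded:", timeit.timeit(lambda: count_hardcoded(n), number=1))
```

If the gap in a sketch like this resembles what the profiler reports for node_element_start and property_element_start, that would be a hint that the hard-coded-URIRef copy of rdfxml.py is worth trying against the real file.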