Tim,

You may want to try assigning a different sml:baseURI value to the sml:ConvertXMLToRDF module on each iteration.

Gokhan
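
A minimal sketch of one way to derive a distinct base URI per file, written in Java purely for illustration. In the script itself the value would be bound to sml:baseURI on each iteration; the helper name baseUriFor, the example.org prefix and the file names below are hypothetical and not taken from Tim's script.

```java
public class BaseUriSketch {

    // Hypothetical helper: builds a base URI that is unique per input file
    // by appending the file name (without its extension) to a fixed prefix.
    // File names containing spaces or reserved characters would need
    // encoding first; that step is left out of this sketch.
    static String baseUriFor(String prefix, String fileName) {
        int dot = fileName.lastIndexOf('.');
        String stem = (dot > 0) ? fileName.substring(0, dot) : fileName;
        return prefix + stem + "#";
    }

    public static void main(String[] args) {
        // Two different files yield two different base URIs.
        System.out.println(baseUriFor("http://example.org/data/", "order-17.xml"));
        System.out.println(baseUriFor("http://example.org/data/", "order-18.xml"));
    }
}
```

Deriving the value from the file name keeps the URIs stable across runs, which a counter or timestamp would not.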


On 3/16/2012 2:10 PM, Tim Smith wrote:
I did some more investigation into the Convert XML to RDF module crash.

This only seems to occur when running multiple threads. As long as I only use one thread it will process the files correctly, albeit slowly.

I guess this is one of the watchouts for using multiple threads!

Tim
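
The stack trace quoted below ends in java.util.ConcurrentModificationException, raised from a Jena iterator. Java's standard collections throw the same exception when a collection is structurally changed while it is being iterated, which is consistent with a crash that appears only with multiple threads. The following single-threaded Java sketch reproduces the exception deterministically with a plain ArrayList standing in for the shared graph (an assumption; it does not use Jena or SPARQLMotion) and shows that iterating over a private snapshot avoids it. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class SharedGraphSketch {

    // Adds to the list while iterating over it. The fail-fast iterator
    // detects the change on its next step and throws.
    static boolean failsWhenModifiedDuringIteration(List<String> shared) {
        try {
            for (String subject : shared) {
                shared.add(subject + "-copy");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates over a private snapshot, so additions to the shared list
    // do not disturb the iterator. Returns the final size of the list.
    static int addCopiesSafely(List<String> shared) {
        for (String subject : new ArrayList<>(shared)) {
            shared.add(subject + "-copy");
        }
        return shared.size();
    }

    public static void main(String[] args) {
        List<String> graph = new ArrayList<>();
        graph.add("s1");
        graph.add("s2");
        System.out.println(addCopiesSafely(graph));                  // 4
        System.out.println(failsWhenModifiedDuringIteration(graph)); // true
    }
}
```

With real threads the failure depends on timing, which may be why it shows up on only some of the files.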


On Fri, Mar 16, 2012 at 4:31 PM, Tim Smith wrote:

    Hi Gokhan,

    Per your suggestion, I added the Apply Construct module to the end
    of the body.  It's running faster overall, but I'm still seeing
    the slowdown.

    In addition, I've been trying different directories and the
    Convert XML to RDF module is crashing (stack trace below) on one
    of the XML instance files.  Unfortunately I cannot tell which one
    even when running in Debug mode.  The file name is in a variable
    bound from the Iterate module but I do not know how to make that
    display on the console as the script executes.  Since there are so
    many files, I don't really want to put in a breakpoint and
    manually step through until it crashes.

    Is there a way to display the bound variables as the script executes?

    Thanks,

    Tim

    java.lang.reflect.InvocationTargetException
        at org.topbraidcomposer.sparqlmotion.actions.AbstractExecuteSPARQLMotionAction$1.run(AbstractExecuteSPARQLMotionAction.java:148)
        at org.topbraidcomposer.core.util.ThreadUtil$1$1.run(ThreadUtil.java:64)
        at java.lang.Thread.run(Unknown Source)
    Caused by: org.topbraid.spin.sparqlmotion.modules.SMException: Failed to convert XML file using Semantic XML
        at org.topbraid.spin.sparqlmotion.lib.internal.ConvertXMLToRDFModule.createGraph(ConvertXMLToRDFModule.java:53)
        at org.topbraid.spin.sparqlmotion.modules.AbstractSMModule.getRDFOutput(AbstractSMModule.java:849)
        at org.topbraid.spin.sparqlmotion.engine.impl.ExecutionEngineImpl.executeModule(ExecutionEngineImpl.java:175)
        at org.topbraid.spin.sparqlmotion.engine.impl.ExecutionEngineImpl.execute(ExecutionEngineImpl.java:120)
        at org.topbraid.spin.sparqlmotion.modules.AbstractSMModule.executeSubScript(AbstractSMModule.java:292)
        at org.topbraid.spin.sparqlmotion.lib.internal.IterateOverSelectModule.access$0(IterateOverSelectModule.java:1)
        at org.topbraid.spin.sparqlmotion.lib.internal.IterateOverSelectModule$1.run(IterateOverSelectModule.java:175)
        ... 1 more
    Caused by: java.util.ConcurrentModificationException
        at com.hp.hpl.jena.mem.HashCommon$BasicKeyIterator.hasNext(HashCommon.java:338)
        at com.hp.hpl.jena.util.iterator.NiceIterator$1.hasNext(NiceIterator.java:87)
        at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:64)
        at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:43)
        at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:64)
        at com.hp.hpl.jena.util.iterator.NiceIterator$1.hasNext(NiceIterator.java:86)
        at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:64)
        at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:43)
        at com.hp.hpl.jena.graph.compose.CompositionBase$2.hasNext(CompositionBase.java:99)
        at com.hp.hpl.jena.util.iterator.NiceIterator$1.hasNext(NiceIterator.java:86)
        at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:64)
        at com.hp.hpl.jena.util.iterator.NiceIterator$1.hasNext(NiceIterator.java:86)
        at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:64)
        at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:43)
        at com.hp.hpl.jena.graph.compose.CompositionBase$2.hasNext(CompositionBase.java:99)
        at com.hp.hpl.jena.util.iterator.NiceIterator$1.hasNext(NiceIterator.java:86)
        at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:64)
        at com.hp.hpl.jena.util.iterator.NiceIterator$1.hasNext(NiceIterator.java:86)
        at com.hp.hpl.jena.graph.query.SimpleQueryHandler.subjectsFor(SimpleQueryHandler.java:61)
        at com.hp.hpl.jena.graph.query.SimpleQueryHandler.subjectsFor(SimpleQueryHandler.java:44)
        at com.hp.hpl.jena.rdf.model.impl.ModelCom.listSubjectsFor(ModelCom.java:1019)
        at com.hp.hpl.jena.rdf.model.impl.ModelCom.listResourcesWithProperty(ModelCom.java:1033)
        at com.hp.hpl.jena.rdf.model.impl.ModelCom.listSubjectsWithProperty(ModelCom.java:433)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.getExistingURISubject(XML2RDF.java:593)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.getExistingURISubject(XML2RDF.java:588)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.getAnnotatedElementClass(XML2RDF.java:351)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.getElementType(XML2RDF.java:504)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:141)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:126)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:236)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:126)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:236)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:126)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:236)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:126)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:236)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:126)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:236)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createElement(XML2RDF.java:126)
        at org.topbraid.spin.sparqlmotion.lib.internal.sxml.XML2RDF.createDocument(XML2RDF.java:119)
        at org.topbraid.sxml.mapping.XML2RDFLoader.load(XML2RDFLoader.java:77)
        at org.topbraid.sparqlmotion.lib.convertXMLToRDF.ConvertXMLToRDFModule.load(ConvertXMLToRDFModule.java:24)
        at org.topbraid.spin.sparqlmotion.lib.internal.ConvertXMLToRDFModule.createGraph(ConvertXMLToRDFModule.java:50)
        ... 7 more



    On Fri, Mar 16, 2012 at 3:44 PM, Tim Smith wrote:

        One small correction - I'm using an Iterate Over Select
        module, not Bind by Select to process each file.

        Thanks,

        Tim



        On Fri, Mar 16, 2012 at 3:35 PM, Tim Smith wrote:

            Hi,

            I'm attempting to process ~250 XML files into RDF.  I
            created a schema for the files using XMLSpy and imported
            the schema into TBC using the XSD importer.  This created
            two .ttl files.

            I created an SM script that iterates over the files using
            tops:files via a bind by select module.  Prior to the Bind
            by Select, I import the schema ontologies and my target
            ontology.  In the body, I import each XML file, convert it
            to RDF and then run a series of CONSTRUCT queries to map
            each file into the target ontology.  The combination of
            all triples generated is then saved to disk.

            The script works fine if I only run through a small number
            of files.  However, if I try to hit all 250 at once, it
            just runs slower and slower and slower...  The slow part
            seems to be the CONSTRUCT queries.  They run fast
            initially but slow significantly after 10-20 files.  For
            every file that I have manually tested by running the
            CONSTRUCT query in the SPARQL view, the query has always
            run very fast so I do not know why performance is so poor
            running as an SM script.

            Any suggestions?  Are there things I can do to speed this
            along?  Is there data that I can collect to better inform you?

            My current workaround is to process each directory
            individually, but even that hits the problem because some
            directories have tens of files (not to mention the obvious
            hassle of changing the script for each directory: file
            names, base URIs, etc.).

            I'm using 3.6B on win7/64 with 5G allocated to the JVM.

            Thanks,

            Tim




--
You received this message because you are subscribed to the Google
Group "TopBraid Suite Users", the topics of which include Enterprise Vocabulary Network (EVN), TopBraid Composer,
TopBraid Live, TopBraid Ensemble, SPARQLMotion and SPIN.
To post to this group, send email to
[email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/topbraid-users?hl=en
