Hey Simone, I'm now wondering whether it would be better to import my XML doc into a database and work with MySQL.
I guess it's faster to scan a MySQL database with Java than an XML doc; what do you think? I'm using Digester combined with Apache Lucene to run queries (65 MB altogether, in one XML file) against a collection (also 65 MB of XML).

thanks

On 28 March 2011 17:20, Simone Tripodi <[email protected]> wrote:

> Hi Patrick,
> take a look at this example[1]: all you have to do is obtaining a
> ContentHandler instance as shown, then invoking SAX events while
> parsing the original document.
> It's more efficient and consumes less memory
> Simo
>
> [1] http://www.stylusstudio.com/xmldev/200502/post20440.html
>
> http://people.apache.org/~simonetripodi/
> http://www.99soft.org/
>
> On Mon, Mar 28, 2011 at 4:56 PM, Patrick Diviacco
> <[email protected]> wrote:
> > hi!
> >
> > What should I use instead of StringBuffer ?
> >
> > Any example or tutorial ?
> >
> > thanks
> > Patrick
> >
> > On 28 March 2011 16:53, Simone Tripodi <[email protected]> wrote:
> >
> >> Hi Patrick,
> >> nice to know you quickly fixed the issue before anybody could have
> >> provided his help! :)
> >>
> >> As a side note, I would suggest you taking in consideration a
> >> different solution for the XML generation rather the StringBuffer,
> >> since you're parsing large dataset, streaming data while parsing
> >> would improve the performances and reduce the consumed memory.
> >>
> >> Just my 2 cents, have a nice day,
> >> Simo
> >>
> >> http://people.apache.org/~simonetripodi/
> >> http://www.99soft.org/
> >>
> >> On Mon, Mar 28, 2011 at 2:28 PM, Patrick Diviacco
> >> <[email protected]> wrote:
> >> > I've solved. the issue was a row in train.xml file. To solve the issue I've
> >> > printed the source file rows while processing. However it has been possible
> >> > only because the parsing takes 4 minutes.
> >> >
> >> > I'm wondering how to debug such issues with a much bigger text file.
> >> >
> >> > thanks
> >> >
> >> > On 28 March 2011 14:14, Patrick Diviacco <[email protected]> wrote:
> >> >
> >> >> And these are the files:
> >> >>
> >> >> http://dl.dropbox.com/u/72686/test.xml
> >> >>
> >> >> http://dl.dropbox.com/u/72686/train.xml
> >> >>
> >> >> thanks
> >> >>
> >> >> On 28 March 2011 14:13, Patrick Diviacco <[email protected]> wrote:
> >> >>
> >> >>> Hi,
> >> >>>
> >> >>> I've a 74MB xml document and I've split it into 2 docs:52MB and 22MB
> >> >>> respectively.
> >> >>>
> >> >>> I'm parsing the file using common Digester library, and everything works
> >> >>> perfectly for the small file, but I get a NullPointerException with the big
> >> >>> one.
> >> >>>
> >> >>> I don't think the issue is the code because it works for the small file...
> >> >>> I guess the problem is with the file itself.
> >> >>>
> >> >>> I've parsed the files with the same parser, so I don't think the files
> >> >>> have issues either.
> >> >>>
> >> >>> In conclusion I dunno where the issue is.
> >> >>>
> >> >>> This is the code:
> >> >>> http://pastie.org/1726063
> >> >>>
> >> >>> This is the exception
> >> >>> SEVERE: End event threw exception
> >> >>> java.lang.reflect.InvocationTargetException
> >> >>>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> >> >>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> >>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >> >>>     at org.apache.commons.beanutils.MethodUtils.invokeMethod(MethodUtils.java:216)
> >> >>>     at org.apache.commons.digester.SetNextRule.end(SetNextRule.java:220)
> >> >>>     at org.apache.commons.digester.Rule.end(Rule.java:257)
> >> >>>     at org.apache.commons.digester.Digester.endElement(Digester.java:1345)
> >> >>>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:601)
> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1782)
> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2938)
> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
> >> >>>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
> >> >>>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
> >> >>>     at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
> >> >>>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
> >> >>>     at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
> >> >>>     at org.apache.commons.digester.Digester.parse(Digester.java:1871)
> >> >>>     at CentroidGenerator.main(CentroidGenerator.java:137)
> >> >>> Caused by: java.lang.NullPointerException
> >> >>>     at CentroidGenerator.nextItem(CentroidGenerator.java:62)
> >> >>>     ... 19 more
> >> >>> Exception in thread "main" java.lang.NullPointerException
> >> >>>     at org.apache.commons.digester.Digester.createSAXException(Digester.java:3363)
> >> >>>     at org.apache.commons.digester.Digester.createSAXException(Digester.java:3389)
> >> >>>     at org.apache.commons.digester.Digester.endElement(Digester.java:1348)
> >> >>>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:601)
> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1782)
> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2938)
> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
> >> >>>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
> >> >>>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
> >> >>>     at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
> >> >>>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
> >> >>>     at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
> >> >>>     at org.apache.commons.digester.Digester.parse(Digester.java:1871)
> >> >>>     at CentroidGenerator.main(CentroidGenerator.java:137)
> >> >>> Caused by: java.lang.NullPointerException
> >> >>>     at CentroidGenerator.nextItem(CentroidGenerator.java:62)
> >> >>>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> >> >>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> >>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >> >>>     at org.apache.commons.beanutils.MethodUtils.invokeMethod(MethodUtils.java:216)
> >> >>>     at org.apache.commons.digester.SetNextRule.end(SetNextRule.java:220)
> >> >>>     at org.apache.commons.digester.Rule.end(Rule.java:257)
> >> >>>     at org.apache.commons.digester.Digester.endElement(Digester.java:1345)
> >> >>>     ... 12 more
> >> >>>
> >> >>> thanks
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
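
For reference, the streaming approach Simone suggests in the thread (get a ContentHandler and fire SAX events at it instead of appending to a StringBuffer) might look roughly like the sketch below. It uses only the JDK's `TransformerHandler`; the class name `StreamingXmlWriter` and the `items`/`item` element names are made up for illustration and are not taken from the pastie code.

```java
import java.io.Writer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper: writes <items><item>...</item></items> by firing SAX events.
final class StreamingXmlWriter {

    static void write(Iterable<String> values, Writer out) throws Exception {
        // The JDK's default TransformerFactory supports the SAX feature,
        // so the cast to SAXTransformerFactory works with the built-in implementation.
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();

        // With no stylesheet this is an identity transform: events in, XML text out.
        TransformerHandler handler = factory.newTransformerHandler();
        handler.setResult(new StreamResult(out));

        AttributesImpl noAttributes = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "items", "items", noAttributes);
        for (String value : values) {
            handler.startElement("", "item", "item", noAttributes);
            char[] chars = value.toCharArray();
            // Special characters such as '&' are escaped by the serializer.
            handler.characters(chars, 0, chars.length);
            handler.endElement("", "item", "item");
        }
        handler.endElement("", "items", "items");
        handler.endDocument();
    }
}
```

In real use the `Writer` would wrap a file, so each element goes to the output as it is produced rather than being accumulated in one large in-memory buffer.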

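
On the question of how to find a bad row in a much bigger file without printing every row: one option is a small SAX pass that records the line number of each suspicious record through the parser's `Locator`. This is only a sketch under assumptions. The thread does not say what was wrong with the row in train.xml, so the check here (an `item` element with no `title` child) and both element names are hypothetical stand-ins for whatever field `nextItem` expects to be non-null.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical checker: reports the line of every <item> that has no <title> child.
final class BadRowFinder extends DefaultHandler {
    private Locator locator;
    private boolean sawTitle;
    private final List<Integer> badLines = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        // The parser hands over a Locator before startDocument is called.
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("item".equals(qName)) {
            sawTitle = false;            // new record: reset the flag
        } else if ("title".equals(qName)) {
            sawTitle = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("item".equals(qName) && !sawTitle) {
            // Line on which the closing tag of the incomplete record ends.
            badLines.add(locator.getLineNumber());
        }
    }

    static List<Integer> find(InputSource source) throws Exception {
        BadRowFinder handler = new BadRowFinder();
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return handler.badLines;
    }
}
```

Because the pass keeps only a flag and a list of line numbers, memory use stays flat regardless of file size, and the reported lines can then be inspected directly in the source file.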