hey Simone,
I'm now wondering whether it wouldn't be better to import my XML doc into a
database and work with MySQL.
I guess it is faster to scan a MySQL database with Java than an XML
doc. What do you think?
I'm using Digester combined with Apache Lucene to perform queries (all
together they are
Hi Patrick,
I'd say: it depends! I don't know the domain you're working in, but I'd
say once you import the XML into the Lucene index you don't need the XML
anymore.
Do you need the data to be persisted so it can be reused later?
Then use a DB.
Do you need to analyze documents just to populate the Lucene
And these are the files:
http://dl.dropbox.com/u/72686/test.xml
http://dl.dropbox.com/u/72686/train.xml
thanks
On 28 March 2011 14:13, Patrick Diviacco patrick.divia...@gmail.com wrote:
Hi,
I have a 74MB XML document and I've split it into 2 docs: 52MB and 22MB,
respectively.
I'm parsing
I've solved it. The issue was a row in the train.xml file. To find it, I
printed the source file rows while processing. However, that was only possible
because the parsing takes just 4 minutes.
I'm wondering how to debug such issues with a much bigger text file.
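For a much bigger file, one option is to let the SAX parser itself report the offending row: the parser tracks line numbers, and a SAXParseException exposes them, so you don't need to print every row. A minimal JDK-only sketch (the class name and sample input are mine, just for illustration):

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxErrorLocator {

    /**
     * Parses the XML and returns the line number of the first fatal
     * well-formedness error, or -1 if the document is well-formed.
     */
    public static int findErrorLine(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return -1; // parsed cleanly
        } catch (SAXParseException e) {
            // the parser's Locator gives the position of the broken row
            return e.getLineNumber();
        }
    }

    public static void main(String[] args) throws Exception {
        // unclosed <a> element: the mismatch is detected at line 3
        System.out.println(SaxErrorLocator.findErrorLine("<root>\n<a>\n</root>"));
    }
}
```

The same works when parsing from a File or InputStream, so it scales to arbitrarily large inputs without any manual row printing.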
thanks
On 28 March 2011 14:14,
Hi Patrick,
nice to know you quickly fixed the issue before anybody could provide
any help! :)
As a side note, I would suggest taking into consideration a
different solution for the XML generation rather than StringBuffer;
since you're parsing a large dataset, streaming data while parsing
Hi!
What should I use instead of StringBuffer?
Any example or tutorial?
thanks
Patrick
On 28 March 2011 16:53, Simone Tripodi simonetrip...@apache.org wrote:
Hi Patrick,
nice to know you quickly fixed the issue before anybody could provide
any help! :)
As a side note, I would
Hi Patrick,
take a look at this example[1]: all you have to do is obtain a
ContentHandler instance as shown, then invoke SAX events while
parsing the original document.
It's more efficient and consumes less memory.
Simo
[1] http://www.stylusstudio.com/xmldev/200502/post20440.html
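The approach in the example above can be sketched with nothing but the JDK: obtain a ContentHandler backed by a TransformerHandler, then fire SAX events to stream the output directly to a Result instead of accumulating it in a StringBuffer. This writes to a StringWriter only to keep the demo self-contained; in practice you'd stream to a file. Class and element names here are illustrative, not from the thread:

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;

public class StreamingXmlWriter {

    /** Emits a tiny document by firing SAX events instead of concatenating strings. */
    public static String writeSample() throws Exception {
        StringWriter out = new StringWriter();

        // the default TransformerFactory also implements SAXTransformerFactory
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        handler.setResult(new StreamResult(out));

        // the TransformerHandler *is* a ContentHandler: fire events while parsing
        ContentHandler ch = handler;
        ch.startDocument();
        ch.startElement("", "doc", "doc", new AttributesImpl());
        ch.startElement("", "item", "item", new AttributesImpl());
        char[] text = "hello".toCharArray();
        ch.characters(text, 0, text.length);
        ch.endElement("", "item", "item");
        ch.endElement("", "doc", "doc");
        ch.endDocument();

        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeSample());
    }
}
```

Since each element is serialized as its events arrive, memory stays flat no matter how large the generated document gets, which is exactly the advantage over building the whole string in a StringBuffer first.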