Hi all, I'm trying to decide between the following two options and haven't been able to find an answer through Google searches.
I am writing an application that requires importing MANY LARGE XML files (I'm not yet sure exactly how many or how big) into Jackrabbit. My concerns are twofold:

1. I need Lucene to index these XML files, and I want to be able to run XPath queries on the data (the hierarchy of my content is important).
2. I want the import process (which will happen frequently) to be as fast as possible.

I am not sure which of my two solutions works:

1. Import all the XML files using Jackrabbit's XML import API. This keeps the structure of the XML content, but it's presumably slow, and I'm not sure what the overhead is. I wonder if anyone has done any profiling of Jackrabbit 1.4. Are there tweaks that can make this process faster?
2. Import each XML file's content as a plain string. I believe this prevents Lucene/Jackrabbit from being aware of the hierarchy of the data, but I'm NOT sure. Would the imports be faster in this case? Would they be a lot faster? Would searching the content be as accurate as in the first scenario?

Any help is very much appreciated.

Rokham S.
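P.S. To make the two options concrete, here is a rough, untested sketch of what I have in mind against the JCR 1.0 API that Jackrabbit 1.4 implements. The paths, node name, and the "xmlContent" property name are placeholders I made up for illustration:

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;

    import javax.jcr.ImportUUIDBehavior;
    import javax.jcr.Node;
    import javax.jcr.Session;

    public class ImportSketch {

        // Option 1: let Jackrabbit parse the XML into a node tree.
        // Each XML element becomes a JCR node, so XPath queries can
        // address individual elements; the cost is one node (plus
        // index entries) per element.
        static void importAsTree(Session session, String parentPath,
                                 String xmlFile) throws Exception {
            InputStream in =
                    new BufferedInputStream(new FileInputStream(xmlFile));
            try {
                session.importXML(parentPath, in,
                        ImportUUIDBehavior.IMPORT_UUID_CREATE_NEW);
            } finally {
                in.close();
            }
            session.save(); // maybe batching several files per save() helps?
        }

        // Option 2: store the whole file as a single string property.
        // Lucene would index it as one flat text value, so full-text
        // search still works, but (I think) the XML hierarchy is
        // invisible to XPath queries.
        static void importAsString(Session session, Node parent,
                                   String nodeName, String xmlContent)
                throws Exception {
            Node fileNode = parent.addNode(nodeName);
            fileNode.setProperty("xmlContent", xmlContent); // placeholder name
            session.save();
        }
    }

I've also seen Workspace.importXML mentioned, which writes directly to the workspace instead of going through the transient session space; I don't know whether that makes a real difference for large files.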
