Re: [basex-talk] Coding help
Hi Greg,

So, to be clear and succinct, the goal is to create a single XML file containing all XML files that have a predefined text string match in them, yes?

If so, I'm wondering if creating any database is necessary. A single pass through all the files, searching for the text string, and appending matched files as you go seems sufficient.

R

On Mon, Aug 5, 2019, 08:42 Greg Kawchuk wrote:

> Hi everyone,
> I'm wondering if someone could provide what I think is a brief script for
> a scientific project to do the following.
> The goal is to identify XML documents from a very large collection that
> would be too big to load into a database all at once.
>
> Here is how I see the functions provided by the code.
> 1. In the script, the user could enter the path of the target folder (with
> millions of XML documents).
> 2. In the script, the user would enter the number of documents to load
> into a database at a given time (i = 1,000) depending on memory
> limitations.
> 3. The code would then create a temporary database from the first (i) xml
> files in the target folder.
> 4. The code would then search the 1000 xml documents in the database for a
> pre-defined text string.
> 5. If hits exist for the text string, the code would write those documents
> to a unique XML file.
> 6. Clear the database.
> 7. Read in the next 1000 files (or remaining files in the folder).
> 8. Return to #4.
>
> There would be no need to append XML files in step 5. The resulting XML
> files could be concatenated afterwards.
> Thank you in advance. If you have any questions, please feel free to email
> me here.
> Greg
>
> ***
> Greg Kawchuk BSC, DC, MSc, PhD.
> Professor, Faculty of Rehabilitation Medicine
> University of Alberta
> greg.kawc...@ualberta.ca
> 780-492-6891
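
[Editor's sketch] The single-pass idea described in this reply might look roughly like the following. It is written in Python rather than XQuery purely for illustration, and nothing here comes from the thread itself: the function name `collect_matches`, the `<matches>` wrapper element, and the plain substring test on the raw file text are all assumptions. It also assumes the files are UTF-8, carry no DOCTYPE declaration, and that the output file lives outside the scanned folder.

```python
import os
import re

# Matches a leading XML declaration so it can be dropped before embedding.
XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def collect_matches(folder, needle, out_path, wrapper="matches"):
    """Scan every .xml file in `folder` once and append the files that
    contain `needle` to a single wrapper document at `out_path`.

    Hypothetical helper: returns the number of matching files. Only one
    input file is held in memory at a time.
    """
    hits = 0
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(f"<{wrapper}>\n")
        # os.scandir yields entries lazily, which helps with huge folders.
        with os.scandir(folder) as entries:
            for entry in entries:
                if not (entry.is_file() and entry.name.endswith(".xml")):
                    continue
                # utf-8-sig strips a byte-order mark if one is present.
                with open(entry.path, encoding="utf-8-sig") as f:
                    text = f.read()
                # Assumption: a raw substring test, which also matches
                # tag and attribute names, not only text content.
                if needle in text:
                    out.write(XML_DECL.sub("", text, count=1).rstrip() + "\n")
                    hits += 1
        out.write(f"</{wrapper}>\n")
    return hits
```

A call such as `collect_matches("/path/to/xml", "some phrase", "/path/to/hits.xml")` (paths hypothetical) would produce one well-formed file with every matching document as a child of `<matches>`. If the search has to respect XML structure instead of raw text, the substring test is the part to replace.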
Re: [basex-talk] Coding help
On 05.08.2019 at 08:41, Greg Kawchuk wrote:

> Hi everyone,
> I'm wondering if someone could provide what I think is a brief script for a scientific project to do the following.
> The goal is to identify XML documents from a very large collection that would be too big to load into a database all at once.
>
> Here is how I see the functions provided by the code.
> 1. In the script, the user could enter the path of the target folder (with millions of XML documents).
> 2. In the script, the user would enter the number of documents to load into a database at a given time (i = 1,000) depending on memory limitations.
> 3. The code would then create a temporary database from the first (i) xml files in the target folder.
> 4. The code would then search the 1000 xml documents in the database for a pre-defined text string.

What kind of search is that exactly? Does it depend on any database-related features at all, or can't you just use BaseX as a standalone XQuery processor?

> 5. If hits exist for the text string, the code would write those documents to a unique XML file.

What kind of structure would that unique file have? Simply

{collection('foo')[1 to 1000][condition]}

?

> 6. Clear the database.
> 7. Read in the next 1000 files (or remaining files in the folder).
> 8. Return to #4.
>
> There would be no need to append XML files in step 5. The resulting XML files could be concatenated afterwards.
> Thank you in advance. If you have any questions, please feel free to email me here.
[basex-talk] Coding help
Hi everyone,

I'm wondering if someone could provide what I think is a brief script for a scientific project to do the following.
The goal is to identify XML documents from a very large collection that would be too big to load into a database all at once.

Here is how I see the functions provided by the code.
1. In the script, the user could enter the path of the target folder (with millions of XML documents).
2. In the script, the user would enter the number of documents to load into a database at a given time (i = 1,000) depending on memory limitations.
3. The code would then create a temporary database from the first (i) xml files in the target folder.
4. The code would then search the 1000 xml documents in the database for a pre-defined text string.
5. If hits exist for the text string, the code would write those documents to a unique XML file.
6. Clear the database.
7. Read in the next 1000 files (or remaining files in the folder).
8. Return to #4.

There would be no need to append XML files in step 5. The resulting XML files could be concatenated afterwards.
Thank you in advance. If you have any questions, please feel free to email me here.
Greg

***
Greg Kawchuk BSC, DC, MSc, PhD.
Professor, Faculty of Rehabilitation Medicine
University of Alberta
greg.kawc...@ualberta.ca
780-492-6891
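
[Editor's sketch] The control flow of steps 1 to 8 above could be sketched as follows. This is Python, not BaseX, and it stands in a plain in-memory batch of file paths for the temporary database, so it illustrates the batching loop only. The names `xml_paths` and `process_in_batches`, the `batch_NNNNN.xml` file naming, and the `<batch>` wrapper element are hypothetical. The sketch assumes UTF-8 files without DOCTYPE declarations, a raw substring test for the "text string", and an output directory outside the target folder.

```python
import itertools
import os
import re

# Matches a leading XML declaration so it can be dropped before embedding.
XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def xml_paths(folder):
    """Step 1: lazily yield paths of .xml files in `folder` (unsorted)."""
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(".xml"):
                yield entry.path


def process_in_batches(folder, needle, out_dir, batch_size=1000):
    """Steps 2-8: read `batch_size` files at a time and write each batch's
    hits to its own file. Returns the list of output files written."""
    os.makedirs(out_dir, exist_ok=True)
    paths = xml_paths(folder)
    written = []
    for batch_no in itertools.count(1):
        # Steps 3 and 7: take the next `batch_size` paths, or what is left.
        batch = list(itertools.islice(paths, batch_size))
        if not batch:
            break
        hits = []
        for path in batch:  # step 4: search the current batch
            with open(path, encoding="utf-8-sig") as f:
                text = f.read()
            if needle in text:  # assumption: raw substring match
                hits.append(XML_DECL.sub("", text, count=1).rstrip())
        if hits:  # step 5: one uniquely named file per batch with hits
            out_path = os.path.join(out_dir, f"batch_{batch_no:05d}.xml")
            with open(out_path, "w", encoding="utf-8") as out:
                out.write("<batch>\n" + "\n".join(hits) + "\n</batch>\n")
            written.append(out_path)
        # Step 6: nothing to clear; `batch` and `hits` are rebuilt next round.
    return written
```

Because only `batch_size` documents are held at once, memory use is bounded by the batch, which is the constraint step 2 describes. Batches without hits produce no file, so the number of output files can be smaller than the number of batches.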