Re: [basex-talk] large number of xml files
Hi Sateesh,

I saw that you sent Dirk an XQuery file. Is it the same query that takes that much memory? If so, we will see if we can help with that :)

Kind regards,
Michael

On 20.08.2012 at 14:34, sateesh sate...@intense.in wrote:

Hi Michael,

I created a collection of 2k XML files as described in your previous mail and tried executing the query. Even with the collection in place, memory consumption is still high (700 MB of heap memory), and processing takes 3 minutes.

Thanks & Regards,
Sateesh.A

-----Original Message-----
From: Michael Seiferle [mailto:m...@basex.org]
Sent: Monday, August 20, 2012 2:33 PM
To: sateesh
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] large number of xml files

Sateesh,

sorry, I totally overlooked your last email. I'll reply inline:

On 18.08.2012 at 08:58, sateesh sate...@intense.in wrote:

Hi Michael,

I have tried to implement your suggested changes, but I got stuck because the 10k XML files I have to query come from different folders. One more question: how do I create collections programmatically before running the query?

XQuery at the moment has no possibility to create a collection on the fly, so you would have to use our Java API [1] or command-line API [2]. To create a collection from different folders, you would do as follows:

CREATE DB myDB path/to/files
creates the database myDB with all documents found in the input directory.

ADD TO target/ xmldir
adds all files from the xmldir directory to the database in the target path.

I hope this helps :-)

Kind regards,
Michael

Thanks & Regards,
Sateesh.A

[1] https://github.com/BaseXdb/basex-examples/blob/master/src/main/java/org/basex/examples/query/CreateCollection.java
[2] http://docs.basex.org/wiki/Commands

___
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
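The two commands in Michael's reply extend to several source folders: one CREATE DB for the first folder, then one ADD TO per additional folder. The following is a minimal Python sketch that only builds the command text. The function name `build_commands`, the extra folder names, and the target paths are hypothetical, and it assumes paths without spaces and that the database stays open after CREATE DB, as the quoted example implies.

```python
from typing import Dict, List


def build_commands(db_name: str, first_dir: str, extra_dirs: Dict[str, str]) -> List[str]:
    """Return BaseX command lines that load several folders into one database.

    extra_dirs maps a target path inside the database to a source folder.
    """
    # CREATE DB <name> <input>: form taken from the reply above.
    commands = [f"CREATE DB {db_name} {first_dir}"]
    # ADD TO <target> <input>: one line per additional folder.
    for target, source in extra_dirs.items():
        commands.append(f"ADD TO {target} {source}")
    return commands


if __name__ == "__main__":
    # Hypothetical folder layout, for illustration only.
    script = build_commands(
        "myDB",
        "path/to/files",
        {"target/": "xmldir", "archive/": "path/to/older/files"},
    )
    print("\n".join(script))
```

The printed lines could then be run through the BaseX command-line client or GUI; see the Commands page [2] for the exact syntax supported by your version.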
Re: [basex-talk] large number of xml files
Hi Michael,

The file I sent to Dirk concerns a separate issue (grouping of records), although I am facing a memory problem there as well. In the case discussed here, querying the 10k XML files still consumes a large amount of memory even after creating collections, as mentioned in my previous mail.

I am waiting for your suggestions. They would really help me close the issue, as I am at a crucial stage of the project.

Thanks & Regards,
Sateesh.A

-----Original Message-----
From: Michael Seiferle [mailto:m...@basex.org]
Sent: Tuesday, August 21, 2012 1:52 PM
To: sateesh
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] large number of xml files

Hi Sateesh,

I saw that you sent Dirk an XQuery file. Is it the same query that takes that much memory? If so, we will see if we can help with that :)

Kind regards,
Michael
Re: [basex-talk] Basex editing performance test
Hi Yoann,

my initial assumption would be that the culprit for the performance drop is the file system in use (NTFS? ext3?). If 80,000 databases are created, your db directory will contain 80,000 directories, which is quite a lot for common file systems. Some alternatives (e.g. XFS, maybe ReiserFS, or ReFS in Windows 8) may give you better results here.

Another, more general, approach is to cluster your databases and find a good tradeoff between the number and the size of your databases. As your results have already shown, there's hardly any difference if 1 or 1,000 databases are created, but it will hardly be possible to get satisfying results with 1M databases.

Hope this helps,
Christian

___

Continuing on my recent question about using multiple databases, we've been running some performance tests on BaseX. I don't have the details of the computer that hosted these tests, but they were all done on the same computer.

The test was to edit a node's value from a PHP script a million times, with the database chosen randomly from the X databases created (this simulates multiple users accessing our app).

2 DB
Total time : 1517.95 seconds for 100 iterations with 2 DB
Min time : 1.17 ms with database 1
Max time : 704.72 ms with database 0
Mean time : 1.52 ms

10 000 DB
Total time : 1515.75 seconds for 100 iterations with 10 000 DB
Min time : 1.21 ms with database 8879
Max time : 645.16 ms with database 3822
Mean time : 1.52 ms

20 000 DB
Total time : 1680.29 seconds for 100 iterations with 20 000 DB
Min time : 1.18 ms with database 3749
Max time : 285.49 ms with database 6518
Mean time : 1.68 ms

40 000 DB
Total time : 1813.53 seconds for 100 iterations with 40 000 DB
Min time : 1.04 ms with database 786
Max time : 212.2 ms with database 6949
Mean time : 1.81 ms

80 000 DB - test 1
Total time : 24728.94 seconds for 100 iterations with 80 000 DB
Min time : 1.16 ms with database 25693
Max time : 2433.44 ms with database 22021
Mean time : 24.73 ms

80 000 DB - test 2
Total time : 18661.74 seconds for 100 iterations with 80 000 DB
Min time : 1.68 ms with database 5979
Max time : 1936.4 ms with database 30239
Mean time : 18.66 ms

We can see a significant difference between 40k and 80k databases. We haven't checked other averaging methods to see whether this is due to a few slow edit actions.

Has anyone tried running this many databases and hit a limit at some point? For server sizing, what is key for these actions: processor or RAM?

Thanks for your help!

--
Yoann Maingon
mydatalinx
0664324966
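As a quick consistency check on the figures above: the reported mean times match the total time divided by 1,000,000 edits (the count given in the test description), not by the "100 iterations" printed in each result line. The sketch below recomputes the means and the slowdown from 40,000 to 80,000 databases. The names `TOTAL_SECONDS`, `EDITS_PER_RUN`, and `mean_ms` are illustrative; the input numbers are copied from the results, and the edit count per run is an assumption.

```python
# Total times in seconds, copied from the results above.
TOTAL_SECONDS = {
    "2": 1517.95,
    "10 000": 1515.75,
    "20 000": 1680.29,
    "40 000": 1813.53,
    "80 000 (test 1)": 24728.94,
    "80 000 (test 2)": 18661.74,
}

# Assumption: one million edits per run, as stated in the test description.
EDITS_PER_RUN = 1_000_000


def mean_ms(total_seconds: float, edits: int = EDITS_PER_RUN) -> float:
    """Mean time per edit in milliseconds, rounded to two decimals."""
    return round(total_seconds * 1000 / edits, 2)


if __name__ == "__main__":
    for label, total in TOTAL_SECONDS.items():
        print(f"{label:>16} DB: {mean_ms(total):6.2f} ms per edit")
    # Compare both 80 000 DB runs against the 40 000 DB run.
    baseline = TOTAL_SECONDS["40 000"]
    for label in ("80 000 (test 1)", "80 000 (test 2)"):
        print(f"{label}: {TOTAL_SECONDS[label] / baseline:.1f}x the 40 000 DB total")
```

Under that assumption, the two 80,000-database runs take roughly 10 to 14 times as long as the 40,000-database run, while the minimum times stay close to 1 ms. This suggests the average is dominated by a subset of slow edits rather than a uniform slowdown, which a median or percentile breakdown would confirm.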