[basex-talk] BaseX Scala Client 0.5 released, this time with documentation
Hi all,

I finally found / forcefully took the time and documented the Scala client library. The new release mostly brings documentation and uses BaseX 7.3. We have been using it for 5 months now and it works nicely.

One interesting feature, if you're dealing with queries that return large quantities of documents (and use Scala...), is the StreamingClientSession, which doesn't cache incoming results but passes them directly on to be consumed by the client code.

Check it out here: https://github.com/delving/basex-scala-client

Cheers,
Manuel

___
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Re: [basex-talk] Performance of Add command
Hi again,

inserting 3M records now seems to take a lot less time - I've been running an insertion for the past 40 minutes and it's close to finishing (2.8M records so far). I have the impression that it still gets slower as the database grows, but much less so - though I couldn't put a finger on any particular method call with YourKit (it started at a whopping 15K documents/second and is now at 300 documents/second).

I'll leave the computer running and see tomorrow how much time it took in total (and give details on which calls took how long), but in any case this is a huge improvement over how it used to be, thanks a lot!

Manuel

On Mon, Jul 9, 2012 at 2:04 PM, Manuel Bernhardt bernhardt.man...@gmail.com wrote:

Hi Christian,

thanks for the fix! I'll test it right away on a big import. We don't have that many namespaces in those documents, but the general idea is to keep them, so we won't be using the STRIPNS feature for the time being (though we might in the future, depending on the use case).

Thanks,
Manuel

On Sat, Jul 7, 2012 at 4:45 PM, Christian Grün christian.gr...@gmail.com wrote:

…the problem should now be fixed. I'd be glad if you could once more test the import you've been discussing in your report with the latest code base/snapshot.

Thanks in advance,
Christian

On Sat, Jun 30, 2012 at 7:01 PM, Manuel Bernhardt bernhardt.man...@gmail.com wrote:

Hi,

I'm doing some testing before migrating one of our customers to a new version of our platform that uses BaseX to store documents. They have approx. 4M documents, and I'm running an import operation on a 1M document collection on my laptop. The way I'm inserting documents is by firing off one Add command per document, based on a stream of the document, at a different (unique) path for each document, and flushing every 10K Adds.
Since most CPU usage (on one of the cores, the others being untouched) is taken by the BaseX server, I fired up YourKit out of curiosity to see where the CPU time was spent. My machine is a 2*4 core MacBook Pro with 8GB of RAM and an SSD, so I think hardware-wise it should do pretty fine.

YourKit shows that what seems to use up most time is the Namespaces.update method:

Thread-12 [RUNNABLE] CPU time: 2h 7m 9s
  org.basex.data.Namespaces.update(NSNode, int, int, boolean, Set)
  org.basex.data.Namespaces.update(int, int, boolean, Set)
  org.basex.data.Data.insert(int, int, Data)
  org.basex.core.cmd.Add.run()
  org.basex.core.Command.run(Context, OutputStream)
  org.basex.core.Command.exec(Context, OutputStream)
  org.basex.core.Command.execute(Context, OutputStream)
  org.basex.core.Command.execute(Context)
  org.basex.server.ClientListener.execute(Command)
  org.basex.server.ClientListener.add()
  org.basex.server.ClientListener.run()

I'm not really sure what that method does - it's a recursive function and seems to be triggered by Data.insert:

  // NSNodes have to be checked for pre value shifts after insert
  nspaces.update(ipre, dsize, true, newNodes);

The whole set of records should have no more than 5 different namespaces in total, so I'm wondering if there is perhaps some potential for optimization here? Note that I'm completely ignorant as to what the method does and what its exact purpose is.

Thanks,
Manuel

PS: the import is now finished: storing 1001712 records into BaseX took 9285008 ms
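For reference, the figures in the PS work out to a fairly low average throughput. A quick back-of-the-envelope check (plain Python, using only the numbers quoted above):

```python
# Numbers quoted in the PS above.
records = 1_001_712
elapsed_ms = 9_285_008

elapsed_s = elapsed_ms / 1000          # total wall-clock time in seconds
docs_per_second = records / elapsed_s  # average throughput over the whole run

print(round(elapsed_s / 3600, 2))  # 2.58 (hours)
print(round(docs_per_second))      # 108 (documents per second on average)
```

That average hides the slowdown described in the thread: the run reportedly started far faster and degraded as the database grew.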
Re: [basex-talk] Performance of Add command
[…]
  org.basex.core.Command.run(Context, OutputStream)
  org.basex.core.Command.exec(Context, OutputStream)
  org.basex.core.Command.execute(Context, OutputStream)
  org.basex.core.Command.execute(Context)
  org.basex.server.ClientListener.execute(Command)
  org.basex.server.ClientListener.add()
  org.basex.server.ClientListener.run()

Thread-30 --- Frozen for at least 51s
  org.basex.index.resource.Docs.insert(int, Data)
  org.basex.index.resource.Resources.insert(int, Data)
  org.basex.data.Data.insert(int, int, Data)
  org.basex.core.cmd.Add.run()
  org.basex.core.Command.run(Context, OutputStream)
  org.basex.core.Command.exec(Context, OutputStream)
  org.basex.core.Command.execute(Context, OutputStream)
  org.basex.core.Command.execute(Context)
  org.basex.server.ClientListener.execute(Command)
  org.basex.server.ClientListener.add()
  org.basex.server.ClientListener.run()

I'm not exactly sure which of the above are relevant, but I thought I'd share them anyway. I'll try to get some better measurements tomorrow.

Manuel

On Mon, Jul 9, 2012 at 11:25 PM, Manuel Bernhardt bernhardt.man...@gmail.com wrote:
[…]
Re: [basex-talk] Performance of Add command
Hi,

On Mon, Jul 2, 2012 at 10:42 AM, Christian Grün christian.gr...@gmail.com wrote:

Another note: if your initial database is empty, and if your documents to be added are stored on disk, the operation will be much faster if you specify this directory along with the create command.

I had considered looking at this, but in our situation the source is a stream that gets converted on the fly and then sent to the server (which is on a different server than the one doing the inserts). Btw, is there a reason why inserting from a file is faster than from a stream? I'd expect both to use the same insertion mechanism.

Thanks,
Manuel

great, thanks! If there's anything I can do to help, let me know. Right now I think I'm going to abort the import because it will probably take somewhat longer.

Manuel

On Mon, Jul 2, 2012 at 3:11 AM, Christian Grün christian.gr...@gmail.com wrote:

Hi Manuel,

sorry for the delayed feedback, and thanks for pointing to the Namespaces.update() method, which in fact updates the hierarchical namespace structures in a database (well, you guessed that already…). As we first need to do some more research on potential optimizations, I have created a new GitHub issue to keep track of this bottleneck [1].

Thanks,
Christian

[1] https://github.com/BaseXdb/basex/issues/523

On Sat, Jun 30, 2012 at 7:01 PM, Manuel Bernhardt bernhardt.man...@gmail.com wrote:
[…]
Re: [basex-talk] Performance of Add command
Hi,

a little update on this: I started the import of 3M documents last evening using this method, and after 9h it's not yet finished (at 2.29M documents atm). So this operation looks a lot like it is O(n^2) (the insertion of 1M records took somewhat above 2h).

Manuel

On Sat, Jun 30, 2012 at 7:01 PM, Manuel Bernhardt bernhardt.man...@gmail.com wrote:
[…]
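If insertion time really grows quadratically with the number of documents, the data points reported in this thread should roughly fit a t ∝ n² model. A small sketch checking that (the ~2h-per-1M figure is taken from the message above; the quadratic model itself is just the hypothesis being tested, not an established fact):

```python
def quadratic_prediction(t1_hours: float, n1_millions: float, n2_millions: float) -> float:
    """Predict total time for n2 documents, assuming time scales with n^2."""
    return t1_hours * (n2_millions / n1_millions) ** 2

# 1M documents took ~2h; after ~9h the 3M import had reached ~2.29M documents.
predicted = quadratic_prediction(2.0, 1.0, 2.29)
print(round(predicted, 1))  # 10.5 -- same order of magnitude as the observed ~9h
```

The fit is rough, but it is consistent with the superlinear slowdown described above; the same model would put the full 3M import at around 18h.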
Re: [basex-talk] Performance of Add command
Hi,

great, thanks! If there's anything I can do to help, let me know. Right now I think I'm going to abort the import because it will probably take somewhat longer.

Manuel

On Mon, Jul 2, 2012 at 3:11 AM, Christian Grün christian.gr...@gmail.com wrote:

Hi Manuel,

sorry for the delayed feedback, and thanks for pointing to the Namespaces.update() method, which in fact updates the hierarchical namespace structures in a database (well, you guessed that already…). As we first need to do some more research on potential optimizations, I have created a new GitHub issue to keep track of this bottleneck [1].

Thanks,
Christian

[1] https://github.com/BaseXdb/basex/issues/523

On Sat, Jun 30, 2012 at 7:01 PM, Manuel Bernhardt bernhardt.man...@gmail.com wrote:
[…]
Re: [basex-talk] Disable or control query caching
Hi Christian,

I just witnessed this again now. There was one process resulting in a streaming query (though I think there would not have been a big difference if it had been a cached one) over 9 records, and we uploaded a few small collections after starting that one. Additionally, I issued a list statement from another client. What happened next:

- the process with the long query went on
- the uploads were blocked (in queue)
- my call on the console was blocked (in queue)
- once the long query was done, all other operations proceeded

So it looks as though there is some kind of read lock at the server level...? Am I perhaps doing something wrong when starting the long query - e.g. should it be started within some kind of transaction or special context?

Thanks,
Manuel

On Tue, May 22, 2012 at 12:18 AM, Christian Grün christian.gr...@gmail.com wrote:

[] There is one thing I noticed however, and that I had noticed earlier on as well when a big collection was being processed: any attempt to talk with the server seems not to be working, i.e. even when I try to connect via the command-line basexadmin and run a command such as list or open db foo, I do not get a reply. []

I'm not quite sure what the problem is. Some questions that come to mind:

-- does the problem occur with a single client?
-- does "no reply" mean that your client request is being blocked, or that the returned result is empty?
-- can you access your database via the standalone interfaces?

Just in case: feel free to send a small Java example that demonstrates the issue.

Christian
Re: [basex-talk] Best way to insert large amounts of records
Hi Christian,

I'd say that your approach is close to an optimal solution, as the ADD command is pretty cheap, compared to e.g. REPLACE. If you believe that you could still run into some bottlenecks, you could have a look at, or provide us with, the output of Java's profiler (e.g. -Xrunhprof:cpu=samples).

Ok, I will look into this if we get bitten by performance issues (the longer collections do usually take a fair amount of time to be inserted, at least concurrently).

- is there a performance penalty in doing this kind of parsing concurrently?

Concurrent operations will be managed by the central transaction manager. At the time of writing this, all write operations are performed one after another, but in the near future, concurrent write operations to different databases will also be run in parallel.

Excellent news. I noticed things were slowing down when we had multiple collections inserted at the same time, so this should probably help.

- are there any JVM parameters that would help speed this up?

In general, Java will be faster when run with -server, but this option may have been chosen anyway by your Java runtime. Regarding the maximum amount of memory, there shouldn't be any noteworthy differences when adding documents.

Hope this helps,
Christian

Thanks!
Manuel
Re: [basex-talk] Disable or control query caching
Hi Christian,

as you have already seen, all results are first cached by the client if they are requested via the iterative query protocol. In earlier versions of BaseX, results were returned in a purely iterative manner -- which was more convenient and flexible from a user's point of view, but led to numerous deadlocks if reading and writing queries were mixed. If you only need parts of the requested results, I would recommend limiting the number of results via XQuery, e.g. as follows:

(for $i in /record[@version = 0]
 order by $i/system/index
 return $i)[position() = 1 to 1000]

I had considered this, but haven't used that approach - yet - mainly because I wanted to try the streaming approach first. So far our system only used MongoDB and we are used to working with cursors as query results, so I'm trying to keep that somehow aligned if possible.

Next, it is important to note that the order by clause can get very expensive, as all results have to be cached anyway before they can be returned. Our top-k functions will probably give you better results if it's possible in your use case to limit the number of results [1].

Ok, thanks. If this becomes a problem, I'll consider using this. Is the query time of 0.06ms otherwise the actual time the query takes to run? If yes, then I'm not too worried about query performance :)

In general, the bottleneck in our system is not so much the querying but rather the processing of the records - I started rewriting this one concurrently using Akka, but am now stuck with a classloader deadlock (no pun intended). It will likely take quite some effort for the processing to be faster than the query iteration.

A popular alternative to client-side caching (well, you mentioned that already) is to overwrite the code of the query client, and directly process the returned results. Note, however, that you need to loop through all results, even if you only need parts of the results.

I implemented this and it looks like it works nicely (to be confirmed soon - I started a run on a 600k records collection).

Thanks for your time!
Manuel

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/Higher-Order_Functions_Module#hof:top-k-by
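The trade-off discussed above -- client-side caching versus consuming results as they arrive -- can be mimicked with a plain generator: results are produced lazily, and a position()-style limit simply stops iteration early. This is only an illustration of the access pattern, with no BaseX API involved:

```python
from itertools import islice

def query_results(n):
    """Stand-in for an iterative query: yields one result at a time."""
    for i in range(n):
        yield f"<record id='{i}'/>"

# Caching behaviour: everything is materialised before any result is used.
cached = list(query_results(600_000))  # the whole result set sits in memory

# Streaming behaviour: only the consumed prefix is ever produced,
# analogous to the [position() = 1 to 1000] limit in the XQuery above.
first_thousand = list(islice(query_results(600_000), 1000))

print(len(first_thousand))  # 1000
```

The generator version never holds more than one record at a time, which is the same property that keeps the 600k-record run above from exhausting memory.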
Re: [basex-talk] Disable or control query caching
Hello again,

I implemented this and it looks like it works nicely (to be confirmed soon - I started a run on a 600k records collection).

This runs nicely, in that the machine doesn't run out of memory anymore. There is one thing I noticed however, and that I had noticed earlier on as well when a big collection was being processed: any attempt to talk with the server seems not to be working, i.e. even when I try to connect via the command-line basexadmin and run a command such as list or open db foo, I do not get a reply. I can see the commands in the log though:

17:28:06.532 [127.0.0.1:33112] LOGIN admin OK
17:28:08.158 [127.0.0.1:33112] LIST
17:28:21.288 [127.0.0.1:33114] LOGIN admin OK
17:28:25.602 [127.0.0.1:33114] LIST
17:28:52.676 [127.0.0.1:33116] LOGIN admin OK

Could it be that the long session is blocking the output stream coming from the server?

Thanks,
Manuel

On Mon, May 21, 2012 at 4:40 PM, Manuel Bernhardt bernhardt.man...@gmail.com wrote:
[…]
[basex-talk] Best way to insert large amounts of records
Hi,

we're using BaseX to store multiple collections of documents (we call them records). These records are produced programmatically, by parsing an incoming stream on a server application and turning it into a document of the kind

<record id="123" version="1"> ... </record>

So far I took the following approach:
- each collection of records is its own database in BaseX, for easier management
- on insertion:
  - set the session's autoflush to false
  - iterate over the records and add them via add(id, document)
  - every 10K records, flush
  - finally, flush once more
  - create the attributes index

So for example now we have:

name    Resources   Size         Input Path
col1    14141       19815190
col2    14750       16697081
col3    84450       253593687
col4    1012477     2107593252
col5    126058      186315175
col6    13767       14640701
col7    815991      730536864
col8    31189       39598405
col9    24733       91277637
col10   171906      202392553

... and there'll be quite a bit more coming in. This kind of bulk insertion can also happen concurrently (I've set up an actor pool of five for the moment).

My questions are:
- is this the most performant approach, or would it make sense to e.g. build one stream on the fly and somehow turn it into an InputStream to be sent via add?
- is there a performance cost in adding with an ID? We don't really need them since we retrieve records via a query - and those resources aren't really files on the file-system
- is there a performance penalty in doing this kind of parsing concurrently?
- are there any JVM parameters that would help speed this up? I haven't quite found how to pass in JVM parameters when starting basexserver via the command line. Looks like BaseX gave itself an Xmx of 1866006528 (but that machine has 8GB, so it could in theory get more).

Thanks!
Manuel
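The insertion recipe above (autoflush off, one add per record, flush in batches, a final flush, then index creation) can be sketched as follows. The RecordingSession class is a stand-in, not the real BaseX client API -- only the sequence of commands is meant to mirror the steps listed in the message, and the exact command strings are assumptions:

```python
class RecordingSession:
    """Stand-in for a BaseX client session; records the commands it receives."""
    def __init__(self):
        self.log = []

    def execute(self, command):
        self.log.append(command)

    def add(self, path, document):
        self.log.append(f"ADD {path}")

def bulk_insert(session, records, flush_every=10_000):
    """Insert (id, document) pairs with autoflush disabled, flushing in batches."""
    session.execute("SET AUTOFLUSH false")
    for i, (record_id, document) in enumerate(records, start=1):
        session.add(record_id, document)
        if i % flush_every == 0:
            session.execute("FLUSH")
    session.execute("FLUSH")                   # final flush
    session.execute("CREATE INDEX attribute")  # build the attribute index last

session = RecordingSession()
bulk_insert(session, ((str(i), "<record/>") for i in range(25_000)))
print(session.log.count("FLUSH"))  # 3: two batch flushes plus the final one
```

Building the index once at the end, rather than keeping it up to date per add, is the point of the recipe: index maintenance cost is paid a single time.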
[basex-talk] Scala client library for BaseX
Hi,

I'd like to announce the first release of a Scala client library for BaseX, which simplifies the idiomatic usage of BaseX within Scala applications. It's likely going to evolve quite a bit over the next weeks since we're in the process of learning how to best use BaseX.

The source is available here: https://github.com/delving/basex-scala-client

I'll try to find some time to write some documentation this weekend. Comments, feedback etc. are of course very welcome!

Cheers,
Manuel
[basex-talk] Preferred way to run BaseX as service on Debian
Hi,

is there perhaps an init.d script somewhere already in order to launch basexserver as a service on Debian? So far it looks as though there isn't one in the Debian package, so I'm thinking of adding a line to rc.local to run it on startup.

Also, from what I gathered, basex is now only available in sid, is that correct? I installed it on squeeze by downloading the deb; there's just one dependency on java-wrappers that I needed to install by hand.

Thanks,
Manuel
Re: [basex-talk] Preferred way to run BaseX as service on Debian
Hi,

thanks for the fast answers! Yes, that init.d script looks like what I was looking for. If I understand correctly, the data is going to be stored in the BaseXData directory of the user who launched the service?

Another thing that came to mind: is there perhaps a way by which basexserver could be bound only to the loopback address (or to some configured IP address)? I remember this being possible in e.g. MySQL, and it is quite a nice way of securing the server (since someone could arguably try to brute-force access when knowing the port on the machine, or using the default port). Of course an alternative is to set up a rule in iptables.

Also, I haven't quite found yet where the username and password are configured - do I need to create a configuration file for this?

Some additional ideas for the Debian package:
- have it set up a basex user, which is the default user running basexserver
- set the data directory for that server to e.g. /var/lib/basex, to be more in line with Debian's default behavior

Thanks,
Manuel

On Thu, May 10, 2012 at 4:06 PM, Christian Grün christian.gr...@gmail.com wrote:

Hi Manuel,

thanks for your input; at times, there were some online references to init.d scripts for BaseX; maybe they could be of interest here?

http://blog.neolocus.com/2012/02/basex-xml-server-as-a-linux-service/
http://cubeb.blogspot.com/2011/07/basex_23.html

Christian

On Thu, May 10, 2012 at 3:47 PM, Alexander Holupirek a...@holupirek.de wrote:

Hi Manuel,

On 10.05.2012, at 15:22, Manuel Bernhardt wrote:
is there perhaps an init.d script somewhere already in order to launch basexserver as a service on Debian?

no, not yet, but good idea. I filed an issue for that [1].

So far it looks as though there isn't one in the Debian package, so I'm thinking of adding a line to rc.local to run it on startup.

+1

Also, from what I gathered, basex is now only available in sid, is that correct? I installed it on squeeze by downloading the deb, there's just one dependency on java-wrappers that I needed to install by hand.

the current version is available in sid and, since yesterday, in testing. right, java-wrappers is the only dependency. libtagsoup-java might be of interest if you want to process non-wellformed HTML. providing the latest version as a squeeze-backport is a good idea as well (filed another issue [2]).

Thanks,
Alex

[1] https://github.com/BaseXdb/basex/issues/499
[2] https://github.com/BaseXdb/basex/issues/500