Re: [basex-talk] Fairlock and query monitoring
Hi Max,

Are read-only queries similarly slow if FAIRLOCK is set to false? If yes, it might help to optimize the incrementally indexed databases at regular intervals.

If you do that anyway, we could set up a short conference call and discuss further possibilities.

Cheers
Christian

On Mon, Oct 22, 2018 at 1:46 PM Maximilian Gärber wrote:
>
> Hi,
> for some time now, I've switched a production system (running BaseX
> 8.6) to FAIRLOCK = true. With PARALLEL = 16.
>
> While this helped speed up write operations, there are situations
> where queries go from sub-second to minute(s) if a few more users are
> reading/writing at the same time.
>
> The DBs in question are incrementally indexed.
>
> Since single queries are fast, I don't know what the best way of
> handling these situations would be.
>
> As a last resort I was thinking about adding a cache solution
> (write-behind) that allows for saving and delays (batches) writes.
>
> But before introducing another component to the system, I'd like to
> hear what other ideas might exist.
>
>
> Br,
> Max
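As a rough illustration of the optimization mentioned above, the following minimal sketch rebuilds the index structures of one database. The database name 'mydb' is a placeholder and does not come from the thread; how often to run it (for example from a nightly script) is likewise an assumption.

```xquery
(: Minimal sketch: 'mydb' is a hypothetical database name. :)
(: The second argument true() requests a full optimization,
   i.e. all index structures are rebuilt. :)
db:optimize('mydb', true())
```

Since this is an updating operation, it will block other access to the database while it runs, so a period of low load is probably the better time slot.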
Re: [basex-talk] Optimized query for importing content based on metadata
Thanks Christian! I appreciate the touch-up and the insight.

Jason

On 10/19/18, 2:26 PM, "Christian Grün" wrote:

    Hi Jason,

    My version would have looked pretty similar:

      let $xmlroot := '/Users/jason.davis/Sandbox/dita-docs/content/'
      for $path in file:list($xmlroot, true())
      where matches($path, '\.(xml|ditamap)$')
      let $doc := doc($xmlroot || $path)
      where $doc//brand[contains(., 'xyz')]
      return db:add('d4st^dita-docs^meta-test', $doc, $path)

    If you have already opened the document, you can directly pass it on to db:add.

    Cheers,
    Christian

    On Fri, Oct 19, 2018 at 11:01 PM Jason Davis wrote:
    >
    > Hi,
    >
    > I’ve cobbled together a query that I want to use to import xml from the filesystem into the database based on specific metadata requirements:
    >
    > let $xmlroot := "/Users/jason.davis/Sandbox/dita-docs/content/"
    > for $file in file:list($xmlroot, true())
    > where matches($file, 'xml') or matches($file, 'ditamap')
    > let $doc := file:resolve-path($file, $xmlroot)
    > return if (doc($doc)//brand[contains(.,'xyz')])
    > then db:add("d4st^dita-docs^meta-test", $doc)
    > else ()
    >
    > It works, so I’m pleased! I’m just wondering if there is a more efficient way to achieve what I want to do. I know that using a specific XPath in the doc function is one thing I can do better. Any suggestions are appreciated!
    >
    > Thanks,
    > Jason
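One difference between the two versions is the file filter: matches($file, 'xml') is true for any path that contains "xml" anywhere, whereas the anchored pattern only accepts the two file suffixes. The sketch below compares both filters; the three sample paths are made up for illustration.

```xquery
(: Hypothetical sample paths, not taken from the thread. :)
for $path in ('a.xml', 'maps/b.ditamap', 'notes/xml-notes.txt')
return string-join((
  $path,
  (: unanchored filters from the original query :)
  string(matches($path, 'xml') or matches($path, 'ditamap')),
  (: anchored filter from the revised query :)
  string(matches($path, '\.(xml|ditamap)$'))
), ' | ')
```

The third path passes the unanchored filter but is rejected by the anchored one, so doc() is never called on a file that is not XML.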
[basex-talk] Fairlock and query monitoring
Hi,

for some time now, I've switched a production system (running BaseX 8.6) to FAIRLOCK = true. With PARALLEL = 16.

While this helped speed up write operations, there are situations where queries go from sub-second to minute(s) if a few more users are reading/writing at the same time.

The DBs in question are incrementally indexed.

Since single queries are fast, I don't know what the best way of handling these situations would be.

As a last resort I was thinking about adding a cache solution (write-behind) that allows for saving and delays (batches) writes.

But before introducing another component to the system, I'd like to hear what other ideas might exist.

Br,
Max
Re: [basex-talk] Sir, when basex 9.1, please ;-)
Hi Marco,

I’m glad you are gathering some experience with the WebSocket facility.

> I get the following error [2] as return to my HTTP POST and, needless to say,
> nothing on the websocket.

The function bound to the "/dataprovider" path is a plain RESTXQ function. As such, it isn’t attached to a WebSocket id. One of the reasons is that a client who’s using RESTXQ may not necessarily have a WebSocket connection, or there can also be multiple WebSockets per client.

If you want to send your result to all WebSockets (including the client that called the dataprovider), you can simply use ws:emit().

If your use case is complex enough to have a WebSocket connection and simultaneous RESTXQ requests in a single browser tab, you could store the WebSocket id(s) of your client as HTTP session attribute, and access these ids from the RESTXQ code.

> BTW, even if not stated in the path annotation, an extra /ws needs to be
> prefixed to the url used for JS' WebSocket constructor. Personally I'd
> prefer to keep things explicit and put it in the annotation too.

I have just revised our documentation [1], and I hope it’s fairly complete now. In the Annotations section, you will find a hint to the "ws/" path. The reason why the path is omitted in XQuery is that the web server takes care of the path resolution. If the path were part of the annotations and the default path was changed in the web.xml file, it would need to be changed in all XQuery applications as well. The same applies to RESTXQ: If a prefix is used in the configuration, there won’t be a need to change your path annotations.

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/WebSockets

>
> Thanks for your support and thanks Maximilian for the lift.
>
> [1]
> module namespace dp = 'urn:nubisware:datarouter';
>
> import module namespace ws = 'http://basex.org/modules/ws';
>
> declare
>   %rest:path('/dataprovider')
>   %rest:POST("{$data}")
>   %output:method('json')
> function dp:route($data as node()) {
>   ws:send(json:serialize($data), ws:ids()[. != ws:id()])
> };
>
> declare
>   %ws:connect('/dataprovider')
> function dp:connect() as empty-sequence() {
>   ()
> };
>
> declare
>   %ws:close('/dataprovider')
> function dp:close() as empty-sequence() {
>   ()
> };
>
> [2]
> Stopped at /home/lettere/tmp/basex/webapp/dataprovider/dataprovider.xqm,
> 11/53:
> [basex:ws] WebSocket connection required.
>
> On 18/10/18 18:33, Christian Grün wrote:
> > Sir, doing our best ;)
> >
> > We expect BaseX 9.1 to be released pretty soon (by the end of October).
> >
> > For everyone who is interested in giving us some feedback on the new
> > WebSocket feature… Thank you in advance! 90% of the documentation is
> > finalized:
> >
> > http://docs.basex.org/wiki/WebSockets
> >
> > Best,
> > Christian
> >
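The ws:emit() variant suggested above might look as follows. This is only a sketch based on the quoted module: it drops the call to ws:id(), which is what triggers the "WebSocket connection required" error inside a RESTXQ function, and it assumes that the posted data is already in a form that json:serialize accepts (as in the original code).

```xquery
module namespace dp = 'urn:nubisware:datarouter';

import module namespace ws = 'http://basex.org/modules/ws';

(: RESTXQ entry point. It runs without a WebSocket id,
   so ws:id() must not be called here. :)
declare
  %rest:path('/dataprovider')
  %rest:POST('{$data}')
function dp:route($data as node()) as empty-sequence() {
  (: ws:emit() sends the message to all connected WebSocket clients. :)
  ws:emit(json:serialize($data))
};

(: WebSocket clients connect to the same path (prefixed with /ws in the URL). :)
declare
  %ws:connect('/dataprovider')
function dp:connect() as empty-sequence() {
  ()
};
```

Excluding the sender, as the original ws:send call tried to do, would require knowing the sender's WebSocket id inside the RESTXQ function, for instance via the HTTP session as described in the reply.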
Re: [basex-talk] Sir, when basex 9.1, please ;-)
Hi,

I'm just hijacking Maximilian's email here to post the following test I wanted to do for experimenting with BaseX and WebSockets.

I wrote the code at [1] with the intent to open a RESTXQ entry point that receives a JSON document via POST and then broadcasts it to all connected WS clients. Whether I use ws:broadcast or the naive version as shown, I still get the following error [2] as return to my HTTP POST and, needless to say, nothing on the websocket.

What am I doing wrong?

BTW, even if not stated in the path annotation, an extra /ws needs to be prefixed to the url used for JS' WebSocket constructor. Personally I'd prefer to keep things explicit and put it in the annotation too.

Thanks for your support and thanks Maximilian for the lift.

[1]
module namespace dp = 'urn:nubisware:datarouter';

import module namespace ws = 'http://basex.org/modules/ws';

declare
  %rest:path('/dataprovider')
  %rest:POST("{$data}")
  %output:method('json')
function dp:route($data as node()) {
  ws:send(json:serialize($data), ws:ids()[. != ws:id()])
};

declare
  %ws:connect('/dataprovider')
function dp:connect() as empty-sequence() {
  ()
};

declare
  %ws:close('/dataprovider')
function dp:close() as empty-sequence() {
  ()
};

[2]
Stopped at /home/lettere/tmp/basex/webapp/dataprovider/dataprovider.xqm, 11/53:
[basex:ws] WebSocket connection required.

On 18/10/18 18:33, Christian Grün wrote:
> Sir, doing our best ;)
>
> We expect BaseX 9.1 to be released pretty soon (by the end of October).
>
> For everyone who is interested in giving us some feedback on the new
> WebSocket feature… Thank you in advance! 90% of the documentation is
> finalized:
>
> http://docs.basex.org/wiki/WebSockets
>
> Best,
> Christian
Re: [basex-talk] Websocket Vers. 9.1 on Tomcat
Hi Dieter,

The current WebSocket implementation is based on Jetty’s WebSocket API. This was clearly missing in the documentation (thanks for the pointer), so I have just updated our Wiki pages [1].

It appears that Jetty’s implementation of the official JSR-356 WebSocket API is pretty stable nowadays. As it’s quite similar to their custom API (which served as inspiration for the generic API), we might switch to it in future versions of BaseX.

All the best,
Christian

[1] http://docs.basex.org/wiki/WebSockets

On Mon, Oct 22, 2018 at 6:34 AM Dieter Zanzinger wrote:
>
> Hi, BaseX-Team,
>
> I tested the new websocket-feature.
> The .zip installation on Win10 (your embedded jetty) worked (your chat-app).
> But I had problems with the .war installation on tomcat. With the
> standard installation, the paths in the chat.js had to be preceded with
> /BaseX - ok so far. I get the login-page and main-page visible - ok. But when
> the main-page is loading, the network call to
> http://localhost:8080/BaseX/ws/chat fails with http-code 404.
> I think, there is a problem with paths as well?
>
> It would be great, if you could add documentation for Tomcat as well.
>
> Thanks for this great tool!
>
> Dieter Zanzinger
Re: [basex-talk] Bug/question: large collection
Yes, a command script:

open mydb
add file1.xml
add file2.xml
etc

Best regards,
Marko

On Mon, Oct 22, 2018 at 4:21 PM Christian Grün wrote:
> > Yes it did. I had a file with about 1 million add filename statements
> > and when the max nodes limit was exceeded, each statement gave an error
> > message.
>
> Was this a “command script” [1], and did you call the ADD command or
> the db:add function?
>
> Thanks in advance,
> Christian
>
> [1] http://docs.basex.org/wiki/Commands#Command_Scripts
>
Re: [basex-talk] Missing two things but great tool!
Hi Jennifer,

Welcome to the list, and thanks for the kudos.

> 1. I really like the map visualisation possibility. I use lot of XML with
> coordinates and I was impressed that the tool could figure out them
> automatically. But it would be much more cooler to see it with a real map in
> background.

We thought about adding a feature to choose background images in the scatterplot. If you have two-dimensional latitude/longitude data, individual data points could then be assigned visually to geographic locations. I am not sure what that could look like for our map visualization. Could you give us more details? You may have discovered our set of map layouts in the GUI preferences, but they probably don’t match your specific requirements.

> 2. I tried to export the XML to CSV but this seems not working.

As XML resources may be arbitrarily nested, there is no canonical way of exporting them to a tabular representation. The CSV module [1] gives you full flexibility to create tabular exports exactly as you want them to be, but you’ll need to write some XQuery code for that.

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/CSV_Module
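To give an idea of the XQuery code involved, here is a minimal sketch that turns a small, flat XML structure into CSV with the CSV module. The input document and its element names (places, place, name, lat, lon) are invented sample data; real documents will need their own mapping from nested elements to columns.

```xquery
(: Invented sample input: one <place> per row. :)
let $input :=
  <places>
    <place><name>Konstanz</name><lat>47.66</lat><lon>9.17</lon></place>
    <place><name>Zurich</name><lat>47.37</lat><lon>8.54</lon></place>
  </places>
(: Build the csv/record structure expected by csv:serialize:
   each record holds one child element per column. :)
let $csv :=
  <csv>{
    for $place in $input/place
    return <record>{ $place/(name, lat, lon) }</record>
  }</csv>
(: 'header': true() writes the element names as the first line. :)
return csv:serialize($csv, map { 'header': true() })
```

The resulting string can be written to a file with file:write-text and opened in a spreadsheet application.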
Re: [basex-talk] Bug/question: large collection
Hi Christian,

many thanks!

> However, I would have expected BaseX to raise an error message. Could you give us more detail how you imported the documents?

Yes it did. I had a file with about 1 million add filename statements, and when the max nodes limit was exceeded, each statement gave an error message.

Best regards,
Marko

On Mon, Oct 22, 2018 at 3:40 PM Christian Grün wrote:
> Hi Marko,
>
> Databases are restricted to 2^31 nodes. If the limit is exceeded,
> you’ll need to distribute your documents across multiple database
> instances (see [1] for more details).
>
> However, I would have expected BaseX to raise an error message. Could
> you give us more detail how you imported the documents?
>
> By default, 8 parallel queries are allowed. The number can be changed
> by assigning a different value to the PARALLEL option [2]. In most
> cases, you’ll get best results if you ensure that your queries are
> rewritten for index access (provided that your queries allow such
> rewritings), as multiple concurrent database accesses may have negative
> effects, in particular if sequential scans are required. Obviously,
> things look slightly better for SSDs.
>
> Hope this helps,
> Christian
>
> [1] http://docs.basex.org/wiki/Databases
> [2] http://docs.basex.org/wiki/Options#PARALLEL
>
>
>
> On Mon, Oct 22, 2018 at 4:40 AM Marko Niinimaki wrote:
> >
> > Hi,
> > it looks like "nodes" exceeds some integer range if I add 2 million
> > documents (below).
> >
> > Another, unrelated question: our server has 24 cores. What would be the
> > best way to utilize that kind of parallel power in queries?
> >
> > > info db
> > Database Properties
> >  NAME: tmp
> >  SIZE: 47 GB
> >  NODES: -2147476286
> >  DOCUMENTS: 705708
> >  BINARIES: 0
> >  TIMESTAMP: 2018-08-29T02:27:58.000Z
> >  UPTODATE: false
> >
> >
> > Improper use? Potential bug? Your feedback is welcome:
> > Contact: basex-talk@mailman.uni-konstanz.de
> > Version: BaseX 9.0.2
> > Java: Oracle Corporation, 1.8.0_66
> > OS: Linux, amd64
> > Stack Trace:
> > java.lang.ArrayIndexOutOfBoundsException
> >
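The advice quoted above, to distribute the documents across several databases, could be sketched in XQuery roughly as follows. Everything specific here is an assumption for illustration: the source directory '/data/xml/', the database name prefix 'tmp-', and the chunk size, which would have to be chosen so that each database stays safely below the 2^31 node limit.

```xquery
(: Hypothetical source directory and chunk size. :)
let $root  := '/data/xml/'
let $chunk := 300000
(: Relative paths of all XML files below $root. :)
let $files := file:list($root, true(), '*.xml')
for $i in 0 to (count($files) - 1) idiv $chunk
let $part := subsequence($files, $i * $chunk + 1, $chunk)
(: One database per chunk: tmp-0, tmp-1, ... :)
return db:create('tmp-' || $i, $part ! ($root || .), $part)
```

For an import of this size it may be more practical to run one creation step per database rather than a single query; the sketch only shows how the documents could be partitioned.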
[basex-talk] Missing two things but great tool!
Hi,

thank you for this handy and comprehensive tool! I have tested other "non-open source" tools, but they do not work as smartly as this one. When I started working with XML files, I had problems finding the relevant information. This tool helped me a lot to figure out the general structure.

There are just two things that I'm missing, and I wanted to ask if they are planned for the future:

1. I really like the map visualisation. I use a lot of XML with coordinates, and I was impressed that the tool could figure them out automatically. But it would be much cooler to see them with a real map in the background.

2. I tried to export the XML to CSV, but this does not seem to work: the exported file is not usable. In the past I just used Excel and its source task pane to bring the data into a table format (for filtering and sorting), so an export to Excel would be cool!

I understand that it's an open source application, but maybe you have possibilities to do something about it.

Best Regards
Jennifer
Re: [basex-talk] Bug/question: large collection
Hi Marko,

Databases are restricted to 2^31 nodes. If the limit is exceeded, you’ll need to distribute your documents across multiple database instances (see [1] for more details).

However, I would have expected BaseX to raise an error message. Could you give us more detail on how you imported the documents?

By default, 8 parallel queries are allowed. The number can be changed by assigning a different value to the PARALLEL option [2]. In most cases, you’ll get the best results if you ensure that your queries are rewritten for index access (provided that your queries allow such rewritings), as multiple concurrent database accesses may have negative effects, in particular if sequential scans are required. Obviously, things look slightly better for SSDs.

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/Databases
[2] http://docs.basex.org/wiki/Options#PARALLEL

On Mon, Oct 22, 2018 at 4:40 AM Marko Niinimaki wrote:
>
> Hi,
> it looks like "nodes" exceeds some integer range if I add 2 million documents
> (below).
>
> Another, unrelated question: our server has 24 cores. What would be the best
> way to utilize that kind of parallel power in queries?
>
> > info db
> Database Properties
>  NAME: tmp
>  SIZE: 47 GB
>  NODES: -2147476286
>  DOCUMENTS: 705708
>  BINARIES: 0
>  TIMESTAMP: 2018-08-29T02:27:58.000Z
>  UPTODATE: false
>
>
> Improper use? Potential bug? Your feedback is welcome:
> Contact: basex-talk@mailman.uni-konstanz.de
> Version: BaseX 9.0.2
> Java: Oracle Corporation, 1.8.0_66
> OS: Linux, amd64
> Stack Trace:
> java.lang.ArrayIndexOutOfBoundsException
>
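The negative NODES value in the quoted output is consistent with the 2^31 limit. As a small worked example, and assuming that the count is kept in a 32-bit signed integer that wrapped around exactly once (an assumption, not something stated in the thread), the intended value can be recovered by adding 2^32:

```xquery
(: NODES value reported by INFO DB in the message above. :)
let $reported := -2147476286
(: Assumption: a 32-bit signed counter that wrapped around once,
   so adding 2^32 recovers the intended value. :)
let $intended := $reported + 4294967296
let $limit    := 2147483648  (: 2^31 :)
return ($intended, $intended - $limit)
(: yields 2147491010 and 7362 :)
```

Under that assumption, the database had grown to 2,147,491,010 nodes, which is 7,362 nodes above the limit.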