Re: [basex-talk] Fairlock and query monitoring

2018-10-22 Thread Christian Grün
Hi Max,

Are read-only queries similarly slow if FAIRLOCK is set to false? If
yes, it might help to optimize the incremental database at regular
intervals. If you do that anyway, we could set up a short conference
call and discuss further possibilities.
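
A minimal sketch of such a regular optimization, assuming a database
with the placeholder name 'mydb'. Passing true() as the second argument
of db:optimize requests a full optimization, which rebuilds the index
structures. How the call is scheduled (cron job, job service, manual
run) is left open here:

```xquery
(: Assumption: 'mydb' is a placeholder for the incrementally indexed database. :)
(: true() = full optimization, comparable to the OPTIMIZE ALL command. :)
db:optimize('mydb', true())
```

As this is an updating operation, it competes with other queries on
that database while it runs, so a quiet time window is probably the
better choice.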

Cheers
Christian


On Mon, Oct 22, 2018 at 1:46 PM Maximilian Gärber  wrote:
>
> Hi,
> Some time ago, I switched a production system (running BaseX
> 8.6) to FAIRLOCK = true, with PARALLEL = 16.
>
> While this helped speed up write operations, there are situations
> where queries go from sub-second to minute(s) if a few more users are
> reading/writing at the same time.
>
> The DBs in question are incrementally indexed.
>
> Since single queries are fast, I don't know what the best way of
> handling these situations would be.
>
> As a last resort, I was thinking about adding a caching layer
> (write-behind) that accepts saves immediately and batches the delayed writes.
>
> But before introducing another component to the system, I'd like to
> hear what other ideas might exist.
>
>
> Br,
> Max


Re: [basex-talk] Optimized query for importing content based on metadata

2018-10-22 Thread Jason Davis
Thanks Christian! I appreciate the touch-up and the insight.

Jason

On 10/19/18, 2:26 PM, "Christian Grün"  wrote:

Hi Jason,

My version would have looked pretty similar:

  let $xmlroot := '/Users/jason.davis/Sandbox/dita-docs/content/'
  for $path in file:list($xmlroot, true())
  where matches($path, '\.(xml|ditamap)$')
  let $doc := doc($xmlroot || $path)
  where $doc//brand[contains(., 'xyz')]
  return db:add('d4st^dita-docs^meta-test', $doc, $path)

If you have already opened the document, you can pass it directly on to
db:add.

Cheers,
Christian


On Fri, Oct 19, 2018 at 11:01 PM Jason Davis
 wrote:
>
> Hi,
>
> I’ve cobbled together a query that I want to use to import XML from the
> filesystem into the database based on specific metadata requirements:
>
> let $xmlroot := "/Users/jason.davis/Sandbox/dita-docs/content/"
> for $file in file:list($xmlroot, true())
> where matches($file, 'xml') or matches($file, 'ditamap')
> let $doc := file:resolve-path($file, $xmlroot)
> return
>   if (doc($doc)//brand[contains(., 'xyz')])
>   then db:add("d4st^dita-docs^meta-test", $doc)
>   else ()
>
> It works, so I’m pleased! I’m just wondering if there is a more efficient
> way to achieve what I want to do. I know that using a specific XPath in the doc
> function is one thing I can do better. Any suggestions are appreciated!
>
> Thanks,
> Jason





[basex-talk] Fairlock and query monitoring

2018-10-22 Thread Maximilian Gärber
Hi,
Some time ago, I switched a production system (running BaseX
8.6) to FAIRLOCK = true, with PARALLEL = 16.

While this helped speed up write operations, there are situations
where queries go from sub-second to minute(s) if a few more users are
reading/writing at the same time.

The DBs in question are incrementally indexed.

Since single queries are fast, I don't know what the best way of
handling these situations would be.

As a last resort, I was thinking about adding a caching layer
(write-behind) that accepts saves immediately and batches the delayed writes.

But before introducing another component to the system, I'd like to
hear what other ideas might exist.


Br,
Max


Re: [basex-talk] Sir, when basex 9.1, please ;-)

2018-10-22 Thread Christian Grün
Hi Marco,

I’m glad you are gathering some experience with the WebSocket facility.

> I get the following error [2] as return to my HTTP POST and, ca va sans
> dire, nothing on the websocket.

The function bound to "/dataprovider" is a plain RESTXQ function.
As such, it isn’t attached to a WebSocket id. One of the reasons is
that a client that uses RESTXQ does not necessarily have a WebSocket
connection, and there can also be multiple WebSockets per client.

If you want to send your result to all WebSockets – including the
client that called the dataprovider – you can simply use ws:emit(). If
your use case is complex enough to have a WebSocket connection and
simultaneous RESTXQ requests in a single browser tab, you could store
the WebSocket id(s) of your client as an HTTP session attribute and
access these ids from the RESTXQ code.
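
The first suggestion could look roughly like the following sketch. It
reuses the module namespace and path from the code at [1] and replaces
the ws:send/ws:id() call with ws:emit(), which does not depend on the
caller having a WebSocket id. Dropping the %output:method annotation
and declaring an empty result are assumptions made here to keep the
example short:

```xquery
module namespace dp = 'urn:nubisware:datarouter';

import module namespace ws = 'http://basex.org/modules/ws';

(: RESTXQ entry point: receives the POST body and forwards it to all
   connected WebSocket clients. No ws:id() is involved, so the function
   can be called without a WebSocket connection. :)
declare
  %rest:path('/dataprovider')
  %rest:POST('{$data}')
function dp:route($data as node()) as empty-sequence() {
  ws:emit(json:serialize($data))
};
```

The %ws:connect and %ws:close functions from [1] can stay as they are.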

> BTW, even if not stated in the path annotation, an extra /ws needs to be
> prefixed to the url used for JS' WebSocket constructor. Personally I'd
> prefer to keep things explicit and put it in the annotation too.

I have just revised our documentation, and I hope it’s fairly complete
now. In the Annotations Section, you will find a hint to the "ws/"
path.

The reason why the path is omitted in XQuery is that the web server
takes care of the path resolution. If the default path is changed in
the web.xml file, it would need to be changed in all XQuery
applications as well. The same applies to RESTXQ: If a prefix is used
in the configuration, there won’t be a need to change your path
annotations.

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/WebSockets


>
> Thanks for your support and thanks Maximilian for the lift.
>
> [1]
> module namespace dp = 'urn:nubisware:datarouter';
>
> import module namespace ws = 'http://basex.org/modules/ws';
>
> declare
>   %rest:path('/dataprovider')
>   %rest:POST("{$data}")
>   %output:method('json')
> function dp:route($data as node()) {
>   ws:send(json:serialize($data), ws:ids()[. != ws:id()])
> };
>
> declare
>   %ws:connect('/dataprovider')
> function dp:connect() as empty-sequence() {
>   ()
> };
>
> declare
>   %ws:close('/dataprovider')
> function dp:close() as empty-sequence() {
>   ()
> };
>
> [2]
> Stopped at /home/lettere/tmp/basex/webapp/dataprovider/dataprovider.xqm,
> 11/53:
> [basex:ws] WebSocket connection required.
>
> On 18/10/18 18:33, Christian Grün wrote:
> > Sir, doing our best ;)
> >
> > We believe that BaseX 9.1 will be released pretty soon (by the end of October).
> >
> > For everyone who is interested in giving us some feedback on the new
> > WebSocket feature… Thank you in advance! 90% of the documentation is
> > finalized:
> >
> >http://docs.basex.org/wiki/WebSockets
> >
> > Best,
> > Christian
>
>


Re: [basex-talk] Sir, when basex 9.1, please ;-)

2018-10-22 Thread Marco Lettere

Hi,
I'm just hijacking Maximilian's email here to post the following test I
wanted to run to experiment with BaseX and WebSockets.
I wrote the code at [1] with the intent to open a RESTXQ entry point that
receives JSON via POST and then broadcasts it to all connected WS clients.
Whether I use ws:broadcast or the naive version shown, I still get the
following error [2] in response to my HTTP POST and, ça va sans dire,
nothing on the WebSocket.

What am I doing wrong?

BTW, even if not stated in the path annotation, an extra /ws needs to be 
prefixed to the url used for JS' WebSocket constructor. Personally I'd 
prefer to keep things explicit and put it in the annotation too.


Thanks for your support and thanks Maximilian for the lift.

[1]
module namespace dp = 'urn:nubisware:datarouter';

import module namespace ws = 'http://basex.org/modules/ws';

declare
  %rest:path('/dataprovider')
  %rest:POST("{$data}")
  %output:method('json')
function dp:route($data as node()) {
  ws:send(json:serialize($data), ws:ids()[. != ws:id()])
};

declare
  %ws:connect('/dataprovider')
function dp:connect() as empty-sequence() {
  ()
};

declare
  %ws:close('/dataprovider')
function dp:close() as empty-sequence() {
  ()
};

[2]
Stopped at /home/lettere/tmp/basex/webapp/dataprovider/dataprovider.xqm, 
11/53:

[basex:ws] WebSocket connection required.

On 18/10/18 18:33, Christian Grün wrote:

Sir, doing our best ;)

We believe that BaseX 9.1 will be released pretty soon (by the end of October).

For everyone who is interested in giving us some feedback on the new
WebSocket feature… Thank you in advance! 90% of the documentation is
finalized:

   http://docs.basex.org/wiki/WebSockets

Best,
Christian





Re: [basex-talk] Websocket Vers. 9.1 on Tomcat

2018-10-22 Thread Christian Grün
Hi Dieter,

The current WebSocket implementation is based on Jetty’s WebSocket
API. This was clearly missing in the documentation (thanks for the
pointer), so I have just updated our Wiki pages [1].

It appears that Jetty’s implementation of the official JSR-356
WebSocket API is pretty stable nowadays. As it’s quite similar to
their custom API (which served as inspiration for the generic API), we
might switch to it in a future version of BaseX.

All the best,
Christian

[1] http://docs.basex.org/wiki/WebSockets




On Mon, Oct 22, 2018 at 6:34 AM Dieter Zanzinger
 wrote:
>
> Hi, BaseX-Team,
>
> I tested the new WebSocket feature.
> The .zip installation on Win10 (your embedded Jetty) worked (your chat app).
> But I had problems with the .war installation on Tomcat. With the
> standard installation, the paths in chat.js had to be prefixed with
> /BaseX, which is fine so far. The login page and the main page are
> displayed. But when the main page is loading, the network call to
> http://localhost:8080/BaseX/ws/chat fails with HTTP status 404.
> I think there is a problem with paths as well?
>
> It would be great if you could add documentation for Tomcat as well.
>
> Thanks for this great tool!
>
> Dieter Zanzinger


Re: [basex-talk] Bug/question: large collection

2018-10-22 Thread Marko Niinimaki
Yes, a command script:

open mydb
add file1.xml
add file2.xml

etc

Best regards,
Marko


On Mon, Oct 22, 2018 at 4:21 PM Christian Grün 
wrote:

> > Yes it did. I had a file with about 1 million add filename statements
> > and when the max nodes limit was exceeded, each statement gave an error
> > message.
>
> Was this a “command script” [1], and did you call the ADD command or
> the db:add function?
>
> Thanks in advance,
> Christian
>
> [1] http://docs.basex.org/wiki/Commands#Command_Scripts
>


Re: [basex-talk] Missing two things but great tool!

2018-10-22 Thread Christian Grün
Hi Jennifer,

Welcome to the list, and thanks for the kudos.

> 1. I really like the map visualisation. I use a lot of XML with
> coordinates, and I was impressed that the tool could figure them out
> automatically. But it would be much cooler to see it with a real map in
> the background.

We thought about adding a feature to choose background images in the
scatterplot. If you have two-dimensional latitude/longitude data,
individual data sets could then be assigned visually to geographic
locations.

I am not sure what that could look like for our map visualization.
Could you give us more details? You may have discovered our set of map
layouts in the GUI preferences, but probably they don’t match your
specific requirements.

> 2. I tried to export the XML to CSV, but this does not seem to work.

As XML resources may be arbitrarily nested, there is no canonical way
of exporting them to a tabular representation. The CSV module [1]
gives you all flexibility to create tabular exports exactly as you
want them to be, but you’ll need to write some XQuery code for that.
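
As a rough illustration, assuming coordinate data shaped like the
inline example below (the element and attribute names point, name, lat
and lon are made up for this sketch), each item is first mapped to a
flat record element and the result is then passed to csv:serialize:

```xquery
(: Hypothetical input; in practice this would come from a database or file. :)
let $points :=
  <points>
    <point name="Konstanz" lat="47.66" lon="9.17"/>
    <point name="Berlin" lat="52.52" lon="13.40"/>
  </points>
(: Flatten every point into one record with one child element per column. :)
let $csv :=
  <csv>{
    for $p in $points/point
    return <record>
      <name>{ string($p/@name) }</name>
      <lat>{ string($p/@lat) }</lat>
      <lon>{ string($p/@lon) }</lon>
    </record>
  }</csv>
(: 'header': true() emits the element names as the first line. :)
return csv:serialize($csv, map { 'header': true() })
```

The resulting string has a header line followed by one line per record
and can be opened in Excel or any other spreadsheet tool.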

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/CSV_Module


Re: [basex-talk] Bug/question: large collection

2018-10-22 Thread Marko Niinimaki
Hi Christian,
many thanks!
> However, I would have expected BaseX to raise an error message. Could you
> give us more detail how you imported the documents?
Yes it did. I had a file with about 1 million add filename statements and
when the max nodes limit was exceeded, each statement gave an error message.

Best regards,
Marko


On Mon, Oct 22, 2018 at 3:40 PM Christian Grün 
wrote:

> Hi Marko,
>
> Databases are restricted to 2^31 nodes. If the limit is exceeded,
> you’ll need to distribute your documents across multiple database
> instances (see [1] for more details).
>
> However, I would have expected BaseX to raise an error message. Could
> you give us more detail on how you imported the documents?
>
> By default, 8 parallel queries are allowed. The number can be changed
> by assigning a different value to the PARALLEL option [2]. In most
> cases, you’ll get the best results if you ensure that your queries are
> rewritten for index access (provided that your queries allow such
> rewritings), as multiple concurrent database accesses may have negative
> effects, in particular if sequential scans are required. Obviously,
> things look slightly better with SSDs.
>
> Hope this helps,
> Christian
>
> [1] http://docs.basex.org/wiki/Databases
> [2] http://docs.basex.org/wiki/Options#PARALLEL
>
>
>
> On Mon, Oct 22, 2018 at 4:40 AM Marko Niinimaki 
> wrote:
> >
> > Hi,
> > it looks like "nodes" exceeds some integer range if I add 2 million
> > documents (below).
> >
> > Another, unrelated question: our server has 24 cores. What would be the
> > best way to utilize that kind of parallel power in queries?
> >
> > > info db
> > Database Properties
> >  NAME: tmp
> >  SIZE: 47 GB
> >  NODES: -2147476286
> >  DOCUMENTS: 705708
> >  BINARIES: 0
> >  TIMESTAMP: 2018-08-29T02:27:58.000Z
> >  UPTODATE: false
> >
> >
> > Improper use? Potential bug? Your feedback is welcome:
> > Contact: basex-talk@mailman.uni-konstanz.de
> > Version: BaseX 9.0.2
> > Java: Oracle Corporation, 1.8.0_66
> > OS: Linux, amd64
> > Stack Trace:
> > java.lang.ArrayIndexOutOfBoundsException
> >
>


[basex-talk] Missing two things but great tool!

2018-10-22 Thread Jennifer Kracht
Hi,

thank you for this handy and comprehensive tool! I have tested other,
non-open-source tools, but they do not work as smartly as this one. When I
started working with XML files, I had problems finding the relevant
information. But this tool helped me a lot to figure out the general structure.
There are just two things that I’m missing, and I wanted to ask if they are
planned for the future:
1. I really like the map visualisation. I use a lot of XML with
coordinates, and I was impressed that the tool could figure them out
automatically. But it would be much cooler to see it with a real map in
the background.
2. I tried to export the XML to CSV, but this does not seem to work. The exported
file is not usable. In the past, I just used Excel and its source task pane to
bring it into a table format (filtering and sorting), so an export to Excel would
be cool!

I understand that it’s an open source application, but maybe you have
the possibility to do something about it.

Best Regards 
Jennifer

Re: [basex-talk] Bug/question: large collection

2018-10-22 Thread Christian Grün
Hi Marko,

Databases are restricted to 2^31 nodes. If the limit is exceeded,
you’ll need to distribute your documents across multiple database
instances (see [1] for more details).
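
A minimal sketch of what querying such a split could look like,
assuming the documents were distributed across four databases with the
placeholder names 'docs1' to 'docs4' and contain a hypothetical title
element:

```xquery
(: Assumption: database names and the 'title' element are placeholders. :)
for $name in ('docs1', 'docs2', 'docs3', 'docs4')
return db:open($name)//title[text() = 'xyz']
```

In BaseX 10 and later, db:open has been renamed to db:get.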

However, I would have expected BaseX to raise an error message. Could
you give us more detail on how you imported the documents?

By default, 8 parallel queries are allowed. The number can be changed
by assigning a different value to the PARALLEL option [2]. In most
cases, you’ll get the best results if you ensure that your queries are
rewritten for index access (provided that your queries allow such
rewritings), as multiple concurrent database accesses may have negative
effects, in particular if sequential scans are required. Obviously,
things look slightly better with SSDs.
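
If a query is not rewritten automatically, the text index can also be
addressed explicitly. The following sketch assumes that the database
'tmp' from the output below has an up-to-date text index (the quoted
info shows UPTODATE: false, so it would need to be optimized first) and
that the documents contain a hypothetical title element:

```xquery
(: db:text returns all text nodes with the given value from the text index. :)
(: Assumption: 'title' is a placeholder element name. :)
db:text('tmp', 'xyz')/parent::title
```

This avoids a sequential scan of the 47 GB database for exact string
matches. Whether it helps in practice depends on the actual queries.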

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/Databases
[2] http://docs.basex.org/wiki/Options#PARALLEL



On Mon, Oct 22, 2018 at 4:40 AM Marko Niinimaki  wrote:
>
> Hi,
> it looks like "nodes" exceeds some integer range if I add 2 million documents 
> (below).
>
> Another, unrelated question: our server has 24 cores. What would be the best 
> way to utilize that kind of parallel power in queries?
>
> > info db
> Database Properties
>  NAME: tmp
>  SIZE: 47 GB
>  NODES: -2147476286
>  DOCUMENTS: 705708
>  BINARIES: 0
>  TIMESTAMP: 2018-08-29T02:27:58.000Z
>  UPTODATE: false
>
>
> Improper use? Potential bug? Your feedback is welcome:
> Contact: basex-talk@mailman.uni-konstanz.de
> Version: BaseX 9.0.2
> Java: Oracle Corporation, 1.8.0_66
> OS: Linux, amd64
> Stack Trace:
> java.lang.ArrayIndexOutOfBoundsException
>