Re: [basex-talk] Querying Basex from a web interface

2016-05-22 Thread Christian Grün
> I am going to test the caching difference again soon, on a larger subcorpus. 
> I'm curious to find out the results! I will incorporate results in a thesis. 
> If you're interested, I can definitely share the results with you when I have 
> finished writing them down!

Sounds interesting, thanks!





Re: [basex-talk] Querying Basex from a web interface

2016-05-22 Thread Bram Vanroy | KU Leuven
Hi there Christian

Thank you for the extensive answer! 

In the meantime, I have solved the issue that kept simultaneous queries 
from running asynchronously. The problem was a locked PHP $_SESSION 
variable, so it was not related to BaseX. My bad!

I am going to test the caching difference again soon, on a larger subcorpus. 
I'm curious to find out the results! I will incorporate results in a thesis. If 
you're interested, I can definitely share the results with you when I have 
finished writing them down!

Finally, I'd like to thank you for providing these answers. I never expected 
such good feedback and response. It really means a lot! Thank you!


Kind regards

Bram




Re: [basex-talk] dba lags on replace resource

2016-05-22 Thread Christian Grün
Hi Marco,

Something has clearly gone wrong while you were replacing resources in
your database. I assume it’s a general issue, and not due to DBA, as
the latter is only an interface.

Do you think there’s some chance to make the results reproducible for
us? It can also be a step-by-step explanation, based on the DBA.

The size attribute contains the database size, measured in bytes; see
[1] for a slightly updated documentation.

Thanks,
Christian

[1] http://docs.basex.org/wiki/Database_Module#db:list-details


On Wed, May 11, 2016 at 4:25 PM, Marco Lettere  wrote:
> Hey Dirk,
> db:list-details() executed from the dba query panel returns:
>
>  size="39870" path="">projects
>  path="">runnables
>  path="">users
>
> whereas
> db:list-details('projects') returns:
>
>  modified-date="2016-05-11T13:23:32.473Z"
> size="111">testproject.xml
>
> The second is clearly the right one since that db contains only one resource
> repeatedly updated.
> I don't understand exactly what the size attribute is indicating since for
> the empty dbs it's even larger 
>
> If I have some time this evening I'll try with the latest possible snapshot.
> Thanks,
> M.


Re: [basex-talk] Faceted search

2016-05-22 Thread Christian Grün
Hi Tim, hi Gregory,

As outlined by Michael, we are currently investigating whether there will
be enough (human and financial) resources to implement the EXPath
Facet Module [1]. To everyone: We are very interested in feedback on
this proposal!

As outlined by Tim, one crucial question will be whether we can
rewrite the functions of that module such that they will be sped up by
index structures. From what I’ve seen so far, it won’t be obvious
how to proceed, but I’m currently thinking about possible solutions.

Cheers,
Christian

[1] http://expath.org/spec/facet



On Thu, May 19, 2016 at 8:46 AM, Finney, Tim
 wrote:
> Hi Michael,
>
> The EXPath facet proposal does appear to cover most use cases I can think of. 
> I believe that indexes are required to make faceted searches work quickly 
> with large document sets.
>
> Best,
>
> Tim Finney
>
> -----Original Message-----
> From: Michael Seiferle [mailto:m...@basex.org]
> Sent: Thursday, 19 May 2016 1:24 PM
> To: Finney, Tim
> Cc: basex-talk@mailman.uni-konstanz.de
> Subject: Re: [basex-talk] Faceted search
>
> Hi Tim,
>
> thank you for joining the discussion. Sponsoring is by no means a requirement 
> (sorry if I made it sound any other way! :-)).
>
> What do you think: does the EXPath Facet proposal cover some or most of 
> your use cases?
>
> Best from Lake Konstanz
> Michael
>
>
>
> Sent from my iPhone
>
>> On 19.05.2016 at 01:57, Finney, Tim wrote:
>>
>> Hi All,
>>
>> I would love to see faceted search capability. Sorry, but I can't offer 
>> sponsorship.
>>
>> Two things I would love to see in BaseX are faceted search capability (fast, 
>> index-based) and role-based security.
>>
>> Best,
>>
>> Tim Finney
>>
>> --
>>
>> Hi Greg,
>>
>> currently there is no out-of-the-box functionality, yet there is an EXPath 
>> Specification that handles facets:
>> http://expath.org/spec/facet
>>
>> We have not yet implemented it in BaseX, but we might eventually if there is 
>> enough interest and maybe even a sponsor :-)
>>
>> Would that module cover your needs?
>>
>>
>> Best
>> Michael


Re: [basex-talk] Querying Basex from a web interface

2016-05-22 Thread Christian Grün
Hi Bram,

Thanks for your reply. It’s long indeed, so sorry in advance if I
didn’t capture all relevant info…

> The approach explained above also implies that we had to create a lot of 
> BaseX databases. A lot. Around 10 million of them.

Impressive :)


> •   Would a query that returns single results really be faster 
> than one that returns 10 results?
> Yes. In a search space of 500 million tokens, you can imagine that a rare 
> pattern may take a lot of time to query – even in the GrInded version.

I see. So I assume there won’t be many chances to speed up this
scenario by working on index structures, as most of the time is spent
sequentially browsing all the databases, right?

> •   Do you sort your search results? If yes, returning 100 
> results instead of 10 should not be much slower.
> As I am not entirely sure what you mean by that, I don’t think we do. By 
> sorting, do you mean the XQuery order by function?

Exactly. I also assume it shouldn’t play a role in your scenario.


> wouldn’t that mean that BaseX’ cache is cleared more often? I could imagine 
> that the garbage collector passes by after a query, or at least a session, is 
> closed? Have you any idea how this is possible?

Phew, a difficult one… I would need to spend some real time with your
framework to give a solid answer.

> My two questions are: is count() actually faster than getting all results?

Yes, it will always be faster; but “faster” can mean 1% or 1000%… It
will be much faster if the database statistics can be utilized to
answer your query (which is probably not the case in your scenario),
or if the step of retrieving the data, and/or returning it via the
network consumes too much time. If you only count nodes, there is no
need to retrieve all database contents from disk (node properties,
textual data) that will be returned in the XML representation.

> Or does count() get all the hits any way, and should I count and get all 
> results in one step?

As you already indicated that the last result may occur much later
than the first result in your database(s), I assume you won’t win that
much. But for testing, you can wrap your query with count() to see
what would be the minimum time to find all hits.
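
A minimal sketch of such a test, assuming a hypothetical database named
'corpus' and a placeholder path expression (neither is taken from Bram's
setup; substitute your own query). The hits are bound to a variable so that
the same expression can be returned in full or merely counted:

```xquery
(: Assumption: 'corpus' and the path below are placeholders. :)
declare variable $hits := db:open('corpus')//node[@cat = 'np'];

(: Only the number of hits is returned; no result nodes are serialized. :)
count($hits)
```

Replacing the last line with $hits returns the full results instead, so
comparing the two run times gives a rough idea of how much time goes into
retrieving and serializing the hits.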

> Secondly, it seems that when the last step is initialised, the other 
> processes hang – leaving the user without any feedback. The processes 
> literally seem to stop running. My question then is: does this happen because 
> BaseX does not handle different sessions asynchronously, and new queries 
> block others?

By default, 8 queries can be run in parallel [1]. If your other
queries are delayed a lot, it may be that the random disk access
patterns caused by parallel queries outweigh the advantage of allowing
parallel requests. But I would assume that it’s worth checking your
PHP environment first.

> Finally, I simply want to ask what the best flow is for opening and closing 
> BaseX sessions, and when one should open a new session.

With the light-weight PHP client, it’s usually best to open a new
session for each request, and close it directly after your command or
query has been evaluated. As usual, you should ensure that every
session will be closed, even if an error occurs.

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/Options#PARALLEL


Re: [basex-talk] connection question

2016-05-22 Thread Christian Grün
Hi Genneva,

Usually, we work without connection pools, because client connections
in BaseX are very light-weight. To find out what problems you’ve been
encountering, we’ll probably need some more details. Could you provide
us with a little, self-contained example (or SSCCE, as we like to call
it)?

Thanks,
Christian



On Sat, May 21, 2016 at 1:59 AM, Wang, Genneva  wrote:
> Hi basex gurus,
>
> One other question. Internally, we have a connection pool of 20 connections
> to basex. While doing a performance stress test with JMeter, we found
> that if we submit more connections than the configured pool size, say 30
> connections, the connections are not recovered and the
> connection pool ends up in an unrecoverable state (eventually reaching
> pool size 0).
>
> I was just wondering: does basex do any queueing of the
> incoming transactions, or is this something that the application needs to
> handle? Any tips would be greatly appreciated! Thank you very much.
>
> Regards,
> —Genneva


Re: [basex-talk] Creating db in restxq interface

2016-05-22 Thread Christian Grün
Hi Henning,

The XQuery Update specification does not allow users to mix updating
expressions and return data at the same time. The slides of Arve and
Sabine (see [1]) will give you some hints how updates are usually
performed in RESTXQ contexts. The slides are from 2013, and some
convenience functions have been added since then, but the basic
principle is the same as before. In a nutshell:

* Use db:output() to both return data and do updates [2];
* use web:redirect() to redirect to another success page [3];
* alternatively, activate the MIXUPDATES option in web.xml to disable
the XQuery Update restriction [4].
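
A minimal sketch of the first option, assuming BaseX 8.x (where db:output()
is available); the module namespace URI and the path "/create" are
illustrative choices, not taken from Henning's code:

```xquery
(: Assumption: namespace URI and path are placeholders. :)
module namespace page = 'http://basex.org/modules/web-page';

declare
  %updating
  %rest:path("/create")
  function page:create() {
  (: Updating expression: queued on the pending update list. :)
  db:create("test"),
  (: Cached result, returned to the client together with the update. :)
  db:output(<p>Database 'test' was created.</p>)
};
```

The function carries the %updating annotation and declares no return type;
the page content is passed to db:output() instead of being returned directly.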

Hope this helps,
Christian

[1] http://files.basex.org/publications/xmlprague/2013.html
[2] http://docs.basex.org/wiki/Database_Module#db:output
[3] http://docs.basex.org/wiki/Web_Module#web:redirect
[4] http://docs.basex.org/wiki/XQuery_Update#Returning_Results


> declare
>   %rest:path("/start")
>   %updating
>   %output:method("xhtml")
>   %output:omit-xml-declaration("no")
>   %output:doctype-public("-//W3C//DTD XHTML 1.0 Transitional//EN")
>   %output:doctype-system("http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd")
>   function page:hello()
> as element(Q{http://www.w3.org/1999/xhtml}html)
> {
>   http://www.w3.org/1999/xhtml;>
> Good day Sir!
> 
> The current time is: { current-time() }
> 
> 
>   Home
>   Link 1
>   Link 2
>   Link 3
> 
>   {
> for $result in db:open('factbook')//continent/@name
> return { data($result) }
>   }
> 
> {
> db:create("test")
> }
> 
> 
>   
>
>
> === File End  


Re: [basex-talk] Basex: basexhttp Failing to Start with "Server is running or permission was denied"

2016-05-22 Thread Christian Grün
Hi Sovello,

It could also be the port that’s used for stopping BaseX (8985). It
can be adjusted via -s [1].

Cheers
Christian

[1] http://docs.basex.org/wiki/Command-Line_Options#HTTP_Server



On Wed, May 18, 2016 at 5:18 PM, Dirk Kirsten  wrote:
> Hello Sovello,
>
> looks like there is already another service (most likely another instance of
> BaseX server) running on port 1984. Please note that by default there are
> two ports opened for the basexhttp server: One for the BaseX server (default
> 1984) and another one for the Jetty application container (default 8984).
>
> You can define another port using the -h argument
>
> Hope that helps, Dirk
>
>
> On 05/18/2016 05:10 PM, Sovello Hildebrand Mgani wrote:
>
> Hi!
> I have downloaded BaseX 8.4.4 and am running it on Ubuntu 14.04[.4].
> I have added basex/bin to $PATH so that I can run basex, basexhttp,
> etc.
> I can successfully run basex and basexserver. Running the HTTP server fails
> with this message:
>
> $ basexhttp
> [main] INFO org.eclipse.jetty.server.Server - jetty-8.1.18.v20150929
> [main] INFO org.eclipse.jetty.webapp.StandardDescriptorProcessor - NO JSP
> Support for /, did not find org.apache.jasper.servlet.JspServlet
> [main] INFO / - Aliases are enabled! Security constraints may be bypassed!!!
> Server was started (port: 1984).
> [main] INFO org.eclipse.jetty.server.AbstractConnector - Started
> SelectChannelConnector@0.0.0.0:8984
> HTTP Server was started (port: 8984).
> Server is running or permission was denied.
> Server was stopped (port: 1984).
>
> and the output when I run
> $ basexhttp -d
> DEBUG: true
> BaseX 8.4.4 [HTTP Server]
> [main] INFO org.eclipse.jetty.server.Server - jetty-8.1.18.v20150929
> [main] INFO org.eclipse.jetty.webapp.StandardDescriptorProcessor - NO JSP
> Support for /, did not find org.apache.jasper.servlet.JspServlet
> [main] INFO / - Aliases are enabled! Security constraints may be bypassed!!!
> WEBPATH: /opt/oim/basex/webapp
> DEBUG: true
> Server was started (port: 1984).
> [main] INFO org.eclipse.jetty.server.AbstractConnector - Started
> SelectChannelConnector@0.0.0.0:8984
> HTTP Server was started (port: 8984).
> java.net.BindException: Address already in use
> at java.net.PlainSocketImpl.socketBind(Native Method)
> at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
> at java.net.ServerSocket.bind(ServerSocket.java:375)
> at java.net.ServerSocket.bind(ServerSocket.java:329)
> at org.basex.BaseXHTTP$StopServer.<init>(BaseXHTTP.java:430)
> at org.basex.BaseXHTTP.<init>(BaseXHTTP.java:147)
> at org.basex.BaseXHTTP.main(BaseXHTTP.java:52)
> Server is running or permission was denied.
> Server was stopped (port: 1984).
>
> Which shows address is already in use.
> I only have apache2 and have run apache2 and tomcat earlier on the same
> laptop without issues, and I don't have any service that is listening on
> that port 8984.
>
> (I have tried running BaseX from version 8.0 and get the same errors.) I am
> not sure what is wrong.
>
> Can you give me pointers to resolve this?
>
> Cheers
>
> --
> :: Sovello Hildebrand Mgani ::
>
>
>
> --If you teach man to fish, you'll feed him a lifetime--
>
>
> --
> Dirk Kirsten, BaseX GmbH, http://basexgmbh.de
> |-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
> |-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
> |   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
> `-- Phone: 0049 7531 91 68 276, Fax: 0049 7531 20 05 22