Re: [basex-talk] creating epub and odf with basex

2020-09-08 Thread Liam R. E. Quin
On Tue, 2020-09-08 at 17:07 +0200, Jos van den Oever wrote:
> Thank you for making the improvements. This is much cleaner imho than
> bash + 
> zip + xsltproc. :-)
A minor addition - i've sometimes started with a base zip file with the
uncompressed "mimetype" entry in it, and just added the rest to that
file from XQuery or XSLT, without problems.


-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org



Re: [basex-talk] superfluous shape="area' when using html:parse()

2020-08-30 Thread Liam R. E. Quin
On Sun, 2020-08-30 at 14:01 +0200, Jos van den Oever wrote:
> Hi all,
> 
> When loading a document with html:parse(), an extra attribute is
> added to 
> every  element.
> 
>   becomes 

This usually comes from the HTML 4 or XHTML 1.x DTDs. It is actually
not incorrect behaviour, although i never liked it either.

It's similar to inferring a tbody element in a table if none was
supplied.

Liam





Re: [basex-talk] improving query performance

2020-08-23 Thread Liam R. E. Quin
On Sun, 2020-08-23 at 14:05 -0700, Bill Osmond wrote:
> Indeed I have, with no positive results unfortunately. I'm now
> testing to
> see if having multiple return statements (as in Liam's queries)
> helps,
> although the results so far are basically the same.

I tried to make clear what was going on, using explicit returns. The
syntax of a FLWOR expression allows either form:
  for $sock in /drawer/socks, $shoe in /tray/shoes
is the same as
  for $sock in /drawer/socks
  for $shoe in /tray/shoes
and constructs every possible (sock, shoe) pair.

However,
  for $sock in /drawer/socks
  return
    for $shoe in /tray/shoes
    return
      some_expression($sock, $shoe)
maybe makes clearer that some_expression() will be called
count(/drawer/socks) * count(/tray/shoes) times.
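
A small self-contained check of the two forms may help. This is only a
sketch: the $socks and $shoes sequences are made-up stand-ins for
/drawer/socks and /tray/shoes, and string concatenation stands in for
some_expression().

```xquery
(: hypothetical stand-ins for /drawer/socks and /tray/shoes :)
let $socks := ('red', 'blue', 'green')
let $shoes := ('boot', 'clog')

(: one "for" clause with two bindings :)
let $flat :=
  for $sock in $socks, $shoe in $shoes
  return $sock || '/' || $shoe

(: explicit nested returns :)
let $nested :=
  for $sock in $socks
  return
    for $shoe in $shoes
    return $sock || '/' || $shoe

return (
  count($flat),                (: count($socks) * count($shoes) :)
  deep-equal($flat, $nested)   (: same pairs, in the same order :)
)
```

With three socks and two shoes the inner expression is evaluated six
times in either form.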

The way to speed this up is likely to construct many fewer tuples,
using grouping or windowing to process the inner part of your query.

Liam




Re: [basex-talk] improving query performance

2020-08-21 Thread Liam R. E. Quin
On Fri, 2020-08-21 at 17:28 -0700, Bill Osmond wrote:
> I'm beginning to think that perhaps my performance hopes were a bit
> too
> inflated, given the size and complexity of our database. After a
> fresh
> optimization, and with -Xms2g -Xmx10g, the following query takes
> 1492ms:

[...]

First note - there are in fact no loops in your query. Although "for"
is used to introduce a loop in many procedural languages, it does not
do so in XQuery (nor does for-each in XSLT).

In fact, it's closer to what SQL people know as a join.

It's making a stream of n-tuples, and then evaluating the inner
expression for each tuple, so that

  for $a in ('a', 'b', 'c')
  for $b in (1 to 5)
  return $a || '-' || $b

produces 15 lines of output,
a-1, a-2, a-3, a-4, a-5, b-1, and so on.

You can see the BaseX query plan for your query already moves your
where clauses as i did by hand, because BaseX is awesome.

To make the query fast, you either need to reduce the number of tuples,
and hence the number of times the expressions are evaluated, or you
need to reduce the cost of creating the tuples.

Moving the where clauses was my attempt to reduce the number of tuples.
Adding an index might reduce the cost of making the tuples, so i'd
certainly try that.

If the input document is sorted, you might be able to construct
something recursively (e.g. with fold-left) or use grouping or
windowing to process $parties in groups, which may help considerably.

Without seeing the data, that's only a guess.
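
One way to cut the number of tuples can be sketched as follows. Note that
this swaps in a different technique from the ones named above: it builds
a map keyed on PartyReference once and looks each track release up in it,
rather than using grouping or windowing. The sample document is invented
for illustration (the Name element in particular is an assumption), and
whether it beats a BaseX index on real data would need measuring.

```xquery
(: invented sample data; only the element names used in the
   original query are taken from the thread :)
let $r :=
  <msg>
    <PartyList>
      <Party><PartyReference>P1</PartyReference><Name>Label One</Name></Party>
      <Party><PartyReference>P2</PartyReference><Name>Label Two</Name></Party>
    </PartyList>
    <ReleaseList>
      <TrackRelease><ReleaseLabelReference>P2</ReleaseLabelReference></TrackRelease>
      <TrackRelease><ReleaseLabelReference>P1</ReleaseLabelReference></TrackRelease>
    </ReleaseList>
  </msg>

(: build the lookup once: one entry per party reference :)
let $party-by-ref := map:merge(
  for $party in $r/PartyList/Party
  return map:entry(string($party/PartyReference), $party)
)

(: one tuple per track release, instead of one per (release, party) pair :)
for $track_release in $r/ReleaseList/TrackRelease
let $party := $party-by-ref(string($track_release/ReleaseLabelReference))
return string($party/Name)
```

Here the FLWOR expression makes two tuples rather than four; with
thousands of parties and releases the difference grows accordingly.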

Liam




Re: [basex-talk] improving query performance

2020-08-21 Thread Liam R. E. Quin
On Fri, 2020-08-21 at 12:51 -0700, Bill Osmond wrote:
> 
> declare namespace ernm="http://ddex.net/xml/ern/411";
> 
> for $r in /ernm:NewReleaseMessage
> for $track_release in $r/ReleaseList/TrackRelease
> for $party in $r/PartyList/Party
> for $sound_recording in $r/ResourceList/SoundRecording
> for $release in $r/ReleaseList/Release
> where
>   $track_release/ReleaseLabelReference = $party/PartyReference
>   and $track_release/ReleaseResourceReference =
> $sound_recording/ResourceReference
>   and $track_release/ReleaseResourceReference =
> $release/ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference

BaseX is probably smart enough to rewrite this, but check -

for $r in /ernm:NewReleaseMessage
for $track_release in $r/ReleaseList/TrackRelease
for $party in $r/PartyList/Party
where $track_release/ReleaseLabelReference = $party/PartyReference
return
  for $sound_recording in $r/ResourceList/SoundRecording
  where $track_release/ReleaseResourceReference =
        $sound_recording/ResourceReference
  return
    for $release in $r/ReleaseList/Release
    where $track_release/ReleaseResourceReference =
          $release/ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference
    return
      ...

> Am I wrong, and would an additional value index help here? Or is my
> query
> just bad?

You're computing every possible combination of 5 items and then
filtering out the ones you want.

Filtering out earlier would probably help. Also, moving the tests least
likely to match to the outside would reduce the number of tests sooner.

A value index might well help, but as Bridger wrote, check in the GUI
to see the query plan. BaseX might already be doing the sort of rewrite
i suggested.

Liam





Re: [basex-talk] xslt:transform function not working with XML Catalog

2020-07-10 Thread Liam R. E. Quin
On Fri, 2020-07-10 at 08:23 +0200, Imsieke, Gerrit, le-tex wrote:
>  I’d like to warmly 
> recommend paying him so that he can explore and fix the issue.

:-) Thank you for the recommendation!


The trick is to find the resolver output from setting verbosity, as then
you will see the strings that are being sent to the resolver.

If you have your strace log, you can see which
CatalogManager.properties files were read, set verbosity=999 in one, and
then look for output, but it's possible it's getting eaten by Saxon as
the buffered output of XSLT. I should look into being able to capture
those messages separately.

Liam




Re: [basex-talk] xslt:transform function not working with XML Catalog

2020-07-09 Thread Liam R. E. Quin
On Thu, 2020-07-09 at 04:32 +, Lizzi, Vincent wrote:
> Hi Liam,
> 
> Thanks for the reply and suggestions. Based on your suggestion I
> tried pragmas and strace, and had another go at
> CatalogManager.properties, but they've not had any effect. 

use, strace -f java >& hugelogfile.txt
and after, grep -i catalogmanager.properties hugelogfile.txt
and you should see where it's looking. If it doesn't look for that
file, check to see if it opened the jar file containing the resolver.

If you're running BaseX from Oxygen, Oxygen needs to have it in its
classpath too, i think.

Also, of course, see if the catalog file is actually being opened!

I actually wrote some of the code in BaseX that makes XML catalogs work
with transform(), or provided a rough draft that Christian improved :),
and debugging it was... interesting.

I'd also try an absolute path for the catalog file - if you are using
the BaseX server, relative paths will be relative to the directory
(folder) where the server itself is running. (and of course the server
needs the resolver in its classpath).

Messages from the catalog manager seem to go (oddly) to standard
output interleaved with any XML output.

The command-line i used for testing this (well, one of the tests) was,

R=$HOME/lib/xmlcatalog/xml-commons-resolver-1.2
MAIN=$HOME/packages/basex/basex

java -Dxml.catalog.files=saxlog.xml \
  -D'http://saxon.sf.net/feature/uriResolverClass=org.apache.xml.resolver.tools.CatalogResolver' \
  -cp "$R/resolver.jar:/home/lee/packages/basex/basex/BaseX.jar:$MAIN/lib/custom/*:$MAIN/lib/*:" \
  org.basex.BaseX try.xq

(Saxon was in $MAIN)




Re: [basex-talk] xslt:transform function not working with XML Catalog

2020-07-08 Thread Liam R. E. Quin
On Wed, 2020-07-08 at 22:46 +, Lizzi, Vincent wrote:
> I've encountered a problem using xslt:transform in to transform some
> old XML that contains a DTD DOCTYPE system literal pointing to a non-
> working URI and also uses ENTITYREF attributes to refer to image
> files. I have the XML Catalog configured correctly using CATFILE. 


If this is on Linux, using strace can help check which catalog file is
being used; you can also turn on debugging in a
CatalogManager.properties file containing the line
  verbosity = 999
(the file needs to be in your Java classpath).

There's also a BaseX pragma,
  (# db:catfile path/to/catalog.xml #) {
    transform(...)
  }

You need to turn off the BaseX internal parser.

Make sure that the resolver library and of course saxon are in your
class path.

You may need to add,
declare option db:catfile "path/relative/to/cwd/catalog.xml";
to your query.

Liam




Re: [basex-talk] Autocomplete with RESTXQ

2020-06-25 Thread Liam R. E. Quin
On Thu, 2020-06-25 at 18:31 +0200, Imsieke, Gerrit, le-tex wrote:
> Hi List,
> 
> Can anyone recommend a lightweight vanilla Javascript autocomplete 
> library that can easily be used together with BaseX RESTXQ? Maybe
> even a 
> readily cloneable/modifiable example?

Awesomplete works for me, by Lea Verou.

There are lots of examples on her Web page for it,
https://leaverou.github.io/awesomplete/




Re: [basex-talk] Websites that use BaseX

2020-06-23 Thread Liam R. E. Quin
On Tue, 2020-06-23 at 16:34 -0400, Joe Wicentowski wrote:
> Hi all,
> 
> I'd welcome any and all contributions to my list of XQuery-powered
> resources:
> 
>   https://github.com/joewiz/xquery-power

Not a major ecommerce site :) but...

Most Web pages on From Old Books _dot_ Org - 
https://www.fromoldbooks.org/ - incorporate some content generated by
XQuery, and many of the pages are entirely generated at runtime by
XQuery, using BaseX.

There may still be some pages using qizx/open.

Please don't add this to your site, but it's possible to add
;show-query=text/plain to the pages under /Search/ to see the XQuery itself.

Liam




Re: [basex-talk] Writing

2020-06-11 Thread Liam R. E. Quin
On Fri, 2020-06-12 at 03:46 +0200, Giuseppe G. A. Celano wrote:
> Hi,
> 
> I would like to print a comment containing only a dash (i.e.,  >) , but this is not allowed.

This is not permitted in XML; it's a syntax error. A - in a comment
must be followed by a character that is not a hyphen. You can do,

or

but not


There are no tricks or workarounds, it's illegal.
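
The same rule can be seen from XQuery with a computed comment
constructor. This is a sketch rather than anything from the original
question: the err namespace is declared explicitly in case the processor
does not predeclare it, and the message string is made up.

```xquery
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(
  (: accepted: the hyphen is followed by a space before the closing delimiter :)
  comment { " - " },

  (: not accepted: content that ends with a hyphen (or contains two
     adjacent hyphens) cannot be written as a well-formed comment :)
  try {
    comment { "-" }
  } catch err:XQDY0072 {
    "rejected: comment content ends with a hyphen"
  }
)
```

A conforming processor should return one comment node followed by the
string from the catch clause.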

Liam

https://www.w3.org/TR/REC-xml/#sec-comments




Re: [basex-talk] [FULLTEXT] How to search for underscore seperated words ?

2020-06-05 Thread Liam R. E. Quin
On Fri, 2020-06-05 at 12:38 +, ETANCHAUD Fabrice wrote:
> Hi all BaseX users,
> 
> When I search for 'YET_ANOTHER_SILLY_KEYWORD', ft:search gives me all
> text nodes containing any of the YET ANOTHER SILLY KEYWORD words.

Does it work to put the phrase in double quotes?

  ft:search(..., '"the-phrase_here"')

?

Or use, //p[. contains text "the_word_with_underscores"] ?

(i've used contains text followed by ft:mark() to do match highlighting)
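
A minimal sketch of the contains text plus ft:mark() combination, for
BaseX only. The sample paragraph is invented, and the copy/modify step is
there because ft:mark() works on database nodes rather than plain
in-memory fragments. Whether the underscores are treated as word
separators depends on the tokenizer, which this sketch does not settle.

```xquery
(: BaseX-specific: the ft: functions are not part of standard XQuery :)
copy $p := <p>YET_ANOTHER_SILLY_KEYWORD and another word</p>
modify ()
return ft:mark($p[text() contains text "YET_ANOTHER_SILLY_KEYWORD"])
```

Any matched tokens come back wrapped in mark elements, which makes it
easy to see exactly what the full-text engine matched.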

Liam




Re: [basex-talk] repeatedly full-text marking the same text node

2020-05-11 Thread Liam R. E. Quin
On Mon, 2020-05-11 at 22:29 +0200, Christian Grün wrote:

> Providing access to the starts and ends may be difficult due to all
> the logical operators that can be used 

A way to go from ($input, $phrases) to a $input augmented with
db:milestone elements each containing starts="0 7 23" ends="2 6 18"
attributes (where the numbers are positional in the sequence of phrases)
might be good. Or the milestone element could include the phrase,

I saw his 
   naked hooves
   unshod
bare
   feet

as two problems are (1) overlapping results, and (2) query expansion
using a thesaurus and/or stemming.

Liam

> (ftor, ftand, ftnot, not in). A
> simple example:
> 
>   let $xml := <_>a b c d update {}
>   return ft:mark($xml[text() contains text 'b c' ftand 'c d'])
> 
> We could possibly make the full data structures available that need
> to
> be internally generated. I fear people wouldn’t really work with it
> as
> they are fairly complex (a look into the specification may give you
> an
> impression of that [1]).
> 
> But thanks for your thoughts, I’ll let them grow.
> 
> [1] https://www.w3.org/TR/xpath-full-text-10/#FTOperatorsSemanticsSec



Re: [basex-talk] repeatedly full-text marking the same text node

2020-05-10 Thread Liam R. E. Quin
On Sun, 2020-05-10 at 10:12 -0400, Graydon wrote:
> 
> I now think this just isn't a full-text use case;

In the past i used a text retrieval package i wrote to solve the problem
of inserting links automatically, choosing the longest & avoiding
overlaps.

I use some multi-threaded procedural code i wrote years ago in Perl to
do it on e.g.
https://words.fromoldbooks.org/Chalmers-Biography/w/walsingham-sir-francis.html

Recently i was thinking about rewriting this, perhaps in XSLT and/or
XQuery to try and keep the most "relevant" link rather than the
longest, with a different UI. The Perl script takes maybe two minutes
to run on approx. 200 MBytes of HTML (10,000 files). But i'd need a
good definition of relevant.

I regret that my efforts to get more full text researchers interested
in joining the XQuery full text work failed - but then i think one of
them may have been Sergey Brin, and he had other interests :) - as
markup-informed ranking of results ought to be really interesting. On
the other hand maybe Full Text would have become even more complex :)

Liam




Re: [basex-talk] repeatedly full-text marking the same text node

2020-05-10 Thread Liam R. E. Quin
On Fri, 2020-05-08 at 14:52 -0400, Graydon Saunders wrote:
> 
> 
> The idea would be to iterate through the list, marking up the node
> with any
> matches.

Can you instead use standoff markup? E.g. store positions of start and
end as word counts, and then merge them later?
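
The merge step could look something like the following sketch. It
assumes each match has already been reduced to a map with start and end
word positions; that representation, the function name and the sample
numbers are all invented for illustration.

```xquery
(: merge overlapping (start, end) word ranges into non-overlapping ones :)
declare function local:merge($ranges as map(*)*) as map(*)* {
  fold-left(
    (: process the ranges in order of their start position :)
    for $r in $ranges order by $r?start return $r,
    (),
    function($done, $next) {
      let $last := $done[last()]
      return
        if (exists($last) and $next?start le $last?end)
        then (
          (: overlap: extend the most recent range instead of adding one :)
          $done[position() lt last()],
          map { 'start': $last?start, 'end': max(($last?end, $next?end)) }
        )
        else ($done, $next)
    }
  )
};

local:merge((
  map { 'start': 7, 'end': 9 },
  map { 'start': 0, 'end': 2 },
  map { 'start': 8, 'end': 12 }
)) ! (?start || '-' || ?end)
```

For the three sample ranges this gives "0-2" and "7-12": the second and
third ranges overlap and are merged, so the text node only needs to be
marked up once, at the end.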




Re: [basex-talk] Grouping words in phrasal matches with full-text indexes

2020-04-25 Thread Liam R. E. Quin
On Sat, 2020-04-25 at 13:46 -0400, Graydon Saunders wrote:
> 
> I think I have figured out a way to connect the adjacent marked words
> in
> the phrasal term into a single mark element. I cannot convince myself
> that
> this is the right way; is there a better approach than tumbling
> windows?

I just search for the multi-word phrase and surround that. Enclosed is
a sample from a prototype for a keyword in context search index for
fromoldbooks.org (not yet live). Looking now i see it's not very neat but
maybe it'll give some ideas.




let $results :=
(
let $matches := $doc//p[not(ancestor::longdesc) and (.//text() contains text { 
$term })]
for $match at $pos in $matches
let $sock := {ft:mark( $match[text() contains text { $term }] , 
'sock')},
(: "sock" is now a singleton element likely containing a p or title element,
 : with every phrase matching the query surrounded with a sock element.
 :)
$longbefore := concat(
"   









 ",
local:ws( string-join(($sock//sock)[1]/preceding-sibling::node(), '' ))),
$before := replace($longbefore, " $", ""),
$after:= local:ws( string-join( 
($sock//sock)[last()]/following-sibling::node(), '')),
$uri := document-uri( $match/ancestor::document-node() ),
$image := $match/ancestor::image
where not( empty(($sock//sock)))
return

  

  {
substring($before, string-length($before) - 100, string-length($before))
  }
  
  { string-join($sock//sock, ' ') }{ 
local:trunc($after, 60) }
  


)
return
  
{
for $r in $results
group by $g := $r/@data-group
return
  
{
  if ($g ne "") then
let $source := $doc//source[@id = $g]
return
  (
{
 $source/title/node()
},
if ($source/author and ($source/author ne "Anonymous"))
then ", by " || $source/author
else "",
if ($source/date)
then " (" || $source/date || ")"
else ""
)
else
  $r[1]//a[contains-token(@class, 'info')]
}
{$r}
  
}
  





Re: [basex-talk] Write output using proc:execute

2020-04-09 Thread Liam R. E. Quin
On Thu, 2020-04-09 at 16:00 -0400, Tim Thompson wrote:
> 
> proc:execute("echo", ("hello!", "> hello.txt"))

You could run, bash -c 'echo hello > hello.txt'
instead, maybe?


This is assuming you are using Linux or the Linux subsystem on Windows,
or cygwin, or OS X... so bash is available.
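
From XQuery that might look like the sketch below. The likely reason the
original call does not redirect is that the arguments are handed to echo
as they are, with no shell in between to interpret ">"; that explanation
is an assumption, not something checked against the BaseX source.

```xquery
(: bash interprets the ">" redirection; echo never sees it :)
proc:execute("bash", ("-c", "echo hello > hello.txt"))
```

The file is written relative to the working directory of the BaseX
process, so an absolute path may be safer.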




Re: [basex-talk] Memoize

2020-04-09 Thread Liam R. E. Quin
On Thu, 2020-04-09 at 10:08 +0200, Mickael Desfrenes wrote:
> 
> My goal was to get faster results when a query is run multiple times.
> Yes, that's probably premature optimization, but since I do require
> these things in other application stacks I thought I'd ask.

I have a Perl-based framework that kept a cache of results. But the
time taken to open a cache file and read it is often longer than it
takes BaseX to run the query. The main value is that there are a couple
of queries that are much slower.

I've also used memcached via php in a front end, and that's faster.

One of the hardest things about cache management, though, is
invalidating pages when the data changes. For 
https://www.fromoldbooks.org/Search/ i just blow away the whole cache
and then pre-fetch the 100 or so most common queries, one per second.

But the front page on fromoldbooks.org is not cached and is just about
as fast as a search. The sweet spot for Web back ends is still that you
need a Web page to finish loading in under two seconds to stop Google
from hating you :)

Liam




Re: [basex-talk] Problem with attribute parsing?

2020-04-05 Thread Liam R. E. Quin
On Sun, 2020-04-05 at 22:21 +, Peter Villadsen wrote:
> Hello
> 
> I was doing some experiments and I ended up with this:
> 
> let $c := 

This is an error because the braces surround expressions:
  
would work.

Similarly you can write
   { concat('dirt', 'noise') }
of course.
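
A short sketch of braces in attribute values and in element content. The
element and attribute names are hypothetical, not the ones from the
original question.

```xquery
let $name := "dirt"
return (
  (: inside an attribute value, braces enclose an expression :)
  <e class="{ $name }"/>,
  <e class="{ concat($name, 'noise') }"/>,
  (: the same goes for element content :)
  <e>{ concat($name, 'noise') }</e>
)
```

To get a literal brace in either place, double it: {{ and }}.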

Liam




Re: [basex-talk] Memoize

2020-03-30 Thread Liam R. E. Quin
On Mon, 2020-03-30 at 15:39 +0200, Christian Grün wrote:
>  I remember that some users have successfully utilized Memcached in
> the past to cache BaseX query results.


I did this for a while on fromoldbooks.org and it worked fine (you have
to know when to invalidate the cache of course!). But for 
www.fromoldbooks.org/Search/ i have a caching framework i wrote years
ago (which unfortunately grew into a monster, as these things do) that
also lets me switch between different XQuery engines.  It's been ages,
i think more than 10 years, since i've had to switch, but it's
sometimes useful for debugging problems.

The first version was in production in under a day.

Most of the queries i use in BaseX for the Web site run faster than the
time to start the cache framework these days, though, so i should redo
it. You have to get the basic HTML result sent to the Google bot in
under a second if you want to be on the first page of results.

Liam




Re: [basex-talk] file:move() fails when moving to another file system

2020-02-12 Thread Liam R. E. Quin
On Wed, 2020-02-12 at 08:44 +, Zimmel, Daniel wrote:
> > A last try: What do you get if you run it with basex (the
> > standalone, not the client)?

Looks to me like query/func/file/FileCopy.java uses Files.move() when
it thinks it can, but this will fail on Unix-like systems if you try to
move a file across file system boundaries (because then the kernel
would need to copy the data).

One approach might be to catch failure of Files.move() and try again
with Files.copy().
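
Until something like that is in place, the same idea can be tried at the
query level. This is a workaround sketch in XQuery, not the Java change
described above; local:move and the paths are hypothetical, and it only
handles single files.

```xquery
(: hypothetical helper: try a plain move first, fall back to copy + delete :)
declare function local:move($from as xs:string, $to as xs:string) {
  try {
    file:move($from, $to)
  } catch * {
    (: e.g. the target is on another file system :)
    file:copy($from, $to),
    file:delete($from)
  }
};

(: made-up example paths :)
local:move("/tmp/report.xml", "/mnt/archive/report.xml")
```

If the copy itself fails, the error is raised from the catch clause and
the original file is left in place.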

Liam




Re: [basex-talk] An invalid XML character (Unicode: 0x1a) was found in the element content of the document

2020-01-16 Thread Liam R. E. Quin
On Thu, 2020-01-16 at 03:43 -0500, Geoff Alexander wrote:
> 
> We're getting an "An invalid XML character (Unicode: 0x1a) was found
> in the element content of the document" error

Character 0x1A is indeed not allowed in an XML document.

See e.g. https://www.w3.org/TR/REC-xml/#charsets

One reason this can happen is if a document is in some character set,
such as a DOS codepage or early Apple charset, that (mis-)uses some of
the control characters, such as this one, to be printable characters.
Another is conversion errors, and another is attempts to include binary
data. In other words it's usually a file encoding problem.

If the document doesn't actually contain a byte of that value, though,
it's another problem...
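
One way to check for the byte without parsing the file as XML is to read
it as binary. A sketch, assuming the File and Binary modules are
available (they are in BaseX); the file name is made up, and the whole
file is read into memory, so it suits modest file sizes only.

```xquery
let $path := "suspect.xml"   (: hypothetical file name :)
let $octets := bin:to-octets(file:read-binary($path))
(: 26 is 0x1A; the result is the list of 1-based byte positions :)
return index-of($octets, 26)
```

An empty result means the byte is not in the file as stored, which points
at a decoding or conversion step rather than the data itself.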





Re: [basex-talk] Huge No of XML files.

2019-12-18 Thread Liam R. E. Quin
On Wed, 2019-12-18 at 11:10 +0530, Sreenivasulu Yadavalli wrote:
> > 
> What exactly do you mean by moving collections around?.
> 
> A: moving the collections in the same system. 

So, you use the Linux "mv" command to do this? Or what?

What exactly do you mean by collections? I for one would find it easier
if you would stop talking in riddles, as my telepathy skills are weak.

> And every day we have to
> update the existing collection with call data. So finding the
> collection is
> taking more time

How do you look for the collection? Isn't it a separate BaseX database?

> 
> Are you taking a database with 100 million documents and renaming
> 50,000 of them?
> 
> What operations exactly are slow?
> 
> A: finding the existing collection.

find / -name collection.db ?

This is a little frustrating in that you are asking for people's help
but not explaining the problem. Are you saying that fn:collection() is
slow in BaseX? What arguments are you passing it exactly? What is the
size, in gigabytes, of the database, on disk? How many documents are in
it?

Can you give step-by-step EXACT AND PRECISE instructions so someone
else could reproduce the problem you are having? Complete and exact
instructions, with sample files if needed, so they can reproduce the
problem on their own computer?

A database with 80,000 files is easy to "find" here, and opens quickly,
in a small fraction of a second. It doesn't take hours.

Is something else running on your computer that makes it slow??

Note: please remember to copy the list in your replies, as the BaseX
people are far more knowledgeable about BaseX than i am :) My goal as
an analyst is to get you to explain the problem you are having clearly
enough that you can get an answer :)

Liam




Re: [basex-talk] Huge No of XML files.

2019-12-17 Thread Liam R. E. Quin
On Tue, 2019-12-17 at 11:48 +0530, Sreenivasulu Yadavalli wrote:
> 
> Every day we are moving collections around 55k to 60k no of xml files
> large
> account.


Here, i just created a BaseX database with 80,000 XML files. It took
under one minute on the Linux desktop system i use.

>  Its taking more than 18 hours.
This makes no sense. How much memory do you have on the computer?

What exactly do you mean by moving collections around?

Are you taking a database with 100 million documents and renaming
50,000 of them?

What operations exactly are slow?

Liam




Re: [basex-talk] Weird: mixed content trimmed unexpectedly

2019-12-09 Thread Liam R. E. Quin
On Mon, 2019-12-09 at 20:27 +0100, Arjan Loeffen wrote:
> 
> In general: when the wiki states here: "Many XML documents include
> whitespaces that have been added to improve readability. ", this
> should not
> apply to mixed content fragments as described. Only to start and end
> of
> "text content of elements", not on text nodes.
> I therefore also think that the second approach is not exactly in
> line with
> the *intention *of the XML standard.

It isn't, but some of the earliest XML parsers had the option to drop
white-space-only text nodes (e.g. MSXML i think) because of XML used in
data contexts. The intent was that a DTD could be used to determine
which spaces to ignore, but then DTDs became optional.

A parser without a DTD does not know which elements _could_ contain
text, and hence doesn't know what to drop. In addition, markup like,

  

   Nigel


   0.4

  

is common, unfortunately. In SGML this worked but the whitespace rules
were complex enough that they were a constant source of trouble.

Liam




Re: [basex-talk] http:request parsing question

2019-10-22 Thread Liam R. E. Quin
On Tue, 2019-10-22 at 19:32 -0400, Bridger Dyson-Smith wrote:
> 
> http://export.arxiv.org/oai2?verb=Identify'/>)//@status/data()
> 
> returns '200', but trying
> 
> http:send-request()/h:response/@status
> 
> fails.

I'm guessing that the first request sends you an auth token, and that
you're supposed to send this as a parameter in the second token?

How exactly does the second call fail?





Re: [basex-talk] how to specify a database in a.xq FLOWR query?

2019-10-07 Thread Liam R. E. Quin
On Mon, 2019-10-07 at 03:38 -0700, thufir wrote:
>  From SO (and the fine manual), the solution is to use:  basex -i 
> w3school_data titles.xq

Seeing the filename prompts me to note - w3schools (when i've looked at
it) isn't a good place to learn from. They were never affiliated with
W3C, and their tutorials often seem written by people without a deep
understanding.

Mind you, i don't know many free online resources that are better
alternatives...




Re: [basex-talk] // versus /*/

2019-10-07 Thread Liam R. E. Quin
On Sun, 2019-10-06 at 21:28 -0700, thufir wrote:
> Do these have the same meaning?  Might there be a subtle distinction,
> or 
> might they be read differently but functionally identical?

Are we doing your homework? :-) :-)

 //* is the same as /descendant-or-self::*
 //book means, search the whole database to find "book" elements.


 /*/book means make a list of all children of the top-level node, and
find book elements that are children of items in that list.

So, given
  

//book will find one node, and /*/book won't find any.
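
A minimal sketch of the difference, using Python's standard ElementTree as a stand-in for an XPath engine. The sample document is hypothetical (it is not the one from the original message): its only book element is a grandchild of the top-level element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the only book is a grandchild of the root.
doc = ET.fromstring("<library><shelf><book>XQuery</book></shelf></library>")

# Counterpart of //book: every book element anywhere in the document.
anywhere = list(doc.iter("book"))

# Counterpart of /*/book: book elements that are direct children
# of the top-level element.
children_of_root = doc.findall("book")

print(len(anywhere), len(children_of_root))
```

With this document the first search finds the nested book and the second finds nothing, because there is an extra level (shelf) between the top-level element and the book.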

> They're equally efficient, at least as used above?
They are doing different things. To measure efficiency, use a much
larger database than the XQuery use cases example :)

You may find Priscilla Walmsley's XQuery book helpful in learning XPath
version 3.

Best,

Liam

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org



Re: [basex-talk] Passing through entities unchanged when serializing

2019-09-09 Thread Liam R. E. Quin
On Tue, 2019-09-10 at 02:59 +0200, Andreas Mixich wrote:
> I wonder why the serialization behaves that way. It does not make
> sense to
> me. If a user has the need to escape XML, it should be thorough,
> shouldn't it?

XML entities are expanded by the XML parser, so by the time XQuery (or
XSLT) sees the document they are gone.

Consider an entity like
blackgreySteven">



It'd be really complex to have that visible to XPath and to have to
write, e.g.
/students/entity(*)/person

If it's an external parsed entity it's visible in that the base-uri
property changes, but that's all.

Character entities like  (ŗ) are just special cases of
general entities, and XML does not distinguish them. I wish it did, but
we never got back to that work after publishing XML 1.0.
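
A small sketch of the point that the parser expands entities before any query language sees the document. It uses Python's standard ElementTree rather than BaseX, and the entity name "who" and the document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with an internal general entity named "who".
source = """<!DOCTYPE students [
  <!ENTITY who "Steven">
]>
<students><person>&who;</person></students>"""

root = ET.fromstring(source)

# By the time the tree exists, the reference has been replaced by its text;
# nothing in the tree records that an entity was ever there.
person_text = root.find("person").text
print(person_text)
```

The tree only contains the replacement text, which is why a serializer cannot put the original entity reference back without extra help such as a character map.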

Liam

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Web slave for vintage clipart http://www.fromoldbooks.org/



Re: [basex-talk] Passing through entities unchanged when serializing

2019-09-09 Thread Liam R. E. Quin
On Mon, 2019-09-09 at 15:04 +0200, Andreas Mixich wrote:
> when serializing a string, that contains literal XML with entities,
> how do I pass through those entities unchanged?

One way is to use a character map, as Bridger Dyson-Smith described.

Sometimes another way can be to have a version of the DTD in which the
replacement text of the entity marks the presence of the entity, e.g.

but this will affect full-text searching of course.

Liam

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Webslave for old illustrations  http://www.fromoldbooks.org/



Re: [basex-talk] Nooby/#CitizenScientist REQUEST for HELP: Python implementation of this XQuery

2019-04-10 Thread Liam R. E. Quin
On Wed, 2019-04-10 at 14:52 -0600, Jim Salmons wrote:
> [...]

I _think_ what you are asking is, how do i interpolate values into a
string in Python.

If that is correct, then the first Google result for
interpolate values into a string in Python
is https://www.programiz.com/python-programming/string-interpolation

The main thing to remember is that $ and { are special in XQuery, so it
can be easier to use substitution with a regular expression than direct
interpolation.

https://stackoverflow.com/questions/3877623/in-python-can-you-have-variables-within-triple-quotes-if-so-how
may also help.
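
A minimal sketch of the regular-expression approach, assuming a made-up @@NAME@@ marker syntax and a hypothetical query; neither comes from the original question. Because the markers use neither $ nor {, the XQuery text can be written exactly as it would appear in a .xq file.

```python
import re

# Hypothetical XQuery template; @@NAME@@ marks the slots to fill in.
QUERY_TEMPLATE = """
for $issue in //issue[@year = @@YEAR@@]
return <title>{ $issue/title/text() }</title>
"""

def fill_template(template, values):
    """Replace each @@NAME@@ marker with str(values[NAME])."""
    return re.sub(r"@@([A-Z_]+)@@", lambda m: str(values[m.group(1)]), template)

query = fill_template(QUERY_TEMPLATE, {"YEAR": 1985})
print(query)
```

This is only safe for values you control. For values that come from users, binding them as external variables (where the client library offers that) avoids pasting untrusted text into the query.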

If this is meaningless technobabble or i have misunderstood, please
feel free to ask again but more directly.

Liam


-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Web slave for vintage clipart http://www.fromoldbooks.org/



Re: [basex-talk] Higher order functions in XSLT

2019-03-28 Thread Liam R. E. Quin
On Thu, 2019-03-28 at 11:48 +0100, nikos dimitrakas wrote:
> 
[...]
> XPST0003: Inline functions require support for higher-order-
> functions, which needs Saxon-PE or higher. I am using Saxon EE
> 9.9.1.2 (also tried PE 9.9.1.2)

This suggests BaseX is picking up the wrong version of Saxon, or maybe
not finding the license file?

it should work.

On Linux, i sometimes use
strace -f $MAIN/bin/basex myquery.xq |& grep saxon
and see where the open system call succeeds.
(you can get more sophisticated with strace but i don’t know which
operating system you use).

Liam


-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Web slave for vintage clipart http://www.fromoldbooks.org/



Re: [basex-talk] Way to escape XML?

2019-03-24 Thread Liam R. E. Quin
On Sun, 2019-03-24 at 04:22 +0100, Andreas Mixich wrote:
> let $xml as element() := <xml>Hello World</xml>
> return serialize($xml, map{"method":"entity-escaped-string"}
> 
> would result in
> 
> &lt;xml&gt;Hello World&lt;/xml&gt;

One way,

declare function local:escapexml($input as item()*) as xs:string?
{
  {fn:serialize($input)}/text()
};

declare option output:method   "xml";

local:escapexml(
  
Simon
24 years
blue
  
)


Note that if you don’t have the XML output method, strings are output
without escaping, so you can’t see that it has worked.
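
For comparison, the same two-step idea (serialize first, then escape the resulting string) sketched in Python with the standard library. This is an illustration only, not a BaseX feature, and the input string is a made-up example.

```python
from xml.sax.saxutils import escape

# Step 1: the markup as a plain string (already serialized).
serialized = "<xml>Hello World</xml>"

# Step 2: escape the characters that are special in XML text.
escaped = escape(serialized)
print(escaped)
```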

Liam


-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Web slave for vintage clipart http://www.fromoldbooks.org/



Re: [basex-talk] supporting XML Catalog files in xslt:transform() (patch)

2019-03-13 Thread Liam R. E. Quin
On Wed, 2019-03-13 at 20:15 +0100, Imsieke, Gerrit, le-tex wrote:
> 
> On 13.03.2019 19:55, Liam R. E. Quin wrote:
> > Yes, they are a bit of a nightmare. Actually i’ve thought about
> > having
> > the ability to write a URI Resolver in XQuery,
> >  db:resolve-identifier($system, $public, $purpose, $types) as
> > xs:anyURI?
> > 
> > but maybe it is too scary!
> 
> I’ve already written a catalog resolver in XSLT…
> https://github.com/transpect/xslt-util/blob/master/xslt-based-catalog-resolver/xsl/resolve-uri-by-catalog.xsl

i bow down before your awesomeness :) but, the next step is to be able
to use a user-written resolver in XSLT itself, e.g. for loading DTDs
(other than the stylesheet containing the resolver code...)

For now though, thanks, Christian, for making the changes!

Liam



-- 
Liam Quin - web slave for https://www.fromoldbooks.org/
with fabulous vintage art and fascinating texts to read.
Click here to have the slave beaten.



Re: [basex-talk] supporting XML Catalog files in xslt:transform() (patch)

2019-03-13 Thread Liam R. E. Quin
On Wed, 2019-03-13 at 11:57 +0100, Christian Grün wrote:
> > Note that i have a public identifier, so using prefer-public lets
> > that
> > be resolved.
> 
> xmllint and BaseX seem to behave differently on my system. With
> xmllint and xsltproc, your examples run fine.

That's good at least...

> Your example (with the public identifier) returns the expected result
> if I replace "nonsense.dtd" with "http://nonsense.dtd;. Do you
> experience a similar behavior?

No, but i am running the example locally with no http involved, and
using the standalone BaseX jar. But i did have to change it to map
file:


If you add
verbosity = 999
to CatalogManager.properties you will see a log of what it is trying to
resolve which may help (it also logs all the mapping rules it finds).

> In a nutshell: You convinced me well enough that we should simplify
> things and handle catalogs globally. 

:-)

Yes, they are a bit of a nightmare. Actually i’ve thought about having
the ability to write a URI Resolver in XQuery, 
db:resolve-identifier($system, $public, $purpose, $types) as
xs:anyURI?

but maybe it is too scary!

> Understanding catalogs is quite a
> challenge in itself, and we shouldn’t necessarily make it even more
> challenging. I have simplified the code again, so it looks pretty
> similar to your original solution ;)
i’m sorry - i should have included more background and rationale as to
why i did it the way i did, i think.

> • If a global catalog file list is defined, it will also be assigned
> to the XSL transformer. In fact, that’s the default behavior anyway
> if
> functions like fn:doc are used in BaseX.
Perfect.

> • No warnings will be output to standard error, unless
> xml.catalog.ignoreMissing is overwritten.
Perfect.

> 
> The documentation has been updated, and new snapshots are available.

it is all working for me. Many many thanks!

Liam



-- 
Liam Quin - web slave for https://www.fromoldbooks.org/
with fabulous vintage art and fascinating texts to read.
Click here to have the slave rewarded with cold gruel.



Re: [basex-talk] supporting XML Catalog files in xslt:transform() (patch)

2019-03-12 Thread Liam R. E. Quin
On Tue, 2019-03-12 at 13:46 +0100, Christian Grün wrote:
> Hi Liam,
> 
> Thanks for the enclosed example. I am still trying to figure out how
> to run it, so I tried to simplify everything.
> 
> As you can easily guess, my knowledge on XML catalogs is rather
> limited: For example, when trying to run the example with fetch:xml,
> I
> noticed that the URI resolution works if I change "nonsense.dtd" to
> "http://nonsense.dtd; (both in all.xml and in saxalog.xml). I wonder
> why it works without the URI scheme on your system? I have attached
> my
> simple example for fetch.xq.

Note that i have a public identifier, so using prefer-public lets that
be resolved.

Adding a CatalogManager.properties file that sets verbosity helps
to debug this stuff.

I used the Apache resolver class because it's commonly used also with
Saxon.

> Maybe we manage to construct a fully-stripped down, minimized
> instance
> of the example that works with XSLT 1.0?

Enclosed.

> The more I think about URI resolution, the more I see why it could
> make sense to handle catalog resolution globally.

Especially since the “CATFILE” option is really a semicolon-separated
list of files.

>  In the given case,
> I’ll still need to understand why it makes a difference if we assign
> the value of the CATFILE option or the xslt:transform option to the
> transformer?

Seems the latter is ignored, except that if it's not given, the URI
resolver is not enabled. But i think the catalog should be enabled
everywhere if it's set.

It's hard to come up with a use case for having catalog being used for
doc() and for files loaded into the database and not for files opened
in subsidiary modules, and using different catalog files in different
parts of the same query sounds like a nightmare for users to debug.

Note, by the way, that instantiating a resolver is relatively expensive
- the code looks for a whole bunch of files - and also that resolving a
file will look by default for CatalogManager.properties in a bunch of
places, so that this would slow down (for example) importing 10,000 XML
files; a static resolver is probably much faster, which is why i'd left
the two options - ignore missing, and static. I don't want to be
responsible for slowing down BaseX! :-) :-)
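
A rough sketch of why a shared (static) resolver is cheaper than building one per file. This is plain Python, not the Apache resolver or BaseX code; CatalogResolver and resolver_for are hypothetical names, and the class only counts how often it is constructed.

```python
from functools import lru_cache

class CatalogResolver:
    """Hypothetical stand-in for an expensive-to-build catalog resolver."""
    built = 0  # how many times a resolver has been constructed

    def __init__(self, catalog_files):
        CatalogResolver.built += 1
        # CATFILE-style value: a semicolon-separated list of files.
        self.files = [name for name in catalog_files.split(";") if name]

@lru_cache(maxsize=None)
def resolver_for(catalog_files):
    """Build one resolver per distinct catalog list and reuse it afterwards."""
    return CatalogResolver(catalog_files)

# Simulate importing 10,000 files that all use the same catalog list.
for _ in range(10_000):
    resolver_for("a.xml;b.xml")

print(CatalogResolver.built)
```

The cache is keyed on the catalog list, so changing the list still yields a fresh resolver, which is the behaviour the patch comment describes for a changed path.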

Liam


-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Web slave for vintage clipart http://www.fromoldbooks.org/


Re: [basex-talk] supporting XML Catalog files in xslt:transform() (patch)

2019-03-08 Thread Liam R. E. Quin
On Thu, 2019-03-07 at 13:19 +0100, Christian Grün wrote:
> Hi Liam,
> 
> > works if i uncomment the option declaration, but not otherwise.
> 
> Interesting; seems I have overlooked something. And I must admit I
> haven’t tried to run it by myself so far. Could you possibly send me
> a
> little self-contained example (xsl, catalog file, file referenced by
> the xsl file) that demonstrates the missing URI resolution and that I
> could embed as unit test?

Yes, enclosed, with a README that says what output is expected, and the
two problems -
[A] the wrong catalog file being used.
[B] a spurious message

Now, in fact, if you changed the catalog option to a boolean, and
documented that the db:option should be used, i think you’d be fine.

Liam

> > i'd suggest either checking if the systemProperty for it is set,
> > …
> > The system property is xml.catalog.ignoreMissing
> 
> Yes, sounds reasonable. It’s actually what I already tried before I
> decided to drop the static property assignment.
> 
> Thanks,
> Christian


Re: [basex-talk] supporting XML Catalog files in xslt:transform() (patch)

2019-03-06 Thread Liam R. E. Quin
On Tue, 2019-03-05 at 13:44 +0100, Christian Grün wrote:
> Liam,
> 
> Thanks a lot for your patch, very appreciated! Pull requests are even
> handier for us, but any type of commit is welcome.

Thanks! Awesome!

I'll do a pull request next time.

I've tried the snapshot and got it to work; however, the following
query

(: declare option db:catfile 
"/home/lee/public_html/texts/Dictionaries/Pigott-PoliticalDictionary/xml/saxalog.xml";
 :)
xslt:transform(
doc("try.xsl"),
doc("try.xsl"),
map {
'dummyparam' : 'socks'
},
map {
'catalog' : 
"/home/lee/public_html/texts/Dictionaries/Pigott-PoliticalDictionary/xml/unused.xml"
}
)

works if i uncomment the option declaration, but not otherwise. The
file given as a fourth argument to xslt:transform doesn't seem to be
opened (according to strace for example).  But if the argument is
omitted, catalog resolution is not performed.


Also watch that having xml.catalog.ignoreMissing unset means the
resolver will issue warnings if no properties file is found; rather
than get bug reports about a weird message,
  Cannot find CatalogManager.properties
i'd suggest either checking if the systemProperty for it is set, or
reinstating
  invoke(method(CMP, "setIgnoreMissingProperties", boolean.class), CM,
true);
(modulo refactoring)

The system property is xml.catalog.ignoreMissing

I can look at tracking these two things down and making a pull request
for them, unless i'm doing something obvious wrong.

> I noticed that Java 9 provides a better built-in support for XML
> catalog resolution [4]. With BaseX 10, we will probably switch to a
> newer version of Java. If we are going to upgrade: Would anyone
> reading this recommend us to keep up support for the separate Apache
> XML resolver library, or could we drop it completely and rely on
> Java’s built-in catalog support?

I think it's too soon; i don't think Saxon is using it for example.

Thanks!

Liam



-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Web slave for vintage clipart http://www.fromoldbooks.org/



Re: [basex-talk] following-sibling axis -- real data example

2019-03-01 Thread Liam R. E. Quin
On Fri, 2019-03-01 at 13:50 -0800, Mark Bordelon wrote:
> I have tried using the -w option’s true and false values, but my
> results are always as above.
> 
> Any ideas?

Try removing all whitespace between tags that's not part of the actual
document and see if you get different results; if so, i'd suspect -w
isn't working, perhaps?

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Web slave for vintage clipart http://www.fromoldbooks.org/



[basex-talk] supporting XML Catalog files in xslt:transform() (patch)

2019-02-26 Thread Liam R. E. Quin
The enclosed patch -- would you prefer a github pull request? -- makes
xslt:transform() aware of XML catalog files, as other XML parsing
already is. The same CATFILE preference is used (via the query
context).

I refactored CatalogWrapper slightly so it could be reused. I also took
away the line that sets verbosity to 0, as without it you can control
verbosity via a system property, or using the CatalogManager.properties
file.

I tested this with xml-commons-resolver-1.2/resolver.jar and with the
built-in resolver and both seem to work.

I have not added tests. In addition, it'd be worth adding something to
the documentation, especially about the xml.catalog.verbosity property
(just verbosity in the .properties file).

Possible breaking change: i also removed the line that sets
prefer=public. I spent ages trying to get catalogs working before i
discovered this, as i was using a system identifier!  The code could
check to see if the corresponding system property is set (users can't
override the API with system properties, frustratingly), but since
catalogs already say prefer=public or prefer=system in them, and it'd
have needed to have been the same to work, i don't think this change
breaks anything in practice. It may make some catalogs start to work
that had not been working, so maybe it's worth a line in the release
notes.

Liam


-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Web slave for vintage clipart http://www.fromoldbooks.org/
diff --git a/basex-core/src/main/java/org/basex/build/xml/CatalogWrapper.java b/basex-core/src/main/java/org/basex/build/xml/CatalogWrapper.java
index c4c9f41fa..1303fca92 100644
--- a/basex-core/src/main/java/org/basex/build/xml/CatalogWrapper.java
+++ b/basex-core/src/main/java/org/basex/build/xml/CatalogWrapper.java
@@ -38,6 +38,41 @@ public final class CatalogWrapper {
 return CM != null;
   }
 
+  /**
+   * Returns the resolver, which could be of any class that implements the
+   * CatalogManager interface.
+   */
+  public static Object getCM() {
+return get(CRP, CM);
+  }
+
+  public static void setDefaults(final String path) {
+if(CM == null) return;
+
+// IgnoreMissingProperties - default is to print a warning if properties
+// are unset; this is not usually what we want, but can be overridden
+// for debugging. Not all resolvers produce errors even if this is false,
+// so better to set it to true.
+invoke(method(CMP, "setIgnoreMissingProperties", boolean.class), CM, true);
+
+// CatalogFiles - semicolon-separated list of files
+invoke(method(CMP, "setCatalogFiles", String.class), CM, path);
+
+// StaticCatalog:
+// If this manager uses static catalogs, the same static catalog will
+// always be returned.   Otherwise a new catalog will be returned.
+// We would probably get better performance by using true here. You get a
+// new catalogmanager instance if you change PATH.
+invoke(method(CMP, "setUseStaticCatalog", boolean.class), CM, false);
+
+// You can also set Verbosity, but that's best left for the properties file,
+// CatalogManager.propertie,  or system property, to help debugging.
+// The higher the number, the more messages.
+// NOTE messages go to output, not err stream!
+// invoke(method(CMP, "setVerbosity", int.class), CM, 0);
+
+  }
+
   /**
* Decorates the {@link XMLReader} with the catalog resolver if it is found in the classpath.
* Does nothing otherwise.
@@ -46,11 +81,9 @@ public final class CatalogWrapper {
*/
   static void set(final XMLReader reader, final String path) {
 if(CM == null) return;
-invoke(method(CMP, "setIgnoreMissingProperties", boolean.class), CM, true);
-invoke(method(CMP, "setCatalogFiles", String.class), CM, path);
-invoke(method(CMP, "setPreferPublic", boolean.class), CM, true);
-invoke(method(CMP, "setUseStaticCatalog", boolean.class), CM, false);
-invoke(method(CMP, "setVerbosity", int.class), CM, 0);
+
+setDefaults(path);
+
 reader.setEntityResolver((EntityResolver) get(CRP, CM));
   }
 }
diff --git a/basex-core/src/main/java/org/basex/query/func/xslt/XsltTransform.java b/basex-core/src/main/java/org/basex/query/func/xslt/XsltTransform.java
index a0750b74c..3c34bced7 100644
--- a/basex-core/src/main/java/org/basex/query/func/xslt/XsltTransform.java
+++ b/basex-core/src/main/java/org/basex/query/func/xslt/XsltTransform.java
@@ -9,6 +9,7 @@ import java.util.*;
 import javax.xml.transform.*;
 import javax.xml.transform.stream.*;
 
+import org.basex.core.*;
 import org.basex.io.*;
 import org.basex.io.out.*;
 import org.basex.io.serial.*;
@@ -17,6 +18,7 @@ import org.basex.query.value.item.*;
 import org.basex.query.value.node.*;
 import org.basex.util.*;
 import org.basex.util.options.*;
+import org.basex.build.xml.*; // for CatalogWrapper
 
 /**
  * Function 

Re: [basex-talk] RESTXQ and regexp

2019-02-23 Thread Liam R. E. Quin
On Fri, 2019-02-22 at 12:05 +0100, Marco Lettere wrote:

> (: Matches anything followed by /input :)
> declare %rest:path("app/{$path=.+}/input")
>function page:inputs($path) { ... };
> 
> (: Matches all other all paths starting with "app/" :)
> declare %rest:path("app/{$path=.+}")
>function page:others($path) { ... };

Can you use content negotiation/quality to say the /input one is
preferred when both match? E.g. %rest:produces("*/*;qs=0.8") on the
page:others function and */*;qs=1.0 on the input one?

Liam


-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Web slave for vintage clipart http://www.fromoldbooks.org/



Re: [basex-talk] proc:execute

2019-02-11 Thread Liam R. E. Quin
On Tue, 2019-02-12 at 01:42 +0100, Giuseppe G. A. Celano wrote:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position
> 5390: ordinal not in range(128)

A guess - sounds like an encoding error - 0xE2 is â in Unicode, and 128
suggests US ASCII was expected - check the encoding declaration on the
XML, or maybe it's a locale difference?
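
A short sketch that reproduces this kind of error. The sample text is made up; it assumes the data is UTF-8, where 0xE2 is commonly the first byte of a three-byte sequence such as a curly apostrophe.

```python
# "it’s" with a right single quotation mark (U+2019), encoded as UTF-8.
# The apostrophe becomes three bytes, the first of which is 0xE2.
raw = "it\u2019s".encode("utf-8")

try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as err:
    # Same family of message as in the report: byte 0xe2 is outside range(128).
    ascii_ok = False
    print(err)

# Decoding with the encoding the bytes were actually written in works.
decoded = raw.decode("utf-8")
print(decoded)
```

If that is what is happening, the fix is usually on the Python side: decode the process input or output explicitly as UTF-8, or set the locale so that Python does not fall back to ASCII.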

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Web slave for vintage clipart http://www.fromoldbooks.org/



Re: [basex-talk] many distinct namespaces

2019-01-24 Thread Liam R. E. Quin
On Wed, 2019-01-16 at 15:31 +0100, Christian Grün wrote:
> Hi Liam,
> 
> > did this ever happen?
> 
> What exactly? ;)

sorry! support for more than 256 namespaces in one db.


-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Web slave for vintage clipart http://www.fromoldbooks.org/



Re: [basex-talk] many distinct namespaces

2019-01-15 Thread Liam R. E. Quin
On Fri, 2018-11-02 at 17:25 +0100, Christian Grün wrote:


did this ever happen?
> 

> Some more details: The current storage layout per node has been fixed
> to 16 bytes. One byte (8 bits) is reserved for the namespace
> reference.

Here are a couple of hacky approaches in the spirit of brainstorming ;)

* reserve 7 bits for the namespace, and 1 bit for "uses extended
namespace". In the top-bit-set objects only, add an extra byte.
* Use the first 2 bytes of the element name :)
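
A sketch of the first idea, in Python for brevity. It is an illustration of the bit layout only and does not reflect how BaseX stores nodes; encode_ns and decode_ns are hypothetical names. Ids below 128 keep the one-byte form, and larger ids set the top bit and spill into one extra byte.

```python
def encode_ns(ns_id):
    """Encode a namespace id in one byte if it fits in 7 bits, else in two."""
    if not 0 <= ns_id < (1 << 15):
        raise ValueError("namespace id out of range for this sketch")
    if ns_id < 0x80:
        return bytes([ns_id])
    # Top bit set marks "extended": 7 high bits here, 8 low bits in the extra byte.
    return bytes([0x80 | (ns_id >> 8), ns_id & 0xFF])

def decode_ns(data):
    """Return (ns_id, number_of_bytes_consumed)."""
    first = data[0]
    if (first & 0x80) == 0:
        return first, 1
    return ((first & 0x7F) << 8) | data[1], 2

print(encode_ns(5), encode_ns(300), decode_ns(encode_ns(300)))
```

The trade-off is that only 128 namespaces keep the compact form, in exchange for up to 32768 ids overall in this particular layout.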

Liam


-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Web slave for vintage clipart http://www.fromoldbooks.org/



Re: [basex-talk] Support for XML with different file extension in GUI

2018-11-14 Thread Liam R. E. Quin
On Wed, 2018-11-14 at 19:20 +0100, Christian Grün wrote:
> Good news: It has already been implemented, and it will be available
> soon
> after we’ve done some more testing!

you people are beyond awesome! :D

> 


-- 
Liam Quin - web slave for https://www.fromoldbooks.org/
with fabulous vintage art and fascinating texts to read.
Click here to have the slave beaten.



Re: [basex-talk] Support for XML with different file extension in GUI

2018-11-14 Thread Liam R. E. Quin
On Tue, 2018-11-06 at 18:32 +0100, Christian Grün wrote:
> Hi Kevin,
> 
> I’ll probably add an editable list of XML suffixes to the GUI
> preference dialog. 

I think this would be very helpful - recently i was making an index of
some epub files - they are zip'd directories of (mostly) XML, but i
have to rename them to end in .zip and/or expand them and rename
individual files right now. Or write a query to do it :)


-- 
Liam Quin, https://www.holoweb.net/liam/cv/
Web slave for vintage clipart http://www.fromoldbooks.org/
Available for XML/Document/Information Architecture/
XSL/XQuery/Web/Text Processing/A11Y work & consulting.



Re: [basex-talk] First non-null value?

2018-09-14 Thread Liam R. E. Quin
On Thu, 2018-09-13 at 16:18 -0400, Graydon Saunders wrote:
> let $possible1 as xs:string* := (: go looking for a value via one
> route :)
> let $possible2  (: all the other routes in preference order :)
> 
> let $foundIt as xs:string :=
> ($possible1,$possible2,$possible3,$possible4,$possible5,'FAILED')[1]
> 
> This works nicely in terms of "I got the value by the least-bad
> route".
> What I'm blanking on is "how do I tell which was the first
> possibility to
> match?" without resorting to sprawl of if-then-else statements.  I
> have the
> idea that there must be a compact way but I have no idea what it is
> if
> there is.

Don't use variables - just construct a sequence,
let $possibles as xs:string := (
   stuff to make possible1,
   stuff to make possible 2,
   . . .
   'fallback value'
   )[1]
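
An analogous sketch in Python, since Python has no sequence flattening: candidates are tried lazily in preference order and the first one that yields a value wins. The labels are an addition of this sketch (one way to record which route matched), and the route names and values are made up.

```python
def first_match(candidates, fallback="FAILED"):
    """Return (label, value) for the first candidate that yields a value."""
    for label, produce in candidates:
        value = produce()  # evaluated lazily, in preference order
        if value:          # None and "" both count as "no value"
            return label, value
    return "fallback", fallback

# Hypothetical lookup routes, most preferred first.
routes = [
    ("by-id", lambda: None),
    ("by-title", lambda: "Moby Dick"),
    ("by-isbn", lambda: "never reached"),
]

label, value = first_match(routes)
print(label, value)
```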
> 

Liam


-- 
Liam Quin, https://www.holoweb.net/liam/cv/
Web slave for vintage clipart http://www.fromoldbooks.org/
Available for XML/Document/Information Architecture/
XSL/XQuery/Web/Text Processing/A11Y work/training/consulting.



Re: [basex-talk] Huge CSV

2018-08-12 Thread Liam R. E. Quin
On Sun, 2018-08-12 at 23:58 +0200, Giuseppe Celano wrote:
> more documents accessed sequentially is better than one
> big file.

Are you building indexes in the database? Do your queries make use of
them?

You may find using the full text extensions useful.

Liam


-- 
Liam Quin, https://www.holoweb.net/liam/cv/
Web slave for vintage clipart http://www.fromoldbooks.org/
Available for XML/Document/Information Architecture/
XSL/XQuery/Web/Text Processing/A11Y work & consulting.



Re: [basex-talk] Huge CSV

2018-08-10 Thread Liam R. E. Quin
On Fri, 2018-08-10 at 13:43 +0200, Giuseppe Celano wrote:
> I uploaded the file, as it is, in the database,

i'd probably look for an XSLT transformation to turn it into XML - or
there are python and perl scripts or other programs that can do it -
and then load the result into a database.

It's not all that large a file, so maybe it'd help if you described the
exact problems you were having -- what did you try, what did you expect
to happen, what actually happened, what steps did you take to
investigate...

Liam


-- 
Liam Quin, https://www.holoweb.net/liam/cv/
Web slave for vintage clipart http://www.fromoldbooks.org/
Available for XML/Document/Information Architecture/
XSL/XQuery/Web/Text Processing/A11Y work & consulting.



Re: [basex-talk] Different interpretation of regex in eXist, Saxon and BaseX

2018-08-07 Thread Liam R. E. Quin
On Tue, 2018-08-07 at 21:31 -0400, Bridger Dyson-Smith wrote:
>  isn't the '?' a reluctant quantifier - given two
> choices it
> will always match the shorter choice?

b? matches zero or one "b".

b* matches zero or more "b" using the longest match possible

b+ matches one or more "b" using the longest match possible

b*? matches zero or more "b" using the shortest match possible.

b+? matches one or more "b" using the shortest match possible.

See https://www.w3.org/TR/xpath-functions-31/#regex-syntax for examples
and more text.

? inside a character class matches a ? so that [#?] matches either "#"
or "?".
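
The same quantifiers tried out with Python's re module, used here only as a convenient stand-in: the XPath regex flavour differs from Python's in other details, but these quantifiers behave the same way. The test string is made up.

```python
import re

text = "abbbbc"

greedy_star = re.search(r"ab*", text).group()      # longest match: "abbbb"
reluctant_star = re.search(r"ab*?", text).group()  # shortest match: "a"
greedy_plus = re.search(r"ab+", text).group()      # longest match: "abbbb"
reluctant_plus = re.search(r"ab+?", text).group()  # shortest match: "ab"
optional = re.search(r"ab?", text).group()         # zero or one "b": "ab"

# Inside a character class, ? is just a literal question mark.
in_class = re.findall(r"[#?]", "a?b#c")            # ["?", "#"]

print(greedy_star, reluctant_star, greedy_plus, reluctant_plus, optional, in_class)
```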


> >   ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?


This can indeed match the empty string: adding spaces for clarity:

^  -- start of string
(([^:/?#]+):)? -- optional because of ?
(//([^/?#]*))? -- optional because of ?
([^?#]*)  can match the empty string because of *
(\?([^#]*))?  optional because of ?
(#(.*))?  optional because of ?

[no $ to match the end of the string included]
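
A quick check of that reading, again with Python's re as a stand-in for the XPath regex engine. The pattern is the one quoted above; the sample URI is made up.

```python
import re

URI_PARTS = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

# Every part is optional or may be empty, so the empty string matches.
empty = URI_PARTS.match("")

# For a typical URI the groups pick out scheme, authority, path, query, fragment.
parts = URI_PARTS.match("http://example.org/a/b?x=1#frag")

# Even something that does not look like a URI at all still matches,
# because the path group accepts almost anything.
odd = URI_PARTS.match("::: not a uri")

print(parts.group(2), parts.group(4), parts.group(5), parts.group(7), parts.group(9))
```

So the expression is useful for splitting a URI reference into parts, but a successful match says nothing about validity.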

It's actually hard to construct a string that isn't a valid URI
according to the specs, and harder still to determine this from reading
the specs.

In XQuery i'd just do something like xs:anyURI($string) and let the
XQuery engine work it out - use try/catch if necessary. It's rare that
it makes sense to be more restrictive than, say, fn:doc() or than Web
browsers.

Liam



-- 
Liam Quin, https://www.holoweb.net/liam/cv/
Web slave for vintage clipart http://www.fromoldbooks.org/
Available for XML/Document/Information Architecture/
XSL/XQuery/Web/Text Processing/A11Y work & consulting.



Re: [basex-talk] Failure with comment in editor when commenting out a regex

2018-08-01 Thread Liam R. E. Quin
On Wed, 2018-08-01 at 22:31 +0200, Andreas Mixich wrote:
> host :)
> 
> Editor/GUI warns, that: "no expression allowed in library module" and
> places error marker at the two slashes // after the :)( in the regex.

Well, it's right - since it's in a comment it's not in a regex, it's
just :) in a comment.

Liam


-- 
Liam Quin - web slave for https://www.fromoldbooks.org/
with fabulous vintage art and fascinating texts to read.



Re: [basex-talk] baseX vs ExistDB

2018-04-20 Thread Liam R. E. Quin
On Thu, 2018-04-19 at 16:26 +0100, Feargal Hogan wrote:
> > 
> From the comparison chart that Ben referenced earlier I noticed that
> baseX doesn’t seem to actually load xml files into an xml database,
> is that right?
No. Yes. Maybe.

baseX does load the documents into a database. It stores them in an
internal data structure, not as textual XML.

> It creates a queryable indexed representation of the files?
> Is that right?
Yes.

> And what happens when a file is edited/updated?
A file outside the database? Nothing.

Depending on database options, though, if you update a document in the
db, the index is updated.

> Does baseX need to be 'told' that it has been updated, in order to
> add the new data to its indices?
> Or does it know there has been an update and automatically reindex?

This isn't a meaningful question.

If you load a CSV file into a database such as Oracle, what happens if
the CSV file changes on disk outside Oracle? And why do you care? You
would normally edit the data at that point in the database using a SQL
application.

BaseX doesn't need to consult the external XML files once the database
is built (although yes, you _can_ keep files on disk and refer to them
if you want, but then you're somewhat fighting the system and will have
to go through some hoops to have super-fast queries).

As Christian and Dirk said, go give BaseX a try as many of your
questions will be answered in some number of nanoseconds :) In
particular, you can create a database from the GUI -- 12,000 files may
take a few seconds to index, depending on how large they are -- and run
queries directly.

One note on BaseX with documents - it has an option to delete whitespace
nodes on import, which, inappropriately for documents, is enabled by
default. You'll find it in the Options tab when you make a database
from the GUI, for example.

Liam

-- 
Liam Quin, W3C, http://www.w3.org/People/Quin/
Staff contact for Verifiable Claims WG, SVG WG, XQuery WG
Improving Web Advertising: https://www.w3.org/community/web-adv/
Personal: awesome vintage art: http://www.fromoldbooks.org/


Re: [basex-talk] baseX vs ExistDB

2018-04-18 Thread Liam R. E. Quin
On Wed, 2018-04-18 at 14:39 +0100, Feargal Hogan wrote:
> Hi
> 
> Is anyone aware of any comparisons between baseX and Exist?
> I have some familiarity with Exist and I’d like to understand what are
> the benefits of each.

I don't know of any recent ones that are in-depth, and both products
have changed - eXist especially has, i think, matured, but you'll be aware
of that end :) - so look carefully at the date on any you find.

What really matters is suitability to task, though, and that will
depend on what you're trying to do. And part of suitability to task is
the support network - are other people doing similar things using
eXist-db or BaseX?

Liam


-- 
Liam Quin, W3C, http://www.w3.org/People/Quin/
Staff contact for Verifiable Claims WG, SVG WG, XQuery WG
Improving Web Advertising: https://www.w3.org/community/web-adv/
Personal: awesome vintage art: http://www.fromoldbooks.org/


Re: [basex-talk] SSL support for BaseX REST API

2018-03-14 Thread Liam R. E. Quin
On Wed, 2018-03-14 at 14:18 -0500, Giavanna J Richards wrote:
> I'm trying to determine how to enable SSL communications with the
> BaseX server

I don't know if this helps, but I run BaseX listening only to
"localhost" so that SSL isn't an issue (as a connection to localhost
doesn't normally go over a network), and connect (on the same system)
from PHP or Perl (!) or you can proxy via apache.

If BaseX is running on a different computer, you could also proxy on
the system running BaseX e.g. with apache and .htaccess or the server
conf & mod_rewrite. That way you'd use SSL to get to apache and then an
in-memory connection from there to BaseX.

Liam

-- 
Liam Quin, W3C, http://www.w3.org/People/Quin/
Staff contact for Verifiable Claims WG, SVG WG, XQuery WG
Improving Web Advertising: https://www.w3.org/community/web-adv/
Personal: awesome vintage art: http://www.fromoldbooks.org/


Re: [basex-talk] Dynamic xpath

2018-02-02 Thread Liam R. E. Quin
On Fri, 2018-02-02 at 16:06 +0100, France Baril wrote:
> I'm trying to do something similar to this because I'll have to deal
> with xpaths provided by end users as parameters to a rest query:

I hope you have taken security issues into account, e.g. the ability to
access (and even write to) potentially any local file.

Having said that, eval() might do what you need,
http://docs.basex.org/wiki/XQuery_Module

Liam

-- 
Liam Quin, W3C, http://www.w3.org/People/Quin/
Staff contact for Verifiable Claims WG, SVG WG, XQuery WG

Web slave for http://www.fromoldbooks.org/


Re: [basex-talk] java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder

2017-09-18 Thread Liam R. E. Quin
On Mon, 2017-09-18 at 23:31 +, Kendall Shaw wrote:
> [...] maybe using the oracle JDK would work.

Thank you for such a quick answer - it wasn't what i did, but it helped
me find the problem.

It seems an OS upgrade had upgraded the Java runtime, and that class is
now neither provided nor called... but (and i should have mentioned
this) my BaseX server was still running... i was worried about
restarting the server in case it wouldn't work, until i resolved the
problem, but in fact, restarting the server resolved it.


> Kendall
> 
> On 9/18/17, 4:06 PM, "basex-talk-boun...@mailman.uni-konstanz.de
> on behalf of Liam R. E. Quin" <basex-talk-boun...@mailman.uni-konstan
> z.de on behalf of l...@w3.org> wrote:
> 
> It seems with the latest Java 1.8 -
> java-1.8.0-openjdk-headless-1.8.0.144-0.b01.el7_4.x86_64
> on Centos 7, I can no longer drop a database, any ideas?
> 
> This is with both 8.5.3 and 8.6.6, and also with
> the latest snapshot, BaseX867-20170824.195627.zip
> 
> [[
> $ bin/basexclient -p 1994
> Username: admin
> Password: 
> BaseX 8.6.6 [Client]
> Try 'help' to get more information.
> > open rdf
> Database 'rdf' was opened in 1.61 ms.
> > close
> Database 'rdf' was closed.
> > drop db rdf
> Improper use? Potential bug? Your feedback is welcome:
> Contact: basex-talk@mailman.uni-konstanz.de
> Version: BaseX 8.5.3
> Java: Oracle Corporation, 1.8.0_131
> OS: Linux, amd64
> Stack Trace: 
> java.lang.NoClassDefFoundError: Could not initialize class
> java.nio.file.FileSystems$DefaultFileSystemHolder
>   at
> java.nio.file.FileSystems.getDefault(FileSystems.java:176)
>   at java.nio.file.Paths.get(Paths.java:84)
>   at org.basex.io.IOFile.toPath(IOFile.java:335)
>   at org.basex.io.IOFile.delete(IOFile.java:243)
>   at org.basex.io.IOFile.delete(IOFile.java:240)
>   at org.basex.core.cmd.DropDB.drop(DropDB.java:77)
>   at org.basex.core.cmd.DropDB.run(DropDB.java:46)
>   at org.basex.core.Command.run(Command.java:253)
>   at org.basex.core.Command.execute(Command.java:99)
>   at
> org.basex.server.ClientListener.run(ClientListener.java:136)
> 
> > open rdf
> Database 'rdf' was opened in 1.56 ms.
> > xquery count(//image)
> 7370
> Query executed in 68.2 ms.
> ]]
> 
> 
> 
> -- 
> Liam Quin, W3C, http://www.w3.org/People/Quin/
> Staff contact for Verifiable Claims WG, XQuery WG
> 
> Web slave for http://www.fromoldbooks.org/
> 
> 
> 
> 
-- 
Liam Quin, W3C, http://www.w3.org/People/Quin/
Staff contact for Verifiable Claims WG, SVG WG, XQuery WG

Web slave for http://www.fromoldbooks.org/


[basex-talk] java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder

2017-09-18 Thread Liam R. E. Quin
It seems with the latest Java 1.8 -
java-1.8.0-openjdk-headless-1.8.0.144-0.b01.el7_4.x86_64
on Centos 7, I can no longer drop a database, any ideas?

This is with both 8.5.3 and 8.6.6, and also with
the latest snapshot, BaseX867-20170824.195627.zip

[[
$ bin/basexclient -p 1994
Username: admin
Password: 
BaseX 8.6.6 [Client]
Try 'help' to get more information.
> open rdf
Database 'rdf' was opened in 1.61 ms.
> close
Database 'rdf' was closed.
> drop db rdf
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.5.3
Java: Oracle Corporation, 1.8.0_131
OS: Linux, amd64
Stack Trace: 
java.lang.NoClassDefFoundError: Could not initialize class 
java.nio.file.FileSystems$DefaultFileSystemHolder
at java.nio.file.FileSystems.getDefault(FileSystems.java:176)
at java.nio.file.Paths.get(Paths.java:84)
at org.basex.io.IOFile.toPath(IOFile.java:335)
at org.basex.io.IOFile.delete(IOFile.java:243)
at org.basex.io.IOFile.delete(IOFile.java:240)
at org.basex.core.cmd.DropDB.drop(DropDB.java:77)
at org.basex.core.cmd.DropDB.run(DropDB.java:46)
at org.basex.core.Command.run(Command.java:253)
at org.basex.core.Command.execute(Command.java:99)
at org.basex.server.ClientListener.run(ClientListener.java:136)

> open rdf
Database 'rdf' was opened in 1.56 ms.
> xquery count(//image)
7370
Query executed in 68.2 ms.
]]



-- 
Liam Quin, W3C, http://www.w3.org/People/Quin/
Staff contact for Verifiable Claims WG, XQuery WG

Web slave for http://www.fromoldbooks.org/


Re: [basex-talk] Old dog trying to learn some new tricks.

2017-08-28 Thread Liam R. E. Quin
On Mon, 2017-08-28 at 16:02 -0500, Dave Day wrote:
> 
[...]
>  What I was hoping to do was to connect to a running BaseX, and
> send 
> the schema definitions that would be used to validate the XML.  In 
> reading doc, I see it is possible to create namespaces and use them,
> but 
> the format has to be either a URL or a URN.  Is it possible to create
> a 
> URN for a namespace on the fly, and then tell the code later on
> which 
> namespace to use?

I think really you are wanting to validate against a particular schema.
This is entirely unrelated to which namespace URI is used to identify
elements in the document. You can have multiple XML Schema Documents
(xsd) for validating XML documents whose elements are "in" the same
namespace.

A namespace in XML is nothing more or less than a URI used as a name.
They don't actually "do" anything :)

http://docs.basex.org/wiki/Validation_Module#XML_Schema_Validation
has an example that might help.
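
To make the "a namespace is just a URI used as a name" point concrete, here is a minimal Python sketch (not BaseX code). The URI urn:example:orders and the element name are made up for illustration; the only thing shown is that a parser treats the URI as part of the element's name and nothing more.

```python
import xml.etree.ElementTree as ET

# Two documents using the same (made-up) namespace URI, once with a prefix
# and once as the default namespace.
doc_a = ET.fromstring('<inv:order xmlns:inv="urn:example:orders"/>')
doc_b = ET.fromstring('<order xmlns="urn:example:orders"/>')

# ElementTree reports element names as {namespace-uri}local-name.
# The URI is never fetched and does not select any schema.
print(doc_a.tag)
print(doc_b.tag)
print(doc_a.tag == doc_b.tag)
```

Which schema to validate against is a separate decision that you make when you call the validator.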

Liam


-- 
Liam Quin, W3C, http://www.w3.org/People/Quin/
Staff contact for Verifiable Claims WG, XQuery WG

Web slave for http://www.fromoldbooks.org/


Re: [basex-talk] json-to-xml and xpath

2017-08-26 Thread Liam R. E. Quin
On Sat, 2017-08-26 at 20:35 +, Hans-Juergen Rennau wrote:
> Of course, Liam, but of course the BaseX format has its escaping
> rules, so that it works *always*. The names become less beautiful
> here and there, but in practise a couple of warped names among
> hundreds of nice and meaningful ones are better than meaningless
> names throughout.

It's subjective, I think. We did have some feedback on the design of
the XML/XDM representation for JSON.
[...]

>I hope the W3C will come up with a meaningful format
> definition, comparable to the BaseX one.
I don't see that happening - the XQuery and XSLT work is essentially
finished except for bug fixes at this point.

It might happen in the expath community group, if people show up to do
it - remember, W3C specs are essentially done by volunteers.

Liam


-- 
Liam Quin, W3C, http://www.w3.org/People/Quin/
Staff contact for Verifiable Claims WG, XQuery WG

Web slave for http://www.fromoldbooks.org/


Re: [basex-talk] json-to-xml and xpath

2017-08-26 Thread Liam R. E. Quin
On Sat, 2017-08-26 at 13:53 +, Hans-Juergen Rennau wrote:
>  I find the W3C-defined format obtained from fn:json-to-xml
> unnatural and unpractical;

It is, but it works in more cases. JSON keys can have values that
aren't possible XML element names, e.g.
{
"1"  : "one",
"2"  : "two",
"*"  : "lots"
}
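
A rough Python sketch of the problem, for anyone who wants to check their own data. The regular expression is a deliberately simplified, ASCII-only approximation of an XML element name (the real rule allows many more characters), and the "total" key is a hypothetical addition to the sample above.

```python
import json
import re

# Simplified, ASCII-only approximation of an XML element name:
# a letter or underscore, then letters, digits, '.', '-' or '_'.
XML_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")

def keys_usable_as_element_names(json_text):
    """Map each top-level JSON key to True if it could serve as an element name."""
    data = json.loads(json_text)
    return {key: bool(XML_NAME.match(key)) for key in data}

sample = '{"1": "one", "2": "two", "*": "lots", "total": "three"}'
print(keys_usable_as_element_names(sample))
```

Keys that fail the check either need escaping (a direct key-to-element mapping) or have to be carried as attribute values, which is what the fn:json-to-xml format does.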

-- 
Liam Quin, W3C, http://www.w3.org/People/Quin/
Staff contact for Verifiable Claims WG, XQuery WG

Web slave for http://www.fromoldbooks.org/


Re: [basex-talk] How could I reuse sub queries?

2017-08-07 Thread Liam R. E. Quin
On Tue, 2017-08-08 at 09:24 +0800, donaldjohn wrote:
>   is there a way that I can cache the sub query result and
> reuse it somewhere else? I think it will run faster in that way.

You may find it runs at the same speed - BaseX may have noticed the
common query.

But you can try using,

let $statuslist :=
  for $result in doc("50PatentDividedCreatingClause.xml")/results/result
  return db:open($result/dbName)/business:PatentDocumentAndRelated/@status
return (
  . . . do stuff with $statuslist

  . . . do more stuff with $statuslist
)

> 

Liam

-- 
Liam Quin, W3C, http://www.w3.org/People/Quin/
Staff contact for Verifiable Claims WG, XQuery WG

Web slave for http://www.fromoldbooks.org/


Re: [basex-talk] GUI

2017-08-05 Thread Liam R. E. Quin
On Sat, 2017-08-05 at 16:52 +0100, Thomas Daly wrote:
>  
> 
> a)  It is not possible to log into a BaseX database running on a
> remote
> server from a GUI running on a different machine?

You can probably also do it the other way round, setting DISPLAY and
running BaseX on the server system with its display pointing to your
system, e.g, using an ssh tunnel so you don't need to expose your
system to the raw Internet. See e.g.
https://askubuntu.com/questions/203173/run-application-on-local-machine-and-show-gui-on-remote-display
for using ssh -X or -Y to the server. You have to arrange for your X
server to listen for remote connections I think.


> b)  Do I need any particular Linux distribution / GUI version to
> run the BaseX Linux GUI?

No. I use Mageia Linux under Gnome 3, but as Christian said, you just
need Java.

Liam

-- 
Liam Quin, W3C, http://www.w3.org/People/Quin/
Staff contact for Verifiable Claims WG, XQuery WG

Web slave for http://www.fromoldbooks.org/


Re: [basex-talk] Somewhat unusual question

2017-02-25 Thread Liam R. E. Quin
On Sat, 2017-02-25 at 10:02 +, Kendall Shaw wrote:
> 
[...]
> The original post was asking for examples of ways that XQuery is a
> good solution for an unknown problem.

Unknown to us at least... yes.

>  Generally, if I found myself thinking that technology x is the solution
> to every problem, I would proceed as if I am probably wrong about
> that.

:-)
> 
> It’s interesting to me to know what sorts of applications seem like
> they would be a good match for XQuery’s data model but turned out not
> to be in some case.

I agree that's interesting - I'm afraid I have more experience with
what didn't work in SQL and did work with a forest store, probably
because I don't often encounter people whose projects didn't fit with
XML-related technologies. How do we find them?

I do know of people who moved to JSONiq in order to persist JSON
documents alongside XML ones, but XQuery 3.1 might well reduce that
pressure. Or at least that was a large part of our goal in the XML
Query Working Group in developing it, to improve the XML+JSON and JSON-
only support. But JSONiq still has the XQuery support there.

An example of where I probably would not use an XQuery-based system as
my first choice might be where I'm dealing with data coming in from
100,000 sensors at a high rate and I have to store, say, the past
hour's worth of data for simple queries. The tree-based collection
model works best with data that has hierarchy and is, in SQL terms,
semi-structured (i.e. has irregularity). XQuery tends to be a win over
SQL when a lot of the queries involve hierarchy or sequence, but if
those aspects aren't present it's less obviously a win.

Of course, if the sensors are sending EXI-compressed XML parse events,
and the db supports that, an XML db can be a win even there; I heard
from someone who experimented with devices that could send either JSON
or EXI, and he was blown away at the performance improvement he got by
switching to EXI. So you have to measure and experiment, too.

best,

Liam


-- 
Liam R. E. Quin <l...@w3.org>
The World Wide Web Consortium (W3C)


Re: [basex-talk] Somewhat unusual question

2017-02-24 Thread Liam R. E. Quin
On Fri, 2017-02-24 at 18:07 +, Kendall Shaw wrote:
> For example, a program that regulates flow of water in a garden
> sprinkler is probably not a good match for xquery and an xml
> database.
Funnily enough, sensors these days often report results using EXI, and
an embedded XQuery engine might be a fine way to go. People used to use
mxquery for such applications.
 
A lot of tooling choice comes down to warm fuzzies and familiarity.

If you're actually querying XML, then an XML database and XQuery do
make an obvious choice. I find it can help to say it's a "fast forest
store" with XML as an interchange and loading format, to try & prevent
the misunderstanding that the software reads the entire database as
pointy-bracket XML with each query and is glacially slow. I wish we'd
used a forest metaphor for naming XQuery...

Liam



Re: [basex-talk] xquery result limit

2016-11-11 Thread Liam R. E. Quin
On Fri, 2016-11-11 at 19:01 +0200, George Sofianos wrote:
> So If I get it right, when I use [position() = 1 to 100], only the
> first 100 results are calculated? or all 900.000 rows are calculated,
> and I get the first 100 results? (imagine it is a complex query)

Note that an order by clause would force everything to be created &
sorted in any case.
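
As an analogy only (this is Python, not a description of how BaseX evaluates anything), the sketch below counts how many rows get computed when taking the first 100 items of a lazy sequence, with and without sorting first. The row generator and the row count are invented for the example.

```python
from itertools import islice

def expensive_rows(n, counter):
    """Yield n rows lazily, counting how many were actually computed."""
    for i in range(n):
        counter["computed"] += 1
        yield (n - i, f"row-{i}")

def first_100_unsorted(n):
    """Take the first 100 rows as they come; later rows are never computed."""
    counter = {"computed": 0}
    rows = list(islice(expensive_rows(n, counter), 100))
    return len(rows), counter["computed"]

def first_100_sorted(n):
    """Sort first: every row must exist before the first one can be returned."""
    counter = {"computed": 0}
    rows = sorted(expensive_rows(n, counter))[:100]
    return len(rows), counter["computed"]

print(first_100_unsorted(9000))
print(first_100_sorted(9000))
```

The second number in each pair is the work done; sorting makes it equal to the full row count regardless of how few results are kept.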

Liam



Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-11 Thread Liam R. E. Quin
Hans-Jürgen wrote:
> [...]! Already the first two characters
>     (?
> render the expression invalid:
> (1) An unescaped ? is an occurrence indicator, making the preceding
>     entity optional
> (2) An unescaped ( is used for grouping, it does not represent anything
> => there is no entity preceding the ? which the ? could make optional
> => error

Actually (?:  ) is a non-capturing group, defined in XPath 3.0 and
XQuery 3.0, based on the same syntax in other languages.

This extension, like a number of others, is useful because the
expression syntax defined by XSD doesn't make use of capturing groups
(there's no \1 or $1 or whatever), and so it doesn't need non-capturing 
groups, but in XPath and XQuery they are used.

See e.g. https://www.w3.org/TR/xpath-functions-30/#regex-syntax
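
The same (?: ) syntax exists in Python's re module, so a small Python sketch can show the difference between capturing and non-capturing groups. Python's regex dialect is not identical to the XPath one; only the grouping behaviour is being illustrated, and the date-like sample string is made up.

```python
import re

# Capturing groups: each parenthesised part is numbered and retrievable.
capturing = re.match(r"(\d{4})-(\d{2})", "2016-09")

# Non-capturing group: (?: ... ) still groups, but is not numbered,
# so the month becomes group 1.
non_capturing = re.match(r"(?:\d{4})-(\d{2})", "2016-09")

print(capturing.groups())
print(non_capturing.groups())
```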

Liam


-- 
Liam R. E. Quin <l...@w3.org>
The World Wide Web Consortium (W3C)


Re: [basex-talk] Improving performance in a 40GB database

2016-07-06 Thread Liam R. E. Quin
On Wed, 2016-07-06 at 16:02 +0100, James Sears wrote:
> 
> let $void := prof:void(for $book in //book

Are you able to make that //book more specific? or can book elements
occur at any level? BaseX is possibly fetching every element node in
the database to see if it's an element of type book or not.

or can you look at the generated query plan in more detail?

Liam
      
-- 
Liam R. E. Quin <l...@w3.org>
The World Wide Web Consortium (W3C)


Re: [basex-talk] Export database to SQL?

2016-03-02 Thread Liam R. E. Quin
On Wed, 2016-03-02 at 16:24 +, Bhander, Gurbakhash S. wrote:
> Hi,
> 
> Is there any way to export data from the database to other databases?

There are several ways. But note that there's not a 1:1 mapping between
relational data and XML, so you first have to choose how you will
represent your information in the relational database.

After that, if it's simple unstructured values (e.g. no mixed content)
you might write out CSV as text using XQuery, or use the BaseX SQL
Module [1] to connect directly to the other database if it supports
JDBC.

Liam

[1] http://docs.basex.org/wiki/SQL_Module


-- 
Liam R. E. Quin <l...@w3.org>
The World Wide Web Consortium (W3C)



Re: [basex-talk] XProc 2 ...

2016-02-07 Thread Liam R. E. Quin
On Fri, 2016-02-05 at 09:46 +0100, Marco Lettere wrote:
> Hi all,
> sorry for the small OT but this sounds really interesting:
> 
> http://lists.xml.org/archives/xml-dev/201602/msg1.html

I hope not too off-topic; maybe I should have posted about it here too.

Thanks!

Liam

-- 
Liam R. E. Quin <l...@w3.org>
The World Wide Web Consortium (W3C)



Re: [basex-talk] Returning text matches

2015-12-10 Thread Liam R. E. Quin
On Thu, 2015-12-10 at 17:24 +0100, Christian Grün wrote:
> Hi Ron,
> 
> You can use ft:mark and ft:extract to highlights matches in a
> full-text result [1].

And what happens if a full text match crosses an element boundary, e.g.
a search for "blue socks" matching,
He wore dark blue socks that day.
could not return,
He wore dark blue socks that day.

(Yes, I should test it, sorry! but the docs should probably mention it.
It was a big part of the XPath/XQuery Full Text design early on)

Liam

-- 
Liam R. E. Quin <l...@w3.org>
The World Wide Web Consortium (W3C)



Re: [basex-talk] Optimizing Element Access By Attribute Value Matching

2015-04-13 Thread Liam R. E. Quin
On Mon, 2015-04-13 at 12:38 -0500, Eliot Kimber wrote:

> For large repositories an
> XQuery like
> //*[contains(@class, ' topic/topic ')] is going to be quite slow

I took this use case to the XQuery and XSLT Working Groups a year or two 
ago (Jirka added the DITA case - I was thinking of (X)HTML) and the 
result was contains-token() which might be easier for the database to 
optimize.

Judging by comments submitted against my awful tests for it :) I think 
BaseX may well support it already.
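
A small Python sketch of why a token test is also more robust than the padded substring test, quite apart from speed. The two helper functions are hypothetical stand-ins that only mimic the idea of contains() and contains-token(); they are not the XPath functions themselves, and the class values are invented.

```python
def contains_substring(class_value, token):
    """Mimic contains(@class, ' token '): a raw substring test with padding."""
    return f" {token} " in class_value

def contains_token(class_value, token):
    """Mimic a token test: split on whitespace and compare whole tokens."""
    return token in class_value.split()

padded = "- topic/topic concept/concept "
unpadded = "- topic/topic concept/concept"

print(contains_substring(padded, "concept/concept"))
print(contains_substring(unpadded, "concept/concept"))
print(contains_token(unpadded, "concept/concept"))
print(contains_token("footnote", "note"))
```

The substring test depends on the trailing space being present in the attribute value; the token test does not, and it still refuses partial matches such as "note" inside "footnote".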

Liam



Re: [basex-talk] xpointer

2015-04-08 Thread Liam R. E. Quin
On Wed, 2015-04-08 at 10:37 +0200, Jérôme Chauveau wrote:

> Unfortunately, the id notation fails...
>
> <xi:include href="plop.xml" xpointer="element(myId)"/>

Are you validating against a DTD? If not, the ID-ness property won't 
be set...
It might work if the document uses xml:id without DTD validation.

Liam




Re: [basex-talk] Apply variable argument list to anonymous function

2014-08-14 Thread Liam R E Quin
On Thu, 14 Aug 2014 13:42:24 +0200
Marc van Grootel marc.van.groo...@gmail.com wrote:

> I will try to come up with a simple but representative example this evening.

Note, if you are proposing a change to the XQuery language itself, the XQuery 
WG and the XSLT WG are aware of limitations in our syntax and the fact you 
can't call a function with a signature like fn:concat() using a function item; 
a concrete proposal would for sure be discussed by the Working Groups when they 
next meet jointly in September, although it is almost certainly too late for 
XQuery 3.1. The way to get it considered is to file a Bugzilla issue/bug; see 
www.w3.org/XML/Query for pointers to doing that.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
W3C staff participant for XQuery
Pictures from old books: http://fromoldbooks.org/


Re: [basex-talk] Confusing error using GROUP BY

2014-03-22 Thread Liam R E Quin
On Sat, 2014-03-22 at 20:44 -0500, Charles Duffy wrote:

I'd guess:

> let $speaker := $line/../SPEAKER/text()[1]

A list of the first text node child of each SPEAKER element?

Try ($line/../SPEAKER/text())[1] if you want the first text node in the
list.

Maybe you have a SPEAKER element with comments, processing instructions
or other elements inside, or maybe the container of $line has more than
one SPEAKER element sometimes.

Yes,
<SPEECH>
<SPEAKER>CORNELIUS</SPEAKER>
<SPEAKER>VOLTIMAND</SPEAKER>
<LINE>In that and all things will we show our duty.</LINE>
</SPEECH>
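
For readers who want to see the two readings side by side, here is a Python sketch using the speech above. ElementTree has no text() step, so the list comprehension stands in for SPEAKER/text()[1] (one item per speaker) and indexing the whole list stands in for (SPEAKER/text())[1]; it is an analogy, not XPath.

```python
import xml.etree.ElementTree as ET

speech = ET.fromstring(
    "<SPEECH>"
    "<SPEAKER>CORNELIUS</SPEAKER>"
    "<SPEAKER>VOLTIMAND</SPEAKER>"
    "<LINE>In that and all things will we show our duty.</LINE>"
    "</SPEECH>"
)

# Like SPEAKER/text()[1]: the first text of EACH speaker, so two items.
per_speaker = [speaker.text for speaker in speech.findall("SPEAKER")]

# Like (SPEAKER/text())[1]: the first item of the whole sequence, so one item.
first_overall = per_speaker[0]

print(per_speaker)
print(first_overall)
```

A grouping key that is a sequence of two items rather than a single value is a plausible source of the confusing GROUP BY error.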



-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml

___
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk


Re: [basex-talk] BaseX for noobs: C++ API ?

2014-01-06 Thread Liam R E Quin
On Mon, 2014-01-06 at 21:15 +0100, jean-marc Mercier wrote:
> Hello,
> 
> I still can't connect to BaseX. I checked that I sent to the server exactly
> the same bytes than the C# connector that worked on my configuration.
> 
> Is there any way to make the BaseX server echoing any input request on a
> particular socket to check exactly what my TCP connector is sending ?

I do have an old C program I wrote years ago that saves a copy of data
sent on a socket into a file; I wrote it for Unix but maybe it'd work
for you. There are some packages for Linux that do something similar
too.

Make sure that you check the return value of every system call you use -
writing to a socket does not necessarily succeed. Make sure also that
your code connects to the right IP address and port - e.g. try
connecting to something like a Web server where you can see the logs,
and sending GET / HTTP/1.0\r\nAccept: */*\r\n\r\n at it. Forgetting to
convert addresses and port numbers to network order is sometimes an
issue.
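
The two checks above, sketched in Python rather than C++ so that it runs anywhere. It uses a local socket pair instead of a real server, send_all and port_bytes are hypothetical helper names, and port 1984 is only an example value. The point is the pattern: look at what every send() returns, and be explicit about byte order.

```python
import socket

def send_all(sock, data):
    """Send every byte of data, checking the result of each send() call."""
    total = 0
    while total < len(data):
        sent = sock.send(data[total:])  # may accept fewer bytes than offered
        if sent == 0:
            raise ConnectionError("socket connection broken")
        total += sent
    return total

def port_bytes(port):
    """Return a port number as two bytes in network (big-endian) order."""
    return port.to_bytes(2, "big")

# A connected pair of local sockets stands in for client and server.
client, server = socket.socketpair()
request = b"GET / HTTP/1.0\r\nAccept: */*\r\n\r\n"
print(send_all(client, request) == len(request))
print(server.recv(1024) == request)
print(port_bytes(1984))
client.close()
server.close()
```

In C or C++ the equivalent is checking the return value of write() or send() in a loop and calling htons() on the port before filling in the address structure.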

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] insert cause db error, often.

2014-01-02 Thread Liam R E Quin
On Fri, 2014-01-03 at 14:47 +0800, easy wrote:
  I lost faith almost.   do insert doc continuously ,query often cause :

[...]

How are you starting BaseX exactly?

How large is your database and how much memory are you giving the Java
Virtual machine? Try giving it a lot more memory, to see if it's an
out-of-memory error or a bug...

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] thesaurus

2013-12-25 Thread Liam R E Quin
On Tue, 2013-12-24 at 21:52 +0200, Xavier-Laurent SALVADOR wrote:
 Hi,
 waiting for Christmas, i was playing with Basex.
 I had no problem for using a short Thesaurus i built a few days ago.
 But when i tried to use the extended one (26Mo), i get this error message.

I see Christian has already replied, but note too that you might need to
give BaseX (or the JVM, rather) more memory.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] Serializing unescaped xml to string

2013-12-04 Thread Liam R E Quin
On Wed, 2013-12-04 at 19:59 -0800, Joe Templeman wrote:
 Hi all,
 
 Is there a way to output unescaped XML to a string for debugging purposes?
 In unittests we would like to output the XML returned when a test fails,
 here is my example:
[...]
 ||   Got:  || fn:serialize($result))
   return $u
 };
 
 But when a test fails, I get escaped XML as the output. Is there any way to
 get  this as actual XML?

Return
  ("Input: ", $case/input/text(),
   "Expected: ", $case/output,
   "Got: ", $result)
rather than joining the parts with ||, i.e. return them as a sequence...

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] CDATA in xquery

2013-12-02 Thread Liam R E Quin
On Mon, 2013-12-02 at 15:29 -0500, Erol Akarsu wrote:
> I would like to generate CDATA of one xml construct like this.
> [...]
> let $allfs := <record>
>   <name>{$name}</name>
>   <features>
> <![CDATA[
> <h3>Features:</h3>
> <br/>
> <ul>
>   <li>{$f1}</li>
>   <li>{$f2}</li>
>   <li>{$f3}</li>
> </ul>
> ]]>
>   </features>
>   </record>

Watch out that if $f1 (say) contains the string ]]> you'd be liable to a
cdata injection attack.

Having said that, no, I'd probably write an e() function,
declare function my:e($name as xs:string, $content as xs:string)
as xs:string
{
  concat("<", $name, ">", $content, "</", $name, ">")
};

and use my:e("ul",
  concat(my:e("li", $f1),
         my:e("li", $f2),
         my:e("li", $f3)))
and so on. Which gives you slightly more checking.

The implementation will probably generate &lt; and &gt; rather than
CDATA; XML says they're equivalent.

In XQuery 3 there's a per-element serialization option for CDATA
sections,
http://www.w3.org/TR/xslt-xquery-serialization-30/#XML_CDATA-SECTION-ELEMENTS
so if BaseX implements that you have a way to get closer to what you
want, perhaps, e.g. for generating RSS. However, you'd still need to
construct a string containing <h1> etc.
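
If you do end up writing CDATA sections by hand (in any language), the usual defence against the injection problem mentioned above is to split every "]]>" in the content across two sections. A minimal Python sketch, where cdata() is a hypothetical helper and the payload is an invented hostile value:

```python
import xml.etree.ElementTree as ET

def cdata(text):
    """Wrap text in CDATA, splitting any ']]>' so it cannot end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# A value that tries to close the CDATA section and smuggle in markup.
payload = "<li>safe</li>]]><script>alert(1)</script>"
doc = "<features>" + cdata(payload) + "</features>"

parsed = ET.fromstring(doc)
print(parsed.text == payload)  # the whole payload survives as plain text
print(len(parsed))             # number of child elements created by the payload
```

Letting the serializer do the escaping, as with the cdata-section-elements option, avoids the problem altogether.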

Liam


-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] Referential Queries

2013-05-14 Thread Liam R E Quin
[I think this thread is getting further away from BaseX, and might
belong on query-talk instead, but on the other hand the use of XQuery as
a back-end for Web Apps is definitely on the increase]

On Tue, 2013-05-14 at 11:14 -0600, James Wright wrote:
> Hello Again,
> If this is the wrong forum for these type of questions let me know. By
> the way Liam I picked up your book last night, I like the flavor as it
> differs from my other reads such as those from Kay. Although I have
> been using XML for years and understand the core concepts it should be
> a great refresher.

Thanks, I wrote the boring chapters :-)

> The Organizational Overall Problem:
>  There aren't many people in my industry that use XQuery and xml in
> the way it was intended (IMHO). In fact most developers in my
> organization are rather uneducated in it and as you know there is some
> un-rational backlash as many correlate XML to the DOM and XPath/XSLT
> 1.0 and as a competitor to JSON which is ludicrous.

You're right, it's crazy and unfortunate.

XML was originally designed as an interoperable way to put SGML
technical documentation on the Web in Netscape plugins!

> The DOM has its issues of scale-ability which our products are
> currently running into. This isn't really xml's or the DOM's problem
> but simply poor implementation. As you know though, all that matters
> is perception.

If it helps, XQuery, XPath 2 and later, and XSLT 2 and later are not
DOM-based, but have an abstract data model, and are designed with
performance very much in mind.
 [...]


> we are stuck with a .NET XSLT 1.0 processor.

There are at least two .NET-based XSLT 2 processors, and another in
development. But I think that's maybe off-topic for this list ;)

> We have two primary use cases:
> 1) as a local db to replace the context DOM for our 'documents' which
> in our case relates to Utilitiy GIS Designs of circuit, subdivisions,
> fiber, etc... I am thinking BaseX coupled with RestXQ could replace
> our DOM for local installs and allow ourselves to decouple from the
> Geodatabase and provide a browser based UI.

Yes, that will likely make sense.

> 2) as a service for hosting uploading and allowing users/delivery and
> support to view, query and modify complex sets of interrelated XML
> configuration files. Some of our applications have hundreds. Again all
> these documents follow a similar semantics however their is no defined
> schema for any of them.

You might want to look at W3C SML as a way of orchestrating validation
for configuration management.

> I think we can accomplish both of the above tasks using a single
> codebase and restXQ

It's likely, although obviously you'll want separate database instances.
Note also that there are size/performance issues with BaseX today if you
have a lot of data - "a lot" is subjective but if it's multiple
terabytes you'll probably need multiple database instances. The good
news is that it's relatively easy to move to different XQuery engines if
needed, and also that BaseX keeps improving so you might well not need
to move :-) I do know of people with petabyte XQuery databases.

> I have written an XQuery expression which using our 'common' xml
> semantics can ascertain entities/properties/relationships and distill
> this in the form of metadata which then using RestXQ is distilled into
> a metadata driven api for manipulated data centric xml documents.

This pattern is rather like creating a persistent view in SQL.

 [...]

> The XML/BaseX Question: We need to be able to query effectively across
> relationships. Are there any facilities in XML/XQuery/3rd party that
> do this? I was hoping Id and IDref could accomplish this but as you
> stated that  is not the case

ID and IDREF support is almost certainly irrelevant here.

Given $doc1 with <student sn="3016"><name>Simon</name></student> and
$doc2 with <course><enrolled>3016</enrolled></course>
you can easily do
  for $student in $doc1//student
  return $doc2//course[enrolled = $student/@sn]
to get a list of courses with students from $doc1.
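
The same join by value, sketched in Python with the standard library so it can be tried without a database. The wrapper elements, the code attribute and the second course are invented for the example; only the student number 3016 and the element names come from the sample above.

```python
import xml.etree.ElementTree as ET

doc1 = ET.fromstring(
    '<students><student sn="3016"><name>Simon</name></student></students>'
)
doc2 = ET.fromstring(
    '<courses>'
    '<course code="XQ101"><enrolled>3016</enrolled></course>'
    '<course code="SQL1"><enrolled>4000</enrolled></course>'
    '</courses>'
)

def courses_for_students(students, courses):
    """Return the courses whose enrolled values match a student's sn attribute."""
    numbers = {student.get("sn") for student in students.iter("student")}
    return [
        course for course in courses.iter("course")
        if any(enrolled.text in numbers for enrolled in course.findall("enrolled"))
    ]

print([course.get("code") for course in courses_for_students(doc1, doc2)])
```

No ID or IDREF declarations are involved; the match is purely on the values.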

> Basically I won't know before hand whether a child node is inline or a
> reference but would like to be able to query both as those they were
> the same..
> If I had two nodes (I will call them parents) both with the same
> child. One has the actual inline child node and the other has just a
> reference to it. Lets say the child's attribute name is 'Tom'. I would
> like to be able to query like this and return both parents nodes:
> //parents[child/@name = 'Tom']

Write a function to do it.
declare function local:get-children($input as element(*)) as element(*)*
{
  for $child in $input/*
  return
    if (local:is-reference($child))
    then local:get-reference($child)
    else $child
};

and maybe
declare function local:get-reference($input as element(*))
  as element(*)*
{
  /documents[@id eq $input/@doc]//*[@id eq $input/ref]
};
(local:is-reference() is left for you to write.)

> Anything? If not, is this a weird use case? I wouldn't think so.

I haven't encountered it, but RDF people often want something similar.

After this 

Re: [basex-talk] Question regarding BaseX support for id and idref

2013-05-13 Thread Liam R E Quin
On Mon, 2013-05-13 at 19:15 -0600, James Wright wrote:
 [...] I know the xml standard defines id and idref however I have not
 been able to find any documentation on these in BaseX or XML in
 general. For example does BaseX handle idiosyncrasies of id and idref
 or must I handle these in my queries? For example:
 let $context := item id=10

Note that an XML ID must be an identifier, so must start with a name
start character (a letter)...

 let $context := item id=10
  child idref=2 /itemitem id=20  child idref=2 /  
 child idref=1 //itemitem id=30 /
 child id=1 name=firstChild /child id=2 name=secondChild /
 for $itemWithFirstChild in $context/item[child/@name = 'firstChild']return 
 $itemWithFirstChild/@id

I think this got garbled somewhere - e.g. there aren't enough end tags.
Make sure you turn off HTML formatting in your email program.

> I would like this to return 10 20. Now this example uses a dynamic
> context node however in the application these would exist as nodes in
> the database...
> If it does fully support this how do I enable it or get it to work?

One way to get started with BaseX would be to use the basexgui program
in the bin directory to create a database (there's also a command-line
program to do it; I use the Perl and PHP APIs too on
www.fromoldbooks.org).

People usually do joins by value in XML and XQuery; ID/IDREF only work
with older DTD technology, or with xml:id; I don't know if BaseX has
support for them. But if it does the values must be IDs :)

So you're on the right lines.

> Also what about XLink?

It's not used very much. As a standard it solved the wrong problems,
unfortunately. Easy for me to say in hindsight.

The XQuery Use Cases on www.w3.org/TR may be helpful; there's also a
chapter on XQuery using BaseX for examples in a book I co-authored last
year, Beginning XML 5th edition.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml

___
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk


[basex-talk] full text with search term in a variable gives no matches

2013-05-12 Thread Liam R E Quin
The following query gives me no results:

for $city as xs:string in ("Paris", "Cambridge", "London", "Oxford")
return (
  $city,
  count(/dictionary/letter/entry[.//p//text() contains text {$city}]),
  "&#xa;"
)

However, BaseX rewrites it to

for $city as xs:string in ("Paris", "Cambridge", "London", "Oxford", ...)
return ($city, fn:count(db:fulltext("with-sources", { $city
})/ancestor::*:p/ancestor::*:entry), "&#xa;"
)

If I remove the braces around $city in the argument to db:fulltext() I
get the right answers (or at least plausible answers).

So this looks like a bug in the rewriting engine.

Liam


-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] whitespace around comments

2013-04-13 Thread Liam R E Quin
On Fri, 2013-04-05 at 11:31 +0200, Dirk Kirsten wrote:

> So if you could point out some details as why this is not conforming
> behaviour, this would be interesting.

It's a requirement in the XML Spec that the XML parser pass all
whitespace back to the application. Some whitespace may be marked as not
significant - that is only possible if there's a DTD and the space is in
a context where only elements would be valid, not #PCDATA. There's no
formal specification, but constructing an XDM instance from an
infoset, and constructing an infoset from XML, do not entail
discarding these spaces.

Chopping internal whitespace nodes in mixed content contexts is not
sanctioned by any version of any XML specification, with any setting of
xml:space. I think the onus would be on you to justify the non-standard
behaviour.
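
To make the semantic change concrete, here is a small Python sketch using only the standard library. The `chop` function is a hypothetical stand-in that drops whitespace-only text between elements; it is not BaseX's actual implementation, and the sample fragment is invented.

```python
import xml.etree.ElementTree as ET

# Mixed content: the single space between the two <n> elements is data.
doc = "<p>torque <n>12</n> <n>34</n> Nm</p>"

def text_of(elem):
    """String value of an element: all descendant text, in document order."""
    return "".join(elem.itertext())

def chop(elem):
    """Drop text and tail strings that consist only of whitespace (lossy)."""
    for e in elem.iter():
        if e.text is not None and not e.text.strip():
            e.text = None
        if e.tail is not None and not e.tail.strip():
            e.tail = None
    return elem

kept = ET.fromstring(doc)
chopped = chop(ET.fromstring(doc))
print(text_of(kept))     # torque 12 34 Nm
print(text_of(chopped))  # torque 1234 Nm
```

Two separate numbers become one once the whitespace-only node is discarded.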

On the other hand I can see its uses too. But I don't want it, and
always turn it off with BaseX :-)

Best,

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] whitespace around comments

2013-04-12 Thread Liam R E Quin
On Fri, 2013-04-05 at 11:15 +0200, Michael Piotrowski wrote:
> On 2013-04-05, Michael Seiferle m...@basex.org wrote:
> chopping certainly *does* change the
> semantics--that's precisely why I've argued before that it shouldn't be
> on by default.

Agreed, but Christian has already said it will be off by default in the
next release.

I have seen a commercial SGML formatter that had a similar behaviour
used for aircraft manuals, where there was actually a possibility of
lives lost and unlimited civil damage liability as a result of numbers
run together, but I failed to get the people in charge to understand why
it made a difference.

> (and
> BaseX doesn't honor xml:space either).

The latest snapshot does.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] Config file and data dir (with BaseXServer in Java)

2013-03-25 Thread Liam R E Quin
On Mon, 2013-03-25 at 10:18 +0100, Christian Grün wrote:
> [...] or place a .basex file in the
> directory you are starting BaseX from (i.e., in the “current working
> directory”).

Use extreme caution if you do this. Ideally the two files - ~/.basex
and .basex - would be merged, so that both are used.

The caution is that if you upgrade to a newer BaseX version, presumably
in a new directory, your password and port will get reset to defaults,
opening up your site for remote access! Without a configuration, the
secure default would obviously be for BaseX to listen only on localhost
and no other network interface, or not to listen on any port at all,
but I don't think it is shipped that way.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] How to update db online without closing application ?

2013-03-15 Thread Liam R E Quin
On Fri, 2013-03-15 at 11:12 +0530, Pratik Kharwade wrote:
[...]
> While updating, our application doesn't respond till the updates are
> done.
> The user can't interact with the application.

Some initial thoughts:

The update should be done by a different process or thread from the
user interface code.

Update one document at a time: load a complete document to a temporary
local file before telling BaseX to import it.

An alternative is to work with two databases, updating the non-live one
and then switching over.
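
A minimal Python sketch of the switch-over idea, run from a background thread so the user interface stays responsive. `LiveSwitch` and the dictionary standing in for a database are hypothetical names for illustration only; nothing here is BaseX API.

```python
import threading

class LiveSwitch:
    """Readers always see a complete snapshot; updates are built on the side."""

    def __init__(self, initial):
        self._live = initial
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._live

    def update(self, build):
        # build() may be slow; it runs without holding the lock,
        # so readers keep using the old snapshot in the meantime.
        staged = build(self.read())
        with self._lock:
            self._live = staged  # the switch-over itself is instant

db = LiveSwitch({"doc1": "old"})

# Run the update off the UI thread.
worker = threading.Thread(
    target=db.update,
    args=(lambda current: {**current, "doc2": "new"},),
)
worker.start()
worker.join()
print(db.read())
```

With a real database the staged copy would be the non-live database, and the switch would be whatever your application uses to decide which database name to open.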

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] re-sort database

2013-03-13 Thread Liam R E Quin
On Wed, 2013-03-13 at 22:29 +0100, Christian Grün wrote:
> Hi Cerstin,
> [...]
>
> You could try to export your data and create a new
> database without updatable index structures; this could also speed up
> your updates. Maybe it even allows you to update all nodes in a single
> run.
>
>> I already set VM=-Xmx1024m and I use BaseX 7.6.1 Beta from February 14 on a
>> MacBook Air with a 2 GHz processor and 8 GB RAM.

I'd try using VM=-Xmx6000m if you have 8G of RAM.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] Debugger for Xuery

2013-03-08 Thread Liam R E Quin
On Wed, 2013-03-06 at 14:27 -0800, bhupesh patel wrote:
> Hello All,
>
> I am looking for any free/paid version of debugger for XQuery while
> using BaseX server. Can anyone point me to any resources you guys know
> of?

There's some XQuery support in Oxygen, I'm told, although I haven't
tried it; it uses Saxon rather than BaseX, though. I think StylusStudio
may have something too.

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] RestXQ parameter matching

2013-02-27 Thread Liam R E Quin
On Wed, 2013-02-27 at 10:58 -0500, Wendell Piez wrote:

> Because it matches on substrings not path segments, Apache Cocoon can
> do this and indeed it's very powerful.

This is an area where I once tried to push for some standardization, and
I still think defining a portable mechanism to map between URI-space and
queries (or XProc pipelines) would be a step forward.

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] Address already in use

2013-02-26 Thread Liam R E Quin
On Tue, 2013-02-26 at 18:13 +0100, Ludovic Kuty wrote:
> Did you try a netstat -nltp as root under Linux to see if something
> is listening on the port ?

or (as root) lsof -i TCP:8984
which will list the process, if any.
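
If neither tool is to hand, a quick check from Python's standard library can at least say whether anything accepts connections on the port. Unlike lsof it does not say which process it is. The function name is made up for this sketch, and 8984 is simply the port from the lsof line above.

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return s.connect_ex((host, port)) == 0

print(port_in_use(8984))
```

Passing a different `host` is a cheap way to check that you are contacting the machine you think you are.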

Also make sure that you're contacting the right host :) BaseX has a
habit of moving the configuration files around between releases, so
sometimes the client isn't using the same conf file as the server.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] Not enough space

2013-02-22 Thread Liam R E Quin
On Fri, 2013-02-22 at 11:19 +0100, Christian Grün wrote:
> ..done (I went for "Not enough pixels", as the text won’t be
> completely readable if the visible area is indeed too small).

Yes, that's a big improvement, thank you! :-)

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] Not enough space

2013-02-19 Thread Liam R E Quin
On Tue, 2013-02-19 at 19:55 +0100, Christian Grün wrote:
> Hi Shannon,
>
>> Hi, whenever I choose to show the tree visualization, after a very long
>> delay, "Not enough space" is displayed in that space--it's a small database
>
> the message simply indicates that there are too few pixels in your
> tree view to visualize all nodes of your document.

Suggest "Not enough room in window to display entire tree"

or,

(Tree view does not fit in window)

or... scroll bars or panning :-)

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] problem keeping basexserver as an alive service

2013-02-17 Thread Liam R E Quin
On Mon, 2013-02-18 at 00:52 +, Richard Alexander Castro Mamani
wrote:
> Hello,
>
> I am having problems keeping the basexserver service alive,

[...]

> richard@0113:~$ basexserver
> BaseX 7.0.2 [Server]
> Server was started.
>
> Could you tell me what I am doing wrong?.

What you are doing wrong is not telling us the error message or
circumstance that causes it to stop running.

At the very least, run it in the background,
richard@0113:~$ basexserver > bx.log 2>&1 &
and then give us the content of the log file.

If you run the program the way you are doing it, and you get
disconnected from the server (e.g. you log out) then yes, the server
will stop.

Liam


-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] Performance with fulltext searches

2013-01-23 Thread Liam R E Quin

> count(/log/logentry[
>   ./paths/path/text() contains text "trunk"
>   and not(
>     (./paths/path/text() contains text "tags") or
>     (./paths/path/text() contains text "branches")
>   )
> ])

what about

let $trunk := /log/logentry[
      paths/path/text() contains text "trunk"],
    $tags := /log/logentry[
      paths/path/text() contains text "tags"],
    $branches := /log/logentry[
      paths/path/text() contains text "branches"]

return $trunk except ($tags|$branches)
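
The set form selects the same entries as the and/not predicate; a small Python sketch with invented log data shows the equivalence. Splitting paths on "/" is only a rough stand-in for full-text tokenisation, and the revision numbers and paths are made up.

```python
# Hypothetical log: revision number -> changed paths.
entries = {
    1: ["/trunk/src/a.c"],
    2: ["/trunk/src/a.c", "/tags/1.0/a.c"],
    3: ["/branches/dev/a.c"],
    4: ["/trunk/doc/readme", "/branches/dev/readme"],
    5: ["/trunk/b.c", "/trunk/c.c"],
}

def has_word(paths, word):
    """True if any path contains the word as a whole segment."""
    return any(word in p.split("/") for p in paths)

def matching(word):
    """Set of revisions with at least one path containing the word."""
    return {rev for rev, paths in entries.items() if has_word(paths, word)}

# Three independent lookups combined with set operations ...
by_sets = matching("trunk") - (matching("tags") | matching("branches"))

# ... versus one pass with a compound predicate.
by_predicate = {
    rev
    for rev, paths in entries.items()
    if has_word(paths, "trunk")
    and not (has_word(paths, "tags") or has_word(paths, "branches"))
}

print(sorted(by_sets))  # [1, 5]
```

Whether the set form is actually faster depends on whether each of the three lookups can be answered from the full-text index; that has to be measured.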



-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] Query slow via PHP

2013-01-10 Thread Liam R E Quin
On Thu, 2013-01-10 at 16:13 +, Mayer, Jonathan wrote:

> Running:
>
> XQUERY db:open('project', 'Links.xml')
>
> From the command line displays the file in 4.5 seconds

Is this on a 1MHz 286? :D

How many megabytes is the file?

The usual trick is to put the document in the database so you don't need
to read it off disk each time.

http://www.fromoldbooks.org/ connects to a local basex server for the
keyword tag cloud and it's fast enough that I have not bothered to use
memcached to speed it up. The XML file is a little under 6 megabytes,
though, so not very large. Here's the code I use. The element and
attribute indexes are properly titillated, er, turned on, but I have not
measured whether that makes a difference.


include("Search/BaseXClient.php");
try {
  // host, port, user, password
  $session = new Session("localhost", 1994, "socks", "black");
  $session->execute("open rdf");
  $session->execute("set SERIALIZER method=xhtml,omit-xml-declaration=yes,indent=no");
  echo $session->execute('xquery
  <div class="tagcloud">{
    let $kw := /cache/kwlist/kw,
        $max := math:sqrt(max(
          for $e in $kw return xs:integer($e/@count)
        ))
    return
      for $i in /cache/kwlist/kw
      let $freq :=
        xs:integer(10 * math:sqrt($i/@count) div $max)
      where xs:integer($i/@count) ge 10
      order by $i/@what
      return (
        <a
          class="tag freq{$freq}"
          title="items: {$i/@count}"
          href="/Search/?kw={translate($i/@what, " ", "+")};fp=1"
        >{replace(xs:string($i/@what), "^letter([a-z])$", "letter $1")}</a>,
        "&#xa;"
      )
  }</div>
  ');
  // close session
  $session->close();

} catch (Exception $e) {
  // print exception
  print $e->getMessage();
}
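
The weighting in the query is easier to see in isolation. This Python sketch repeats the same arithmetic: square-root scaling into at most ten classes, with keywords below a count of 10 dropped. The sample counts are made up, and the function name is only for illustration.

```python
import math

def tag_classes(counts, minimum=10, buckets=10):
    """Map keyword -> frequency class, using square-root scaling."""
    # As in the query, the maximum is taken over all keywords,
    # including those that are later filtered out.
    top = math.sqrt(max(counts.values()))
    return {
        word: int(buckets * math.sqrt(n) / top)  # int() truncates, like xs:integer()
        for word, n in sorted(counts.items())
        if n >= minimum
    }

sample = {"abbey": 400, "bridge": 100, "castle": 25, "dragon": 9}
print(tag_classes(sample))  # {'abbey': 10, 'bridge': 5, 'castle': 2}
```

The square root keeps a few very common keywords from squeezing everything else into the lowest class.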


-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] how to pass raw bytes intact?

2012-12-31 Thread Liam R E Quin
On Tue, 2013-01-01 at 10:52 +0800, jida...@jidanni.org wrote:

> I'm just trying to find a way to remove the <wbr/> injected here,
> $ echo '<A>你好</A>'|perl -pwle 's![^[:ascii:]]!$&<wbr/>!'|qprint -e
> <A>=E4<wbr/>=BD=A0=E5=A5=BD</A>

I don't have a qprint command on my system, so I'm not sure what's going
on for you here. Your perl substitution is putting <wbr/> after the
first non-ascii character on the line, and 你 is for sure not an ascii
character, so you get <wbr/> after it.
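
A Python sketch of the same substitution, done once on raw bytes and once on decoded characters. It assumes, as the =E4 in the quoted output suggests, that the substitution saw UTF-8 bytes rather than characters; the function names are made up for this example.

```python
import re

def mark_first_non_ascii_byte(data: bytes) -> bytes:
    """Insert <wbr/> after the first byte outside the ASCII range."""
    return re.sub(rb"[^\x00-\x7f]", lambda m: m.group(0) + b"<wbr/>", data, count=1)

def mark_first_non_ascii_char(text: str) -> str:
    """Insert <wbr/> after the first character outside the ASCII range."""
    return re.sub(r"[^\x00-\x7f]", lambda m: m.group(0) + "<wbr/>", text, count=1)

raw = "<A>你好</A>".encode("utf-8")
print(mark_first_non_ascii_byte(raw))
# b'<A>\xe4<wbr/>\xbd\xa0\xe5\xa5\xbd</A>'
print(mark_first_non_ascii_char("<A>你好</A>"))
# <A>你<wbr/>好</A>
```

In the byte-level case the tag lands inside the three-byte sequence for 你, so the result is no longer valid UTF-8.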

Are you trying to do MIME octet-level encoding of UTF-8 here?

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



[basex-talk] basex SERVERHOST

2012-12-21 Thread Liam R E Quin
How do I use SERVERHOST in ~/.basex (or elsewhere) to get basex to
listen on the given PORT only on localhost (127.0.0.1) and not on
externally-visible IP addresses?

Setting SERVERHOST and HOST to localhost does not seem to accomplish
this.

Thanks!

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] basex SERVERHOST

2012-12-21 Thread Liam R E Quin
On Fri, 2012-12-21 at 21:54 -0500, Liam R E Quin wrote:
> How do I use SERVERHOST in ~/.basex (or elsewhere) to get basex to
> listen on the given PORT only on localhost (127.0.0.1) and not on
> externally-visible IP addresses?
>
> Setting SERVERHOST and HOST to localhost does not seem to accomplish
> this.

To answer my own question, it's the .basex file in the BaseX
distribution directory that is read, not the one in the user's home
directory.

There's also an attempt to read .basex in the directory from which the
basex server (or client) was run, but that's not in general the home
directory.

 
> Thanks!
 

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml



Re: [basex-talk] Perl BaseX API questions

2012-12-13 Thread Liam R E Quin
On Thu, 2012-12-13 at 18:30 +0100, Christian Grün wrote:
> Hi Liam,
>
>> (1) The sample perl files and the perl module implementing the API
>> mention a readme file that explains the methods; does anyone know where
>> one might find such a readme file? It might answer my other question...
>
> The quoted readme.txt file is currently found in the parent directory,
> which lists all other client APIs [1].

Thank you, this led me to http://docs.basex.org/wiki/Server_Protocol
which at least gave a list of methods and what they are expected to do.

>> (2) if there's an error in my query and I catch the exception, the
>> message I get is about how many milliseconds the query took, not the
>> actual error. I can enclose my Perl script, but I'm wondering if others
>> have seen this?
>
> In Perl, the error message can be retrieved by enclosing the query
> with eval {...} and printing $@ if an error occurs. This is eg shown
> in Example.pl. If this does not help, you are invited to send us an
> example that demonstrates what goes wrong.

Enclosed. When I run this the error I get is always:
Database 'rdf' was opened in 0.6 ms.

If I use the command-line BaseX program on the same query I get this
instead (for this same deliberately malformed query):
$ ~/packages/basex/BaseX7.3/bin/basexclient 
Username: admin
Password: 
BaseX 7.3 [Client]
Try help to get more information.

> open rdf
Database 'rdf' was opened in 0.65 ms.
> run /home/liam/tmp/error.xq
Stopped at line 1, column 9 in /home/liam/tmp/error.xq:
[XPST0003] Unexpected end of query: 'with syntax err...'.
>

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml
Co-author, 5th edition of Beginning XML, Wrox, 2012


error-message.pl
Description: Perl program


Re: [basex-talk] Perl BaseX API questions

2012-12-13 Thread Liam R E Quin
On Thu, 2012-12-13 at 23:11 +0100, Christian Grün wrote:
> Liam,
>
> the bug has been fixed [1]; feel free to check out the latest version [2].

Thank you! Confirmed as fixed.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml
Co-author: 5th edition of Beginning XML, Wrox, 2012


