Re: [basex-talk] Calling a Java function with varargs parameter from XQuery

2023-07-02 Thread Lizzi, Vincent
Hi Christian,

Thanks for the advice. I've tried a few variations based on your example. There 
seems to be a problem passing the Java array value to the Java function that 
has the varargs parameter.

I tried constructing the list in this manner, which produces the expected list.

declare namespace Language = "java:com.github.pemistahl.lingua.api.Language";
declare namespace List = 'java:java.util.List';
List:toArray(List:of(
  Language:ENGLISH(),
  Language:GERMAN()
)) ! (., string())

When I try passing the list to the Java constructor:

declare namespace Builder = 
"java:com.github.pemistahl.lingua.api.LanguageDetectorBuilder";
declare namespace Language = "java:com.github.pemistahl.lingua.api.Language";
declare namespace List = 'java:java.util.List';
Builder:fromLanguages·com.github.pemistahl.lingua.api.Language...(
List:toArray(List:of(
  Language:ENGLISH(),
  Language:GERMAN()
)))

I get this error:

[XPTY0004] 
com.github.pemistahl.lingua.api.LanguageDetectorBuilder:fromLanguages(com.github.pemistahl.lingua.api.Language[])
 expected, () found.

When I try constructing the list first and then passing the value using 
ArrayList:

declare namespace Builder = "java:com.github.pemistahl.lingua.api.LanguageDetectorBuilder";
declare namespace Language = "java:com.github.pemistahl.lingua.api.Language";
declare namespace list = 'java.util.ArrayList';
let $list := list:new()
where list:add($list, Language:ENGLISH())
where list:add($list, Language:GERMAN())
let $array := list:toArray($list)
return Builder:fromLanguages·com.github.pemistahl.lingua.api.Language...($array)

I get this error:

[XPTY0004] 
com.github.pemistahl.lingua.api.LanguageDetectorBuilder:fromLanguages(com.github.pemistahl.lingua.api.Language[])
 expected, (VarRef) found.

I've tried other variations, but each time I get a slightly different error 
message, such as:

[XPTY0004] 
com.github.pemistahl.lingua.api.LanguageDetectorBuilder:fromLanguages(com.github.pemistahl.lingua.api.Language[])
 expected, (GFLWOR) found.

Do you have any suggestions?

Thank you,
Vincent

_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com


Information Classification: General
-Original Message-
From: Christian Grün 
Sent: Saturday, July 1, 2023 2:29 AM
To: Lizzi, Vincent 
Cc: BaseX 
Subject: Re: [basex-talk] Calling a Java function with varargs parameter from 
XQuery

Hi Vincent,

In general, varargs parameters can be invoked similarly as functions with 
arrays, but it can be tricky to prepare the arguments in a way that they will 
be passed on as arrays. An explicit array conversion may help. This Java code…

ArrayList<String> list = new ArrayList<>();
list.add("a");
list.add("b");
System.out.println(String.join("/", list.toArray(String[]::new)));

…can be written as:

declare namespace list = 'java.util.ArrayList';
let $list := list:new()
where list:add($list, 'a')
where list:add($list, 'b')
let $array := list:toArray($list)
(: with types: "join·CharSequence·CharSequence..." :)
return Q{java.lang.String}join('/', $array)

When the code gets more complex, it’s still more convenient to write an 
additional Java wrapper class.

Hope this helps,
Christian
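
One possible reading of the errors in this thread is that List:toArray() hands back an Object[] on the Java side, while the varargs parameter needs an array whose component type is Language. A wrapper class along the lines suggested above could do that conversion in Java. The following is only a sketch under stated assumptions: the class name VarargsBridge, its methods, and the stand-in enum Lang (used instead of Lingua's Language so the example is self-contained) are hypothetical and not part of BaseX or Lingua.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.List;

public class VarargsBridge {
    /** Stand-in for com.github.pemistahl.lingua.api.Language (assumption). */
    public enum Lang { ENGLISH, GERMAN }

    /** Stand-in for a varargs method such as fromLanguages(Language...). */
    public static String describe(Lang... langs) {
        return langs.length + ":" + Arrays.toString(langs);
    }

    /** Copies the items into a new array whose component type is `type`. */
    @SuppressWarnings("unchecked")
    public static <T> T[] toTypedArray(Class<T> type, List<?> items) {
        T[] array = (T[]) Array.newInstance(type, items.size());
        for (int i = 0; i < items.size(); i++) {
            // Class.cast throws ClassCastException on an element of the wrong type
            array[i] = type.cast(items.get(i));
        }
        return array;
    }

    /** Non-varargs entry point that takes a plain list and forwards a typed array. */
    public static String describeAll(List<?> items) {
        return describe(toTypedArray(Lang.class, items));
    }

    public static void main(String[] args) {
        List<Lang> langs = List.of(Lang.ENGLISH, Lang.GERMAN);
        // toArray() without arguments returns Object[], not Lang[]
        System.out.println(langs.toArray().getClass().getSimpleName());
        System.out.println(describeAll(langs));
    }
}
```

With a real library, the stand-ins would be replaced by the library's own type and varargs method, and the compiled wrapper would be added to the classpath next to the library jar.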


Re: [basex-talk] Calling a Java function with varargs parameter from XQuery

2023-06-30 Thread Lizzi, Vincent
Hello again,

I've also tried a variation with the parameter's type declared in the way 
described in the Java Bindings wiki page. Here is the code:

declare namespace Builder = 
"java:com.github.pemistahl.lingua.api.LanguageDetectorBuilder";
declare namespace Language = "java:com.github.pemistahl.lingua.api.Language";

Builder:fromLanguages·com.github.pemistahl.lingua.api.Language...(
  (
  Language:ENGLISH(),
  Language:DUTCH(),
  Language:GERMAN(),
  Language:SPANISH(),
  Language:FRENCH()
  )
)

But this still produces an error message:

[XPTY0004] 
com.github.pemistahl.lingua.api.LanguageDetectorBuilder:fromLanguages(com.github.pemistahl.lingua.api.Language[])
 expected, (List) found.

How should the parameter's list of values be written?

Thanks,
Vincent

_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com



Information Classification: General
From: Lizzi, Vincent
Sent: Thursday, June 29, 2023 7:55 PM
To: BaseX 
Subject: Calling a Java function with varargs parameter from XQuery

Hello BaseX people,

The Java language bindings in BaseX look like a very good way to use Java 
libraries from XQuery. I'm currently trying to use Lingua 
(https://github.com/pemistahl/lingua) with BaseX, and mostly have it working, 
but I've not been able to figure out how to call a Java function that takes a 
vararg (variable arguments) parameter.

Here is a bit of XQuery code that shows the problem. I'm using BaseX version 
10.6 and the jar for Lingua version 1.2.2 has been added to the classpath. This 
query should return a LanguageDetectorBuilder object.


declare namespace Builder = 
"java:com.github.pemistahl.lingua.api.LanguageDetectorBuilder";
declare namespace Detector = 
"java:com.github.pemistahl.lingua.api.LanguageDetector";
declare namespace Language = "java:com.github.pemistahl.lingua.api.Language";

let $builder :=
  Builder:fromLanguages(
[
  Language:ENGLISH(),
  Language:DUTCH(),
  Language:GERMAN(),
  Language:SPANISH(),
  Language:FRENCH()
]
  )
return $builder

The above code produces this error message in the BaseX GUI:

[XPTY0004] 
com.github.pemistahl.lingua.api.LanguageDetectorBuilder:fromLanguages(com.github.pemistahl.lingua.api.Language[])
 expected, (CArray) found.

If I make the parameter a sequence instead of an array, replacing the [] 
brackets with () parentheses, then this is the error message:

[XPTY0004] 
com.github.pemistahl.lingua.api.LanguageDetectorBuilder:fromLanguages(com.github.pemistahl.lingua.api.Language[])
 expected, (List) found.

I've read through the Java Bindings page in the wiki and tried to rewrite the 
parameter in different ways, but so far have only gotten different error 
messages.

The signature for the fromLanguages function can be seen at 
https://github.com/pemistahl/lingua/blob/main/src/main/kotlin/com/github/pemistahl/lingua/api/LanguageDetectorBuilder.kt#L165


Is there a way to make this work?

Thanks,
Vincent

__
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
530 Walnut St., Suite 850, Philadelphia, PA 19106
E-Mail: vincent.li...@taylorandfrancis.com
Web: www.tandfonline.com

Taylor & Francis is a trading name of Informa UK Limited,
registered in England under no. 1072954

"Everything should be made as simple as possible, but not simpler."



[basex-talk] Calling a Java function with varargs parameter from XQuery

2023-06-29 Thread Lizzi, Vincent
Hello BaseX people,

The Java language bindings in BaseX look like a very good way to use Java 
libraries from XQuery. I'm currently trying to use Lingua 
(https://github.com/pemistahl/lingua) with BaseX, and mostly have it working, 
but I've not been able to figure out how to call a Java function that takes a 
vararg (variable arguments) parameter.

Here is a bit of XQuery code that shows the problem. I'm using BaseX version 
10.6 and the jar for Lingua version 1.2.2 has been added to the classpath. This 
query should return a LanguageDetectorBuilder object.


declare namespace Builder = 
"java:com.github.pemistahl.lingua.api.LanguageDetectorBuilder";
declare namespace Detector = 
"java:com.github.pemistahl.lingua.api.LanguageDetector";
declare namespace Language = "java:com.github.pemistahl.lingua.api.Language";

let $builder :=
  Builder:fromLanguages(
[
  Language:ENGLISH(),
  Language:DUTCH(),
  Language:GERMAN(),
  Language:SPANISH(),
  Language:FRENCH()
]
  )
return $builder

The above code produces this error message in the BaseX GUI:

[XPTY0004] 
com.github.pemistahl.lingua.api.LanguageDetectorBuilder:fromLanguages(com.github.pemistahl.lingua.api.Language[])
 expected, (CArray) found.

If I make the parameter a sequence instead of an array, replacing the [] 
brackets with () parentheses, then this is the error message:

[XPTY0004] 
com.github.pemistahl.lingua.api.LanguageDetectorBuilder:fromLanguages(com.github.pemistahl.lingua.api.Language[])
 expected, (List) found.

I've read through the Java Bindings page in the wiki and tried to rewrite the 
parameter in different ways, but so far have only gotten different error 
messages.

The signature for the fromLanguages function can be seen at 
https://github.com/pemistahl/lingua/blob/main/src/main/kotlin/com/github/pemistahl/lingua/api/LanguageDetectorBuilder.kt#L165


Is there a way to make this work?

Thanks,
Vincent

__
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
530 Walnut St., Suite 850, Philadelphia, PA 19106
E-Mail: 
vincent.li...@taylorandfrancis.com
Web: www.tandfonline.com

Taylor & Francis is a trading name of Informa UK Limited,
registered in England under no. 1072954

"Everything should be made as simple as possible, but not simpler."



Information Classification: General


Re: [basex-talk] Pretty print

2022-11-18 Thread Lizzi, Vincent
Hi Liam,

XML's way of handling space characters is understandably an improvement over 
SGML, but it still causes problems sometimes and seems more complex than it 
perhaps could be. Although the ship has long since sailed, out of curiosity do 
you recall if there were any suggestions for a rule to ensure that spaces (and 
absence of spaces) would be consistently preserved without relying on a DTD or 
Schema?

A relatively safe way to "pretty print" indent XML is to only insert or remove 
spaces between an element's name and closing > and where spaces already exist 
in text nodes. Changing the spaces within an element opening tag can adjust 
formatting without inserting or removing text nodes. For example:

pretty print n2.

Can be indented without changing the node tree:

pretty
  print n2.

However I haven't seen any XML editor or processor implement this approach.

Best regards,
Vincent

_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com



Information Classification: General
From: BaseX-Talk  On Behalf Of Liam 
R. E. Quin
Sent: Thursday, November 17, 2022 4:44 PM
To: BaseX 
Subject: Re: [basex-talk] Pretty print

On Thu, 2022-11-17 at 19:05 +0100, Christian Grün wrote:
> >
> > But is there no way to declare that when I import a file to the
> > database?
> >
>
> There's currently no way to supply this for specific elements

Both XML Schema and DTDs do have a way to say whether text is allowed
in a particular context, and the XML loader could use this information
to discard whitespace text nodes that aren't text.

On how it came to be -

SGML had some really bad whitespace rules, including what was called
"pernicious whitespace" - whitespace where the parser needed
backtracking to know if it was text or not, but the parsers didn't
actually do backtracking so they flagged it as an error. This was a
very common source of problems for users.

We eliminated this for XML by requiring #PCDATA (i.e. text) always to
be in a repeatable or-group, so

and not

(to paraphrase Ambrose Bierce's Devil's Dictionary, which defined a boy
as a noise with dirt on it).

liam


--
Liam Quin, 
https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:  
http://www.fromoldbooks.org


Re: [basex-talk] current-dateTime() function precision

2022-09-09 Thread Lizzi, Vincent
Hi Yitzhak

The adjust-dateTime-to-timezone function may do what you need. For example, 
this converts the current time to UTC.

adjust-dateTime-to-timezone(current-dateTime(), xs:dayTimeDuration('PT0H'))

For EDT, change the duration to "-PT4H". The duration that is needed for US 
Eastern time will vary depending on the time of year.

See https://maxtoroq.github.io/xpath-ref/fn/adjust-dateTime-to-timezone.html
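
Because the offset for US Eastern time depends on the date, a region-based zone ID can pick the offset instead of a hard-coded duration. The thread itself is about XQuery; the sketch below uses Java's java.time API only to illustrate the idea, and the class name EasternTime, the method toZone, and the choice of "America/New_York" as the zone ID are assumptions made for this example.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneId;

public class EasternTime {
    /** Converts a UTC timestamp to the offset in effect in the given zone at that instant. */
    public static OffsetDateTime toZone(String utcTimestamp, String zoneId) {
        Instant instant = Instant.parse(utcTimestamp);
        return instant.atZone(ZoneId.of(zoneId)).toOffsetDateTime();
    }

    public static void main(String[] args) {
        // September falls in daylight saving time: prints 2022-09-09T13:31:34.466-04:00
        System.out.println(toZone("2022-09-09T17:31:34.466Z", "America/New_York"));
        // December falls in standard time: prints 2022-12-09T12:31:34.466-05:00
        System.out.println(toZone("2022-12-09T17:31:34.466Z", "America/New_York"));
    }
}
```

The first timestamp is the one from the output quoted in this thread; the second only shows the offset changing later in the year.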

Kind regards,
Vincent

_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com




Information Classification: General
From: ykhab...@bellsouth.net 
Sent: Friday, September 9, 2022 1:46 PM
To: Lizzi, Vincent ; 'BaseX' 

Subject: RE: [basex-talk] current-dateTime() function precision

Hi Vincent and Martin,

Thanks for the clarification.


I switched to using the Profiling module.
let $before_datetime := convert:integer-to-dateTime(prof:current-ms()) (: 
current-dateTime() :)

And it is working well.
2022-09-09T17:31:34.466Z
2022-09-09T17:31:34.538Z

The last remaining task is to convert the output from the UTC time zone to my 
local EST:
2022-09-09T13:31:34.466-04:00

Any idea how to achieve that?

Regards,
Yitzhak Khabinsky
From: Lizzi, Vincent
Sent: Friday, September 9, 2022 1:26 PM
To: ykhab...@bellsouth.net; 'BaseX'
Subject: RE: [basex-talk] current-dateTime() function precision

Hello Yitzhak,

The current-dateTime() function returns the same value throughout the execution 
of a single query.

The Profiling module has functions that provide the current system time, which 
is probably closer to what you need.

https://docs.basex.org/wiki/Profiling_Module

Kind regards,
Vincent

_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com




Information Classification: General
From: BaseX-Talk On Behalf Of ykhab...@bellsouth.net
Sent: Friday, September 9, 2022 11:55 AM
To: 'BaseX'
Subject: [basex-talk] current-dateTime() function precision

Hello,

I am using BaseX v.10.1

It seems that the current-dateTime() function precision is off.

I am validating an XML file against an XSD 1.1 file via Xerces 2.12.2
validator.
And trying to measure a timing of it.

Here is my code.
xquery version "4.0";

let $xml := '\\...\AForm-XSD-20211013\PD220224062681.XML'
let $xsd := '\\...\AForm-XSD-20211013\Miami-ws-AForm.xsd'

let $before_datetime := current-dateTime()
let $result := validate:xsd-report($xml, $xsd, map {
'http://apache.org/xml/features/validation/cta-full-xpath-checking':
 true()
})

return 
{data($result/status)}
{count($result/message)}
{$before_datetime}
{current-dateTime()}
{$xml}
{$xsd}
BaseX {data(db:system()//version)}, EE-Java,
{validate:xsd-processor()} 2.12.2
{validate:xsd-version()}

{$result/message}


It is emitting the following output:


invalid
3
2022-09-09T11:32:29.667-04:00
2022-09-09T11:32:29.667-04:00
\\...\AForm-XSD-20211013\PD220224062681.XML>
\\...\AForm-XSD-20211013\Miami-ws-AForm.xsd>
BaseX 10.1, EE-Java, Xerces 2.12.2
1.1


...



The question is why the before and after timing is the same:
2022-09-09T11:32:29.667-04:00
2022-09-09T11:32:29.667-04:00

They are identical up to a millisecond.
My expectations were that at least the millisecond time portion would be
different.

Regards,
Yitzhak Khabinsky


Re: [basex-talk] current-dateTime() function precision

2022-09-09 Thread Lizzi, Vincent
Hello Yitzhak,

The current-dateTime() function returns the same value throughout the execution 
of a single query.

The Profiling module has functions that provide the current system time, which 
is probably closer to what you need.

https://docs.basex.org/wiki/Profiling_Module

Kind regards,
Vincent

_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com




Information Classification: General
From: BaseX-Talk  On Behalf Of 
ykhab...@bellsouth.net
Sent: Friday, September 9, 2022 11:55 AM
To: 'BaseX' 
Subject: [basex-talk] current-dateTime() function precision

Hello,

I am using BaseX v.10.1

It seems that the current-dateTime() function precision is off.

I am validating an XML file against an XSD 1.1 file via Xerces 2.12.2
validator.
And trying to measure a timing of it.

Here is my code.
xquery version "4.0";

let $xml := '\\...\AForm-XSD-20211013\PD220224062681.XML'
let $xsd := '\\...\AForm-XSD-20211013\Miami-ws-AForm.xsd'

let $before_datetime := current-dateTime()
let $result := validate:xsd-report($xml, $xsd, map {
'http://apache.org/xml/features/validation/cta-full-xpath-checking':
 true()
})

return 
{data($result/status)}
{count($result/message)}
{$before_datetime}
{current-dateTime()}
{$xml}
{$xsd}
BaseX {data(db:system()//version)}, EE-Java,
{validate:xsd-processor()} 2.12.2
{validate:xsd-version()}

{$result/message}


It is emitting the following output:


invalid
3
2022-09-09T11:32:29.667-04:00
2022-09-09T11:32:29.667-04:00
\\...\AForm-XSD-20211013\PD220224062681.XML>
\\...\AForm-XSD-20211013\Miami-ws-AForm.xsd>
BaseX 10.1, EE-Java, Xerces 2.12.2
1.1


...



The question is why the before and after timing is the same:
2022-09-09T11:32:29.667-04:00
2022-09-09T11:32:29.667-04:00

They are identical up to a millisecond.
My expectations were that at least the millisecond time portion would be
different.

Regards,
Yitzhak Khabinsky


Re: [basex-talk] Java 8 and XML Catalog

2022-07-07 Thread Lizzi, Vincent
It may help to include the link to the repository: 
https://github.com/vincentml/xml-catalog-resolver

Kind regards,
Vincent
_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group 
vincent.li...@taylorandfrancis.com
 


Information Classification: General

-Original Message-
From: Lizzi, Vincent 
Sent: Thursday, July 7, 2022 9:12 AM
To: 'Christian Grün' 
Cc: basex-talk@mailman.uni-konstanz.de
Subject: RE: [basex-talk] Java 8 and XML Catalog

Hi Christian,

Thank you for your reply! That makes sense. I worked around the problem for now 
by implementing an XML Catalog Resolver in XQuery. I've made it available on 
GitHub in case other people might need it.

For example:

import module namespace resolver = "xml-catalog-resolver" at 
"https://raw.githubusercontent.com/vincentml/xml-catalog-resolver/main/xml-catalog-resolver.xqm";
let $doc := "example.xml"
let $catfile := db:option("catfile")
return resolver:parse-xml($doc, $catfile)

Kind regards,
Vincent

_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group 
vincent.li...@taylorandfrancis.com
 


Information Classification: General

-Original Message-
From: Christian Grün  
Sent: Thursday, July 7, 2022 5:27 AM
To: Lizzi, Vincent 
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Java 8 and XML Catalog

Hi Vincent,

XML catalogs and Java 8 are an ungrateful combination indeed. As you know, 
support for catalogs will improve a lot with BaseX 10, as JDK 11 comes with its 
own catalog resolver. With BaseX 10, we’ll also support Norm’s XML resolver, as 
you’ve already discovered (we’ll update our documentation soon).

Due to limited resources, we decided to focus on the new version exclusively. 
It’s interesting to hear, though, that your code is working with BaseX 9 and 
Java 17. Maybe it’s the inbuilt XML resolver that’s used by Java 17, no matter 
if an external resolver is added to the classpath? All I can do is guess …

Maybe JDK developers would be able to answer that question if we managed to 
create some self-contained code in Java?

Best,
Christian


Re: [basex-talk] Java 8 and XML Catalog

2022-07-07 Thread Lizzi, Vincent
Hi Christian,

Thank you for your reply! That makes sense. I worked around the problem for now 
by implementing an XML Catalog Resolver in XQuery. I've made it available on 
GitHub in case other people might need it.

For example:

import module namespace resolver = "xml-catalog-resolver" at 
"https://raw.githubusercontent.com/vincentml/xml-catalog-resolver/main/xml-catalog-resolver.xqm";
let $doc := "example.xml"
let $catfile := db:option("catfile")
return resolver:parse-xml($doc, $catfile)

Kind regards,
Vincent

_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group 
vincent.li...@taylorandfrancis.com
 


Information Classification: General

-Original Message-
From: Christian Grün  
Sent: Thursday, July 7, 2022 5:27 AM
To: Lizzi, Vincent 
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Java 8 and XML Catalog

Hi Vincent,

XML catalogs and Java 8 are an ungrateful combination indeed. As you know, 
support for catalogs will improve a lot with BaseX 10, as JDK 11 comes with its 
own catalog resolver. With BaseX 10, we’ll also support Norm’s XML resolver, as 
you’ve already discovered (we’ll update our documentation soon).

Due to limited resources, we decided to focus on the new version exclusively. 
It’s interesting to hear, though, that your code is working with BaseX 9 and 
Java 17. Maybe it’s the inbuilt XML resolver that’s used by Java 17, no matter 
if an external resolver is added to the classpath? All I can do is guess …

Maybe JDK developers would be able to answer that question if we managed to 
create some self-contained code in Java?

Best,
Christian


[basex-talk] Java 8 and XML Catalog

2022-07-05 Thread Lizzi, Vincent
Hi Christian,

I'm having trouble getting an XML Catalog to work in BaseX 9.7.2 running on 
Java 8. The XML Catalog is working properly in BaseX and Saxon using Java 17. 
However, if I keep everything the same and switch to Java 8 the XML Catalog 
doesn't work. The error messages when running on Java 8 indicate that the XML 
Catalog is not being used at all. The obvious solution of updating the 
environment to a more recent version of Java is not so easy in this case.

I have the XML Catalog location configured as an absolute file system URI in 
the system properties org.basex.catfile, xml.catalog.files, and 
javax.xml.catalog.files. Option db:intparse is set to false and option db:dtd 
is set to true. I have set the class path to include both the Apache XML 
Resolver (xml-resolver:xml-resolver:1.2) and the xmlresolver.org XML Resolver 
(org.xmlresolver:xmlresolver:4.4.0).

It is my understanding that BaseX will use Norm Tovey-Walsh's XML Resolver from 
xmlresolver.org if it is found on the class path, or fall back to Java's built 
in XML resolver. I see the logic for this check on which catalog resolver to 
use in org.basex.util.Resolver.
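
A check of this kind can be sketched as a classpath probe. The code below is not the BaseX implementation; the class ResolverProbe and its methods are hypothetical, and the probed class name org.xmlresolver.Resolver is an assumption about the xmlresolver.org library. The second probe relies on the fact that the javax.xml.catalog package was added in Java 9, so it is absent on Java 8.

```java
public class ResolverProbe {
    /** Returns true if the named class can be located without initializing it. */
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ResolverProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Picks a resolver family in order of preference (names are illustrative). */
    public static String choose() {
        if (isPresent("org.xmlresolver.Resolver")) {
            return "xmlresolver.org";
        }
        if (isPresent("javax.xml.catalog.CatalogManager")) {
            return "JDK built-in";
        }
        return "none";
    }

    public static void main(String[] args) {
        // The result depends on the Java version and on the classpath in use.
        System.out.println(choose());
    }
}
```

Running a probe like this under each Java version, with the same classpath, is one way to see which resolver classes are actually visible at runtime.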

Do you have any suggestions on what to try, or is this pointing to a possible 
bug somewhere?

Thank you,
Vincent

__
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
530 Walnut St., Suite 850, Philadelphia, PA 19106
E-Mail: vincent.li...@taylorandfrancis.com
Web: www.tandfonline.com

Taylor & Francis is a trading name of Informa UK Limited,
registered in England under no. 1072954

"Everything should be made as simple as possible, but not simpler."



Information Classification: General


Re: [basex-talk] XML Catalog and xslt:transform()

2022-06-02 Thread Lizzi, Vincent
Hi Daniel and Gerrit,

If you are able to use Java version 11 or higher, it might be of use to try the 
XML Catalog support that comes built in with Java. This ticket comment has some 
details and an example for configuring Java and BaseX to use the same XML 
Catalog:

https://github.com/BaseXdb/basex/issues/1903#issuecomment-1108822028

I'm not sure if this is relevant for your situation, but I've read somewhere 
(although I can't put my hands on the source right now) that Saxon uses the XML 
Catalog for resolving URIs only in certain contexts. For example, a DTD DOCTYPE 
can be resolved using an XML Catalog, but the function fn:json-doc() does not 
use an XML Catalog.

Cheers,
Vincent

_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com



Information Classification: General
From: BaseX-Talk  On Behalf Of 
Zimmel, Daniel
Sent: Thursday, June 2, 2022 10:57 AM
To: 'Imsieke, Gerrit, le-tex' ; 
basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] XML Catalog and xslt:transform()

I see, thanks Gerrit and Christian for the insight. This *does* sound wickedly 
unfunny.

OK if I actually do not need to be able to parse the DTD wouldn't the simple 
workaround be:

fetch:xml('file:///C:/temp/catalog/dokument.xml')
=> xslt:transform('transform.xsl')

At least this is what works here, resulting in a new document node and trashing 
the DTD declaration.

Daniel

-Original Message-
From: BaseX-Talk On Behalf Of Imsieke, Gerrit, le-tex
Sent: Thursday, June 2, 2022 4:40 PM
To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] XML Catalog and xslt:transform()

As a workaround, you might be able to read the documents using doc() in XQuery 
(this might work with the help of the catalog, in contrast to
doc() from within XSLT/Saxon) and pass them to xslt:transform() in some way. 
"Some way" isn't easy, either, since xslt:transform() still relies on JAXP, and 
you can't pass arbitrary XDM items such as whole documents or maps as 
stylesheet parameters (or can you? $params as map(*)? doesn't rule this out, 
but I doubt that a parameter may have another map as value and arrive safely at 
the stylesheet). So you might need to wrap all inputs in a single top-level 
element, which of course prevents you from letting the XSLT stylesheet decide 
which resource to load dynamically, and you might need to change matching 
patterns.
But switching to XDM and implementing XPath 3.1's fn:transform() function, which 
would allow this, was too much of a stretch for Christian at the time we paid 
BaseX GmbH to implement xslt:transform-report(). I think this will need another 
significant investment, and Christian needs to find time to implement it.

Gerrit

On 02.06.2022 16:24, Imsieke, Gerrit, le-tex wrote:
> Hi Daniel,
>
> I think the catalog in xslt:transform() is only used for XSLT
> imports/includes and maybe for reading documents with doc(), and only
> for Saxon. The catalog is probably *not* used for mapping system
> identifiers in the documents accessed this way. We should document
> this better once we find out what is/isn't supported.
>
> The background is that we desperately needed to use catalogs for
> mapping import/include URIs, and we paid Liam to implement this. He
> succeeded with a little help from Christian, but it was not an easy
> feat because include/import URI resolution is different from doc() URI
> resolution in Saxon which in turn is different from system identifier
> resolution (that is probably done by the XML parser, not by Saxon).
>
> So I think we need to pay Liam and Christian again so that they work
> out how to pass the catalog to the XML parser that is invoked by
> Saxon. This definitely isn't a fun task.
>
> Gerrit
>
> On 02.06.2022 14:44, Zimmel, Daniel wrote:
>> Hi,
>>
>> after reading 
>> https://docs.basex.org/wiki/Catalog_Resolver
>>  and
>> digging in the list archives
>> (https://mailman.uni-konstanz.de/pipermail/basex-talk/2019-March/0141
>> 99.html
>> ) I still have trouble understanding catalog files.
>>
>> Is this supposed to work with xslt:transform() and BaseX GUI 9.7.2?
>> The default option (DTD = false) is ignored by xslt:transform()
>> because the function is definitely requesting the external DTD.
>> This prevents transforming XML with DTD declarations that are not
>> available (if I understand correctly, a problem that the DTD option
>> is trying to solve in general).
>>
>> When I try to solve this via catalog files (actually I do not need
>> the DTD), I do not have success.
>> Here are my mini examples:
>>
>> Saxon HE 10.3 resides in the lib folder
>>
>> .basex setting:
>> # Local Options
>> SERIALIZER 

Re: [basex-talk] First steps with client/server on Linux - bulk import; error message in dba

2022-05-12 Thread Lizzi, Vincent
Hi Christian and Florian,

I recall that the DBA web interface didn't like the YAJSW log files, although 
that didn't seem surprising because the YAJSW log files are in a different 
format than BaseX log files. It's been a while since I tried this so I don't 
have the details now. YAJSW can be configured to save its log files in a 
different location.

Kind regards,
Vincent


_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com



Information Classification: General
From: Christian Grün 
Sent: Thursday, May 12, 2022 8:03 AM
To: Florian Schmitt ; Lizzi, Vincent 

Cc: BaseX 
Subject: Re: [basex-talk] First steps with client/server on Linux - bulk 
import; error message in dba

Thanks for the observation, Florian.

> After modifying permissions for the $basexhome/data/.logs directory, i now 
> have a logfile showing the following entry:

I hope the latest snapshot resolves the Yajsw/.logs bug [1].

> [POST] /dba/log?name=wrapper-basex=1; Unexpected error: Improper use? 
> Potential bug? Your feedback is welcome: Contact: 
> basex-talk@mailman.uni-konstanz.de<mailto:basex-talk@mailman.uni-konstanz.de> 
> Version: BaseX 9.7.1 Java: Private Build, 11.0.15 OS: Linux, amd64 Stack 
> Trace: java.lang.NullPointerException at

I've added Vincent Lizzi to the conversation, who has kindly added
the documentation on YAJSW: Maybe it would be sensible to move the
wrapper-basex.log file out of the .logs directory? I don't expect any
complications if it's stored in data/.

I'll additionally check if we can prevent admin:logs from crashing if
the log input does not conform to the usual pattern.

Best,
Christian

[1] https://github.com/BaseXdb/basex/issues/2105

> [POST] /dba/log?name=wrapper-basex=1; Unexpected error: Improper use? 
> Potential bug? Your feedback is welcome: Contact: 
> basex-talk@mailman.uni-konstanz.de<mailto:basex-talk@mailman.uni-konstanz.de> 
> Version: BaseX 9.7.1 Java: Private Build, 11.0.15 OS: Linux, amd64 Stack 
> Trace: java.lang.NullPointerException at 
> org.basex.query.func.admin.AdminLogs$1.next(AdminLogs.java:84) at 
> org.basex.query.QueryContext.next(QueryContext.java:358) at 
> org.basex.query.func.fn.FnReverse.iter(FnReverse.java:61) at 
> org.basex.query.expr.gflwor.For$1.next(For.java:112) at 
> org.basex.query.expr.gflwor.Let$LetEval.next(Let.java:144) at 
> org.basex.query.expr.gflwor.Where$1.next(Where.java:41) at 
> org.basex.query.expr.gflwor.GFLWOR.value(GFLWOR.java:82) at 
> org.basex.query.expr.gflwor.Let$LetEval.next(Let.java:146) at 
> org.basex.query.expr.gflwor.GFLWOR$1.next(GFLWOR.java:68) at 
> org.basex.query.scope.MainModule$1.next(MainModule.java:67) at 
> org.basex.http.restxq.RestXqResponse.serialize(RestXqResponse.java:87) at 
> org.basex.http.web.WebResponse.create(WebResponse.java:58) at 
> org.basex.http.restxq.R...

> That NPE is thrown if i try to access the wrapper-basex log using the DBA web 
> interface. Accessing the BaseX logs works fine.


Re: [basex-talk] specifying the processor for xslt:transform()

2021-11-05 Thread Lizzi, Vincent
In case this is helpful, here are examples of code I've written to use an XML 
catalog with xslt:transform(). These examples were slightly modified to put 
into an email so there might be some typos.

Version 1:

In this example the XML document "file.xml" might be coming from a zip file or 
other location so temporarily writing the XML to disk was necessary.

The location of catalog.xml and DTD are relative to .basexhome. The location of 
the XSLT is relative to the XQuery file.

declare option db:catfile 'src/schemas/catalog.xml';

declare function local:parse-xml($xml as xs:string) as document-node() {
  let $file := file:create-temp-file('parse-xml-', '.xml')
  return (
    file:write-text($file, $xml),
    (# db:intparse false #) (# db:dtd true #) (# db:chop false #) { doc($file) },
    file:delete($file)
  )
};

"file.xml" => file:read-text() => local:parse-xml() =>
xslt:transform-text(file:resolve-path('xslt/stylesheet.xsl'))


Version 2:

If the XSLT needs access to entities defined in the DTD using the function 
unparsed-entity-uri() then the above example does not work. In this case, the 
DOCTYPE is modified using a regular expression to insert a SYSTEM DTD location 
so that the unparsed XML can be provided to xslt:transform-text().

declare function local:preprocess-xml($xml as xs:string, $dtd-path as xs:string) as xs:string {
  replace($xml,
    '(PUBLIC\s["][\sa-zA-Z0-9-()\+,\./:=?;!*#@$_%]*["]\s["][a-zA-Z0-9_/:\.\\\-]*[/\\]?[a-zA-Z0-9_\.\-]+\.dtd["])|(SYSTEM\s["][a-zA-Z0-9_/:\.\\\-]*[/\\]?[a-zA-Z0-9_\.\-]+\.dtd["])',
    'SYSTEM "' || $dtd-path || '"', 'i')
};

"file.xml" => file:read-text() => local:preprocess-xml("src/schemas/my.dtd") => 
xslt:transform-text(file:resolve-path('xslt/stylesheet.xsl'))

I'm using xslt:transform-text() because I want the transformed XML to have the 
serialization options and DOCTYPE that are specified in the XSLT, but if those 
things are not important to you then xslt:transform() would work equally well.

These examples just show what has worked for me, and there might be better 
alternatives.

Kind regards,
Vincent

_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com



Information Classification: General
From: Lizzi, Vincent
Sent: Friday, November 5, 2021 4:54 PM
To: Christian Grün ; Imsieke, Gerrit, le-tex 

Cc: BaseX 
Subject: RE: [basex-talk] specifying the processor for xslt:transform()

Hello Christian, Gerrit, Liam, Graydon,

Is it possible to use a different XML Catalog Resolver with BaseX? I'm 
referring specifically to the new XML resolver that Norm Tovey-Walsh presented 
today at Declarative Amsterdam. The presentation recording is at 
https://www.youtube.com/watch?v=LBuqQG8io8k&ab_channel=DeclarativeAmsterdam and 
the resolver is available at https://xmlresolver.org/ and 
https://github.com/xmlresolver/xmlresolver/.

I haven't yet had a chance to try Norm's new XML resolver or the BaseX 10 
snapshot.

However, I have also run into the limitation Gerrit mentioned about 
xslt:transform() not using an XML Catalog, and have used workarounds to 
preprocess the XML before calling xslt:transform().

Regarding useful options, the two things that I usually want to configure 
(apart from the contents of catalog.xml) are the location of the catalog.xml 
file(s) and logging verbosity. Being able to configure the catalog in a map 
parameter or startup parameter seems like a useful addition to the existing 
methods (pragma, option, .basex, etc.).

Kind regards,
Vincent

_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com




From: BaseX-Talk <basex-talk-boun...@mailman.uni-konstanz.de> On Behalf Of 
Christian Grün
Sent: Friday, November 5, 2021 8:28 AM
To: Imsieke, Gerrit, le-tex <gerrit.imsi...@le-tex.de>
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] specifying the processor for xslt:transform()

With BaseX 10, which will be based on JDK 11, we'll switch to the
built-in JDK Catalog Resolver [1], which tends to get good reviews,
and which allows for a much cleaner and more consistent integration.
Debugging should be easier as well, as errors will always be reported
back if the catalog resolution fails.

We are thinking about replacing the CATFILE option...

1. Option:
CATFILE: path/to/catalog.xml

2. or XQuery:
fetch:xml('file.xml', map { 'catfile': 'path/to/catalog.xml' })

...with a new CATALOG option that takes multiple keys and values:

1. Option:
CATALOG: files=path/to/catalog.xml,resolve=strict,prefer=public,defer=false

2. or XQuery:
fetch:xml('file.xml', ma

Re: [basex-talk] specifying the processor for xslt:transform()

2021-11-05 Thread Lizzi, Vincent
Hello Christian, Gerrit, Liam, Graydon,

Is it possible to use a different XML Catalog Resolver with BaseX? I'm 
referring specifically to the new XML resolver that Norm Tovey-Walsh presented 
today at Declarative Amsterdam. The presentation recording is at 
https://www.youtube.com/watch?v=LBuqQG8io8k&ab_channel=DeclarativeAmsterdam and 
the resolver is available at https://xmlresolver.org/ and 
https://github.com/xmlresolver/xmlresolver/.

I haven't yet had a chance to try Norm's new XML resolver or the BaseX 10 
snapshot.

However, I have also run into the limitation Gerrit mentioned about 
xslt:transform() not using an XML Catalog, and have used workarounds to 
preprocess the XML before calling xslt:transform().

Regarding useful options, the two things that I usually want to configure 
(apart from the contents of catalog.xml) are the location of the catalog.xml 
file(s) and logging verbosity. Being able to configure the catalog in a map 
parameter or startup parameter seems like a useful addition to the existing 
methods (pragma, option, .basex, etc.).

Kind regards,
Vincent

_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com




From: BaseX-Talk  On Behalf Of 
Christian Grün
Sent: Friday, November 5, 2021 8:28 AM
To: Imsieke, Gerrit, le-tex 
Cc: BaseX 
Subject: Re: [basex-talk] specifying the processor for xslt:transform()

With BaseX 10, which will be based on JDK 11, we'll switch to the
built-in JDK Catalog Resolver [1], which tends to get good reviews,
and which allows for a much cleaner and more consistent integration.
Debugging should be easier as well, as errors will always be reported
back if the catalog resolution fails.

We are thinking about replacing the CATFILE option...

1. Option:
CATFILE: path/to/catalog.xml

2. or XQuery:
fetch:xml('file.xml', map { 'catfile': 'path/to/catalog.xml' })

...with a new CATALOG option that takes multiple keys and values:

1. Option:
CATALOG: files=path/to/catalog.xml,resolve=strict,prefer=public,defer=false

2. or XQuery:
fetch:xml('file.xml', map { 'catalog': map {
'files': 'path/to/catalog.xml',
'resolve': 'strict',
'prefer': 'public',
'defer': false()
}})

An alternative would be to completely drop the catalog options and
assign all catalog options via system properties at startup:

java -Djavax.xml.catalog.files=path/to/catalog.xml  BaseX
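
For readers who have not used the JDK resolver yet: the four proposed keys
(files, resolve, prefer, defer) have the same names as the features of the
javax.xml.catalog API. The following is a minimal Java sketch of that API on
its own, not BaseX code. The class name, the temporary catalog file and the
"-//ACME//DTD Example//EN" identifier are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogFeatures.Feature;
import javax.xml.catalog.CatalogManager;

// Hypothetical demo class; requires JDK 11 or later.
class CatalogSketch {

    /** Looks up a public identifier in the given catalog file. */
    static String matchPublic(Path catalogFile, String publicId) {
        // Same settings as in the proposed CATALOG option string.
        CatalogFeatures features = CatalogFeatures.builder()
            .with(Feature.RESOLVE, "strict")
            .with(Feature.PREFER, "public")
            .with(Feature.DEFER, "false")
            .build();
        // The catalog location must be an absolute URI.
        Catalog catalog = CatalogManager.catalog(features, catalogFile.toUri());
        return catalog.matchPublic(publicId);
    }

    /** Writes a throwaway catalog with one made-up entry and queries it. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("catalog-sketch");
            Path catalogFile = dir.resolve("catalog.xml");
            Files.writeString(catalogFile,
                "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
                + "<public publicId=\"-//ACME//DTD Example//EN\" uri=\"example.dtd\"/>"
                + "</catalog>");
            return matchPublic(catalogFile, "-//ACME//DTD Example//EN");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the URI that the made-up public identifier maps to.
        System.out.println(demo());
    }
}
```

The relative uri attribute is resolved against the location of the catalog
file, so the printed value points to example.dtd next to the temporary
catalog.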

I'd love to get your feedback on these ideas, and your experiences
with an early BaseX 10 snapshot [2]!
Christian

[1] 
https://docs.oracle.com/en/java/javase/11/core/xml-catalog-api1.html#GUID-96D2C9AC-641A-4BDB-BB08-9FA04358A6F4
[2] 
https://files.basex.org/releases/latest-10/


On Fri, Nov 5, 2021 at 9:03 AM Imsieke, Gerrit, le-tex
<gerrit.imsi...@le-tex.de> wrote:
>
>
>
> On 05.11.2021 03:03, Liam R. E. Quin wrote:
> > On Thu, 2021-11-04 at 18:43 -0400, Graydon Saunders wrote:
>
> >> Related to this, setting the catalog for use by xslt:transform() is
> >> defeating me.
> >
> > The only ways I have found to debug these are
> > (1) with strace -f, to make sure the file is being read
> > (2) with a CatalogManager.properties file [[
> > verbosity=65535
> > # relative-catalogs=false
> > prefer = public
> > catalogs=mycataloguefile.xml
> > ]]
> >
> > Likely you need entries in the catalog file starting with file:///
> >
> > If you are uploading queries to a BaseX server, remember it's the
> > server that needs to have had CLASSPATH set when starting, and that
> > relative URIs like "catalog.xml" might be sought for in the server's
> > directory.
> >
> > Liam
>
> Liam and Christian have thankfully added support for resolving
> include/import URIs and doc(...) URIs approx 2 years ago [1]. A thing that
> I recently found was lacking is resolution of system identifiers that
> occur in documents. That is, if there is a reference to a DTD in a
> document that is read during the transformation, the catalog resolution
> does not apply to the public or system identifiers.
>
> Is this the issue that you are encountering, Graydon?
>
> Your first argument to xslt:transform is db:open('acme_content')[1].
> Does this document have a DOCTYPE declaration? I'd have guessed that the
> DOCTYPE declaration was stripped away when the documents were loaded
> into the DB, that is, parsing with the DTD only happened during import.
> But maybe this is different if you use the internal parser.
>
> Gerrit
>
> [1] 
> https://github.com/BaseXdb/basex/issues/1719


[basex-talk] problems using Archive module with certain zip files

2020-08-19 Thread Lizzi, Vincent
Hello,

I'm using BaseX in several file conversion projects that involve unpacking a 
zip file using the Archive module, modifying the files that are found inside 
with the File module and XSLT module, and creating new zip file(s) using the 
Archive module again. BaseX is very useful for these kinds of transformations 
and makes it quick to develop a transformation complete with unit tests.

I'm running into two kinds of problems with certain zip files and getting the 
following error messages from the Archive module.

1. When attempting to extract certain zip files

archive:extract-to() or archive:entries() produce this error message "Operation 
failed: only DEFLATED entries can have EXT descriptor."

The zip files that produce this error can be extracted using another program 
such as 7zip, which might be better at handling variations in the structure of 
some zip files.


2. When attempting to create a zip file larger than about 2 GB

After extracting a zip file larger than about 2 GB using archive:extract-to() 
(which works), trying to create a new zip file with

file:write-binary($newZip, archive:create-from($tempDir))

produces a stack trace that begins with:

java.lang.ArrayIndexOutOfBoundsException: Maximum array size reached.
   at org.basex.util.Array.checkCapacity(Array.java:322)
   at org.basex.util.Array.newCapacity(Array.java:313)
   at org.basex.util.Array.newCapacity(Array.java:301)
   at org.basex.io.out.ArrayOutput.write(ArrayOutput.java:32)
   at java.base/java.io.OutputStream.write(OutputStream.java:157)
   at java.base/java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:253)
   at java.base/java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211)
   at java.base/java.util.zip.ZipOutputStream.write(ZipOutputStream.java:332)
   at java.base/java.io.FilterOutputStream.write(FilterOutputStream.java:108)
   at org.basex.query.func.archive.ZIPOut.write(ZIPOut.java:44)
   at org.basex.query.func.archive.ArchiveCreate.add(ArchiveCreate.java:138)


I'm currently using BaseX version 9.4.1 and Java 64-bit openjdk version 
"11.0.7" 2020-04-14.

This might be a bug in BaseX handling of zip files. Is there any solution or 
workaround?
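
The trace suggests that the archive is being assembled in an in-memory byte 
array (ArrayOutput) before it is written, which would explain the limit near 
2 GB. As an illustration of the alternative, here is a minimal plain-Java 
sketch that streams a directory straight into a zip file on disk. It is not 
part of BaseX or its Archive module; the class and method names and the demo 
files are made up.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class; the zip file must live outside sourceDir.
class StreamingZip {

    /** Zips all regular files below sourceDir into zipFile, streaming to disk. */
    static void zipDirectory(Path sourceDir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Zip entry names use forward slashes on every platform.
                String name = sourceDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                // Copies the file in chunks instead of loading it into one array.
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }

    /** Builds a tiny made-up directory, zips it and returns the entry count. */
    static int demo() {
        try {
            Path src = Files.createTempDirectory("zip-src");
            Files.createDirectories(src.resolve("images"));
            Files.writeString(src.resolve("article.xml"), "<article/>");
            Files.writeString(src.resolve("images").resolve("fig1.txt"), "placeholder");
            Path zip = Files.createTempFile("out", ".zip");
            zipDirectory(src, zip);
            try (ZipFile zf = new ZipFile(zip.toFile())) {
                return zf.size();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```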

Thanks,
Vincent



__
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
530 Walnut St., Suite 850, Philadelphia, PA 19106
E-Mail: vincent.li...@taylorandfrancis.com
Phone: 215-606-4221
Web: www.tandfonline.com

Taylor & Francis is a trading name of Informa UK Limited,
registered in England under no. 1072954

"Everything should be made as simple as possible, but not simpler."





Re: [basex-talk] xslt:transform function not working with XML Catalog

2020-07-10 Thread Lizzi, Vincent
Gerrit and Liam,

Thanks for that background information and the recommendation for sponsoring 
development. I tried adding rewriteSystem to the XML Catalog but it did not 
help. The XML Catalog already uses systemSuffix. It is good to see that the 
option for sponsoring development is available and has already resulted in 
improvements to XML Catalog support in BaseX. As you pointed out, the 
initial work on catalog support in xslt:transform focused on supporting 
xsl:import and xsl:include, so the scenario of using the catalog resolver when 
parsing XML was probably not in scope.

I've logged a ticket with a small self-contained example: 
https://github.com/BaseXdb/basex/issues/1903

For now, the workaround to replace the URI in the DOCTYPE using a regular 
expression on the XML before xslt:transform() looks like it will work.
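
To make the workaround concrete, here is a minimal Java sketch of the same 
idea: replace whatever external identifier the DOCTYPE carries with a SYSTEM 
identifier that points to a working DTD location. The pattern is a simplified 
assumption (double-quoted literals only, first DOCTYPE only), and the class 
name and sample identifiers are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; handles double-quoted identifiers only.
class DoctypeRewrite {

    // Group 1 keeps "<!DOCTYPE name "; the rest matches either
    // PUBLIC "pubid" "sysid" or SYSTEM "sysid".
    private static final Pattern EXTERNAL_ID = Pattern.compile(
        "(<!DOCTYPE\\s+[^\\s>\\[]+\\s+)"
        + "(?:PUBLIC\\s+\"[^\"]*\"\\s+\"[^\"]*\"|SYSTEM\\s+\"[^\"]*\")");

    /** Returns xml with the DOCTYPE external identifier replaced by SYSTEM "dtdUri". */
    static String withSystemId(String xml, String dtdUri) {
        Matcher m = EXTERNAL_ID.matcher(xml);
        // quoteReplacement protects backslashes and dollar signs in the path.
        String replacement = "$1" + Matcher.quoteReplacement("SYSTEM \"" + dtdUri + "\"");
        return m.replaceFirst(replacement);
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE article PUBLIC \"-//ACME//DTD Article//EN\" "
            + "\"http://example.invalid/article.dtd\"><article/>";
        System.out.println(withSystemId(xml, "schemas/article.dtd"));
    }
}
```

An internal subset after the identifier is left untouched, because the match 
stops at the closing quote of the system literal.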

Many thanks,
Vincent


_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com




From: BaseX-Talk  On Behalf Of Liam 
R. E. Quin
Sent: Friday, July 10, 2020 12:56 PM
To: Imsieke, Gerrit, le-tex ; 
basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] xslt:transform function not working with XML Catalog

On Fri, 2020-07-10 at 08:23 +0200, Imsieke, Gerrit, le-tex wrote:
> I'd like to warmly
> recommend paying him so that he can explore and fix the issue.

:-) Thank you for the recommendation!


The trick is to find the resolver output from setting verbose, as then
you will see the strings that are being sent to the resolver.

If you have your strace log, you can see which
CatalogManager.properties files were read, set verbosity=999 in one, and
then look for output, but it's possible it's getting eaten by Saxon as
the buffered output of XSLT. I should look into being able to capture
those messages separately.

Liam

--
Liam Quin, 
https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations: 
http://www.fromoldbooks.org


Re: [basex-talk] xslt:transform function not working with XML Catalog

2020-07-09 Thread Lizzi, Vincent
Hi Liam,

Thanks for the helpful suggestions. After trying everything you suggested and 
then also trying a few of Saxon's configuration options, unfortunately I'm 
still having the same problem. Trying a shell script that contains the 
following:

MAIN="$( cd -P "$(dirname "$FILE")/../basex" && pwd )"
CP=$MAIN/BaseX.jar:$MAIN/lib/custom/*:$MAIN/lib/*:$CLASSPATH
echo 1 Saxon
java -cp "$CP" net.sf.saxon.Transform -s:input1.xml -xsl:transform.xsl \
  -catalog:schemas/catalog.xml
echo 2 BaseX transform
java -cp "$CP" org.basex.BaseX \
  -q"(# db:catfile schemas/catalog.xml #) (# db:intparse false #) (# db:dtd true #) (# db:chop false #) { xslt:transform('input1.xml', 'transform.xsl') }"
echo 3 BaseX transform with Saxon features configured
java \
  -Dhttp://saxon.sf.net/feature/entityResolverClass=org.apache.xml.resolver.tools.CatalogResolver \
  -Dhttp://saxon.sf.net/feature/uriResolverClass=org.apache.xml.resolver.tools.CatalogResolver \
  -cp "$CP" org.basex.BaseX \
  -q"(# db:catfile schemas/catalog.xml #) (# db:intparse false #) (# db:dtd true #) (# db:chop false #) { xslt:transform('input1.xml', 'transform.xsl') }"
echo 4 BaseX doc to show XML Catalog is configured correctly to parse XML
java -cp "$CP" org.basex.BaseX \
  -q"(# db:catfile schemas/catalog.xml #) (# db:intparse false #) (# db:dtd true #) (# db:chop false #) { doc('input1.xml') }"

The classpath includes BaseX 9.3.3, Saxon HE 9.9, xml-resolver-1.2.jar, and 
CatalogManager.properties


  1.  The transformation works in Saxon and uses the catalog file to locate the 
DTD when parsing the XML input1.xml.
  2.  The BaseX xslt:transform should work the same as #1, but fails because 
the DTD cannot be read.
  3.  Adding Saxon configuration for the Entity Resolver Class and URI Resolver 
Class did not help.
  4.  Simply parsing the XML using doc() in BaseX with the same configuration 
shows that the XML catalog is configured correctly within BaseX.

Using strace -f, the log shows that BaseX xslt:transform is reading the 
catalog.xml file from disk, and then is trying (and failing) to read the DTD 
from the non-working URI.

This might be a bug in xslt:transform, so the workaround of using a regular 
expression replace on the DOCTYPE system URI is probably the practical solution.

Many thanks,
Vincent


_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com




From: Liam R. E. Quin 
Sent: Thursday, July 9, 2020 12:55 PM
To: Lizzi, Vincent ; BaseX 

Subject: Re: [basex-talk] xslt:transform function not working with XML Catalog

On Thu, 2020-07-09 at 04:32 +0000, Lizzi, Vincent wrote:
> Hi Liam,
>
> Thanks for the reply and suggestions. Based on your suggestion I
> tried pragmas and strace, and had another go at
> CatalogManager.properties, but they've not had any effect.

use, strace -f java >& hugelogfile.txt
and after, grep -i catalogmanager.properties hugelogfile.txt
and you should see where it's looking. If it doesn't look for that
file, check to see if it opened the jar file containing the resolver.

If you're running BaseX from Oxygen, Oxygen needs to have it in its
classpath too, I think.

Also, of course, see if the catalog file is actually being opened!

I actually wrote some of the code in BaseX that makes XML catalogs work
with transform(), or provided a rough draft that Christian improved :),
and debugging it was... interesting.

I'd also try an absolute path for the catalog file - if you are using
the BaseX server, relative paths will be relative to the directory
(folder) where the server itself is running. (and of course the server
needs the resolver in its classpath).

Messages from the catalog manager seem to go (oddly) to standard
output interleaved with any XML output.

The command-line i used for testing this (well, one of the tests) was,

R=$HOME/lib/xmlcatalog/xml-commons-resolver-1.2/resolver.jar
MAIN=$HOME/packages/basex/basex

java -Dxml.catalog.files=saxlog.xml \
  -D'http://saxon.sf.net/feature/uriResolverClass=org.apache.xml.resolver.tools.CatalogResolver' \
  -cp $R/resolver.jar:/home/lee/packages/basex/basex/BaseX.jar:$MAIN/lib/custom/*:$MAIN/lib/*: \
  org.basex.BaseX try.xq

(Saxon was in $MAIN)

>
--
Liam Quin, 
https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations: 
http://www.fromoldbooks.org


Re: [basex-talk] xslt:transform function not working with XML Catalog

2020-07-09 Thread Lizzi, Vincent
Hi Gerrit,

Thank you for the hint! Removing quotes from the pragma did not work in this 
case.

  (# db:catfile schemas/catalog.xml #)

The catalog file is also configured at the beginning of the query:

declare option db:catfile 'schemas/catalog.xml';

This detail about not needing quotes in a pragma is worth remembering, though!

Vincent



_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com




From: BaseX-Talk  On Behalf Of 
Imsieke, Gerrit, le-tex
Sent: Thursday, July 9, 2020 1:18 AM
To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] xslt:transform function not working with XML Catalog

Hi Vincent,

I feel your pain. Maybe this comment helps:
https://github.com/BaseXdb/basex/issues/1793#issuecomment-579134499
(omit the quotes in the pragma).

I documented it here, too:
https://docs.basex.org/wiki/Catalog_Resolver#Additional_Notes
"The catalog location in the pragma can be given relative to the current
working directory (the directory that is returned by file:current-dir())
or as an absolute operating system path. The catalog location in the
pragma is not an XQuery expression; no concatenation or other operations
may occur in the pragma, and the location string must not be surrounded
by quotes."

Gerrit


Re: [basex-talk] xslt:transform function not working with XML Catalog

2020-07-08 Thread Lizzi, Vincent
Hi Liam,

Thanks for the reply and suggestions. Based on your suggestion I tried pragmas 
and strace, and had another go at CatalogManager.properties, but they've not 
had any effect. (I'm using Windows 10 but was able to run strace in Ubuntu via 
WSL). This query:

try {
  (# db:catfile 'schemas/catalog.xml' #)
  (# db:intparse false #)
  (# db:dtd true #)
  (# db:chop false #)
  { xslt:transform('file.xml', 'stylesheet.xsl')//inlinegraphic }
} catch * { $err:description }

Produces the same error again due to the DTD not being available at the system 
literal URI.

I did try setting verbosity 99 in a CatalogManager.properties file on the 
classpath, but this did not produce any additional messages. I also tried 
setting the same properties directly when launching BaseX, but this did not work 
either. Specifically, I set the following system properties when launching 
BaseX, and then used proc:property() in a query to confirm that these system 
properties were in fact set.

'xml.catalog.verbosity': '99'
'xml.catalog.ignoreMissing': 'no'
'xml.catalog.catalog-class-name': 'org.apache.xml.resolver.Resolver'
'xml.catalog.files': 'schemas/catalog.xml'

xml-resolver-1.2.jar and Saxon are definitely on the classpath.

Thanks,
Vincent

_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com



From: Liam R. E. Quin 
Sent: Wednesday, July 8, 2020 10:28 PM
To: Lizzi, Vincent ; BaseX 

Subject: Re: [basex-talk] xslt:transform function not working with XML Catalog

On Wed, 2020-07-08 at 22:46 +0000, Lizzi, Vincent wrote:
> I've encountered a problem using xslt:transform to transform some
> old XML that contains a DTD DOCTYPE system literal pointing to a non-
> working URI and also uses ENTITYREF attributes to refer to image
> files. I have the XML Catalog configured correctly using CATFILE.


If this is on Linux, using strace can help check which catalog file is
being used; you can also turn on debugging in a
CatalogManager.properties file containing the line
verbosity = 999
(the file needs to be in your Java classpath).

There's also a BaseX pragma, (# db:catfile path/to/catalog.xml #) {
transform(...)
}

You need to turn off the BaseX internal parser.

Make sure that the resolver library and of course saxon are in your
class path.

You may need to add,
declare option db:catfile "path/relative/to/cwd/catalog.xml";
to your query.

Liam

--
Liam Quin, 
https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations: 
http://www.fromoldbooks.org


[basex-talk] xslt:transform function not working with XML Catalog

2020-07-08 Thread Lizzi, Vincent
I've encountered a problem using xslt:transform to transform some old XML 
that contains a DTD DOCTYPE system literal pointing to a non-working URI and 
also uses ENTITYREF attributes to refer to image files. I have the XML Catalog 
configured correctly using CATFILE. The XSLT is using unparsed-entity-uri() to 
convert the ENTITYREF into a file name for an href attribute. I've tested the 
XSLT and the XML Catalog in oXygen and in XSpec, so I know they work. I've 
tried a few alternatives to get this to work in BaseX and have not arrived at a 
working solution.

The BaseX setup uses BaseX 9.3.3, XML Resolver 1.2, and Saxon HE 9.9.1-7.

Passing the XML directly to xslt:transform does not use the XML Catalog for 
parsing the XML. I've tried making the first parameter of xslt:transform a path 
to an XML file, an xs:anyURI pointing to the XML file, or a string containing 
the XML. Each time the XML fails to parse due to the non-working URI in the 
system literal. When I change the system literal to a URI that works, the 
transformation works and the unparsed-entity-uri() function produces the 
expected file name for the href attribute.

I also tried parsing the XML using doc() or fetch:xml() first and then using 
the parsed XML as the first parameter of xslt:transform(). The doc() or 
fetch:xml() function does use the XML Catalog to parse the XML. However, the 
unparsed-entity-uri() function then produces an empty string because the DTD 
information is no longer available.

Hoping to be able to spot the problem I looked through the BaseX code. 
XsltTransform is using CatalogWrapper to set a URIResolver, but is not setting 
an EntityResolver. SAXWrapper is using CatalogWrapper to set an EntityResolver 
before parsing XML. It looks like this should work, but it's not.
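
To illustrate what an EntityResolver contributes (as opposed to a 
URIResolver, which only sees xsl:import, xsl:include and document() 
requests), here is a minimal plain-JAXP sketch, independent of BaseX and 
Saxon. The DOCTYPE points to an unreachable URI, the resolver serves the DTD 
from a local string, and the parser then reports the unparsed entity 
declaration. The class name, the sample document and the DTD are made up.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; document and DTD are invented for illustration.
class EntityResolverSketch {

    static final String XML =
        "<!DOCTYPE doc SYSTEM \"http://example.invalid/missing.dtd\">"
        + "<doc><inlinegraphic fileref=\"fig1\"/></doc>";

    static final String DTD =
        "<!ELEMENT doc (inlinegraphic*)>"
        + "<!ELEMENT inlinegraphic EMPTY>"
        + "<!ATTLIST inlinegraphic fileref ENTITY #REQUIRED>"
        + "<!NOTATION png SYSTEM \"image/png\">"
        + "<!ENTITY fig1 SYSTEM \"images/fig1.png\" NDATA png>";

    /** Parses XML and returns the unparsed entities (name to system ID) it declares. */
    static Map<String, String> unparsedEntities() {
        Map<String, String> entities = new LinkedHashMap<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Serve the DTD locally instead of fetching the non-working URI.
            reader.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader(DTD)));
            // The DTD handler receives the declarations that
            // unparsed-entity-uri() would rely on in a stylesheet.
            reader.setDTDHandler(new DefaultHandler() {
                @Override
                public void unparsedEntityDecl(String name, String publicId,
                                               String systemId, String notationName) {
                    entities.put(name, systemId);
                }
            });
            reader.parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return entities;
    }

    public static void main(String[] args) {
        System.out.println(unparsedEntities());
    }
}
```

Without the setEntityResolver call, the parser would try to fetch the DTD 
from the unreachable URI and the parse would fail, which matches the 
behaviour I am seeing with xslt:transform.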

The next alternative is to pre-process the XML using a regular expression to 
replace the non-working URI in the DOCTYPE system literal with a working URI 
prior to xslt:transform(). This works, but it seems like this is just working 
around a problem.

Has anyone else encountered this problem and found a better solution?

Thanks,
Vincent

__
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
530 Walnut St., Suite 850, Philadelphia, PA 19106
E-Mail: vincent.li...@taylorandfrancis.com
Phone: 215-606-4221
Web: www.tandfonline.com

Taylor & Francis is a trading name of Informa UK Limited,
registered in England under no. 1072954

"Everything should be made as simple as possible, but not simpler."





Re: [basex-talk] Problem with db:create and addraw

2019-09-17 Thread Lizzi, Vincent
Hi Christian,

Testing again with the latest 9.3-SNAPSHOT, when using db:create() the raw 
files in the .zip archive are being placed in the correct location. Thank you 
for the speedy solution!

Vincent




From: Christian Grün 
Sent: Tuesday, September 17, 2019 12:59 PM
To: Lizzi, Vincent 
Cc: BaseX 
Subject: Re: [basex-talk] Problem with db:create and addraw

Hi Vincent,

The fix for adding binary resources via XQuery is available. I'd
appreciate if you could give it a quick shot [1].

Thanks in advance,
Christian

[1] http://files.basex.org/releases/latest/



On Mon, Sep 16, 2019 at 11:58 AM Christian Grün
<christian.gr...@gmail.com> wrote:
>
> Hi Vincent,
>
> > It is good to hear that this problem is already known and will be fixed 
> > soon. As a possible workaround for now, is there a way to run a command 
> > script from XQuery that would enable using CREATE DB from XQuery?
>
> Currently no. We didn't enable script execution from XQuery because it
> might introduce too many new and complex side effects. For now, you'll
> have to create your database in a first step and add resources via
> db:store and db:add in a subsequent step.
>
> BaseX 9.3 is expected to be released by end of September or mid-October.
>
> Cheers,
> Christian
>
>
> > From: Christian Grün <christian.gr...@gmail.com>
> > Sent: Sunday, September 15, 2019 8:56 PM
> > To: Lizzi, Vincent <vincent.li...@taylorandfrancis.com>
> > Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
> > Subject: Re: [basex-talk] Problem with db:create and addraw
> >
> > Hi Vincent,
> >
> > Thanks for sharing your observation. Just recently, we have stumbled upon 
> > this bug by ourselves; it should be fixed with 9.3 [1].
> >
> > Cheers,
> > Christian
> >
> > [1] https://github.com/BaseXdb/basex/issues/1717
> >
> > Lizzi, Vincent <vincent.li...@taylorandfrancis.com> wrote on
> > Sun, 15 Sep 2019, 21:47:
> >
> > Greetings,
> >
> > I am seeing different behavior between XQuery db:create() and the command 
> > CREATE DB when trying to create a database from a zip file, which is 
> > causing a problem. I want to import a zip file that contains XML and binary 
> > files (images, etc.) into a new BaseX database using db:create() in a web 
> > app.
> >
> > The command "CREATE DB" works as expected. For example, after running this 
> > script all contents of the zip file, including XML and binary files, are 
> > stored in a new database named "test".
> >
> > SET ADDRAW true
> > SET ADDARCHIVES true
> > SET ARCHIVENAME false
> > CREATE DB test C:\path\to\file.zip
> >
> > The XQuery function db:create() is not working correctly. The following 
> > query should be equivalent to the above command script; however, binary 
> > files get placed in a new folder that is created in the folder where BaseX 
> > was launched, and the binary files cannot be accessed through BaseX. The 
> > binary files should be placed in a "raw" folder within the folder for the 
> > database within BaseX's data directory, the same as what CREATE DB produces.
> >
> > db:create('test', 'C:\path\to\file.zip', (), map {
> >   'addraw': 'true',
> >   'addarchives': 'true',
> >   'archivename': 'false'
> > })
> >
> > I've tried this in BaseX versions 9.0.2 and 9.2.4; both produce the same 
> > results.
> >
> > Is there a way to resolve this problem?
> >
> > Thanks,
> > Vincent
> >
> > Vincent M. Lizzi - Digital Production Manager
> > Taylor & Francis Group
> > 530 Walnut St., Suite 850, Philadelphia, PA 19106
> > E-Mail: vincent.li...@taylorandfrancis.com
> > Phone: 215-606-4221
> > Web: http://www.tandfonline.com/
> >
> > Taylor & Francis is a trading name of Informa UK Limited,
> > registered in England under no. 1072954
> >
> > "Everything should be made as simple as possible, but not simpler."


Re: [basex-talk] Problem with db:create and addraw

2019-09-15 Thread Lizzi, Vincent
Hi Christian,

It is good to hear that this problem is already known and will be fixed soon. 
As a possible workaround for now, is there a way to run a command script from 
XQuery that would enable using CREATE DB from XQuery?

Thanks!
Vincent


From: Christian Grün 
Sent: Sunday, September 15, 2019 8:56 PM
To: Lizzi, Vincent 
Cc: BaseX 
Subject: Re: [basex-talk] Problem with db:create and addraw

Hi Vincent,

Thanks for sharing your observation. Just recently, we have stumbled upon this 
bug by ourselves; it should be fixed with 9.3 [1].

Cheers,
Christian


[1] https://github.com/BaseXdb/basex/issues/1717







[basex-talk] Problem with db:create and addraw

2019-09-15 Thread Lizzi, Vincent
Greetings,

I am seeing different behavior between XQuery db:create() and the command 
CREATE DB when trying to create a database from a zip file, which is causing a 
problem. I want to import a zip file that contains XML and binary files 
(images, etc.) into a new BaseX database using db:create() in a web app.

The command "CREATE DB" works as expected. For example, after running this 
script all contents of the zip file, including XML and binary files, are stored 
in a new database named "test".

SET ADDRAW true
SET ADDARCHIVES true
SET ARCHIVENAME false
CREATE DB test C:\path\to\file.zip

The XQuery function db:create() is not working correctly. The following query 
should be equivalent to the above command script; however, binary files get 
placed in a new folder that is created in the folder where BaseX was launched, 
and the binary files cannot be accessed through BaseX. The binary files should 
be placed in a "raw" folder within the database's folder in BaseX's data 
directory, the same as what CREATE DB produces.

db:create('test', 'C:\path\to\file.zip', (), map{
  'addraw': 'true',
  'addarchives': 'true',
  'archivename': 'false'
})


I've tried this in BaseX versions 9.0.2 and 9.2.4, both produce the same 
results.

Is there a way to resolve this problem?

Thanks,
Vincent



Vincent M. Lizzi - Digital Production Manager
Taylor & Francis Group
530 Walnut St., Suite 850, Philadelphia, PA 19106
E-Mail: 
vincent.li...@taylorandfrancis.com
Phone: 215-606-4221
Web: http://www.tandfonline.com/

Taylor & Francis is a trading name of Informa UK Limited,
registered in England under no. 1072954

"Everything should be made as simple as possible, but not simpler."



Information Classification: General


Re: [basex-talk] Can't get `cdata-section-elements` to work at all for XSLT output

2018-08-01 Thread Lizzi, Vincent
Hugh,


As Gerrit mentioned, the issue you have encountered is due to the 
xslt:transform function returning, essentially, a parsed XML document, so the 
serialization controls that are declared in the XSLT are not being used. If you 
want the serialized output of the XSLT you can use the xslt:transform-text 
function. For example:


let $in := hello

let $xslt := doc('rss.xslt')

let $out := xslt:transform-text($in, $xslt)

return file:write-text('out.xml', $out)


I hope this helps.


Vincent


http://docs.basex.org/wiki/XSLT_Module#xslt:transform-text

http://docs.basex.org/wiki/File_Module#file:write-text
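
The element markup in the example above was lost in the archive, so here is a self-contained sketch of the same idea. The <item> input element and the inline stylesheet are placeholders of mine, not Hugh's actual files; the stylesheet is kept at XSLT 1.0 so that it should also run with the processor bundled with Java. Because xslt:transform-text returns the serialized string, the xsl:output settings of the stylesheet should be honored.

```xquery
(: Sketch only: <item> and the inline stylesheet are hypothetical placeholders. :)
let $in := <item>hello</item>
let $style :=
  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <!-- serialization is controlled here, inside the stylesheet -->
    <xsl:output method="xml" cdata-section-elements="content:encoded"/>
    <xsl:template match="/">
      <content:encoded><xsl:value-of select="item"/></content:encoded>
    </xsl:template>
  </xsl:stylesheet>
(: returns a string, i.e. the output as serialized by the XSLT processor :)
return xslt:transform-text($in, $style)
```

The returned string can then be passed to file:write-text as in the example above.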






From: BaseX-Talk  on behalf of Hugh 
Guiney 
Sent: Wednesday, August 1, 2018 5:47:55 PM
To: Imsieke, Gerrit, le-tex
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Can't get `cdata-section-elements` to work at all for 
XSLT output

Thanks for testing Gerrit, that's good to know. Sounds like a
regression then. Shall I go ahead and file this on Github or does it
need further confirmation?

Christian, your suggestion seems to work around the issue; the CDATA
sections do come in that way. Except, all the elements get sent back
entity-escaped for some reason. I have to manually reverse it back
into XML using `result.replace( />/gi, '>' ).replace( / wrote:
> Hi Hugh,
>
> The second version where you specify the serialization options in XQuery
> works for me (BaseX GUI 8.6.5 with Saxon PE 9.6.0.7):
>
> 
>  xmlns="http://backend.userland.com/rss2"
> xmlns:content="http://purl.org/rss/1.0/modules/content/"
>  version="2.0">
> 
> 
> 
> 
>
> The first version cannot generate CDATA sections since the XSLT processor is
> not serializing anything; it’s the XQuery processor that serializes the
> result.
>
> The error that you are seeing, XPST0081, would be generated if there were no
> namespace declaration for the prefix 'content', maybe caused by an
> indistinguishable look-alike non-ASCII character in 'content'. Doesn’t seem
> to be the case. Maybe this is a bug that is specific to BaseX 9?
>
> Gerrit
>
>
>
> On 01.08.2018 21:17, Hugh Guiney wrote:
>>
>> Hello,
>>
>> First off, loving BaseX so far! Using it as the backend for an API I’m
>> building. However, I’m running into an issue. I’m trying to transform
>> my database XML into an RSS 2.0 feed. It’s mostly working fine, but I
>> can’t output CDATA content at all, which I need to do for
>> `content:encoded` elements.
>>
>> Specs:
>>
>> - BaseX 9.0.2 (started via basexserver script)
>> - Saxon-HE 9.8.0.12J from Saxonica
>> - java version "1.8.0_112"
>> - basex 0.9.0 (NodeJS)
>> - macOS Sierra 10.12.6
>>
>> ### First Attempt
>>
>> I set `cdata-section-elements` in the XSLT.
>>
>> rss.xq:
>> ```
>> xquery version "3.0";
>> declare option output:omit-xml-declaration "no";
>>
>> let $in :=
>> 
>> hello
>> 
>> let $style := doc( 'rss.xslt' )
>> return xslt:transform( $in, $style )
>> ```
>>
>> rss.xslt:
>> ```
>> 
>> > version="3.0"
>> xmlns="http://backend.userland.com/rss2"
>> xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>> xmlns:content="http://purl.org/rss/1.0/modules/content/"
>>>
>>>
>> > omit-xml-declaration="no"
>> cdata-section-elements="content:encoded"
>> />
>> 
>> 
>> hi
>> howdy
>> > />
>> 
>> 
>> 
>> ```
>>
>> Result:
>> ```
>> 
>> > xmlns="http://backend.userland.com/rss2"
>> xmlns:content="http://purl.org/rss/1.0/modules/content/"
>> version="2.0">
>> hi
>> howdy
>> hello
>> 
>> ```
>>
>> No CDATA sections.
>>
>> ### Second Attempt
>>
>> I set `cdata-section-elements` in the XQuery.
>>
>> rss.xq:
>> ```
>> xquery version "3.0";
>> declare namespace content = 
>> "http://purl.org/rss/1.0/modules/content/";
>> declare option output:omit-xml-declaration "no";
>> declare option output:cdata-section-elements "content:encoded";
>>
>> let $in :=
>> 
>> hello
>> 
>> let $style := doc( 'rss.xslt' )
>> return xslt:transform( $in, $style )
>> ```
>>
>> rss.xslt:
>> [Unchanged]
>>
>> Result:
>> [XPST0081] No namespace declared for 'content:encoded'.
>>
>> Clearly I declared the namespace two lines up.
>>
>> This looks like a bug to me, but any help appreciated if I’ve missed a
>> step here.
>>
>> Thanks,
>> Hugh
>>
>


Re: [basex-talk] Wasn't there a function, that would walk a website?

2018-08-01 Thread Lizzi, Vincent
Hi Andreas, Christian,

Attached is a module that I wrote a while ago to limit the rate of 
requests sent to a web server. This module has been useful in accessing APIs 
where the SLA does not allow more than a certain number of requests per minute, 
and might be useful for this web crawling scenario, although Christian's crawler 
module already has a sleep built into it.
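
For readers without the attachment, the general idea of rate limiting can be sketched as follows. This is not the attached throttle.xqm; it is a minimal illustration that assumes a fixed pause between requests, and the URLs and the delay are placeholder values.

```xquery
(: Sketch only: $urls and $delay-ms are hypothetical placeholder values. :)
let $urls := ('http://example.org/a', 'http://example.org/b')
let $delay-ms := 1000
for $url at $pos in $urls
return (
  (: pause before every request except the first one :)
  if ($pos > 1) then prof:sleep($delay-ms) else (),
  fetch:text($url)
)
```

A fixed pause is the simplest policy; a limit expressed as requests per minute would need to track timestamps instead.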

Cheers,
Vincent


-Original Message-
From: BaseX-Talk  On Behalf Of 
Christian Grün
Sent: Wednesday, August 01, 2018 3:57 AM
To: Andreas Mixich 
Cc: BaseX 
Subject: Re: [basex-talk] Wasn't there a function, that would walk a website?

Hi Andreas,

Just for fun, I wrote a little crawler in XQuery (see the attached files).

Please note that it’s just a stub; and it should surely be used decently, 
otherwise the remote server might block further access.

Cheers,
Christian


On Wed, Aug 1, 2018 at 8:08 AM Andreas Mixich  wrote:
>
> On 31.07.2018 at 08:51, Christian Grün wrote:
> > I guess you were dreaming ;) But it should definitely be possible to 
> > realize this in XQuery without too many lines of code..
>
> Ok, then that's what I am going to do. Thanks for clarification.
>
> --
> Goody Bye, Minden jót, Mit freundlichen Grüßen, Andreas Mixich


throttle.xqm
Description: throttle.xqm


Re: [basex-talk] about special characters

2018-05-18 Thread Lizzi, Vincent
Hi Bit,

The problem may have to do with the character encoding. Try providing the 
“encoding” option, e.g.

csv:parse($file, map{ "encoding": "windows-1252" })
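
If the CSV text is read from disk inside the query, another option is to state the encoding at the point where the bytes are decoded. The following is only a sketch: the path is a placeholder, and 'windows-1252' is an assumption about how the file was actually saved.

```xquery
(: Sketch only: the path and the encoding are hypothetical placeholder values. :)
let $path := 'C:\path\to\file.csv'
(: decode the bytes with an explicit encoding instead of the default :)
let $text := file:read-text($path, 'windows-1252')
return csv:parse($text, map { 'header': true() })
```

If the special characters still come out wrong, the file is probably not in the encoding named here.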

I’d also like to call your attention to this module which provides a way to 
read Excel files directly from XQuery without the intermediary step of saving 
to CSV.

https://github.com/eliudmeza/OOXML-Library-XQuery-BaseXdb

I hope this is of some help.

Vincent


From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of BitRider001
Sent: Thursday, May 17, 2018 10:56 PM
To: Eliot Kimber 
Cc: BaseX 
Subject: Re: [basex-talk] about special characters

Update:

I found a way to export the Excel sheet into XML then created a new database 
and pointed to the XML file. This returned the results with the correct special 
characters.

My guess is it may have something to do with the CSV Parser.

Thanks,
BIt



‐‐‐ Original Message ‐‐‐
On May 18, 2018 10:11 AM, BitRider001 
> wrote:

Hi Eliot,

I loaded it by first creating a new database and pointing to the CSV file as 
input. The default encoding as far as I can tell is UTF-8 as shown in the 
attached screenshot. The CSV file was exported from Excel in UTF-8 encoding.

Perplexed,
Bit



‐‐‐ Original Message ‐‐‐
On May 18, 2018 9:53 AM, Eliot Kimber 
> wrote:

That mangled string is the result of reading UTF-8 byte sequences as 
single-byte characters, e.g. ASCII or some Windows code page.

How are you loading it into BaseX? It seems unlikely that BaseX-provided code 
would make this kind of basic mistake in reading text but it’s possible it 
applied the incorrect encoding for some reason.

Cheers,

Eliot

--
Eliot Kimber
http://contrext.com



From: 
>
 on behalf of BitRider001 >
Reply-To: BitRider001 >
Date: Thursday, May 17, 2018 at 8:34 PM
To: Bridger Dyson-Smith >
Cc: 
"basex-talk@mailman.uni-konstanz.de" 
>
Subject: Re: [basex-talk] about special characters

Bridger,

Indeed the file was exported from Excel in UTF-8 encoding. I've tried opening 
the CSV file using Notepad/Wordpad and in Linux with vi in a terminal and in 
both situations it displays the correct special character.

It's only when I load it into a BaseX db and query it that it shows up, as 
you said, as "mangled". Saving the results into a text file also produces the 
"mangled" string.

Strange.

Bit



‐‐‐ Original Message ‐‐‐
On May 18, 2018 9:21 AM, Bridger Dyson-Smith 
> wrote:

Bit -
that's odd; it looks like the characters are being decomposed (or whatever the 
term is) and mangled but I'm not sure, unfortunately. Was the CSV an export 
from Excel? If so, I suppose this could be a Windows character set problem 
(cp-1252 or iso-8859-1 or something?).

Bridger

On Thu, May 17, 2018 at 9:11 PM BitRider001 
> wrote:
Hi Bridger,

Yes that is right. I'm on the latest (9.0.1). Attaching a screenshot here for 
anyone to take a look.


Bit



‐‐‐ Original Message ‐‐‐
On May 18, 2018 8:41 AM, Bridger Dyson-Smith 
> wrote:

Hi Bit - are you using the latest version? There was a problem with 9.0 and 
some Unicode characters. Christian and co. have a fix in v9.0.1.

HTH,
Bridger

On Thu, May 17, 2018, 7:54 PM BitRider001 
> wrote:
Hi,

I just joined the mailing list due to a problem I'm having displaying and 
storing special characters.

I started with a CSV and created a database from it and the CSV is in UTF-8. 
However, when I query the special characters become garbled. I'm using the GUI 
in Windows 10.

It starts with this in the CSV:
Cañelas

Then ends up with this when I export the query result into a text file:
Ca�las


Help please.

Bit












Re: [basex-talk] SSL support for BaseX REST API

2018-03-21 Thread Lizzi, Vincent
Forwarding to the list...


From: Stefania Axo <st...@us.ibm.com>
Sent: Wednesday, March 21, 2018 10:05 PM
To: Lizzi, Vincent
Subject: Re: [basex-talk] SSL support for BaseX REST API

ok
i was able to set the property of not starting BaseXServer on  both Jetty and 
WebSphere liberty in WEB-INF\web.xml

 
<context-param>
  <param-name>org.basex.httplocal</param-name>
  <param-value>true</param-value>
</context-param>
  

on the documentation
http://docs.basex.org/wiki/Options#HTTPLOCAL
it says
"If the option is set to false, the database server will be disabled. "
is it the other way around?
"If the option is set to true, the database server will be disabled. "


thanks
Stefania













Re: [basex-talk] Directory from which xml files are loaded

2017-10-08 Thread Lizzi, Vincent
Hi Helmut,

What I usually do is create an empty file named .basexhome in the root folder 
of my project. Then, launch BaseX by going to the command line, change 
directory to the project folder, and run basexgui (or basexhttp, etc.). If the 
.basex config file has not already been created, the config gets created 
automatically with file paths relative to my project’s root folder.

Vincent



From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of helmut
Sent: Sunday, October 08, 2017 4:32 PM
To: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Directory from which xml files are loaded

Dear Christian,

thanks for the quick answer.

Maybe just having an option like WORKPATH in the .basex config file. If
it is set, this path is used instead of the home directory.

Thanks again,
Helmut

On 2017-10-08 16:49, Christian Grün wrote:
> Dear Helmut,
>
> File paths in BaseX commands will always be resolved against the
> working directory from which BaseX was started. There’s currently no
> way to change this behavior, but if it turns out that more people have
> stumbled upon this, we could think about altering the default behavior
> with BaseX 9.0. Feedback is welcome.
>
> All the best,
> Christian
>
> On 07.10.2017 09:23, "helmut" wrote:
>
>> Hi everyone.
>>
>> I am using basex in embedded mode. I set the basex base directory
>> with System.setProperty("org.basex.path", path). In this path I have
>> a .basexhome file which defines REPRO, DBPATH ... and so on.
>>
>> For loading an xml file to the database I use e.g. new Add(path,
>> file).execute(context). "file" is given as an absolute path.
>> However, when I query the database I get the error [FODC0002]
>> Resource 'path/xyz.xml' does not exist, with a path which points now
>> to my system property user.home or user.dir.
>>
>> How can I change this behaviour? What I am missing? I don't want to
>> change user.home or user.dir, because other things might depend on
>> the properties.
>>
>> I read through the configuration guide and many other pages, but
>> didn't find something.
>>
>> http://docs.basex.org/wiki/Configuration
>>  [1]
>>
>> Thanks very much,
>> Helmut
>
>
>
> Links:
> --
> [1] 
> http://docs.basex.org/wiki/Configuration


Re: [basex-talk] Full-text lemmatizing and xml:lang

2017-06-30 Thread Lizzi, Vincent
Kristian,

Out of curiosity, how are you linking the normalized texts in the -ft- database 
to the source documents? Is keeping a reference from the indexed text back to 
the source document a requirement in your application?

Thanks,
Vincent

From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Kristian 
Kankainen
Sent: Friday, June 30, 2017 5:27 PM
To: Xavier-Laurent SALVADOR ; Christian Grün 

Cc: BaseX 
Subject: Re: [basex-talk] Full-text lemmatizing and xml:lang


Hello

Sorry for being slow in reception, being a full-time father of two kids is my 
only excuse.

Thank you for enlightening answers. At first creating a separate database felt 
wrong and stupid, but after a while it felt just right and helping to organize 
different language elements via aggregation instead of composition.

Here is what I came up with:

(:~
This function takes a list of database names and optionally a list of language 
codes.
It creates separate full-text indexed databases for lemmatized searching of 
each language contained in the original database.
If the list of language codes is empty, all existing values of xml:lang found 
in the database is used.
The full-text databases are named 'dbname-ft-langcode'
Another function normalizes the texts, removes duplicate entries and inserts 
xml:id attributes
:)
declare updating function keeleleek:create-ft-indices-for-each-lang(
  $db-names as xs:string*,
  $lang-codes as xs:string*
) {
  for $db-name in $db-names
  let $langs := if( not( empty( $lang-codes )))
                then( $lang-codes )
                else( distinct-values(db:open($db-name)//@xml:lang) )
  for $lang in $langs
  let $lang-group := db:open($db-name)//*[@xml:lang = $lang]
  let $ft-db-name := concat($db-name, '-ft-', $lang)

  (: create full-text db for each language :)
  return
    db:create(
      $ft-db-name,
      {$lang-group},
      $ft-db-name,
      map { 'ftindex': true(), 'language': $lang }
    )
};

Cheers
Kristian K

On 28.06.2017 09:45, Xavier-Laurent SALVADOR wrote:
Hi,

After reading Christian answer ( :-) ); I thought it could be interesting to 
sort your docs according to @xml:lang and create a new DB next to your corpus :

--
distinct-values(
 file:children('input-dir')[matches(.,'xml$')] ! 
(doc(.)//@xml:lang)
 )
!
db:create(
 'db-' || .,
 
  {
   for $file in 
file:children('/Users/xavier/Desktop/')[matches(.,'xml$')]
   return
   {doc($file)//*[@xml:lang=.]//text()}
}
  ,
  "myfile",
  map { 'ftindex': true(), 'language': . }
  )
--



2017-06-27 20:49 GMT+02:00 Christian Grün 
>:
Hi Kristian,

It is currently not possible to work with different languages in a
single database. This is mostly because all normalized tokens will end
up in the same internal index, and it would be a lot of effort to
diversify this software behavior.

As Xavier pointed out (thanks!), the best way indeed is to create
different databases, one per language. The following example has been
inspired by Xavier’s proposal; it groups all files by their language
and adopts the language in the name of the database:

  for $path-group in file:children('input-dir')
  where ends-with($path-group, '.xml')
  group by $lang := (doc($path-group)//@xml:lang)[1]
  return db:create(
    'db-' || $lang,
    $path-group,
    (),
    map { 'ftindex': true(), 'language': $lang }
  )

Hope this helps,
Christian




On Tue, Jun 27, 2017 at 5:19 PM, Xavier-Laurent SALVADOR
> 
wrote:
> Hi Kristian,
>
> This is useful for creating automatically databases according to xml:lang
> attribute
>
> let $dir := '/Users/me/myDesktop/'
> for $file in file:list($dir)[matches(.,'xml')]
>  return
>   let $flag := (data(doc($dir||$file)/div/@xml:lang))
>return
> db:create("DB", $dir||$file, (), map { 'ftindex':
> true(),'language':$flag })
>
> Or you can "ft:tokenize" your string mapping {'language':$flag} into your
> query
>
> Hope I understood the problem :) Else return 'sorry'
>
> 2017-06-27 16:57 GMT+02:00 Kristian Kankainen 
> >:
>>
>> Hello
>>
>> I have documents with text in several languages. When creating a database
>> in BaseX I can choose *one* language for stemming for the full-text search
>> index. Is there a way BaseX could lemmatize according to the elements
>> xml:lang attribute?
>>
>> Best regards
>> Kristian K
>>
>
>
>

Re: [basex-talk] Java Modules - Constructor

2017-05-11 Thread Lizzi, Vincent
I like the idea of having a “custom” lib folder, also for the .zip 
distribution. That would make updating to new versions of BaseX a little easier.






From: Christian Grün
Sent: Thursday, May 11, 2017 4:09 PM
To: Andy Bunce
Cc: 
basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Java Modules - Constructor



> In fact I have modified the bat to add a "lib2" folder where I can easier 
> manage extension jars.
>
> set CP=%MAIN%/BaseX.jar;%MAIN%/lib/*;%MAIN%/lib2/*

Good point. For the Windows executable, I thought about adding a
'custom' sub-directory in the lib folder. The jars in the lib
directory could be deleted with each update, but the custom folder
would be preserved.

> But is the original problem a bug?
> That is after installing the jar via REPO install t:new("aaa") fails although 
> it works when added to lib/

Be sure I’ll check this out later this week.



Re: [basex-talk] count(//elem) not optimized, even though `elem` is in the index

2017-02-22 Thread Lizzi, Vincent
Hi Christian,

Does this mean, for a given set of XML files that have several namespace 
declarations attached to the root element including a default namespace, if 
namespaces are removed when these XML files are loaded into BaseX, for example 
by using the “Strip namespaces” option in the GUI, BaseX may be able to use 
additional query optimizations?

If the answer is yes, I may have to try re-importing several databases with 
“Strip namespaces” turned on to see if there is any difference in query 
performance.

Thank you,
Vincent


From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Christian Grün
Sent: Wednesday, February 22, 2017 9:54 AM
To: Gioele Barabucci 
Cc: BaseX 
Subject: Re: [basex-talk] count(//elem) not optimized, even though `elem` is in 
the index

Hi Gioele,

> I wonder if the presence of the namespace somehow confuses the optimizer.

Exactly, that’s the reason. For some historical reason (but not such a
wise one, as most quoted “historical reasons” are), we decided to
index the node names without considering the namespace URI. As a
result, the index:element-names function will yield…

xml

…for the following document:





For the same reason, various optimizations that are based on the
database statistics will only get into effect if a document contains
no, or at most one global, namespace declaration. In various cases,
optimizations could still be made possible (e.g. if we know that the
element/attribute names with and without namespace URIs are distinct),
but that hasn’t been implemented so far.
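
One way to see this for yourself is to put the name statistics next to a wildcard count. This is only a sketch: it reuses the database name "monier" and the element name "re" from Gioele's test case below and assumes that database is available.

```xquery
(: Sketch only: assumes the database 'monier' from the quoted test case. :)
let $db := 'monier'
return (
  (: count taken from the name statistics, which are keyed by name only :)
  index:element-names($db)[. = 're']/@count/string(),
  (: count obtained by evaluating the path, ignoring the namespace URI :)
  count(db:open($db)//*:re)
)
```

If the statistics are up to date, the two numbers would be expected to agree, since neither takes the namespace URI into account.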

Cheers,
Christian


> I was stressing the BaseX 8.6 planner/optimizer when I noticed that
> expressions like `count(//elem)` are not optimized at all, even though they
> are correctly indexed, as demonstrated by `index:element-names()`.
>
> The current database is a 300 MB TEI document. All the elements are in the
> `http://www.tei-c.org/ns/1.0` namespace.
>
> The following test case will report the correct number, but it will take a
> couple of seconds to run, instead of a few milliseconds.
>
> ```
> declare namespace 
> tei="http://www.tei-c.org/ns/1.0";
>
> let $n := index:element-names("monier")[. = 're']/@count
>
> let $c := count(//tei:re)
>
> return {$n}{$c}
> ```
>
> I wonder if the presence of the namespace somehow confuses the optimizer.
> The same problem can be observed running the same test case with
>
> ```
> declare default element namespace 
> "http://www.tei-c.org/ns/1.0";
> [...]
> let $c := count(//re)
> ```
>
> Regards,
>
> --
> Gioele Barabucci 
> >
>


Re: [basex-talk] retrieve a sequence of all values within an attribute index

2016-10-10 Thread Lizzi, Vincent
Hi Christian,

Thank you!!! I will have a look.

Vincent


From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: Sunday, October 09, 2016 4:49 PM
To: Alex Muir <alex.g.m...@gmail.com>
Cc: Lizzi, Vincent <vincent.li...@taylorandfrancis.com>; BaseX 
<basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] retrieve a sequence of all values within an attribute 
index

Hi Vincent, hi Alex,

I am glad to report that with BaseX 8.6 the distinct values of numeric
elements and attributes will also be stored in the index. You are
invited to check out the latest stable snapshot [1].

Cheers,
Christian

[1] http://files.basex.org/releases/latest/



On Mon, Jul 11, 2016 at 10:56 PM, Alex Muir 
<alex.g.m...@gmail.com<mailto:alex.g.m...@gmail.com>> wrote:
>
> On Mon, Jul 11, 2016 at 7:17 PM, Lizzi, Vincent
> <vincent.li...@taylorandfrancis.com<mailto:vincent.li...@taylorandfrancis.com>>
>  wrote:
>>
>> I have a similar situation in which I want to get all distinct values of a
>> specific attribute. I’ve tried using 2 different approaches: group and
>> distinct-values. On small or medium size databases group tends to be faster.
>> When trying to get distinct values of a specific attribute from large
>> databases however both approaches are timing out for me. I’m looking for a
>> way to optimize this query:
>>
>>
>>
>> distinct-values(for $db in db:list() return
>> distinct-values(db:open($db)//@sec-type))
>>
>>
>
>
>
> With the current logic available, it looks possible, given an attribute index
> on sec-type, to add a prefix to the attribute value prior to
> insertion into the database, like sec-type="type:13F" with the prefix type:,
> and then use index:texts("dbname","type:") to get a distinct list of types,
> albeit with a prefix that would require adjusting logic when using that data
> or querying.
>
>
>
> Regards
> Alex
> tech.jahtoe.com
> bafila.jahtoe.com


Re: [basex-talk] using rest api to move a file

2016-09-22 Thread Lizzi, Vincent
Adil,

If you can use RESTXQ to create a function, you could use something like:

db:add($todb, db:open($fromdb, $path), $path)

inside a function that is made available via RESTXQ.
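
As a rough sketch of what such a function could look like: the module namespace, the URL layout and the function name below are placeholders of mine, not an existing API. The function only copies the document; removing it from the source database (for example with db:delete) would be a separate step.

```xquery
(: Sketch only: namespace, path template and function name are hypothetical. :)
module namespace page = 'http://example.org/copy';

declare
  %updating
  %rest:POST
  %rest:path('/copy/{$fromdb}/{$todb}')
  %rest:query-param('path', '{$path}')
function page:copy(
  $fromdb as xs:string,
  $todb   as xs:string,
  $path   as xs:string
) {
  (: add the source document to the target database under the same path :)
  db:add($todb, db:open($fromdb, $path), $path)
};
```

With this module placed in the RESTXQ directory, a POST request to /copy/db1/db2?path=test1.xml would be expected to copy test1.xml from db1 to db2.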

Vincent

http://docs.basex.org/wiki/RESTXQ


From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Christian Grün
Sent: Thursday, September 22, 2016 11:19 AM
To: Adil Hasan 
Cc: BaseX 
Subject: Re: [basex-talk] using rest api to move a file

Hi Adil,

Thanks for mentioning the null pointer exception, this is clearly
something that should be fixed (if you are interested, you can follow
the issue that I’ve just created [1]).

rename test1.xml test-moved/test1.xml

With this command, you can move documents inside a single database.
There is currently no fixed solution to move resources across
databases, so you’ll probably have to use two requests for this (GET
and PUT).

Hope this helps
Christian

[1] 
https://github.com/BaseXdb/basex/issues/1358



> 
>
> The command returns a 200, but nothing has moved. I am clearly doing something
> wrong. If I change the URL in the cURL request to "http://localhost:8984/rest",
> I get a Java null pointer exception.
>
> Do you know the syntax I should use for the REST API to move a file?
>
> many thanks,
> adil


Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-08 Thread Lizzi, Vincent
As it so happens, I just received a 20.5 Mb Excel file which I am loading into 
BaseX as CSV. To prepare the file, I opened it in Excel and saved as CSV 
format. The CSV file is 70 Mb. Here is what I observe loading this CSV file to 
BaseX a few different ways.


1.   BaseX GUI – Using “Create Database” with input format CSV, the CSV was 
loaded and converted to XML in a few seconds.


2.   Command script – The CSV was loaded and converted to XML in about 10 
seconds.

SET PARSER csv
SET CSVPARSER encoding=windows-1252, header=true, separator=comma
SET CREATEFILTER *.csv
create database csvtest1 "path\to\file.csv"


3.   XQuery – The CSV was loaded and converted to XML in about 20 seconds.

db:create('csvtest2', csv:parse(file:read-text('path\to\file.csv'), 
map{'encoding': 'windows-1252', 'header': true()}), 'file.csv' )


4.   XQuery (parsing only) – CSV file was parsed in about 4 seconds.

csv:parse(file:read-text('path\to\file.csv'), map{'encoding': 'windows-1252', 
'header': true()})


5.   XQuery (parsing only) using map – The CSV file was parsed in about 6 
seconds.

csv:parse(file:read-text('path\to\file.csv'), map{'encoding': 'windows-1252', 
'header': true(), 'format': 'map'})

These alternate methods are, from what I can see, pretty equivalent except for 
the last one which produces a map instead of XML. At what point, i.e. how much 
data in CSV format, would using map start to offer benefits beyond mere 
convenience?


I came across an example in the documentation that gave me an error message. 
The Command Line example at http://docs.basex.org/wiki/Parsers#CSV_Parser has


SET CSVPARSER encoding=utf-8, 
lines=true, header=false, separator=space

When trying this in BaseX 8.2.3 I get an error message:

Error: PARSER: csv Unknown option 'lines'.

The “lines” option is not listed in the CSV Module parser documentation at 
http://docs.basex.org/wiki/CSV_Module#Options.

I didn’t want to correct the example in the documentation without checking 
whether it is actually incorrect. Does this example need to be updated?

Vincent



From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Hans-Juergen 
Rennau
Sent: Thursday, September 08, 2016 10:02 AM
To: Marc van Grootel 
Cc: BaseX 
Subject: Re: [basex-talk] csv:parse in the age of XQuery 3.1

As for me, I definitely want the CSV as XML. But the performance problems 
certainly have nothing to do with XML versus CSV (I often deal with > 300 MB 
XML, which is parsed very fast!) - it is the parsing operation itself which, 
if I'm not mistaken, is handled by XQuery code and which must be shifted into 
the Java implementation.

Kind regards,
Hans-Jürgen

Marc van Grootel wrote at 15:55 on Thursday, 8 September 2016:

I'm currently dealing with CSV a lot as well. I tend to use the
format=map approach but not nearly as large as 22 MB CSV yet. I'm
wondering if, or how much more efficient it is to deal with this type
of data as arrays and map data structures versus XML. For most
processing I can leave serializing to XML to the very end. And if too
large I would probably also chunk it before storing the end result.

Intuitively I would think that dealing with CSV as maps/arrays should
be much faster and less memory intensive.


--Marc




Re: [basex-talk] 8.5.3 and xquery:invoke ...

2016-09-02 Thread Lizzi, Vincent
I've run into what might be a variation of the same problem.

Inside an expath packaged module, a .xqm file uses relative file path 
references to .xsl files contained within the module. In my case, the relative 
file path references are being used with xslt:transform(). The path resolution 
might be exhibiting the same problem that you encountered with xquery:invoke(). 

With BaseX 8.2.3, using a relative file path inside a module worked. With BaseX 
8.5.3 the same module gives an error "Resource ... does not exist."

Adding file:base-dir() to resolve the path helps, and seems to work in both 
versions 8.2.3 and 8.5.3.

Marco, if you change:

xquery:invoke("src2/q2.xqm")

to this:

xquery:invoke(file:base-dir() || "src2/q2.xqm")

Does this solve the "Resource ... does not exist." problem for you?
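
For the xslt:transform() case I described, the same workaround can be sketched 
as follows. The module namespace, function name and stylesheet path are made 
up for illustration; only xslt:transform() and file:base-dir() are taken from 
the discussion above.

```xquery
(: Hypothetical library module living inside the package. :)
module namespace m = "urn:example:render";

(: Transforms $doc with a stylesheet stored next to this module.
   file:base-dir() yields the directory of this module, so the
   stylesheet path no longer depends on who imported the module. :)
declare function m:render($doc as node()) as node() {
  xslt:transform($doc, file:base-dir() || "xsl/render.xsl")
};
```

The design choice is simply to make every path absolute at the point where it 
is used, instead of relying on how the static base URI is propagated.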

Vincent



-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Marco Lettere
Sent: Wednesday, August 31, 2016 12:53 PM
To: BaseX 
Subject: [basex-talk] 8.5.3 and xquery:invoke ...

Hello again,
at the end I managed to reproduce the issue we are currently facing with 
xquery:invoke in 8.5.3 (and also in latest snapshot).

Please consider the three queries [1,2,3]. Then store [2] in a subdirectory 
src of the directory containing [1], and store [3] in a directory src2 of src.
You should get the error message [4].
If instead you change the parameter of xquery:invoke to "src/src2/q2.xqm" it 
works.
Seems like the reference for static-base-uri in [2] remains the one of [1] even 
if it is reported by static-base-uri() to be correctly 
"/home/user/src/q1.xqm"(I've double-checked this).
Sorry it's not clear to me whether this is meant to be the correct behaviour ...
Thanks for any help,
Marco.

[1] q.xqm
import module namespace q1="urn:q1" at "src/q1.xqm";

q1:f()

[2] q1.xqm
module namespace q1 = "urn:q1";

declare function q1:f(){
   xquery:invoke("src2/q2.xqm")
};

[3] q2.xqm
1 = 1

[4]
Stopped at /home/user/src/q1.xqm, 4/16:
[FODC0002] Resource 'src2/q2.xqm' does not exist.



Re: [basex-talk] Autocommit option using sql:connect to Oracle 11.2.0.3 with ojdbc6.jar

2016-08-15 Thread Lizzi, Vincent
This is only tangentially related, but I have used the SQL module to connect to 
Oracle databases so I want to mention this in case it helps someone. The Oracle 
driver ojdbc7.jar seems to have trouble with complex SQL SELECT statements, and 
gives misleading error messages. When encountering problems with SELECT 
statements, the solution I’ve found is to add a wrapping select statement. For 
example:

SELECT * FROM ( …original query… )

The same occurs using ojdbc7 from Java, so this problem is related to the 
Oracle driver and is not specific to BaseX.
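
From the BaseX side, the wrapping can be sketched like this. The connection 
URL, credentials and the inner statement are placeholders I made up; depending 
on the setup, the driver may also need to be registered first with sql:init().

```xquery
(: Hypothetical connection details; replace with real values. :)
declare variable $url      external := 'jdbc:oracle:thin:@//dbhost:1521/service';
declare variable $user     external := 'user';
declare variable $password external := 'password';

(: Placeholder for the original, more complex SELECT statement. :)
declare variable $query := 'SELECT id, title FROM articles WHERE id > 100';

let $conn := sql:connect($url, $user, $password)
return (
  (: Send the statement wrapped in an outer SELECT. :)
  sql:execute($conn, 'SELECT * FROM ( ' || $query || ' )'),
  sql:close($conn)
)
```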

Vincent



From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Christian Grün
Sent: Sunday, August 14, 2016 4:53 PM
To: chrisis ; BaseX 
Subject: Re: [basex-talk] Autocommit option using sql:connect to Oracle 
11.2.0.3 with ojdbc6.jar

Hi Chris (cc to the list),

> I've also attached the basexdebug.txt but it just shows the requests so
> probably not useful.

Sounds ok so far.

I have slightly revised our SQL code (although it shouldn’t make a big
difference), and I have uploaded a new snapshot [1]. Could you give it
a try?

In general, our SQL wrapper is pretty simple [2,3]. Maybe you could
check if you get autocommit=false running with a plain Java JDBC code
snippet?

Hope this helps,
Christian

[1] 
http://files.basex.org/releases/latest/
[2] 
https://github.com/BaseXdb/basex/blob/9a700bf1f663b18eb578acb1a8aa8bbb4d4dd707/basex-core/src/main/java/org/basex/query/func/sql/SqlConnect.java#L56-L59
[3] 
https://github.com/BaseXdb/basex/blob/9a700bf1f663b18eb578acb1a8aa8bbb4d4dd707/basex-core/src/main/java/org/basex/query/func/sql/SqlFn.java
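
For readers following along, the setup under discussion looks roughly like the 
sketch below. The connection details and the table name are invented; whether 
the option takes effect with a given driver is exactly what this thread is 
trying to establish, so this only shows the intended usage.

```xquery
(: Hypothetical connection details; replace with real values. :)
declare variable $url      external := 'jdbc:oracle:thin:@//dbhost:1521/service';
declare variable $user     external := 'user';
declare variable $password external := 'password';

(: Ask for a connection with autocommit switched off. :)
let $conn := sql:connect($url, $user, $password,
  map { 'autocommit': false() })
return (
  (: Several inserts into a made-up table ... :)
  for $i in 1 to 3
  return sql:execute($conn,
    'INSERT INTO demo_table (id) VALUES (' || $i || ')'),
  (: ... followed by a single explicit commit. :)
  sql:commit($conn),
  sql:close($conn)
)
```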



>
> Thanks,
>
> chris
>
> On 12 August 2016 at 14:46, Christian Grün 
> >
> wrote:
>>
>> Could you try the -d command-line flag (or the command SET DEBUG true
>> in the GUI) and check if you get some helpful feedback on stderr?
>>
>>
>> On Fri, Aug 12, 2016 at 3:44 PM, chrisis 
>> > wrote:
>> > The connection is opened successfully. Not getting any feedback.
>> > I'm doing a lot of inserts in a loop and it's committing after every
>> > insert.
>> > My issue is with the overhead of all the extra commits
>> >
>> >
>> >
>> >
>> >
>> >
>> > Thanks,
>> >
>> > chris
>> >
>> >
>> > On 12 August 2016 at 14:19, Christian Grün 
>> > >
>> > wrote:
>> >>
>> >> > I'm trying to set the autocommit option off and I can't seem to get
>> >> > it
>> >> > to
>> >> > work.
>> >> > I'm passing the option to sql:connect as a map
>> >> > map { 'autocommit': false() }
>> >> > I also tried map { 'autoCommit': false() } as the Oracle JDBC docs
>> >> > seem
>> >> > to
>> >> > suggest.
>> >>
>> >> What’s happening? Do you get any feedback?
>> >>
>> >> Thanks
>> >> Christian
>> >
>> >
>
>


Re: [basex-talk] retrieve a sequence of all values within an attribute index

2016-07-11 Thread Lizzi, Vincent
I have a similar situation in which I want to get all distinct values of a 
specific attribute. I’ve tried using 2 different approaches: group and 
distinct-values. On small or medium size databases group tends to be faster. 
When trying to get distinct values of a specific attribute from large databases 
however both approaches are timing out for me. I’m looking for a way to 
optimize this query:

distinct-values(for $db in db:list() return 
distinct-values(db:open($db)//@sec-type))
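
For comparison, the "group" variant I mentioned can be written as below. This 
is only a sketch of the alternative formulation; I am not claiming it avoids 
the timeouts on large databases.

```xquery
(: Collect the attribute from every database, then group by its value. :)
for $db in db:list()
for $att in db:open($db)//@sec-type
group by $value := string($att)
order by $value
(: One result per distinct value. :)
return $value
```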

Thanks,
Vincent


From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Christian Grün
Sent: Monday, July 11, 2016 2:46 PM
To: Alex Muir 
Cc: BaseX 
Subject: Re: [basex-talk] retrieve a sequence of all values within an attribute 
index

> I wrote the following query which returns 59 distinct periods from a 8gb
> db.. It's quite slow but it works

Ah, well… I guess that all the values are numeric? In that case, only
the min and max value will be stored in the statistics (and that won’t
help you in fact). Bad luck. You can call index:facets("13F") to get
more insight… Maybe we can fix that in future and store distinct
numbers as well.


> let $periods := distinct-values(db:open("13F")//data/@periodOfReport)
> let $transform :=
> 
> {
> for $period in $periods
> return {$period}
> }
> 
>
> return file:write('/var/www/appusec3.jahtoe.com/xml/periods.xml',
> $transform)
>
>
> Regards
> Alex
> tech.jahtoe.com
> bafila.jahtoe.com
>
> On Mon, Jul 11, 2016 at 6:21 PM, Christian Grün 
> >
> wrote:
>>
>> > any way to retrieve the index for a specific attribute name?
>>
>> Nope, sorry. The index itself has no information on the location of
>> the text and attribute values. You’ll have to use distinct-values:
>>
>> distinct-values(//periodOfReport)
>>
>> If the number of distinct values is smaller than MAXCATS [1], the path
>> index will be utilized to speed up your query [2]. You can set MAXCATS
>> to a much larger value, but this might slow down the time required for
>> opening a database.
>>
>> Hope this helps
>> Christian
>>
>> [1] 
>> http://docs.basex.org/wiki/Options#MAXCATS
>> [2] 
>> http://docs.basex.org/wiki/Indexes#Path_Index
>
>


Re: [basex-talk] unexpected whitespace-handling behavior in BaseX 8.3.1

2016-06-24 Thread Lizzi, Vincent
To add to these examples, I think another way to add a document with whitespace 
preserved is:

declare option db:chop 'false';
db:add("DB", doc("http://example.com/doc.xml"), "doc.xml")

Is this equivalent to db:add("DB", "http://example.com/doc.xml", "doc.xml", map 
{ "chop": false() }) ?

Vincent


From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of C. M. 
Sperberg-McQueen
Sent: Friday, June 24, 2016 1:12 PM
To: Christian Grün 
Cc: C. M. Sperberg-McQueen ; BaseX 

Subject: Re: [basex-talk] unexpected whitespace-handling behavior in BaseX 8.3.1


On Jun 21, 2016, at 12:43 AM, Christian Grün wrote:

>> Yes. What puzzles me is that calling db:replace with a fourth
>> argument of map { "chop" : false() } appears not to have any
>> effect in the database in question.
>
> This is probably because the input you are specifying are nodes, for
> which whitespaces have already been chopped in a previous step. I tried
> to explain this better now in our documentation [1].

[Headslap] D'oh! Thank you very much.

I have created two simple examples to try to teach myself what
is going on here. Is the characterization correct?

db:add("DB", "http://example.com/doc.xml",
"doc.xml", map { "chop": false() }) -- parses the file at the URI
http://example.com/doc.xml with the CHOP option turned
off (so whitespace is preserved), and adds it to database DB.


db:add("DB", doc("http://example.com/doc.xml"),
"doc.xml", map { "chop": false() }) -- parses the file at the URI
http://example.com/doc.xml with the default parser
settings and adds it to database DB. Note that the CHOP setting
in the fourth argument has no effect, since the document
is parsed by the doc() function, not the db:add function.

If you think it helpful, feel free to add these to the documentation
for db:add() or db:replace; it might help even readers like me
to understand what is going on.

best,

Michael

--

* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net





Re: [basex-talk] question about http server

2016-06-07 Thread Lizzi, Vincent
One hint on your PHP code SemantiqFn: variables file and relation are being 
concatenated into the query string by PHP. Instead you can bind variables, 
which is safer in case the data contains a quote which would break your 
concatenation. Declare the variables at the top of each query:

declare variable $file external;
declare variable $relation external;

then in your PHP pass these variables to the query using the bind method.
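
As a sketch of what such a query could look like on the XQuery side (I have 
not seen the body of your actual queries, so the path expression is made up 
for illustration):

```xquery
(: Values are supplied by the client through its bind method,
   not by string concatenation. :)
declare variable $file external;
declare variable $relation external;

(: Hypothetical body: elements of the given document whose
   local name equals the bound relation. :)
doc($file)//*[local-name() = $relation]
```

Because the values arrive as variables, a quote character inside them is just 
data and cannot change the structure of the query.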

Vincent


From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Mohamed kharrat
Sent: Tuesday, June 07, 2016 12:48 PM
To: Christian Grün 
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] question about http server

Can you please look at the attached PHP file containing the connection to 
BaseX?
It does not work for me; it shows me (cannot communicate with server).



From: Christian Grün
To: Mohamed kharrat
Cc: "basex-talk@mailman.uni-konstanz.de"
Sent: Tuesday, 7 June 2016, 17:39
Subject: Re: [basex-talk] question about http server

It both works for me.

A little hint: Better try to avoid posting private URLs to the list.

On Tue, Jun 7, 2016 at 6:27 PM, Mohamed kharrat 
> wrote:
> Can you please try these 2 addresses and tell me what you get?
>
> http://197.0.28.226:8984
>
> and
> http://197.0.28.226:80
>
> Thanks
>
> 
> From: Christian Grün
> To: Mohamed kharrat
> Cc: "basex-talk@mailman.uni-konstanz.de"
> Sent: Tuesday, 7 June 2016, 11:07
> Subject: Re: [basex-talk] question about http server
>
> Hi Mohamed,
>
> This sounds like a general port issue, it’s probably not strictly
> related to BaseX. Did you check your firewall?
>
> Regards,
> Christian
>
>
> On Mon, Jun 6, 2016 at 5:23 PM, Mohamed kharrat 
> > wrote:
>> Hi,
>> I have set up my PC as a server using TinyWeb
>> and everything is OK.
>> I opened port 8984 on my PC.
>> If I execute my PHP file containing the XQuery query locally, it works, but
>> if I execute it remotely over the IP and port, it shows me this message:
>> can't communicate with server.
>>
>> Does anyone have an idea about that?
>>
>> Thanks
>>
>
>



[basex-talk] Optimal size for large databases

2016-06-07 Thread Lizzi, Vincent
I am importing a large (~3 million) set of XML documents to BaseX and am 
running into some problems as the databases grow beyond a few gigabytes. I am 
using BaseX 8.4.4, and have increased the memory available to BaseX to 12288m 
(12 GB) using -Xmx (the machine has 20 GB total). The documents are being 
stored in several databases that range in size from 300 MB up to 25 GB.

My question is: what is an optimal maximum database size (in gigabytes)? I am 
hoping for some general advice. I appreciate that the answer can vary depending 
on various factors.

The problems that I've encountered with the larger databases are:

1. Running OPTIMIZE or OPTIMIZE ALL on the larger databases results in an out 
of memory error. I have switched to running CREATE INDEX to separately create 
text, attribute, token, and fulltext indexes, and found that creating these 
indexes separately produces fewer out of memory errors. I would like to be able 
to use OPTIMIZE ALL because over time some documents will be removed from the 
databases and the documentation indicates that optimize all will remove stale 
information from the indexes.

2. The command scripts that run CREATE INDEX or OPTIMIZE (ALL) seem to tie up 
the machine for a long time, maybe due to heavy disk access.

3. As the database grows in size the rate at which documents are added slows 
down. I have been measuring the number of documents imported, and have observed 
rates over 100 documents per minute, with typical rates of around 30 to 60 
documents per minute. As the database grows beyond a few gigabytes the speed 
slows to around 20 documents per minute. This is not much of a problem because 
when I see the rate slow down I can start a new database. Unfortunately I have 
been recording the number of documents, not the database size.


In case this information is useful, my project is structured as follows:


* There is 1 central index database which records for each document the 
BaseX database name and path where a document is stored, and some metadata that 
we use to identify or locate documents.

* There are multiple content databases to store the actual documents. 
These content databases are organized by DTD and time period.

* Each insert is done using the BaseX REST API. A BaseX HTTP server 
instance is running to receive documents, and a basex instance is running from 
the command line to locate and provide documents. Each insert is done by a POST 
that includes data and an updating query which adds (using db:replace) to the 
central index database and a document database in one "transaction". This helps 
to make the import resilient to duplicate documents and to any problem that 
can prevent a single document from being added, and allows the process to 
continue if interrupted.
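
To make the last point more concrete, the updating query sent with each POST 
is roughly of the following shape. This is a simplified sketch: the variable 
names, the name of the central index database and the layout of the index 
entry are invented for this example.

```xquery
(: Supplied with each request (names are hypothetical). :)
declare variable $db   as xs:string external;  (: target content database :)
declare variable $path as xs:string external;  (: path inside that database :)
declare variable $xml  as xs:string external;  (: serialized document :)

let $doc := parse-xml($xml)
return (
  (: Store, or overwrite, the document itself. :)
  db:replace($db, $path, $doc),
  (: Record where it was stored in the central index database. :)
  db:replace('central-index', $db || '/' || $path,
    <entry db="{ $db }" path="{ $path }"/>)
)
```

Both updates are part of the same query, so they are applied together, and 
sending the same document twice overwrites the earlier copy instead of adding 
a duplicate.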

I will probably need to re-organize the content databases so that each database 
is only a few gigabytes in size. Does anyone have advice on what would be a 
good maximum size for each database?

Thanks,
Vincent


Vincent M. Lizzi - Electronic Production Manager
Taylor & Francis Group
530 Walnut St., Suite 850, Philadelphia, PA 19106
E-Mail: 
vincent.li...@taylorandfrancis.com
Phone: 215-606-4221
Fax: 215-207-0047
Web: http://www.tandfonline.com/

Taylor & Francis is a trading name of Informa UK Limited,
registered in England under no. 1072954

"Everything should be made as simple as possible, but not simpler."



Re: [basex-talk] XLSX to XML

2016-04-06 Thread Lizzi, Vincent
Florian,

The xsl-excel-engine project might help you get started working with xlsx files:

https://github.com/foglcz/xsl-excel-engine

xsl-excel-engine is for writing XML files so it does not do what you are 
asking, but the wiki documentation provides an introduction to the Excel file 
format. It includes scripts to parse stringValues.xml which you might be able 
to use.

Vincent


From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Dirk Kirsten
Sent: Wednesday, April 06, 2016 6:44 AM
To: Florian Eckey ; 
basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] XLSX to XML

Hello Florian,

please remember to always include the list when replying as it allows
others to benefit from our exchange as well and also allows others to
help you.

I just want to point out, again, that it doesn't make sense to say
"convert the excel file to xml", because it already is XML. Yes, there
might be multiple XML files and they reference each other, but this is
just a very normal thing for XML, and for the parts of every reasonably
complex system to reference each other.

So I guess what you really want is an XQuery module which allows you to
easily manipulate xlsx files without the need to worry about internal
ooxml format stuff like shared strings. This of course makes a lot of
sense! However, as the format is ridiculously complicated it is a hard
task to write a general-purpose library for all kinds of manipulations.
As Christian indicated, we wrote some small helper functions for ourselves
which do the stuff we need in our projects, but they are very far from
being complete on the xlsx standard.
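
To give an idea of the shared-strings indirection, here is a minimal sketch 
using the Archive Module. It is not one of our helper functions. It assumes 
the usual OOXML entry names xl/sharedStrings.xml and xl/worksheets/sheet1.xml 
(a given workbook may use different sheet names), the workbook path and the 
output element names are made up, and inline strings, dates and formulas are 
ignored.

```xquery
declare namespace s =
  "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

(: Hypothetical path to the workbook. :)
declare variable $path external := 'workbook.xlsx';

let $archive := file:read-binary($path)
(: The shared strings, in document order (referenced by 0-based index). :)
let $strings := parse-xml(
  archive:extract-text($archive, 'xl/sharedStrings.xml'))//s:si
let $sheet := parse-xml(
  archive:extract-text($archive, 'xl/worksheets/sheet1.xml'))
for $row in $sheet//s:row
return <row>{
  for $c in $row/s:c
  return <cell ref="{ $c/@r }">{
    if ($c/@t = 's')
    (: Cell value is an index into the shared strings. :)
    then string-join($strings[xs:integer($c/s:v) + 1]//s:t, '')
    (: Otherwise the value is stored directly in the cell. :)
    else string($c/s:v)
  }</cell>
}</row>
```

Note that the whitespace chopping option may affect leading and trailing 
spaces of cell values when the entries are parsed.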

Cheers
Dirk

On 04/06/2016 12:35 PM, Florian Eckey wrote:
> Hello Dirk,
>
> thanks. That was my idea as well. But the xlsx format is really complicated, 
> because the content of the cells (sheet01.xml) references another 
> document (stringValues.xml) using an index. I guess someone has implemented a 
> simple XQuery to convert the Excel file to XML?
> But if nobody has done that before, I will have to spend time on the 
> implementation on my own. :)
>
> Thanks, best,
> Florian
>
>
>
>
> On 06.04.16, 12:26, "Dirk Kirsten" wrote:
>
>> Hello Florian,
>>
>> xlsx is just a zip file containing many xml files. You can simply unzip
>> the xlsx (e.g. by using the BaseX zip module), modify the xml files
>> inside using standard XQuery and rezip it again as xlsx.
>>
>> Cheers
>> Dirk
>>
>> On 04/06/2016 12:18 PM, Florian Eckey wrote:
>>> Hi guys,
>>>
>>> are there any ideas how to convert excel's xlsx (not xls) files to xml
>>> with XQuery or to use a Java library which can be imported? It looks
>>> like BaseX has no internal functions as for instance MarkLogic.
>>>
>>> Any ideas or example implementations to do that in XQuery or Java?
>>>
>>> Best,
>>> Florian
>> --
>> Dirk Kirsten, BaseX GmbH, http://basexgmbh.de
>> |-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
>> |-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
>> | Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
>> `-- Phone: 0049 7531 91 68 276, Fax: 0049 7531 20 05 22
>>

--
Dirk Kirsten, BaseX GmbH, http://basexgmbh.de
|-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
|-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
| Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
`-- Phone: 0049 7531 91 68 276, Fax: 0049 7531 20 05 22


Re: [basex-talk] BaseX, Zorba, e-mail module

2016-02-04 Thread Lizzi, Vincent
Hi Christian,

I’m having trouble installing the BaseX email module. The install appears to 
succeed but then the module is not available.

BaseX 8.4 beta 2fabc36 [Standalone]
Try 'help' to get more information.
> repo install http://files.basex.org/modules/org/basex/modules/email/EMail.jar
Package 'http://files.basex.org/modules/org/basex/modules/email/EMail.jar' 
installed in 1556.4 ms.
> repo list
Name  Version  Type  Path
-

0 package(s).

After attempting to install the module, when trying the email.xq example there 
is an error message “[XQST0059] Module 'http://basex.org/modules/email/EMail' 
not found.”

Does the email module work with current versions of BaseX?

Also, is it possible to send HTML formatted email with images using the email 
module?

Thank you,
Vincent




From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Christian Grün
Sent: Wednesday, September 23, 2015 3:24 AM
To: Tim Thompson 
Cc: BaseX 
Subject: Re: [basex-talk] BaseX, Zorba, e-mail module

Hi Tim,

We have implemented an XQuery/Java e-mail module quite some time ago,
which we are using in various commercial projects. The main reason why
we didn't make it public yet is that we didn't have time to document
it properly. I have just uploaded this module, and you can install it
via:

repo install 
http://files.basex.org/modules/org/basex/modules/email/EMail.jar

The jar file contains two simple examples for sending mails (in the
xquery sub-directory).

Looking forward to your feedback.
Christian
___

On Wed, Sep 23, 2015 at 7:04 AM, Tim Thompson 
> wrote:
> Hello,
>
> I would like to be able to send e-mail from a Web app I am developing in
> BaseX and was wondering whether anyone else had implemented this
> functionality (or whether a BaseX e-mail module might be in the works?).
>
> I see that Zorba has an e-mail module with support for SMTP and IMAP (not to
> mention many other useful modules), which leads me to wonder whether anyone
> has worked on a BaseX connector for Zorba.
>
> In the meantime, I decided to try using the BaseX process module to call
> Zorba, like so:
>
> proc:system(
> 'zorba', (
> '-q', convert:binary-to-string(
> db:retrieve(
> 'out', 'email.xq'
> )
> )
> )
> )
>
> I stored the XQuery code for Zorba as a raw file in the BaseX database,
> since storing it in the webapp directory caused an error, due to the
> unrecognized Zorba module declarations. This approach seems to work
> adequately, but does anyone have any suggestions for improvement, or a
> different approach?
>
> Thanks in advance,
> Tim
>
>
> --
> Tim A. Thompson
> Metadata Librarian (Spanish/Portuguese Specialty)
> Princeton University Library
>


Re: [basex-talk] BaseX, Zorba, e-mail module

2016-02-04 Thread Lizzi, Vincent
Hi Christian,

I might be able to get it working. The link you mentioned isn’t easily findable 
in the list archive. Where is the source of the email module located?

Thanks,
Vincent


From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: Thursday, February 04, 2016 6:27 PM
To: Lizzi, Vincent <vincent.li...@taylorandfrancis.com>
Cc: Tim Thompson <timat...@gmail.com>; BaseX 
<basex-talk@mailman.uni-konstanz.de>
Subject: RE: [basex-talk] BaseX, Zorba, e-mail module


Hi Vincent,

I'm sorry, we have no time to provide better support for this module. Maybe you 
can take advantage of the source code that I referenced in an earlier reply? It 
should be pretty straightforward. Maybe you could build an updated module and 
provide it back to the community?

Christian
On 04.02.2016 at 10:45 PM, "Lizzi, Vincent" 
<vincent.li...@taylorandfrancis.com> wrote:
Hi Christian,

I’m having trouble installing the BaseX email module. The install appears to 
succeed but then the module is not available.

BaseX 8.4 beta 2fabc36 [Standalone]
Try 'help' to get more information.
> repo install http://files.basex.org/modules/org/basex/modules/email/EMail.jar
Package 'http://files.basex.org/modules/org/basex/modules/email/EMail.jar' 
installed in 1556.4 ms.
> repo list
Name  Version  Type  Path
-

0 package(s).

After attempting to install the module, when trying the email.xq example there 
is an error message “[XQST0059] Module 'http://basex.org/modules/email/EMail' 
not found.”

Does the email module work with current versions of BaseX?

Also, is it possible to send HTML formatted email with images using the email 
module?

Thank you,
Vincent




From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Christian Grün
Sent: Wednesday, September 23, 2015 3:24 AM
To: Tim Thompson <timat...@gmail.com>
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] BaseX, Zorba, e-mail module

Hi Tim,

We have implemented an XQuery/Java e-mail module quite some time ago,
which we are using in various commercial projects. The main reason why
we didn't make it public yet is that we didn't have time to document
it properly. I have just uploaded this module, and you can install it
via:

repo install 
http://files.basex.org/modules/org/basex/modules/email/EMail.jar

The jar file contains two simple examples for sending mails (in the
xquery sub-directory).

Looking forward to your feedback.
Christian
___

On Wed, Sep 23, 2015 at 7:04 AM, Tim Thompson 
<timat...@gmail.com> wrote:
> Hello,
>
> I would like to be able to send e-mail from a Web app I am developing in
> BaseX and was wondering whether anyone else had implemented this
> functionality (or whether a BaseX e-mail module might be in the works?).
>
> I see that Zorba has an e-mail module with support for SMTP and IMAP (not to
> mention many other useful modules), which leads me to wonder whether anyone
> has worked on a BaseX connector for Zorba.
>
> In the meantime, I decided to try using the BaseX process module to call
> Zorba, like so:
>
> proc:system(
> 'zorba', (
> '-q', convert:binary-to-string(
> db:retrieve(
> 'out', 'email.xq'
> )
> )
> )
> )
>
> I stored the XQuery code for Zorba as a raw file in the BaseX database,
> since storing it in the webapp directory caused an error, due to the
> unrecognized Zorba module declarations. This approach seems to work
> adequately, but does anyone have any suggestions for improvement, or a
> different approach?
>
> Thanks in advance,
> Tim
>
>
> --
> Tim A. Thompson
> Metadata Librarian (Spanish/Portuguese Specialty)
> Princeton University Library
>


Re: [basex-talk] Validate module and catalogs

2015-09-14 Thread Lizzi, Vincent
Hi Christian,

Yes, I will give it a try. This will also help in a current project for my 
work. Thanks for the links to the relevant sections of code.

Thanks for adding the link to my Schematron XQuery module!

Vincent


-Original Message-
From: Christian Grün [mailto:christian.gr...@gmail.com] 
Sent: Monday, September 14, 2015 6:41 AM
To: Lizzi, Vincent <vincent.li...@taylorandfrancis.com>
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Validate module and catalogs

Hi Vincent,

As you already found out, the catalog option is currently not applied when 
validating resources. Currently, it's only applied when parsing and importing 
XML documents [1].

As you have recently worked on Schematron validation (thanks again), maybe you 
already know how this is done in Java? If yes, I invite you to send me some 
Java code, and I will include it in our XQuery DTD (and maybe XSD) functions 
[2].

Thanks,
Christian

[1] 
https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/build/xml/SAXWrapper.java#L70-L71
[2] 
https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/query/func/validate/


On Wed, Sep 2, 2015 at 3:18 AM, Lizzi, Vincent 
<vincent.li...@taylorandfrancis.com> wrote:
> It appears that the Validate module is not using catalogs. I’m working 
> with XML that contains a DTD DOCTYPE declaration with a public 
> identifier and a system identifier that contains the filename of the 
> DTD, and I have set up a catalog that provides the DTD location. Here is a 
> rough example:
>
>
>
> declare option db:chop 'false';
>
> declare option db:dtd 'true';
>
> declare option db:catfile 'schemas/catalog.xml';
>
>
>
> let $xmlFile := 'path/to/file.xml'
>
> let $parsed := doc($xmlFile)
>
> let $report := validate:dtd-report($xmlFile)
>
> return ($report, $parsed)
>
>
>
> This fails on the validate:dtd-report line. The error report indicates 
> that BaseX is trying to find the DTD in the same folder as the XML file.
>
>
>
> I know the catalog is working because the doc() call is parsing the 
> XML and populating default attributes, and if I remove the db:catfile 
> option declaration the doc() call fails.
>
>
>
> If I add a path to the DTD in the second parameter in 
> validate:dtd-report($xmlFile, 'path/to/dtd.dtd') the validation works. 
> I’m not sure why this is needed because the XML contains a DOCTYPE and 
> the catalog should be resolving the DTD location.
>
>
>
> Is this intentional? There are use cases to require that validation is 
> done using a specific DTD and not simply any DTD that the XML happens to 
> declare.
> In other cases however it is easier to validate using the DTD that is 
> already identified in the DOCTYPE of the XML file. Is this a bug?
>
>
>
> Thanks,
>
> Vincent
>
>


[basex-talk] Validate module and catalogs

2015-09-01 Thread Lizzi, Vincent
It appears that the Validate module is not using catalogs. I'm working with XML 
that contains a DTD DOCTYPE declaration with a public identifier and a system 
identifier that contains the filename of the DTD, and I have set up a catalog 
that provides the DTD location. Here is a rough example:

declare option db:chop 'false';
declare option db:dtd 'true';
declare option db:catfile 'schemas/catalog.xml';

let $xmlFile := 'path/to/file.xml'
let $parsed := doc($xmlFile)
let $report := validate:dtd-report($xmlFile)
return ($report, $parsed)

This fails on the validate:dtd-report line. The error report indicates that 
BaseX is trying to find the DTD in the same folder as the XML file.

I know the catalog is working because the doc() call is parsing the XML and 
populating default attributes, and if I remove the db:catfile option 
declaration the doc() call fails.

If I add a path to the DTD in the second parameter in 
validate:dtd-report($xmlFile, 'path/to/dtd.dtd') the validation works. I'm not 
sure why this is needed because the XML contains a DOCTYPE and the catalog 
should be resolving the DTD location.
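
For completeness, the workaround with an explicit DTD can be sketched as 
follows. Both paths are the same placeholders as above, and the error name 
local:invalid is made up for this example.

```xquery
let $xmlFile := 'path/to/file.xml'
let $dtdFile := 'path/to/dtd.dtd'
(: Validate against the explicitly named DTD instead of the DOCTYPE. :)
let $report := validate:dtd-report($xmlFile, $dtdFile)
return
  if ($report/status = 'valid')
  (: Only parse and return the document if validation succeeded. :)
  then doc($xmlFile)
  (: Otherwise raise an error that carries the full report. :)
  else error(xs:QName('local:invalid'), 'DTD validation failed', $report)
```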

Is this intentional? There are use cases to require that validation is done 
using a specific DTD and not simply any DTD that the XML happens to declare. In 
other cases however it is easier to validate using the DTD that is already 
identified in the DOCTYPE of the XML file. Is this a bug?

Thanks,
Vincent



[basex-talk] Schematron validation

2015-08-29 Thread Lizzi, Vincent
I have just released a module that provides an easy way to use Schematron in 
BaseX. It is available on GitHub at 
https://github.com/vincentml/schematron-basex. This module is a convenience 
wrapper around the standard ISO Schematron XSLTs.


In addition, a version of this module for eXist-db is available at 
https://github.com/vincentml/schematron-exist


Vincent


Everything should be made as simple as possible, but not simpler.


Re: [basex-talk] Converting to UTF-8 in SQL module

2015-08-17 Thread Lizzi, Vincent
Thanks for sharing your solution!

Vincent

From: Tim Thompson [mailto:timat...@gmail.com]
Sent: Monday, August 17, 2015 5:00 PM
To: Lizzi, Vincent vincent.li...@taylorandfrancis.com
Subject: Re: [basex-talk] Converting to UTF-8 in SQL module

Thanks, Christian, Vincent. Following Christian's suggestion, I used the 
RAWTOHEX() function in my SQL query, then cast it to an xs:hexBinary and 
applied BaseX's convert:binary-to-string(). Seems to work perfectly.
--Tim
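
For anyone doing the same conversion on the client side rather than in XQuery, 
the hex-decoding step can be sketched in plain Java. This is only an 
illustration, not part of the BaseX SQL module: the class and method names are 
made up, and it assumes the bytes returned by RAWTOHEX() are in an encoding 
that Java knows (UTF-8 in the example). MARC-8 itself is not a built-in Java 
charset, so data in that encoding would need a separate conversion step.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: turns a hex string (e.g. from RAWTOHEX) into text. */
final class HexToText {

    /** Converts pairs of hex digits to bytes; rejects malformed input. */
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near offset " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Decodes the bytes using the charset the caller believes the data is in. */
    static String decode(String hex, Charset charset) {
        return new String(hexToBytes(hex), charset);
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT, encoded as UTF-8
        System.out.println(decode("65CC81", StandardCharsets.UTF_8));
    }
}
```

The charset argument is the part that has to match how the database actually 
stores the bytes; the hex round trip only prevents the driver from 
reinterpreting them on the way.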

--
Tim A. Thompson
Metadata Librarian (Spanish/Portuguese Specialty)
Princeton University Library
On Mon, Aug 17, 2015 at 4:48 PM, Lizzi, Vincent 
vincent.li...@taylorandfrancis.com wrote:
Hi Tim,

Oracle should be able to convert its output to Unicode before returning query 
results to the client (BaseX). Are you using Oracle's JDBC driver? It might be 
helpful to look into Oracle's NLS_LANG setting or the 'convert' function.

Vincent



-----Original Message-----
From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Christian Grün
Sent: Monday, August 17, 2015 3:23 PM
To: Tim Thompson timat...@gmail.com
Cc: BaseX basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Converting to UTF-8 in SQL module

Hi Tim,

 I am using the BaseX SQL module to query the Oracle database of a
 library catalog. Its character data is encoded as ISO 2709 (MARC-8)[1]
 and is stored in Oracle as US7ASCII.

 The data contains diacritics with combining characters like this:
 http://www.fileformat.info/info/unicode/char/0301/index.htm

I don't know enough about the encodings of Oracle, but US7ASCII
sounds to me as if only 7 bits are used for each character. Do you know how 
non-ASCII characters, such as combining characters, are stored in that format?

Maybe convert:string-to-hex and convert:binary-to-string [1] could be used to 
convert the result to the correct encoding.

Basically, all we do in BaseX is use standard JDBC functionality [2]. If 
there is an easy way to fix the issue with JDBC, it should be easy to also get 
it working in XQuery.

Christian

[1] http://docs.basex.org/wiki/Conversion_Module
[2] 
https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/query/func/sql/SqlExecute.java



Re: [basex-talk] Destination of result-document from XSLT module

2015-08-04 Thread Lizzi, Vincent
Marc,

Thanks for mentioning that! I am running into problems with whitespace being 
lost on input to the transformation too. For example, an XSLT contains inside a 
template:

<t>within the past <xsl:value-of select="$days"/> days</t>

The output has lost the whitespace around the xsl:value-of element:

within the past15days

Setting CHOP to false before the transformation helps, for example:

(# db:chop false #) { let $xslt := doc('stylesheet.xsl') return 
xslt:transform-text($doc, $xslt) }

Provides the expected output:

within the past 15 days

I will have to watch for other instances of lost whitespace in documents.


In regard to my original question about setting Base Output URI, the solution 
I've chosen for now is to pass a URI path to a folder as a parameter into the 
stylesheet, and update each xsl:result-document href attribute to include the 
parameter value.

<xsl:param name="base-output-uri" select="''"/>

<xsl:result-document href="{$base-output-uri}doc1.xml">
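
To show how such a parameter flows into the href, here is a small Java sketch 
using the standard JAXP API. It is a stand-in under stated assumptions: the 
JDK's built-in processor only supports XSLT 1.0 and has no 
xsl:result-document, so the stylesheet merely prints the string that the href 
attribute would receive. The class and method names are hypothetical. In BaseX 
the same value would be supplied as a stylesheet parameter in the call to 
xslt:transform or xslt:transform-text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: pass a base output URI into a stylesheet as a parameter. */
final class BaseOutputUriParam {

    // XSLT 1.0 stand-in: outputs the value an xsl:result-document href would get.
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='base-output-uri' select=\"''\"/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select=\"concat($base-output-uri, 'doc1.xml')\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the stylesheet with the given parameter and returns its text output. */
    static String resolvedHref(String baseOutputUri) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setParameter("base-output-uri", baseOutputUri);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<test/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolvedHref("file:/tmp/out/"));
    }
}
```

The parameter value needs a trailing slash, because the stylesheet simply 
concatenates it with the file name.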


Thanks,
Vincent



-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Marc
Sent: Monday, August 03, 2015 5:49 PM
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Destination of result-document from XSLT module

Thanks, Vincent,
but it is the input of the XSLT that I want to control.
I'm using BaseX with CHOP set to false because I work with text documentation, 
where spaces at the beginning or end of an element are significant. When I try 
to use XSLT (with BaseX 7), it serializes the input of my XSLT with indent set 
to true, so I lose the spaces.
Marc

[basex-talk] Destination of result-document from XSLT module

2015-08-03 Thread Lizzi, Vincent
I'm trying to use the XSLT Module in BaseX 8.2.3 with Saxon 9.6 to run an XSLT 
that produces several output documents using xsl:result-document. I'm having 
trouble setting the location of the output documents. I want to have 
xsl:result-document create the output documents in a temporary folder because 
the documents need to be zipped together.

According to Saxon's documentation, a relative path in the href attribute 
of xsl:result-document will be resolved using either the path of the 
Destination, or the current directory. The XSLT Module does not appear to have 
a way to provide a path for a destination document.  What I'm seeing is that the 
result documents are created in BaseX's home directory. The XSLT works as 
expected when run using Saxon from the command line, where it's possible to set 
a destination path.

Is there a way to specify a Base Output URI to the XSLT Module? Or, would it be 
possible to specify a file URI output location to a method like xslt:transform?

One possible workaround is to provide an absolute path as a parameter to the 
XSLT, and use that parameter in the xsl:result-document href location.

Here is a self-contained example:


declare function local:example($in, $xsl, $zipPath) {
  let $tempDir := file:create-temp-dir('test', 'example')
  let $x := xslt:transform-text($in, $xsl)
  return
    let $zip := archive:create-from($tempDir)
    return (
      file:write-binary($zipPath, $zip)
      (: , file:delete($tempDir, true()) :)
    )
};


let $xsl := <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">
  <xsl:template match="/">
    <xsl:result-document href="doc1.xml">
      <test>this is a test 1 <xsl:apply-templates/></test>
    </xsl:result-document>
    <xsl:result-document href="doc2.xml">
      <test>this is a test 2 <xsl:apply-templates/></test>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>

let $doc := <test>this is input</test>

let $zipPath := 'report.zip'

return local:example($doc, $xsl, $zipPath)

The expected output is a zip file report.zip that contains doc1.xml and 
doc2.xml.

However, what I'm seeing is that report.zip is created as an empty zip file and 
doc1.xml and doc2.xml are placed in BaseX's home directory.

Thanks,
Vincent


Re: [basex-talk] Destination of result-document from XSLT module

2015-08-03 Thread Lizzi, Vincent
Thanks, Max and Andy. The XQuery 3.1 transform function looks promising. 

Marc, For controlling the serialization of XSLT output, you can specify 
serialization for the XSLT within the XSLT itself using xsl:output, run the 
XSLT using the xslt:transform-text, and then write the output to a file. 

file:write-text('file.txt', xslt:transform-text($doc, $xslt))

I'm not sure if this will do what you need, but so far it has been working for 
me.

Vincent



-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Marc
Sent: Monday, August 03, 2015 4:53 PM
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Destination of result-document from XSLT module

Hi,
I have another problem: controlling the serialization of the node passed to 
the transform function.
I don't see how to control it.
Marc
Le 03/08/2015 22:32, Andy Bunce a écrit :
 Hi Max,
 This sounds like a good thing.
 Another solution to the result-document issue might be to implement 
 the XQuery 3.1 transform function [1]

 /Andy
 [1] http://www.w3.org/TR/xpath-functions-31/#func-transform

 On 3 August 2015 at 20:54, Max Goltzsche max.goltzs...@algorythm.de 
 mailto:max.goltzs...@algorythm.de wrote:

 Hello Vincent,

 besides a URI resolver I also want to set XSLT 2.0's output
 destination in BaseX.
 Currently, as you can see in the implementation of BaseX's
 xslt:transform and xslt:transform-text in
 
 https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/query/func/xslt/XsltTransform.java
 there is neither a URIResolver nor an OutputURIResolver set on the
 Transformer, nor a destination systemId set on its StreamResult. Thus
 Saxon resolves output paths relative to your Java process's working
 directory (in fact, this applies to all relative XSL include, import,
 document and collection paths in the XSLT passed to the transform method).
 Unfortunately, the XsltTransform class must be enhanced to change this
 behaviour.
 If you need a quicker solution for your problem, you may have to
 build your own BaseX transform Java module based on the
 XsltTransform class, setting at least the StreamResult's systemId, I
 think.
 I will also be working on this over the next few evenings.

 best regards,
 Max



Re: [basex-talk] HTTP module and cookies

2015-07-14 Thread Lizzi, Vincent
The EXPath HTTP Client does seem to provide low level HTTP access. I am hoping 
to find an XQuery library that implements some common things such as cookies 
and authentication on top of HTTP Client, but haven’t come across such a 
library yet. There are a few OAuth implementations for authentication though.

I’ll have a look at XML Calabash’s HTTP cookie handling.

Fortunately, in the project that I currently have authentication is not needed. 
 Here is the code that I currently have working. A query can fetch URL(s) by 
calling local:httpGet(), which does a request to get the cookies that the web 
site requires, and then does request(s) to return the web page for each URL 
provided.

declare function local:httpResponseCookies($response as element(http:response)) 
as element(http:header) {
  let $setCookies := $response/http:header[@name = 'Set-Cookie']/@value/data()
  let $cookies := string-join(
    for $cookie in $setCookies return substring-before($cookie, '; '), '; ')
  return <http:header name="Cookie" value="{$cookies}"/>
};

declare function local:httpGet($urls as xs:string+) as element(page)* {
  let $response := http:send-request(<http:request method='get'/>, $urls[1])
  for $url in $urls
  let $response := http:send-request(<http:request method='get'>
    {local:httpResponseCookies($response[self::http:response])}
  </http:request>, $url)
  return element page { attribute url { $url }, $response[2] }
};
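
The cookie-joining step is easy to check in isolation, so here is the same 
logic as a small Java sketch. The class and method names are made up for 
illustration, and it assumes each Set-Cookie value has the usual 
"name=value; attribute; attribute" shape. One difference from the XQuery 
above: substring-before() returns an empty string when a cookie carries no 
attributes, whereas this sketch keeps the whole value in that case.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds a Cookie request header from Set-Cookie values. */
final class CookieHeaderSketch {

    /** Keeps only the leading name=value pair of each Set-Cookie value. */
    static String cookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        for (String value : setCookieValues) {
            int semi = value.indexOf(';');
            // No ';' means the cookie has no attributes, so the whole value is the pair.
            String pair = (semi < 0 ? value : value.substring(0, semi)).trim();
            if (!pair.isEmpty()) {
                pairs.add(pair);
            }
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        System.out.println(cookieHeader(List.of("sid=abc; Path=/; HttpOnly", "lang=en")));
    }
}
```

This ignores expiry, domain and path matching, which a real cookie jar would 
have to honour.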


Thanks,
Vincent




From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Andy Bunce
Sent: Tuesday, July 14, 2015 12:11 PM
To: Florent Georges
Cc: BaseX
Subject: Re: [basex-talk] HTTP module and cookies

In my experience the case that causes the most problem is the authentication 
redirect. I have never tried this with BaseX but I have been very grateful in 
the past that XMLCalabash implements this:

The exception arises in the case of redirection. If a redirect response 
includes cookies, those cookies are forwarded as appropriate to the redirected 
location when the redirection is followed.  [1]
/Andy

[1] http://xprocbook.com/book/refentry-19.html#cookies



On 10 July 2015 at 10:36, Florent Georges fgeor...@fgeorges.org wrote:
  Hi,

  Correct me if I am wrong, but I believe the HTTP Client in BaseX is
the EXPath HTTP Client?  It was indeed designed to provide access to
low-level, raw HTTP.  It does not contain a lot of higher level
feature based on HTTP itself.  Indeed, you have to handle cookies
yourself for instance.

  The difficulty here, if I am right, is the side-effects required to
pass information somehow (in a hidden way) between 2 different HTTP
requests.

  Any suggestion to improve the API is welcome (at least on the EXPath
mailing list, I don't want to speak for BaseX developers, but I am
pretty sure here as well :-)...)

  Regards,

--
Florent Georges
http://fgeorges.org/
http://h2oconsulting.be/


On 10 July 2015 at 11:13, Christian Grün wrote:
 Hi Vincent,

 So far, I'm not aware of a standard solution to handle and cache
 client-side cookies with BaseX. Could you show us your solution? It
 might help us to discuss alternative solutions.

 Best,
 Christian



 On Thu, Jul 9, 2015 at 8:30 PM, Lizzi, Vincent
 vincent.li...@taylorandfrancis.com wrote:
 I am using BaseX to scrape data from a web site. This web site, probably
 like many other websites, relies on cookies and if it does not receive the
 expected cookies it delivers a page instructing you to enable cookies in
 your browser. I was able to get this working by parsing the http:header
 response to get the cookies to use in subsequent requests. This is the
 second time I’ve done this, and even though this works it seems a bit hacky.
 Is there a standard way of handling cookies using the HTTP Module or the
 Fetch module? Or, are there any well written code examples available?

 In other environments typically you define a cookie jar in some way, and the
 cookie jar is used (and is updated) automatically in all subsequent HTTP
 requests. I’m hoping to find something similar in BaseX.

 Thanks,
 Vincent



Re: [basex-talk] Add metadata to a group of documents inside the database

2015-05-12 Thread Lizzi, Vincent
Menashè,

In similar situations I've used a second database to store metadata at the same 
path as the document in the primary database. For example:

db:open('database', '/path/a.xml')
db:open('database_metadata', '/path/a.xml')

Also, there is a feature request for adding properties as additional metadata 
for documents in the database:

https://github.com/BaseXdb/basex/issues/988

If I may ask, what metadata information do you need to record about each 
document?

Vincent



-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Christian Grün
Sent: Tuesday, May 12, 2015 8:41 AM
To: Menashè Eliezer
Cc: BaseX
Subject: Re: [basex-talk] Add metadata to a group of documents inside the 
database

If you create a database with a single CREATE or db:create call, the resulting 
path structure will reflect the original directory structure. However, you can 
also add files and directories to specific target paths in a second step, e.g. 
via db:add [1]. Our Wiki should contain all relevant information (see e.g. [2]).

[1] http://docs.basex.org/wiki/Database_Module#db:add
[2] http://docs.basex.org/wiki/Databases


On Tue, May 12, 2015 at 2:35 PM, Menashè Eliezer melie...@ogs.trieste.it 
wrote:
 Hi Christian,

 Thank you very much for your reply!
 As for path: A database is created from all files in a specific folder 
 which has subfolders 'group1', 'group2' etc. and the path is derived 
 from these subfolders of adding files to db?

 With kind regards,
 Menashè


 On 05/12/2015 02:19 PM, Christian Grün wrote:

 Hi Menashè,

 Just use different database paths for each group (e.g. '/path1/' and 
 '/path2/'), and specify the sub path with db:open:

db:open('db', '/path1')/...

 You can also store the documents in two separate databases and use a 
 single XQuery expression to query all documents:

for $db in ('db1', 'db2')
return db:open($db)/...

 Hope this helps,
 Christian


 On Tue, May 12, 2015 at 2:14 PM, Menashè Eliezer 
 melie...@ogs.trieste.it wrote:

 Hello,
 I have two groups of xml to be included in the same database.
 Usually the same query will be performed on both of them, but I need 
 to be able to query only one group.
 The difference between the groups (e.g. the origin of the files) is 
 known only at the time the XML files are added. This information is 
 not found inside the files, and I prefer not to modify their content.

 I was hoping to be able to define groups, using subfolders and 
 base-uri or multiple collections, but I know it's not possible using 
 Basex.
 Maybe I can tag them? The group type is both a property and a 
 possible filter in a query, so I need a good performance.

 Right now I see two alternatives:
 1. Using two separate databases. Once I need to query all files, 
 I'll make the same query on multiple databases...
 2. One database, but a query to a new xml document, which includes 
 list of node-ids per group type, will be used both for knowing the 
 type and for querying a subset.

 Any ideas, please?

 --
 With kind regards,
 Menashè




Re: [basex-talk] Large Document Upload Performance

2015-03-13 Thread Lizzi, Vincent
Hi Jonathan,

A few months ago I needed to import XML documents that were over 50 MB to 
BaseX. After a few attempts to speed the process I found that using Saxon's 
s9api and Xerces2 as shown below performed the best. The bottleneck appeared to 
not be in BaseX but actually in making the process of sending the data to BaseX 
efficient. Here is the Java code.


// Note: sxProcessor (Saxon s9api Processor), path (target database path) and
// catalogManager are fields of the surrounding class that are not shown here.
protected void loadXmlDocument(BaseXClient client, File xmlFile) throws 
Exception {
    DocumentBuilder docBuilder = sxProcessor.newDocumentBuilder();
    SAXSource source = prepareSaxSource(xmlFile);
    // Parse with Xerces (catalog and XInclude aware) into a Saxon tree
    XdmNode doc = docBuilder.build(source);
    try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
        // Serialize the resolved document to UTF-8 bytes in memory
        Serializer ser = new Serializer(baos);
        ser.setOutputProperty(Serializer.Property.ENCODING, "UTF-8");
        ser.serializeNode(doc);
        try (InputStream is = new ByteArrayInputStream(baos.toByteArray())) {
            // Send the serialized document to BaseX
            client.replace(path, is);
        }
    }
}

protected SAXSource prepareSaxSource(File xmlFile) throws 
ParserConfigurationException, SAXException, MalformedURLException {
    SAXParserFactory saxFactory = SAXParserFactory.newInstance();
    saxFactory.setNamespaceAware(true);
    saxFactory.setXIncludeAware(true);
    saxFactory.setValidating(false);
    SAXParser saxParser = saxFactory.newSAXParser();
    XMLReader reader = saxParser.getXMLReader();

    // Resolve DTDs and entities through the XML catalog
    CatalogResolver resolver = new CatalogResolver(catalogManager);
    reader.setEntityResolver(resolver);

    SAXSource source = new SAXSource();
    source.setInputSource(new 
        InputSource(xmlFile.toURI().toURL().toExternalForm()));
    source.setXMLReader(reader);

    return source;
}

I tried to make the above code self-contained by cobbling together relevant 
parts of the code, so this is untested but carries the idea. 

I hope this helps.

Vincent


-Original Message-
From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Jonathan Clarke
Sent: Friday, March 13, 2015 3:50 PM
To: 'Christian Grün'
Cc: 'BaseX'
Subject: Re: [basex-talk] Large Document Upload Performance

Hi Christian,

I wouldn't be able to provide you with the data itself, but I'm not using a 
query. I'm simply using the BaseXClient that's provided on your site: it's just 
a connection opened to the server, and then a call to the replace function. 
What's the typical time you would expect to see for a file of that size? Some 
research online has suggested that the delay is caused by the document indexing 
that gets underway at the point of update. In the meantime, I'll try to 
construct a nondescript file of similar size that we can use. Are there 
any other performance-enhancing settings that you've advised others to use in 
similar reports? As with flushing, am I able to postpone or turn off the 
document indexing until I'm ready to call the function explicitly?

Jonathan.

-Original Message-
From: Christian Grün [mailto:christian.gr...@gmail.com]
Sent: 13 March 2015 19:12
To: Jonathan Clarke
Cc: BaseX
Subject: Re: [basex-talk] Large Document Upload Performance

Hi Jonathan,

 I hope you can help me. I am using BaseX to underpin a complex 
 distributed system, which also requires storage of xml document in 
 soft real-time. At the moment, I’m getting storage times for a 4Mb XML 
 file of about 500ms. Can you advise how I might be able to bring that down, 
 please, by at least 75%?

We'll probably need more information on your queries etc.

 I also tried to use AddCache, and that just crashed the latest 
 production release of the server.

If you find out how we can reproduce this, your feedback is welcome.

Best,
Christian




[basex-talk] BaseX 8 and oXygen via XQJ

2014-12-23 Thread Lizzi, Vincent
After upgrading to BaseX 8.0 I'm no longer able to run queries on BaseX from 
oXygen. Connecting to BaseX from oXygen using WebDav still works, but trying to 
run a transformation in BaseX using the XQJ driver fails with a message Access 
denied. It appears that the XQJ driver needs to be updated for the new 
authentication mechanism. Will an updated version of the BaseX XQJ driver be 
available soon?

Thank you,
Vincent




Re: [basex-talk] BaseX and MySQL

2014-11-19 Thread Lizzi, Vincent
Hans,

I had to do this a few days ago, so it’s fresh in my mind. Download Connector/J 
from the MySQL website and place the .jar file in the basex/lib folder. Restart 
BaseX (if you’re using the server version) so it picks up the new .jar.

Queries can then be run like:

let $c := sql:connect('jdbc:mysql://localhost/database', 'user', 'password')
return (
  sql:execute($c, "select …"),
  sql:execute($c, "call stored_procedure…();")
)


Cheers,
Vincent


From: basex-talk-boun...@mailman.uni-konstanz.de 
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Hans-Juergen 
Rennau
Sent: Wednesday, November 19, 2014 5:20 PM
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] BaseX and MySQL

Dear BaseX team,

can you point me to some information about which steps I must take in order to 
start accessing MySQL databases via the sql module?

Thank you for help,
Hans-Juergen


[basex-talk] locking problem

2014-10-16 Thread Lizzi, Vincent
This is an update, and further problem, on a problem I posted here a few months 
ago. I've written a program to import XML documents into BaseX in one database, 
and populate a second database with a small amount of metadata about each 
document. This works but crashes every so often, and I have to restore the 
databases from backup and restart the import. To get through a large set of 
documents I now have the program create a backup (using the CREATE BACKUP 
command) after every 1000 documents imported.
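
The checkpointing scheme described above can be sketched as follows. This is 
only an outline with hypothetical names (BatchImportSketch, importAll): the 
import and backup steps are passed in as callbacks, which in the real program 
would presumably wrap the BaseXClient calls and the CREATE BACKUP command. 
Nothing here touches BaseX itself.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical outline: import documents and checkpoint after every N of them. */
final class BatchImportSketch {

    /**
     * Imports each document in order and runs the backup hook after every
     * batchSize imports. Returns the number of backups that were triggered.
     */
    static <T> int importAll(List<T> docs, int batchSize,
                             Consumer<T> importOne, Runnable backup) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int imported = 0;
        int backups = 0;
        for (T doc : docs) {
            importOne.accept(doc);      // e.g. open connection, replace, close
            imported++;
            if (imported % batchSize == 0) {
                backup.run();           // e.g. issue CREATE BACKUP for both databases
                backups++;
            }
        }
        return backups;
    }

    public static void main(String[] args) {
        int n = importAll(List.of("a.xml", "b.xml", "c.xml"), 2,
            doc -> System.out.println("import " + doc),
            () -> System.out.println("backup"));
        System.out.println(n + " backup(s)");
    }
}
```

After a crash, at most batchSize documents have to be re-imported once the 
last backup is restored.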

Since my earlier post, I found that the connection between the client and 
server was occasionally being dropped. I've switched to opening and closing a 
connection for each document instead of using one long-running connection. The 
client and server are running on the same machine.

I'm using the latest version of BaseX (BaseX80-20141013.213520.zip). Here is 
the error message reported by my program when it crashes. The error occurs on 
the Query.execute() method of org.basex.examples.api.BaseXClient.java.

Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.0 beta a91f5c2
Java: Oracle Corporation, 1.7.0_67
OS: Windows 7, amd64
Stack Trace:
java.nio.channels.OverlappingFileLockException
        at sun.nio.ch.SharedFileLockTable.checkList(Unknown Source)
        at sun.nio.ch.SharedFileLockTable.add(Unknown Source)
        at sun.nio.ch.FileChannelImpl.tryLock(Unknown Source)
        at org.basex.io.random.TableDiskAccess.lock(TableDiskAccess.java:139)
        at org.basex.io.random.TableDiskAccess.<init>(TableDiskAccess.java:87)
        at org.basex.data.DiskData.<init>(DiskData.java:119)
        at org.basex.data.DiskData.<init>(DiskData.java:80)
        at org.basex.core.cmd.Open.open(Open.java:70)
        at org.basex.core.cmd.Open.run(Open.java:36)
        at org.basex.core.Command.run(Command.java:360)
        at org.basex.core.Command.execute(Command.java:94)
        at org.basex.server.ClientListener.run(ClientListener.java:145)


The server instance reports the error with this stack trace:

C:\Programs\basex\basex\bin>basexhttp
BaseX 8.0 beta a91f5c2 [Server]
Server was started (port: 1984)
HTTP Server was started (port: 8984)
java.io.FileNotFoundException: 
C:\Programs\basex\basex\data\article_xml_meta\doc.basex (The requested 
operation cannot be performed on a file with a user-mapped section open)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(Unknown Source)
        at java.io.FileOutputStream.<init>(Unknown Source)
        at org.basex.io.out.DataOutput.<init>(DataOutput.java:47)
        at org.basex.io.out.DataOutput.<init>(DataOutput.java:36)
        at org.basex.index.resource.Docs.write(Docs.java:73)
        at org.basex.index.resource.Resources.write(Resources.java:51)
        at org.basex.data.DiskData.write(DiskData.java:141)
        at org.basex.data.DiskData.flush(DiskData.java:249)
        at org.basex.data.DiskData.finishUpdate(DiskData.java:239)
        at org.basex.query.up.ContextModifier.apply(ContextModifier.java:124)
        at org.basex.query.up.Updates.apply(Updates.java:129)
        at org.basex.query.QueryContext.iter(QueryContext.java:351)
        at org.basex.query.QueryProcessor.iter(QueryProcessor.java:80)
        at org.basex.server.ServerQuery.execute(ServerQuery.java:128)
        at org.basex.server.ClientListener.query(ClientListener.java:492)
        at org.basex.server.ClientListener.run(ClientListener.java:109)



Here is the update statement where the error occurs, along with a bit of the 
surrounding Java.

BaseXClient.Query queryIndex = client.query(
    "declare variable $indexDb external; "
  + "declare variable $hash external; "
  + "declare variable $database external; "
  + "declare variable $path external; "
  + "let $p := <properties><database>{$database}</database>"
  + "<path>{$path}</path><sha1>{$hash}</sha1>"
  + "<loadedDate>{current-dateTime()}</loadedDate></properties> "
  + "return db:replace('" + indexDatabaseName + "', $path, $p)");
try {
    queryIndex.bind("$indexDb", indexDatabaseName);
    queryIndex.bind("$database", managedDatabaseName);
    queryIndex.bind("$path", path);
    queryIndex.bind("$hash", hash);
    queryIndex.execute();
} finally {
    queryIndex.close();
}

Previously this code used a variable for the first parameter of db:replace() 
but I changed it to hard code the database name into the query after reading 
about transactions and locking in the wiki. This change doesn't appear to have 
helped.

To put this in context, the above update is run after the XML document is 
imported using BaseXClient.replace(). The next thing that happens in the 
program is an XQuery db:replace to add more properties for the document to the 
metadata database.

I've added a Thread.sleep(300), to create a small delay between loading each 
document and 

Re: [basex-talk] db documents metadata

2014-09-02 Thread Lizzi, Vincent
Hi Christian,

The properties I'm storing/planning for in my ancillary database are:

- dateTime the source document was loaded to BaseX
- sha1 hash of the source document - used in determining if the source document 
has changed and should be replaced in BaseX
- identifiers assigned by our content management system and archive
- path to the source document
- filename of the source document

These properties could be stored using strings for keys and values. 
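
As an illustration of the sha1 property, the change check could look roughly 
like this in Java. The class and method names are hypothetical, and it assumes 
the stored hash is a hex string of the file's bytes. It uses only the JDK's 
MessageDigest, not any BaseX API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: decides whether a source document must be reloaded. */
final class Sha1ChangeCheck {

    /** Returns the SHA-1 digest of the content as lowercase hex. */
    static String sha1Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** True if no hash is stored yet or the stored hash differs from the current one. */
    static boolean needsReload(String storedSha1, byte[] currentContent) {
        return storedSha1 == null || !storedSha1.equalsIgnoreCase(sha1Hex(currentContent));
    }

    public static void main(String[] args) {
        byte[] doc = "<test>this is input</test>".getBytes(StandardCharsets.UTF_8);
        String stored = sha1Hex(doc);
        System.out.println(needsReload(stored, doc));
    }
}
```

Hashing the raw bytes means that a re-save which changes only whitespace or 
encoding also counts as a change.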

An extension to db:list-details(), with a method like db:store-details(), to 
allow setting and retrieving user-defined properties would work. A more 
extensive set of features as Marc described based on Qizx would also work and 
could support a larger variety of cases.

The ability to access these methods via a Java API or the BaseXClient API would 
be useful, although presumably a simple wrapper around the existing APIs could 
be used to call the XQuery functions for querying and setting properties. 

Thanks,
Vincent




-Original Message-
From: Christian Grün [mailto:christian.gr...@gmail.com] 
Sent: Friday, August 29, 2014 6:33 AM
To: Lizzi, Vincent
Cc: Marc van Grootel; BaseX
Subject: Re: [basex-talk] db documents metadata

@Marc:

For BaseX 8.0, we are planning to speed up our document index, and we could 
possibly enrich it with some more (possibly user-specific) metadata. I have 
added a reference to this mailing-list thread in the corresponding GitHub issue 
[1].

However, I am not sure if we should extend our existing APIs. Maybe it would 
be more consistent to provide an additional XQuery Module for that, or extend 
the Database Module. Additional metadata could be returned via 
db:list-details(), and we could add an updating function, something like 
db:store-details(). What do you think? Any more suggestions are welcome.

@Vincent:

 I've started to implement along these lines by creating a second database to 
 hold metadata about documents in the actual database. If there is a better 
 option I'll switch to it.

I would be interested to know which metadata properties you are currently 
storing in this auxiliary database.

Thanks,
Christian

[1] https://github.com/BaseXdb/basex/issues/804


Re: [basex-talk] db documents metadata

2014-08-28 Thread Lizzi, Vincent
+1

I would find this feature useful for several similar scenarios. I want to use 
BaseX for querying XML documents and keep BaseX synchronized with external 
archives/repositories where the XML files are maintained. 

I've started to implement along these lines by creating a second database to 
hold metadata about documents in the actual database. If there is a better 
option I'll switch to it.

Vincent



From: basex-talk-boun...@mailman.uni-konstanz.de 
basex-talk-boun...@mailman.uni-konstanz.de on behalf of Marc van Grootel 
marc.van.groo...@gmail.com
Sent: Thursday, August 28, 2014 5:38 PM
To: BaseX
Subject: [basex-talk] db documents metadata

Hi,

I was looking through the feature list in the issue tracker to see
what's in the pipeline. I suddenly remembered a feature from an XML
database I used a couple of years ago called Qizx. This had a very
neat feature where every database document and collection could have a
special map with metadata properties. These do not affect the XML
content in any way but they can be accessed via special API calls or
a Qizx-specific extension module.

A better explanation of this feature can be read in the Qizx manual
(for example here http://kiwi.emse.fr/DN/qizx-manual.pdf on pages 18
and 57).

I have used such metadata properties on nodes to implement syncing XML
documents in an SCM (Subversion). I stored revision IDs and other SCM
control data in those properties. Authors would work in Subversion and
certain directories were kept synced to a Qizx database so we could
easily create PDF publications of the latest XML with zero impact on
the XML itself.

Maybe BaseX already uses something like that under the hood, I don't
know. If so, extending it or opening it for use would be useful, I
think, and generally cool :-)

--
--Marc


[basex-talk] locking problem

2014-08-06 Thread Lizzi, Vincent
I've written a process using the Java BaseXClient to load XML documents into 
one database (A) and save a small amount of data about each document in a 
second database (B) using a parallel document path. The data in (B) includes a 
sha1 hash, which is used to determine whether the source XML document has 
changed, to avoid unnecessary reloading. With database (A) the replace command 
is used to import documents. With database (B) a query is run to get the 
current sha1 hash (if any) and the db:replace() function is used to 
create/update a document for each document in (A). This works well for small 
sets of documents, often loading several documents per second. However, after 
1,000 to 4,000 documents it eventually crashes with this exception:

[bxerr:BXDB0002] Database (B) is being updated, or update was not completed.

The exception is thrown when reading the sha1 hash from (B). After this 
exception, it's not possible to connect to database (B) even with the BaseX GUI 
unless I manually delete the upd.basex file. I have to delete both databases 
and try again. No other processes are accessing the databases.
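
For reference, the sha1 check described at the top of this message can be 
sketched roughly as follows. This is a simplified, self-contained illustration 
and not the actual code of the process: the class name Sha1ChangeCheck and the 
method names are invented for the example, and the stored hash is passed in as 
a plain string instead of being read from database (B).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of BaseX or BaseXClient.
final class Sha1ChangeCheck {

    /** Returns the SHA-1 digest of the content as a lowercase hex string. */
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    /**
     * Decides whether a document has to be (re)loaded.
     * storedHash is the hash previously saved for the document, or null if
     * the document has not been loaded before.
     */
    static boolean needsReload(String storedHash, byte[] content) {
        return storedHash == null || !storedHash.equalsIgnoreCase(sha1Hex(content));
    }

    public static void main(String[] args) {
        byte[] document = "<article/>".getBytes(StandardCharsets.UTF_8);
        String stored = sha1Hex(document);  // pretend this was read from (B)
        System.out.println(needsReload(null, document));    // never loaded before
        System.out.println(needsReload(stored, document));  // unchanged document
    }
}
```

Only documents for which needsReload() returns true would then be replaced in 
(A) and have their entry in (B) updated.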

It seems like this problem is related to the high number of reads and writes 
happening on (B). 

Has anyone encountered a similar problem, or does anyone have any suggestions? 

Thanks,
Vincent