Our mission today is to use BaseX to remove tags injected right between
the bytes of multibyte UTF-8 characters.
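Because the injected tag sits between raw bytes, it has to be removed at the byte level before anything decodes the text as UTF-8. Below is a minimal sketch in Python (a swapped-in language, not BaseX). It assumes the injected tag is a plain `<b/>`, which is hypothetical since the real tag was lost from the archive, and that `<` and `>` appear only inside tags. `shatter` reproduces the damage a byte-oriented Perl substitution does, and `strip_tags_bytes` repairs it.

```python
import re

# Hypothetical tag: the actual tag used in the thread was stripped by the archive.
TAG = b"<b/>"

def shatter(text: str, tag: bytes = TAG) -> bytes:
    """Mimic a byte-oriented perl 's![^[:ascii:]]!$&TAG!': put the tag after the
    first non-ASCII *byte*, i.e. inside the first multibyte UTF-8 character."""
    data = text.encode("utf-8")
    return re.sub(rb"[\x80-\xff]", lambda m: m.group(0) + tag, data, count=1)

def strip_tags_bytes(data: bytes) -> str:
    """Delete tags while still working on raw bytes, then decode, so the
    bytes of each split character become adjacent again."""
    return re.sub(rb"<[^>]*>", b"", data).decode("utf-8")

broken = shatter("你好")
try:
    broken.decode("utf-8")          # lead byte of 你 is followed by '<'
except UnicodeDecodeError:
    print("not valid UTF-8 while the tag sits inside 你")
print(strip_tags_bytes(broken))     # 你好
```

The `try` block shows that decoding before stripping fails. The three bytes of 你 only line up again after the byte-level removal.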
http://www.couchsurfing.org/group_read.html?gid=430&post=13986932
> "CG" == Christian Grün writes:
CG> Have you tried method=raw, as mentioned in our documentation
CG> (http://doc
>>>>> "CG" == Christian Grün writes:
CG> Jidanni,
>> echo '你好'|perl -pwle 's![^[:ascii:]]!$&!'|basex -q '
>> declare option db:parser "html";
>> declare option output:method "raw";
>> doc(&
LREQ> Your perl substitution is putting after the first non-ascii
LREQ> character on the line, and 你 is for sure not an ascii character,
LREQ> so you get after it.
Not exactly after it. 1/3 of the way through it. I.e., shattered UTF-8.
I was just curious if there was a way in basex if I could do
> "LREQ" == Liam R E Quin writes:
LREQ> Treating the individual UTF-8 octets individually?
Yes.
LREQ> Not in standard XQuery, but that doesn't preclude a BaseX extension...
Well no big deal, I was just curious.
>> I was just curious if there was a way in basex if I could do s!!!g
>> like I can
Here we see that there is no way to differentiate
declare option db:parser "html";
"SPACE",
doc("http://eaip.caa.gov.tw/eaip/history/2013-05-02-AIRAC/html/eAIP/RC-ENR-3.3-en-TW.html")//*[@class="Table-row-type-2
"]
,"NO SPACE",
doc("http://eaip.caa.gov.tw/eaip/history/2013-05-02-AIRAC/html/eAIP/R
Clicking on things in the text view should cause some indication in the
tree view of where we are clicking.
___
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Placing the mouse upon different items in the tree view should be enough
to trigger their being highlighted in the neighboring text view. But
alas one is forced to click an item there in the tree view, which then
repaints the text view to show only that item, thus losing context.
In the tree view say we have something clicked and it is red.
Well we choose the green up arrow icon and indeed we move to its parent
node, however the red mark of where we were is now gone and we can't
tell where we came from.
Hmmm, I suppose I should have filed these in the basex bug tracker but
X-debbugs-Cc: basex-talk@mailman.uni-konstanz.de
Package: basex
Version: 7.6-1
File: /usr/bin/basexgui
Severity: wishlist
Perhaps this ought to work correctly,
# su - nobody
No directory, logging in with HOME=/
$ HOME=/tmp basexgui /usr/share/doc/basex/examples/input.xml
Saving properties in "/n
X-debbugs-Cc: basex-talk@mailman.uni-konstanz.de
Package: basex
Version: 7.6-1
File: /usr/bin/basexgui
Severity: wishlist
When we start the GUI there is a row of
--II-III a total of 17 icons across the screen. (And others
elsewhere on the screen too.)
Well it would be easier to guess
X-debbugs-Cc: basex-talk@mailman.uni-konstanz.de
Package: basex
Version: 7.6-1
File: /usr/bin/basexgui
$ rm -r .basexgui BaseXData .basex
$ basexgui /usr/share/doc/basex/examples/input.xml&
The window is cut into four squares.
Each square has a magnifying glass, which upon clicking reveals a tex
X-debbugs-Cc: basex-talk@mailman.uni-konstanz.de
Package: basex
Version: 7.6-1
File: /usr/bin/basexgui
In the Database menu (which should expand when we hit ALT-D or ALT-d,
but doesn't!) we see that CTRL-Q should allow us to quit the program...
but it doesn't. Nor probably do any of the other CTRL
X-debbugs-Cc: basex-talk@mailman.uni-konstanz.de
Package: basex
Version: 7.6-1
File: /usr/bin/basexgui
Severity: wishlist
In the "tree" window we click something with the mouse. It turns red and
its text shows up in the text window.
At this point it would be very handy if the four arrows on the ke
Pardon me but basex 7.6 doc() function is out of control.
$ more ib.xml z.xq|cat
::
ib.xml
::
There should be a space: :here!
There should be a space: :here!
There should be a space: :here!
There should be a space:
:here!
There should be a space:
:here!
There should be
OK but do you admit that this
* wrecks HTML jammingwordstogether
* wrecks KML jammingcoordinatestogether
https://developers.google.com/kml/documentation/kmlreference#gxlatlonquad
in fact I bet it wrecks all the other *ML languages.
You can compress the whitespace down to one, but any furtherisjust
http://www.w3.org/TR/REC-xml/#sec-white-space
...On the other hand, "significant" white space that should be preserved...
So since your parser by default creates significant whitespace where there was none,
and removes it where there was, perhaps it could be fixed please, without the user
needin
http://help.adobe.com/en_US/ColdFusion/9.0/Developing/WSc3ff6d0ea77859461172e0811cbec133ba-7fd9.html
"if an XML comment is in the middle of a block of text, the DOM node
view represents its position in the text while the basic view does
not."
http://www.w3.org/TR/html401/struct/text.html#idx-white
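The behaviour being asked for can be sketched in a few lines of Python. This is not BaseX's code. It compresses each whitespace run to a single space but never removes it entirely. The whitespace set (space, tab, CR, LF) comes from the XML spec linked above, and the sample string is made up.

```python
import re

def collapse_whitespace(text: str) -> str:
    """Compress each run of XML whitespace (#x20, #x9, #xD, #xA) to a single
    space instead of deleting it, so neighbouring words stay apart."""
    return re.sub(r"[ \t\r\n]+", " ", text)

sample = "There should be a space:\n      here!"   # made-up input
print(collapse_whitespace(sample))  # There should be a space: here!
```

Because a run of whitespace shrinks to one space and no further, HTML words and KML coordinate tuples stay separated.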
[Why did this not get posted...]
OK it did get posted.
declare option output:omit-xml-declaration "no";
import module namespace j='http://jidanni.org/' at "./j.xq";
causes a never ending nightmare of
Stopped at line 2, column 8 in /home/jidanni/millerliu/air/rcr7.xq:
[XPST0003] Unexpected end of query: 'module name
Yes we are talking about data damage. As bad as disk errors garbling one's data.
{comment {"Made by noise.xq, will get OVERWRITTEN"}}
gives me
$ basex noise.xq
http://www.opengis.net/kml/2.2";>
...
Is there any function to enable me not to need to hardwire the name into
the file?
Do I really need to do all this just to get one item per line
$ basex -s format=no -q 'doc("...")//.../*:name/concat(text(),"
")'
AM 594 復興1
AM 612 中央
AM 630 台灣
AM 657 正聲
I use -s format=no to turn off spaces.
-s format=text not much help.
There really ought to be a -s option to say I want newl
> "CG" == Christian Grün writes:
CG> ..not quite sure what you're trying to achieve: Do you want to request
CG> the name of your query file from within XQuery?
Yes, like I can do in bash
$ cat f
echo $0
$ bash f
f
CG> Currently, there's no way to do this, but we could think about
CG> adding
Well I guess one way not to hardwire it into the program is to pass it
in as an external variable... rather clumsy with the same item twice on
the command line.
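The external-variable route can at least be made to type the name only once by building the command line in a wrapper. Below is a minimal Python sketch. It uses the `-b` binding flag that appears elsewhere in this thread (`basex -bM=...`). The variable name `file` is hypothetical, and the query would need a matching `declare variable $file external;`.

```python
from __future__ import annotations

import shlex

def basex_command(query_file: str, var_name: str = "file") -> list[str]:
    """Build a BaseX command line that binds the query file's own name to a
    (hypothetical) external variable, so the name is written only once."""
    return ["basex", f"-b{var_name}={query_file}", query_file]

cmd = basex_command("noise.xq")
print(shlex.join(cmd))  # basex -bfile=noise.xq noise.xq
```

To actually run it, pass `cmd` to `subprocess.run`. The sketch only builds the argument list.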
> "CG" == Christian Grün writes:
CG> However, you may want to use the -w flag to preserve the original
CG> document whitespaces and newlines.
I would have never guessed from the man page, which just mentions input,
not output:
-w By default, whitespaces around text nodes are cho
Wait, -w falls apart when just wanting the X nodes,
$ basex -w -s method=text -q 'doc("u.xml")'
AA AA
BB BB
CC CC
DD DD
$ basex -w -s method=text -q 'doc("u.xml")//X'; echo
AA AABB BBCC CCDD DD
$
Last year's discussion
https://mailman.uni-konstanz.de/pipermail/basex-talk/2010-October/000758.html
Related discussion http://www.stylusstudio.com/xquerytalk/201012/003341.html
Anyways, proof that basex is "unfair":
$ basex -q 'doc("u.xml")/W'; echo
AA AA
BB BB
CC CC
DD DD
$ basex -w
> "AH" == Anders Hessellund writes:
AH> elements and attributes. Specifically, we need to know filename, line
AH> number and column of element (start tags) and
AH> attributes. Is this possible? And if so, how?
Might be related
http://lists.gnu.org/archive/html/bug-gnu-emacs/2011-12/msg00319.htm
Whilst waiting for Debian's basex to support http(s)_proxy, I found a
workaround: installing the caching proxy WWWOFFLE, one then can use
$ basex -q 'doc("http://localhost:8080/http/example.org/api?xyz")'
without which I would very quickly exceed the daily API usage limits with mere
repetitive tests.
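The URL rewrite for the caching proxy can be automated. Below is a minimal Python sketch. The layout `PROXY/scheme/host/path?query` is inferred from the single example above and is an assumption about WWWOFFLE, so check its documentation before relying on it, especially for https URLs.

```python
from urllib.parse import urlsplit

def via_wwwoffle(url: str, proxy: str = "http://localhost:8080") -> str:
    """Rewrite http://host/path?q as PROXY/http/host/path?q, the layout
    inferred from the example above (an assumption, not WWWOFFLE's spec)."""
    parts = urlsplit(url)
    rewritten = f"{proxy}/{parts.scheme}/{parts.netloc}{parts.path}"
    if parts.query:
        rewritten += "?" + parts.query
    return rewritten

print(via_wwwoffle("http://example.org/api?xyz"))
# http://localhost:8080/http/example.org/api?xyz
```

The rewritten string is what goes inside `doc("...")`. Fragments are dropped because they are never sent to a server anyway.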
Is it true that basex can output HTML,
http://docs.basex.org/wiki/Serialization
but not read it back in?
http://docs.basex.org/wiki/Parsers
> "MS" == Michael Seiferle writes:
MS> Hi,
MS> if Tagsoup [1] is present in the classpath (it comes with our Zip
MS> packages e.g.), BaseX will allow (the "poor, nasty and brutish" [1])
MS> HTML input.
Well all I know is that
http://docs.basex.org/wiki/Parsers
should mention what to do to re
Don't you want to mention SET PARSER HTML on
http://docs.basex.org/wiki/Parsers ?
Also there is nothing 'bad' about the HTML... It is a valid
form of one of the HTML versions mentioned on
http://docs.basex.org/wiki/Serialization .
Indeed, the versions that SET PARSER HTML will support should please
X-debbugs-Cc: basex-talk@mailman.uni-konstanz.de
Package: basex
Version: 7.1.1-2
Severity: wishlist
We read
basex (7.1.1-2) unstable; urgency=low
* Allow non well-formed HTML to be parsed if libtagsoup-java is installed.
* Updated man page with an example on how to parse HTML.
But we find
I did successfully manage checking my website for deficient IMG links,
$ tail -n 2 Makefile
xxqq:l.xq
basex -bM="$$(find ~/jidanni.org -name \*.html ! -name \*_en.html)" $?
$ cat l.xq
declare option db:parser "html";
declare variable $M external; (: haven't learned collections / weeding
On http://docs.basex.org/wiki/Serialization we see
newline.
But we have no idea how to make it work.
Please add an example to that page!
declare option output:method "text";
declare option output:newline "\n";
for $m in (1,2,3)
return $m
Nope no idea.
basex:
Installed: 7.1.1-2
> "CD" == Charles Duffy writes:
CD> The newline option just indicates which kind of newlines to use in
CD> places where newlines would normally be selected -- it doesn't make
CD> lists automatically newline-separated.
CD> Maybe you want something more like this:
CD> declare option output:fo
Dear t...@x-query.com, there should be a way to get 'items' back out of
Xquery, one per line, without having to hardwire 0x0A (you know,
$ perl -wle 'printf "%08b\n" , ord"\n"'
1010
into the code, as discussed in
https://mailman.uni-konstanz.de/pipermail/basex-talk/2012-March/002738.html )
...
> "DL" == David Lee writes:
DL> There is, but you have to call XQuery natively instead of via "the command
DL> line"
Running the output through a tag stripper,
perl -pwle 's/<[^>]+>//g'
sounds a thousand times easier than trying to figure out
whatever that way you mention is.
All I know is using
$ cat file.xq
(: Make bus timetable going westbound past Zaokeng :)
declare option db:parser "html";
declare option output:method "html"; (: etc., not poorly supported
"text" though :)
declare function local:d($M){
doc(concat("http://www.fybus.com.tw/data/", $M))
};
let $line
> "LREQ" == Liam R E Quin writes:
LREQ> On Sat, 2012-03-10 at 22:56 +, David Lee wrote:
>> There is, but you have to call XQuery natively instead of via "the
>> command line"
LREQ> ??!?
LREQ> you can use string-join( your query here, "
")
Haven't had to make my own newlines since using t
OK, I submitted https://www.w3.org/Bugs/Public/show_bug.cgi?id=16311
However as I have a rather hard time expressing myself,
perhaps those who know what I am saying could add some detail to the
bug. In the rare case the detail is wrong, I will chime in there. Thanks.
> "DL" == David Lee writes:
DL> It looks like the CLP of DB2 gives you the behaviour you are asking for
DL> http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp?topic=%2Fcom.ibm.db2.luw.xml.doc%2Fdoc%2Fxqrserial.html
OK maybe BaseX could do something like that, if that could indeed
> "AW" == Andrew Welch writes:
AW> So the easy way, is to just return a single item using string-join.
Which means we text wanters don't get to enjoy the benefits of
all that serialization work one single bit.
AW> ...which suggests you just want indent set?
All I know is I just added an exam
All I know back in December the Makefile and programs in
http://radioscanningtw.jidanni.org/images/radioscanningtw/maps/
worked fine. Now with BaseX 7.1.1 it is one big disaster and will
take days to rewrite.
>>>>> "j" == jidanni writes:
j> All I know back in December the Makefile and programs in
j> http://radioscanningtw.jidanni.org/images/radioscanningtw/maps/
j> worked fine. Now with BaseX 7.1.1 it is one big disaster and will
j> take days to rewrite.
The wh
wget http://radioscanningtw.jidanni.org/images/radioscanningtw/maps/Makefile
wget http://radioscanningtw.jidanni.org/images/radioscanningtw/maps/TaiwanAMFMSW.xq
wget http://radioscanningtw.jidanni.org/images/radioscanningtw/maps/AMFMXTaiwan.gpx.zip
unzip AMFMXTaiwan.gpx.zip
make
This makes three
Never mind. I got my program working again with the newer basex via
- {local:K($y)/name}
+ {local:K($y)/*:name}
- {local:K($y)/o/*}
+ {local:K($y)/*:o/*}
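This fix probably works because the input elements live in a default namespace. An unprefixed step such as `name` only matches elements in no namespace, while `*:name` matches that local name in any namespace. Python's ElementTree has the same pitfall, shown below (Python 3.8+ is needed for the `{*}` wildcard). The KML namespace and the document are made-up stand-ins for whatever `local:K` returns.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in document in a default namespace (KML's, seen elsewhere in this thread).
DOC = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark><name>AM 630</name></Placemark>
</kml>"""

root = ET.fromstring(DOC)
print(root.findall(".//name"))                        # [] : unprefixed name means no namespace
print([e.text for e in root.findall(".//{*}name")])   # ['AM 630'] : any namespace, like *:name
```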
You'll be sorry not fixing this sooner rather than later.
What if the UNIX shell automatically exported every variable by default?
This would cause the worst kind of bugs. The ones that creep in while
you aren't aware.
A program that worked fine in your polluted environment suddenly stops
workin
All I want is
a b c
a b c
a b c
a b c
No trailing blanks.
No leading blanks.
Please advise.
$ cat e.xq
for $m in (1,2,3,4) return "a b c
"
$ basex e.xq|cat -e
a b c $
a b c $
a b c $
a b c $
$ basex --version
BaseX 7.1.1 [Standalone]
The
wad at the end is as far as I got, from reading the
OK, I have found a way that works every time:
... return concat($members," ",$id," ",$name,"\n")
$ make -n
basex g.xq|perl -pwle 's/\\n ?/\n/g'
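XQuery string literals have no backslash escapes, so `"\n"` in the query is a literal backslash followed by `n`. When the items are serialized, adjacent strings are joined by a single space. The Perl step therefore turns each literal `\n`, plus the space that may follow it, into a real newline. Below is a minimal Python sketch of the same post-processing. The sample output line is made up.

```python
import re

def fix_literal_newlines(output: str) -> str:
    # Same substitution as the perl one-liner: s/\\n ?/\n/g
    # i.e. a literal backslash-n, optionally followed by the item-separating
    # space, becomes one real newline.
    return re.sub(r"\\n ?", "\n", output)

raw = r"12 A North\n 13 B South\n"   # made-up BaseX output, all on one line
print(fix_literal_newlines(raw), end="")
```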
Is this the state of the art for making tab aligned columns?
concat($members," ",$id," ",$name...
basex 7.1.1.
> "CG" == Christian Grün writes:
CG> Yes; you may as well enter an ordinary tab character instead of .
CG> Christian
Raw tab characters in programs? I'll stick to my "\t" and I suppose
that gob above. Thanks anyway.
> "CG" == Christian Grün writes:
CG> Did you have a look at the separator output option which I've added for you [1]?
CG> [1] http://docs.basex.org/wiki/Serialization
I'm sorry but it just doesn't work.
declare option output:method "text";
declare option output:newline "\n";
for $m in (1,2
> "CG" == Christian Grün writes:
CG> Please take notice of the documentation, which says you need at
CG> Version 7.2 to get this working.
But http://docs.basex.org/wiki/Serialization#Version_7.1 says newline is
in. You might want to revise that.
OK... I got it working!
[got WARNING: untrusted versions of the following packages will be installed!]
$ cat e.xq
declare option output:separator "\n";
for $m in (1,2,3,4) return "a b c"
$ basex -L e.xq
a b c
a b c
a b c
a b c
$ < /dev/null basex -v|sed q #no --version option yet
BaseX 7.2 [Standal
How does one insert an arbitrary file.
doc('q')
parses it. How does one just, you know,
#include q
as in C?
How do I prevent BaseX from insisting on adding these attributes?
What more do I need to tell it?
declare option db:parser "html";
declare option db:htmlopt "method=html,nons=true";
declare option output:method "html";
declare option output:version "4.01";
declare option output:doctype-public "-/
> "CG" == Christian Grün writes:
CG> Hm, where do you want to include your file? Do you have a small,
CG> concise example?
CG> C.
{insert_file("blob_of_bytes")}
> "AH" == Alexander Holupirek writes:
AH> Please post a small snippet or example, so that we are able to test the problem.
Taking the example from the Debian basex man page, we add an innocent
and :
cat > bad.html <<\EOF
Az
B
> "CG" == Christian Grün writes:
CG> ..try file:read-text() or file:read-binary():
CG> http://docs.basex.org/wiki/File#file:read-text
They don't work.
The first ruins all brackets it encounters, escaping them.
The latter just returns base64.
There is no way to get the darn file just inse
> "CG" == Christian Grün writes:
>> The first ruins all brackets it encounters, escaping them.
CG> That's what XML serialization is about; everything else would be
CG> invalid. Once again, you may want to switch to "text" as output method
CG> to avoid escaping:
CG> http://docs.basex.org/wi
> "CG" == Christian Grün writes:
CG> XQuery may not be the best language for such a use case.
OK, I put a @@@ marker in a comment to trigger sed to read it in in my Makefile
lyrics.html:playlists.xquery playlists.xml english.html
basex $< |sed '/@@@/r english.html' > $@
validat
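The `sed '/@@@/r english.html'` trick sidesteps serialization entirely. The file's bytes are spliced into the output after the marker line, with no escaping. Below is a minimal Python sketch of that step. The page lines and the fragment are made-up stand-ins for the `basex` output and `english.html`.

```python
from typing import Iterable

def insert_after_marker(lines: Iterable[str], marker: str, payload: str) -> str:
    """Mimic sed '/MARKER/r FILE': pass every line through unchanged and, after
    each line containing the marker, append the payload verbatim (no escaping)."""
    out = []
    for line in lines:
        out.append(line)
        if marker in line:
            out.append(payload)
    return "".join(out)

page = ["<body>\n", "<!-- @@@ -->\n", "</body>\n"]   # stand-in for basex output
fragment = "<p>raw <b>markup</b></p>\n"             # stand-in for english.html
print(insert_after_marker(page, "@@@", fragment), end="")
```

In a real pipeline, `lines` would be `sys.stdin` and `payload` would be `pathlib.Path("english.html").read_text(encoding="utf-8")`.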