Re: [basex-talk] Cant add raw files explicitly or text index ms office docs

2015-11-26 Thread E. Wray Johnson
Thanks!  Today is our Thanksgiving holiday so I am not working today.
I will look at this soon.

Consider a file filter that uses regular expression(s).

Wray Johnson
(m) 704-293-9008

> On Nov 26, 2015, at 11:43 AM, "Christian Grün" wrote:
>
> Hi E. Wray,
>
> I have attached a little example for some XQuery code, which adds
> files, archives and archive contents to a database. It’s probably not
> the most efficient solution, so feel free to enhance it or ask more
> questions.
>
> I agree that your use case is an enticing one: We also use BaseX to
> process office files, and Rositsa Shadura wrote an interesting thesis
> on that topic [1]. As Dirk pointed out, it turned out that we didn’t
> want to choose one particular solution, and the XQuery approach is
> currently the most flexible one.
>
> Hope this helps,
> Christian
>
> [1] http://basex.org/about-us/publications
>
>> On Wed, Nov 25, 2015 at 5:43 PM, Dirk Kirsten wrote:
>> Hello,
>>
>> which problems did you encounter? This problem should be solvable using a
>> small XQuery, basically putting what you describe in natural language in
>> XQuery so our processor understands it.
>>
>> I don't think it would make any sense to add such a specific format. There
>> are simply way too many possible combinations - you want archive files
>> extracted, others might not want that. In the end we would end up with
>> a very complex definition language - and what's the point if we already have
>> a standardized query language like XQuery, which can achieve the same thing?
>>
>> Cheers
>> Dirk
>>
>> On 11/25/2015 05:38 PM, E. Wray Johnson wrote:
>>
>> Here is what I want to do: For a given folder and all its subfolders on my
>> physical drive, mirror its contents including the contents of archives,
>> parsing xml, json, html, text, etc. using their respective parsers, skipping
>> invalid files, and adding all other files as raw. I want archive files (*.zip,
>> *.docx) to be added as raw, however I want the text inside archive files
>> like docx (ms-word) to be indexed and any files in the archive files that
>> match a filter to be indexed.
>>
>> Note: It would be nice if there was a single db:add method that allowed me
>> to specify a map of filters to parsers with options, where all files that do
>> not match a filter (or are invalid) will be optionally added as raw.
> 
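
The mirroring described in the quoted request (walk a folder, parse what can be parsed, store everything else as raw, and also look inside archives) could be sketched in XQuery roughly as follows. This is a minimal sketch, not the example attached to Christian's mail. It assumes the BaseX 8.x File, Archive, Conversion, JSON and Database modules; the database name 'mirror', the folder path, the '.contents/' path convention and the helper local:add are all hypothetical. Only XML and JSON are parsed here; further formats would need extra branches.

```xquery
(: Assumptions: the database 'mirror' already exists, and $root ends
   with a directory separator. Both values are placeholders. :)
declare variable $db   := 'mirror';
declare variable $root := '/path/to/folder/';

(: Hypothetical helper: adds one resource. It is parsed according to
   its file extension when possible and stored as raw otherwise. :)
declare updating function local:add(
  $path as xs:string,
  $data as xs:base64Binary
) {
  let $ext := lower-case(replace($path, '^.*\.', ''))
  let $doc := try {
    if ($ext = 'xml') then
      parse-xml(convert:binary-to-string($data))
    else if ($ext = 'json') then
      json:parse(convert:binary-to-string($data))
    else ()
  } catch * {
    (: invalid input: fall through to raw storage :)
    ()
  }
  return
    if (exists($doc)) then db:add($db, $doc, $path)
    else db:store($db, $path, $data)
};

for $rel in file:list($root, true())
let $abs := $root || $rel
where file:is-file($abs)
let $path := replace($rel, '\\', '/')  (: normalise Windows separators :)
let $data := file:read-binary($abs)
return (
  (: every file is added once, parsed or raw :)
  local:add($path, $data),
  (: a regular expression decides which files are opened as archives :)
  if (matches($path, '\.(zip|docx)$', 'i')) then
    for $entry in archive:entries($data)[not(ends-with(., '/'))]
    return local:add(
      $path || '.contents/' || $entry,
      archive:extract-binary($data, $entry)
    )
  else ()
)
```

Because the XML parts of a .docx file are ordinary archive entries, they end up as parsed documents and are covered by whatever indexes the database has enabled. Each file is read fully into memory, so this is only suitable for moderately sized folders.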


[basex-talk] db:create, namespaces not stripped when input is variable?

2015-11-26 Thread Hondros, Constantine (ELS-AMS)
Hello all,
Any particular reason behind this observed behaviour? (BaseX 8.3).

..
[1]
..
declare variable $test :=
<… xmlns:ce="http://www.elsevier.com/xml/ani/common"
   xmlns="http://www.elsevier.com/xml/ani/ani">
  M.S.
</…>;

db:create('test', $test, 'blah', map {'stripns':true()})

..
[2]
..

db:open('test')/*

..
[3]
..

<… xmlns="http://www.elsevier.com/xml/ani/ani"
   xmlns:ce="http://www.elsevier.com/xml/ani/common">
  M.S.
</…>


I would have expected the namespaces to be shredded ...

Cheers,
Constantine
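
The thread does not say how the stripns option is meant to behave when the input is a node rather than a file or string. Independently of that, one way to get a namespace-free result is to rebuild the node without namespaces before handing it to db:create. The following is a minimal sketch in plain XQuery 3.0; local:strip-ns is a hypothetical helper, and the element names and example.org namespaces are placeholders, not the ones from the original message.

```xquery
(: Hypothetical helper: returns a copy of the node in which every
   element and attribute keeps only its local name. :)
declare function local:strip-ns($node as node()) as node() {
  typeswitch ($node)
    case document-node() return
      document { $node/node() ! local:strip-ns(.) }
    case element() return
      element { local-name($node) } {
        for $a in $node/@*
        return attribute { local-name($a) } { string($a) },
        $node/node() ! local:strip-ns(.)
      }
    (: text, comments and processing instructions are kept as they are :)
    default return $node
};

(: Placeholder input with a prefixed and a default namespace. :)
declare variable $input :=
  <a:root xmlns:a="http://example.org/a" xmlns="http://example.org/b">
    <child>M.S.</child>
  </a:root>;

db:create('test', local:strip-ns($input), 'blah')
```

The copy is lossy: two attributes that differ only in their namespace would collide, and attributes such as xml:lang lose their prefix, so the helper fits only when that is acceptable.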