Re: [MarkLogic Dev General] Using RegEx in xQuery

2017-03-22 Thread Kari Cowan
I don’t like to admit it, but I had some test mode get all... $doc/i:HTML 
returned just what it should; $doc//i:HTML returned the extras.

☺
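
For anyone who finds this thread later, here is a minimal sketch of why the two paths can differ. Only the HTML element name and the incisive-repository namespace come from the thread; the "article" and "enrichment" elements are invented for illustration, and the real document layout may well be different.

```xquery
xquery version "1.0-ml";
declare namespace ir = "incisive-repository";

(: Hypothetical stand-in for the stored document: one HTML element as a
   direct child, and a second copy nested inside a made-up wrapper. :)
let $doc :=
  <ir:article xmlns:ir="incisive-repository">
    <ir:HTML>original body</ir:HTML>
    <ir:enrichment>
      <ir:HTML>annotated copy</ir:HTML>
    </ir:enrichment>
  </ir:article>
return (
  fn:count($doc/ir:HTML),   (: child axis: direct children of $doc only :)
  fn:count($doc//ir:HTML)   (: descendant axis: matches at any depth :)
)
```

With this sample the two counts should come back as 1 and 2: the single slash only looks at children of $doc, while the double slash also finds the nested copy.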


___
General mailing list
General@developer.marklogic.com
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] Using RegEx in xQuery

2017-03-22 Thread Kari Cowan
Righto – I’ll look to add such a function – thanks.



Re: [MarkLogic Dev General] Using RegEx in xQuery

2017-03-22 Thread Christopher Hamlin
My guess is that it's a big doc and hard to find the HTML tags?

Open the doc in an XML editor and search for *:HTML and it may show.

Also, those are both {incisive-repository}HTML nodes, even if there is a
(surface) difference in prefix/namespace.  This is an example of why regex
for xml strings can't cope.
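
A small sketch of the prefix point, assuming nothing beyond the incisive-repository namespace mentioned above. The two elements are hypothetical samples, and xdmp:quote is used only to get a string for the regex to look at.

```xquery
xquery version "1.0-ml";
declare namespace ir = "incisive-repository";

(: Two spellings of the same element: default namespace vs. an "i" prefix. :)
let $a := <HTML xmlns="incisive-repository"/>
let $b := <i:HTML xmlns:i="incisive-repository"/>
return (
  (: QNames compare by namespace URI plus local name, not by prefix :)
  fn:node-name($a) eq fn:node-name($b),
  $a instance of element(ir:HTML),
  $b instance of element(ir:HTML),
  (: a string pattern tied to one prefix only sees one of the two spellings :)
  fn:matches(xdmp:quote($a), "<i:HTML"),
  fn:matches(xdmp:quote($b), "<i:HTML")
)
```

The node tests treat both elements as the same thing, whereas the regex should only hit the serialization that happens to use the i: prefix.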

It's hard to recommend anything (in detail) since I guess I don't understand
the requirements.

It's easy to say, though:  regex is not good for something like this.  You
can use xslt or recursive xquery pretty easily in ML.


Re: [MarkLogic Dev General] Using RegEx in xQuery

2017-03-22 Thread Kari Cowan
Thanks – these are good ideas and make sense, but as I dig into the data a 
little deeper I see something odd that doesn’t seem to be working the way I 
would expect.

Assume I inspected a document via:
doc("/data-sources/lawcom-contrib/sites/almstaff/2017/03/21/no-womans-land-cybersecurity-industry-suffers-from-gender-imbalance-discrimination.xml")

In that I can see 1 single HTML node starting with

... bunch of  child nodes and then 

Then directly followed by
http://luxid.temis.com/occurrence/attribute" 
xmlns:entityattr="http://luxid.temis.com/entity/attribute" 
xmlns:entity="http://luxid.temis.com/entity" 
xmlns:category="http://luxid.temis.com/category" xmlns="">
... bunch of  nodes and the same  nodes in the HTML set.

So in my view, there’s only 1 HTML node in the doc.

But when I do a directory query to return docs and write the value for 
$doc//ir:HTML

I get first

.. bunch of  child nodes and ending with 

Then

.. image
+ http://www.w3.org/1999/xhtml">
... entity nodes and a duplicate of the  children in the first.

How come there's only 1 HTML node in the doc when inspecting it but when I do a 
directory query and write the HTML value with descendants, I get more than 1?

Does the XSLT notation and unwrap suggestion still make sense given that 
context?




Re: [MarkLogic Dev General] Using RegEx in xQuery

2017-03-20 Thread Geert Josten
You may want to unwrap entity:entity and suppress entity:entityattr instead, 
but otherwise this should work just fine all the way down to at least MarkLogic 
5.. :)

Cheers
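
A rough sketch of what unwrapping and suppressing could look like on top of an identity transform. The real Luxid markup did not survive in the thread, so the element names entity:entity and entity:entityattr (taken from the wording above), their nesting, and the sample paragraph are all assumptions.

```xquery
xquery version "1.0-ml";

(: Hypothetical sample input; the shape of the entity markup is a guess. :)
let $doc :=
  <p xmlns="http://www.w3.org/1999/xhtml"
     xmlns:entity="http://luxid.temis.com/entity">
    <entity:entity><entity:entityattr>Company</entity:entityattr>Simmons</entity:entity>
    has appointed two new sector heads.
  </p>
let $xslt :=
  <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:entity="http://luxid.temis.com/entity">
    <!-- identity transform: copy every node and attribute unchanged -->
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>
    <!-- unwrap: drop the wrapper element but keep processing its children -->
    <xsl:template match="entity:entity">
      <xsl:apply-templates select="node()"/>
    </xsl:template>
    <!-- suppress: drop the element together with its content -->
    <xsl:template match="entity:entityattr"/>
  </xsl:stylesheet>
return xdmp:xslt-eval($xslt, $doc)
```

The idea is that the text inside the entity wrapper stays in the paragraph while the attribute payload disappears; the match patterns would need adjusting to whatever the stored documents really contain.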



Re: [MarkLogic Dev General] Using RegEx in xQuery

2017-03-20 Thread Christopher Hamlin
I don't know off-hand of changes in xslt between 7 and 8.

Something like this in 8 is what I was thinking, don't know if it is really
what you need:

let $doc := (: blah blah blah :)
let $xslt :=
http://www.w3.org/1999/XSL/Transform" xmlns:ir="incisive-repository">
  

  

  
  

return xdmp:xslt-eval ($xslt, $doc)
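
For readers of the archive, where the stylesheet markup has been lost: a minimal sketch of the approach described elsewhere in this thread (an identity transform plus a rule that turns an HTML element with an earlier HTML sibling into nothing). This is not necessarily the stylesheet that was originally posted; the sample document and the "article" wrapper are invented, and it assumes the unwanted HTML element is a following sibling of the wanted one.

```xquery
xquery version "1.0-ml";

(: Hypothetical sample: two HTML siblings in the same namespace, written
   with different prefixes. :)
let $doc :=
  <ir:article xmlns:ir="incisive-repository">
    <ir:HTML>keep me</ir:HTML>
    <i:HTML xmlns:i="incisive-repository">drop me</i:HTML>
  </ir:article>
let $xslt :=
  <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:ir="incisive-repository">
    <!-- identity transform: copy every node and attribute unchanged -->
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>
    <!-- an HTML element with an earlier HTML sibling produces nothing -->
    <xsl:template match="ir:HTML[preceding-sibling::ir:HTML]"/>
  </xsl:stylesheet>
return xdmp:xslt-eval($xslt, $doc)
```

The more specific template wins over the identity rule by default priority, so only the first HTML element should survive in the output.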


Re: [MarkLogic Dev General] Using RegEx in xQuery

2017-03-19 Thread Kari Cowan
Thanks. I’ll look into that.  Do you have any links/examples particular to ML 7?




Re: [MarkLogic Dev General] Using RegEx in xQuery

2017-03-18 Thread Christopher Hamlin
Note that regex works on strings, not nodes.  It's easier I think to
operate on nodes here.

You could do it with xslt also.  Start with an identity transform and then
have {incisive-repository}HTML nodes with similar preceding-siblings
transform to nothing.


Re: [MarkLogic Dev General] Using RegEx in xQuery

2017-03-17 Thread Kari Cowan
Sure. In the same return, I am getting back 2 HTML elements.  The first one 
is the one I want to keep.  The 2nd one I would like to do away with entirely, 
or at the least remove all of its child elements; I could strip the rest on the 
frontend if I had to.  

Here’s a pared-down example of the HTML elements in the same doc.  Is this what 
you wanted to see?


http://www.w3.org/1999/xhtml">
http://images.legalweek.com/contrib/content/uploads/sites/378/2016/11/charlotte_stalin_01_revised-002-Article-201611151248.jpg">
http://images.legalweek.com/contrib/content/uploads/sites/378/2016/11/charlotte_stalin_01_revised-002-Article-201611151248.jpg"
 width="616" height="372" />


http://www.w3.org/1999/xhtml">Simmons  Simmons has appointed 
two new sector heads for
its financial institutions and life sciences groups.
http://www.w3.org/1999/xhtml">London financial services partner 
Charlotte Stalin will lead
financial institutions while Paris disputes partner Alexandre
Reginault will head up life sciences. Both positions are
effective from 1 May for a three year term.


http://luxid.temis.com/occurrence/attribute" 
xmlns:entityattr="http://luxid.temis.com/entity/attribute" 
xmlns:entity="http://luxid.temis.com/entity" 
xmlns:category="http://luxid.temis.com/category" xmlns="incisive-repository" 
xmlns:i="incisive-repository">
http://www.w3.org/1999/xhtml">
http://images.legalweek.com/contrib/content/uploads/sites/378/2016/11/charlotte_stalin_01_revised-002-Article-201611151248.jpg">
http://images.legalweek.com/contrib/content/uploads/sites/378/2016/11/charlotte_stalin_01_revised-002-Article-201611151248.jpg"
 width="616" height="372" />


http://www.w3.org/1999/xhtml">








Simmons  
Simmons has appointed two new sector heads for
its financial institutions and life sciences groups.
http://www.w3.org/1999/xhtml">London financial services partner 
Charlotte Stalin will lead
financial institutions while Paris disputes partner Alexandre
Reginault will head up life sciences. Both positions are
effective from 1 May for a three year term.




Re: [MarkLogic Dev General] Using RegEx in xQuery

2017-03-17 Thread Justin Makeig
The general way to recursively transform one node tree to another in XQuery is 
via recursive descent.
 

I can't quite tell from your description what your transformation needs to do, 
though. "Strip out everything in between" what? Can you give a more fleshed out 
example of the input and your expected output?

Justin
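
A minimal sketch of the recursive descent pattern mentioned above, using a typeswitch. The function name local:strip, the sample document, and the rule for which HTML element to drop (any one that has an earlier HTML sibling) are assumptions made for illustration, not something confirmed in this thread.

```xquery
xquery version "1.0-ml";
declare namespace ir = "incisive-repository";

(: Rebuild the tree top-down, dropping any ir:HTML element that has an
   earlier ir:HTML sibling. local:strip is a hypothetical name. :)
declare function local:strip($node as node()) as node()*
{
  typeswitch ($node)
    case document-node() return
      document { for $c in $node/node() return local:strip($c) }
    case element() return
      if ($node instance of element(ir:HTML)
          and fn:exists($node/preceding-sibling::ir:HTML))
      then ()
      else
        element { fn:node-name($node) } {
          $node/@*,
          for $c in $node/node() return local:strip($c)
        }
    default return $node   (: text, comments, processing instructions :)
};

(: Hypothetical sample input with two sibling HTML elements. :)
let $doc :=
  <ir:article xmlns:ir="incisive-repository">
    <ir:HTML>keep me</ir:HTML>
    <ir:HTML>drop me</ir:HTML>
  </ir:article>
return local:strip($doc)
```

Each element is reconstructed from its name, attributes, and recursively processed children, so the drop rule can be changed in one place (for example, to empty the element instead of removing it).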





[MarkLogic Dev General] Using RegEx in xQuery

2017-03-17 Thread Kari Cowan
Inside my return for a query, I have an HTML node that I don’t need that 
includes a bunch of child elements  -- I want to strip out everything in 
between; alternately I could be happy to remove the node entirely from the 
returned data.  Is there a function for that I don’t know about or is this 
better done with a regex?  Anyone have a tip they can share?

http://luxid.temis.com/occurrence/attribute" 
xmlns:entityattr="http://luxid.temis.com/entity/attribute" 
xmlns:entity="http://luxid.temis.com/entity" 
xmlns:category="http://luxid.temis.com/category" xmlns="incisive-repository" 
xmlns:i="incisive-repository">  (bunches of child elements here )  