[MarkLogic Dev General] potential non-conformance with RFC 3986?

2017-03-22 Thread Jakob Fix
Hello,

we recently observed an unexpected behaviour in how MarkLogic treats the
keys in the query part of a submitted URL (note the case of the two query
param keys):

http://localhost:/app/test.xqy?param=yes&Param=no

let $q1 := xdmp:get-request-field('param')
let $q2 := xdmp:get-request-field('Param')

return "q1: " || $q1 || " -- q2: " || $q2

one should reasonably expect to see the following result:

q1: yes -- q2: no

However, the actual result is an error because "arg1 is not of type
xs:anyAtomicType?"

XDMP-ARGTYPE: (err:XPTY0004) "q1: " || $q1 || " -- q2: " || $q2 -- arg1 is
not of type xs:anyAtomicType? in /app/test.xqy, at 7:11 [1.0-ml]
$q1 = ("yes", "no")
$q2 = ("yes", "no")

xdmp:get-request-field-names() correctly returns both 'param' and 'Param'.

For some reason, MarkLogic normalises (presumably lowercases) the keys of
the query string when looking up a query parameter value, which seems to
run counter to what is described in section 6.2.2.1 Case normalisation of
RFC 3986 [1]:

When a URI uses components of the generic syntax, the component syntax
equivalence rules always apply; namely, that the scheme and host are
case-insensitive and therefore should be normalized to lowercase. For
example, the URI <HTTP://www.EXAMPLE.com/> is equivalent to
<http://www.example.com/>. *The other generic syntax components are assumed
to be case-sensitive* unless specifically defined otherwise by the scheme
(see Section 6.2.3).

Are we interpreting the RFC wrongly?

Yes, I've tested this on 8.0-6.3.
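
For comparison, here is a minimal sketch in plain JavaScript (Node.js or a browser, not MarkLogic server-side code) of a case-sensitive lookup of the two keys, next to a case-insensitive lookup that reproduces the ("yes", "no") sequences shown in the error above. The port-less URL and the helper name getAllIgnoringCase are illustrative assumptions; nothing here claims to be how xdmp:get-request-field is implemented.

```javascript
// Plain JavaScript sketch; the URL (no port) and the helper name are assumptions.
const url = new URL("http://localhost/app/test.xqy?param=yes&Param=no");

// URLSearchParams compares keys exactly, so 'param' and 'Param' stay distinct.
const q1 = url.searchParams.get("param");
const q2 = url.searchParams.get("Param");

// Hypothetical case-insensitive lookup: both spellings collapse into one key.
function getAllIgnoringCase(params, name) {
  const wanted = name.toLowerCase();
  const values = [];
  for (const [key, value] of params) {
    if (key.toLowerCase() === wanted) values.push(value);
  }
  return values;
}

const folded = getAllIgnoringCase(url.searchParams, "param");

console.log(`q1: ${q1} -- q2: ${q2}`); // q1: yes -- q2: no
console.log(folded);                   // [ 'yes', 'no' ]
```

With a case-insensitive lookup both names yield a two-item result, which is the kind of value that cannot be passed to a string concatenation expecting a single atomic item.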

cheers,
Jakob.

PS: Thanks to my colleague Romuald for mentioning this over beer! ;-)


[1] https://tools.ietf.org/html/rfc3986#section-6.2.2.1
___
General mailing list
General@developer.marklogic.com
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] Regular Expressions

2017-03-22 Thread Mary Holstege
On Wed, 22 Mar 2017 12:55:27 -0700, Oleksii Segeda wrote:

> Hi everyone,
>
> Quick questions regarding regex in ML:
>
> 1.   What's the ML alternative to word boundaries (\b)? It seems that
> fn:analyze-string doesn't support this special character.
>
> 2.   Does the JS version of this function (fn.analyzeString) use the JS
> regex engine? If so, why does it give me an error for
> fn.analyzeString("foo bar bar", "\\b(bar)\\b")?

The regex language doesn't have non-capturing groups, but with some  
post-processing you could do this:

"(^|\W)(bar)(\W|$)"

or if you don't, then unroll it:

"^(bar)$|^(bar)\W|\W(bar)$|\W(bar)\W"
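
A minimal sketch of the kind of post-processing the first pattern needs, written in plain JavaScript only so that it can be run anywhere. The pattern is the one above and uses neither \b nor non-capturing groups. The helper name findWordStarts is hypothetical, and the word is assumed to contain no regex metacharacters. Because the pattern consumes the separator on each side, the scan resumes right after the word rather than after the whole match; otherwise the second "bar" in "foo bar bar" would be skipped.

```javascript
// Hypothetical helper: start offsets of `word` as a whole word, without \b.
// Assumes `word` contains no regex metacharacters.
function findWordStarts(text, word) {
  const re = new RegExp("(^|\\W)(" + word + ")(\\W|$)", "g");
  const starts = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    // Group 1 is the consumed leading separator (empty at the start of text).
    const start = m.index + m[1].length;
    starts.push(start);
    // Resume just after the word, so the trailing separator can also
    // act as the leading separator of the next occurrence.
    re.lastIndex = start + m[2].length;
  }
  return starts;
}

console.log(findWordStarts("foo bar bar", "bar")); // [ 4, 8 ]
console.log(findWordStarts("rebar barn", "bar"));  // []
```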

//Mary


Re: [MarkLogic Dev General] Regular Expressions

2017-03-22 Thread Oleksii Segeda
Hi Erik,

Unfortunately, my entire codebase is in XQuery, but I need to use word
boundaries and non-matching groups. So far the only idea that comes to mind
is to use xdmp:javascript-eval.
I'm curious whether this approach is considered normal practice.

Best,


Oleksii Segeda

IT Analyst

Information and Technology Solutions



Re: [MarkLogic Dev General] Regular Expressions

2017-03-22 Thread Erik Hennum
Hi, Oleksii:

Regarding question 2, aside from a few edge cases,
the MarkLogic libraries have the same core implementation
with JavaScript and XQuery interfaces.

The core behavior of functions in the MarkLogic libraries is
(in almost every case) consistent across environments.

If you are working in JavaScript and the regex implementation
from v8 is a good fit for your requirements, you should take
advantage of JavaScript regex objects and methods.
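
As an illustration of that last point, here is a minimal sketch of the word-boundary and non-capturing-group syntax that JavaScript regex objects accept. It is plain JavaScript; the helper name wholeWordMatches and the sample words are made up, and how such code would be invoked from an XQuery codebase is not shown here.

```javascript
// \b marks a word boundary; (?:...) groups alternatives without capturing.
const wholeWord = /\b(?:bar|baz)\b/g;

// Hypothetical helper: [offset, text] for every whole-word match.
function wholeWordMatches(text) {
  return Array.from(text.matchAll(wholeWord), (m) => [m.index, m[0]]);
}

console.log(wholeWordMatches("foo bar bar")); // [ [ 4, 'bar' ], [ 8, 'bar' ] ]
console.log(wholeWordMatches("rebar baz"));   // [ [ 6, 'baz' ] ]
```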


Hoping that clarifies,


Erik Hennum





Re: [MarkLogic Dev General] Regular Expressions

2017-03-22 Thread Sewell, David R. (drs2n)
I’m not sure what the answer is to question 2, but for question 1, the answer 
is that MarkLogic’s implementation of XPath doesn’t support the \b character 
escape because it is not included in the XPath specification for regular 
expressions, which itself is based on “XML Schema Part 2: Datatypes Second 
Edition”. The only single-character escapes are these:

https://www.w3.org/TR/xmlschema-2/#nt-charClassEsc

Some XSLT and XQuery processors support extended regular expressions as a 
proprietary feature (for example, Saxon has a semi-documented extension that 
allows full Java regex), but MarkLogic doesn’t (unless there is undocumented 
support that I don’t know about).

David



[MarkLogic Dev General] Regular Expressions

2017-03-22 Thread Oleksii Segeda
Hi everyone,

Quick questions regarding regex in ML:


1.   What's the ML alternative to word boundaries (\b)? It seems that 
fn:analyze-string doesn't support this special character.

2.   Does the JS version of this function (fn.analyzeString) use the JS regex 
engine? If so, why does it give me an error for 
fn.analyzeString("foo bar bar", "\\b(bar)\\b")?

Regards,

Oleksii Segeda

IT Analyst

Information and Technology Solutions



Re: [MarkLogic Dev General] URI_ID whitespace problems with mlcp

2017-03-22 Thread Geert Josten
Valid points all, but MLCP warns about spaces in header names, and proceeds by 
converting them to underscores before generating XML out of them.

On the other hand, though neither likely nor practical, spaces in property 
names are allowed in JSON. ;-)
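
A small sketch of both remarks in plain JavaScript. The sample value "MD-1" and the helper name toUnderscoredName are made up, and the replacement rule (each space becomes one underscore) is an assumption for illustration, not MLCP's actual code.

```javascript
// A space in a JSON property name parses without complaint.
const record = JSON.parse('{"row _id": "MD-1"}');
console.log(record["row _id"]); // MD-1

// Hypothetical normalisation: every space becomes an underscore,
// which yields a name that is acceptable as an XML element name.
function toUnderscoredName(header) {
  return header.replace(/ /g, "_");
}

console.log(toUnderscoredName("row _id")); // row__id
```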

Cheers,
Geert


On 21 March 2017 at 19:02, Lucas Davenport wrote:
I am a newb, so forgive me if I missed this answer while searching.

I am testing ML 8 for a project at work and we have a requirement to load large 
amounts of historical data. I've read the mlcp documentation and can 
successfully import some test data, but the problem I am facing is the archive 
data has a space in the record identifier.

My command is:
 mlcp.sh import -host localhost -port 8006 -username dataload -password 
dataload -mode local -input_file_path ../xml/MD2014aggregate.xml 
-input_file_type aggregates -aggregate_record_element row -uri_id "row _id" 
-output_uri_prefix /traffic/MD -output_uri_suffix .xml -output_collections 
published

This produces the following error:
17/03/21 13:49:20 ERROR contentpump.ContentPump: Unrecognized argument: \_id

I've escaped both the space and the underscore (row\ _id and row\ \_id) and 
still get the same error. I've also wrapped it in single quotes and double 
quotes.

I'm trying to keep from having to use sed to remove the space between row and 
_id in the entire file.

Is there a way to make mlcp see the URI_ID literally as "row _id"?

Thanks in advance.



Re: [MarkLogic Dev General] Using RegEx in xQuery

2017-03-22 Thread Kari Cowan
I don’t like to admit it but I had some test mode get all…. $doc/i:HTML 
returned just what it should, $doc//i:HTML returned the extras…. 

☺
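
To make the difference between the two path expressions concrete, here is a minimal sketch in plain JavaScript over a toy tree. The element names and the nesting are made up, on the assumption that the extras were HTML elements sitting deeper in the document; childrenNamed and descendantsNamed are hypothetical helpers standing in for the / and // steps.

```javascript
// Toy tree standing in for a document; names and nesting are made up.
const doc = {
  name: "#document",
  children: [
    {
      name: "HTML",
      children: [
        { name: "body", children: [{ name: "HTML", children: [] }] },
      ],
    },
  ],
};

// Like $doc/HTML: direct children only.
function childrenNamed(node, name) {
  return node.children.filter((c) => c.name === name);
}

// Like $doc//HTML: every descendant, in document order.
function descendantsNamed(node, name) {
  const found = [];
  for (const c of node.children) {
    if (c.name === name) found.push(c);
    found.push(...descendantsNamed(c, name));
  }
  return found;
}

console.log(childrenNamed(doc, "HTML").length);    // 1
console.log(descendantsNamed(doc, "HTML").length); // 2
```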




Re: [MarkLogic Dev General] Using RegEx in xQuery

2017-03-22 Thread Kari Cowan
Righto – I’ll look to add such a function – thanks.



Re: [MarkLogic Dev General] Using RegEx in xQuery

2017-03-22 Thread Christopher Hamlin
My guess is that it's a big doc and hard to find the HTML tags?

Open the doc in an XML editor and search for *:HTML and it may show.

Also, those are both {incisive-repository}HTML nodes, even if there is a
(surface) difference in prefix/namespace.  This is an example of why regex
for xml strings can't cope.

It's hard to recommend anything (in detail) since I guess I don't understand
the requirements.

It's easy to say, though:  regex is not good for something like this.  You
can use xslt or recursive xquery pretty easily in ML.
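
A minimal sketch of the prefix point, in plain JavaScript. The two objects stand for what an XML parser would report for two elements that share a namespace but use different prefixes; the prefixes and helper names are illustrative assumptions.

```javascript
// What a parser would report for the two elements; prefixes are made up.
const elements = [
  { prefix: "ir", localName: "HTML", namespaceURI: "incisive-repository" },
  { prefix: "i", localName: "HTML", namespaceURI: "incisive-repository" },
];

// String-level view: the lexical tag name, all a regex over serialized XML sees.
const lexicalName = (el) => `${el.prefix}:${el.localName}`;
// Node-level view: the expanded name, which is what XPath and XSLT compare.
const expandedName = (el) => `{${el.namespaceURI}}${el.localName}`;

const matchedByString = elements.filter((el) => /^ir:HTML$/.test(lexicalName(el)));
const matchedByName = elements.filter(
  (el) => expandedName(el) === "{incisive-repository}HTML"
);

console.log(matchedByString.length, matchedByName.length); // 1 2
```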


Re: [MarkLogic Dev General] Using RegEx in xQuery

2017-03-22 Thread Kari Cowan
Thanks – these are good ideas and make sense, but as I dig into the data a 
little deeper I see something odd that doesn’t seem to be working the way I 
would expect.

Assume I inspected a document via:
doc("/data-sources/lawcom-contrib/sites/almstaff/2017/03/21/no-womans-land-cybersecurity-industry-suffers-from-gender-imbalance-discrimination.xml")

In that I can see 1 single HTML node starting with

... bunch of  child nodes and then 

Then directly followed by
[...]="http://luxid.temis.com/occurrence/attribute" 
xmlns:entityattr="http://luxid.temis.com/entity/attribute" 
xmlns:entity="http://luxid.temis.com/entity" 
xmlns:category="http://luxid.temis.com/category" xmlns="">
... bunch of  nodes and the same  nodes in the HTML set.

So in my view, there’s only 1 HTML node in the doc.

But when I do a directory query to return docs and write the value for 
$doc//ir:HTML

I get first

.. bunch of  child nodes and ending with 

Then

.. image
+ http://www.w3.org/1999/xhtml;>
... entity nodes and a duplicate of the  children in the first.

How come there's only 1 HTML node in the doc when inspecting it but when I do a 
directory query and write the HTML value with descendants, I get more than 1?

Does the XSLT notation and unwrap suggestion still make sense given that 
context?


On Monday, March 20, 2017 at 8:49 AM, Geert Josten wrote:

You may want to unwrap entity:entity and suppress entity:entityattr instead, 
but otherwise this should work just fine all the way down to at least MarkLogic 
5.. :)

Cheers

On Monday, March 20, 2017 at 4:29 PM, Christopher Hamlin wrote:

I don't know off-hand of changes in xslt between 7 and 8.

Something like this in 8 is what I was thinking, don't know if it is really 
what you need:

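let $doc := (: blah blah blah :)
let $xslt :=
  <xsl:stylesheet [...] xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:ir="incisive-repository">
    [... template rules stripped by the list archive ...]
  </xsl:stylesheet>
return xdmp:xslt-eval($xslt, $doc)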


Re: [MarkLogic Dev General] URI_ID whitespace problems with mlcp

2017-03-22 Thread Florent Georges
Hi,

That is indeed the most likely explanation.  Just to make it clear to the
OP, in such a situation an XML parser MUST stop normal processing (see e.g.
http://w3.org/TR/xml/#sec-terminology, and the fact that having ""
where a start tag is possible is ultimately breaking the document
production rule).

When it comes to XML (in general, not only with MarkLogic), sometimes
working around validity might be the right solution, depending on the
technical and non-technical context.  But having ill-formed documents never
is.  Fixing ill-formedness is always less painful than any other solution.

Just my 2 cents.  Regards,

-- 
Florent Georges
H2O Consulting
http://h2o.consulting/




Re: [MarkLogic Dev General] URI_ID whitespace problems with mlcp

2017-03-22 Thread Martijn Sintemaartensdijk
Dear Lucas,

judging from your command, I think your input file contains an XML start tag
"<uri _id>" and a corresponding end tag "</uri _id>". Unfortunately, XML tag
names may not contain spaces (see also:
https://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name).

MLCP tries to interpret the XML file and it reports an unexpected
character, ">". MLCP assumes "_id" to be an attribute name on the tag name
"uri", like <uri _id="...">. The next character following "_id" is
therefore expected to be an equal sign.
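
A minimal sketch of the naming rule, in plain JavaScript. It checks only the ASCII subset of the XML Name production (the real production also admits many non-ASCII characters), and the helper name isAsciiXmlName is hypothetical.

```javascript
// Simplified check against the XML Name production, ASCII subset only:
// first character is a letter, '_' or ':'; later ones may also be digits, '.' or '-'.
const asciiXmlName = /^[:A-Z_a-z][:A-Z_a-z0-9.-]*$/;

function isAsciiXmlName(name) {
  return asciiXmlName.test(name);
}

console.log(isAsciiXmlName("row_id"));  // true
console.log(isAsciiXmlName("row _id")); // false, a space is not a name character
console.log(isAsciiXmlName("_id"));     // true
```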

I would advise you to request that the output file be delivered in accordance
with the XML specification, rather than trying to fix the document.
Otherwise, I fear, you will be forced to use sed, or something similar, to
replace the malformed XML tags throughout the entire document each and every
time you receive a new version.

Kind regards,

Martijn Sintemaartensdijk








Re: [MarkLogic Dev General] problem with importing Reduency tuples.

2017-03-22 Thread Geert Josten
If you talk about semantics, you probably mean triples instead of tuples (which 
is a more generic term). If you use SPARQL to query your RDF data / triples, 
you don’t need to worry about duplicate triples. The triple/sparql engine will 
deduplicate for you automatically.
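
For the case where one does want to pre-process before loading, here is a conceptual sketch in plain JavaScript. It treats a triple as its (subject, predicate, object) combination, so the same statement listed twice counts once. The IRIs and the helper name distinctTriples are made up, and this says nothing about how MarkLogic's triple index works internally.

```javascript
// Conceptual sketch: keep one copy of each (subject, predicate, object).
function distinctTriples(triples) {
  const seen = new Map();
  for (const t of triples) {
    const key = JSON.stringify([t.subject, t.predicate, t.object]);
    if (!seen.has(key)) seen.set(key, t);
  }
  return Array.from(seen.values());
}

// Made-up sample data: the first statement appears twice.
const loaded = [
  { subject: "http://example.org/alice", predicate: "http://example.org/knows", object: "http://example.org/bob" },
  { subject: "http://example.org/alice", predicate: "http://example.org/knows", object: "http://example.org/bob" },
  { subject: "http://example.org/bob", predicate: "http://example.org/knows", object: "http://example.org/alice" },
];

console.log(distinctTriples(loaded).length); // 2
```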

Kind regards,
Geert

On Wednesday, March 22, 2017 at 5:29 AM, NAVEEN KUMAR MOTIPALLI (Computer 
Science & Engineering) wrote:

I am working with semantics using ML 8 as the backend database, and I am 
storing tuples in the database. The problem is that it is unable to detect 
redundant tuples being inserted into ML. Is there any way to configure it to 
eliminate redundant tuples entering the database, or do we need to 
pre-process every tuple before inserting it into the database?