Re: [MarkLogic Dev General] Unfiltered, exact searches

2017-03-23 Thread Mary Holstege

Not so much a bug as a consequence of how indexing works. Value queries, even 
exact value queries, are word searches with a spanning constraint. If you 
really want an equality search, set up a range index and do a range query.

What is going on is that the index key for "new" (value exact) is the same as 
the key for "NEW" (case-insensitive word), and in an unfiltered index 
resolution the two cases cannot be distinguished without filtering. The flags 
are not baked into the lookup keys (if they were, index sizes would be much, 
much larger). Phrases (in whitespace-separated languages) don't have this 
issue because a word key cannot include the space between words, so an exact 
value query for "new status" will not match "New status".
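
A minimal sketch of the range-query route, assuming an element range index on 
status (scalar type string, codepoint collation) has been configured. That 
index is an assumption made for illustration; it is not part of the database 
setup described in this thread.

```xquery
xquery version "1.0-ml";

(: Assumption: an element range index exists on <status> with scalar type
   string and the codepoint collation; without it this query raises an error. :)
cts:search(fn:doc(),
  cts:and-query((
    cts:directory-query("/bug/", "infinity"),
    (: equality against the indexed value, compared codepoint by codepoint :)
    cts:element-range-query(xs:QName("status"), "=", "new",
      "collation=http://marklogic.com/collation/codepoint")
  )),
  "unfiltered"
)
```

With a codepoint collation, "new" and "NEW" are different values, so a 
document holding NEW should not come back for "new" even when unfiltered.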

//Mary

___
General mailing list
General@developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general




[MarkLogic Dev General] Connecting to MarkLogic 9ea on CentOS via ODBC

2017-03-23 Thread Nick Heidke
All-

Are there any specific ODBC settings that need to be set in order to connect to 
a MarkLogic ODBC app server running on CentOS? We're using MarkLogic 9ea.

I've got an identical setup running locally on my Windows 10 machine with 
MarkLogic 8, and I'm able to connect to it via ODBC without issue.

Here's the error I'm seeing:

[image: screenshot of the error, not preserved in the archive]

Attached is an ODBC log file from tracing the event.

Nick Heidke
Business Intelligence Consultant
5250 E Terrace Dr #130, Madison, WI
d:608.284.2040 ext. 2305 | c:920.385.9110
nick.hei...@omniresources.com


Attachment: SQL.LOG


Re: [MarkLogic Dev General] URI_ID whitespace problems with mlcp #CGO#

2017-03-23 Thread Jain, Abhishek
Hi Lucas,

I'm not sure if it's a solution, but it may help. I did something similar while 
uploading from Query Console.
If you are open to using Query Console or an mlcp transform, below is the code 
I used to handle URIs even with blank spaces.
I just used xdmp:url-encode(uri). We can create custom URIs as required. 
Ignore if not relevant.

xquery version "1.0-ml";
import module namespace info = "http://marklogic.com/appservices/infostudio" at 
"/MarkLogic/appservices/infostudio/info.xqy";
declare namespace ts = "http://marklogic.com/MLU/top-songs";
let $path := "D:\test\songs"

(: load every file in the directory; the URI is built from artist and title :)
for $d in xdmp:filesystem-directory($path)//dir:entry
let $filepath := $d/dir:pathname/string()
let $doc := xdmp:document-get($d//dir:pathname)
let $title := $doc/ts:top-song/ts:title/string()
let $artist := $doc/ts:top-song/ts:artist/string()
let $genre := $doc//ts:top-song//ts:genres/ts:genre/string()
let $ref-uri := fn:concat("/songs/",$artist,"/",$title,".xml")
(: the archive stripped the markup here; the element names below are 
   reconstructed from the xdmp:document-load options vocabulary :)
let $options :=
  <options xmlns="xdmp:document-load">
    <uri>{ xdmp:url-encode(xs:string($ref-uri)) }</uri>
    <repair>none</repair>
    <permissions>{xdmp:default-permissions()}</permissions>
    <collections>
      <collection>songs-xml</collection>
      {
        for $gen in $doc//ts:top-song//ts:genres/ts:genre
        return <collection>{$gen/string()}</collection>
      }
    </collections>
  </options>

let $database := "test"
let $genlen := fn:string-length(xdmp:url-encode(xs:string($ref-uri)))
return
xdmp:document-load($filepath,$options)

Thanks and Regards,
-Abhishek Jain


Re: [MarkLogic Dev General] potential non-conformance with RFC 3986?

2017-03-23 Thread Jakob Fix
Thanks Geert!

cheers,
Jakob.



Re: [MarkLogic Dev General] URI_ID whitespace problems with mlcp

2017-03-23 Thread Geert Josten
Sorry, ignore my reply, it only applies to delimited_text. Thanks to Martijn 
for pointing that out to me.

@Lucas, you did not mention XML parsing errors, so maybe your XML is just fine, 
and all you are trying to do is take an attribute value and use that as the URI. 
Unfortunately, you can’t do that with -uri_id; it only takes XML element and 
JSON property names. Doing that would require using MLCP transforms.
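
A minimal sketch of such a transform, assuming each aggregate record looks 
like <row _id="...">...</row> with no namespace. That record shape is a guess 
based on the -uri_id value in the original command, and the module namespace 
and the hard-coded URI prefix and suffix are only illustrative.

```xquery
xquery version "1.0-ml";
module namespace example = "http://marklogic.com/example";

(: Hypothetical MLCP transform: derives the document URI from an attribute.
   Assumes the record root is <row _id="..."> in no namespace. :)
declare function example:transform(
  $content as map:map,
  $context as map:map
) as map:map*
{
  let $doc := map:get($content, "value")
  let $id  := fn:string($doc/row/@_id)
  return (
    (: leave the URI that mlcp generated untouched when there is no _id :)
    if ($id ne "")
    then map:put($content, "uri", fn:concat("/traffic/MD", $id, ".xml"))
    else (),
    $content
  )
};
```

The module would be installed in the modules database of the target app 
server and referenced with -transform_module and -transform_namespace on the 
mlcp command line. Whether -output_uri_prefix and -output_uri_suffix are 
applied in addition to a URI set this way was not checked here.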

Kind regards,
Geert

From: Geert Josten, via the list
Reply-To: MarkLogic Developer Discussion
Date: Wednesday, March 22, 2017 at 8:18 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] URI_ID whitespace problems with mlcp

Valid points all, but MLCP warns about spaces in header names, and proceeds by 
converting them to underscores before generating XML out of them.

On the other hand, though neither likely nor practical, spaces in property 
names are allowed in JSON. ;-)

Cheers,
Geert

From: Florent Georges, via the list
Reply-To: MarkLogic Developer Discussion
Date: Wednesday, March 22, 2017 at 3:01 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] URI_ID whitespace problems with mlcp

Hi,

That is indeed the most likely explanation.  Just to make it clear to the OP, 
in such a situation an XML parser MUST stop normal processing (see e.g. 
http://w3.org/TR/xml/#sec-terminology, and the fact that having such a 
malformed tag where a start tag is possible ultimately breaks the document 
production rule).

When it comes to XML (in general, not only with MarkLogic), sometimes working 
around validity might be the right solution, depending on the technical and 
non-technical context.  But having ill-formed documents never is.  Fixing 
ill-formedness is always less painful than any other solution.

Just my 2 cents.  Regards,

--
Florent Georges
H2O Consulting
http://h2o.consulting/


On 22 March 2017 at 14:14, Martijn Sintemaartensdijk wrote:
Dear Lucas,

judging from your command, I think your input file contains an XML start tag 
whose name includes a space, with a corresponding end tag. Unfortunately, XML 
tag names may not contain spaces (see also: 
https://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name).

MLCP tries to interpret the XML file and reports an unexpected character, 
">". MLCP assumes "_id" to be an attribute name on the tag name "uri", like 
<uri _id="...">. The next character following "_id" is therefore expected to 
be an equals sign.

I would advise you to request that the output file be delivered in accordance 
with the XML specification, rather than trying to fix the document. Otherwise, 
I fear, you will be forced to use sed, or something similar, to replace the 
malformed XML tags throughout the entire document each and every time you 
receive a new version.


Kind regards,

Martijn Sintemaartensdijk

A: Einsteinbaan 12, 3439 NJ Nieuwegein
T: (+31) 06 40 59 09 36
E: martijn.sintemaartensd...@dikw.com
W: www.dikw.nl

On 21 March 2017 at 19:02, Lucas Davenport wrote:
I am a newb, so forgive me if I missed this answer while searching.

I am testing ML 8 for a project at work and we have a requirement to load large 
amounts of historical data. I've read the mlcp documentation and can 
successfully import some test data, but the problem I am facing is the archive 
data has a space in the record identifier.

My command is:
 mlcp.sh import -host localhost -port 8006 -username dataload -password 
dataload -mode local -input_file_path ../xml/MD2014aggregate.xml 
-input_file_type aggregates -aggregate_record_element row -uri_id "row _id" 
-output_uri_prefix /traffic/MD -output_uri_suffix .xml -output_collections 
published

This produces the following error:
17/03/21 13:49:20 ERROR contentpump.ContentPump: Unrecognized argument: \_id

I've escaped both the space and the underscore (row\ _id and row\ \_id) and 
still get the same error. I've also wrapped it in single quotes and double 
quotes.

I'm trying to keep from having to use sed to remove the space between row and 
_id in the 

Re: [MarkLogic Dev General] Unfiltered, exact searches

2017-03-23 Thread Geert Josten
Hi Andreas,

Sounds like a bug indeed. It is as if it appends a case-insensitive flag 
despite the ‘exact’, because of the all-lowercase ’new’. Can you tell which 
version of MarkLogic you are running, and on which architecture?

Cheers,
Geert



Re: [MarkLogic Dev General] potential non-conformance with RFC 3986?

2017-03-23 Thread Geert Josten
This seems to sum up all relevant parts nicely:

http://stackoverflow.com/questions/15641694/are-uris-case-insensitive/26196170#26196170

And it seems to confirm your statements. I raised RFE #3921 on your behalf.
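
In the meantime, one possible workaround is to parse the query string by hand 
with a case-sensitive comparison. A minimal sketch, assuming the fields arrive 
in the URL of a GET request; local:field is a hypothetical helper, not a 
MarkLogic built-in, and the URL is a literal here so the snippet runs on its 
own:

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: case-sensitive lookup of a query-string key. :)
declare function local:field(
  $url as xs:string,
  $name as xs:string
) as xs:string*
{
  for $pair in fn:tokenize(fn:substring-after($url, "?"), "&amp;")
  let $key := xdmp:url-decode(
    if (fn:contains($pair, "=")) then fn:substring-before($pair, "=") else $pair)
  (: codepoint comparison, so "param" and "Param" are distinct keys :)
  where fn:codepoint-equal($key, $name)
  return xdmp:url-decode(fn:substring-after($pair, "="))
};

(: In an app server module, xdmp:get-request-url() could supply the URL. :)
let $url := "/app/test.xqy?param=yes&amp;Param=no"
return "q1: " || local:field($url, "param") || " -- q2: " || local:field($url, "Param")
```

For the literal URL above this returns "q1: yes -- q2: no". The sketch does 
not handle POST bodies or URL fragments.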

Cheers,
Geert

From: Jakob Fix, via the list
Reply-To: MarkLogic Developer Discussion
Date: Thursday, March 23, 2017 at 1:02 AM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] potential non-conformance with RFC 3986?

Hello,

we recently observed an unexpected behaviour in how MarkLogic treats the keys 
in the query part of a submitted URL (note the case of the two query param 
keys):

http://localhost:/app/test.xqy?param=yes&Param=no

let $q1 := xdmp:get-request-field('param')
let $q2 := xdmp:get-request-field('Param')

return "q1: " || $q1 || " -- q2: " || $q2

one should reasonably expect to see the following result:

q1: yes -- q2: no

However, the actual result is an error because "arg1 is not of type 
xs:anyAtomicType?"

XDMP-ARGTYPE: (err:XPTY0004) "q1: " || $q1 || " -- q2: " || $q2 -- arg1 is not 
of type xs:anyAtomicType?
in /app/test.xqy, at 7:11 [1.0-ml]
$q1 = ("yes", "no")
$q2 = ("yes", "no")

xdmp:get-request-field-names() correctly returns both 'param' and 'Param'.

For some reason, MarkLogic normalises (presumably lowercases) the keys of the 
query string when looking up a query parameter value which seems to be counter 
to what is described in section 6.2.2.1 Case normalisation of RFC 3986 [1]:

When a URI uses components of the generic syntax, the component syntax 
equivalence rules always apply; namely, that the scheme and host are 
case-insensitive and therefore should be normalized to lowercase. For example, 
the URI <HTTP://www.EXAMPLE.com/> is equivalent to <http://www.example.com/>. 
The other generic syntax components are assumed to be case-sensitive unless 
specifically defined otherwise by the scheme (see Section 6.2.3).

Are we interpreting the RFC wrongly?

Yes, I've tested this on 8.0-6.3.

cheers,
Jakob.

PS: Thanks to my colleague Romuald for mentioning this over beer! ;-)


[1] https://tools.ietf.org/html/rfc3986#section-6.2.2.1



[MarkLogic Dev General] Unfiltered, exact searches

2017-03-23 Thread Andreas Hubmer
Hi,

There seems to be a bug related to unfiltered and exact value searches.

We are using value queries in the Java API, but I've boiled it down to cts
searches.
The following snippet exhibits the wrong behavior:

xquery version "1.0-ml";
xdmp:document-insert("/bug/doc.xml", <status>NEW</status>)
;

"Document is found: OK",
cts:search(/,
  cts:and-query((cts:directory-query("/bug/", "infinity"),
    cts:element-value-query(xs:QName("status"), "NEW", ("exact")))),
  "unfiltered"
)

,"---",
"Document is not found: OK",
cts:search(/,
  cts:and-query((cts:directory-query("/bug/", "infinity"),
    cts:element-value-query(xs:QName("status"), "NEw", ("exact")))),
  "unfiltered"
)

,"---",
"Document is found: WRONG",
cts:search(/,
  cts:and-query((cts:directory-query("/bug/", "infinity"),
    cts:element-value-query(xs:QName("status"), "new", ("exact")))),
  "unfiltered"
)

We are using a database with fast-case-sensitive-searches and
fast-diacritic-sensitive-searches turned on, while all other indexes are
turned off.
As far as I know, only these two indexes are needed for unfiltered exact value
searches.

Regards,
Andreas