Re: [MarkLogic Dev General] root collation vs unicode collation in terms of performance

2016-08-23 Thread Mary Holstege
On Tue, 23 Aug 2016 08:46:40 -0700, Tim Meagher  wrote:

> Just wondering why MarkLogic does not make codepoint the default collation
> if it results in a 10% performance improvement.
>
>
> Tim

Let's not confuse the default appserver collation with the collation you
might want to use on a range index or word lexicon.
There is no default for a range index or word lexicon: you need to pick one
when you configure it, and you should pick whatever gives you the right
balance of functionality and performance.

The 10% faster stat was a measurement of running through the entire range
index comparing every value, and it was made some time ago. It may have
shifted a bit because we've done various work optimizing collations and
various lexicon operations. There are, however, cases where in practice
the root collation is faster because it has smaller ranges of values to
look at. For example, suppose you are doing a case-insensitive,
diacritic-insensitive comparison using a codepoint collation word lexicon.
The variants can be widely separated in codepoint order, and there are
theoretical variants in the exciting reaches of Unicode that you have to
make sure you look for, so you end up looking at a lot of needless cruft;
in the root collation, those variants are all sorted contiguously. So the
general rule of performance still applies: measure, because it is never
what you think. Performance stats here are highly data- and
operation-dependent.
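To see why variants can be widely separated in codepoint order, here is a minimal Python sketch. Python's default string sort compares codepoints, so it stands in for a codepoint-ordered lexicon. The `fold` key (NFD decomposition, dropping combining marks, then casefolding) is a rough, hypothetical stand-in for a case- and diacritic-insensitive comparison; it is not the Unicode Collation Algorithm and not how MarkLogic implements its root collation. The word list is invented.

```python
import unicodedata

def fold(s: str) -> str:
    """Rough case- and diacritic-insensitive key (illustration only, not the UCA)."""
    decomposed = unicodedata.normalize("NFD", s)  # e.g. "\u00e9" -> "e" + U+0301
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

# Hypothetical lexicon contents; \u00e9 / \u00c9 are precomposed e-acute / E-acute.
words = ["resume", "Resume", "r\u00e9sum\u00e9", "R\u00c9SUM\u00c9",
         "rest", "Rome", "zebra", "Zurich", "apple"]

codepoint_order = sorted(words)                           # plain codepoint comparison
folded_order = sorted(words, key=lambda w: (fold(w), w))  # variants grouped together

def scan_span(order, target):
    """Positions of all variants of target, and how many entries lie between
    the first and last of them (inclusive)."""
    hits = [i for i, w in enumerate(order) if fold(w) == target]
    return hits, hits[-1] - hits[0] + 1

print(scan_span(codepoint_order, "resume"))  # ([0, 2, 6, 7], 8)
print(scan_span(folded_order, "resume"))     # ([2, 3, 4, 5], 4)
```

Finding the four spellings of "resume" means scanning eight entries of the codepoint-sorted list but only four adjacent entries of the folded one. With real data and more Unicode variants the difference may be larger or smaller, which is why measuring on your own data still matters.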

The other thing to keep in mind is that the appserver default collation is
what is used for basic comparisons and order by in your modules, and the
codepoint ordering makes no sense to normal humans, who, when they sort
names, don't want to see deYoung pushed after every capitalized name just
because the codepoints for uppercase letters come first.
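Python's built-in string sort compares codepoints, which makes it a convenient way to see this effect. A minimal sketch (the names other than deYoung and Darwin are hypothetical; `str.casefold` is only a rough approximation of a linguistic collation such as the root collation):

```python
# Hypothetical list of surnames.
names = ["deYoung", "Darwin", "Smith", "Zimmerman"]

# Codepoint order: every capitalized name (U+0041..U+005A) sorts ahead of
# every lowercase-initial name (U+0061..U+007A).
print(sorted(names))                    # ['Darwin', 'Smith', 'Zimmerman', 'deYoung']

# A case-insensitive key gives the order a person expects.
print(sorted(names, key=str.casefold))  # ['Darwin', 'deYoung', 'Smith', 'Zimmerman']
```

Darwin lands first in both orders; the difference shows up for a name like deYoung that starts with a lowercase letter, which codepoint order pushes behind every capitalized name.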

//Mary



___
General mailing list
General@developer.marklogic.com
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] root collation vs unicode collation in terms of performance

2016-08-23 Thread Christopher Hamlin
The old email from Mary explains more than I would know.

My takeaway, take it for what it's worth, is:

Use the correct collation for your 'text' indexes; use codepoint for
'value' indexes like UIDs and such.

You won't be worrying about whether your UID has diacritics, or whether it
sorts according to German or French rules (I'm guessing).

My experience is that codepoint is faster, but it will be a question of
what you do and what your indexes are.

If the indexes are small, or you only ever get the first value, it may not
matter (unless they grow . . .).

If you have huge indexes across many nodes and do things that require
sorting/unique-ifying then it can matter.

For something like a UUID, for example, it sounds like codepoint is the way
to go.
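The reasoning for GUIDs can be checked with a small sketch. For GUIDs in the canonical lowercase form, plain codepoint order already equals numeric order, so a language-aware collation has nothing to add, provided case is consistent. Python's default string sort is used here as a stand-in for codepoint ordering, and the GUIDs are randomly generated test data.

```python
import random
import uuid

rng = random.Random(42)  # seeded so the check is repeatable
ids = [uuid.UUID(int=rng.getrandbits(128)) for _ in range(1000)]

# str(UUID) is the canonical lowercase 8-4-4-4-12 hex form.
by_codepoint = sorted(str(u) for u in ids)
by_value = [str(u) for u in sorted(ids, key=lambda u: u.int)]

# Same width, hyphens in the same places, and '0'-'9' < 'a'-'f' in ASCII,
# so plain codepoint order already matches numeric order.
print(by_codepoint == by_value)  # True

# Mixed case breaks that: 'B' (U+0042) sorts before 'a' (U+0061),
# even though 0xa0 < 0xb0 numerically.
print(sorted(["a0", "B0"]))      # ['B0', 'a0']
```

If incoming GUIDs can arrive in mixed case, normalizing them to one case before they are stored is one way to keep codepoint ordering meaningful.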

A little discussion here:

https://docs.marklogic.com/guide/search-dev/encodings_collations#id_70034

If you have data, you can test for your situation to see.


On Tue, Aug 23, 2016 at 11:46 AM, Tim Meagher  wrote:

> Just wondering why MarkLogic does not make codepoint the default collation
> if it results in a 10% performance improvement…
>
>
>
> Tim


Re: [MarkLogic Dev General] root collation vs unicode collation in terms of performance

2016-08-23 Thread Tim Meagher
Just wondering why MarkLogic does not make codepoint the default collation
if it results in a 10% performance improvement.

 

Tim

 



Re: [MarkLogic Dev General] root collation vs unicode collation in terms of performance

2016-08-23 Thread Yalaverthi, Sudheer (LNG-RDU)
Hi,

If anyone can share their experience or knowledge of which one performs 
better, it would be very helpful.

Thanks.

-Sudheer

From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Yalaverthi, 
Sudheer (LNG-RDU)
Sent: Monday, August 22, 2016 2:31 PM
To: MarkLogic Developer Discussion 
Subject: [MarkLogic Dev General] root collation vs unicode collation in terms 
of performance

Hi,


In one of the older developer community threads here, I found this 
statement from Mary:


"If you are not collapsing values, the codepoint collation
is generally about 10% faster in its operations."


We have a few elements for which we need range indexes, but these elements do 
not contain any diacritic-sensitive information; they just store GUIDs or 
similar values. I was initially thinking of using root collation indexes for 
this, but after reading the above thread I wonder whether I should be using 
the codepoint collation for better performance. Since these elements do not 
have diacritic-sensitive information anyway, I wonder whether root collation 
performance will be on par with codepoint.

Let me know which one is better in this scenario.


Thanks,
Sudheer



Re: [MarkLogic Dev General] Adding PDF to an existing json document using Patch

2016-08-23 Thread Erik Hennum
Hi, Shiv:

A patch request returns a 204 if none of the operations match on the patched 
document.

That's an indication to check the paths in the operations to make sure they 
match the structure and vocabulary of the target JSON document.

The patch from your earlier post would change this document:

{..., "parent":{..., "childe":{...}, ...}, ...}

into this document:

{..., "parent":{..., "childe":{..., "image":"/example/scdhhs_fm_300.pdf"}, 
...}, ...}

Without the actual patch operations and actual target document structure, no 
one can make more specific suggestions.
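Erik's point, that a 204 means no operation matched, can be illustrated with a toy model of a "last-child" insert. This is a hypothetical sketch that supports only simple dotted paths like `$.parent.childe`; it is not MarkLogic's patch implementation, and the sample documents are invented.

```python
import copy

def insert_last_child(doc, context, content):
    """Toy model of a JSON patch insert with position "last-child".

    Supports only simple dotted paths such as "$.parent.childe".
    Returns (document, changed); changed is False when the context path
    matches nothing, the situation a 204 Unchanged reports.
    """
    if not context.startswith("$."):
        raise ValueError("this sketch only handles '$.a.b' style paths")
    result = copy.deepcopy(doc)
    node = result
    for key in context[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return doc, False      # path did not match: nothing to patch
        node = node[key]
    if not isinstance(node, dict):
        return doc, False          # only objects receive new properties here
    node.update(content)           # new keys are appended after existing ones
    return result, True

patch = {"pathlang": "jsonpath",
         "patch": [{"insert": {"context": "$.parent.childe",
                               "position": "last-child",
                               "content": {"image": "/example/scdhhs_fm_300.pdf"}}}]}
op = patch["patch"][0]["insert"]

# Hypothetical target documents.
matching = {"parent": {"childe": {"name": "x"}}}
misspelled = {"parent": {"child": {"name": "x"}}}  # "child", not "childe"

print(insert_last_child(matching, op["context"], op["content"]))
# ({'parent': {'childe': {'name': 'x', 'image': '/example/scdhhs_fm_300.pdf'}}}, True)
print(insert_last_child(misspelled, op["context"], op["content"])[1])  # False
```

A one-letter difference between the patch's context path and the real property names is enough to turn the whole request into a no-op.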


Erik Hennum



From: general-boun...@developer.marklogic.com 
[general-boun...@developer.marklogic.com] on behalf of Shiv Shankar 
[shiv.shivshan...@gmail.com]
Sent: Tuesday, August 23, 2016 3:12 AM
To: general@developer.marklogic.com
Subject: Re: [MarkLogic Dev General] General Digest, Vol 146, Issue 46

Hi Erik,
Thanks for the quick reply.
From the error message, I see that we can PATCH only JSON/XML documents.

So, to have the PDF linked to the current document, I would like to do the 
following:
1. Load the binary document with a URI (e.g., /documents/drivier-license.pdf).
2. Add this URI (/documents/drivier-license.pdf) to the existing document as a 
patch.

When I did the above, i.e. adding the JSON {"uri":"/documents/drivier-license.pdf"} 
as the PATCH content, the response came back as 204 Unchanged.
Please advise.

Regards
Shiv.
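Shiv's two steps can be sketched as two REST requests. This minimal Python sketch uses only the standard library and builds the requests without sending them. The host, port, `/LATEST/documents` path, credentials, and `$.parent.childe` context are placeholders taken from Shiv's curl command quoted further down; the PDF bytes are dummy data. The patch is sent as a POST with `X-HTTP-Method-Override: PATCH`, mirroring that curl command.

```python
import base64
import json
import urllib.parse
import urllib.request

# Placeholders copied from the curl example; adjust for your server.
BASE = "http://localbox:9004/LATEST/documents"
AUTH = "Basic " + base64.b64encode(b"user:pwd").decode("ascii")

def doc_url(uri):
    return BASE + "?" + urllib.parse.urlencode({"uri": uri})

def put_binary(uri, data, mime="application/pdf"):
    """Step 1: store the PDF as its own binary document."""
    return urllib.request.Request(
        doc_url(uri), data=data, method="PUT",
        headers={"Content-Type": mime, "Authorization": AUTH})

def patch_add_reference(json_uri, context, pdf_uri):
    """Step 2: insert {"uri": pdf_uri} into the existing JSON document."""
    body = {"pathlang": "jsonpath",
            "patch": [{"insert": {"context": context,
                                  "position": "last-child",
                                  "content": {"uri": pdf_uri}}}]}
    return urllib.request.Request(
        doc_url(json_uri), data=json.dumps(body).encode("utf-8"), method="POST",
        headers={"Content-Type": "application/json",
                 "X-HTTP-Method-Override": "PATCH",
                 "Authorization": AUTH})

step1 = put_binary("/documents/drivier-license.pdf", b"%PDF-1.4 dummy")
step2 = patch_add_reference("/test/LW88899", "$.parent.childe",
                            "/documents/drivier-license.pdf")
print(step1.get_method(), step1.full_url)
# PUT http://localbox:9004/LATEST/documents?uri=%2Fdocuments%2Fdrivier-license.pdf
print(step2.get_method(), step2.get_header("X-http-method-override"))  # POST PATCH
```

Against a live server you would pass each request to `urllib.request.urlopen`. Per Erik's reply, a 204 on the second request means the context path did not match the target document.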



Re: [MarkLogic Dev General] General Digest, Vol 146, Issue 46

2016-08-23 Thread Shiv Shankar
> --
>
> Message: 2
> Date: Tue, 23 Aug 2016 02:29:33 +
> From: Erik Hennum
> Subject: Re: [MarkLogic Dev General] Adding PDF to an existing json
> document using Patch
> To: MarkLogic Developer Discussion
>
> Hi, Shiv:
>
> Sorry, but I don't understand the question.
>
> You cannot store a PDF document and JSON document in a single document.
>
> A PDF document is a binary.  A JSON document is a structured tree.
>
> You cannot patch a binary document.  It is a black box without addressable
> structure.
>
> What are the URIs for the associated PDF and JSON documents? How are they
> ingested?
>
>
> Erik Hennum
>
>
> 
> From: general-boun...@developer.marklogic.com [general-bounces@developer.
> marklogic.com] on behalf of Shiv Shankar [shiv.shivshan...@gmail.com]
> Sent: Monday, August 22, 2016 11:01 AM
> To: general@developer.marklogic.com
> Subject: [MarkLogic Dev General] Adding PDF to an existing json document
> using Patch
>
> Hi Erik Hennum
>
> If I go with PUT, it replaces the entire document, which is not
> acceptable.
>
> How can I add a JSON document and a PDF in one document?
>
> I used the approach below:
> 1. Added the PDF as a separate document and tried adding its URI to the
> referring document with a PATCH, but I get a 204 response.
>
> curl  --basic --user  user:pwd -X POST -d@./patch.json \
>  -i -H "Content-type: application/json" \
>   -H "X-HTTP-Method-Override: PATCH" \
>   'http://localbox:9004/LATEST/documents?uri=/test/LW88899'
>
> and patch.json is:
>
> {
>   "pathlang": "jsonpath",
>   "patch": [
>     {
>       "insert": {
>         "context": "$.parent.childe",
>         "position": "last-child",
>         "content": {
>           "image": "/example/scdhhs_fm_300.pdf"
>         }
>       }
>     }
>   ]
> }
>
> Message:
> Response:
> HTTP/1.1 204 Unchanged
> Server: MarkLogic
> Content-Length: 0
> Connection: Keep-Alive
> Keep-Alive: timeout=5
>


Re: [MarkLogic Dev General] Model Employee-Department relationship into marklogic, Running Join queries. #CGO#

2016-08-23 Thread Jain, Abhishek
Thanks a lot, Gael! I will fiddle with it. :)

Thanks and Regards,
Abhishek Jain
Associate Consultant
Capgemini India | Hyderabad
People matter, results count.

From: Gaël YIMEN YIMGA [mailto:yimeng...@gmail.com]
Sent: Tuesday, August 23, 2016 4:34 AM
To: Jain, Abhishek
Cc: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Model Employee-Department relationship 
into marklogic, Running Join queries. #CGO#

Hello Abhi,

You definitely need a web application server like Tomcat; that's the one I 
used.

You could use Maven to build the project as a WAR and then deploy that WAR file 
in Tomcat. The project then works as a web service, so you can test it using a 
tool like Postman or Fiddler, or simply a command-line tool like curl.

Regards
Gael
On Mon, Aug 22, 2016 at 7:14 AM Jain, Abhishek 
<abhishek.b.j...@capgemini.com> wrote:
Hi  Gael,

Thanks, I found it very useful, but I am new to Java. I tried my best to 
install it; however:


1. I set up the code through Eclipse, but the Maven build fails (mvn clean 
looks good). Do we need an app server?

2. Can you please send me quick steps to use it, or add a description to 
the GitHub repository?


A few details will help me kick start my project.

Thanks and Regards,
Abhishek Jain
Associate Consultant
Capgemini India | Hyderabad
People matter, results count.

From: Jain, Abhishek
Sent: Thursday, August 18, 2016 12:19 PM
To: 'Gaël YIMEN YIMGA'
Cc: MarkLogic Developer Discussion
Subject: RE: [MarkLogic Dev General] Model Employee-Department relationship 
into marklogic, Running Join queries. #CGO#

Hi Gael,

Thanks for the quick response; it looks like a solution. I will surely come 
back with some questions.

Thanks and Regards,
Abhishek Jain
Associate Consultant
Capgemini India | Hyderabad
Mob: +91-9030744998 Ext:4028950
People matter, results count.

From: Gaël YIMEN YIMGA [mailto:yimeng...@gmail.com]
Sent: Wednesday, August 17, 2016 10:43 PM
To: Jain, Abhishek
Cc: MarkLogic Developer Discussion

Subject: Re: [MarkLogic Dev General] Model Employee-Department relationship 
into marklogic, Running Join queries. #CGO#

Hello Jain,
I developed a simple application that models an Employee-Department 
relationship using the MarkLogic Java API. To answer your question: yes, you 
may need triples to achieve this; it all depends on how you need to store and 
query your data. There are two ways to store and handle triples in MarkLogic: 
unmanaged triples and managed triples. We can discuss this further if you want.
Follow this link to see how I developed that small application: 
https://github.com/yimengael/marklogic-dataframework
Gael

Gaël.
--
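Gael's answer (it depends on how you store and query the data) can be made concrete with a small, language-neutral sketch. It uses plain Python dictionaries and tuples, not the MarkLogic Java API, and all employee and department data and URIs are hypothetical. It shows the same Employee-Department relationship two ways: as separate documents linked by a `deptId` value and joined in application code, and as subject-predicate-object triples joined on a shared node, the way a SPARQL pattern would be.

```python
# Hypothetical data: one document per department and per employee,
# linked by deptId the way a foreign key links rows in an RDBMS.
departments = {
    "/dept/10.json": {"deptId": 10, "name": "Research"},
    "/dept/20.json": {"deptId": 20, "name": "Sales"},
}
employees = {
    "/emp/1.json": {"empId": 1, "name": "Asha", "deptId": 10},
    "/emp/2.json": {"empId": 2, "name": "Ravi", "deptId": 20},
    "/emp/3.json": {"empId": 3, "name": "Mei", "deptId": 10},
}

def join_documents(emps, depts):
    """Join employees to department names on deptId in application code."""
    dept_name = {d["deptId"]: d["name"] for d in depts.values()}
    return sorted((e["name"], dept_name[e["deptId"]]) for e in emps.values())

# The same relationship as (subject, predicate, object) triples.
triples = {(f"emp:{e['empId']}", "worksIn", f"dept:{e['deptId']}")
           for e in employees.values()}
triples |= {(f"emp:{e['empId']}", "name", e["name"]) for e in employees.values()}
triples |= {(f"dept:{d['deptId']}", "name", d["name"]) for d in departments.values()}

def join_triples(ts):
    """Match ?e worksIn ?d, then look up the name of both ends."""
    name = {s: o for s, p, o in ts if p == "name"}
    return sorted((name[s], name[o]) for s, p, o in ts if p == "worksIn")

print(join_documents(employees, departments))
# [('Asha', 'Research'), ('Mei', 'Research'), ('Ravi', 'Sales')]
print(join_triples(triples) == join_documents(employees, departments))  # True
```

Both forms answer the same question; which one fits better depends on how the data will be queried and maintained.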

On Wed, Aug 17, 2016 at 1:56 AM, Jain, Abhishek 
<abhishek.b.j...@capgemini.com> wrote:
Hi Folks,

I am a newbie to MarkLogic, and I want to migrate all my RDBMS data into 
MarkLogic.
Could anyone share an idea of how to model a simple Employee-Department 
relationship in MarkLogic and run a join query? Do we need triples to achieve 
this? If yes, a simple example will do.
Thanks in advance.

Thanks and Regards,
Abhishek Jain
Associate Consultant
People matter, results count.

