Re: [MarkLogic Dev General] SQL Query Performance and Tableau

2015-04-29 Thread Javier Lizarraga
Hi Sara,

Can you please provide a copy to me as well.  We are currently evaluating using 
Tableau with MarkLogic, it would be very helpful.

Thanks,

Javier

From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Sara Mazer
Sent: Wednesday, April 29, 2015 10:49 AM
To: proverb...@hotmail.com; general@developer.marklogic.com
Cc: Scott Fowler
Subject: Re: [MarkLogic Dev General] SQL Query Performance and Tableau

Hi Freddy,
We are working on a document that contains best practices for using MarkLogic 
and Tableau.  My colleague, Scott Fowler, will send it over as soon as it is 
ready.

What I meant by grouping in Tableau is to limit the GROUP BY in SQL when 
possible ( i.e. use DISTINCT instead of GROUP BY 1).  Also, yes, a trade-off 
sometimes against performance in ML and limiting data returned to Tableau, but 
I've often found it faster to do grouping and ordering in Tableau.  Also, I 
usually see a huge performance increase and ability to limit data returned to 
Tableau when using, and pushing, as much logic as possible to the right of a 
SQL MATCH keyword.
Sara

From: 
general-boun...@developer.marklogic.commailto:general-boun...@developer.marklogic.com
 [mailto:general-boun...@developer.marklogic.com] On Behalf Of Freddy Holwerda
Sent: Tuesday, April 28, 2015 4:57 AM
To: general@developer.marklogic.commailto:general@developer.marklogic.com
Subject: Re: [MarkLogic Dev General] SQL Query Performance and Tableau


Dear Sara,

Thank you for your elaborate reply on this topic. I am also investigating a 
scenario in which SQL would be used as entry point to ML.

Is there any document available that actually describes the practices you refer 
to, eg. concerning the data model, which operations to do client/server side 
and how that relates to minimizing the data that comes back, translating an SQL 
scenario to a XQuery/Javascript scenario, etc..

Specific to your reply below: I do assume that doing the grouping at the side 
of Tableau instead of ML, would actually maximize, not minimize the data 
retrieved from ML?

Thank you in advance,

Freddy Holwerda


 From: 
 general-requ...@developer.marklogic.commailto:general-requ...@developer.marklogic.com
 Subject: General Digest, Vol 130, Issue 106
 To: general@developer.marklogic.commailto:general@developer.marklogic.com
 Date: Mon, 27 Apr 2015 22:52:56 -0700

 Send General mailing list submissions to
 general@developer.marklogic.commailto:general@developer.marklogic.com

 To subscribe or unsubscribe via the World Wide Web, visit
 http://developer.marklogic.com/mailman/listinfo/general
 or, via email, send a message with subject or body 'help' to
 general-requ...@developer.marklogic.commailto:general-requ...@developer.marklogic.com

 You can reach the person managing the list at
 general-ow...@developer.marklogic.commailto:general-ow...@developer.marklogic.com

 When replying, please edit your Subject line so it is more specific
 than Re: Contents of General digest...


 Today's Topics:

 1. SQL Query Performance and Tableau Connection (Cynthia Jiang)
 2. Re: SQL Query Performance and Tableau Connection (Sara Mazer)
 3. Is data also store in forest (Shashidhar Rao)


 --

 Message: 1
 Date: Mon, 27 Apr 2015 20:56:29 +
 From: Cynthia Jiang ji...@rdacorp.commailto:ji...@rdacorp.com
 Subject: [MarkLogic Dev General] SQL Query Performance and Tableau
 Connection
 To: General@developer.marklogic.commailto:General@developer.marklogic.com
 General@developer.marklogic.commailto:General@developer.marklogic.com
 Cc: Walt Rolle ro...@rdacorp.commailto:ro...@rdacorp.com
 Message-ID:
 778d42e304ff4316bd367fbcae250...@corpexch-mb1.rdacorp.commailto:778d42e304ff4316bd367fbcae250...@corpexch-mb1.rdacorp.com
 Content-Type: text/plain; charset=us-ascii

 Hello,

 We have been testing MarkLogic 8 downloaded from your website for a POC.

 Our environment is very straight forward.

 A single VM (windows 7 Enterprise - 64bit) with 16GM RAM and 2 Intel (E5-2690 
 v2) 3GHz processors.

 We have around 4 SQL tables that we modeled in Json format so each row ends 
 up becoming a document, and here are some of the stats:

 Patient : 8 fields / 20,924 documents
 PatientAddress: 5 fields / 36,193 documents
 Claim: less than 30 fields / 335,349 documents
 ClaimDiagnosis: 2 fields / 8,676 documents

 We created element range index for all the fields, and created 4 views that 
 each represent a SQL table.

 Then we were able to use SQL to query the data and get the result in the 
 query console.

 But the performance is not nearly as what we expected, we thought it would 
 work faster than SQL environment.

 A simple query like the one below took 30 seconds to run, and the same query 
 took 3 seconds to run with a larger data set in a SQL environment with 6GB 
 memory and 4 processors.

 select c.Patientid, ClaimNumber, FirstName, LastName, sum (PharmacyTotalCost

Re: [MarkLogic Dev General] SQL Query Performance and Tableau

2015-04-29 Thread Sara Mazer
Hi Freddy,
We are working on a document that contains best practices for using MarkLogic 
and Tableau.  My colleague, Scott Fowler, will send it over as soon as it is 
ready.

What I meant by grouping in Tableau is to limit the GROUP BY in SQL when 
possible ( i.e. use DISTINCT instead of GROUP BY 1).  Also, yes, a trade-off 
sometimes against performance in ML and limiting data returned to Tableau, but 
I've often found it faster to do grouping and ordering in Tableau.  Also, I 
usually see a huge performance increase and ability to limit data returned to 
Tableau when using, and pushing, as much logic as possible to the right of a 
SQL MATCH keyword.
Sara

From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Freddy Holwerda
Sent: Tuesday, April 28, 2015 4:57 AM
To: general@developer.marklogic.com
Subject: Re: [MarkLogic Dev General] SQL Query Performance and Tableau


Dear Sara,

Thank you for your elaborate reply on this topic. I am also investigating a 
scenario in which SQL would be used as entry point to ML.

Is there any document available that actually describes the practices you refer 
to, eg. concerning the data model, which operations to do client/server side 
and how that relates to minimizing the data that comes back, translating an SQL 
scenario to a XQuery/Javascript scenario, etc..

Specific to your reply below: I do assume that doing the grouping at the side 
of Tableau instead of ML, would actually maximize, not minimize the data 
retrieved from ML?

Thank you in advance,

Freddy Holwerda


 From: general-requ...@developer.marklogic.com
 Subject: General Digest, Vol 130, Issue 106
 To: general@developer.marklogic.com
 Date: Mon, 27 Apr 2015 22:52:56 -0700

 Send General mailing list submissions to
 general@developer.marklogic.commailto:general@developer.marklogic.com

 To subscribe or unsubscribe via the World Wide Web, visit
 http://developer.marklogic.com/mailman/listinfo/general
 or, via email, send a message with subject or body 'help' to
 general-requ...@developer.marklogic.commailto:general-requ...@developer.marklogic.com

 You can reach the person managing the list at
 general-ow...@developer.marklogic.commailto:general-ow...@developer.marklogic.com

 When replying, please edit your Subject line so it is more specific
 than Re: Contents of General digest...


 Today's Topics:

 1. SQL Query Performance and Tableau Connection (Cynthia Jiang)
 2. Re: SQL Query Performance and Tableau Connection (Sara Mazer)
 3. Is data also store in forest (Shashidhar Rao)


 --

 Message: 1
 Date: Mon, 27 Apr 2015 20:56:29 +
 From: Cynthia Jiang ji...@rdacorp.commailto:ji...@rdacorp.com
 Subject: [MarkLogic Dev General] SQL Query Performance and Tableau
 Connection
 To: General@developer.marklogic.commailto:General@developer.marklogic.com
 General@developer.marklogic.commailto:General@developer.marklogic.com
 Cc: Walt Rolle ro...@rdacorp.commailto:ro...@rdacorp.com
 Message-ID:
 778d42e304ff4316bd367fbcae250...@corpexch-mb1.rdacorp.commailto:778d42e304ff4316bd367fbcae250...@corpexch-mb1.rdacorp.com
 Content-Type: text/plain; charset=us-ascii

 Hello,

 We have been testing MarkLogic 8 downloaded from your website for a POC.

 Our environment is very straight forward.

 A single VM (windows 7 Enterprise - 64bit) with 16GM RAM and 2 Intel (E5-2690 
 v2) 3GHz processors.

 We have around 4 SQL tables that we modeled in Json format so each row ends 
 up becoming a document, and here are some of the stats:

 Patient : 8 fields / 20,924 documents
 PatientAddress: 5 fields / 36,193 documents
 Claim: less than 30 fields / 335,349 documents
 ClaimDiagnosis: 2 fields / 8,676 documents

 We created element range index for all the fields, and created 4 views that 
 each represent a SQL table.

 Then we were able to use SQL to query the data and get the result in the 
 query console.

 But the performance is not nearly as what we expected, we thought it would 
 work faster than SQL environment.

 A simple query like the one below took 30 seconds to run, and the same query 
 took 3 seconds to run with a larger data set in a SQL environment with 6GB 
 memory and 4 processors.

 select c.Patientid, ClaimNumber, FirstName, LastName, sum (PharmacyTotalCost) 
 as PharmacyTotalCost
 from claim c inner join patient p on c.patientid = p.patientid left outer 
 join patientaddress a on p.patientid = a.patientid
 where a.[State] = 'LA'
 group by c.PatientId, ClaimNumber, FirstName, LastName order by 
 SUM(PharmacyTotalCost) desc limit 10

 We also tested out the connection between MarkLogic and Tableau 9 trial 
 version through MarkLogic 64 bit ODBC connector on the same VM.
 The connection is not stable, got lost very frequently, and sometimes it 
 won't stop running for 20 minutes after adding one dimension and one simple 
 measure the columns and rows shelves.

 We are under a very tight

Re: [MarkLogic Dev General] SQL Query Performance and Tableau Connection

2015-04-28 Thread Brad Rix
One comment about your specific query.You have a left outer join against 
the patientaddress table.  This can actually be written as an inner join with 
the same results since you are restricting the right side of the outer join 
with the where clause.   You can try your query as an inner join and see if 
that improves your performance.The relational database engine probably 
optimized the query by re-writing it as an inner join.If you did not have 
the a.State='LA', then it would have to remain an outer join.   The left 
outer join might be greatly effecting performance.Since you are also using 
group by and order by, the total amount of results might also effect the 
performance.

select c.Patientid, ClaimNumber, FirstName, LastName, sum (PharmacyTotalCost) 
as PharmacyTotalCost
  from claim c inner join patient p on c.patientid = p.patientid
  left outer join patientaddress a on p.patientid = a.patientid
  where a.[State] = 'LA'
  group by  c.PatientId, ClaimNumber, FirstName, LastName  order by 
SUM(PharmacyTotalCost) desc limit 10

Note:  You can see a reference in mysql as an example about when outer joins 
can be re-written to inner joins:  
https://dev.mysql.com/doc/refman/5.0/en/outer-join-simplification.html

Brad Rix | Technical Lead
O: +1 303 542 2172 | M: +1 303 915 2771
Skype: Brad.Rix | Google Talk: 
bradford@gmail.commailto:bradford@gmail.com

From: 
general-boun...@developer.marklogic.commailto:general-boun...@developer.marklogic.com
 [mailto:general-boun...@developer.marklogic.com] On Behalf Of Sara Mazer
Sent: Monday, April 27, 2015 6:30 PM
To: MarkLogic Developer Discussion; ji...@rdacorp.commailto:ji...@rdacorp.com
Cc: Scott Fowler; Walt Rolle
Subject: Re: [MarkLogic Dev General] SQL Query Performance and Tableau 
Connection

Hi Cynthia,
My name is Sara Mazer, and I am a solutions consultant in the DC area.  I've 
worked a lot with Tableau and MarkLogic and would like to set up some time to 
discuss your evaluation and give you some best practices so that you can get 
the best performance possible.  I was the person who worked with Tableau to add 
MarkLogic as a named connection and supported their certification of MarkLogic, 
and I think I can help you.

I have a too many best practices to go over by email, so I'd like to speak with 
you and walk you through setting up databases for successful analytics through 
WebEx.  Would that be possible?  I have a meeting from 9:30-11:00 but can move 
any other meeting I have tomorrow to suit your schedule, just let me know a 
good time for you.

In general, I plan on:
Recommending the best data model for MarkLogic of this type of data (hint: it 
doesn't look like what would be in an RDBMs)
Showing you a demo of patient and claims data with Tableau and review how it 
was done
Discussing the best practices of working with Tableau (minimize the data coming 
back from MarkLogic by using custom SQL for charts, using MATCH keyword, having 
Tableau do grouping/sorting instead of in SQL)
Reviewing common connection issues between MarkLogic and Tableau and how to 
avoid them (most likely you are corrupting your SQL views:a best practice is to 
have a unique schema database for each MarkLogic database)
Reviewing performance of XQuery/JavaScript vs. SQL in MarkLogic

Best regards,
Sara Mazer
sara dot mazer at marklogic dot com
This e-mail and any accompanying attachments are confidential. The information 
is intended solely for the use of the individual to whom it is addressed. Any 
review, disclosure, copying, distribution, or use of this e-mail communication 
by others is strictly prohibited. If you are not the intended recipient, please 
notify us immediately by returning this message to the sender and delete all 
copies. Thank you for your cooperation.



From: 
general-boun...@developer.marklogic.commailto:general-boun...@developer.marklogic.com
 [mailto:general-boun...@developer.marklogic.com] On Behalf Of Cynthia Jiang
Sent: Monday, April 27, 2015 4:56 PM
To: General@developer.marklogic.commailto:General@developer.marklogic.com
Cc: Walt Rolle
Subject: [MarkLogic Dev General] SQL Query Performance and Tableau Connection

Hello,

We have been testing MarkLogic 8 downloaded from your website for a POC.

Our environment is very straight forward.

A single VM (windows 7 Enterprise - 64bit) with 16GM RAM and 2 Intel (E5-2690 
v2) 3GHz processors.

We have around 4 SQL tables that we modeled in Json format so each row ends up 
becoming a document, and here are some of the stats:

Patient : 8 fields / 20,924 documents
PatientAddress: 5 fields / 36,193 documents
Claim: less than 30 fields / 335,349 documents
ClaimDiagnosis: 2 fields / 8,676 documents

We created element range index for all the fields, and created 4 views that 
each represent a SQL table.

Then we were able to use SQL to query the data and get the result in the query 
console.

But the performance is not nearly as what we expected, we thought it would

Re: [MarkLogic Dev General] SQL Query Performance and Tableau Connection

2015-04-28 Thread Walt Rolle
Sorry for the duplicate here, I had to subscribe first...

From: Walt Rolle
Sent: Tuesday, April 28, 2015 11:24 AM
To: 'Brad Rix'; MarkLogic Developer Discussion; Cynthia Jiang
Cc: Scott Fowler; 'sara.ma...@marklogic.com'
Subject: RE: [MarkLogic Dev General] SQL Query Performance and Tableau 
Connection

Hi Brad,
Thanks for the tip. Also it looks like it's going to make sense for us just to 
store the primary address at the patient level anyway (so no joins).

I wanted to mention some conclusions we arrived at after talking to Sara.


*If we want to use SQL at all, then we can't have repeating rows in our 
document structure, or at least need to name the first one in the collection 
something different to separate it.

*Getting SQL to work in our situation is about supporting ODBC 
connectivity options such as ODBC to Tableau. We can simplify the data model by 
de-normalizing a bit, such as sticking to just primary address and active 
eligibility. After that, we supply Tableau with our own specific SQL statement 
to limit the columns and number of tables. Group by / order by / functions 
are best left to Tableau afterwards, so our source queries are simple.

*MarkLogic lends itself better to document search, which would use 
things like XQuery or Javascript to screen the documents and return patient 
documents that match the criteria. Because we broke apart our model to support 
SQL, we take a performance hit here in having to do joins or store data in 
duplicate locations. The other option would be to keep the large hierarchical 
documents and ETL a subset off to a data mart, if the customer wants to do 
relational things against it.

Because the customer has a lot of existing code and internal capability around 
the traditional SQL Server stack, we're thinking of having our model driven 
towards supporting SQL and NoSQL techniques simultaneously - at least as the 
first step of the roadmap. In the future, it would not be much effort to 
generate new documents that are complete hierarchies of patient history. To do 
this we need to separate objects by grain.


1.  Patient (with primary address and active eligibility)

2.  Claim Header (consider storing diagnosis codes as repeating rows in one 
field, it's essentially factless)

3.  Claim Revenue (this is line item detail with dollar amounts)

4.  Claim Procedure (another type of line item detail with dollar amounts)

5.  Care Provider (specialty may need to be a field level collection like 
diagnosis)

6.  Facility

We'll have to revisit these decisions when we get more time to spend on looking 
at the analytics use cases to make sure we always choose the right tool for the 
job. As far as deep amounts of storage for terabytes of data, MarkLogic seems 
to be a good landing place for the variety of incoming claims files, either 
before or after we standardize them or both.

Thanks,
Walt

Walter Rolle, PMP
Project Manager
ro...@rdacorp.commailto:ro...@rdacorp.com
o: (770) 884.4512 | m: (678) 722.1549 | f: (770) 884.4512
2475 NorthWinds Parkway, Suite 200
Alpharetta, GA 30009-4807
[in]http://www.linkedin.com/in/wrolle[cid:image004.png@01CDD2D0.8A8D29B0]http://twitter.com/rdadevteam[cid:image003.jpg@01CDD2D0.8A8D29B0]http://www.facebook.com/pages/RDA-Corporation/9054123
www.rdacorp.comhttp://www.rdacorp.com/

From: Brad Rix [mailto:brad@flatironssolutions.com]
Sent: Tuesday, April 28, 2015 10:45 AM
To: MarkLogic Developer Discussion; Cynthia Jiang
Cc: Scott Fowler; Walt Rolle
Subject: RE: [MarkLogic Dev General] SQL Query Performance and Tableau 
Connection

One comment about your specific query.You have a left outer join against 
the patientaddress table.  This can actually be written as an inner join with 
the same results since you are restricting the right side of the outer join 
with the where clause.   You can try your query as an inner join and see if 
that improves your performance.The relational database engine probably 
optimized the query by re-writing it as an inner join.If you did not have 
the a.State='LA', then it would have to remain an outer join.   The left 
outer join might be greatly effecting performance.Since you are also using 
group by and order by, the total amount of results might also effect the 
performance.

select c.Patientid, ClaimNumber, FirstName, LastName, sum (PharmacyTotalCost) 
as PharmacyTotalCost
  from claim c inner join patient p on c.patientid = p.patientid
  left outer join patientaddress a on p.patientid = a.patientid
  where a.[State] = 'LA'
  group by  c.PatientId, ClaimNumber, FirstName, LastName  order by 
SUM(PharmacyTotalCost) desc limit 10

Note:  You can see a reference in mysql as an example about when outer joins 
can be re-written to inner joins:  
https://dev.mysql.com/doc/refman/5.0/en/outer-join-simplification.html

Brad Rix | Technical Lead
O: +1 303 542 2172 | M: +1 303 915 2771
Skype: Brad.Rix | Google Talk

Re: [MarkLogic Dev General] SQL Query Performance and Tableau

2015-04-28 Thread Freddy Holwerda



Dear Sara,


Thank you for your elaborate reply
on this topic. I am also investigating a scenario in which SQL would be
used as entry point to ML.


Is there any document available that actually describes the
practices you refer to, eg. concerning the data model, which operations to do
client/server side and how that relates to minimizing the data that comes back,
translating an SQL scenario to a XQuery/Javascript scenario, etc.. Specific to 
your reply below: I do assume that doing the grouping at the side of Tableau 
instead of ML, would actually maximize, not minimize the data retrieved from 
ML? Thank you in advance, Freddy Holwerda


 
 From: general-requ...@developer.marklogic.com
 Subject: General Digest, Vol 130, Issue 106
 To: general@developer.marklogic.com
 Date: Mon, 27 Apr 2015 22:52:56 -0700
 
 Send General mailing list submissions to
   general@developer.marklogic.com
 
 To subscribe or unsubscribe via the World Wide Web, visit
   http://developer.marklogic.com/mailman/listinfo/general
 or, via email, send a message with subject or body 'help' to
   general-requ...@developer.marklogic.com
 
 You can reach the person managing the list at
   general-ow...@developer.marklogic.com
 
 When replying, please edit your Subject line so it is more specific
 than Re: Contents of General digest...
 
 
 Today's Topics:
 
1. SQL Query Performance and Tableau Connection (Cynthia Jiang)
2. Re: SQL Query Performance and Tableau   Connection (Sara Mazer)
3. Is data also store in forest (Shashidhar Rao)
 
 
 --
 
 Message: 1
 Date: Mon, 27 Apr 2015 20:56:29 +
 From: Cynthia Jiang ji...@rdacorp.com
 Subject: [MarkLogic Dev General] SQL Query Performance and Tableau
   Connection
 To: General@developer.marklogic.com
   General@developer.marklogic.com
 Cc: Walt Rolle ro...@rdacorp.com
 Message-ID:
   778d42e304ff4316bd367fbcae250...@corpexch-mb1.rdacorp.com
 Content-Type: text/plain; charset=us-ascii
 
 Hello,
 
 We have been testing MarkLogic 8 downloaded from your website for a POC.
 
 Our environment is very straight forward.
 
 A single VM (windows 7 Enterprise - 64bit) with 16GM RAM and 2 Intel (E5-2690 
 v2) 3GHz processors.
 
 We have around 4 SQL tables that we modeled in Json format so each row ends 
 up becoming a document, and here are some of the stats:
 
 Patient : 8 fields / 20,924 documents
 PatientAddress: 5 fields / 36,193 documents
 Claim: less than 30 fields / 335,349 documents
 ClaimDiagnosis: 2 fields / 8,676 documents
 
 We created element range index for all the fields, and created 4 views that 
 each represent a SQL table.
 
 Then we were able to use SQL to query the data and get the result in the 
 query console.
 
 But the performance is not nearly as what we expected, we thought it would 
 work faster than SQL environment.
 
 A simple query like the one below took 30 seconds to run, and the same query 
 took 3 seconds to run with a larger data set in a SQL environment with 6GB 
 memory and 4 processors.
 
 select c.Patientid, ClaimNumber, FirstName, LastName, sum (PharmacyTotalCost) 
 as PharmacyTotalCost
 from claim c inner join patient p on c.patientid = p.patientid  left outer 
 join patientaddress a on p.patientid = a.patientid
 where a.[State] = 'LA'
 group by  c.PatientId, ClaimNumber, FirstName, LastName  order by 
 SUM(PharmacyTotalCost) desc limit 10
 
 We also tested out the connection between MarkLogic and Tableau 9 trial 
 version through MarkLogic 64 bit ODBC connector on the same VM.
 The connection is not stable, got lost very frequently, and sometimes it 
 won't stop running for 20 minutes after adding one dimension and one simple 
 measure the columns and rows shelves.
 
 We are under a very tight timeline and have to report our findings back to 
 our client within 24 hours. With the current performance, we will not be able 
 to recommend any MarkLogic product to our client.
 
 Please help us if there is anything that we could do to improve the SQL query 
 performance and data connectivity between Tableau and MarkLogic.
 
 Also, for the query as the simple example above, is there any XQuery or 
 Javascript Query that we can use that could produce the similar results? We 
 cannot find any documented examples.
 
 Any help will be really appreciated!
 
 Thank you very much,
 
 Cynthia Jiang
 RDA Corpration
 
 
 -- next part --
 An HTML attachment was scrubbed...
 URL: 
 http://developer.marklogic.com/pipermail/general/attachments/20150427/6baf58a2/attachment-0001.html
  
 
 --
 
 Message: 2
 Date: Tue, 28 Apr 2015 00:29:37 +
 From: Sara Mazer sara.ma...@marklogic.com
 Subject: Re: [MarkLogic Dev General] SQL Query Performance and Tableau
   Connection
 To: MarkLogic Developer Discussion general@developer.marklogic.com,
   ji...@rdacorp.com ji...@rdacorp.com
 Cc: Scott Fowler scott.fow

[MarkLogic Dev General] SQL Query Performance and Tableau Connection

2015-04-27 Thread Cynthia Jiang
Hello,

We have been testing MarkLogic 8 downloaded from your website for a POC.

Our environment is very straight forward.

A single VM (windows 7 Enterprise - 64bit) with 16GM RAM and 2 Intel (E5-2690 
v2) 3GHz processors.

We have around 4 SQL tables that we modeled in Json format so each row ends up 
becoming a document, and here are some of the stats:

Patient : 8 fields / 20,924 documents
PatientAddress: 5 fields / 36,193 documents
Claim: less than 30 fields / 335,349 documents
ClaimDiagnosis: 2 fields / 8,676 documents

We created element range index for all the fields, and created 4 views that 
each represent a SQL table.

Then we were able to use SQL to query the data and get the result in the query 
console.

But the performance is not nearly as what we expected, we thought it would work 
faster than SQL environment.

A simple query like the one below took 30 seconds to run, and the same query 
took 3 seconds to run with a larger data set in a SQL environment with 6GB 
memory and 4 processors.

select c.Patientid, ClaimNumber, FirstName, LastName, sum (PharmacyTotalCost) 
as PharmacyTotalCost
from claim c inner join patient p on c.patientid = p.patientid  left outer join 
patientaddress a on p.patientid = a.patientid
where a.[State] = 'LA'
group by  c.PatientId, ClaimNumber, FirstName, LastName  order by 
SUM(PharmacyTotalCost) desc limit 10

We also tested out the connection between MarkLogic and Tableau 9 trial version 
through MarkLogic 64 bit ODBC connector on the same VM.
The connection is not stable, got lost very frequently, and sometimes it won't 
stop running for 20 minutes after adding one dimension and one simple measure 
the columns and rows shelves.

We are under a very tight timeline and have to report our findings back to our 
client within 24 hours. With the current performance, we will not be able to 
recommend any MarkLogic product to our client.

Please help us if there is anything that we could do to improve the SQL query 
performance and data connectivity between Tableau and MarkLogic.

Also, for the query as the simple example above, is there any XQuery or 
Javascript Query that we can use that could produce the similar results? We 
cannot find any documented examples.

Any help will be really appreciated!

Thank you very much,

Cynthia Jiang
RDA Corpration


___
General mailing list
General@developer.marklogic.com
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] SQL Query Performance and Tableau Connection

2015-04-27 Thread Sara Mazer
Hi Cynthia,
My name is Sara Mazer, and I am a solutions consultant in the DC area.  I've 
worked a lot with Tableau and MarkLogic and would like to set up some time to 
discuss your evaluation and give you some best practices so that you can get 
the best performance possible.  I was the person who worked with Tableau to add 
MarkLogic as a named connection and supported their certification of MarkLogic, 
and I think I can help you.

I have a too many best practices to go over by email, so I'd like to speak with 
you and walk you through setting up databases for successful analytics through 
WebEx.  Would that be possible?  I have a meeting from 9:30-11:00 but can move 
any other meeting I have tomorrow to suit your schedule, just let me know a 
good time for you.

In general, I plan on:
Recommending the best data model for MarkLogic of this type of data (hint: it 
doesn't look like what would be in an RDBMs)
Showing you a demo of patient and claims data with Tableau and review how it 
was done
Discussing the best practices of working with Tableau (minimize the data coming 
back from MarkLogic by using custom SQL for charts, using MATCH keyword, having 
Tableau do grouping/sorting instead of in SQL)
Reviewing common connection issues between MarkLogic and Tableau and how to 
avoid them (most likely you are corrupting your SQL views:a best practice is to 
have a unique schema database for each MarkLogic database)
Reviewing performance of XQuery/JavaScript vs. SQL in MarkLogic

Best regards,
Sara Mazer
sara dot mazer at marklogic dot com

This e-mail and any accompanying attachments are confidential. The information 
is intended solely for the use of the individual to whom it is addressed. Any 
review, disclosure, copying, distribution, or use of this e-mail communication 
by others is strictly prohibited. If you are not the intended recipient, please 
notify us immediately by returning this message to the sender and delete all 
copies. Thank you for your cooperation.



From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Cynthia Jiang
Sent: Monday, April 27, 2015 4:56 PM
To: General@developer.marklogic.com
Cc: Walt Rolle
Subject: [MarkLogic Dev General] SQL Query Performance and Tableau Connection

Hello,

We have been testing MarkLogic 8 downloaded from your website for a POC.

Our environment is very straight forward.

A single VM (windows 7 Enterprise - 64bit) with 16GM RAM and 2 Intel (E5-2690 
v2) 3GHz processors.

We have around 4 SQL tables that we modeled in Json format so each row ends up 
becoming a document, and here are some of the stats:

Patient : 8 fields / 20,924 documents
PatientAddress: 5 fields / 36,193 documents
Claim: less than 30 fields / 335,349 documents
ClaimDiagnosis: 2 fields / 8,676 documents

We created element range index for all the fields, and created 4 views that 
each represent a SQL table.

Then we were able to use SQL to query the data and get the result in the query 
console.

But the performance is not nearly as what we expected, we thought it would work 
faster than SQL environment.

A simple query like the one below took 30 seconds to run, and the same query 
took 3 seconds to run with a larger data set in a SQL environment with 6GB 
memory and 4 processors.

select c.Patientid, ClaimNumber, FirstName, LastName, sum (PharmacyTotalCost) 
as PharmacyTotalCost
from claim c inner join patient p on c.patientid = p.patientid  left outer join 
patientaddress a on p.patientid = a.patientid
where a.[State] = 'LA'
group by  c.PatientId, ClaimNumber, FirstName, LastName  order by 
SUM(PharmacyTotalCost) desc limit 10

We also tested out the connection between MarkLogic and Tableau 9 trial version 
through MarkLogic 64 bit ODBC connector on the same VM.
The connection is not stable, got lost very frequently, and sometimes it won't 
stop running for 20 minutes after adding one dimension and one simple measure 
the columns and rows shelves.

We are under a very tight timeline and have to report our findings back to our 
client within 24 hours. With the current performance, we will not be able to 
recommend any MarkLogic product to our client.

Please help us if there is anything that we could do to improve the SQL query 
performance and data connectivity between Tableau and MarkLogic.

Also, for the query as the simple example above, is there any XQuery or 
Javascript Query that we can use that could produce the similar results? We 
cannot find any documented examples.

Any help will be really appreciated!

Thank you very much,

Cynthia Jiang
RDA Corpration


___
General mailing list
General@developer.marklogic.com
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general