*Sent:* Sunday, February 26, 2012 1:05
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Processing Large Documents?
I believe I've found an answer to my question: where did my namespaced
attribute xsi:type='xs:hexBinary' go?
I declared the two namespaces in my XQuery code:
declare namespace xs = "http://www.w3.org/2001/XMLSchema";
declare namespace xsi = "http://www.w3.org/2001/XMLSchema-instance";
And then I
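With those declarations in scope, the attribute can be addressed by its
qualified name. A minimal sketch, assuming a hypothetical document URI:

declare namespace xsi = "http://www.w3.org/2001/XMLSchema-instance";
(: hypothetical URI; returns the attribute node only if it survived load :)
doc("/doc_fil/32.xml")//file_blob/@xsi:type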
Here is what I've accomplished this weekend. My next step in this process
is to scrub the foreign keys from the database. The foreign keys have the
pattern {table}_id, so for instance the field /xyz/usr_id would reference
the primary document /usr/id. Leveraging this pattern, I want to
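A sketch of the kind of rewrite that {table}_id pattern suggests, with the
source URI and element names assumed (role-prefixed keys such as
upload_usr_id would need extra handling):

(: sketch only: derive the referenced table from the *_id suffix :)
for $fk in doc("/dump.xml")//field[fn:ends-with(@name, "_id")]
let $table := fn:replace(fn:string($fk/@name), "_id$", "")
return xdmp:node-replace($fk,
  <field name="{$fk/@name}" ref="/{$table}/{fn:string($fk)}">{
    fn:string($fk)}</field>)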
My question is: what happened to the /file_blob/@xsi:type attribute? Was
it interpreted by MarkLogic, just discarded, or is it there but my query
fails to properly ask for it? Could it be that the element is already
known by MarkLogic as hexBinary, and so a cast to hexBinary is going in the
*From:* Todd Gochenour
*Sent:* Saturday, February 25, 2012 18:54
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Processing Large Documents?
<row>
  <field name="id">32</field>
  <field name="doc_rep_id">1</field>
  <field name="doc_fld_id">1</field>
  <field name="fil_version">2</field>
  <field name="upload_usr_id">1</field>
  <field name="upload_date">2006-11-01 15:26:34</field>
  <field name="mime_type">application/excel</field>
  <field name="abstract">xls</field>
  <field
*From:* Todd Gochenour
*Sent:* Saturday, February 25, 2012 19:06
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Processing Large Documents?
Ah, of course. Compression. I looked and saw that the legacy system has
SQL statements for insert and select on the table doc_fil, and it is calling
compress() and uncompress(). I found this on Google: InnoDB implements
compression with the help of the well-known zlib
It's time for me to pick this project up now that the work week has
passed.
I'm attempting to implement Michael Blakeley's recommendation to move the
SQL blob content into its own document as part of this initial load/chunk
phase. Here's how I see the strategy. As I iterate through each record
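A sketch of that strategy, assuming the element names seen in the dump
excerpts (not Michael's actual code): the hex text becomes its own binary
document, and the row keeps only a reference.

(: sketch: URIs and element names are assumptions :)
for $blob in doc("/dump.xml")//row/field[@name eq "file_blob"]
let $uri := fn:concat("/blobs/", xdmp:random(), ".bin")
return (
  xdmp:document-insert($uri, binary { xs:hexBinary(fn:string($blob)) }),
  xdmp:node-replace($blob, <field name="file_blob" ref="{$uri}"/>)
)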
Here's an update. Turns out there's another database I need to port to
XQuery. The first one was admin. The second one has attachments stored as
blobs in the database, so I turned on the hex-blob option in mysqldump to
get a 537MB database extract. The blobs were marked up with the attribute
Great. Good to hear that the database elements and attributes are indexed
by default. eXistDB by default does the same.
I'm looking at the Information Studio/Application Services/Database
Settings page and wondering what these options provide in addition to
the default indexes.
To: MarkLogic Developer Discussion general@developer.marklogic.com
Sent: Tuesday, February 21, 2012 11:02:42 AM
Subject: Re: [MarkLogic Dev General] Processing Large Documents?
Here I am going further from my area of expertise, so buyer beware.
The options on that page offer a limited subset of common index settings that
customers often enable to support application features. Support for wildcard
queries is a good example; there are multiple indexes over and above the
default indexes that can be added for best results, and the checkbox on
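For example, with the wildcard indexes enabled, a query like the following
can be resolved largely from indexes rather than by filtering every
fragment (a generic illustration, not tied to this dataset):

(: "wildcarded" is a standard cts:word-query option :)
cts:search(fn:collection(), cts:word-query("spread*", "wildcarded"))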
As David said, you probably won't need to make any changes to the index config
for some time. Mostly folks make changes to tweak full-text search capabilities.
But I thought I'd point out that you can check the evaluation of an XPath. Note
that I added a missing '@' to your original expression.
I know I said 'attributes' in my original question, but the example was
correct: ids are child elements, not attributes.
I assume something like xdmp:plan(//usr[id='123']) or the generic
xdmp:plan(//*[id='123']) is already indexed?
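Pasting either expression into Query Console wrapped in xdmp:plan shows
which index terms are used, for example:

(: the output is an XML query plan; per the defaults discussed above, a
   child-element value test like this needs no extra index configuration :)
xdmp:plan(//usr[id = '123'])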
*From:* Todd Gochenour
*Sent:* Monday, February 20, 2012 8:47
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Processing Large Documents?
This is my second day spent working
Day three. Presidents' Day. I will first chunk the data for each row, as
this will improve concurrency. I gather I will need to generate random
document names for each chunk and put these documents in a collection using
the name of the database as the folder name. I see the terms Forest and
Oops, I just realized that when I said 154 Gigabytes I should have said 154
Megabytes. My first transformation reduces this to 6 Megabytes. Big
difference, yes?
Todd
Ignore forests and stands for now. Those are physical storage artifacts,
completely orthogonal to collections.
One difference you may note compared to eXistDB is that a document can be in
many collections at the same time. As I understand it, eXistDB collections
act sort of like filesystem directories.
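For instance, a single insert can place one document in several collections
at once (URIs and collection names here are hypothetical):

(: the fourth argument is a sequence of collection URIs :)
xdmp:document-insert(
  "/legacy/doc_fil/32.xml",
  <row/>,
  xdmp:default-permissions(),
  ("legacy-db", "doc_fil"))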
The XQuery I have for performing the chunking is timing out after 9 minutes
(running in the query console). There are 156000 'rows' total in this
extract. I'm now reading the Developer's guide for Understanding
Transactions to figure out how I might optimize this query. My query
reads:
I could do the denormalization work in SQL, but this would be a tedious
manual process. My hope with XQuery is that I can analyse the structure
and do this process automatically. Then I'd have a generic algorithm
which can be applied to other databases.
You can raise the time limit:
http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/xml/admin/http.xmlquery=request+timeout
Default Time Limit specifies the default value for any request's time limit,
when otherwise unspecified. A request can change its time limit
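For example, a long-running script can raise its own limit at the top of
the query (it still cannot exceed the host's configured maximum):

(: raise this request's limit to 30 minutes :)
xdmp:set-request-time-limit(1800)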
Hmm... I didn't see any joins or denormalization in the XQuery you posted most
recently. So maybe we are talking at cross-purposes? Is your denormalization
simply changing the row elements to elements named after table_data/@name? If
so, I can see why that would be tedious: relational systems
The denormalization phase will happen in a subsequent pass across the
data. Right now I'm just trying to get the chunking and refactoring of
names accomplished in this first pass.
I put the original MySQL datadump into the database so that I could perform
queries against it. It only took 31 seconds to
Michael's last example with spawning almost worked. The generated document
name for each record re-used the same table index, so I was left with only
45 documents in the end. I changed the XQuery to read:
(: query console :)
for $table in
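A sketch of the general shape, with the source URI and spawned module name
assumed; the point is to build each URI from both the table name and the
row position so the spawned inserts no longer collide:

(: sketch only; xdmp:spawn's second argument is QName/value pairs
   supplying the module's external variables :)
for $table in doc("/dump.xml")//table_data
for $row at $i in $table/row
let $uri := fn:concat("/", $table/@name, "/", $i, ".xml")
return xdmp:spawn("/insert-row.xqy",
  (xs:QName("uri"), $uri, xs:QName("row"), $row))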
I have a 154Gig file representing a data dump from MySQL that I want to
load into MarkLogic and analyze.
When I use the flow editor to collect/load this file into an empty
database, it takes 33 seconds.
When I add two delete element transforms to the flow the load fails with a
timeout error
This advice repeats a recommendation I saw earlier tonight during some of
my research, namely that with MarkLogic it's better to break up documents
into smaller fragments. I guess there's a performance gain in bursting a
document into small fragments, something to do with concurrency and locking
*From:* Todd Gochenour
*Sent:* Monday, February 20, 2012 7:57
*To:* MarkLogic Developer Discussion
*Subject:* Re: [MarkLogic Dev General] Processing Large Documents?
*From:* Todd Gochenour
*Sent:* Monday, February 20, 2012 2:00
*To:* MarkLogic Developer Discussion
*Subject:* [MarkLogic Dev General] Processing Large Documents?