I have a 154Gig file representing a data dump from MySQL that I want to
load into MarkLogic and analyze.
When I use the flow editor to collect/load this file into an empty
database, it takes 33 seconds.
When I add two delete-element transforms to the flow, the load fails with a
timeout error.
Todd,
RecordLoader and CoRB are useful tools for bulk loading and processing,
respectively, and are on the MarkLogic developer site.
Typically, XML documents in MarkLogic correspond to rows rather than tables, so
it may be ideal to use RecordLoader's RECORD_NAME configuration property to
split the dump into one document per row element.
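That row-per-document split can be sketched outside of RecordLoader too. A minimal Python sketch, assuming a `mysqldump --xml`-style layout with `<row>` elements (the element names and the tiny inline dump are illustrative, not your actual file):

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature of a `mysqldump --xml` style dump; a real
# multi-gigabyte file would be streamed from disk with the same loop.
dump = io.BytesIO(b"""<database name="test">
  <table_data name="orders">
    <row><field name="id">1</field></row>
    <row><field name="id">2</field></row>
  </table_data>
</database>""")

docs = []
# iterparse streams the input, so memory is bounded by one <row>
# at a time rather than by the whole dump.
for event, elem in ET.iterparse(dump, events=("end",)):
    if elem.tag == "row":
        docs.append(ET.tostring(elem))
        elem.clear()  # release the subtree we just serialized

print(len(docs))  # one document per row
```

Each serialized `<row>` would then be loaded as its own MarkLogic document, which is the same effect RECORD_NAME gives you inside RecordLoader.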
This advice repeats a recommendation I saw earlier tonight during some of
my research, namely that with MarkLogic it's better to break up documents
into smaller fragments. I guess there's a performance gain in bursting a
document into small fragments, something to do with concurrency and locking.
Hi Todd,
It is mostly for two reasons: memory footprint and indexing.
If you don’t have fragmentation enabled in the database configuration, then
the entire document is one fragment of 150 GB. Any processing on a fragment
means that the entire fragment is loaded into memory. Luckily, you can
configure fragment roots to break the document into smaller fragments.
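A toy illustration of that memory point, using plain Python XML parsing rather than MarkLogic internals (the document size here is made up):

```python
import io
import xml.etree.ElementTree as ET

xml_bytes = b"<doc>" + b"<item>x</item>" * 10000 + b"</doc>"

# Whole-tree parse: every element is resident at once -- the analogue
# of processing the whole dump as one giant fragment.
root = ET.parse(io.BytesIO(xml_bytes)).getroot()
print(len(root))  # all 10000 elements held in memory together

# Streaming parse: finished elements are cleared as we go -- the
# analogue of many small fragments, each loaded one at a time.
it = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
_, stream_root = next(it)  # the root arrives on its "start" event
count = 0
for event, elem in it:
    if event == "end" and elem.tag == "item":
        count += 1
        stream_root.clear()  # drop the finished child immediately
print(count)
```

The second loop touches the same number of elements but never holds more than one at a time, which is roughly the benefit fragmentation buys you on the server side.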
Hi Todd,
I know a few tricks that could help get this done with Information
Studio. One is putting your XQuery in a custom XQuery transform.
But you need to copy things like the collection from the input file, and some
other properties as well, to make sure the resulting files are treated the
same as the originals.
This is my second day spent working with MarkLogic, having just come back
this week from XMLPrague. So everything in my system to date is default
configuration, straight out of the box. I have seen the Fragment Roots
and Fragment Parents nodes listed under my database in the configure
database pages.