The index I'm trying to define is for full-text search, so most properties on the new content types I'm creating that contain human-readable content would be added to this index: URL references, notes, alt text, headers...
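To make that concrete, here is a rough sketch of the node layout I have in mind, written as plain Java with no Oak or JCR dependency. `DefNode` is a hypothetical stand-in for a JCR node (`child()` where you would call `addNode()`, `set()` where you would call `setProperty()`). The node type `my:Content` and the property names are made up. The definition property names (`type`, `async`, `compatVersion`, `indexRules`, `analyzed`, `nodeScopeIndex`) are my reading of the Oak Lucene index format and should be checked against the Oak version in use.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for javax.jcr.Node, for illustration only:
// child() plays the role of addNode(), set() the role of setProperty().
final class DefNode {
    final Map<String, Object> props = new LinkedHashMap<>();
    final Map<String, DefNode> children = new LinkedHashMap<>();

    DefNode set(String name, Object value) {
        props.put(name, value);
        return this;
    }

    DefNode child(String name) {
        return children.computeIfAbsent(name, n -> new DefNode());
    }

    // Appends this node, its properties and its children as an indented tree.
    void print(String name, String indent, StringBuilder out) {
        out.append(indent).append("+ ").append(name).append('\n');
        props.forEach((k, v) ->
                out.append(indent).append("  - ").append(k).append(" = ").append(v).append('\n'));
        children.forEach((k, c) -> c.print(k, indent + "  ", out));
    }
}

final class FullTextIndexSketch {
    // Builds the assumed layout of a Lucene full-text index definition
    // covering the given properties of one node type.
    static DefNode build(String nodeType, List<String> propertyNames) {
        DefNode index = new DefNode()
                .set("jcr:primaryType", "oak:QueryIndexDefinition")
                .set("type", "lucene")
                .set("async", "async")
                .set("compatVersion", 2L);
        // In a real repository these intermediate nodes would be nt:unstructured.
        DefNode properties = index.child("indexRules").child(nodeType).child("properties");
        for (String p : propertyNames) {
            properties.child(p)
                    .set("name", p)
                    .set("analyzed", true)        // tokenize for full-text matching
                    .set("nodeScopeIndex", true); // include in node-level full-text search
        }
        return index;
    }

    public static void main(String[] args) {
        // Node type and property names are placeholders for the real content model.
        DefNode index = build("my:Content",
                List.of("urlReference", "notes", "altText", "header"));
        StringBuilder out = new StringBuilder();
        index.print("contentFullText", "", out);
        System.out.print(out);
    }
}
```

With a real session the same tree would be created under `/oak:index` and persisted with `session.save()`.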
I'll try just using the JCR API. It shouldn't be too hard, since the source for IndexUtils contains a lot of the property names needed to generate the index. Thanks, Davide.

I am still interested in what to do with a NodeBuilder after it is generated. I guess merging would probably work if I were using a SegmentNodeStore instead of a DocumentNodeStore (but I'm not seeing any way to use a SegmentNodeStore with MongoDB).