After adding some JMX instrumentation, I can clearly see that the time is being spent in the session.save() call made when initially creating the document node, and in the vm.checkin(newNode.getPath()) call that follows it. Both calls slowed way down as more and more nodes were added to the repository.
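For readers who want to take a similar measurement, here is a minimal sketch of that kind of JMX instrumentation. It is not the code used for the numbers above: the names ImportTimings, StepTiming and the "example.import" JMX domain are made up for illustration, and only standard JDK classes are used.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.ObjectName;

public class ImportTimings {

    /** Attributes exposed over JMX; the "MXBean" suffix marks this as an MXBean interface. */
    public interface StepTimingMXBean {
        long getCalls();
        long getTotalMillis();
    }

    /** Accumulates call count and elapsed time for one step of the import. */
    public static final class StepTiming implements StepTimingMXBean {
        private final AtomicLong calls = new AtomicLong();
        private final AtomicLong totalNanos = new AtomicLong();

        /** Runs the step and records how long it took, even if it throws. */
        public <T> T time(Callable<T> step) throws Exception {
            long start = System.nanoTime();
            try {
                return step.call();
            } finally {
                totalNanos.addAndGet(System.nanoTime() - start);
                calls.incrementAndGet();
            }
        }

        @Override public long getCalls() { return calls.get(); }
        @Override public long getTotalMillis() { return totalNanos.get() / 1_000_000; }
    }

    /** Registers a timing bean under a hypothetical "example.import" domain. */
    public static StepTiming register(String step) throws Exception {
        StepTiming timing = new StepTiming();
        ObjectName name = new ObjectName("example.import:type=StepTiming,step=" + step);
        ManagementFactory.getPlatformMBeanServer().registerMBean(timing, name);
        return timing;
    }

    // Intended use around the two calls discussed above (JCR types not shown here):
    //   StepTiming save = ImportTimings.register("save");
    //   StepTiming checkin = ImportTimings.register("checkin");
    //   save.time(() -> { session.save(); return null; });
    //   checkin.time(() -> vm.checkin(newNode.getPath()));
}
```

Keeping one bean per step makes it possible to watch in a JMX console whether save or checkin is the one whose average time grows.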

1) Is it necessary to do the above in two steps, or can a Node be created and checked in with the VersionManager in one shot?

2) What is actually happening in terms of indexing when I do the above? Is there any way to / would it be useful to temporarily disable any on-the-fly indexing during a bulk import and run it later?

3) Does GC / compaction in the NodeStore come into play when adding nodes? Same question -- is there a way to / would it be useful to disable anything related to those during a bulk import and perform them offline after?

4) Is there anything inherent in the SegmentNodeStore that would decrease in performance as the repository grows?

Thanks!

- Bill

On Fri, Mar 16, 2018 at 12:30 PM, William Markmann <[email protected]> wrote:

> Is there any reason I'd see:
>
> 2018-03-15 21:48:54.673 INFO 20475 --- [ex-update-async]
> o.a.j.oak.plugins.index.IndexUpdate : Incremental indexing Traversed
> #10000 /NJ Foreclosure/342CKA-IANWK/SCRA SEARCHES-AAMZG/FRCL201604PTI00
> 003XEGT-20160425-4351400/jcr:content [Infinity nodes/s, Infinity nodes/hr]
>
> ...regularly at the outset, but it stops appearing after a certain point?
>
> If I take a thread-dump after it starts slowing down, I see the worker
> threads (usually all but one once slow-down starts) parked at:
>
> "pool-5-thread-3" #25772 prio=5 os_prio=0 tid=0x000000000201e000
> nid=0xc0c0 waiting on condition [0x00007f42486cc000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000004c1327098> (a java.util.concurrent.
> Semaphore$FairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.
> parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.
> doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.
> acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
> at org.apache.jackrabbit.oak.segment.scheduler.
> LockBasedScheduler.schedule(LockBasedScheduler.java:217)
> at org.apache.jackrabbit.oak.segment.SegmentNodeStore.
> merge(SegmentNodeStore.java:195)
> at org.apache.jackrabbit.oak.core.MutableRoot.commit(MutableRoot.java:248)
> at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.
> commit(SessionDelegate.java:347)
> at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.
> commit(SessionDelegate.java:360)
> at org.apache.jackrabbit.oak.jcr.version.ReadWriteVersionManager.checkin(
> ReadWriteVersionManager.java:129)
> at org.apache.jackrabbit.oak.jcr.delegate.VersionManagerDelegate.checkin(
> VersionManagerDelegate.java:67)
> at org.apache.jackrabbit.oak.jcr.version.VersionManagerImpl$7.
> perform(VersionManagerImpl.java:371)
> at org.apache.jackrabbit.oak.jcr.version.VersionManagerImpl$7.
> perform(VersionManagerImpl.java:362)
> at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.
> perform(SessionDelegate.java:208)
> at org.apache.jackrabbit.oak.jcr.version.VersionManagerImpl.
> checkin(VersionManagerImpl.java:362)
>
>
> When I'm inserting documents at the very beginning (no content yet), the
> individual threads don't park at that state for nearly as long...
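
The parked frames in the dump above all wait on a Semaphore$FairSync inside LockBasedScheduler.schedule, reached from SegmentNodeStore.merge. The following is a minimal sketch of what that pattern looks like from the caller's side, assuming the commits are gated by a single-permit fair semaphore as the trace suggests. It is not Oak's scheduler code; CommitGateDemo and its methods are hypothetical names, and Thread.sleep stands in for the real commit work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class CommitGateDemo {
    // Assumption taken from the dump: one permit, fair (FIFO) hand-off.
    private final Semaphore gate = new Semaphore(1, true);
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger maxInside = new AtomicInteger();

    /** Simulated commit: only one caller at a time gets past the gate. */
    public void commit(long workMillis) throws InterruptedException {
        gate.acquire(); // the other workers park here, as in the thread dump
        try {
            maxInside.accumulateAndGet(inside.incrementAndGet(), Math::max);
            Thread.sleep(workMillis); // stand-in for the actual commit work
        } finally {
            inside.decrementAndGet();
            gate.release();
        }
    }

    /** Highest number of commits ever observed inside the gate at once. */
    public int maxConcurrentCommits() {
        return maxInside.get();
    }

    /** Runs the workers to completion and returns the elapsed wall time in ms. */
    public long runWorkers(int threads, int commitsPerThread, long workMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            Callable<Void> worker = () -> {
                for (int i = 0; i < commitsPerThread; i++) {
                    commit(workMillis);
                }
                return null;
            };
            List<Future<Void>> pending = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                pending.add(pool.submit(worker));
            }
            for (Future<Void> f : pending) {
                f.get(); // propagate any worker failure
            }
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

With a gate like this, total wall time is roughly the sum of all commit durations regardless of the number of worker threads, so the longer each commit takes, the longer every other worker sits parked. The sketch only shows why all but one worker appear parked; it says nothing about why an individual commit gets slower as the repository grows.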
>
> On Fri, Mar 16, 2018 at 11:55 AM, Julian Reschke <[email protected]>
> wrote:
>
>> On 2018-03-16 16:46, William Markmann wrote:
>>
>>> Folders are: *org.apache.jackrabbit.JcrConstants.NT_FOLDER*
>>> Documents are:
>>>
>>>     Binary fileBinary = session.getValueFactory().createBinary(new ByteArrayInputStream(data));
>>>     Node newFile = parentNode.addNode(filename, JcrConstants.NT_FILE);
>>>     newFile.addMixin(JcrConstants.MIX_VERSIONABLE);
>>>     Node docContents = newFile.addNode(JcrConstants.JCR_CONTENT, JcrConstants.NT_RESOURCE);
>>>     // docContents.setProperty(JcrConstants.JCR_MIMETYPE, getMimeType(filename, getFileExtension(filename)));
>>>     docContents.setProperty(JcrConstants.JCR_MIMETYPE, FileUtils.getMimeType(FileUtils.getFileExtension(filename)));
>>>     docContents.setProperty(JcrConstants.JCR_ENCODING, "");
>>>     docContents.setProperty(JcrConstants.JCR_DATA, fileBinary);
>>>
>>> Is there a better choice?
>>> ...
>>>
>>
>> I was worried the folder might have "orderable" child nodes, which
>> creates a significant overhead. But AFAIR that is not the case for
>> nt:folder (but you may want to check).
>>
>> Best regards, Julian
>>
>> PS: I wouldn't set JCR_ENCODING if that information isn't present.
>>
>
> --
> *Bill Markmann*
> *President | 866 809 0394 x 701*
> *Counterpoint Consulting*
> *Automate. Innovate. Accelerate.*
> c20g.com | *Blog <http://www.c20g.com/site/blog> **| Linkedin
> <http://www.linkedin.com/company/counterpoint-consulting-inc.>** |
> Twitter <https://twitter.com/c20g>*
