Thank you guys for your comments.

On Fri, Jun 9, 2017 at 9:50 AM, Ian Boston <[email protected]> wrote:
> Hi,
>
> Assuming the MongoDB instance is performing well and does not show any
> slow queries in the MongoDB logs, running the index operation on many
> cores, each core handling one index writer, should parallelise the
> operation. IIRC this is theoretically possible, and might have been
> implemented in the latest versions of Oak (Chetan?). If you are in AWS
> then an X1 instance will give you 128 cores and up to 2 TB of RAM for the
> duration of the re-index. Other cloud vendors have equivalent VMs.
> Whatever the instance is, the Oak cluster leader should be allocated to
> it, as IIRC only the Oak cluster leader performs the index operation. The
> single-threaded index writer is a feature/limitation of the way Lucene
> works, but Oak has many independent indexes. Your deployment may not have
> 128 indexes, so it may not be able to use all the cores of the largest
> instance.
>
> If, however, the MongoDB cluster is showing any signs of slow queries in
> the logs (> 100 ms), or any level of read IO, then however many cores
> over however many VMs won't speed the process up, and may slow it down.
> To be certain of no bottleneck in MongoDB, ensure the VM has more memory
> than the on-disk size of the database. The latest version of MongoDB
> supported by Oak, running WiredTiger, will greatly reduce memory pressure
> and IO, as it doesn't use memory mapping as the primary DB-to-disk
> mechanism and compresses the data as it writes.
>
> The instance running Oak must also be sized correctly. I suspect you will
> be running a persistent cache, which must be sized to give optimum
> performance and minimise IO, and which therefore also requires sufficient
> memory. For the period of the re-index, the largest AEM instance you can
> afford will minimise IO. Big VMs (in AWS at least) have more network
> bandwidth, which also helps.
>
> Finally, disks.
> Don't use HDD, only SSD, and ensure that sufficient IOPS are available at
> all times, and enable all the Oak indexing optimisation switches
> (copyOnRead, copyOnWrite, etc.).
>
> IO generally kills performance, and if the VMs have not been tuned (THP
> off, readahead low, XFS or noatime ext4 disks) then that IO will be
> amplified.
>
> If you have done all of this, then you might have to wait for OAK-6246 (I
> see Chetan just responded), but if you haven't, please do check that you
> are running as fast as possible with no constrained resources.
>
> HTH. If this has been said before, sorry for the noise and please ignore.
>
> Best Regards
> Ian
>
> On 9 June 2017 at 07:49, Alvaro Cabrerizo <[email protected]> wrote:
>
> > Thanks Chetan,
> >
> > Sorry, but that part is out of my reach. There is an IT team in charge
> > of managing the infrastructure and making optimizations, so it is
> > difficult to get that information. Basically, what I was looking for is
> > a way to parallelize the indexing process. On the other hand, reducing
> > the indexing time would be fine (it was previously reduced from 7 to 2
> > days), but I think that traversing more than 100000000 nodes is a
> > pretty tough operation and I'm not sure there is much we can do.
> > Anyway, any pointer related to indexing optimization, or any advice on
> > how to design the repo (e.g. use different paths to isolate different
> > groups of assets, use different nodetypes to differentiate content
> > types, create different repositories [is that possible?] for different
> > groups of uses...) is welcome.
> >
> > Regards.
> >
> > On Thu, Jun 8, 2017 at 12:44 PM, Chetan Mehrotra
> > <[email protected]> wrote:
> >
> > > On Thu, Jun 8, 2017 at 4:04 PM, Alvaro Cabrerizo
> > > <[email protected]> wrote:
> > > > It is a DocumentNodeStore based instance. We don't extract data
> > > > from binary files, just indexing metadata stored on nodes.
> > >
> > > In that case 48 hrs is a long time.
> > > Can you share some details around how many nodes are being indexed
> > > as part of that index, and the repo size in terms of Mongo stats, if
> > > possible?
> > >
> > > Chetan Mehrotra
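A note on the copyOnRead/copyOnWrite switches Ian mentions: they are exposed on Oak's Lucene index provider OSGi component. The sketch below assumes the component name `org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexProviderService` and the property names `enableCopyOnReadDir`, `enableCopyOnWriteDir` and `localIndexDir` (please verify against the docs for your Oak version), with the local index directory pointed at fast SSD:

```
# org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexProviderService.config
# Property names assumed from Oak docs -- check your version before relying on them.
enableCopyOnReadDir=B"true"
enableCopyOnWriteDir=B"true"
localIndexDir="/mnt/ssd/oak-index"
```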
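Ian's "> 100 ms" warning sign can be checked even without access to the MongoDB profiler, by scanning the mongod log. This is only a rough sketch under assumptions: the classic mongod log format ends each operation line with its duration (e.g. `... 245ms`), and the sample lines below are made up for illustration.

```python
import re

SLOW_MS = 100  # Ian's threshold: operations over 100 ms are a warning sign

def slow_ops(log_lines):
    """Return (duration_ms, line) pairs for operations slower than SLOW_MS.

    Assumes the classic mongod log format, where each operation line
    ends with its duration, e.g. '... query oak.nodes ... 245ms'.
    """
    hits = []
    for line in log_lines:
        m = re.search(r"(\d+)ms\s*$", line.rstrip())
        if m and int(m.group(1)) > SLOW_MS:
            hits.append((int(m.group(1)), line.strip()))
    return hits

# Fabricated sample lines, for illustration only:
sample = [
    "2017-06-09T09:50:01 I COMMAND query oak.nodes ... 38ms",
    "2017-06-09T09:50:02 I COMMAND query oak.nodes ... 245ms",
]
print(slow_ops(sample))  # only the 245 ms operation is flagged
```

If this turns up a steady stream of slow operations during the re-index, more Oak cores won't help until the MongoDB side is fixed.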
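On the OS side, the THP setting Ian calls out is easy to verify: on Linux, `/sys/kernel/mm/transparent_hugepage/enabled` marks the active mode in brackets, and for MongoDB it should read `never`. A minimal parser for that file's contents (reading the sysfs path directly is left out so the sketch stays runnable anywhere):

```python
import re

def active_thp(sysfs_text):
    """Return the active THP mode from the contents of
    /sys/kernel/mm/transparent_hugepage/enabled, where the kernel marks
    the current choice in brackets, e.g. 'always madvise [never]'.
    """
    m = re.search(r"\[(\w+)\]", sysfs_text)
    return m.group(1) if m else None

# For MongoDB under Oak, you want this to report 'never':
print(active_thp("always madvise [never]"))
```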
