Hey Alan

see inline
(ccing opengrok-dev, just for having this info sent to more dev folks in case they want to chime in)

On 8.2.2011 18:31, ALAN KAPLAN, BLOOMBERG/ 731 LEXIN wrote:
> Hi Lubos, I was wondering if you could answer these questions for me?

> 1. When you remove something from your source (e.g. an entire directory):
> when I ran the indexer it said it was removing a bunch of stale files, and it
> created the index. So should I assume that the index no longer contains the
> info for the deleted directory? However, there is still an entry in the
> historycache dir and in the index dir. Can I safely just remove those? It
> seems this doesn't get cleaned up. Please advise how to proceed here. Thanks.

if it's a top-level folder ("project") and the folder doesn't exist in src anymore, then feel free to remove it

if it's a subdirectory of a project and it also doesn't exist in src anymore, you can do the same imho (or, to be safe, just move it aside for the time being ... if no problems arise within a week, then delete it)

afaik opengrok will just remove the files from the lucene index (hence you will not be able to search for them), but all the cache (xrefs and history) will probably stay behind - feel free to file a bug on this
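
A rough sketch of how the leftovers could be spotted before removing anything. It assumes the data root holds "historycache", "xref" and "index" subdirectories with one directory per project, which may not match your layout; the function name and the paths are made up for illustration, and it only reports, it deletes nothing.

```python
from pathlib import Path


def stale_project_dirs(src_root, data_root,
                       cache_names=("historycache", "xref", "index")):
    """Return cache directories whose project is gone from src_root.

    The cache directory names are an assumption about the data root layout.
    """
    src_root, data_root = Path(src_root), Path(data_root)
    stale = []
    for cache_name in cache_names:
        cache_dir = data_root / cache_name
        if not cache_dir.is_dir():
            continue  # this kind of cache does not exist here
        for entry in sorted(cache_dir.iterdir()):
            # Only top-level directories are treated as projects.
            if entry.is_dir() and not (src_root / entry.name).exists():
                stale.append(entry)
    return stale
```

Calling stale_project_dirs("/path/to/src", "/path/to/data") returns the candidates; moving them aside for a week before deleting, as suggested above, is still the safer route.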

> 2. I see there is a new OpenGrok version, 0.10. Can I just install it and run
> it instead of 0.9 without having to reindex my entire project? When I
> initially created the project the job ran for a week. Now it does incremental
> updates in about 1-2 hours/day. Can I continue doing this with the new
> version, or do I have to redo it from scratch?

so ... you should probably follow kah's notes - http://blogs.sun.com/kah/entry/opengrok_0_10

you can most probably use 0.10 without a reindex, BUT
navigate will certainly not work, and existing files will not pick up the fixes done to the xref cache until they change (obviously new files will be generated in the new xref format)

so ... indexing from scratch is strongly recommended

if you don't mind, I'd rather look into the problem of opengrok running for a week than have you run it without the new features, hmm?

e.g. we have several opengrok instances, and some of them index A LOT of sources with different SCMs. The biggest problem so far was with cvs history regeneration, the rest was quite fast. On one of our biggest servers (~20G of sources - hg, svn, teamware (sccs) and cvs used) indexing from scratch takes roughly 2 days, and that's only because of the bsd historycache, which is generated from remote bsd servers which are VERY SLOW.

What we do is temporarily disable historycache for the projects we know are slow to generate (e.g. bsd, openssl), then we let opengrok do its stuff and most of the source is ready. Then we enable history again for the slower scms and reindex when there is spare time - e.g. over the next weekend - this way the downtime is minimal. By disabling history I mean e.g. moving CVS aside, or moving .svn aside, so opengrok will not detect the scm and will not try to use it.
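
The "moving aside" trick could be scripted roughly like this. It is only a sketch under assumptions: it renames the metadata directories in place, the ".aside" suffix and the function name are made up, and moving the directories out of the source tree entirely would work just as well.

```python
import os

SCM_DIRS = ("CVS", ".svn")   # metadata directories to hide
SUFFIX = ".aside"            # hypothetical suffix, pick anything unused


def toggle_scm_metadata(project_dir, hide=True):
    """Rename CVS/.svn directories aside (hide=True) or back (hide=False)."""
    renamed = []
    for root, dirs, _files in os.walk(project_dir):
        for name in list(dirs):
            if hide and name in SCM_DIRS:
                target = name + SUFFIX
            elif (not hide and name.endswith(SUFFIX)
                  and name[:-len(SUFFIX)] in SCM_DIRS):
                target = name[:-len(SUFFIX)]
            else:
                continue
            os.rename(os.path.join(root, name), os.path.join(root, target))
            dirs.remove(name)  # do not descend into the metadata directory
            renamed.append(os.path.join(root, target))
    return renamed
```

Run it with hide=True before the big reindex and with hide=False before the later history run.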

Another way to keep the downtime low (~5 mins) is to have the opengrok metadata on a zfs dataset. You can run the 0.10 indexer with a different target dir (which is another dataset) - the OPENGROK_INSTANCE_BASE variable can be used for that. Once indexing is done, you just stop tomcat/glassfish, zfs rename the old dataset to some backup name and the new one to the default name, then copy over the war (or do an OpenGrok deploy, possibly with OPENGROK_TOMCAT_BASE) and start the container anew.
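
The dataset swap could look roughly like the sketch below. The dataset names and the stop/start commands are placeholders you would supply yourself, the function names are made up, and the war copy/deploy step is left out; by default it only prints the commands.

```python
import subprocess


def swap_plan(live, fresh, backup, stop_cmd, start_cmd):
    """Return the commands for the swap, in order, as argv lists."""
    return [
        list(stop_cmd),                   # stop tomcat/glassfish
        ["zfs", "rename", live, backup],  # old data -> backup name
        ["zfs", "rename", fresh, live],   # new data -> default name
        list(start_cmd),                  # start the container again
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Review the printed plan first, and only then call run_plan(plan, dry_run=False).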

let me know if you want to pursue the long indexing problem, we can probably improve this time ...

xing the fingers
Lubos

> Any assistance is greatly appreciated. Thanks. --Alan

_______________________________________________
opengrok-dev mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opengrok-dev
