I think we have a bug for both concerns you raised:
- log the exception when the indexer thread dies (I think I even have a patch for this somewhere, need to look around)
- log file names before and/or after they are indexed (I think this logger is turned off by default)
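
To illustrate the first point, here is a minimal sketch of logging whatever a worker task throws before the thread goes away. It is not OpenGrok's code, and it assumes the work runs on an executor, which has not been checked against the indexer; the class `LoggingWorkerSketch` and the methods `logged` and `runOnce` are hypothetical names made up for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration, not part of OpenGrok.
final class LoggingWorkerSketch {
    private static final Logger LOG =
            Logger.getLogger(LoggingWorkerSketch.class.getName());

    // Remembers the most recent failure so the caller can inspect it.
    static final AtomicReference<Throwable> lastFailure = new AtomicReference<>();

    /** Wraps a task so that anything it throws is recorded and logged. */
    static Runnable logged(String fileName, Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) { // Throwable also covers OutOfMemoryError
                lastFailure.set(t);
                LOG.log(Level.SEVERE, "Indexing failed for " + fileName, t);
            }
        };
    }

    /** Runs one wrapped task on a worker thread; returns what it threw, or null. */
    static Throwable runOnce(String fileName, Runnable task)
            throws InterruptedException {
        lastFailure.set(null);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.submit(logged(fileName, task));
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return lastFailure.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Throwable t = runOnce("big.xls", () -> {
            throw new IllegalStateException("simulated analyzer failure");
        });
        System.out.println("captured: " + t);
    }
}
```

Without such a wrapper, a task handed to `submit()` keeps its exception inside the returned `Future`, so nothing is reported unless someone calls `get()` on it. That is one common way for a worker to die without any message in the log.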

so if it is an HTML file, then the indexer was correct: we index HTML files using XMLAnalyzer (if I remember correctly off the top of my head).
ok, so I will try to generate a 400MB HTML file, give it an .xls extension and try to index it
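
A test file like that could be produced along these lines. This is only a sketch under assumptions: the real file's markup is unknown, so the row content here is invented, and the class name `FakeXlsGenerator`, the method `writeHtmlTable` and the default file name `fake-export.xls` are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for reproducing the problem, not part of OpenGrok.
final class FakeXlsGenerator {

    /**
     * Writes a plain HTML table to target, adding rows until at least
     * minBytes have been written. Returns the number of rows written.
     */
    static long writeHtmlTable(Path target, long minBytes) throws IOException {
        String head = "<html><body><table>\n";
        // Invented row content; all ASCII, so length() equals the byte count.
        String row = "<tr><td>Territory</td><td>Activity</td><td>12345</td></tr>\n";
        String tail = "</table></body></html>\n";
        long written = head.length();
        long rows = 0;
        try (BufferedWriter out =
                Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(head);
            while (written < minBytes) {
                out.write(row);
                written += row.length();
                rows++;
            }
            out.write(tail);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Defaults: an .xls name and roughly 400MB, as discussed above.
        Path target = Paths.get(args.length > 0 ? args[0] : "fake-export.xls");
        long minBytes = args.length > 1
                ? Long.parseLong(args[1]) : 400L * 1024 * 1024;
        long rows = writeHtmlTable(target, minBytes);
        System.out.println("wrote " + rows + " rows to " + target);
    }
}
```

Dropping the generated file into a source root and running the indexer over it should show whether size alone is enough to trigger the failure, or whether something specific in the original export matters.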

I expect the analyzer runs out of memory, which is a bit odd, since the default OpenGrok script allocates 2G of memory for indexing ...
anyway, this looks like a problem with the xml analyzer ...
if you want you can file a bug, but I will try to reproduce and file one myself later this week

as said, for now pass the -i option to the java -jar opengrok.jar call to put this file name on the ignore list, and the indexer should proceed

thnx for report
Lubos

On 8.10.2011 0:20, Joshua J. Kugler wrote:
On Friday, October 07, 2011, Lubos Kosco elucidated thus:
so first odd thing is why does it use an xml analyzer on xls file
Well, because it was in fact an HTML file that someone had given an .xls
extension.  It does look like it was an excel export, but it was an HTML
file. And it's 373MB

So, the next question is:

1) Why is it dying?
2) Why is it dying with no errors, killing the indexing, and making it
look like everything completed successfully?

can you share that xls file so we can reproduce ?
I will see...it might be company-internal data, so I'm not sure.

as a workaround you can add this file to the ignore list
(I think there is an option for this, which is also not in Opengrok,
just add it there:
-i or -I, check command line options -
http://src.opensolaris.org/source/xref/opengrok/trunk/src/org/opensolaris/opengrok/index/CommandLineOptions.java#69 )
This could get tedious if there are lots of files that kill the indexer.
:)

Thanks again!

j

thnx
L

On 7.10.2011 20:31, Joshua J. Kugler wrote:
First off: Thanks to Lubos and Trond for all the help and pointers
so far.

So, I have a pretty standard install (I think).  I have:

/var

   /opengrok

    /data
    /etc
    /log
    logging.properties
    /src

     /my_svn # a full checkout of my SVN repository

/var/opengrok (and everything under it) is owned by the user
running the OpenGrok script.

data/ is empty

OpenGrok starts, takes a LOOOONG time to do the history cache; but
this is 25,000+ commits, so that's not too bad.

It starts indexing, and then I see this:

10:43:42 INFO: Add: /crchealth_svn/projects/CRM Projects/Territory
Management/Activities for Territory Management.xls (XMLAnalyzer)
10:44:11 INFO: Send configuration to: localhost:2424
10:44:12 INFO: Configuration update routine done, check log output
for errors.

It has indexed nowhere NEAR the entire repository. In fact, it only
indexed one top-level directory, and then another. So I run it
again. Here is the run in its entirety:

$ OPENGROK_VERBOSE=true ./OpenGrok index
Loading the default instance configuration ...
Logging filehandler pattern: /var/opengrok/log/opengrok%g.%u.log
11:17:09 INFO: Scanning for repositories...
11:17:09 INFO: Done scanning for repositories (0s)
11:17:09 INFO: Writing configuration to
/var/opengrok/etc/configuration.xml
11:17:09 INFO: Done...
11:17:09 INFO: Generating history cache for all repositories ...
11:17:09 INFO: Create historycache for /var/opengrok/src/my_svn
(SubversionRepository)
11:25:18 INFO: Creating historycache for /var/opengrok/src/my_svn
took (488766ms)
11:25:18 INFO: Done...
11:25:18 INFO: Starting indexing
11:25:18 INFO: Add: /my_svn/projects/CRM Projects/Territory
Management/Activities for Territory Management.xls (XMLAnalyzer)
11:25:20 INFO: Send configuration to: localhost:2424
11:25:20 INFO: Configuration update routine done, check log output
for errors.

And I run it again. Exact same thing. It appears to be dying on
Activities for Territory Management.xls but not throwing an error
about it.  I've attached the log from the latest run.

Anything I can do to help you guys debug this?

j



_______________________________________________
opengrok-discuss mailing list
opengrok-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/opengrok-discuss

