On Friday, October 07, 2011, Lubos Kosco elucidated thus:
> So the first odd thing is: why does it use an XML analyzer on an .xls file?
Well, because it was in fact an HTML file that someone had given an .xls
extension. It does look like it was an Excel export, but it was an HTML
file. And it's 373MB.

So, the next questions are:

1) Why is it dying?
2) Why is it dying with no errors, killing the indexing, and making it
look like everything completed successfully?

> Can you share that .xls file so we can reproduce it?

I will see... it might be company-internal data, so I'm not sure.

> As a workaround, you can add this file to the ignore list
> (I think there is an option for this, which is also not in OpenGrok;
> just add it there): -i or -I. Check the command-line options:
> http://src.opensolaris.org/source/xref/opengrok/trunk/src/org/opensolaris/opengrok/index/CommandLineOptions.java#69

This could get tedious if there are lots of files that kill the indexer. :)

Thanks again!

j

> thnx
> L
>
> On 7.10.2011 20:31, Joshua J. Kugler wrote:
> > First off: Thanks to Lubos and Trond for all the help and pointers
> > so far.
> >
> > So, I have a pretty standard install (I think). I have:
> >
> > /var
> >     /opengrok
> >         /data
> >         /etc
> >         /log
> >         logging.properties
> >         /src
> >             /my_svn    # a full checkout of my SVN repository
> >
> > /var/opengrok (and everything under it) is owned by the user
> > running the OpenGrok script.
> >
> > data/ is empty.
> >
> > OpenGrok starts and takes a LOOOONG time to build the history cache,
> > but this is 25,000+ commits, so that's not too bad.
> >
> > It starts indexing, and then I see this:
> >
> > 10:43:42 INFO: Add: /crchealth_svn/projects/CRM Projects/Territory Management/Activities for Territory Management.xls (XMLAnalyzer)
> > 10:44:11 INFO: Send configuration to: localhost:2424
> > 10:44:12 INFO: Configuration update routine done, check log output for errors.
> >
> > It has indexed nowhere NEAR the entire repository. In fact, it only
> > indexed one top-level directory, and then another. So I run it
> > again.
> > Here is the run in its entirety:
> >
> > $ OPENGROK_VERBOSE=true ./OpenGrok index
> > Loading the default instance configuration ...
> > Logging filehandler pattern: /var/opengrok/log/opengrok%g.%u.log
> > 11:17:09 INFO: Scanning for repositories...
> > 11:17:09 INFO: Done scanning for repositories (0s)
> > 11:17:09 INFO: Writing configuration to /var/opengrok/etc/configuration.xml
> > 11:17:09 INFO: Done...
> > 11:17:09 INFO: Generating history cache for all repositories ...
> > 11:17:09 INFO: Create historycache for /var/opengrok/src/my_svn (SubversionRepository)
> > 11:25:18 INFO: Creating historycache for /var/opengrok/src/my_svn took (488766ms)
> > 11:25:18 INFO: Done...
> > 11:25:18 INFO: Starting indexing
> > 11:25:18 INFO: Add: /my_svn/projects/CRM Projects/Territory Management/Activities for Territory Management.xls (XMLAnalyzer)
> > 11:25:20 INFO: Send configuration to: localhost:2424
> > 11:25:20 INFO: Configuration update routine done, check log output for errors.
> >
> > And I run it again. Exact same thing. It appears to be dying on
> > Activities for Territory Management.xls but not throwing an error
> > about it. I've attached the log from the latest run.
> >
> > Anything I can do to help you guys debug this?
> >
> > j
> >
> > _______________________________________________
> > opengrok-discuss mailing list
> > opengrok-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/opengrok-discuss

-- 
Joshua J. Kugler - Fairbanks, Alaska
Azariah Enterprises - Programming and Website Design
jos...@azariah.com - Jabber: pedah...@gmail.com
PGP Key: http://pgp.mit.edu/ ID 0x73B13B6A