Hmm, you know, I don't even know what a "row" means when importing XML.

But let's talk about importing XML. As far as I know, unless you use XSLT
to perform a transformation, Solr doesn't import XML except as well-formed
Solr documents, in some form like:

<add>
  <doc>
    <field name="blah">value</field>
  </doc>
</add>
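For what it's worth, here is a minimal Python sketch of generating that <add> form from plain records. The helper name build_add_xml and the "id" field are made up for illustration; nothing here is part of Solr itself, and the field names would have to match whatever your schema defines.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr-style <add> message from a list of {field: value} dicts.

    Hypothetical helper for illustration only. A list or tuple value is
    written as repeated <field> elements with the same name.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # "id" and "blah" are placeholder field names, not from a real schema
    print(build_add_xml([{"id": "1", "blah": "value"}]))
```

ElementTree escapes characters like & and < in the values, so the result stays well-formed even when the field text is messy.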
If you're importing anything else, I don't think Solr understands it
at all...

So what does your "funky XML document" look like? What, if any, errors
are reported in your Solr logs?

Also, it's surprisingly easy to debug Solr while it runs. In IntelliJ,
all it involves is creating a "Remote" run/debug configuration, and
it'll give you the JVM parameters you need to specify when you start
your Solr. From there you just start your Solr instance with those
parameters and connect remotely. I took the entire source tree for the
Solr I was using and compiled it (ant example) and it was easy.

So you might get more mileage out of debugging in Solr rather than
logging, but that's a guess.

Best
Erick

On Sat, Oct 1, 2011 at 6:17 PM, Pulkit Singhal <pulkitsing...@gmail.com> wrote:
> ====
> The Problem:
> ====
> When using DIH with trunk 4.x, I am seeing some very funny numbers
> with a particularly large XML file that I'm trying to import. Usually
> there are bound to be more rows than documents indexed in DIH because
> of the forEach property but my other xml files have maybe 1.5 times
> the rows compared to the # of docs indexed.
>
> This particular funky file ends up with something like:
> <str name="Total Rows Fetched">25614008</str>
> <str name="Total Documents Processed">1048</str>
> That's 25 million rows fetched before even a measly 1000 docs are indexed!
> Something has to be wrong here.
> I checked the xml for well-formed-ness in vim by running ":!xmllint
> --noout %" so I think there are no issues there.
>
> ====
> The Question:
> ====
> For those intimately familiar with DIH code/behaviour: What is the
> appropriate log-level that will let me see the rows & docs printed out
> to log as each one is fetched/created? I don't want to make the logs
> explode because then I won't be able to read through them. Is there
> some gentle balance here that I can leverage?
>
> Thanks!
> - Pulkit