Thanks Otis,

DirectXmlRequest is part of the SolrJ library, so I guess that means it is not 
commonly used.  My use case is that I'm applying an XSLT to the raw XML on the 
client side, instead of leaving that up to the Solr master (although even if I 
applied the XSLT on the Solr server, I'd still use DirectXmlRequest to get the 
raw XML there).  This does make me think that parsing the source XML directly 
on the client, without the XSLT step, is probably a better workaround than 
copying parts of XMLLoader to parse Solr XML, and it might be a good idea to do 
anyway.
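
For comparison, here is a rough sketch of what the XMLLoader-style workaround
could look like using only the JDK's DOM parser.  The class name SolrXmlDocs is
made up for illustration, it assumes Java 8 or later, and it only handles the
plain <add><doc><field name="...">value</field></doc></add> shape (no boosts,
no delete or commit commands), so treat it as a starting point rather than a
replacement for XMLLoader.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Solr or SolrJ.
class SolrXmlDocs {
    /** Returns one map per <doc>, keyed by field name; multi-valued fields keep all values in order. */
    static List<Map<String, List<String>>> parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(new InputSource(new StringReader(xml)));
            List<Map<String, List<String>>> docs = new ArrayList<>();
            NodeList docNodes = dom.getElementsByTagName("doc");
            for (int i = 0; i < docNodes.getLength(); i++) {
                Element doc = (Element) docNodes.item(i);
                Map<String, List<String>> fields = new LinkedHashMap<>();
                NodeList fieldNodes = doc.getElementsByTagName("field");
                for (int j = 0; j < fieldNodes.getLength(); j++) {
                    Element field = (Element) fieldNodes.item(j);
                    // Collect repeated field names into a single list of values.
                    fields.computeIfAbsent(field.getAttribute("name"), k -> new ArrayList<>())
                          .add(field.getTextContent());
                }
                docs.add(fields);
            }
            return docs;
        } catch (Exception e) {
            // Malformed XML or parser setup failure.
            throw new IllegalArgumentException("Could not parse Solr XML", e);
        }
    }
}
```

Each map could then be copied into a SolrInputDocument with addField(name,
value) and sent through the normal SolrJ add call.  Whether that actually
avoids the temp files is something I would still have to test.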

I've done some research and I'm fairly confident that the Apache 
commons-fileupload library is responsible for the temp files.  There's an 
explanation of how files are cleaned up at 
http://commons.apache.org/fileupload/using.html in the "Resource cleanup" 
section.  I have observed that forcing a garbage collection over JMX results in 
all temporary files being purged.  This implies that many of the java.io.File 
objects are being promoted to the old generation of the heap, where they 
survive long enough (only a few minutes in my case) for their temp files to use 
up all of the tmp disk space.

I think this can probably be solved by GC tuning, or, failing that, introducing 
a (less desirable) System.gc() somewhere in the updateRequestProcessorChain.
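
If neither of those works out, the "clean up the tmpdir myself" option from
earlier in the thread could be sketched roughly as below.  UploadTempSweeper
and its sweep method are hypothetical names, the upload_*.tmp pattern is taken
from the filenames I observed, and the age threshold is an assumption meant to
avoid deleting a file that an in-flight request is still using.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Solr, Tomcat, or commons-fileupload.
class UploadTempSweeper {
    /**
     * Deletes regular files matching upload_*.tmp in dir whose last-modified
     * time is older than maxAge. Returns the number of files deleted.
     */
    static int sweep(Path dir, Duration maxAge) {
        Instant cutoff = Instant.now().minus(maxAge);
        int deleted = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "upload_*.tmp")) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)
                        && Files.getLastModifiedTime(file).toInstant().isBefore(cutoff)
                        && Files.deleteIfExists(file)) {
                    deleted++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return deleted;
    }
}
```

Something like UploadTempSweeper.sweep(Paths.get(System.getProperty(
"java.io.tmpdir")), Duration.ofMinutes(10)) could be run from a scheduled task.
It still feels like a hack compared to fixing the GC behaviour, since it only
treats the symptom.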

Thanks for your help, and hopefully this will be useful if someone else runs 
into a similar problem.

Ryan
________________________________________
From: Otis Gospodnetic [otis.gospodne...@gmail.com]
Sent: Wednesday, January 09, 2013 11:53 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrJ DirectXmlRequest

Hi Ryan,

One typically uses a Solr client library to talk to Solr instead of sending
raw XML.  For example, if your application is written in Java then you
would use SolrJ.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Jan 9, 2013 at 12:03 PM, Ryan Josal <rjo...@rim.com> wrote:

> I also don't know what's creating them.  Maybe Solr, but also maybe
> Tomcat, maybe apache commons.  I could change java.io.tmpdir to one with
> more space, but the problem is that many of the temp files end up
> permanent, so eventually it would still run out of space.  I also
> considered setting the tmpdir to /dev/null, but that would defeat the
> purpose of whatever is writing those temp files in the first place.  I could
> periodically clean up the tmpdir myself, but that feels the hackiest.
>
> Is it fairly common to send XML to Solr this way from a remote host?  If
> it is, then that would lead me to believe Solr and any of its libraries
> aren't causing it, and I should inspect Tomcat.  I'm using Tomcat 7.
>
> Ryan
> ________________________________________
> From: Otis Gospodnetic [otis.gospodne...@gmail.com]
> Sent: Tuesday, January 08, 2013 7:29 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrJ DirectXmlRequest
>
> Hi Ryan,
>
> I'm not sure what is creating those upload files.... something in Solr? Or
> Tomcat?
>
> Why not specify a different temp dir via system property command line
> parameter?
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Jan 8, 2013 12:17 PM, "Ryan Josal" <rjo...@rim.com> wrote:
>
> > I have encountered an issue where using DirectXmlRequest to index data on
> > a remote host results in eventually running out of temp disk space in the
> > java.io.tmpdir directory.  This occurs when I process a sufficiently large
> > batch of files.  About 30% of the temporary files end up permanent.  The
> > filenames look like: upload__2341cdae_13c02829b77__7ffd_00029003.tmp.  Has
> > anyone else had this happen before?  The relevant code is:
> >
> >         DirectXmlRequest up = new DirectXmlRequest( "/update", xml );
> >         up.process(solr);
> >
> > where `xml` is a String containing Solr formatted XML, and `solr` is the
> > SolrServer.  When disk space is eventually exhausted, this is the error
> > message that is repeatedly seen on the master host:
> >
> > 2013-01-07 19:22:16,911 [http-bio-8090-exec-2657] [] ERROR
> > org.apache.solr.servlet.SolrDispatchFilter  [] -
> > org.apache.commons.fileupload.FileUploadBase$IOFileUploadException:
> > Processing of multipart/form-data request failed. No space left on device
> >         at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:367)
> >         at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
> >         at org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:344)
> >         at org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:397)
> >         at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
> >         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
> >         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> > ... truncated stack trace
> >
> > I am running Solr 3.6 on an Ubuntu 12.04 server.  I am considering working
> > around this by pulling out as much as I can from XMLLoader into my client,
> > and processing the XML myself into SolrInputDocuments for indexing, but
> > this is certainly not ideal.
> >
> > Ryan
> > ---------------------------------------------------------------------
> > This transmission (including any attachments) may contain confidential
> > information, privileged material (including material protected by the
> > solicitor-client or other applicable privileges), or constitute non-public
> > information. Any use of this information by anyone other than the intended
> > recipient is prohibited. If you have received this transmission in error,
> > please immediately reply to the sender and delete this information from
> > your system. Use, dissemination, distribution, or reproduction of this
> > transmission by unintended recipients is not authorized and may be unlawful.
> >

