Re: Question about indexing PDFs
Erick, I’m not sure of anything. I’m new to Solr and find the documentation extremely confusing. I’ve searched the web and found tutorials/advice, but they generally refer to older versions of Solr, and refer to methods/settings/whatever that no longer exist. That’s why I’m asking for help here. I looked at the list of fields in the schema browser, and ‘content’ is not there. If that is not enough to ‘assume’ that the content is not being indexed, then please enlighten me as to what is. I inserted the docs in batches by posting them, following the ‘Quick Start’ tutorial. It seemed like a safe assumption that the tutorial on the Solr site would be correct and produce desirable results. What I really want to do is index the XML versions of the documents which have been run through another system, but I cannot for the life of me figure out how to do that. I’ve tried, but the documentation about XML makes no sense to me. I thought indexing the PDF versions would be easier and more straightforward, but perhaps that is not the case. Thanks, betsey On 8/25/16, 5:39 PM, "Erick Erickson" <erickerick...@gmail.com> wrote: >That is always a dangerous assumption. Are you sure >you're searching on the proper field? Are you sure it's indexed? Are >you sure it's committed? > >The schema browser I indicated above will give you some >idea what's actually in the field. You can not only see the >fields Solr (actually Lucene) sees in your index, but you can >also see what some of the terms are. > >Adding debug=query and looking at the parsed query >will show you what fields are being searched against. The >most common causes of what you're describing are: > >> not searching against the field you think you are. This >is very easy to do without knowing it. 
> >> not actually having 'indexed="true"' set in your schema > >> not committing after inserting the doc > >Best, >Erick > >On Thu, Aug 25, 2016 at 11:19 AM, Betsey Benagh < >betsey.ben...@stresearch.com> wrote: > >> It looks like the metadata of the PDFs was indexed, but not the content >> (which is what I was interested in). Searches on terms I know exist in >> the content come up empty. >> >> On 8/25/16, 2:16 PM, "Betsey Benagh" <betsey.ben...@stresearch.com> >>wrote: >> >> >Right, that’s where I looked. No ‘content’. Which is what confused >>me. >> > >> > >> >On 8/25/16, 1:56 PM, "Erick Erickson" <erickerick...@gmail.com> wrote: >> > >> >>when you say "I don't see it in the schema for that collection" are >>you >> >>talking schema.xml? managed_schema? Or actual documents in the index? >> >>Often >> >>these are defined by dynamic fields and the like in the schema files. >> >> >> >>Take a look at the admin UI>>schema browser>>drop down and you'll see >>all >> >>the actual fields in your index... >> >> >> >>Best, >> >>Erick >> >> >> >>On Thu, Aug 25, 2016 at 8:39 AM, Betsey Benagh >> >><betsey.ben...@stresearch.com >> >>> wrote: >> >> >> >>> Following the instructions in the quick start guide, I imported a >>bunch >> >>>of >> >>> PDF documents into my Solr 6.0 instance. As far as I can tell from >>the >> >>> documentation, there should be a 'content' field indexing, well, the >> >>> content, but I don't see it in the schema for that collection. Is >> >>>there >> >>> something obvious I might have missed? >> >>> >> >>> Thanks! >> >>> >> >>> >> > >> >>
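The parsed-query check Erick suggests can be scripted in a few lines. A minimal sketch, assuming a local Solr; the collection name and the debug section fed in at the end are made up for illustration, not captured from a real server:

```python
# Sketch: see which fields a query is really searched against by reading
# the "parsedquery" entry of Solr's debug output.
from urllib.parse import urlencode

def debug_select_url(base, collection, q):
    """Build a /select URL with debug=query so Solr returns the parsed query."""
    return f"{base}/solr/{collection}/select?{urlencode({'q': q, 'debug': 'query'})}"

def searched_fields(debug_section):
    """Field names from a parsed query such as '_text_:grobid'."""
    parsed = debug_section["parsedquery"]
    return sorted({tok.split(":", 1)[0] for tok in parsed.split() if ":" in tok})

url = debug_select_url("http://localhost:8983", "gettingstarted", "grobid")

# Illustrative debug section: the bare term went to a catch-all field,
# not to a 'content' field -- Erick's first suspect.
print(searched_fields({"parsedquery": "_text_:grobid"}))  # → ['_text_']
```

If the parsed query names a catch-all field rather than 'content', the search and the schema disagree, which matches the symptom in this thread.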
Re: Question about indexing PDFs
Right, that’s where I looked. No ‘content’. Which is what confused me. On 8/25/16, 1:56 PM, "Erick Erickson" <erickerick...@gmail.com> wrote: >when you say "I don't see it in the schema for that collection" are you >talking schema.xml? managed_schema? Or actual documents in the index? >Often >these are defined by dynamic fields and the like in the schema files. > >Take a look at the admin UI>>schema browser>>drop down and you'll see all >the actual fields in your index... > >Best, >Erick > >On Thu, Aug 25, 2016 at 8:39 AM, Betsey Benagh ><betsey.ben...@stresearch.com >> wrote: > >> Following the instructions in the quick start guide, I imported a bunch >>of >> PDF documents into my Solr 6.0 instance. As far as I can tell from the >> documentation, there should be a 'content' field indexing, well, the >> content, but I don't see it in the schema for that collection. Is there >> something obvious I might have missed? >> >> Thanks! >> >>
Re: Question about indexing PDFs
It looks like the metadata of the PDFs was indexed, but not the content (which is what I was interested in). Searches on terms I know exist in the content come up empty. On 8/25/16, 2:16 PM, "Betsey Benagh" <betsey.ben...@stresearch.com> wrote: >Right, that’s where I looked. No ‘content’. Which is what confused me. > > >On 8/25/16, 1:56 PM, "Erick Erickson" <erickerick...@gmail.com> wrote: > >>when you say "I don't see it in the schema for that collection" are you >>talking schema.xml? managed_schema? Or actual documents in the index? >>Often >>these are defined by dynamic fields and the like in the schema files. >> >>Take a look at the admin UI>>schema browser>>drop down and you'll see all >>the actual fields in your index... >> >>Best, >>Erick >> >>On Thu, Aug 25, 2016 at 8:39 AM, Betsey Benagh >><betsey.ben...@stresearch.com >>> wrote: >> >>> Following the instructions in the quick start guide, I imported a bunch >>>of >>> PDF documents into my Solr 6.0 instance. As far as I can tell from the >>> documentation, there should be a 'content' field indexing, well, the >>> content, but I don't see it in the schema for that collection. Is >>>there >>> something obvious I might have missed? >>> >>> Thanks! >>> >>> >
Question about indexing PDFs
Following the instructions in the quick start guide, I imported a bunch of PDF documents into my Solr 6.0 instance. As far as I can tell from the documentation, there should be a 'content' field indexing, well, the content, but I don't see it in the schema for that collection. Is there something obvious I might have missed? Thanks!
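One way to answer this question is to ask the index itself which fields exist, via the Luke request handler. A sketch, assuming a local Solr; the collection name and the sample reply are invented for illustration:

```python
# Sketch: list the fields that actually exist in the Lucene index by
# asking the Luke handler (/admin/luke). The JSON reply below is
# hand-made; a real reply comes from fetching the URL built here.
import json
from urllib.parse import urlencode

def luke_url(base, collection):
    """URL that returns per-field info for every field in the index."""
    return f"{base}/solr/{collection}/admin/luke?{urlencode({'numTerms': 0, 'wt': 'json'})}"

def field_names(raw_json):
    """Sorted names of the fields the index really contains."""
    return sorted(json.loads(raw_json)["fields"].keys())

url = luke_url("http://localhost:8983", "gettingstarted")

# Illustrative reply: metadata fields made it in, but no 'content'.
sample = '{"fields": {"id": {}, "title": {}, "author": {}}}'
print(field_names(sample))  # → ['author', 'id', 'title']
```

The point is that the authoritative field list lives in the index, not in schema.xml, which is why the later replies in this thread steer toward the schema browser rather than the config files.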
Oddity with importing documents...
Since it appears that using a recent version of Tika with Solr is not really feasible, I'm trying to run Grobid on my files, and then import the corresponding XML into Solr. I don't see any errors on the post: bba0124$ bin/post -c lrdtest ~/software/grobid/out/021002_1.tei.xml /Library/Java/JavaVirtualMachines/jdk1.8.0_71.jdk/Contents/Home/bin/java -classpath /Users/bba0124/software/solr-5.5.0/dist/solr-core-5.5.0.jar -Dauto=yes -Dc=lrdtest -Ddata=files org.apache.solr.util.SimplePostTool /Users/bba0124/software/grobid/out/021002_1.tei.xml SimplePostTool version 5.0.0 Posting files to [base] url http://localhost:8983/solr/lrdtest/update... Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log POSTing file 021002_1.tei.xml (application/xml) to [base] 1 files indexed. COMMITting Solr index changes to http://localhost:8983/solr/lrdtest/update... Time spent: 0:00:00.027 But the documents don't seem to show up in the index, either. Additionally, if I try uploading the documents using the web UI, they appear to upload successfully, Response:{ "responseHeader": { "status": 0, "QTime": 7 } } But aren't in the index. What am I missing?
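The symptom above (a clean post, yet zero visible documents) is usually narrowed down with two URLs: a `*:*` count and an explicit commit. A small sketch; the base URL reuses the 'lrdtest' collection from the transcript, and the empty reply at the end is invented for illustration:

```python
# Sketch: two quick checks when posted documents never appear -- count
# docs with q=*:* and re-issue an explicit commit.
import json
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/lrdtest"

count_url = f"{BASE}/select?{urlencode({'q': '*:*', 'rows': 0, 'wt': 'json'})}"
commit_url = f"{BASE}/update?commit=true"  # makes uncommitted adds visible

def num_found(raw_json):
    """Total matching docs from a /select reply; 0 means nothing is searchable yet."""
    return json.loads(raw_json)["response"]["numFound"]

empty_reply = '{"response": {"numFound": 0, "start": 0, "docs": []}}'
print(num_found(empty_reply))  # → 0
```

If the count stays at zero even after the commit URL is hit, the documents were likely rejected or routed elsewhere rather than merely uncommitted.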
Re: Integrating grobid with Tika in solr
As a workaround, I’m trying to run Grobid on my files, and then import the corresponding XML into Solr. I don’t see any errors on the post: bba0124$ bin/post -c lrdtest ~/software/grobid/out/021002_1.tei.xml /Library/Java/JavaVirtualMachines/jdk1.8.0_71.jdk/Contents/Home/bin/java -classpath /Users/bba0124/software/solr-5.5.0/dist/solr-core-5.5.0.jar -Dauto=yes -Dc=lrdtest -Ddata=files org.apache.solr.util.SimplePostTool /Users/bba0124/software/grobid/out/021002_1.tei.xml SimplePostTool version 5.0.0 Posting files to [base] url http://localhost:8983/solr/lrdtest/update... Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log POSTing file 021002_1.tei.xml (application/xml) to [base] 1 files indexed. COMMITting Solr index changes to http://localhost:8983/solr/lrdtest/update... Time spent: 0:00:00.027 But the documents don’t seem to show up in the index, either. Additionally, if I try uploading the documents using the web UI, they appear to upload successfully, Response:{ "responseHeader": { "status": 0, "QTime": 7 } } But aren’t in the index. What am I missing? On 5/4/16, 10:55 AM, "Shawn Heisey" <apa...@elyograg.org> wrote: >On 5/4/2016 8:38 AM, Betsey Benagh wrote: >> Thanks, I’m currently using 5.5, and will try upgrading to 6.0. >> >> >> On 5/4/16, 10:37 AM, "Allison, Timothy B." <talli...@mitre.org> wrote: >>> Y. Solr 6.0.0 is shipping with Tika 1.7. Grobid came in with Tika >>>1.11. > >Just upgrading to 6.0.0 isn't enough. As Tim said, Solr 6 currently >uses Tika 1.7, but 1.11 is required. That's four minor versions behind >the minimum. > >Tim has filed an issue for upgrading Tika to 1.13 in Solr, which he did >mention in a previous reply, but I do not know when it will be >available. Tim might have a better idea. 
> >https://issues.apache.org/jira/browse/SOLR-8981 > >You might be able to upgrade Tika in your Solr install to 1.12 yourself >by simply replacing the jar in WEB-INF/lib ... but I do not know whether >this will cause any other problems. Historically, replacing the jar has >been a safe option ... but I can't guarantee that this will always be >the case. > >Thanks, >Shawn >
Re: Integrating grobid with Tika in solr
I’m feeling particularly dense, because I don’t see any Tika jars in WEB-INF/lib: antlr4-runtime-4.5.1-1.jar asm-5.0.4.jar asm-commons-5.0.4.jar commons-cli-1.2.jar commons-codec-1.10.jar commons-collections-3.2.2.jar commons-configuration-1.6.jar commons-exec-1.3.jar commons-fileupload-1.2.1.jar commons-io-2.4.jar commons-lang-2.6.jar concurrentlinkedhashmap-lru-1.2.jar dom4j-1.6.1.jar guava-14.0.1.jar hadoop-annotations-2.6.0.jar hadoop-auth-2.6.0.jar hadoop-common-2.6.0.jar hadoop-hdfs-2.6.0.jar hppc-0.7.1.jar htrace-core-3.0.4.jar httpclient-4.4.1.jar httpcore-4.4.1.jar httpmime-4.4.1.jar jackson-core-2.5.4.jar jackson-dataformat-smile-2.5.4.jar joda-time-2.2.jar listing.txt lucene-analyzers-common-5.5.0.jar lucene-analyzers-kuromoji-5.5.0.jar lucene-analyzers-phonetic-5.5.0.jar lucene-backward-codecs-5.5.0.jar lucene-codecs-5.5.0.jar lucene-core-5.5.0.jar lucene-expressions-5.5.0.jar lucene-grouping-5.5.0.jar lucene-highlighter-5.5.0.jar lucene-join-5.5.0.jar lucene-memory-5.5.0.jar lucene-misc-5.5.0.jar lucene-queries-5.5.0.jar lucene-queryparser-5.5.0.jar lucene-sandbox-5.5.0.jar lucene-spatial-5.5.0.jar lucene-suggest-5.5.0.jar noggit-0.6.jar org.restlet-2.3.0.jar org.restlet.ext.servlet-2.3.0.jar protobuf-java-2.5.0.jar solr-core-5.5.0.jar solr-solrj-5.5.0.jar spatial4j-0.5.jar stax2-api-3.1.4.jar t-digest-3.1.jar woodstox-core-asl-4.4.1.jar zookeeper-3.4.6.jar On 5/4/16, 10:55 AM, "Shawn Heisey" <apa...@elyograg.org> wrote: >On 5/4/2016 8:38 AM, Betsey Benagh wrote: >> Thanks, I’m currently using 5.5, and will try upgrading to 6.0. >> >> >> On 5/4/16, 10:37 AM, "Allison, Timothy B." <talli...@mitre.org> wrote: >>> Y. Solr 6.0.0 is shipping with Tika 1.7. Grobid came in with Tika >>>1.11. > >Just upgrading to 6.0.0 isn't enough. As Tim said, Solr 6 currently >uses Tika 1.7, but 1.11 is required. That's four minor versions behind >the minimum. 
> >Tim has filed an issue for upgrading Tika to 1.13 in Solr, which he did >mention in a previous reply, but I do not know when it will be >available. Tim might have a better idea. > >https://issues.apache.org/jira/browse/SOLR-8981 > >You might be able to upgrade Tika in your Solr install to 1.12 yourself >by simply replacing the jar in WEB-INF/lib ... but I do not know whether >this will cause any other problems. Historically, replacing the jar has >been a safe option ... but I can't guarantee that this will always be >the case. > >Thanks, >Shawn >
Re: Integrating grobid with Tika in solr
Thanks, I’m currently using 5.5, and will try upgrading to 6.0. On 5/4/16, 10:37 AM, "Allison, Timothy B." <talli...@mitre.org> wrote: >Y. Solr 6.0.0 is shipping with Tika 1.7. Grobid came in with Tika 1.11. > >-----Original Message----- >From: Allison, Timothy B. [mailto:talli...@mitre.org] >Sent: Wednesday, May 4, 2016 10:29 AM >To: solr-user@lucene.apache.org >Subject: RE: Integrating grobid with Tika in solr > >I think Solr is using a version of Tika that predates that addition of >the Grobid parser. You'll have to add that manually somehow until Solr >upgrades to Tika 1.13 (soon to be released...I think). SOLR-8981. > >-----Original Message----- >From: Betsey Benagh [mailto:betsey.ben...@stresearch.com] >Sent: Wednesday, May 4, 2016 10:07 AM >To: solr-user@lucene.apache.org >Subject: Re: Integrating grobid with Tika in solr > >Grobid runs as a service, and I'm (theoretically) configuring Tika to >call it. > >From the Grobid wiki, here are instructions for integrating with Tika >application: > >First we need to create the GrobidExtractor.properties file that points >to the Grobid REST Service. My file looks like the following: > >grobid.server.url=http://localhost:[port] > >Now you can run GROBID via Tika-app with the following command on a >sample PDF file. > >java -classpath >$HOME/src/grobidparser-resources/:tika-app-1.11-SNAPSHOT.jar >org.apache.tika.cli.TikaCLI >--config=$HOME/src/grobidparser-resources/tika-config.xml -J >$HOME/src/grobid/papers/ICSE06.pdf > >Here's the stack trace. 
> ><str name="error-class">org.apache.solr.common.SolrException</str>
> ><str name="root-error-class">java.lang.ClassNotFoundException</str>
> ><str name="msg">org.apache.tika.exception.TikaException: Unable to find a parser class: org.apache.tika.parser.journal.JournalParser</str>
> ><str name="trace">org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unable to find a parser class: org.apache.tika.parser.journal.JournalParser
>at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:82)
>at org.apache.solr.core.PluginBag$LazyPluginHolder.createInst(PluginBag.java:367)
>at org.apache.solr.core.PluginBag$LazyPluginHolder.get(PluginBag.java:348)
>at org.apache.solr.core.PluginBag.get(PluginBag.java:148)
>at org.apache.solr.handler.RequestHandlerBase.getRequestHandler(RequestHandlerBase.java:231)
>at org.apache.solr.core.SolrCore.getRequestHandler(SolrCore.java:1362)
>at org.apache.solr.servlet.HttpSolrCall.extractHandlerFromURLPath(HttpSolrCall.java:326)
>at org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:296)
>at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:412)
>at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)
>at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)
>at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>at org.eclipse.jetty.server.Server.handle(Server.java:499)
>at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>at java.lang.Thread.run(Thread.java:745)
>Caused by: org.apache.tika.exception.TikaException: Unable to find a parser class: org.apache.tika.parser.journal.JournalParser
Re: Integrating grobid with Tika in solr
Grobid runs as a service, and I’m (theoretically) configuring Tika to call it. From the Grobid wiki, here are instructions for integrating with the Tika application: First we need to create the GrobidExtractor.properties file that points to the Grobid REST Service. My file looks like the following: grobid.server.url=http://localhost:[port] Now you can run GROBID via Tika-app with the following command on a sample PDF file. java -classpath $HOME/src/grobidparser-resources/:tika-app-1.11-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=$HOME/src/grobidparser-resources/tika-config.xml -J $HOME/src/grobid/papers/ICSE06.pdf Here’s the stack trace.
<str name="error-class">org.apache.solr.common.SolrException</str>
<str name="root-error-class">java.lang.ClassNotFoundException</str>
<str name="msg">org.apache.tika.exception.TikaException: Unable to find a parser class: org.apache.tika.parser.journal.JournalParser</str>
<str name="trace">org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unable to find a parser class: org.apache.tika.parser.journal.JournalParser
at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:82)
at org.apache.solr.core.PluginBag$LazyPluginHolder.createInst(PluginBag.java:367)
at org.apache.solr.core.PluginBag$LazyPluginHolder.get(PluginBag.java:348)
at org.apache.solr.core.PluginBag.get(PluginBag.java:148)
at org.apache.solr.handler.RequestHandlerBase.getRequestHandler(RequestHandlerBase.java:231)
at org.apache.solr.core.SolrCore.getRequestHandler(SolrCore.java:1362)
at org.apache.solr.servlet.HttpSolrCall.extractHandlerFromURLPath(HttpSolrCall.java:326)
at org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:296)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:412)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tika.exception.TikaException: Unable to find a parser class: org.apache.tika.parser.journal.JournalParser
at org.apache.tika.config.TikaConfig.parserFromDomElement(TikaConfig.java:362)
at org.apache.tika.config.TikaConfig.&lt;init&gt;(TikaConfig.java:127)
at org.apache.tika.config.TikaConfig.&lt;init&gt;(TikaConfig.java:115)
at org.apache.tika.config.TikaConfig.&lt;init&gt;(TikaConfig.java:111)
at org.apache.tika.config.TikaConfig.&lt;init&gt;(TikaConfig.java:92)
at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:80)
... 30 more
Caused by: java.lang.ClassNotFoundException: org.apache.tika.parser.journal.JournalParser
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.tika.config.ServiceLoader.getServiceClass(ServiceLoader.java:189)
at org.apache.tika.config.TikaConfig.parserFromDomElement(TikaConfig.java:338)
... 35 more</str>
<int name="code">500</int>
On 5/4/16, 10:00 AM, "Shawn Heisey" <apa...@elyograg.org> wrote: On 5/4/2016 7:15 AM, Betsey Benagh wrote: (X-posted from stack overflow) This feels like a basic, dumb question, but my reading of the documentation has not led me to an answer. I'm using Solr to index journal articles.
Integrating grobid with Tika in solr
(X-posted from stack overflow) This feels like a basic, dumb question, but my reading of the documentation has not led me to an answer. I'm using Solr to index journal articles. Using the out-of-the-box configuration, it indexed the text of the documents, but I'm looking to use Grobid to pull out the authors, title, affiliations, etc. I got Grobid up and running as a service. I added /path/to/tika-config.xml to the requestHandler for /update/extract in solrconfig.xml. The tika-config looks like: application/pdf I'm getting a ClassNotFound exception when I try to import a document, but can't figure out where to set the classpath to fix it.
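For comparison, a tika-config.xml that routes PDFs to the Grobid-backed parser would look roughly like the following. This is a sketch reconstructed from the Grobid/Tika instructions and the parser class named in the stack traces elsewhere in this thread, not a verified copy of the file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Hand PDFs to the Grobid journal parser instead of the default PDF parser -->
    <parser class="org.apache.tika.parser.journal.JournalParser">
      <mime>application/pdf</mime>
    </parser>
  </parsers>
</properties>
```

Note that the ClassNotFoundException itself means the jar containing JournalParser is not on Solr's classpath; the config is only consulted after the class can be loaded, so the `<lib dir="..."/>` directives in solrconfig.xml (which pull in the extraction contrib jars) are the place to look first.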
Re: Growing memory?
bin/solr status shows the memory usage increasing, as does the admin ui. I’m running this on a shared machine that is supporting several other applications, so I can’t be particularly greedy with memory usage. Is there anything out there that gives guidelines on what an appropriate amount of heap is based on number of documents or whatever? We’re just playing around with it right now, but it sounds like we may need a different machine in order to load in all of the data we want to have available. Thanks, betsey On 4/14/16, 3:08 PM, "Shawn Heisey" <apa...@elyograg.org> wrote: >On 4/14/2016 12:45 PM, Betsey Benagh wrote: >> I'm running solr 6.0.0 in server mode. I have one core. I loaded about >>2000 documents in, and it was using about 54 MB of memory. No problem. >>Nobody was issuing queries or doing anything else, but over the course >>of about 4 hours, the memory usage had tripled to 152 MB. I shut solr >>down and restarted it, and saw the memory usage back at 54 MB. Again, >>with no queries or anything being executed against the core, the memory >>usage is creeping up - after 17 minutes, it was up to 60 MB. I've looked >>at the documentation for how to limit memory usage, but I want to >>understand why it's creeping up when nothing is happening, lest it run >>out of memory when I limit the usage. The machine is running CentOS 6.6, >>if that matters, with Java 1.8.0_65. > >When you start Solr 5.0 or later directly from the download or directly >after installing it with the service installer script (on *NIX >platforms), Solr starts with a 512MB Java heap. You can change this if >you need to -- most Solr users do need to increase the heap size to a >few gigabytes. > >Java uses a garbage collection memory model. It's perfectly normal >during the operation of a Java program, even one that is not doing >anything you can see, for the memory utilization to rise up to the >configured heap size. 
This is simply how things work in systems using a >garbage collection memory model. > >Where exactly are you looking to find the memory utilization? In the >admin UI, that number will go up over time, until one of the memory >pools gets full and Java does a garbage collection, and then it will >likely go down again. From the operating system point of view, the >resident memory usage will increase up to a point (when the entire heap >has been allocated) and probably never go back down -- but it also >shouldn't go up either. > >Thanks, >Shawn >
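On capping the heap for a shared machine: the start script and its config file both accept an explicit size. A sketch with placeholder sizes (check `bin/solr start -help` for the exact flags in your Solr version):

```shell
# Start Solr with a fixed 1 GB heap (sets -Xms and -Xmx together):
bin/solr start -m 1g

# Or persist the setting in bin/solr.in.sh so every start uses it:
SOLR_HEAP="1g"
```

With a fixed heap, the resident memory seen by the OS will plateau near that size rather than grow without bound, which is the behavior Shawn describes above.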
Re: Growing memory?
Thanks for the quick response. Forgive the naïve question, but shouldn’t it be doing garbage collection automatically? Having to manually force GC via jconsole isn’t a sustainable solution. Thanks again, betsey On 4/14/16, 2:54 PM, "Erick Erickson" <erickerick...@gmail.com> wrote: >well, things _are_ running, specifically the communications channels >are looking for incoming messages and the like, generating garbage >etc. > >Try attaching jconsole to the process and hitting the GC button to >force a garbage collection. As long as your memory gets to some level >and drops back to that level after forcing GCs, you'll be fine. > >Best, >Erick > >On Thu, Apr 14, 2016 at 11:45 AM, Betsey Benagh ><betsey.ben...@stresearch.com> wrote: >> X-posted from stack overflow... >> >> I'm running solr 6.0.0 in server mode. I have one core. I loaded about >>2000 documents in, and it was using about 54 MB of memory. No problem. >>Nobody was issuing queries or doing anything else, but over the course >>of about 4 hours, the memory usage had tripled to 152 MB. I shut solr >>down and restarted it, and saw the memory usage back at 54 MB. Again, >>with no queries or anything being executed against the core, the memory >>usage is creeping up - after 17 minutes, it was up to 60 MB. I've looked >>at the documentation for how to limit memory usage, but I want to >>understand why it's creeping up when nothing is happening, lest it run >>out of memory when I limit the usage. The machine is running CentOS 6.6, >>if that matters, with Java 1.8.0_65. >> >> Thanks! >>
Growing memory?
X-posted from stack overflow... I'm running solr 6.0.0 in server mode. I have one core. I loaded about 2000 documents in, and it was using about 54 MB of memory. No problem. Nobody was issuing queries or doing anything else, but over the course of about 4 hours, the memory usage had tripled to 152 MB. I shut solr down and restarted it, and saw the memory usage back at 54 MB. Again, with no queries or anything being executed against the core, the memory usage is creeping up - after 17 minutes, it was up to 60 MB. I've looked at the documentation for how to limit memory usage, but I want to understand why it's creeping up when nothing is happening, lest it run out of memory when I limit the usage. The machine is running CentOS 6.6, if that matters, with Java 1.8.0_65. Thanks!