Re: 404 Errors on update/extract
Hi Leon,

Feel free to create a JIRA issue at https://issues.apache.org/jira/secure/Dashboard.jspa and then open a GitHub pull request to fix the example. The documentation is in AsciiDoc format at https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide/src, with file names matching those on the server. This could be a great issue to cut your teeth on while helping Solr :-)

Regards,
   Alex.
Re: 404 Errors on update/extract
Hi Alex,

Thanks a lot for your help! I have tested the same using the 'techproducts' example as proposed, and it worked fine.

You are right, the documentation seems to be outdated in this respect. I have just reviewed the solrconfig.xml of the 'schemaless' example and found that all of the Solr Cell configuration was missing. After adding it as described at

https://lucene.apache.org/solr/guide/8_8/uploading-data-with-solr-cell-using-apache-tika.html#configuring-the-extractingrequesthandler-in-solrconfig-xml

everything worked fine again.

What can I do to help update the docs?

Best regards,
Leon
Re: 404 Errors on update/extract
I think the extract handler is not defined in the schemaless example. This may be a change from before, with the documentation out of sync.

Can you try the 'techproducts' example instead of schemaless?

bin/solr stop   (if you are still running it)
bin/solr start -e techproducts

Then run the import command again.

The Tika integration is defined in solrconfig.xml and needs both the handler defined and some libraries loaded. Once you have confirmed you like what you see, you can copy those pieces into whatever configuration you are working with.

Regards,
   Alex.
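For reference, the handler definition and library loading described above look roughly like this in solrconfig.xml. This is a sketch based on the 8.x Ref Guide section linked in this thread; the lib paths depend on your install layout, and the fmap mapping is illustrative:

```xml
<!-- Load Solr Cell (Tika) and its dependencies; dir paths are illustrative. -->
<lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar" />

<!-- Register the extracting handler at /update/extract. -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>
```

Without both the <lib> directives and the <requestHandler> registration, requests to /update/extract return the 404 seen in this thread.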
404 Errors on update/extract
Hi,

I am new to Solr and tried to follow the guide to upload PDF data using Tika, on Solr 8.7.0 (running on Debian 10):

https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html

but I get an HTTP 404 error when trying to import the file.

In the Solr installation directory, after spinning up the example server using

solr/bin/solr -e schemaless

I first used the Post Tool to index a PDF file as described in the guide, giving the following output (paths truncated using “[…]” for privacy reasons):

bin/post -c gettingstarted example/exampledocs/solr-word.pdf -params "literal.id=doc1"

> java -classpath /[…]/solr-8.7.0/dist/solr-core-8.7.0.jar -Dauto=yes -Dparams=literal.id=doc1 -Dc=gettingstarted -Ddata=files org.apache.solr.util.SimplePostTool example/exampledocs/solr-word.pdf
> SimplePostTool version 5.0.0
> Posting files to [base] url http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
> Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> POSTing file solr-word.pdf (application/pdf) to [base]/extract
> SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
> SimplePostTool: WARNING: Response:
>   Error 404 Not Found
>   HTTP ERROR 404 Not Found
>   URI: /solr/gettingstarted/update/extract
>   STATUS: 404
>   MESSAGE: Not Found
>   SERVLET: default
> SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
> 1 files indexed.
> COMMITting Solr index changes to http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
> Time spent: 0:00:00.038

resulting in no actual changes being visible in Solr.

Using curl results in the same HTTP response:

> curl 'http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&commit=true' -F "myfile=@example/exampledocs/solr-word.pdf"
>   Error 404 Not Found
>   HTTP ERROR 404 Not Found
>   URI: /solr/gettingstarted/update/extract
>   STATUS: 404
>   MESSAGE: Not Found
>   SERVLET: default

Sorry if this has already been discussed somewhere; I have not been able to find anything helpful yet.

Thank you!
Leon
Error 500 with update extract handler on Solr 7.4.0
Hi all,

I recently experienced some problems with the update extract handler on a Solr 7.4.0 instance. When sending a document via a multipart POST update request, if a document parameter name contains too many characters, the POST fails with a 500 error code, and I can see the following exception in the Solr logs:

ERROR 2019-06-20T09:43:41,089 (qtp1625082366-13) - Solr|Solr|solr.servlet.HttpSolrCall|[c:FileShare s:shard1 r:core_node2 x:FileShare_shard1_replica_n1] o.a.s.s.HttpSolrCall null:org.apache.commons.fileupload.FileUploadException: Header section has more than 10240 bytes (maybe it is not properly terminated)
    at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:362)
    at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:115)
    at org.apache.solr.servlet.SolrRequestParsers$MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:602)
    at org.apache.solr.servlet.SolrRequestParsers$StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:784)
    at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:167)
    at org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:317)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:531)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.commons.fileupload.MultipartStream
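The 10240-byte figure in the exception is the default cap on a multipart part's header section in commons-fileupload's MultipartStream (HEADER_PART_SIZE_MAX). A minimal Python sketch, using a deliberately absurd field name, shows how a single oversized parameter name can exceed that cap on its own:

```python
# Sketch: size of the multipart/form-data part header a client sends
# before the file body. commons-fileupload rejects the request when the
# header section of a part exceeds 10240 bytes (its built-in default).
def part_header(field_name: str, filename: str = "doc.pdf") -> bytes:
    # Minimal part header: Content-Disposition plus Content-Type.
    return (
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")

LIMIT = 10240  # commons-fileupload MultipartStream default

print(len(part_header("myfile")) < LIMIT)       # True: a normal name fits easily
print(len(part_header("f" * 11000)) > LIMIT)    # True: an 11000-char name blows the cap
```

This only illustrates the arithmetic behind the error; the actual header layout on the wire depends on the HTTP client used.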
AW: Re: update/extract override ExtractTyp
I am using the extract URL and renamed the file to test.txt, but it is still parsed with the XML parser. Can I force the txt parser for all .txt files?

Sent from my Samsung device.

-------- Original message --------
From: Shawn Heisey <apa...@elyograg.org>
Date: 04.01.17 17:10 (GMT+01:00)
To: solr-user@lucene.apache.org
Subject: Re: update/extract override ExtractTyp
Re: update/extract override ExtractTyp
On 1/4/2017 8:12 AM, sn0...@ulysses-erp.com wrote: > Is it possible to override the ExtractClass for a specific document? > I would like to upload an XML document, but this XML is not XML-conformant > > I need this XML because it is part of a project where a corrupt XML is > needed, for testing purposes. > > > The update/extract process fails every time with a 500 error. > > I tried to override the Content-Type with "text/plain" but still get > the XML parse error. If you send something to the /update handler, and don't tell Solr that it is another format that it knows like CSV, JSON, or Javabin, then Solr assumes that it is XML -- and that it is the *specific* XML format that Solr uses. "text/plain" is not one of the formats that the update handler knows how to handle, so it will assume XML. If you send some other arbitrary XML content, even if that XML is otherwise correctly formed (which apparently yours isn't), Solr will throw an error, because it is not the type of XML that Solr is looking for. On this page are some examples of what Solr is expecting when you send XML: https://wiki.apache.org/solr/UpdateXmlMessages If you want to parse arbitrary XML into fields, you probably need to send it using DIH and the XPathEntityProcessor. If you want the XML to go into a field completely as-is, then you need to encode the XML into one of the update formats that Solr knows (XML, JSON, etc) and set it as the value of one of the fields. Thanks, Shawn
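Shawn's last suggestion, sketched in Python (the core and the `content_txt` field name are made-up stand-ins, not anything from the thread): carry the malformed XML as an ordinary string value inside a JSON update document, so Solr's update parser never tries to parse it as XML.

```python
import json

# Deliberately malformed XML we want stored as-is (hypothetical sample).
raw_xml = '<root><unclosed></root'

# One Solr document; "content_txt" is an assumed stored text field.
doc = {"id": "corrupt-xml-1", "content_txt": raw_xml}

# JSON update body: a list of documents, to be POSTed to the core's
# /update handler with Content-Type: application/json.
payload = json.dumps([doc])

# json.dumps escapes the string safely, so the broken XML survives
# the round trip byte-for-byte.
restored = json.loads(payload)[0]["content_txt"]
```

Because the XML travels as field data rather than as the update message itself, it does not matter how broken it is.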
update/extract override ExtractTyp
Hello Is it possible to override the ExtractClass for a specific document? I would like to upload an XML document, but this XML is not XML-conformant. I need this XML because it is part of a project where a corrupt XML is needed, for testing purposes. The update/extract process fails every time with a 500 error. I tried to override the Content-Type with "text/plain" but still get the XML parse error. Is it possible to override it? This message was sent using IMP, the Internet Messaging Program.
Re: language configuration in update extract request handler
This question should be posted on the Tika mailing list. It is not related to indexing or search but to parsing the content of an image. On Sun, Jun 5, 2016 at 10:20 PM, SIDDHAST® Roshan wrote: > Hi All, > > we are using the application for indexing and searching text using > Solr. We referred to the guide posted at > > http://hortonworks.com/hadoop-tutorial/indexing-and-searching-text-within-images-with-apache-solr/ > > Problem: we want to index Hindi images. We want to know how to set the > configuration parameters of Tesseract via Tika or external params > > -- > Roshan Agarwal > Siddhast® > 907 chandra vihar colony > Jhansi-284002 > M:+917376314900 >
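For reference, newer Tika releases (roughly 1.14 and later) let you set parser parameters such as the Tesseract OCR language in a tika-config.xml. Whether Solr's bundled Tika picks such a file up depends on the version and how Tika is wired in, so treat this as a sketch, not a recipe:

```xml
<!-- Hypothetical tika-config.xml: ask TesseractOCRParser for Hindi ("hin"). -->
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
      <params>
        <param name="language" type="string">hin</param>
      </params>
    </parser>
  </parsers>
</properties>
```

The language code must match a traineddata file actually installed for Tesseract on the machine doing the extraction.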
language configuration in update extract request handler
Hi All, we are using the application for indexing and searching text using Solr. We referred to the guide posted at http://hortonworks.com/hadoop-tutorial/indexing-and-searching-text-within-images-with-apache-solr/ Problem: we want to index Hindi images. We want to know how to set the configuration parameters of Tesseract via Tika or external params -- Roshan Agarwal Siddhast® 907 chandra vihar colony Jhansi-284002 M:+917376314900
Re: Commit Within and /update/extract handler
This is being triggered by adding the commitWithin param to ContentStreamUpdateRequest (request.setCommitWithin(1);). My configuration has autoCommit max time of 15s and openSearcher set to false. I'm assuming that changing openSearcher to true should address this, and adding softCommit = true to the request would make the documents available in the meantime? On Apr 8, 2014 10:02 AM, Erick Erickson erickerick...@gmail.com wrote: Got a clue how it's being generated? Because it's not going to show you documents. commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} openSearcher=false and softCommit=false so the documents will be invisible. You need one or the other set to true. What it will do is close the current segment, open a new one and truncate the current transaction log. These may be good things but they have nothing to do with making docs visible :). See: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Best, Erick On Mon, Apr 7, 2014 at 8:43 PM, Jamie Johnson jej2...@gmail.com wrote: Below is the log showing what I believe to be the commit 07-Apr-2014 23:40:55.846 INFO [catalina-exec-5] org.apache.solr.update.processor.LogUpdateProcessor.finish [forums] webapp=/solr path=/update/extract params={uprefix=attr_literal.source_id=e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcceliteral.content_group=File literal.id=e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcceliteral.forum_id=3literal.content_type=application/octet-streamwt=javabinliteral.uploaded_by=+version=2literal.content_type=application/octet-streamliteral.file_name=exclusions} {add=[e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcce (1464785652471037952)]} 0 563 07-Apr-2014 23:41:10.847 INFO [commitScheduler-10-thread-1] org.apache.solr.update.DirectUpdateHandler2.commit start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 07-Apr-2014 23:41:10.847
INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: commit: start 07-Apr-2014 23:41:10.848 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: commit: enter lock 07-Apr-2014 23:41:10.848 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: commit: now prepare 07-Apr-2014 23:41:10.848 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: prepareCommit: flush 07-Apr-2014 23:41:10.849 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: index before flush _y(4.6):C1 _10(4.6):C1 _11(4.6):C1 _12(4.6):C1 07-Apr-2014 23:41:10.849 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DW][commitScheduler-10-thread-1]: commitScheduler-10-thread-1 startFullFlush 07-Apr-2014 23:41:10.849 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DW][commitScheduler-10-thread-1]: anyChanges? 
numDocsInRam=1 deletes=true hasTickets:false pendingChangesInFullFlush: false 07-Apr-2014 23:41:10.850 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWFC][commitScheduler-10-thread-1]: addFlushableState DocumentsWriterPerThread [pendingDeletes=gen=0, segment=_14, aborting=false, numDocsInRAM=1, deleteQueue=DWDQ: [ generation: 2 ]] 07-Apr-2014 23:41:10.852 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: flush postings as segment _14 numDocs=1 07-Apr-2014 23:41:10.904 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: new segment has 0 deleted docs 07-Apr-2014 23:41:10.904 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: new segment has no vectors; norms; no docValues; prox; freqs 07-Apr-2014 23:41:10.904 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: flushedFiles=[_14.nvd, _14_Lucene41_0.pos, _14_Lucene41_0.tip, _14_Lucene41_0.tim, _14.nvm, _14.fdx, _14_Lucene41_0.doc, _14.fnm, _14.fdt] 07-Apr-2014 23:41:10.905 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: flushed codec=Lucene46 07-Apr-2014 23:41:10.905 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: flushed: segment=_14 ramUsed=0.122 MB newFlushedSize(includes docstores)=0.003 MB docs/MB=322.937 07-Apr-2014 23:41
Re: Commit Within and /update/extract handler
On 4/9/2014 7:47 AM, Jamie Johnson wrote: This is being triggered by adding the commitWithin param to ContentStreamUpdateRequest (request.setCommitWithin(1);). My configuration has autoCommit max time of 15s and openSearcher set to false. I'm assuming that changing openSeracher to true should address this, and adding the softCommit = true to the request would make the documents available in the mean time? My personal opinion: autoCommit should not be used for document visibility, even though it CAN be used for it. It belongs in every config that uses the transaction log, with openSearcher set to false, and carefully considered maxTime and/or maxDocs parameters. I think it's better to control document visibility entirely manually, but if you actually do want to have an automatic commit for document visibility, use autoSoftCommit. It doesn't make any sense to disable openSearcher on a soft commit, so just leave that out. The docs/time intervals for this can be smaller or greater than the intervals for autoCommit, depending on your needs. Any manual commits that you send probably should be soft commits, but honestly that doesn't really matter if your auto settings are correct. Thanks, Shawn
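What Shawn describes corresponds roughly to this updateHandler sketch for solrconfig.xml (the interval values are placeholders for illustration, not recommendations): a hard autoCommit for durability and transaction-log truncation that never opens a searcher, plus an autoSoftCommit that controls visibility.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log; the hard autoCommit below keeps it from growing unbounded. -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit for durability only: never opens a searcher. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit controls when new documents become searchable. -->
  <autoSoftCommit>
    <maxTime>2000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With a setup like this, commitWithin and manual commits are soft by default in recent Solr versions, and document visibility is governed by the soft-commit interval rather than the hard autoCommit.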
Re: Commit Within and /update/extract handler
Thanks Shawn, I appreciate the information. On Wed, Apr 9, 2014 at 10:27 AM, Shawn Heisey s...@elyograg.org wrote: On 4/9/2014 7:47 AM, Jamie Johnson wrote: This is being triggered by adding the commitWithin param to ContentStreamUpdateRequest (request.setCommitWithin(1);). My configuration has autoCommit max time of 15s and openSearcher set to false. I'm assuming that changing openSeracher to true should address this, and adding the softCommit = true to the request would make the documents available in the mean time? My personal opinion: autoCommit should not be used for document visibility, even though it CAN be used for it. It belongs in every config that uses the transaction log, with openSearcher set to false, and carefully considered maxTime and/or maxDocs parameters. I think it's better to control document visibility entirely manually, but if you actually do want to have an automatic commit for document visibility, use autoSoftCommit. It doesn't make any sense to disable openSearcher on a soft commit, so just leave that out. The docs/time intervals for this can be smaller or greater than the intervals for autoCommit, depending on your needs. Any manual commits that you send probably should be soft commits, but honestly that doesn't really matter if your auto settings are correct. Thanks, Shawn
Re: Commit Within and /update/extract handler
Got a clue how it's being generated? Because it's not going to show you documents. commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} openSearcher=false and softCommit=false so the documents will be invisible. You need one or the other set to true. What it will do is close the current segment, open a new one and truncate the current transaction log. These may be good things but they have nothing to do with making docs visible :). See: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Best, Erick On Mon, Apr 7, 2014 at 8:43 PM, Jamie Johnson jej2...@gmail.com wrote: Below is the log showing what I believe to be the commit 07-Apr-2014 23:40:55.846 INFO [catalina-exec-5] org.apache.solr.update.processor.LogUpdateProcessor.finish [forums] webapp=/solr path=/update/extract params={uprefix=attr_literal.source_id=e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcceliteral.content_group=File literal.id=e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcceliteral.forum_id=3literal.content_type=application/octet-streamwt=javabinliteral.uploaded_by=+version=2literal.content_type=application/octet-streamliteral.file_name=exclusions} {add=[e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcce (1464785652471037952)]} 0 563 07-Apr-2014 23:41:10.847 INFO [commitScheduler-10-thread-1] org.apache.solr.update.DirectUpdateHandler2.commit start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 07-Apr-2014 23:41:10.847 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: commit: start 07-Apr-2014 23:41:10.848 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: commit: enter lock 07-Apr-2014 23:41:10.848 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: 
commit: now prepare 07-Apr-2014 23:41:10.848 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: prepareCommit: flush 07-Apr-2014 23:41:10.849 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: index before flush _y(4.6):C1 _10(4.6):C1 _11(4.6):C1 _12(4.6):C1 07-Apr-2014 23:41:10.849 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DW][commitScheduler-10-thread-1]: commitScheduler-10-thread-1 startFullFlush 07-Apr-2014 23:41:10.849 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DW][commitScheduler-10-thread-1]: anyChanges? numDocsInRam=1 deletes=true hasTickets:false pendingChangesInFullFlush: false 07-Apr-2014 23:41:10.850 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWFC][commitScheduler-10-thread-1]: addFlushableState DocumentsWriterPerThread [pendingDeletes=gen=0, segment=_14, aborting=false, numDocsInRAM=1, deleteQueue=DWDQ: [ generation: 2 ]] 07-Apr-2014 23:41:10.852 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: flush postings as segment _14 numDocs=1 07-Apr-2014 23:41:10.904 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: new segment has 0 deleted docs 07-Apr-2014 23:41:10.904 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: new segment has no vectors; norms; no docValues; prox; freqs 07-Apr-2014 23:41:10.904 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: flushedFiles=[_14.nvd, _14_Lucene41_0.pos, _14_Lucene41_0.tip, _14_Lucene41_0.tim, _14.nvm, _14.fdx, _14_Lucene41_0.doc, _14.fnm, _14.fdt] 07-Apr-2014 23:41:10.905 INFO 
[commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: flushed codec=Lucene46 07-Apr-2014 23:41:10.905 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: flushed: segment=_14 ramUsed=0.122 MB newFlushedSize(includes docstores)=0.003 MB docs/MB=322.937 07-Apr-2014 23:41:10.907 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DW][commitScheduler-10-thread-1]: publishFlushedSegment seg-private updates=null 07-Apr-2014 23:41:10.907 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: publishFlushedSegment 07-Apr-2014 23:41:10.907 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [BD][commitScheduler-10-thread-1]: push deletes 1 deleted terms (unique
Re: Commit Within and /update/extract handler
You say you see the commit happen in the log, is openSearcher specified? This sounds like you're somehow getting a commit with openSearcher=false... Best, Erick On Sun, Apr 6, 2014 at 5:37 PM, Jamie Johnson jej2...@gmail.com wrote: I'm running solr 4.6.0 and am noticing that commitWithin doesn't seem to work when I am using the /update/extract request handler. It looks like a commit is happening from the logs, but the documents don't become available for search until I do a commit manually. Could this be some type of configuration issue?
Re: Commit Within and /update/extract handler
What does the call look like? Are you opening a new searcher or not? That should be in the log line where the commit is recorded... FWIW, Erick On Sun, Apr 6, 2014 at 5:37 PM, Jamie Johnson jej2...@gmail.com wrote: I'm running solr 4.6.0 and am noticing that commitWithin doesn't seem to work when I am using the /update/extract request handler. It looks like a commit is happening from the logs, but the documents don't become available for search until I do a commit manually. Could this be some type of configuration issue?
Re: Commit Within and /update/extract handler
Below is the log showing what I believe to be the commit 07-Apr-2014 23:40:55.846 INFO [catalina-exec-5] org.apache.solr.update.processor.LogUpdateProcessor.finish [forums] webapp=/solr path=/update/extract params={uprefix=attr_literal.source_id=e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcceliteral.content_group=File literal.id=e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcceliteral.forum_id=3literal.content_type=application/octet-streamwt=javabinliteral.uploaded_by=+version=2literal.content_type=application/octet-streamliteral.file_name=exclusions} {add=[e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcce (1464785652471037952)]} 0 563 07-Apr-2014 23:41:10.847 INFO [commitScheduler-10-thread-1] org.apache.solr.update.DirectUpdateHandler2.commit start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 07-Apr-2014 23:41:10.847 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: commit: start 07-Apr-2014 23:41:10.848 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: commit: enter lock 07-Apr-2014 23:41:10.848 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: commit: now prepare 07-Apr-2014 23:41:10.848 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: prepareCommit: flush 07-Apr-2014 23:41:10.849 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: index before flush _y(4.6):C1 _10(4.6):C1 _11(4.6):C1 _12(4.6):C1 07-Apr-2014 23:41:10.849 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DW][commitScheduler-10-thread-1]: commitScheduler-10-thread-1 startFullFlush 07-Apr-2014 23:41:10.849 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message 
[DW][commitScheduler-10-thread-1]: anyChanges? numDocsInRam=1 deletes=true hasTickets:false pendingChangesInFullFlush: false 07-Apr-2014 23:41:10.850 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWFC][commitScheduler-10-thread-1]: addFlushableState DocumentsWriterPerThread [pendingDeletes=gen=0, segment=_14, aborting=false, numDocsInRAM=1, deleteQueue=DWDQ: [ generation: 2 ]] 07-Apr-2014 23:41:10.852 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: flush postings as segment _14 numDocs=1 07-Apr-2014 23:41:10.904 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: new segment has 0 deleted docs 07-Apr-2014 23:41:10.904 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: new segment has no vectors; norms; no docValues; prox; freqs 07-Apr-2014 23:41:10.904 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: flushedFiles=[_14.nvd, _14_Lucene41_0.pos, _14_Lucene41_0.tip, _14_Lucene41_0.tim, _14.nvm, _14.fdx, _14_Lucene41_0.doc, _14.fnm, _14.fdt] 07-Apr-2014 23:41:10.905 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: flushed codec=Lucene46 07-Apr-2014 23:41:10.905 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DWPT][commitScheduler-10-thread-1]: flushed: segment=_14 ramUsed=0.122 MB newFlushedSize(includes docstores)=0.003 MB docs/MB=322.937 07-Apr-2014 23:41:10.907 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [DW][commitScheduler-10-thread-1]: publishFlushedSegment seg-private updates=null 07-Apr-2014 23:41:10.907 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message 
[IW][commitScheduler-10-thread-1]: publishFlushedSegment 07-Apr-2014 23:41:10.907 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [BD][commitScheduler-10-thread-1]: push deletes 1 deleted terms (unique count=1) bytesUsed=1024 delGen=4 packetCount=1 totBytesUsed=1024 07-Apr-2014 23:41:10.907 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IW][commitScheduler-10-thread-1]: publish sets newSegment delGen=5 seg=_14(4.6):C1 07-Apr-2014 23:41:10.908 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IFD][commitScheduler-10-thread-1]: now checkpoint _y(4.6):C1 _10(4.6):C1 _11(4.6):C1 _12(4.6):C1 _14(4.6):C1 [5 segments ; isCommit = false] 07-Apr-2014 23:41:10.908 INFO [commitScheduler-10-thread-1] org.apache.solr.update.LoggingInfoStream.message [IFD][commitScheduler-10-thread-1]: 0 msec to checkpoint 07-Apr-2014 23:41:10.908 INFO [commitScheduler-10-thread-1
Commit Within and /update/extract handler
I'm running solr 4.6.0 and am noticing that commitWithin doesn't seem to work when I am using the /update/extract request handler. It looks like a commit is happening from the logs, but the documents don't become available for search until I do a commit manually. Could this be some type of configuration issue?
Re: Send many files to update/extract
HttpSolrServer allows you to send multiple documents at once, but they need to be extracted/converted on the client. However, if you know you will be sending a lot of documents to Solr, you are better off running Tika locally on the client (or as a standalone network server). A lot more performant. I am not sure if ExtractingRequestHandler takes multipart MIME format, but that would be the thing to check if you still want to process on the server. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Mar 18, 2014 at 12:55 PM, Александр Вандышев a-wonde...@rambler.ru wrote: Who knows how to index a lot of files with ExtractingRequestHandler using a single query?
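Alex's "extract on the client, then batch" advice in a minimal Python sketch (the field name and per-file text are stand-ins; in Java the equivalent is handing a Collection of SolrInputDocuments to HttpSolrServer.add): extraction happens locally, and all documents travel in a single /update request body instead of one /update/extract call per file.

```python
import json

# Stand-in for text extracted locally, e.g. by running Tika on the client.
extracted = {
    "report.pdf": "quarterly figures ...",
    "notes.docx": "meeting notes ...",
}

# One Solr document per file; "content_txt" is an assumed text field.
docs = [{"id": name, "content_txt": text}
        for name, text in sorted(extracted.items())]

# Single JSON body for one POST to the core's /update handler.
payload = json.dumps(docs)
```

Batching like this avoids both the per-request HTTP overhead and the cost of running Tika inside the Solr JVM.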
Send many files to update/extract
Who knows how to index a lot of files with ExtractingRequestHandler using a single query?
Curl : shell script : The requested resource is not available. update/extract !
Hi all, The following throws "The requested resource is not available": curl http://localhost:8080/solr/#/dev/update/extract?stream.file=/home/priti/$file&literal.id=document$i&commit=true I don't understand what literal.id is. Is it mandatory? [Please share reading links if known] HTTP Status 404 - /solr/#/dev/update/extract type: Status report message: /solr/#/dev/update/extract description: The requested resource is not available. Apache Tomcat/7.0.42 What's wrong? Regards, Priti
Re: Curl : shell script : The requested resource is not available. update/extract !
literal.id should contain a unique identifier for each document (assuming that the unique identifier field in your Solr schema is called id); see http://wiki.apache.org/solr/ExtractingRequestHandler . I'm guessing that the URL for the ExtractingRequestHandler is incorrect, or maybe you haven't even configured/enabled the ExtractingRequestHandler in solrconfig.xml. Further, from the URL you show, I'm guessing that $file and $i are references to shell variables that have been incorrectly quoted (for example, by enclosing a constructed URL in single quotes instead of double quotes, if you're on a Unixoid platform). On Mon, Mar 10, 2014 at 2:51 PM, Priti Solanki pritiatw...@gmail.com wrote: Hi all, The following throws "The requested resource is not available": curl http://localhost:8080/solr/#/dev/update/extract?stream.file=/home/priti/$file&literal.id=document$i&commit=true I don't understand what literal.id is. Is it mandatory? [Please share reading links if known] HTTP Status 404 - /solr/#/dev/update/extract type: Status report message: /solr/#/dev/update/extract description: The requested resource is not available. Apache Tomcat/7.0.42 What's wrong? Regards, Priti
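The quoting point in a runnable sketch (the paths and the dev core name are made up): double quotes let the shell expand $file and $i inside the URL, single quotes send the literal strings instead.

```shell
file="solr-word.pdf"
i=1

# Double quotes: $file and $i expand before curl ever sees the URL.
good="http://localhost:8080/solr/dev/update/extract?stream.file=/home/priti/$file&literal.id=document$i&commit=true"

# Single quotes: the characters $file and $i are sent verbatim.
bad='http://localhost:8080/solr/dev/update/extract?stream.file=/home/priti/$file&literal.id=document$i&commit=true'

echo "$good"
echo "$bad"
```

Also note the URL must be quoted one way or the other, because an unquoted & would otherwise put the command in the background and drop the remaining parameters.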
Re: Curl : shell script : The requested resource is not available. update/extract !
The # character introduces the fragment portion of a URL, so /dev/update/extract is not part of the path of the URL. In this case the URL path is /solr/ and the server is simply complaining that there is no code registered to process that path. Normally, the collection name (core name) follows /solr/. -- Jack Krupansky -Original Message- From: Priti Solanki Sent: Monday, March 10, 2014 9:51 AM To: solr-user@lucene.apache.org Subject: Curl : shell script : The requested resource is not available. update/extract ! Hi all, The following throws "The requested resource is not available": curl http://localhost:8080/solr/#/dev/update/extract?stream.file=/home/priti/$file&literal.id=document$i&commit=true I don't understand what literal.id is. Is it mandatory? [Please share reading links if known] HTTP Status 404 - /solr/#/dev/update/extract type: Status report message: /solr/#/dev/update/extract description: The requested resource is not available. Apache Tomcat/7.0.42 What's wrong? Regards, Priti
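Jack's point is easy to confirm with Python's standard urllib: everything after the # is the fragment, which a client keeps to itself and never transmits, so Tomcat only ever sees the path /solr/.

```python
from urllib.parse import urlsplit

# The URL as posted, with the admin-UI "#" still in it.
url = ("http://localhost:8080/solr/#/dev/update/extract"
       "?stream.file=/home/priti/doc.pdf&literal.id=document1&commit=true")
parts = urlsplit(url)

# The request path stops at the "#"; the rest is the client-side fragment.
path = parts.path          # "/solr/"
fragment = parts.fragment  # "/dev/update/extract?...", never sent to the server
```

Dropping the # (and putting the core name in the path, e.g. /solr/dev/update/extract) is what makes the request reach the handler.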
Re: requested url solr/update/extract not available on this server
Rest of the queries work and I have added the following in solrconfig.xml:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="map.Last-Modified">last_modified</str>
    <str name="fmap.content">contents</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

On Sun, Sep 22, 2013 at 8:53 PM, Erick Erickson [via Lucene] wrote: Please review: http://wiki.apache.org/solr/UsingMailingLists Erick On Sun, Sep 22, 2013 at 5:52 AM, Nutan [hidden email] wrote: I did define the request handler. On Sun, Sep 22, 2013 at 12:51 AM, Erick Erickson [via Lucene] [hidden email] wrote: bq: And I'm not using the example config file It looks like you have not included the request handler in your solrconfig.xml, something like (from the stock distro):

<!-- Solr Cell Update Request Handler http://wiki.apache.org/solr/ExtractingRequestHandler -->
<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

I'd start with the stock config and try removing things one-by-one... Best, Erick On Sat, Sep 21, 2013 at 7:34 AM, Nutan [hidden email] wrote: Yes I do get the Solr admin page. And I'm not using the example config file, I have created my own for my project as required. I have also defined update/extract in solrconfig.xml. On Tue, Sep 17, 2013 at 4:45 AM, Chris Hostetter-3 [via Lucene] [hidden email] wrote: : Is /solr/update working? more importantly: does /solr/ work in your browser and return anything useful?
(nothing you've told us yet gives us any way of knowing if Solr is even up and running) if 'http://localhost:8080/solr/' shows you the Solr admin UI, and you are using the stock Solr 4.2 example configs, then http://localhost:8080/solr/update/extract should not give you a 404 error. If however you are using some other configs, it might not work unless those configs register a handler with the path /update/extract. Using the jetty setup provided with 4.2, and the example configs (from 4.2), I was able to index a sample PDF just fine using your curl command...

hossman@frisbee:~/tmp$ curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F myfile=@stump.winners.san.diego.2013.pdf
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1839</int></lst>
</response>

: : Check solrconfig to see that /update/extract is configured as in the standard : Solr example. : : Does /solr/update/extract work for you using the standard Solr example? : : -- Jack Krupansky : : -Original Message- From: Nutan : Sent: Sunday, September 15, 2013 2:37 AM : To: [hidden email] : Subject: requested url solr/update/extract not available on this server : : I am working on Solr 4.2 on Windows 7. I am trying to index PDF files. I : referred to Solr Cookbook 4. Tomcat is using port 8080. I get this : error: requested url solr/update/extract not available on this server : when my curl is : : curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F : myfile=@cookbook.pdf : There is no entry in the log files. Please help. : : : : -- : View this message in context: : http://lucene.472066.n3.nabble.com/requested-url-solr-update-extract-not-available-on-this-server-tp4090153.html : Sent from the Solr - User mailing list archive at Nabble.com.
: -Hoss
Re: requested url solr/update/extract not available on this server
I did define the request handler. On Sun, Sep 22, 2013 at 12:51 AM, Erick Erickson [via Lucene] wrote: bq: And I'm not using the example config file It looks like you have not included the request handler in your solrconfig.xml, something like (from the stock distro):

<!-- Solr Cell Update Request Handler http://wiki.apache.org/solr/ExtractingRequestHandler -->
<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

I'd start with the stock config and try removing things one-by-one... Best, Erick On Sat, Sep 21, 2013 at 7:34 AM, Nutan [hidden email] wrote: Yes I do get the Solr admin page. And I'm not using the example config file, I have created my own for my project as required. I have also defined update/extract in solrconfig.xml. On Tue, Sep 17, 2013 at 4:45 AM, Chris Hostetter-3 [via Lucene] [hidden email] wrote: : Is /solr/update working? more importantly: does /solr/ work in your browser and return anything useful? (nothing you've told us yet gives us any way of knowing if Solr is even up and running) if 'http://localhost:8080/solr/' shows you the Solr admin UI, and you are using the stock Solr 4.2 example configs, then http://localhost:8080/solr/update/extract should not give you a 404 error. If however you are using some other configs, it might not work unless those configs register a handler with the path /update/extract. Using the jetty setup provided with 4.2, and the example configs (from 4.2), I was able to index a sample PDF just fine using your curl command...
hossman@frisbee:~/tmp$ curl http://localhost:8983/solr/update/extract?literal.id=1commit=true; -F myfile=@stump.winners.san.diego.2013.pdf ?xml version=1.0 encoding=UTF-8? response lst name=responseHeaderint name=status0/intint name=QTime1839/int/lst /response : : Check solrconfig to see that /update/extract is configured as in the standard : Solr example. : : Does /solr/update/extract work for you using the standard Solr example? : : -- Jack Krupansky : : -Original Message- From: Nutan : Sent: Sunday, September 15, 2013 2:37 AM : To: [hidden email] http://user/SendEmail.jtp?type=nodenode=4090459i=0 : Subject: requested url solr/update/extract not available on this server : : I am working on Solr 4.2 on Windows 7. I am trying to index pdf files.I : referred Solr Cookbook 4. Tomcat is using 8080 port number. I get this : error:requested url solr/update/extract not available on this server : When my curl is : : curl http://localhost:8080/solr/update/extract?literal.id=1commit=true; -F : myfile=@cookbook.pdf : There is no entry in log files. Please help. : : : : -- : View this message in context: : http://lucene.472066.n3.nabble.com/requested-url-solr-update-extract-not-available-on-this-server-tp4090153.html : Sent from the Solr - User mailing list archive at Nabble.com. : -Hoss -- If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/requested-url-solr-update-extract-not-available-on-this-server-tp4090153p4090459.html To unsubscribe from requested url solr/update/extract not available on this server, click here . 
NAML http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml -- View this message in context: http://lucene.472066.n3.nabble.com/requested-url-solr-update-extract-not-available-on-this-server-tp4090153p4091371.html Sent from the Solr - User mailing list archive at Nabble.com. -- If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/requested-url-solr-update-extract-not-available-on-this-server-tp4090153p4091391.html To unsubscribe from requested url solr/update/extract not available on this server, click herehttp://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=4090153code=bnV0YW5zaGluZGUxOTkyQGdtYWlsLmNvbXw0MDkwMTUzfC0xMzEzOTU5Mzcx . NAMLhttp
Re: requested url solr/update/extract not available on this server
Please review: http://wiki.apache.org/solr/UsingMailingLists

Erick

On Sun, Sep 22, 2013 at 5:52 AM, Nutan nutanshinde1...@gmail.com wrote:
> I did define the request handler.
Re: requested url solr/update/extract not available on this server
Yes, I do get the Solr admin page. And I'm not using the example config file; I have created my own for my project as required. I have also defined /update/extract in solrconfig.xml.

On Tue, Sep 17, 2013 at 4:45 AM, Chris Hostetter-3 [via Lucene] ml-node+s472066n409045...@n3.nabble.com wrote:
> : Is /solr/update working?
> more importantly: does /solr/ work in your browser and return anything useful? ...
Re: requested url solr/update/extract not available on this server
bq: And im not using the example config file

It looks like you have not included the request handler in your solrconfig.xml, something like (from the stock distro):

<!-- Solr Cell Update Request Handler
     http://wiki.apache.org/solr/ExtractingRequestHandler -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

I'd start with the stock config and try removing things one-by-one...

Best,
Erick

On Sat, Sep 21, 2013 at 7:34 AM, Nutan nutanshinde1...@gmail.com wrote:
> Yes I do get the solr admin page. And im not using the example config file, I have created my own for my project ...
Re: requested url solr/update/extract not available on this server
: Is /solr/update working?

More importantly: does /solr/ work in your browser and return anything useful? (Nothing you've told us yet gives us any way of knowing if Solr is even up and running.)

If 'http://localhost:8080/solr/' shows you the Solr admin UI, and you are using the stock Solr 4.2 example configs, then http://localhost:8080/solr/update/extract should not give you a 404 error. If, however, you are using some other configs, it might not work unless those configs register a handler with the path /update/extract.

Using the jetty setup provided with 4.2, and the example configs (from 4.2), I was able to index a sample PDF just fine using your curl command...

hossman@frisbee:~/tmp$ curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F myfile=@stump.winners.san.diego.2013.pdf
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1839</int></lst>
</response>

-Hoss
requested url solr/update/extract not available on this server
I am working on Solr 4.2 on Windows 7. I am trying to index PDF files; I referred to Solr Cookbook 4. Tomcat is using port 8080. I get this error: requested url solr/update/extract not available on this server

My curl command is:

curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F myfile=@cookbook.pdf

There is no entry in the log files. Please help.
Re: requested url solr/update/extract not available on this server
Is /solr/update working?

Check solrconfig to see that /update/extract is configured as in the standard Solr example.

Does /solr/update/extract work for you using the standard Solr example?

-- Jack Krupansky

-----Original Message-----
From: Nutan
Sent: Sunday, September 15, 2013 2:37 AM
To: solr-user@lucene.apache.org
Subject: requested url solr/update/extract not available on this server
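For reference, the configuration Jack refers to looks roughly like the sketch below in a stock 4.x example solrconfig.xml. The lib paths assume the example directory layout and are illustrative; adjust them to wherever the Solr Cell jars actually live in your installation:

```xml
<!-- load Solr Cell (Tika extraction) and its dependencies -->
<lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />

<!-- register the extracting handler at /update/extract -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```

If either the lib directives or the requestHandler registration is missing from the solrconfig.xml of the core you are posting to, the request for /update/extract returns a 404.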
Solr 4.2 update/extract adding unknown field, can we change field type from string to text
Hi, while indexing documents with unknown fields, Solr is adding the unknown fields to the schema, but it always guesses string as the type. Is it possible to specify a default field type for unknown fields, such as text, so that they get tokenized? Also, can we specify other properties by default, like indexed/stored/multiValued? P.S. I am using Solr 4.2. Thanks a lot. Jai
Re: Solr 4.2 update/extract adding unknown field, can we change field type from string to text
You can use the dynamic fields feature of Solr to map unknown field names to types. For example, a dynamic field named *_s, i.e., any field name ending with _s, can be mapped to string, and so on. In your case, if your field names do not follow a set pattern, then you can even specify a dynamic field as * and map it to a text type. See https://cwiki.apache.org/confluence/display/solr/Dynamic+Fields

On Tue, Sep 3, 2013 at 12:00 PM, Jai jai4l...@gmail.com wrote:
> hi, while indexing document with unknown fields, its adding unknown fields in schema but its always guessing it as string type ...

-- Regards, Shalin Shekhar Mangar.
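The suffix-based mapping Shalin describes might be sketched like this in schema.xml (the text_general type name here is the one from the stock example schema; substitute your own analyzed type):

```xml
<!-- *_s -> untokenized string; *_txt -> tokenized text -->
<dynamicField name="*_s"   type="string"       indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true" multiValued="true"/>

<!-- catch-all: any otherwise-unmatched field name becomes tokenized text -->
<dynamicField name="*"     type="text_general" indexed="true" stored="true" multiValued="true"/>
```

Note that explicit field definitions and more specific dynamicField patterns take precedence, so the * catch-all only applies to names nothing else matched.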
Re: Solr 4.2 update/extract adding unknown field, can we change field type from string to text
Your email is vague in terms of what you are actually *doing* and what behavior you are seeing. Providing specific details like "This is my schema.xml and this is my solrconfig.xml; when I POST this file to this URL I get this result, and I would instead like to get this result" is useful for other people to provide you with meaningful help... https://wiki.apache.org/solr/UsingMailingLists

My best guess is that you are referring specifically to the behavior of ExtractingRequestHandler and the fields it tries to include in documents that are extracted, and how those fields are indexed -- in which case you can use the uprefix option to add a prefix to the name of all fields generated by Tika that aren't already in your schema, and you can then define a dynamicField matching that prefix to control every aspect of the resulting fields... https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika#UploadingDatawithSolrCellusingApacheTika-InputParameters

-Hoss
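Concretely, the uprefix-plus-dynamicField combination Hoss describes could look like the following sketch (the attr_ prefix and text_general type are illustrative choices, not required names):

```xml
<!-- solrconfig.xml: prefix any Tika-generated field not in the schema with attr_ -->
<requestHandler name="/update/extract" startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">attr_</str>
  </lst>
</requestHandler>

<!-- schema.xml: one pattern then controls type/indexed/stored for all attr_* fields -->
<dynamicField name="attr_*" type="text_general"
              indexed="true" stored="true" multiValued="true"/>
```

With this in place, an extracted metadata field such as content_type lands in the index as attr_content_type, tokenized by the chosen field type instead of being guessed as a string.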
/update/extract error
Hi all, I'm testing SolrCloud (version 4.3.1) with 2 shards and 1 external ZooKeeper. It's all running OK: documents are indexed into the 2 different shards, and select *:* gives me all documents. Now I'm trying to add/index a new document via SolrJ using CloudSolrServer. The code:

CloudSolrServer server = new CloudSolrServer("localhost:2181");
server.setDefaultCollection("tika");
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(new File("C:\\sample.pdf"), "application/octet-stream");
up.setParam("literal.id", "666");
server.request(up);
server.commit();

With up.setParam("literal.id", "666"), an exception is thrown:

apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: ERROR: [doc=666] unknown field 'ignored_dcterms:modified'
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)

My schema looks like this:

<fields>
  <field name="id" type="integer" indexed="true" stored="true" required="true"/>
  <field name="title" type="string" indexed="true" stored="true"/>
  <field name="author" type="string" indexed="true" stored="true"/>
  <field name="text" type="text_ind" indexed="true" stored="true"/>
  <field name="_version_" type="long" indexed="true" stored="true"/>
</fields>

My solrconfig.xml:

<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">ignored_</str>
  </lst>
  <lst name="date.formats">
    <str>yyyy-MM-dd</str>
  </lst>
</requestHandler>

I have already checked the schema via /admin/luke: there is no dcterms:modified field in the response, only the correct fields declared in schema.xml. Can someone help me with this issue? Thanks in advance.
Re: /update/extract error
You need a dynamic field pattern for ignored_* to ignore unmapped metadata.

-- Jack Krupansky

-----Original Message-----
From: franagan
Sent: Monday, July 22, 2013 5:14 PM
To: solr-user@lucene.apache.org
Subject: /update/extract error
Re: /update/extract error
I added

<dynamicField name="ignored_*" type="string" indexed="true" stored="true"/>

to the schema.xml and now it's working. Thank you very much, Jack.
Re: /update/extract
The Extract Request Handler invokes the classes from the extraction package: https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/extraction/src/main/java/org/apache/solr/handler/extraction/ExtractingRequestHandler.java

This is packaged into the apache-solr-cell jar.

Regards, Jayendra

On Thu, Aug 19, 2010 at 10:04 AM, satya swaroop sswaro...@gmail.com wrote:
> Hi all, when we handle extract request handler what class gets invoked ...
/update/extract
Hi all, when we handle the extract request handler, what class gets invoked? I need to know the navigation of classes when we send any files to Solr. Can anybody tell me the classes, or any sources where I can get the answer? Also, can anyone tell me what classes get invoked when we start Solr? I would be thankful if anybody could help me with this. Regards, satya
Re: Getting update/extract RequestHandler to work under Tomcat
Try making it a non-lazy loaded handler. Does that help?

On Nov 2, 2009, at 4:37 PM, Glock, Thomas wrote:

Hoping someone might help with getting the /update/extract RequestHandler to work under Tomcat. Error 500 happens when trying to access http://localhost:8080/apache-solr-1.4-dev/update/extract/ (see below). Note /update/extract DOES work correctly under the Jetty-provided example. I think I must have a directory path incorrectly specified, but I'm not sure where.

No errors in the Catalina log on startup - only this:

Nov 2, 2009 7:10:49 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created /update/extract: org.apache.solr.handler.extraction.ExtractingRequestHandler

solrconfig.xml under Tomcat is slightly changed from the example with regard to the lib elements:

<lib dir="../contrib/extraction/lib" />
<lib dir="../dist/" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="../dist/" regex="apache-solr-clustering-\d.*\.jar" />

The \contrib and \dist directories were copied directly below webapps\apache-solr-1.4-dev, unchanged from the example. In the Catalina log I see all the specified lib dirs added without error:

INFO: Adding specified lib dirs to ClassLoader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/asm-3.1.jar' to classloader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcmail-jdk14-136.jar' to classloader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcprov-jdk14-136.jar' to classloader
(...many more...)

Solr home is mapped to:

INFO: SolrDispatchFilter.init()
Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: Using JNDI solr.home: .\webapps\apache-solr-1.4-dev\solr
Nov 2, 2009 7:10:47 PM org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: C:\Program Files\Apache Software Foundation\Tomcat 6.0\.\webapps\apache-solr-1.4-dev\solr\solr.xml
Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '.\webapps\apache-solr-1.4-dev\solr\'

500 Error:

HTTP Status 500 - lazy loading error
org.apache.solr.common.SolrException: lazy loading error
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    ... (servlet container frames elided) ...
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.extraction.ExtractingRequestHandler'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
    at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:240)
    ... 17 more
Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.extraction.ExtractingRequestHandler
    at java.net.URLClassLoader$1.run(Unknown
RE: Getting update/extract RequestHandler to work under Tomcat
Thanks - I looked at it last night and I think the problem is that I need to compile the ExtractingRequestHandler classes/jar. I see the source, but no classes or jar that seems to fit the bill. I've had problems getting ant to build from the nightly trunk. I'm of the opinion I simply need to get the latest source and perform an ant build, but this is the first time I've worked with ant, so I'm sure I don't have things set up correctly. If there is an existing jar of the ExtractingRequestHandler classes that I might download, please point me to it. I'll look at this today - thanks again, much appreciated.

-----Original Message----- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Tuesday, November 03, 2009 8:12 AM To: solr-user@lucene.apache.org Subject: Re: Getting update/extract RequestHandler to work under Tomcat

Try making it a non-lazy loaded handler. Does that help?

On Nov 2, 2009, at 4:37 PM, Glock, Thomas wrote:

Hoping someone might help with getting the /update/extract RequestHandler to work under Tomcat. An Error 500 happens when trying to access http://localhost:8080/apache-solr-1.4-dev/update/extract/ (see below). Note that /update/extract DOES work correctly under the Jetty-provided example. I think I must have a directory path incorrectly specified, but I'm not sure where. No errors in the Catalina log on startup - only this:

Nov 2, 2009 7:10:49 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created /update/extract: org.apache.solr.handler.extraction.ExtractingRequestHandler

Solrconfig.xml under Tomcat is slightly changed from the example with regard to the lib elements:

<lib dir="../contrib/extraction/lib" />
<lib dir="../dist/" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="../dist/" regex="apache-solr-clustering-\d.*\.jar" />

The \contrib and \dist directories were copied directly below webapps\apache-solr-1.4-dev, unchanged from the example. In the Catalina log I see all the "Adding specified lib dirs..." lines
added without error:

INFO: Adding specified lib dirs to ClassLoader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/asm-3.1.jar' to classloader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcmail-jdk14-136.jar' to classloader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcprov-jdk14-136.jar' to classloader
(...many more...)

Solr Home is mapped to:

INFO: SolrDispatchFilter.init()
Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: Using JNDI solr.home: .\webapps\apache-solr-1.4-dev\solr
Nov 2, 2009 7:10:47 PM org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: C:\Program Files\Apache Software Foundation\Tomcat 6.0\.\webapps\apache-solr-1.4-dev\solr\solr.xml
Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '.\webapps\apache-solr-1.4-dev\solr\'

500 Error: HTTP Status 500 - lazy loading error

org.apache.solr.common.SolrException: lazy loading error
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11AprProcessor.process
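For reference, Grant's suggestion (a non-lazy handler) plus the lib directives quoted in the message above combine into something like the following solrconfig.xml fragment. This is a sketch for Solr 1.4 only; the dir paths are relative to the core's instance directory and will differ depending on where contrib/ and dist/ actually live in your install.

```
<!-- Load the Solr Cell (Tika extraction) jars; these relative paths are an
     assumption and must match your actual layout. -->
<lib dir="../contrib/extraction/lib" />
<lib dir="../dist/" regex="apache-solr-cell-\d.*\.jar" />

<!-- Register the handler eagerly (no startup="lazy"), so a missing class
     fails loudly at startup instead of as a "lazy loading error" on the
     first request. -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler" />
```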
RE: Getting update/extract RequestHandler to work under Tomcat
: I see the source - but no classes or jar that seems to fit the bill. : : I've had problems getting ant to build from the nightly trunk. I'm of ... : If there is an existing jar of the ExtractingRequestHandler classes that : I might download - please point me to it.

If you are downloading a nightly (or a 1.4 release candidate) there is *nothing* you should need to build ... all of the compiled jars (including for all of the contribs) can be found in the ./dist directory. (The only jars not included in the releases are the third-party clustering libraries not released under ASL-compatible licenses, but those aren't needed for extraction.)

-Hoss
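Hoss's point can be sanity-checked without running ant at all: the regex in the `<lib>` directive should match the solr-cell jar that ships in ./dist. A minimal sketch (the jar name is taken from the classloader log quoted in this thread; note that solrconfig.xml uses Java regex syntax, so `\d` becomes `[0-9]` for grep -E):

```shell
# Does the solrconfig.xml pattern apache-solr-cell-\d.*\.jar match the jar
# that actually ships in dist/? Prints 1 if it matches, 0 if not.
echo 'apache-solr-cell-1.4-dev.jar' | grep -cE '^apache-solr-cell-[0-9].*\.jar$'
```

In a real install you would pipe `ls dist/` into the same grep instead of a hard-coded name.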
Re: Getting update/extract RequestHandler to work under Tomcat
: The \contrib and \dist directories were copied directly below : webapps\apache-solr-1.4-dev unchanged from the example.

...that doesn't sound right; they shouldn't be copied into webapps at all. Can you show a full directory structure...

: In the catalina log I see all the "Adding specified lib dirs..." added : without error: : : INFO: Adding specified lib dirs to ClassLoader ... : (...many more...)

...can you elaborate on "many more" ... specifically, do you ever see it say it's loading anything from contrib/extraction or apache-solr-cell-1.4.jar?

-Hoss
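Hoss's question boils down to grepping the Catalina output for the extraction jars. A minimal sketch, using an inline stand-in for the real log (the two INFO lines are abbreviated from the classloader output quoted elsewhere in this thread; against a live install you would grep your actual Catalina log file instead):

```shell
# Write a stand-in for the Catalina classloader output, then count how many
# extraction-related jars were added to the classloader. A count of 0 would
# explain the ClassNotFoundException for ExtractingRequestHandler.
cat > classloader.log <<'EOF'
INFO: Adding 'file:/.../contrib/extraction/lib/tika-core-0.4.jar' to classloader
INFO: Adding 'file:/.../dist/apache-solr-cell-1.4-dev.jar' to classloader
EOF
grep -cE 'contrib/extraction|apache-solr-cell' classloader.log
```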
RE: Getting update/extract RequestHandler to work under Tomcat
Follow-up - This is now working (sadly I'm not sure exactly why!), but I've successfully used curl (under Windows) and the following examples to parse content:

curl "http://localhost:8080/apache-solr-1.4-dev/update/extract?extractOnly=true" --data-binary @curl-config.pdf -H "Content-type: application/pdf"

curl "http://localhost:8080/apache-solr-1.4-dev/update/extract?extractOnly=true" --data-binary @curl-config.html -H "Content-type: text/html"

curl "http://localhost:8080/apache-solr-1.4-dev/update/extract?extractOnly=true" --data-binary @c:/EnterpriseSearchSummit.ppt -H "Content-type: application/vnd.ms-powerpoint"

The solr-cell jar is being loaded, as well as other jars from the contrib and dist directories; see the list below. Regarding the files being located in the webapps structure - I did that because I wanted to try to keep 1.3 running under the same instance of Tomcat as 1.4, and thought there might be difficulties specifying Solr Home via the Tomcat java configuration. I've since removed the 1.3 instance.
(I removed the replaceClassLoader lines for readability)

.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/asm-3.1.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcmail-jdk14-136.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcprov-jdk14-136.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/commons-codec-1.3.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/commons-compress-1.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/commons-io-1.4.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/commons-lang-2.1.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/commons-logging-1.1.1.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/dom4j-1.6.1.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/fontbox-0.1.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/geronimo-stax-api_1.0_spec-1.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/icu4j-3.8.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/jempbox-0.2.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/log4j-1.2.14.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/nekohtml-1.9.9.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/ooxml-schemas-1.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/pdfbox-0.7.3.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/poi-3.5-beta6.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/poi-ooxml-3.5-beta6.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/poi-scratchpad-3.5-beta6.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/tika-core-0.4.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/tika-parsers-0.4.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/xercesImpl-2.8.1.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/xml-apis-1.0.b2.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/xmlbeans-2.3.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/dist/apache-solr-cell-1.4-dev.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/dist/apache-solr-clustering-1.4-dev.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/carrot2-mini-3.1.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/commons-lang-2.4.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/ehcache-1.6.2.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/google-collections-1.0-rc2.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/jackson-core-asl-0.9.9-6.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/jackson-mapper-asl-0.9.9-6.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/log4j-1.2.14.jar' to classloader

Nov 3, 2009 3:05:03 PM org.apache.solr.core.SolrConfig init
INFO: Loaded SolrConfig: solrconfig.xml
Nov 3, 2009 3:05:03 PM org.apache.solr.core.SolrCore init
INFO: Opening new SolrCore at .\webapps\apache-solr-1.4-dev\solr\, dataDir=.\webapps\apache-solr-1.4-dev\solr\data\

-----Original Message----- From: Chris
Getting update/extract RequestHandler to work under Tomcat
Hoping someone might help with getting the /update/extract RequestHandler to work under Tomcat. An Error 500 happens when trying to access http://localhost:8080/apache-solr-1.4-dev/update/extract/ (see below). Note that /update/extract DOES work correctly under the Jetty-provided example. I think I must have a directory path incorrectly specified, but I'm not sure where.

No errors in the Catalina log on startup - only this:

Nov 2, 2009 7:10:49 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created /update/extract: org.apache.solr.handler.extraction.ExtractingRequestHandler

Solrconfig.xml under Tomcat is slightly changed from the example with regard to the lib elements:

<lib dir="../contrib/extraction/lib" />
<lib dir="../dist/" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="../dist/" regex="apache-solr-clustering-\d.*\.jar" />

The \contrib and \dist directories were copied directly below webapps\apache-solr-1.4-dev, unchanged from the example. In the Catalina log I see all the "Adding specified lib dirs..." lines added without error:

INFO: Adding specified lib dirs to ClassLoader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/asm-3.1.jar' to classloader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcmail-jdk14-136.jar' to classloader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcprov-jdk14-136.jar' to classloader
(...many more...)
Solr Home is mapped to:

INFO: SolrDispatchFilter.init()
Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: Using JNDI solr.home: .\webapps\apache-solr-1.4-dev\solr
Nov 2, 2009 7:10:47 PM org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: C:\Program Files\Apache Software Foundation\Tomcat 6.0\.\webapps\apache-solr-1.4-dev\solr\solr.xml
Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '.\webapps\apache-solr-1.4-dev\solr\'

500 Error: HTTP Status 500 - lazy loading error

org.apache.solr.common.SolrException: lazy loading error
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
    at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:574)
    at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1527)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.extraction.ExtractingRequestHandler'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
    at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:240)
    ... 17 more
Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.extraction.ExtractingRequestHandler
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source