Re: 404 Errors on update/extract

2021-02-05 Thread Alexandre Rafalovitch
Hi Leon,

Feel free to create a JIRA issue
https://issues.apache.org/jira/secure/Dashboard.jspa
and then open a GitHub pull request to fix the example name.  The
documentation is in asciidoc format at:
https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide/src
with names matching those on the server.

This could be a great issue to cut your teeth on with helping Solr :-)

Regards,
   Alex.

On Fri, 5 Feb 2021 at 10:35, nq  wrote:
>
> Hi Alex,
>
>
> Thanks a lot for your help!
>
> I have tested the same using the 'techproducts' example as proposed, and
> it worked fine.
>
>
> You are right, the documentation seems to be outdated in this aspect.
>
> I have just reviewed the solrconfig.xml of the 'schemaless' example and
> found all the Solr Cell config was completely missing.
>
> After adding it as described at
>
> https://lucene.apache.org/solr/guide/8_8/uploading-data-with-solr-cell-using-apache-tika.html#configuring-the-extractingrequesthandler-in-solrconfig-xml
>
> everything worked fine again.
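
For reference, the Solr Cell additions described on that ref guide page amount to roughly the following solrconfig.xml fragment (a sketch; the lib paths assume a stock solr-8.x directory layout, and the defaults shown are the guide's):

```xml
<!-- Load Tika and the Solr Cell contrib jars (paths assume a stock install) -->
<lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar" />

<!-- Register the handler that backs /update/extract -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>
```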
>
>
> What can I do to help update the docs?
>
>
> Best regards,
>
> Leon
>
>
> Am 05.02.21 um 16:15 schrieb Alexandre Rafalovitch:
> > I think the extract handler is not defined in schemaless. This may be
> > a change from before and the documentation is out of sync.
> >
> > Can you try 'techproducts' example instead of schemaless:
> > bin/solr stop (if you are still running it)
> > bin/solr start -e techproducts
> >
> > Then the import command.
> >
> > The Tika integration is defined in solrconfig.xml and needs both
> > the handler defined and some libraries loaded. Once you've confirmed you like
> > what you see, you can copy those into whatever configuration you are
> > working with.
> >
> > Regards,
> > Alex.
> >
> > On Fri, 5 Feb 2021 at 07:38, nq  wrote:
> >> Hi,
> >>
> >>
> >> I am new to Solr and tried to follow the guide to upload PDF data using
> >> Tika, on Solr 8.7.0 (running on Debian 10):
> >>
> >> https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html
> >>
> >> but I get an HTTP 404 error when trying to import the file.
> >>
> >>
> >> In the solr installation directory, after spinning up the example server
> >> using
> >>
> >> solr/bin/solr -e schemaless
> >>
> >> I first used the Post Tool to index a PDF file as described in the
> >> guide, giving the following output (paths truncated using “[…]” for
> >> privacy reasons):
> >>
> >> bin/post -c gettingstarted example/exampledocs/solr-word.pdf -params
> >> "literal.id=doc1"
> >>
> >>> java -classpath /[…]/solr-8.7.0/dist/solr-core-8.7.0.jar -Dauto=yes
> >>> -Dparams=literal.id=doc1 -Dc=gettingstarted -Ddata=files org.apa
> >>> che.solr.util.SimplePostTool example/exampledocs/solr-word.pdf
> >>> SimplePostTool version 5.0.0
> >>> Posting files to [base] url
> >>> http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
> >>> Entering auto mode. File endings considered are
> >>> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> >>> POSTing file solr-word.pdf (application/pdf) to [base]/extract
> >>> SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for
> >>> url:
> >>> http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
> >>> SimplePostTool: WARNING: Response: 
> >>> 
> >>> 
> >>> Error 404 Not Found
> >>> 
> >>> HTTP ERROR 404 Not Found
> >>> 
> >>> URI:/solr/gettingstarted/update/extract
> >>> STATUS:404
> >>> MESSAGE:Not Found
> >>> SERVLET:default
> >>> 
> >>>
> >>> 
> >>> 
> >>> SimplePostTool: WARNING: IOException while reading response:
> >>> java.io.FileNotFoundException:
> >>> http://localhost:8983/solr/gettingstarted/update/extract
> >>> ?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
> >>>
> >>> 1 files indexed.
> >>> COMMITting Solr index changes to
> >>> http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
> >>> Time spent: 0:00:00.038
> >> resulting in no actual changes being visible in Solr.
> >>
> >>
> >> Using curl results in the same HTTP response:
> >>
> >>> curl
> >>> 'http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&commit=true'
> >>> -F "myfile=@example
> >>> /exampledocs/solr-word.pdf"
> >>> 
> >>> 
> >>> 
> >>> Error 404 Not Found
> >>> 
> >>> HTTP ERROR 404 Not Found
> >>> 
> >>> URI:/solr/gettingstarted/update/extract
> >>> STATUS:404
> >>> MESSAGE:Not Found
> >>> SERVLET:default
> >>> 
> >>>
> >>> 
> >>> 
> >>>
> >> Sorry if this has already been discussed somewhere; I have not been able
> >> to find anything helpful yet.
> >>
> >> Thank you!
> >>
> >> Leon
> >>
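
A quick way to confirm that such a 404 means the handler simply isn't registered (rather than a mistyped URL) is to ask the collection's Config API; a sketch, assuming the default port and the collection name from the example:

```shell
# Ask Solr which configuration (if any) backs /update/extract.
# An empty requestHandler section in the response means Solr Cell
# is not configured for this collection.
curl 'http://localhost:8983/solr/gettingstarted/config/requestHandler?componentName=/update/extract'
```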



Error 500 with update extract handler on Solr 7.4.0

2019-06-21 Thread julien massiera
Hi all,

I recently experienced some problems with the update extract handler on a Solr 
7.4.0 instance. When sending a document via a multipart POST update request, if a 
document parameter name contains too many characters, the POST fails with a 500 
error code and I can see the following exception in the Solr logs:

ERROR 2019-06-20T09:43:41,089 (qtp1625082366-13) - 
Solr|Solr|solr.servlet.HttpSolrCall|[c:FileShare s:shard1 r:core_node2 
x:FileShare_shard1_replica_n1] o.a.s.s.HttpSolrCall 
null:org.apache.commons.fileupload.FileUploadException: Header section has more 
than 10240 bytes (maybe it is not properly terminated)
at 
org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:362)
at 
org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:115)
at 
org.apache.solr.servlet.SolrRequestParsers$MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:602)
at 
org.apache.solr.servlet.SolrRequestParsers$StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:784)
at 
org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:167)
at 
org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:317)
at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:531)
at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
at 
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
at 
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678)
at java.lang.Thread.run(Thread.java:748)
Caused
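
The 10240-byte figure in that exception is commons-fileupload's cap on the header section of each multipart part, and a long parameter name alone can exceed it. A rough sketch of the arithmetic (the parameter name length here is hypothetical):

```shell
# Each multipart part begins with a header section such as:
#   Content-Disposition: form-data; name="<param name>"
# commons-fileupload rejects the part once that section passes 10240 bytes.
name=$(printf 'x%.0s' $(seq 1 11000))   # hypothetical 11000-char doc parameter name
header="Content-Disposition: form-data; name=\"$name\""
echo "${#header}"                        # already over 10240, before any CRLFs
```

So the 500 is hit while the request is being parsed, before the extract handler ever sees the document; keeping literal parameter names short avoids it.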


Problems with long named parameters with update extract handler

2019-06-21 Thread Julien
Hi all,

I recently experienced some problems with the update extract handler on a Solr 
7.4.0 instance. When sending a document via a multipart POST update request, if a 
document parameter name contains too many characters, the POST fails and I can see 
the same FileUploadException and stack trace as above in the Solr logs.

Problems with long named parameters with update extract handler

2019-06-20 Thread Julien
Hi,

I recently experienced some problems with the update extract handler. When 
sending a document via a multipart POST update request, if a document parameter 
name contains too many characters, the POST fails and I can see the same 
FileUploadException ("Header section has more than 10240 bytes") and stack 
trace as above in the Solr logs.

AW: Re: update/extract override ExtractTyp

2017-01-05 Thread sn00py


I am using the extract URL and renamed the file to test.txt, but it is still
parsed with the XML parser. Can I force the txt parser for all .txt files?


Sent from my Samsung device.

 Original message 
From: Shawn Heisey <apa...@elyograg.org> 
Date: 04.01.17  17:10  (GMT+01:00) 
To: solr-user@lucene.apache.org 
Subject: Re: update/extract override ExtractTyp 




Re: update/extract override ExtractTyp

2017-01-04 Thread Shawn Heisey
On 1/4/2017 8:12 AM, sn0...@ulysses-erp.com wrote:
> Is it possible to override the ExtractClass for a specific document?
> I would like to upload an XML document, but this XML is not XML-conformant.
>
> I need this XML because it is part of a project where a corrupt XML is
> needed, for testing purposes.
>
>
> The update/extract process fails every time with a 500 error.
>
> I tried to override the Content-Type with "text/plain" but still get
> the XML parse error.

If you send something to the /update handler, and don't tell Solr that
it is another format that it knows like CSV, JSON, or Javabin, then Solr
assumes that it is XML -- and that it is the *specific* XML format that
Solr uses.  "text/plain" is not one of the formats that the update
handler knows how to handle, so it will assume XML.

If you send some other arbitrary XML content, even if that XML is
otherwise correctly formed (which apparently yours isn't), Solr will
throw an error, because it is not the type of XML that Solr is looking
for.  On this page are some examples of what Solr is expecting when you
send XML:

https://wiki.apache.org/solr/UpdateXmlMessages

If you want to parse arbitrary XML into fields, you probably need to
send it using DIH and the XPathEntityProcessor.  If you want the XML to
go into a field completely as-is, then you need to encode the XML into
one of the update formats that Solr knows (XML, JSON, etc) and set it as
the value of one of the fields.

Thanks,
Shawn
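
A minimal sketch of the second option Shawn describes: embed the broken XML as a plain string field inside a JSON update message, so Solr parses the JSON envelope and never tries to XML-parse the value. The field name "raw_xml_txt" and the sample string are hypothetical.

```python
import json

# A deliberately corrupt XML snippet (hypothetical sample) that must
# reach Solr unparsed.
corrupt_xml = "<root><unclosed></root"

# Embed it as an ordinary string field in a Solr JSON update message;
# the field name "raw_xml_txt" is an assumption, not from the thread.
doc = {"id": "corrupt-xml-test-1", "raw_xml_txt": corrupt_xml}
payload = json.dumps([doc])  # POST body for .../update?commit=true (JSON)

# json.dumps performs all needed escaping; round-tripping shows the
# broken XML survives byte-for-byte.
restored = json.loads(payload)[0]["raw_xml_txt"]
print(restored == corrupt_xml)  # True
```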



update/extract override ExtractTyp

2017-01-04 Thread sn00py

Hello

Is it possible to override the ExtractClass for a specific document?
I would like to upload an XML document, but this XML is not XML-conformant.

I need this XML because it is part of a project where a corrupt XML is
needed, for testing purposes.



The update/extract process fails every time with a 500 error.

I tried to override the Content-Type with "text/plain" but still get
the XML parse error.


Is it possible to override it?


This message was sent using IMP, the Internet Messaging Program.



Re: language configuration in update extract request handler

2016-06-06 Thread Reth RM
This question should be posted on the Tika mailing list. It is not related to
indexing or search, but to parsing the content of an image.

On Sun, Jun 5, 2016 at 10:20 PM, SIDDHAST® Roshan 
wrote:

> Hi All,
>
> we are using the application for indexing and searching text with
> Solr. We followed the guide posted at
>
> http://hortonworks.com/hadoop-tutorial/indexing-and-searching-text-within-images-with-apache-solr/
>
> Problem: we want to index Hindi images, and need to know how to set
> the configuration parameters of Tesseract via Tika or external params.
>
> --
> Roshan Agarwal
> Siddhast®
> 907 chandra vihar colony
> Jhansi-284002
> M:+917376314900
>


language configuration in update extract request handler

2016-06-05 Thread SIDDHAST® Roshan
Hi All,

we are using the application for indexing and searching text with
Solr. We followed the guide posted at
http://hortonworks.com/hadoop-tutorial/indexing-and-searching-text-within-images-with-apache-solr/

Problem: we want to index Hindi images, and need to know how to set
the configuration parameters of Tesseract via Tika or external params.

-- 
Roshan Agarwal
Siddhast®
907 chandra vihar colony
Jhansi-284002
M:+917376314900
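
For the Tesseract-language question above: one mechanism Tika provides is a `TesseractOCRConfig.properties` file on the classpath, which the Tika embedded in Solr Cell will pick up. This is only a sketch; it assumes `tesseract` is on the PATH of the Solr host and that the Hindi (`hin`) traineddata pack is installed.

```properties
# Placed on the classpath as org/apache/tika/parser/ocr/TesseractOCRConfig.properties
# (e.g. inside a jar in Solr's lib directory).
language=hin
```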


Re: Commit Within and /update/extract handler

2014-04-09 Thread Jamie Johnson
This is being triggered by adding the commitWithin param to
ContentStreamUpdateRequest (request.setCommitWithin(1);).  My
configuration has an autoCommit max time of 15s and openSearcher set to false.
I'm assuming that changing openSearcher to true should address this, and
that adding softCommit=true to the request would make the documents
available in the meantime?

On Apr 8, 2014 10:02 AM, Erick Erickson erickerick...@gmail.com wrote:

Re: Commit Within and /update/extract handler

2014-04-09 Thread Shawn Heisey
On 4/9/2014 7:47 AM, Jamie Johnson wrote:
 This is being triggered by adding the commitWithin param to
 ContentStreamUpdateRequest (request.setCommitWithin(1);).  My
 configuration has autoCommit max time of 15s and openSearcher set to false.
  I'm assuming that changing openSearcher to true should address this, and
 adding the softCommit = true to the request would make the documents
 available in the mean time?

My personal opinion: autoCommit should not be used for document
visibility, even though it CAN be used for it.  It belongs in every
config that uses the transaction log, with openSearcher set to false,
and carefully considered maxTime and/or maxDocs parameters.

I think it's better to control document visibility entirely manually,
but if you actually do want to have an automatic commit for document
visibility, use autoSoftCommit.  It doesn't make any sense to disable
openSearcher on a soft commit, so just leave that out.  The docs/time
intervals for this can be smaller or greater than the intervals for
autoCommit, depending on your needs.

Any manual commits that you send probably should be soft commits, but
honestly that doesn't really matter if your auto settings are correct.

Thanks,
Shawn
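
Shawn's recipe can be sketched in solrconfig.xml as follows; the maxTime values are placeholders to adapt, not recommendations from this thread:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flushes segments and truncates the transaction log,
       but with openSearcher=false it does not make documents visible -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: controls visibility; disabling openSearcher makes no
       sense here, so it is simply left out -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>
```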



Re: Commit Within and /update/extract handler

2014-04-09 Thread Jamie Johnson
Thanks Shawn, I appreciate the information.


On Wed, Apr 9, 2014 at 10:27 AM, Shawn Heisey s...@elyograg.org wrote:





Re: Commit Within and /update/extract handler

2014-04-08 Thread Erick Erickson
Got a clue how it's being generated? Because it's not going to show
you documents.

commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

openSearcher=false and softCommit=false so the documents will be
invisible. You need one or the other set to true.

What it will do is close the current segment, open a new one and
truncate the current transaction log. These may be good things but
they have nothing to do with making docs visible :).

See:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Mon, Apr 7, 2014 at 8:43 PM, Jamie Johnson jej2...@gmail.com wrote:
Re: Commit Within and /update/extract handler

2014-04-07 Thread Erick Erickson
You say you see the commit happen in the log, is openSearcher
specified? This sounds like you're somehow getting a commit
with openSearcher=false...

Best,
Erick

On Sun, Apr 6, 2014 at 5:37 PM, Jamie Johnson jej2...@gmail.com wrote:
 I'm running solr 4.6.0 and am noticing that commitWithin doesn't seem to
 work when I am using the /update/extract request handler.  It looks like a
 commit is happening from the logs, but the documents don't become available
 for search until I do a commit manually.  Could this be some type of
 configuration issue?


Re: Commit Within and /update/extract handler

2014-04-07 Thread Erick Erickson
What does the call look like? Are you opening a new searcher
or not? That should be in the log line where the commit is recorded...

FWIW,
Erick

On Sun, Apr 6, 2014 at 5:37 PM, Jamie Johnson jej2...@gmail.com wrote:
 I'm running solr 4.6.0 and am noticing that commitWithin doesn't seem to
 work when I am using the /update/extract request handler.  It looks like a
 commit is happening from the logs, but the documents don't become available
 for search until I do a commit manually.  Could this be some type of
 configuration issue?


Re: Commit Within and /update/extract handler

2014-04-07 Thread Jamie Johnson
Below is the log showing what I believe to be the commit

07-Apr-2014 23:40:55.846 INFO [catalina-exec-5]
org.apache.solr.update.processor.LogUpdateProcessor.finish [forums]
webapp=/solr path=/update/extract
params={uprefix=attr_&literal.source_id=e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcce&literal.content_group=File&literal.id=e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcce&literal.forum_id=3&literal.content_type=application/octet-stream&wt=javabin&literal.uploaded_by=+&version=2&literal.content_type=application/octet-stream&literal.file_name=exclusions}
{add=[e4bb4bb6-96ab-4f8f-8a2a-1cf37dc1bcce (1464785652471037952)]} 0 563
07-Apr-2014 23:41:10.847 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.DirectUpdateHandler2.commit start
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
07-Apr-2014 23:41:10.847 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IW][commitScheduler-10-thread-1]: commit: start
07-Apr-2014 23:41:10.848 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IW][commitScheduler-10-thread-1]: commit: enter lock
07-Apr-2014 23:41:10.848 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IW][commitScheduler-10-thread-1]: commit: now prepare
07-Apr-2014 23:41:10.848 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IW][commitScheduler-10-thread-1]: prepareCommit: flush
07-Apr-2014 23:41:10.849 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IW][commitScheduler-10-thread-1]:   index before flush _y(4.6):C1
_10(4.6):C1 _11(4.6):C1 _12(4.6):C1
07-Apr-2014 23:41:10.849 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DW][commitScheduler-10-thread-1]: commitScheduler-10-thread-1
startFullFlush
07-Apr-2014 23:41:10.849 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DW][commitScheduler-10-thread-1]: anyChanges? numDocsInRam=1 deletes=true
hasTickets:false pendingChangesInFullFlush: false
07-Apr-2014 23:41:10.850 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DWFC][commitScheduler-10-thread-1]: addFlushableState
DocumentsWriterPerThread [pendingDeletes=gen=0, segment=_14,
aborting=false, numDocsInRAM=1, deleteQueue=DWDQ: [ generation: 2 ]]
07-Apr-2014 23:41:10.852 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DWPT][commitScheduler-10-thread-1]: flush postings as segment _14 numDocs=1
07-Apr-2014 23:41:10.904 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DWPT][commitScheduler-10-thread-1]: new segment has 0 deleted docs
07-Apr-2014 23:41:10.904 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DWPT][commitScheduler-10-thread-1]: new segment has no vectors; norms; no
docValues; prox; freqs
07-Apr-2014 23:41:10.904 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DWPT][commitScheduler-10-thread-1]: flushedFiles=[_14.nvd,
_14_Lucene41_0.pos, _14_Lucene41_0.tip, _14_Lucene41_0.tim, _14.nvm,
_14.fdx, _14_Lucene41_0.doc, _14.fnm, _14.fdt]
07-Apr-2014 23:41:10.905 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DWPT][commitScheduler-10-thread-1]: flushed codec=Lucene46
07-Apr-2014 23:41:10.905 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DWPT][commitScheduler-10-thread-1]: flushed: segment=_14 ramUsed=0.122 MB
newFlushedSize(includes docstores)=0.003 MB docs/MB=322.937
07-Apr-2014 23:41:10.907 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[DW][commitScheduler-10-thread-1]: publishFlushedSegment seg-private
updates=null
07-Apr-2014 23:41:10.907 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IW][commitScheduler-10-thread-1]: publishFlushedSegment
07-Apr-2014 23:41:10.907 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[BD][commitScheduler-10-thread-1]: push deletes  1 deleted terms (unique
count=1) bytesUsed=1024 delGen=4 packetCount=1 totBytesUsed=1024
07-Apr-2014 23:41:10.907 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IW][commitScheduler-10-thread-1]: publish sets newSegment delGen=5
seg=_14(4.6):C1
07-Apr-2014 23:41:10.908 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IFD][commitScheduler-10-thread-1]: now checkpoint _y(4.6):C1 _10(4.6):C1
_11(4.6):C1 _12(4.6):C1 _14(4.6):C1 [5 segments ; isCommit = false]
07-Apr-2014 23:41:10.908 INFO [commitScheduler-10-thread-1]
org.apache.solr.update.LoggingInfoStream.message
[IFD][commitScheduler-10-thread-1]: 0 msec to checkpoint
07-Apr-2014 23:41:10.908 INFO [commitScheduler-10-thread-1

Commit Within and /update/extract handler

2014-04-06 Thread Jamie Johnson
I'm running solr 4.6.0 and am noticing that commitWithin doesn't seem to
work when I am using the /update/extract request handler.  It looks like a
commit is happening from the logs, but the documents don't become available
for search until I do a commit manually.  Could this be some type of
configuration issue?


Re: Send many files to update/extract

2014-03-18 Thread Alexandre Rafalovitch
HttpSolrServer allows sending multiple documents at once, but they
need to be extracted/converted on the client. However, if you know you
will be sending a lot of documents to Solr, you are better off running
Tika locally on the client (or as a standalone network server); it is a
lot more performant.

I am not sure whether ExtractingRequestHandler accepts multipart MIME,
but that would be the thing to check if you still want to process on
the server.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)
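
The client-side pattern Alexandre describes can be sketched as follows; the extracted strings and the field name "content_txt" are hypothetical stand-ins for what a local Tika run would produce, and the Solr URL is a placeholder.

```python
import json
import urllib.request

def build_update_payload(docs):
    """Serialize client-side-extracted documents for Solr's plain /update handler."""
    return json.dumps(docs).encode("utf-8")

# Hypothetical stand-ins for text a local Tika run (tika-app jar or a
# standalone Tika server) would have produced.
docs = [
    {"id": "doc-1", "content_txt": "text extracted locally from doc-1.pdf"},
    {"id": "doc-2", "content_txt": "text extracted locally from doc-2.pdf"},
]
payload = build_update_payload(docs)

def post_batch(payload, url="http://localhost:8983/solr/mycore/update?commit=true"):
    # Network sketch only (URL is a placeholder); not executed here.
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req)
```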


On Tue, Mar 18, 2014 at 12:55 PM, Александр Вандышев
a-wonde...@rambler.ru wrote:
 Who knows how to index a lot of files with ExtractingRequestHandler using a
 single query?


Send many files to update/extract

2014-03-17 Thread Александр Вандышев

Does anyone know how to index many files with ExtractingRequestHandler using a
single request?


Curl : shell script : The requested resource is not available. update/extract !

2014-03-10 Thread Priti Solanki
Hi all,

The following throws "The requested resource is not available":


curl 
"http://localhost:8080/solr/#/dev/update/extract?stream.file=/home/priti/$file&literal.id=document$i&commit=true"


I don't understand what literal.id is. Is it mandatory? [Please share
reading links if known.]

The response is:

HTTP Status 404 - /solr/#/dev/update/extract
type: Status report
message: /solr/#/dev/update/extract
description: The requested resource is not available.
Apache Tomcat/7.0.42

What's wrong?

Regards,
Priti


Re: Curl : shell script : The requested resource is not available. update/extract !

2014-03-10 Thread Raymond Wiker
literal.id should contain a unique identifier for each document (assuming
that the unique-identifier field in your Solr schema is called id); see
http://wiki.apache.org/solr/ExtractingRequestHandler .

I'm guessing that the URL for the ExtractingRequestHandler is incorrect, or
maybe you haven't even configured/enabled the ExtractingRequestHandler in
solrconfig.xml. Further, from the URL you show, I'm guessing that $file
and $i are references to shell variables that have been incorrectly
quoted (for example, by enclosing a constructed URL in single quotes
instead of double quotes, if you're on a Unix-like platform).
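
If the script grows beyond a one-line curl call, one way to sidestep the quoting problem entirely is to let Python's standard library build the query string; the core name `mycore` is a placeholder.

```python
from urllib.parse import urlencode

file = "report.pdf"  # stand-ins for the shell's $file and $i
i = 42

# urlencode joins parameters with '&' and percent-encodes the values,
# so shell-quoting mistakes cannot eat the separators.
params = urlencode({
    "stream.file": f"/home/priti/{file}",
    "literal.id": f"document{i}",
    "commit": "true",
})
url = f"http://localhost:8080/solr/mycore/update/extract?{params}"
print(url)
```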


On Mon, Mar 10, 2014 at 2:51 PM, Priti Solanki pritiatw...@gmail.com wrote:


Re: Curl : shell script : The requested resource is not available. update/extract !

2014-03-10 Thread Jack Krupansky
The # character introduces the fragment portion of a URL, so 
/dev/update/extract is not a part of the path of the URL. In this case 
the URL path is /solr/ and the server is simply complaining that there 
is no code registered to process that path.


Normally, the collection name (core name) follows /solr/.

-- Jack Krupansky
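
Jack's point can be checked with Python's standard library: the parser treats everything after `#` as the fragment, which a client never sends to the server, so the request path is just /solr/.

```python
from urllib.parse import urlparse

bad = "http://localhost:8080/solr/#/dev/update/extract?stream.file=/tmp/a.pdf"
p = urlparse(bad)
print(p.path)      # /solr/
print(p.fragment)  # /dev/update/extract?stream.file=/tmp/a.pdf

# Dropping the '#' and putting the core name after /solr/ yields a real path:
good = "http://localhost:8080/solr/mycore/update/extract?stream.file=/tmp/a.pdf"
print(urlparse(good).path)  # /solr/mycore/update/extract
```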

-Original Message- 
From: Priti Solanki

Sent: Monday, March 10, 2014 9:51 AM
To: solr-user@lucene.apache.org
Subject: Curl : shell script : The requested resource is not available. 
update/extract !





Re: requested url solr/update/extract not available on this server

2013-09-24 Thread Nutan
The rest of the queries work, and I have added the following in solrconfig.xml:

<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="fmap.content">contents</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>


On Sun, Sep 22, 2013 at 8:53 PM, Erick Erickson [via Lucene] 
ml-node+s472066n4091440...@n3.nabble.com wrote:

 Please review:

 http://wiki.apache.org/solr/UsingMailingLists

 Erick

  On Sun, Sep 22, 2013 at 5:52 AM, Nutan [hidden email] wrote:

  I did define the request handler.
 
 
   On Sun, Sep 22, 2013 at 12:51 AM, Erick Erickson [via Lucene]
   [hidden email] wrote:
  
 
  bq: And im not using the example config file
 
  It looks like you have not included the request handler in your
  solrconfig.xml,
  something like (from the stock distro):
 
     <!-- Solr Cell Update Request Handler
          http://wiki.apache.org/solr/ExtractingRequestHandler
       -->
     <requestHandler name="/update/extract"
                     startup="lazy"
                     class="solr.extraction.ExtractingRequestHandler">
       <lst name="defaults">
         <str name="lowernames">true</str>
         <str name="uprefix">ignored_</str>

         <!-- capture link hrefs but ignore div attributes -->
         <str name="captureAttr">true</str>
         <str name="fmap.a">links</str>
         <str name="fmap.div">ignored_</str>
       </lst>
     </requestHandler>
  
 
  I'd start with the stock config and try removing things one-by-one...
 
  Best,
  Erick
 
   On Sat, Sep 21, 2013 at 7:34 AM, Nutan [hidden email] wrote:
 
    Yes, I do get the Solr admin page. And I'm not using the example config
   file; I
    have created my own for my project, as required. I have also defined
    update/extract in solrconfig.xml.
  
  
    On Tue, Sep 17, 2013 at 4:45 AM, Chris Hostetter-3 [via Lucene]
    [hidden email] wrote:
   
  
  
   : Is /solr/update working?
  
   more importantly: does /solr/ work in your browser and return
  anything
    useful?  (nothing you've told us yet gives us any way of knowing if
    solr is even up and running)
  
   if 'http://localhost:8080/solr/' shows you the solr admin UI, and
 you
  are
   using the stock Solr 4.2 example configs, then
   http://localhost:8080/solr/update/extract should not give you a 404
   error.
  
   if however you are using some other configs, it might not work
 unless
   those configs register a handler with the path /update/extract.
  
   Using the jetty setup provided with 4.2, and the example configs
 (from
   4.2) I was able to index a sample PDF just fine using your curl
  command...
  
    hossman@frisbee:~/tmp$ curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F "myfile=@stump.winners.san.diego.2013.pdf"
    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader"><int name="status">0</int><int name="QTime">1839</int></lst>
    </response>
  
  
  
  
  
   :
   : Check solrconfig to see that /update/extract is configured as in
 the
   standard
   : Solr example.
   :
   : Does /solr/update/extract work for you using the standard Solr
  example?
   :
   : -- Jack Krupansky
   :
   : -Original Message- From: Nutan
   : Sent: Sunday, September 15, 2013 2:37 AM
    : To: [hidden email]
   : Subject: requested url solr/update/extract not available on this
  server
   :
   : I am working on Solr 4.2 on Windows 7. I am trying to index pdf
  files.I
   : referred Solr Cookbook 4. Tomcat is using 8080 port number. I get
  this
   : error:requested url solr/update/extract not available on this
 server
   : When my curl is :
    : curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@cookbook.pdf"
   : There is no entry in log files. Please help.
   :
   :
   :
  
   -Hoss
  
  

Re: requested url solr/update/extract not available on this server

2013-09-22 Thread Nutan
I did define the request handler.


On Sun, Sep 22, 2013 at 12:51 AM, Erick Erickson [via Lucene] 
ml-node+s472066n4091391...@n3.nabble.com wrote:

 bq: And im not using the example config file

 It looks like you have not included the request handler in your
 solrconfig.xml,
 something like (from the stock distro):

<!-- Solr Cell Update Request Handler

     http://wiki.apache.org/solr/ExtractingRequestHandler
-->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>

    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

 I'd start with the stock config and try removing things one-by-one...

 Best,
 Erick

 On Sat, Sep 21, 2013 at 7:34 AM, Nutan [hidden email] wrote:

  Yes I do get the solr admin page.And im not using the example config
 file,I
  have create mine own for my project as required.I have also defined
  update/extract in solrconfig.xml.
 
 
  On Tue, Sep 17, 2013 at 4:45 AM, Chris Hostetter-3 [via Lucene] [hidden email] wrote:
 
 
  : Is /solr/update working?
 
  more importantly: does /solr/ work in your browser and return
 anything
  useful?  (nothing you've told us yet gives us anyway of knowning if
  solr is even up and running)
 
  if 'http://localhost:8080/solr/' shows you the solr admin UI, and you
 are
  using the stock Solr 4.2 example configs, then
  http://localhost:8080/solr/update/extract should not give you a 404
  error.
 
  if however you are using some other configs, it might not work unless
  those configs register a handler with the path /update/extract.
 
  Using the jetty setup provided with 4.2, and the example configs (from
  4.2) I was able to index a sample PDF just fine using your curl
 command...
 
  hossman@frisbee:~/tmp$ curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F "myfile=@stump.winners.san.diego.2013.pdf"
  <?xml version="1.0" encoding="UTF-8"?>
  <response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">1839</int></lst>
  </response>
 
 
 
 
 
  :
  : Check solrconfig to see that /update/extract is configured as in the
  standard
  : Solr example.
  :
  : Does /solr/update/extract work for you using the standard Solr
 example?
  :
  : -- Jack Krupansky
  :
  : -Original Message- From: Nutan
  : Sent: Sunday, September 15, 2013 2:37 AM
  : To: [hidden email]
  : Subject: requested url solr/update/extract not available on this
 server
  :
  : I am working on Solr 4.2 on Windows 7. I am trying to index pdf
 files.I
  : referred Solr Cookbook 4. Tomcat is using 8080 port number. I get
 this
  : error:requested url solr/update/extract not available on this server
  : When my curl is :
  : curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@cookbook.pdf"
  : There is no entry in log files. Please help.
  :
  :
  :
 
  -Hoss
 
 

 
 
 
 
 



Re: requested url solr/update/extract not available on this server

2013-09-22 Thread Erick Erickson
Please review:

http://wiki.apache.org/solr/UsingMailingLists

Erick

On Sun, Sep 22, 2013 at 5:52 AM, Nutan nutanshinde1...@gmail.com wrote:
 I did define the request handler.


 On Sun, Sep 22, 2013 at 12:51 AM, Erick Erickson [via Lucene] 
 ml-node+s472066n4091391...@n3.nabble.com wrote:

 bq: And im not using the example config file

 It looks like you have not included the request handler in your
 solrconfig.xml,
 something like (from the stock distro):

<!-- Solr Cell Update Request Handler

     http://wiki.apache.org/solr/ExtractingRequestHandler
-->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>

    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

 I'd start with the stock config and try removing things one-by-one...

 Best,
 Erick

 On Sat, Sep 21, 2013 at 7:34 AM, Nutan [hidden email] wrote:

  Yes I do get the solr admin page.And im not using the example config
 file,I
  have create mine own for my project as required.I have also defined
  update/extract in solrconfig.xml.
 
 
  On Tue, Sep 17, 2013 at 4:45 AM, Chris Hostetter-3 [via Lucene] [hidden email] wrote:
 
 
  : Is /solr/update working?
 
  more importantly: does /solr/ work in your browser and return
 anything
  useful?  (nothing you've told us yet gives us anyway of knowning if
  solr is even up and running)
 
  if 'http://localhost:8080/solr/' shows you the solr admin UI, and you
 are
  using the stock Solr 4.2 example configs, then
  http://localhost:8080/solr/update/extract should not give you a 404
  error.
 
  if however you are using some other configs, it might not work unless
  those configs register a handler with the path /update/extract.
 
  Using the jetty setup provided with 4.2, and the example configs (from
  4.2) I was able to index a sample PDF just fine using your curl
 command...
 
  hossman@frisbee:~/tmp$ curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F "myfile=@stump.winners.san.diego.2013.pdf"
  <?xml version="1.0" encoding="UTF-8"?>
  <response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">1839</int></lst>
  </response>
 
 
 
 
 
  :
  : Check solrconfig to see that /update/extract is configured as in the
  standard
  : Solr example.
  :
  : Does /solr/update/extract work for you using the standard Solr
 example?
  :
  : -- Jack Krupansky
  :
  : -Original Message- From: Nutan
  : Sent: Sunday, September 15, 2013 2:37 AM
  : To: [hidden email]
  : Subject: requested url solr/update/extract not available on this
 server
  :
  : I am working on Solr 4.2 on Windows 7. I am trying to index pdf
 files.I
  : referred Solr Cookbook 4. Tomcat is using 8080 port number. I get
 this
  : error:requested url solr/update/extract not available on this server
  : When my curl is :
  : curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@cookbook.pdf"
  : There is no entry in log files. Please help.
  :
  :
  :
 
  -Hoss
 
 

 
 
 
 
 



Re: requested url solr/update/extract not available on this server

2013-09-21 Thread Nutan
Yes, I do get the Solr admin page. And I'm not using the example config file; I have created my own for my project, as required. I have also defined /update/extract in solrconfig.xml.


On Tue, Sep 17, 2013 at 4:45 AM, Chris Hostetter-3 [via Lucene] 
ml-node+s472066n409045...@n3.nabble.com wrote:


 : Is /solr/update working?

 more importantly: does /solr/ work in your browser and return anything
 useful?  (nothing you've told us yet gives us anyway of knowning if
 solr is even up and running)

 if 'http://localhost:8080/solr/' shows you the solr admin UI, and you are
 using the stock Solr 4.2 example configs, then
 http://localhost:8080/solr/update/extract should not give you a 404
 error.

 if however you are using some other configs, it might not work unless
 those configs register a handler with the path /update/extract.

 Using the jetty setup provided with 4.2, and the example configs (from
 4.2) I was able to index a sample PDF just fine using your curl command...

 hossman@frisbee:~/tmp$ curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F "myfile=@stump.winners.san.diego.2013.pdf"
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">0</int><int name="QTime">1839</int></lst>
 </response>





 :
 : Check solrconfig to see that /update/extract is configured as in the
 standard
 : Solr example.
 :
 : Does /solr/update/extract work for you using the standard Solr example?
 :
 : -- Jack Krupansky
 :
 : -Original Message- From: Nutan
 : Sent: Sunday, September 15, 2013 2:37 AM
 : To: [hidden email]
 : Subject: requested url solr/update/extract not available on this server
 :
 : I am working on Solr 4.2 on Windows 7. I am trying to index pdf files.I
 : referred Solr Cookbook 4. Tomcat is using 8080 port number. I get this
 : error:requested url solr/update/extract not available on this server
 : When my curl is :
 : curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@cookbook.pdf"
 : There is no entry in log files. Please help.
 :
 :
 :

 -Hoss







--
View this message in context: 
http://lucene.472066.n3.nabble.com/requested-url-solr-update-extract-not-available-on-this-server-tp4090153p4091371.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: requested url solr/update/extract not available on this server

2013-09-21 Thread Erick Erickson
bq: And im not using the example config file

It looks like you have not included the request handler in your solrconfig.xml,
something like (from the stock distro):

<!-- Solr Cell Update Request Handler

     http://wiki.apache.org/solr/ExtractingRequestHandler
-->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>

    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

I'd start with the stock config and try removing things one-by-one...

Best,
Erick

On Sat, Sep 21, 2013 at 7:34 AM, Nutan nutanshinde1...@gmail.com wrote:
 Yes I do get the solr admin page.And im not using the example config file,I
 have create mine own for my project as required.I have also defined
 update/extract in solrconfig.xml.


 On Tue, Sep 17, 2013 at 4:45 AM, Chris Hostetter-3 [via Lucene] 
 ml-node+s472066n409045...@n3.nabble.com wrote:


 : Is /solr/update working?

 more importantly: does /solr/ work in your browser and return anything
 useful?  (nothing you've told us yet gives us anyway of knowning if
 solr is even up and running)

 if 'http://localhost:8080/solr/' shows you the solr admin UI, and you are
 using the stock Solr 4.2 example configs, then
 http://localhost:8080/solr/update/extract should not give you a 404
 error.

 if however you are using some other configs, it might not work unless
 those configs register a handler with the path /update/extract.

 Using the jetty setup provided with 4.2, and the example configs (from
 4.2) I was able to index a sample PDF just fine using your curl command...

 hossman@frisbee:~/tmp$ curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F "myfile=@stump.winners.san.diego.2013.pdf"
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">0</int><int name="QTime">1839</int></lst>
 </response>





 :
 : Check solrconfig to see that /update/extract is configured as in the
 standard
 : Solr example.
 :
 : Does /solr/update/extract work for you using the standard Solr example?
 :
 : -- Jack Krupansky
 :
 : -Original Message- From: Nutan
 : Sent: Sunday, September 15, 2013 2:37 AM
 : To: [hidden email]
 : Subject: requested url solr/update/extract not available on this server
 :
 : I am working on Solr 4.2 on Windows 7. I am trying to index pdf files.I
 : referred Solr Cookbook 4. Tomcat is using 8080 port number. I get this
 : error:requested url solr/update/extract not available on this server
 : When my curl is :
 : curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@cookbook.pdf"
 : There is no entry in log files. Please help.
 :
 :
 :

 -Hoss









Re: requested url solr/update/extract not available on this server

2013-09-16 Thread Chris Hostetter

: Is /solr/update working?

more importantly: does /solr/ work in your browser and return anything 
useful?  (nothing you've told us yet gives us anyway of knowning if 
solr is even up and running)

if 'http://localhost:8080/solr/' shows you the solr admin UI, and you are 
using the stock Solr 4.2 example configs, then 
http://localhost:8080/solr/update/extract should not give you a 404 error.

if however you are using some other configs, it might not work unless 
those configs register a handler with the path /update/extract.

Using the jetty setup provided with 4.2, and the example configs (from 
4.2) I was able to index a sample PDF just fine using your curl command...

hossman@frisbee:~/tmp$ curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F "myfile=@stump.winners.san.diego.2013.pdf"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1839</int></lst>
</response>





: 
: Check solrconfig to see that /update/extract is configured as in the standard
: Solr example.
: 
: Does /solr/update/extract work for you using the standard Solr example?
: 
: -- Jack Krupansky
: 
: -Original Message- From: Nutan
: Sent: Sunday, September 15, 2013 2:37 AM
: To: solr-user@lucene.apache.org
: Subject: requested url solr/update/extract not available on this server
: 
: I am working on Solr 4.2 on Windows 7. I am trying to index pdf files.I
: referred Solr Cookbook 4. Tomcat is using 8080 port number. I get this
: error:requested url solr/update/extract not available on this server
: When my curl is :
: curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@cookbook.pdf"
: There is no entry in log files. Please help.
: 
: 
: 

-Hoss


requested url solr/update/extract not available on this server

2013-09-15 Thread Nutan
I am working on Solr 4.2 on Windows 7. I am trying to index PDF files; I referred to Solr Cookbook 4. Tomcat is using port 8080. I get this error: "requested url solr/update/extract not available on this server". My curl command is:
curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@cookbook.pdf"
There is no entry in log files. Please help.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/requested-url-solr-update-extract-not-available-on-this-server-tp4090153.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: requested url solr/update/extract not available on this server

2013-09-15 Thread Jack Krupansky

Is /solr/update working?

Check solrconfig to see that /update/extract is configured as in the 
standard Solr example.


Does /solr/update/extract work for you using the standard Solr example?

-- Jack Krupansky

-Original Message- 
From: Nutan

Sent: Sunday, September 15, 2013 2:37 AM
To: solr-user@lucene.apache.org
Subject: requested url solr/update/extract not available on this server

I am working on Solr 4.2 on Windows 7. I am trying to index pdf files.I
referred Solr Cookbook 4. Tomcat is using 8080 port number. I get this
error:requested url solr/update/extract not available on this server
When my curl is :
curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@cookbook.pdf"
There is no entry in log files. Please help.






Solr 4.2 update/extract adding unknown field, can we change field type from string to text

2013-09-03 Thread Jai
hi,

While indexing documents with unknown fields, Solr adds the unknown fields to the schema, but it always guesses them as string type. Is it possible to specify a default field type for unknown fields, such as a text type, so that they get tokenized? Also, can we specify other properties by default, like indexed/stored/multiValued?

PS: I am using Solr 4.2.

Thanks a lot.
Jai


Re: Solr 4.2 update/extract adding unknown field, can we change field type from string to text

2013-09-03 Thread Shalin Shekhar Mangar
You can use Solr's dynamic fields feature to map unknown field
names to types.

For example, a dynamic field named *_s (i.e., any field name ending
with _s) can be mapped to string, and so on. In your case, if your
field names do not follow a set pattern, you can even specify a
dynamic field * and map it to a text type.

See https://cwiki.apache.org/confluence/display/solr/Dynamic+Fields
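A minimal sketch of what that could look like in schema.xml (the field names and the text_general type here are illustrative, not taken from this thread):

```xml
<!-- illustrative schema.xml fragment -->
<!-- any field name ending in _s is kept as an untokenized string -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<!-- catch-all: any unmatched field name is indexed as tokenized text -->
<dynamicField name="*" type="text_general" indexed="true" stored="true" multiValued="true"/>
```

Solr picks the longest matching pattern, so the specific *_s rule wins over the catch-all *.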

On Tue, Sep 3, 2013 at 12:00 PM, Jai jai4l...@gmail.com wrote:
 hi,

 while indexing document with unknown fields, its adding unknown fields in
 schema but its always guessing it as string type. is it possible to specify
 default field type for unknown fields to some other type, like text so that
 it gets tokenized? also can we specify other properties by default like
 indexed/stored/multivalued?

 PS am using solr4.2.

 Thanks alot.
 Jai



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 4.2 update/extract adding unknown field, can we change field type from string to text

2013-09-03 Thread Chris Hostetter

Your email is vague in terms of what you are actually *doing* and what 
behavior you are seeing.  

Providing specific details like "this is my schema.xml and this is my 
solrconfig.xml; when I POST this file to this URL I get this result, and I 
would instead like to get this result" is useful for other people to 
provide you with meaningful help...

https://wiki.apache.org/solr/UsingMailingLists

My best guess is that you are referring specifically to the behavior of 
ExtractingRequestHandler and the fields it tries to include in documents 
that are extracted, and how those fields are indexed -- in which case you 
can use the uprefix option to add a prefix to the names of all fields 
generated by Tika that aren't already in your schema, and you can then 
define a dynamicField matching that prefix to control every aspect of the 
resulting fields...

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika#UploadingDatawithSolrCellusingApacheTika-InputParameters
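As a sketch, the uprefix-plus-dynamicField combination described above could look like this (assuming the stock Solr Cell handler; the ignored_ prefix is just the conventional example value):

```xml
<!-- solrconfig.xml: prefix any Tika-generated field not already in the schema -->
<requestHandler name="/update/extract" startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

<!-- schema.xml: the matching dynamicField decides what happens to those fields -->
<dynamicField name="ignored_*" type="string" indexed="false" stored="false"/>
```

Here the prefixed fields are effectively dropped; switching the dynamicField to indexed="true" and/or stored="true" would keep them instead.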


-Hoss


/update/extract error

2013-07-22 Thread franagan
Hi all,

I'm testing SolrCloud (version 4.3.1) with 2 shards and 1 external ZooKeeper.
It is all running fine: documents are indexed across the 2 shards, and a select
*:* query gives me all documents.

Now I'm trying to add/index a new document via SolrJ using CloudSolrServer.

The code:

CloudSolrServer server = new CloudSolrServer("localhost:2181");
server.setDefaultCollection("tika");

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(new File("C:\\sample.pdf"), "application/octet-stream");
up.setParam("literal.id", "666");

server.request(up);
server.commit();

When up.setParam("literal.id", "666") is included, an exception is thrown:

apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: ERROR: [doc=666] unknown field 'ignored_dcterms:modified'
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
        at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
        at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)


My schema looks like this:

<fields>
  <field name="id" type="integer" indexed="true" stored="true" required="true"/>
  <field name="title" type="string" indexed="true" stored="true"/>
  <field name="author" type="string" indexed="true" stored="true"/>
  <field name="text" type="text_ind" indexed="true" stored="true"/>
  <field name="_version_" type="long" indexed="true" stored="true"/>
</fields>

My solrconfig.xml:

<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">ignored_</str>
  </lst>
  <lst name="date.formats">
    <str>yyyy-MM-dd</str>
  </lst>
</requestHandler>

I have already checked the schema via /admin/luke; there is no dcterms:modified
field in the response, only the correct fields declared in schema.xml.

Can someone help me with this issue?

Thanks in advance. 









--
View this message in context: 
http://lucene.472066.n3.nabble.com/update-extract-error-tp4079555.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: /update/extract error

2013-07-22 Thread Jack Krupansky
You need a dynamic field pattern for ignored_* to ignore unmapped 
metadata.


-- Jack Krupansky

-Original Message- 
From: franagan

Sent: Monday, July 22, 2013 5:14 PM
To: solr-user@lucene.apache.org
Subject: /update/extract error

Hi all,

im testing solrcloud (version 4.3.1) with 2 shards and 1 external zookeeper.
All its runing ok, documents are indexing in 2 diferent shards and select
*:* give me all documents.

Now im trying to add/index a new document via solj ussing CloudSolrServer.

the code:

CloudSolrServer server = new CloudSolrServer(localhost:2181);
server.setDefaultCollection(tika);


ContentStreamUpdateRequest up = new
ContentStreamUpdateRequest(/update/extract);
up.addFile(new File(C:\\sample.pdf), application/octet-stream);
up.setParam(literal.id, 666);

server.request(up);
server.commit();

when up.setParam(literal.id, 666);, a exception is thown:

*apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: ERROR:
[doc=66
6] unknown field 'ignored_dcterms:modified'*
   at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServ
er.java:402)
   at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServ
er.java:180)
   at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.j
ava:401)
   at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.j
ava:375)
   at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:43
9)
   at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExec
utor.java:895)
   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:918)
   at java.lang.Thread.run(Thread.java:662)


My schema looks like this:
fields
   field name=id type=integer indexed=true stored=true
required=true/
  field name=title type=string indexed=true stored=true/
  field name=author type=string indexed=true stored=true /
  field name=text type=text_ind indexed=true stored=true /
  field name=_version_ type=long indexed=true  stored=true/
/fields

My solrconfig.xml:

<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="uprefix">ignored_</str>
  </lst>
  <lst name="date.formats">
    <str>yyyy-MM-dd</str>
  </lst>
</requestHandler>

I have already checked the schema via /admin/luke: there is no dcterms:modified
field in the response, only the correct fields declared in schema.xml.

Can someone help me with this issue?

Thanks in advance.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/update-extract-error-tp4079555.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: /update/extract error

2013-07-22 Thread franagan
I added <dynamicField name="ignored_*" type="string" indexed="true"
stored="true"/> to the schema.xml and now it's working.
Thank you very much, Jack.
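
For anyone hitting the same "unknown field" error: the uprefix parameter in the /update/extract defaults maps every Tika metadata field not present in the schema (such as dcterms:modified) to an ignored_-prefixed name, so the schema needs a dynamicField that catches those names. A minimal sketch of how the two pieces fit together (handler name, uprefix value, and field pattern are taken from this thread; the rest is standard Solr 4.x syntax):

```xml
<!-- solrconfig.xml: unknown Tika metadata fields get the ignored_ prefix -->
<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

<!-- schema.xml: catch-all for the prefixed fields,
     e.g. ignored_dcterms:modified -->
<dynamicField name="ignored_*" type="string" indexed="true" stored="true"/>
```

If those extra Tika fields are not wanted at all, an alternative is to declare the ignored_* dynamicField with indexed="false" stored="false" so they are silently dropped instead of indexed.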





--
View this message in context: 
http://lucene.472066.n3.nabble.com/update-extract-error-in-Solr-4-3-1-tp4079555p4079564.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: /update/extract

2010-08-21 Thread Jayendra Patil
The Extract Request Handler invokes the classes from the extraction package.

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/extraction/src/main/java/org/apache/solr/handler/extraction/ExtractingRequestHandler.java

This is packaged into the apache-solr-cell jar.

Regards,
Jayendra
On Thu, Aug 19, 2010 at 10:04 AM, satya swaroop sswaro...@gmail.com wrote:

 Hi all,
   when we use the extract request handler, which class gets invoked? I
 need to know the flow of classes when we send any files to Solr.
 Can anybody tell me the classes, or any sources where I can find the answer?
 Or can anyone tell me which classes get invoked when we start
 Solr? I'd be thankful if anybody can help me with this.

 Regards,
 satya



/update/extract

2010-08-19 Thread satya swaroop
Hi all,
   when we use the extract request handler, which class gets invoked? I
need to know the flow of classes when we send any files to Solr.
Can anybody tell me the classes, or any sources where I can find the answer?
Or can anyone tell me which classes get invoked when we start
Solr? I'd be thankful if anybody can help me with this.

Regards,
satya


Re: Getting update/extract RequestHandler to work under Tomcat

2009-11-03 Thread Grant Ingersoll

Try making it a non-Lazy loaded handler. Does that help?
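
For context, lazy loading is controlled by the startup attribute on the requestHandler element in solrconfig.xml; removing it makes the handler class load at core startup, so classpath problems fail fast in the startup log instead of surfacing as a "lazy loading error" on the first request. A minimal sketch of the two variants, using the handler name and class from this thread:

```xml
<!-- lazy: the class is only resolved on the first request, so a missing
     jar surfaces as a "lazy loading error" at query time -->
<requestHandler name="/update/extract" startup="lazy"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler" />

<!-- non-lazy: the class is resolved at core startup, so a missing
     apache-solr-cell jar fails immediately in the startup log -->
<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler" />
```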


On Nov 2, 2009, at 4:37 PM, Glock, Thomas wrote:



Hoping someone might help with getting /update/extract RequestHandler to
work under Tomcat.

Error 500 happens when trying to access
http://localhost:8080/apache-solr-1.4-dev/update/extract/  (see below)

Note /update/extract DOES work correctly under the Jetty provided
example.

I think I must have a directory path incorrectly specified but not sure
where.

No errors in the Catalina log on startup - only this:

Nov 2, 2009 7:10:49 PM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: created /update/extract:
org.apache.solr.handler.extraction.ExtractingRequestHandler

Solrconfig.xml under tomcat is slightly changed from the example with
regards to lib elements:

 <lib dir="../contrib/extraction/lib" />
 <lib dir="../dist/" regex="apache-solr-cell-\d.*\.jar" />
 <lib dir="../dist/" regex="apache-solr-clustering-\d.*\.jar" />

The \contrib and \dist directories were copied directly below the
webapps\apache-solr-1.4-dev unchanged from the example.

In the Catalina log I see all the "Adding specified lib dirs" entries added
without error:

INFO: Adding specified lib dirs to ClassLoader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/asm-3.1.jar' to classloader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcmail-jdk14-136.jar' to classloader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcprov-jdk14-136.jar' to classloader

(...many more...)

Solr Home is mapped to:

INFO: SolrDispatchFilter.init()
Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: Using JNDI solr.home: .\webapps\apache-solr-1.4-dev\solr
Nov 2, 2009 7:10:47 PM
org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: C:\Program Files\Apache Software
Foundation\Tomcat 6.0\.\webapps\apache-solr-1.4-dev\solr\solr.xml
Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader
init
INFO: Solr home set to '.\webapps\apache-solr-1.4-dev\solr\'

500 Error:

HTTP Status 500 - lazy loading error
org.apache.solr.common.SolrException: lazy loading error
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
    at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:574)
    at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1527)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.extraction.ExtractingRequestHandler'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
    at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:240)
    ... 17 more
Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.extraction.ExtractingRequestHandler
    at java.net.URLClassLoader$1.run(Unknown

RE: Getting update/extract RequestHandler to work under Tomcat

2009-11-03 Thread Glock, Thomas

Thanks -

Looked at it last night and I think the problem is that I need to
compile the ExtractingRequestHandler classes/jar.  

I see the source - but no classes or jar that seems to fit the bill.  

I've had problems getting ant to build from the nightly trunk. I'm of
the opinion I simply need to get the latest source and perform an ant
build. But this is the first time I've worked with ant, so I'm sure I
don't have things set up correctly.

If there is an existing jar of the ExtractingRequestHandler classes that
I might download - please point me to it.

I'll look at this today - thanks again - much appreciated.


-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org] 
Sent: Tuesday, November 03, 2009 8:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Getting update/extract RequestHandler to work under Tomcat

Try making it a non-Lazy loaded handler. Does that help?


On Nov 2, 2009, at 4:37 PM, Glock, Thomas wrote:


 Hoping someone might help with getting /update/extract RequestHandler 
 to work under Tomcat.

 Error 500 happens when trying to access 
 http://localhost:8080/apache-solr-1.4-dev/update/extract/  (see below)

 Note /update/extract DOES work correctly under the Jetty provided 
 example.

 I think I must have a directory path incorrectly specified but not 
 sure where.

 No errors in the Catalina log on startup - only this:

   Nov 2, 2009 7:10:49 PM org.apache.solr.core.RequestHandlers
 initHandlersFromConfig
   INFO: created /update/extract:
 org.apache.solr.handler.extraction.ExtractingRequestHandler

 Solrconfig.xml under tomcat is slightly changed from the example with 
 regards to lib elements:

  <lib dir="../contrib/extraction/lib" />
  <lib dir="../dist/" regex="apache-solr-cell-\d.*\.jar" />
  <lib dir="../dist/" regex="apache-solr-clustering-\d.*\.jar" />

 The \contrib and \dist directories were copied directly below the 
 webapps\apache-solr-1.4-dev unchanged from the example.

 In the Catalina log I see all the "Adding specified lib dirs" entries added
 without error:

   INFO: Adding specified lib dirs to ClassLoader
   Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader
 replaceClassLoader
   INFO: Adding
 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat
 %206.0/we
 bapps/apache-solr-1.4-dev/contrib/extraction/lib/asm-3.1.jar' to 
 classloader
   Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader
 replaceClassLoader
   INFO: Adding
 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat
 %206.0/we
 bapps/apache-solr-1.4-dev/contrib/extraction/lib/bcmail-jdk14-136.jar'
 to classloader
   Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader
 replaceClassLoader
   INFO: Adding
 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat
 %206.0/we
 bapps/apache-solr-1.4-dev/contrib/extraction/lib/bcprov-jdk14-136.jar'
 to classloader

   (...many more...)

 Solr Home is mapped to:

   INFO: SolrDispatchFilter.init()
   Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader
 locateSolrHome
   INFO: Using JNDI solr.home: .\webapps\apache-solr-1.4-dev\solr
   Nov 2, 2009 7:10:47 PM
 org.apache.solr.core.CoreContainer$Initializer initialize
   INFO: looking for solr.xml: C:\Program Files\Apache Software 
 Foundation\Tomcat 6.0\.\webapps\apache-solr-1.4-dev\solr\solr.xml
   Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader
 init
   INFO: Solr home set to '.\webapps\apache-solr-1.4-dev\solr\'

 500 Error:

 HTTP Status 500 - lazy loading error
 org.apache.solr.common.SolrException: lazy loading error
     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
     at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
     at org.apache.coyote.http11.Http11AprProcessor.process

RE: Getting update/extract RequestHandler to work under Tomcat

2009-11-03 Thread Chris Hostetter

: I see the source - but no classes or jar that seems to fit the bill.  
: 
: I've had problems getting ant to build from the nightly trunk.  I'm of
...
: If there is an existing jar of the ExtractingRequestHandler classes that
: I might download - please point me to it.

If you are downloading a nightly (or a 1.4 release candidate) there is 
*nothing* you should need to build ... all of the compiled jars (including 
for all of the contribs) can be found in the ./dist directory.  

(the only jars not included in the releases are the third-party 
clustering libraries not released under ASL-compatible licenses, but those 
aren't needed for extraction)



-Hoss



Re: Getting update/extract RequestHandler to work under Tomcat

2009-11-03 Thread Chris Hostetter

: The \contrib and \dist directories were copied directly below the
: webapps\apache-solr-1.4-dev unchanged from the example.

...that doesn't sound right; they shouldn't be copied into webapps at all.  
Can you show a full directory structure...

: Im the catalina log I see all the Adding specified lib dirs... added
: without error:
: 
:   INFO: Adding specified lib dirs to ClassLoader
...
:   (...many more...)

...can you elaborate on "many more" ... specifically, do you ever see it say 
it's loading anything from contrib/extraction or 
apache-solr-cell-1.4.jar?



-Hoss



RE: Getting update/extract RequestHandler to work under Tomcat

2009-11-03 Thread Glock, Thomas
Follow-up - 

This is now working (sadly I'm not sure exactly why!) - I've
successfully used curl (under Windows) with the following examples to
parse content:

curl "http://localhost:8080/apache-solr-1.4-dev/update/extract?extractOnly=true" --data-binary @curl-config.pdf -H "Content-type: application/pdf"
curl "http://localhost:8080/apache-solr-1.4-dev/update/extract?extractOnly=true" --data-binary @curl-config.html -H "Content-type: text/html"
curl "http://localhost:8080/apache-solr-1.4-dev/update/extract?extractOnly=true" --data-binary @c:/EnterpriseSearchSummit.ppt -H "Content-type: application/vnd.ms-powerpoint"

The solr-cell jar is being loaded, as well as other jars from the contrib
and dist directories; see the list below.

Regarding files being located in the webapps structure - I did that
because I wanted to try to keep 1.3 running under the same instance of
Tomcat as 1.4 and thought there might be difficulties specifying Solr
Home via the Tomcat Java configuration.  I've since removed the 1.3
instance.

(I removed the replaceClassLoader lines for readability)

.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/asm-3.1.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcmail-jdk14-136.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcprov-jdk14-136.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/commons-codec-1.3.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/commons-compress-1.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/commons-io-1.4.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/commons-lang-2.1.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/commons-logging-1.1.1.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/dom4j-1.6.1.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/fontbox-0.1.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/geronimo-stax-api_1.0_spec-1.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/icu4j-3.8.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/jempbox-0.2.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/log4j-1.2.14.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/nekohtml-1.9.9.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/ooxml-schemas-1.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/pdfbox-0.7.3.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/poi-3.5-beta6.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/poi-ooxml-3.5-beta6.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/poi-scratchpad-3.5-beta6.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/tika-core-0.4.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/tika-parsers-0.4.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/xercesImpl-2.8.1.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/xml-apis-1.0.b2.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/xmlbeans-2.3.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/dist/apache-solr-cell-1.4-dev.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/dist/apache-solr-clustering-1.4-dev.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/carrot2-mini-3.1.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/commons-lang-2.4.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/ehcache-1.6.2.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/google-collections-1.0-rc2.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/jackson-core-asl-0.9.9-6.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/jackson-mapper-asl-0.9.9-6.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/log4j-1.2.14.jar' to classloader
Nov 3, 2009 3:05:03 PM org.apache.solr.core.SolrConfig init
INFO: Loaded SolrConfig: solrconfig.xml
Nov 3, 2009 3:05:03 PM org.apache.solr.core.SolrCore init
INFO: Opening new SolrCore at .\webapps\apache-solr-1.4-dev\solr\, dataDir=.\webapps\apache-solr-1.4-dev\solr\data\

Getting update/extract RequestHandler to work under Tomcat

2009-11-02 Thread Glock, Thomas

Hoping someone might help with getting /update/extract RequestHandler to
work under Tomcat.

Error 500 happens when trying to access
http://localhost:8080/apache-solr-1.4-dev/update/extract/  (see below)

Note /update/extract DOES work correctly under the Jetty provided
example.

I think I must have a directory path incorrectly specified but not sure
where.

No errors in the Catalina log on startup - only this: 

Nov 2, 2009 7:10:49 PM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: created /update/extract:
org.apache.solr.handler.extraction.ExtractingRequestHandler

Solrconfig.xml under tomcat is slightly changed from the example with
regards to lib elements:

  <lib dir="../contrib/extraction/lib" />
  <lib dir="../dist/" regex="apache-solr-cell-\d.*\.jar" />
  <lib dir="../dist/" regex="apache-solr-clustering-\d.*\.jar" />

The \contrib and \dist directories were copied directly below the
webapps\apache-solr-1.4-dev unchanged from the example.

In the Catalina log I see all the "Adding specified lib dirs" entries added
without error:

INFO: Adding specified lib dirs to ClassLoader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/asm-3.1.jar' to classloader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcmail-jdk14-136.jar' to classloader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcprov-jdk14-136.jar' to classloader

(...many more...)

Solr Home is mapped to:

INFO: SolrDispatchFilter.init()
Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: Using JNDI solr.home: .\webapps\apache-solr-1.4-dev\solr
Nov 2, 2009 7:10:47 PM
org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: C:\Program Files\Apache Software
Foundation\Tomcat 6.0\.\webapps\apache-solr-1.4-dev\solr\solr.xml
Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader
init
INFO: Solr home set to '.\webapps\apache-solr-1.4-dev\solr\' 

500 Error:

HTTP Status 500 - lazy loading error
org.apache.solr.common.SolrException: lazy loading error
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
    at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:574)
    at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1527)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.extraction.ExtractingRequestHandler'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
    at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:240)
    ... 17 more
Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.extraction.ExtractingRequestHandler
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source