A few questions about solr and tika
Hello everyone! Please tell me how and where to set Tika options in Solr. Where is the Tika configuration? I want to know how I can eliminate response attributes I don't need (such as links or images). I am also interested in how I can extract and index only the metadata for several file formats.
Solr errors
Hello everyone! Please tell me why Solr freezes when I add this file: http://yadi.sk/d/dy-RtcHXB7KZU No response comes back from the server.

curl "http://localhost:8085/solr/myCollection/update/extract?literal.id=doc1&literal.fileName=as&uprefix=attr_&commit=true" -F "myfile=@/media/PENDRIVE/Out/www-http/159/8696_6_5_5535.mp3"

Second question: when I add this file, http://yadi.sk/d/OpLW2JTTB7Ms4, Solr returns:

wonder@wonder:~$ curl "http://localhost:8085/solr/myCollection/update/extract?literal.id=doc1&literal.fileName=as&uprefix=attr_&commit=true" -F "myfile=@/media/PENDRIVE/Out/www-http/152/8696_6_5_5528.jpeg"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="error"><str name="msg">java.lang.NoClassDefFoundError: com/adobe/xmp/XMPException</str><str name="trace">java.lang.RuntimeException: java.lang.NoClassDefFoundError: com/adobe/xmp/XMPException
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:673)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1489)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:517)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:540)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1097)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:446)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1031)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:200)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:317)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:445)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:269)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:229)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NoClassDefFoundError: com/adobe/xmp/XMPException
    at com.drew.imaging.jpeg.JpegMetadataReader.extractMetadataFromJpegSegmentReader(JpegMetadataReader.java:112)
    at com.drew.imaging.jpeg.JpegMetadataReader.readMetadata(JpegMetadataReader.java:71)
    at org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:91)
    at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:56)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
    ... 23 more
Caused by: java.lang.ClassNotFoundException: com.adobe.xmp.XMPException
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    ... 37 more
</str><int name="code">500</int></lst>
</response>
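The extract requests above rely on the query parameters being joined with `&`, and in a shell the whole URL then has to be quoted so that `&` is not interpreted by the shell itself. As a minimal sketch, assuming a Python client is acceptable, the URL can be assembled programmatically so the separators cannot be lost. The helper name `build_extract_url` is hypothetical; the host, port, collection and parameter names are taken from the message above, and the file upload itself is not shown.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, extra_params=None):
    """Return an /update/extract URL whose parameters are joined with '&'.

    base_url is the collection URL, e.g. http://localhost:8085/solr/myCollection
    (taken from the thread; adjust to your own setup).
    """
    params = {
        "literal.id": doc_id,   # unique key for the document
        "uprefix": "attr_",     # prefix for fields not defined in the schema
        "commit": "true",       # commit right after the upload
    }
    if extra_params:
        params.update(extra_params)
    # urlencode percent-encodes the values and inserts the '&' separators.
    return base_url + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    print(build_extract_url("http://localhost:8085/solr/myCollection", "doc1"))
```

The resulting string can be passed to curl inside double quotes, or used directly by any HTTP client.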
Re: Solr errors
Does anybody know how to index files inside zip archives?
Re: A few questions about solr and tika
Thanks for the answer. If I don't want to store or index certain fields, I do this:

<field name="links" type="string" indexed="false" stored="false" multiValued="true"/> <!-- remove unneeded Tika fields -->
<field name="link" type="string" indexed="false" stored="false" multiValued="true"/> <!-- remove unneeded Tika fields -->
<field name="img" type="string" indexed="false" stored="false" multiValued="true"/> <!-- remove unneeded Tika fields -->
<field name="iframe" type="string" indexed="false" stored="false" multiValued="true"/> <!-- remove unneeded Tika fields -->
<field name="area" type="string" indexed="false" stored="false" multiValued="true"/> <!-- remove unneeded Tika fields -->
<field name="map" type="string" indexed="false" stored="false" multiValued="true"/> <!-- remove unneeded Tika fields -->
<field name="pragma" type="string" indexed="false" stored="false" multiValued="true"/> <!-- remove unneeded Tika fields -->
<field name="expires" type="string" indexed="false" stored="false" multiValued="true"/> <!-- remove unneeded Tika fields -->
<field name="keywords" type="string" indexed="false" stored="false" multiValued="true"/> <!-- remove unneeded Tika fields -->
<field name="stream_source_info" type="string" indexed="false" stored="false" multiValued="true"/> <!-- remove unneeded Tika fields -->

The other questions are still open for me.

17.10.2013 14:26, primoz.sk...@policija.si wrote:

Why don't you check these:
- Content extraction with Apache Tika ( http://www.youtube.com/watch?v=ifgFjAeTOws )
- ExtractingRequestHandler ( http://wiki.apache.org/solr/ExtractingRequestHandler )
- Uploading Data with Solr Cell using Apache Tika ( https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika )

Primož

From: wonder a-wonde...@rambler.ru
To: solr-user@lucene.apache.org
Date: 17.10.2013 12:23
Subject: A few questions about solr and tika

Hello everyone! Please tell me how and where to set Tika options in Solr. Where is the Tika configuration? I want to know how I can eliminate response attributes I don't need (such as links or images). I am also interested in how I can extract and index only the metadata for several file formats.
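The field declarations in the reply above all follow one pattern: the field is declared but neither indexed nor stored. As a rough sketch, assuming the same attribute values as in that reply, the repetitive lines can be generated from a list of names instead of being typed by hand. The function name `ignored_field_declarations` is hypothetical, and the output still has to be pasted into the schema manually.

```python
import xml.etree.ElementTree as ET

# Field names copied from the reply above.
UNWANTED = [
    "links", "link", "img", "iframe", "area",
    "map", "pragma", "expires", "keywords", "stream_source_info",
]


def ignored_field_declarations(names):
    """Return one <field .../> line per name, not indexed and not stored."""
    lines = []
    for name in names:
        field = ET.Element("field", {
            "name": name,
            "type": "string",
            "indexed": "false",
            "stored": "false",
            "multiValued": "true",
        })
        # encoding="unicode" makes tostring return str instead of bytes.
        lines.append(ET.tostring(field, encoding="unicode"))
    return lines


if __name__ == "__main__":
    print("\n".join(ignored_field_declarations(UNWANTED)))
```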
Re: Solr errors
Thanks for the answer. Yes, Tika extracts the archive, but it does not index the content of the files inside it. Here is the Solr response:

... content: [ 9118_xmessengereu_v18ximpda.jar dimonvideo.ru.txt ], ...

None of these files are in the index. Any ideas?

17.10.2013 17:20, Roland Everaert wrote:

I haven't tested it myself, but you can use Tika: it is able to extract documents from zip archives and index them, though of course it depends on the file types in the archive.
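One possible workaround, which is not proposed anywhere in this thread and is only an assumption about what might help, is to unpack the archive on the client and send every entry to Solr as a separate document. The sketch below covers only the unpacking step with Python's standard `zipfile` module; `iter_zip_entries` is a hypothetical helper, and posting each entry to the extract handler is left out.

```python
import io
import zipfile


def iter_zip_entries(source):
    """Yield (entry_name, entry_bytes) for every regular file in a zip archive.

    source may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            yield info.filename, archive.read(info)


if __name__ == "__main__":
    # Build a small archive in memory so the example needs no files on disk.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as demo:
        demo.writestr("readme.txt", "hello")
    buffer.seek(0)
    for name, data in iter_zip_entries(buffer):
        print(name, len(data))
```

Each yielded pair could then be uploaded on its own, with the entry name used as part of the document id.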