Re: A few questions about solr and tika

2013-10-18 Thread primoz . skale
Everything about Tika extraction is covered in those links. Basically, 
what you need is the following:

1) a requestHandler for Tika (the ExtractingRequestHandler) in solrconfig.xml
2) keep all the fields in schema.xml that are needed for Tika (they are 
marked in the example schema.xml) and set those you don't need to 
indexed=false and stored=false
3) if you want to limit the fields returned in the query response, use the 
query parameter 'fl'.
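
As a rough illustration of steps 1 and 3, here is a minimal Python sketch that only builds the request URLs. The base URL, the handler path /update/extract, the document id and the field names in the 'fl' list are assumptions for the example, not values taken from this thread. The parameters literal.id, extractOnly, uprefix and fl are standard Solr parameters; the uprefix value assumes the schema defines an ignored_* dynamic field, as the stock example schema does.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance with the handler mapped at /update/extract.
SOLR_BASE = "http://localhost:8983/solr"


def extract_url(doc_id, extract_only=False, uprefix=None):
    """Build a URL for posting a file to the ExtractingRequestHandler."""
    params = [("literal.id", doc_id), ("commit", "true")]
    if extract_only:
        # extractOnly=true returns what Tika extracted without indexing it.
        params.append(("extractOnly", "true"))
    if uprefix:
        # Fields not defined in schema.xml get this prefix, e.g. "ignored_".
        params.append(("uprefix", uprefix))
    return SOLR_BASE + "/update/extract?" + urlencode(params)


def select_url(query, fields):
    """Build a query URL that returns only the listed fields via 'fl'."""
    params = [("q", query), ("fl", ",".join(fields)), ("wt", "json")]
    return SOLR_BASE + "/select?" + urlencode(params)


# Hypothetical document id and field names, for illustration only.
print(extract_url("doc1", uprefix="ignored_"))
print(select_url("*:*", ["id", "title", "author"]))
```

The file itself would then be sent as the body of a POST to the first URL (for example with curl), and the second URL can be opened directly to check which fields come back.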

Primoz




From:   wonder a-wonde...@rambler.ru
To: solr-user@lucene.apache.org
Date:   17.10.2013 14:44
Subject: Re: A few questions about solr and tika



Thanks for the answer. If I don't want to store or index certain fields, I do this:
<field name="links" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="link" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="img" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="iframe" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="area" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="map" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="pragma" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="expires" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="keywords" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="stream_source_info" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->

My other questions are still open.


17.10.2013 14:26, primoz.sk...@policija.si wrote:
 Why don't you check these:

 - Content extraction with Apache Tika (http://www.youtube.com/watch?v=ifgFjAeTOws)
 - ExtractingRequestHandler (http://wiki.apache.org/solr/ExtractingRequestHandler)
 - Uploading Data with Solr Cell using Apache Tika (https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika)

 Primož










A few questions about solr and tika

2013-10-17 Thread wonder
Hello everyone! Please tell me how and where to set Tika options in 
Solr. Where is the Tika config? I want to know how I can eliminate 
response attributes I don't need (such as links or images). I am also 
interested in how I can extract and index only the metadata from several file formats.


Re: A few questions about solr and tika

2013-10-17 Thread wonder

Thanks for the answer. If I don't want to store or index certain fields, I do this:
<field name="links" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="link" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="img" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="iframe" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="area" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="map" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="pragma" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="expires" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="keywords" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->
<field name="stream_source_info" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA fields -->


My other questions are still open.
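
Since the declarations above differ only in the field name, they could be generated rather than typed by hand. The following is a small sketch under that assumption; the helper name ignored_field is made up for the example, and the output still has to be pasted into schema.xml manually.

```python
import xml.etree.ElementTree as ET

# Field names taken from the declarations above.
UNWANTED = ["links", "link", "img", "iframe", "area", "map",
            "pragma", "expires", "keywords", "stream_source_info"]


def ignored_field(name):
    """Return a <field> element that is neither indexed nor stored."""
    return ET.Element("field", {
        "name": name,
        "type": "string",
        "indexed": "false",
        "stored": "false",
        "multiValued": "true",
    })


# Print one declaration per unwanted field.
for name in UNWANTED:
    print(ET.tostring(ignored_field(name), encoding="unicode"))
```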


17.10.2013 14:26, primoz.sk...@policija.si wrote:

Why don't you check these:

- Content extraction with Apache Tika (http://www.youtube.com/watch?v=ifgFjAeTOws)
- ExtractingRequestHandler (http://wiki.apache.org/solr/ExtractingRequestHandler)
- Uploading Data with Solr Cell using Apache Tika (https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika)

Primož


