Search and index Result
Hi all, I duplicated SolrDispatchFilter as SolrDispatchFilter1 and SolrDispatchFilter2, so that all /update and /update/extract requests pass through SolrDispatchFilter1 and all search (/select) requests pass through SolrDispatchFilter2. I did this because I need to enforce privacy on the search results: I need to check whether the requesting user has access to a particular file or not. Implementing the privacy of results was a success. One major problem remains: after indexing some documents and committing, I do not get the committed data in the search results; I get the old data from before the commit. I get the new results only after restarting the server. Can anyone tell me where to modify the code so that search returns results from the most recent commit? Thanks and Regards, satya
Re: how to set cookie for url requesting in stream_url
Hi All, I was able to set the cookie value on the stream_url connection. I passed the cookie value down to the ContentStreamBase.URLStream class and added conn.setRequestProperty("Cookie", "name=value"), using the name and value of cookie[0], in the connection setup. It is working fine now. Regards, satya
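For readers who want to see the idea outside of the Solr source, here is a minimal sketch of attaching a cookie to an outgoing URL request. It uses Python's standard urllib rather than the Java URLConnection mentioned above, and the URL, cookie name and cookie value are made-up placeholders, not values from this thread.

```python
from urllib.request import Request


def request_with_cookie(url, name, value):
    """Build (but do not send) a request that carries a single cookie."""
    req = Request(url)
    # The Cookie header is a plain "name=value" string.
    req.add_header("Cookie", f"{name}={value}")
    return req


# Hypothetical remote URL and cookie, for illustration only.
req = request_with_cookie(
    "http://remotehost/file_download?file=a.pdf", "session", "abc123"
)
print(req.get_header("Cookie"))  # session=abc123
```

Nothing is sent over the network here; the request object is only built and inspected.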
Fwd: how to set cookie for url requesting in stream_url
Hi Markus, I am using Solr branch_3x on the Tomcat web server. Regards, satya
how to set cookie for url requesting in stream_url
Hi All, to index documents that live on another server, I need to include a cookie value in the request that Solr makes through stream_url. Can anybody tell me how to include the cookie in that request? Has anybody done this? Any suggestions are welcome. Example: http://localhost:8456/solr/update/extract?stream_url=remote_server_url&literal.id=13748 Here I need to include a cookie value when remote_server_url is requested. Regards, satya
Solr coding
Hi All, as per my project requirement I need to keep file searches private, so I need to modify the Solr code. For example, suppose there are 5 users and each user indexes some files:
user1 - java1, c1, sap1
user2 - java2, c2, sap2
user3 - java3, c3, sap3
user4 - java4, c4, sap4
user5 - java5, c5, sap5
If user2 searches for the keyword java, only the file java2 should be displayed and not the other files. To keep this filtering inside Solr itself, may I know where to modify the code? I will access a database to check which files the user indexed and then filter the results. I do not have any cores; I indexed all files in a single index.
Regards, satya
Re: Solr coding
Hi Jayendra, I forgot to mention that the result also depends on the user's groups. It is somewhat complex, so I did not mention it before. Now I will explain the exact requirement:
user1, group1 - java1, c1, sap1
user2, group2 - java2, c2, sap2
user3, group1, group3 - java3, c3, sap3
user4, group3 - java4, c4, sap4
user5, group3 - java5, c5, sap5
Here "user1, group1" means user1 belongs to group1. The filter includes the group too. For example, if user1 searches for java, the results should show java1 and java3, since the java3 file is accessible to all users who belong to group1. That is why I thought of editing the code.
Thanks, satya
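To make the rule concrete, here is a small in-memory sketch of the visibility check described above. It assumes a file is visible to its owner and to anyone who shares at least one group with the owner; the dictionaries and the prefix match are stand-ins invented for illustration and have nothing to do with how Solr stores or searches documents.

```python
# Hypothetical data mirroring the example in this message.
USER_GROUPS = {
    "user1": {"group1"},
    "user2": {"group2"},
    "user3": {"group1", "group3"},
    "user4": {"group3"},
    "user5": {"group3"},
}

# Each user N owns javaN, cN and sapN.
FILE_OWNER = {
    f"{kind}{n}": f"user{n}" for n in range(1, 6) for kind in ("java", "c", "sap")
}


def visible_files(user, keyword):
    """Return the matching files that `user` is allowed to see."""
    my_groups = USER_GROUPS[user]
    hits = []
    for name, owner in FILE_OWNER.items():
        if not name.startswith(keyword):  # stand-in for a real full-text match
            continue
        # Visible if the searcher owns it or shares a group with the owner.
        if owner == user or USER_GROUPS[owner] & my_groups:
            hits.append(name)
    return sorted(hits)


print(visible_files("user1", "java"))  # ['java1', 'java3']
print(visible_files("user2", "java"))  # ['java2']
```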
Re: Solr coding
Hi Jayendra, the group field can work if the number of groups is small. If a user may belong to 1000 groups, wouldn't it be difficult to build the query? Also, if a user changes groups, we would have to reindex the data. OK, I will try your suggestion; if it fulfills the need, the task will be very easy. Regards, satya
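As a rough sketch of what a group-field approach could look like on the query side, the snippet below builds the parameters for a /select request with a filter query (fq). The field names owner and groups are assumptions made up for this example; they would have to exist in the schema and be filled at index time.

```python
from urllib.parse import urlencode, parse_qs


def build_select_params(keyword, user, groups):
    """Build q/fq parameters restricting hits to the user's own or group files."""
    # One clause for the owner plus one clause per group the user belongs to.
    clauses = [f"owner:{user}"] + [f"groups:{g}" for g in sorted(groups)]
    return urlencode({"q": keyword, "fq": " OR ".join(clauses)})


params = build_select_params("java", "user1", {"group1"})
print(params)
print(parse_qs(params)["fq"][0])  # owner:user1 OR groups:group1
```

The filter grows by one clause per group, which is exactly the concern raised above for users with very many groups.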
solr indexing
Hi all, out of keen interest in Solr's indexing mechanism I started digging through the code of Solr indexing (/update/extract). I read about the index file formats and the scoring procedure, and I have a question about it. 1) Scoring is performed on dynamic and precalculated values (doc boost, field boost, lengthNorm). When calculating the score, suppose a term in the index occurs in nearly one million docs: does Solr calculate the score for each and every doc containing the term and then take the top docs from the index? Or is there some mechanism that limits the score calculation to only particular docs? If anybody knows about this, or knows of any documentation on it, please tell me. Regards, satya
is solr dynamic calculation??
Hi All, I have a question: does Solr show results by calculating the score dynamically, or does it precalculate the scores and just supply them? For example, if I query q=solr against my index, I get 25 documents. What is it calculating? I am very keen to know how the score is calculated and how the results are ordered. Regards, satya
Re: is solr dynamic calculation??
Hi Markus, as far as I have gone through Solr's scoring, the scoring is done at search time using the boost values that were given during indexing. I have a question now. If I search for the keyword java and 1) the term java in the index occurs in 50,000 documents, does Solr calculate the score for each and every document, filter them, sort them, and then serve the results? If it does the dynamic calculation for each and every document, it would take a long time, so how does Solr reduce that? Am I right? If I am wrong anywhere, please tell me. Regards, satya
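One common way to avoid sorting every match is to score each matching document once and keep only the best n in a small heap. The sketch below illustrates that general idea only; it is not taken from the Solr or Lucene source, and the document ids and scores are invented.

```python
import heapq


def top_docs(scored_docs, n):
    """Return the n best (score, doc_id) pairs, best first."""
    heap = []  # min-heap holding at most n entries; heap[0] is the weakest kept
    for doc_id, score in scored_docs:
        if len(heap) < n:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # The new document beats the weakest one kept so far: swap it in.
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)


docs = [("d1", 0.2), ("d2", 0.9), ("d3", 0.5), ("d4", 0.7)]
print(top_docs(docs, 2))  # [(0.9, 'd2'), (0.7, 'd4')]
```

With this approach every match is still scored, but the sorting cost depends on n (the rows requested) rather than on the number of matching documents.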
Re: spell suggest response
Hi Grijesh, even though I use autosuggest I may not get the exact results; the order is not accurate. For example, if I request http://localhost:8080/solr/terms/?terms.fl=spell&terms.prefix=solr&terms.sort=index&terms.lower=solr&terms.upper.incl=true I get results such as solr, solr.amp, solr.datefield, solr.p, solr.pdf and so on. But this does not give results as accurate as spellchecking does. I require suggestions for any word, irrespective of whether it is spelled correctly or not. Is there anything that can be changed in Solr to get suggestions like the ones we get when we type a wrong word in spellchecking? If so, please let me know. Regards, satya
Re: spell suggest response
Hi Grijesh, I added both the TermsComponent and the SpellCheckComponent to the terms request handler. When I send a query as
http://localhost:8080/solr/terms?terms.fl=text&terms.prefix=java&rows=7&omitHeader=true&spellcheck=true&spellcheck.q=java&spellcheck.count=20
the result I get is
<response>
  <lst name="terms">
    <lst name="text">
      <int name="java">6</int>
      <int name="javabas">6</int>
      <int name="javas">6</int>
      <int name="javascript">6</int>
      <int name="javac">6</int>
      <int name="javax">6</int>
    </lst>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions"/>
  </lst>
</response>
When I send this
http://localhost:8080/solr/terms?terms.fl=text&terms.prefix=jawa&rows=5&omitHeader=true&spellcheck=true&spellcheck.q=jawa&spellcheck.count=20
I get the result as
<response>
  <lst name="terms">
    <lst name="text"/>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="jawa">
        <int name="numFound">20</int>
        <int name="startOffset">0</int>
        <int name="endOffset">4</int>
        <arr name="suggestion">
          <str>java</str>
          <str>away</str>
          <str>jav</str>
          <str>jar</str>
          <str>ara</str>
          <str>apa</str>
          <str>ana</str>
          <str>ajax</str>
Now I need to know how to order the terms. In the 1st query the result is in index order, and I want only javax, javac and javascript, but not javas or javabas. How can it be done?
Regards, satya
spellchecking even the key is true....
Hi All, can we get spellchecking results even when the keyword is spelled correctly? Spellchecking gives suggestions only for wrong keywords. Can't we get similar and near words for the keyword even though spellcheck.q is correct? As an example, for
http://localhost:8080/solr/spellcheck?q=java&spellcheck=true&spellcheck.count=5
the result will be
1)
<response>
  <lst name="spellcheck">
    <lst name="suggestions"/>
  </lst>
</response>
Can we get the result as
2)
<response>
  <lst name="spellcheck">
    <lst name="suggestions">
      <str>javax</str>
      <str>javac</str>
      <str>javabean</str>
      <str>javascript</str>
    </lst>
  </lst>
</response>
NOTE: all the keywords in the 2nd result are in the index.
Regards, satya
Re: spell suggest response
Hi Grijesh, you said you are implementing this kind of thing. Can you tell me briefly how you did it? Regards, satya
Re: spell suggest response
Hi Stefan, I need the words from the index itself. If java is given, then the relevant, similar or near words in the index should be shown, even when the given keyword is spelled correctly. Is that possible? Example: http://localhost:8080/solr/spellcheckCompRH?q=java&rows=0&spellcheck=true&spellcheck.count=10 In the output no suggestions come back, because java is a correctly spelled word. But can't we get near suggestions such as javax, javac, etc. (terms that are in the index)? I read about the Suggester on the Solr wiki at http://wiki.apache.org/solr/Suggester and tried to implement it, but I got the error *error loading class org.apache.solr.spelling.suggest.suggester* Regards, satya
Re: spell suggest response
Hi Juan, yes, I tried onlyMorePopular and got some results, but they are not similar or near words to the word I gave in the query. Here is the output:
http://localhost:8080/solr/spellcheckCompRH?q=java&rows=0&spellcheck=true&spellcheck.collate=true&spellcheck.onlyMorePopular=true&spellcheck.count=20
The output I get is
<arr name="suggestion">
  <str>data</str>
  <str>have</str>
  <str>can</str>
  <str>any</str>
  <str>all</str>
  <str>has</str>
  <str>each</str>
  <str>part</str>
  <str>make</str>
  <str>than</str>
  <str>also</str>
</arr>
But these words are not similar to the given word 'java'. The near words would be javac, javax, data, java.io, etc., and those words are present in the index.
Regards, satya
spell suggest response
Hi All, can we get just the suggestions, without the file results? Here is an example. When I query http://localhost:8080/solr/spellcheckCompRH?q=java daka usar&spellcheck=true&spellcheck.count=5&spellcheck.collate=true I get some Java files in the result and then the suggestions for the words daka (data) and usar (user). But I actually need only the spelling suggestions. Here time is consumed on returning the files before the spelling suggestions are given. Can't we post a query to Solr for which the response contains only the spelling suggestions? Regards, satya
Re: spell suggest response
Hi Gora, I am using Solr for file indexing and searching, but I have a module where I do not need any file results, only the spelling suggestions. So I asked whether there is any way in Solr to get only the spelling suggestions in the response. I think it is clear to you now. If not, tell me and I will try to explain further. Regards, satya
Re: spell suggest response
Hi Stefan, yes, it works :). Thanks. But I have a question: can we get spelling suggestions even if the word is spelled correctly? I mean words near to it. Example: http://localhost:8080/solr/spellcheckCompRH?q=java&rows=0&spellcheck=true&spellcheck.count=10 In the output no suggestions come back, because java is a correctly spelled word. But can't we get near suggestions such as javax, javac, etc.? Regards, satya
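Judging from the rows=0 in the URL above, the suggestions-only request simply asks for zero documents while leaving spellchecking switched on. A minimal sketch of building such a URL follows; the helper name and its defaults are invented for this example, and only the parameter names are taken from the URLs in this thread.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def spellcheck_only_url(base, words, count=5):
    """Build a request that returns spelling suggestions but no documents."""
    params = {
        "q": words,
        "rows": 0,  # ask for no documents, only the spellcheck section
        "spellcheck": "true",
        "spellcheck.count": count,
        "spellcheck.collate": "true",
    }
    return f"{base}?{urlencode(params)}"


url = spellcheck_only_url(
    "http://localhost:8080/solr/spellcheckCompRH", "java daka usar"
)
print(url)
```

Using urlencode also takes care of the spaces in the query, which would otherwise have to be escaped by hand.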
error in html???
Hi All, I am able to get the response in JSON format in the success case by adding wt=json to the query. But in case of any errors I get the response in HTML format. 1) Is there any specific reason for the HTML format? 2) Can't we get the error result in JSON format? Regards, satya
Re: error in html???
Hi Erick, every result comes in XML format. But when there is an error such as HTTP 500 or HTTP 400, we get the response in HTML format. My question is: can't we turn that HTML response into JSON, or vice versa? Regards, satya
Different Results..
Hi All, I am getting different results when I use certain special characters. For example: 1) when I use the request http://localhost:8080/solr/select?q=erlang!ericson the result obtained is <result name="response" numFound="1934" start="0"> 2) when the request is http://localhost:8080/solr/select?q=erlang/ericson the result is <result name="response" numFound="1" start="0"> My question is: does Solr treat the two queries differently, and how does it interpret !, / and the other special characters? Regards, satya
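In the Lucene query syntax, ! is the NOT operator, so the two requests are not parsed the same way. If the intent is to search for the literal text, one option is to backslash-escape such characters before sending the query. The sketch below shows that idea; the set of characters is an assumption based on the Lucene query syntax documentation and may differ between versions, so please check it against the version in use.

```python
# Characters assumed to be special in the Lucene query syntax
# (this list is an assumption; verify it for your Solr/Lucene version).
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_query_chars(text):
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


print(escape_query_chars("erlang!ericson"))  # erlang\!ericson
print(escape_query_chars("erlang/ericson"))  # erlang\/ericson
```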
Re: Google like search
Hi All, thanks for your suggestions. I got the result I expected. Cheers, Satya
Testing Solr
Hi All, I built Solr successfully and I am thinking of testing it with nearly 300 PDF files, 300 docs, 300 Excel files, and so on, about 300 files of each type. Is there any dummy data available for testing Solr? Otherwise I need to download each and every file individually. Another question: are there any benchmarks of Solr? Regards, satya
Google like search
Hi All, can we get results like Google's, with some text about the match? I was able to get the first 300 characters of a file, but that is not helpful for me. Can I get the text around the first occurrence of the keyword in that file? Regards, Satya
Re: Google like search
Hi Tanguy, I am not asking for highlighting. I think it can be explained with an example, which I illustrate here. When I post a query like this:
http://localhost:8080/solr/select?q=Java&version=2.2&start=0&rows=10&indent=on
I get the following result:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="filename">Java%20debugging.pdf</str>
      <str name="id">122</str>
      <arr name="text1">
        <str> Table of Contents If you're viewing this document online, you can click any of the topics below to link directly to that section. 1. Tutorial tips 2 2. Introducing debugging 4 3. Overview of the basics 6 4. Lessons in client-side debugging 11 5. Lessons in server-side debugging 15 6. Multithread debugging 18 7. Jikes overview 20 </str>
      </arr>
    </doc>
  </result>
</response>
Here the str field contains the first 300 characters of the file, as I set up a field in schema.xml that copies only 300 characters. But I do not want content like this. Is there any way to get output like the following:
<str> Java is one of the best language,java is easy to learn...</str>
where this content is the start of the chapter in which the word java first occurs in the file?
Regards, Satya
Re: Google like search
Hi Tanguy, thanks for your reply. Sorry to ask this kind of question: how can we index each chapter of a file as a separate document? As far as I know, we just give Solr the path of the file to index it. Can you point me to any sources on this, I mean any blogs or wikis? Regards, satya
Re: RAM increase
Hi All, thanks for your reply. I have a doubt: should I increase the RAM or heap size for Java, or for Tomcat, where Solr is running? Regards, satya
Re: solr result....
Hi Lance, I actually copied Tika exceptions into one HTML file and indexed it. It is just the content of a file. Let me explain what I mean: if I post a query like *java*, then the response from Solr should return only a part of the content, as follows:
http://localhost:8456/solr/select/?q=java&version=2.2&start=10&rows=10&indent=on
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">453</int>
  </lst>
  <result name="response" numFound="62" start="10">
    <doc>
      <arr name="content_type">
        <str>application/pdf</str>
      </arr>
      <str name="id">javaebuk</str>
      <date name="last_modified">2001-07-02T11:54:10Z</date>
      <arr name="text">
        <str> A Java program with two main methods The following is an example of a java program with two main methods with different signatures. Program 3 public class TwoMains { /** This class has two main methods with * different signatures */ public static void main (String args[]) . </str>
      </arr>
    </doc>
    .
</response>
The doc in the result should not contain the entire content of a file. It should have only a part of the content, and that part should be the first hit of the word java in that file.
Regards, satya
solr result....
Hi, can a Solr result show only a part of the content of each document returned? For example, if I send a query to search for tika, the result is as follows:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">79</int>
  </lst>
  <result name="response" numFound="62" start="0">
    <doc>
      <arr name="content_type">
        <str>text/html</str>
      </arr>
      <str name="id">1html</str>
      <arr name="text">
        <str> Apache Tomcat/6.0.26 - Error report
HTTP Status 500 - org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@cc9d70
org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@cc9d70
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:214)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)... </str>
      </arr>
    </doc>
The result should not show the entire content of a file. It should show only the part of the content where the query word is present, like the Google results and like the search results at Lucid Imagination.
Regards, satya
RAM increase
Hi all, I increased my RAM to 8GB and I want 4GB of it to be used for Solr itself. Can anyone tell me how to allocate that RAM to Solr? Regards, satya
solr requirements
Hi All, I am planning to have a separate server for Solr, and I have a doubt about what hardware configuration is needed. I know it is hard to say, but I just need a minimum requirement for the following situation: 1) There are 1000 regular users using Solr, and every day each user indexes 10 files of 1KB each, which adds up to 10MB a day, and it goes on like that. 2) How much RAM does Solr use in general? Thanks, satya
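The daily figure above can be checked with a few lines of arithmetic. This counts only the raw size of the submitted files, using decimal units; the size of the resulting index is a different number that depends on the schema and is not estimated here.

```python
USERS = 1000
FILES_PER_USER_PER_DAY = 10
FILE_SIZE_KB = 1

# Raw input volume per day, in KB and in (decimal) MB.
daily_kb = USERS * FILES_PER_USER_PER_DAY * FILE_SIZE_KB
daily_mb = daily_kb / 1000

# The same volume accumulated over one year, in (decimal) GB.
yearly_gb = daily_mb * 365 / 1000

print(daily_mb)   # 10.0
print(yearly_gb)  # 3.65
```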
Re: solr requirements
Hi, here is some more info about it. I use Solr to output only the file names (file ids). Here are the fields in my schema.xml; at present I have only about 40MB of indexed data.
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="sku" type="textTight" indexed="true" stored="false" omitNorms="true"/>
<field name="name" type="textgen" indexed="true" stored="false"/>
<field name="manu" type="textgen" indexed="true" stored="false" omitNorms="true"/>
<field name="cat" type="text_ws" indexed="true" stored="false" multiValued="true" omitNorms="true" />
<field name="features" type="text" indexed="true" stored="false" multiValued="true"/>
<field name="includes" type="text" indexed="true" stored="false" termVectors="true" termPositions="true" termOffsets="true" />
<field name="weight" type="float" indexed="true" stored="false"/>
<field name="price" type="float" indexed="true" stored="false"/>
<field name="popularity" type="int" indexed="true" stored="false" />
<field name="inStock" type="boolean" indexed="true" stored="false" />
<!-- The following store examples are used to demonstrate the various ways one might _CHOOSE_ to implement spatial. It is highly unlikely that you would ever have ALL of these fields defined. -->
<field name="store" type="location" indexed="true" stored="false"/>
<field name="store_lat_lon" type="latLon" indexed="true" stored="false"/>
<field name="store_hash" type="geohash" indexed="true" stored="false"/>
<!-- Common metadata fields, named specifically to match up with SolrCell metadata when parsing rich documents such as Word, PDF. Some fields are multiValued only because Tika currently may return multiple values for them. -->
<field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="subject" type="text" indexed="true" stored="false"/>
<field name="description" type="text" indexed="true" stored="false"/>
<field name="comments" type="text" indexed="true" stored="false"/>
<field name="author" type="textgen" indexed="true" stored="false"/>
<field name="keywords" type="textgen" indexed="true" stored="false"/>
<field name="category" type="textgen" indexed="true" stored="false"/>
<field name="content_type" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="last_modified" type="date" indexed="true" stored="false"/>
<field name="links" type="string" indexed="true" stored="false" multiValued="true"/>
<!-- added here content satya-->
<field name="content" type="spell" indexed="true" stored="false" multiValued="true"/>
<!-- catchall field, containing all other searchable text fields (implemented via copyField further on in this schema -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true" termVectors="true"/>
<!-- catchall text field that indexes tokens both normally and in reverse for efficient leading wildcard queries. here satya-->
<field name="text_rev" type="text_rev" indexed="true" stored="false" multiValued="true"/>
<!-- non-tokenized version of manufacturer to make it easier to sort or group results by manufacturer. copied from manu via copyField here satya-->
<field name="manu_exact" type="string" indexed="true" stored="false"/>
<field name="spell" type="spell" indexed="true" stored="false" multiValued="true"/>
<!-- heere changed -->
<field name="payloads" type="payloads" indexed="true" stored="false"/>
<field name="timestamp" type="date" indexed="true" stored="false" default="NOW" multiValued="false"/>
Regards, satya
ant build problem
Hi all, I updated my Solr trunk to revision 1004527. When I compile the trunk with ant I get many warnings, but the build is successful. The warnings are here:
common.compile-core:
[mkdir] Created dir: /home/satya/temporary/trunk/lucene/build/classes/java
[javac] Compiling 475 source files to /home/satya/temporary/trunk/lucene/build/classes/java
[javac] warning: [path] bad path element /usr/share/ant/lib/hamcrest-core.jar: no such file or directory
[javac] /home/satya/temporary/trunk/lucene/src/java/org/apache/lucene/queryParser/QueryParserTokenManager.java:455: warning: [cast] redundant cast to int
[javac] int hiByte = (int)(curChar >> 8);
[javac] ^
[javac] /home/satya/temporary/trunk/lucene/src/java/org/apache/lucene/queryParser/QueryParserTokenManager.java:705: warning: [cast] redundant cast to int
[javac] int hiByte = (int)(curChar >> 8);
[javac] ^
[javac] /home/satya/temporary/trunk/lucene/src/java/org/apache/lucene/queryParser/QueryParserTokenManager.java:812: warning: [cast] redundant cast to int
[javac] int hiByte = (int)(curChar >> 8);
[javac] ^
[javac] /home/satya/temporary/trunk/lucene/src/java/org/apache/lucene/queryParser/QueryParserTokenManager.java:983: warning: [cast] redundant cast to int
[javac] int hiByte = (int)(curChar >> 8);
[javac] ^
[javac] /home/satya/temporary/trunk/lucene/src/java/org/apache/lucene/search/FieldCacheImpl.java:209: warning: [unchecked] unchecked cast
[javac] found : java.lang.Object
[javac] required: T
[javac] key.creator.validate( (T)value, reader);
[javac] ^
[javac] /home/satya/temporary/trunk/lucene/src/java/org/apache/lucene/search/FieldCacheImpl.java:278: warning: [unchecked] unchecked call to Entry(java.lang.String,org.apache.lucene.search.cache.EntryCreator<T>) as a member of the raw type org.apache.lucene.search.FieldCacheImpl.Entry
[javac] return (ByteValues)caches.get(Byte.TYPE).get(reader, new Entry(field, creator));
ptionList.addAll(exceptions); ||
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files additionally use unchecked or unsafe operations.
[javac] 100 warnings
BUILD SUCCESSFUL
Total time: 19 seconds
Here I have included only the first part of the warnings. After compiling I ran ant test to check, but it failed. I did not find any hamcrest-core.jar in my Ant library. I use Ant 1.7.1.
Regards, satya
ant package
Hi all, I want to build a package of my Solr, and I found it can be done using ant. When I type ant package in the solr module I get this error:
sa...@swaroop:~/temporary/trunk/solr$ ant package
Buildfile: build.xml
maven.ant.tasks-check:
BUILD FAILED
/home/satya/temporary/trunk/solr/common-build.xml:522: ## Maven ant tasks not found. Please make sure the maven-ant-tasks jar is in ANT_HOME/lib, or made available to Ant using other mechanisms like -lib or CLASSPATH. ##
Total time: 0 seconds
Can anyone tell me the procedure to build it, or give any information about it?
Regards, satya
Re: ant package
Hi, yes, I do not have the jar file in ant/lib. Where can I get the jar file, or what is the procedure to make maven-artifact-ant-2.0.4-dep.jar? Regards, satya
Re: ant package
Hi Erick, thanks for the reply. I downloaded the jar file and put it in the Ant library. Now when I run the ant package command, it fails in the middle of the build, in generate-maven-artifacts. The error is:
sa...@geodesic-desktop:~/temporary/trunk/solr$ sudo ant package
--- --- ---
generate-maven-artifacts:
[mkdir] Created dir: /home/satya/temporary/trunk/solr/build/maven
[mkdir] Created dir: /home/satya/temporary/trunk/solr/dist/maven
[copy] Copying 1 file to /home/satya/temporary/trunk/solr/build/maven/src/maven
[artifact:install-provider] Installing provider: org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-2
BUILD FAILED
/home/satya/temporary/trunk/solr/build.xml:853: The following error occurred while executing this line:
/home/satya/temporary/trunk/solr/common-build.xml:373: artifact:deploy doesn't support the uniqueVersion attribute
Total time: 1 minute 51 seconds
sa...@desktop:~/temporary/trunk/solr$
Regards, satya
SolrCloud new....
Hi all, I have 4 instances of Solr on 4 systems; each system has a single instance of Solr. I want results from all these servers. I came to know about SolrCloud. I read about it, worked through the example, and it worked as described in the wiki. I am using Solr 1.4 and Apache Tomcat. What procedure should be followed to implement cloud in the Solr trunk? 1) Should I copy the libraries from cloud to trunk? 2) Should I keep the cloud module on every system? 3) I am not using any cores in Solr; it is a single Solr on every system. Can SolrCloud support that? 4) The example is given with Jetty. Is it done the same way in Tomcat? Regards, satya
cloud or zookeeper
Hi All, what is the difference between using shards, SolrCloud and ZooKeeper? Which is the best way to scale Solr? I need to reduce the index size on every system and reduce the search time for a query. Regards, satya
Re: stream.url
Hi Hoss, thanks for the reply; it is working now. The reason was, as you said, that I was not double escaping. I used %2520 for whitespace and it works. Thanks, satya
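The double escaping mentioned here can be reproduced with a short sketch. The file name is the one from this thread; the explanation in the comments (one decoding on the way into Solr, a second one on the remote server) is my reading of why %2520 was needed and should be taken as an assumption.

```python
from urllib.parse import quote, unquote

filename = "solr & apache.pdf"

# First pass: what the remote server should finally receive.
once = quote(filename, safe="")
# Second pass: what goes inside stream.url, so that one decoding on the
# way into Solr still leaves an encoded name for the remote request.
twice = quote(once, safe="")

print(once)   # solr%20%26%20apache.pdf
print(twice)  # solr%2520%2526%2520apache.pdf
```

Decoding twice with unquote returns the original file name, which is a quick way to check a hand-built URL.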
Re: stream.url
Hi all, I am unable to index files on a remote system whose file names contain special characters that need escaping. I think there is a problem in Solr with indexing such files on a remote system. Has anybody tried to index remote files whose names contain such characters? Solr works fine for files that have no such characters in their names. I sent the request through curl with the file name URL-encoded, but the problem is the same. Regards, satya
stream.url
Hi all, I am using stream.url to index files on a remote system. When I use the URL
1) curl http://localhost:8080/solr/update/extract?stream.url=http://remotehost:port/file_download.yaws?file=yaws_presentation.pdf&literal.id=schb4
it works, and I get a response saying the file was indexed. But when I use
2) curl http://localhost:8080/solr/update/extract?stream.url=http://remotehost:port/file_download.yaws?file=solr & apache.pdf&literal.id=schb5
I get an error in Solr. I replaced the special characters with %20 for space and %26 for &, but the error is the same: Unexpected end of file from server, java.net.SocketException. When I request http://remotehost:port/file_download.yaws?file=solr & apache.pdf directly, without Solr, the file is downloaded to my system. Here is the entire error:
HTTP Status 500 - Unexpected end of file from server
java.net.SocketException: Unexpected end of file from server
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1368)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1362)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1016)
at org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:88)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:169)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:57)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:133)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1355)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:340)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.SocketException: Unexpected end of file from server
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:766)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2173)
at java.net.URLConnection.getContentType(URLConnection.java:485)
at org.apache.solr.common.util.ContentStreamBase$URLStream.<init>(ContentStreamBase.java:81)
at org.apache.solr.servlet.SolrRequestParsers.buildRequestFrom(SolrRequestParsers.java:138)
at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:117)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:226)
... 12 more
Can anybody provide information regarding this?
Regards, Satya
Re: stream.url
Hi Stefan, I used escape characters and tried it. The problem is not limited to the single file 'solr & apache'; the same problem shows up for files like Wireless lan.ppt and Tom info.pdf. The curl command I sent is: curl http://localhost:8080/solr/update/extract?stream.url=http://remotehost:port/file_download.yaws%3Ffile=solr%20%26%20apache.pdf&literal.id=schb5 Regards, satya
Re: stream.url
Hi, I sent the curl request from the shell (command prompt or terminal) with the characters escaped, but the error is the same. When I checked on the remote system, the request is not getting there. Is there anything to change in the config file in order to enable escaped characters for stream.url? Did anybody try indexing files on a remote system through stream.url where the file names contain characters like & or space? Regards, satya
solr working...
Hi all, I am interested in seeing how Solr works. 1) Can anyone tell me where to start in order to understand its workings? Regards, satya
Re: solr working...
Hi Peter, I am already working with Solr and it is working well. But I want to understand the code: where the actual work happens, how indexing is done, how requests are parsed, how responses are produced, and so on. What I asked was where to start in order to understand the code. Regards, satya
Re: solr working...
Hi all, Thanks for your responses and information. I used the slf4j logger and put a log.info call in every class of the solr module to see which classes get invoked for a particular request handler or at Solr startup. I was able to do this only in the solr module, not in the lucene module; I get an error when I use it in that module. Can anyone suggest other ways like this to trace the path through Solr? Regards, satya
reduce the content???
Hi all, I indexed nearly 100 Java PDF files, each of which is large (at least 1 MB). Solr returns the results with the entire indexed content, which makes the results slow to display. Can we reduce the content it returns, or can I just get the file names and ids instead of the entire content in the results? Regards, satya
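One common way to keep a large content field out of the response is the fl (field list) request parameter, which limits the stored fields returned for each hit. A minimal sketch that builds such a query URL. The id field comes from the literal.id usage elsewhere in this thread; the /select path, the localhost:8080 address, the query term and the helper name build_select_url are assumptions about this setup.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(solr_base, query, fields, rows=10):
    """Build a search URL that asks Solr to return only the listed fields."""
    params = {
        "q": query,
        "fl": ",".join(fields),  # field list: only these stored fields come back
        "rows": rows,
    }
    return f"{solr_base}/select?{urlencode(params)}"


url = build_select_url("http://localhost:8080/solr", "java", ["id"])
print(url)
```

Another option, if the full text never needs to be displayed, is to mark the content field as stored="false" in schema.xml so that it is indexed for searching but not returned at all.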
Re: stream.url problem
Hi all, I found the solution to my problem: I had changed my port number but kept the old one in the stream.url. Thanks, all. Now I have another problem. It occurs when I send requests to the remote system for files whose names contain characters that need escaping, such as & or a space, for example Tom&Jerry.pdf. I get the error Unexpected end of file from server. The request I sent is: curl http://localhost:8080/solr/update/extract?stream.url=http://remotehost:8011/file_download.yaws?file=Wireless%20Lan.pdf&literal.id=su8 Here file_download.yaws is a module that fetches the file and hands it to Solr. Solr is able to index files on the remote system whose names do not contain such characters, for example apache.txt and solr_apache.pdf. The error I got is: HTTP Status 500 - Unexpected end of file from server java.net.SocketException: Unexpected end of file from server at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1368) at java.security.AccessController.doPrivileged(Native Method) at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1362) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1016) at org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:88) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:161) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:57) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:133) at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1355) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:340) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:619) Caused by: java.net.SocketException: Unexpected end of file from server at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632) at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:766) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072) at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2173) at java.net.URLConnection.getContentType(URLConnection.java:485) at org.apache.solr.common.util.ContentStreamBase$URLStream.init(ContentStreamBase.java:81) at 
org.apache.solr.servlet.SolrRequestParsers.buildRequestFrom(SolrRequestParsers.java:138) at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:117) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:226) ... Regards, satya
/update/extract
Hi all, when a request goes to the extract request handler, which classes get invoked? I need to know the path through the classes when we send a file to Solr. Can anybody tell me the classes, or point me to any sources where I can find the answer? Or can anyone tell me which classes get invoked when Solr starts? I would be thankful for any help with this. Regards, satya
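The stack traces posted elsewhere in this thread already show the chain of classes that an /update/extract request passes through. A small sketch that pulls the Solr frames out of such a trace and lists them in call order. The sample trace is abridged from those messages, and solr_call_chain is a hypothetical helper name.

```python
import re

# Abridged from a stack trace posted in this thread (innermost frame first).
TRACE = """
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:161)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
"""

# Matches "at <fully.qualified.Class>.<method>(" for classes under org.apache.solr.
FRAME = re.compile(r"at (org\.apache\.solr\.[\w.$]+)\.(\w+)\(")


def solr_call_chain(trace):
    """Return 'Class.method' for each Solr frame, outermost caller first."""
    frames = [
        f"{cls.rsplit('.', 1)[-1]}.{method}"
        for cls, method in FRAME.findall(trace)
    ]
    # Stack traces list the innermost frame first, so reverse for call order.
    return list(reversed(frames))


chain = solr_call_chain(TRACE)
for step in chain:
    print(step)
```

Reading the source of these classes in this order, starting from SolrDispatchFilter.doFilter, is one reasonable way into the request path.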
solr working...
Hi all, I am very interested in how Solr works. Can anyone tell me which modules or classes get invoked when we start a servlet container like Tomcat, when we send requests to Solr (such as sending PDF files), or at Solr startup? Regards, satya
stream.url problem
Hi all, I am indexing documents that are on my own system into Solr. Now I need to index files that are on a remote system. I enabled remote streaming in solrconfig.xml, but when I use stream.url I get a connection refused error. The details: when I send this request from my browser: http://localhost:8080/solr/update/extract?stream.url=http://remotehost/home/san/Desktop/programming_erlang_armstrong.pdf&literal.id=schb2 I get the error: HTTP Status 500 - Connection refused java.net.ConnectException: Connection refused at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1368) at java.security.AccessController.doPrivileged(Native Method) at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1362) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1016) at org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:88) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:161) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:619) Caused by: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333) at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) at java.net.Socket.connect(Socket.java:525) at java.net.Socket.connect(Socket.java:475) at sun.net.NetworkClient.doConnect(NetworkClient.java:163) at sun.net.www.http.HttpClient.openServer(HttpClient.java:394) at sun.net.www.http.HttpClient.openServer(HttpClient.java:529) at sun.net.www.http.HttpClient.init(HttpClient.java:233) at sun.net.www.http.HttpClient.New(HttpClient.java:306) at sun.net.www.http.HttpClient.New(HttpClient.java:323) at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:860) at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:801) at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726) at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1049) at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2173) at java.net.URLConnection.getContentType(URLConnection.java:485) at org.apache.solr.common.util.ContentStreamBase$URLStream.init(ContentStreamBase.java:81) at org.apache.solr.servlet.SolrRequestParsers.buildRequestFrom(SolrRequestParsers.java:136) at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:116) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225) ... If anybody knows the cause, please help me with this. Regards, satya
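Connection refused generally means that nothing accepted the TCP connection at the host and port named in the URL. The stream.url in this message names no port, so an HTTP URL falls back to port 80, and an HTTP server would have to be listening there on remotehost and serving that path; a plain filesystem path on the remote machine is not enough. A minimal sketch, with hypothetical helper names remote_endpoint and can_connect, for checking this before handing the URL to Solr:

```python
import socket
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def remote_endpoint(url):
    """Return the (host, port) that a client would connect to for this URL."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or DEFAULT_PORTS[parts.scheme]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, DNS failure and timeout.
        return False


host, port = remote_endpoint(
    "http://remotehost/home/san/Desktop/programming_erlang_armstrong.pdf"
)
print(host, port)
# On the Solr machine one would then call: can_connect(host, port)
```

If can_connect returns False from the machine running Solr, the problem lies with the remote web server or the network rather than with Solr's configuration.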
Re: indexing???
Hi, 1) I use Tika 0.8. 2) The URL is https://issues.apache.org/jira/browse/PDFBOX-709 and the file is samplerequestform.pdf. 3) The entire error is: curl http://localhost:8080/solr/update/extract?stream.file=/home/satya/my_workings/satya_ebooks/8-Linux/samplerequestform.pdf&literal.id=linuxc Apache Tomcat/6.0.26 - Error report HTTP Status 500 - org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@1d688e2 org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@1d688e2 at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:214) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:619) Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@1d688e2 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:144) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193) ... 
18 more Caused by: java.lang.ClassCastException: org.apache.pdfbox.pdmodel.font.PDFontDescriptorAFM cannot be cast to org.apache.pdfbox.pdmodel.font.PDFontDescriptorDictionary at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:167) at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:117) at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:140) at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:76) at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115) at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:225) at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:207) at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367) at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291) at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247) at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180) at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56) at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:79) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142) ... 21 more
Re: indexing???
Hi all, the error I got is Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@8210fc when I indexed a file similar to the one at https://issues.apache.org/jira/browse/PDFBOX-709/samplerequestform.pdf. Can't we index files of that type in Solr? Regards, satya
indexing???
Hi all, The indexing part of Solr is going well, but I got an error when indexing one particular PDF file. When I searched the mailing list for the error, I found that it was attributed to the copyright protection on that file. Can't we index a file that has copyright or other digital rights restrictions? Regards, satya
spell checking problem
Hi all, I need some help with spellchecking. I configured my solrconfig.xml and schema.xml by following the user mailing list, and here is the configuration I made. My schema.xml: <fieldType name="spellText" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <filter class="solr.StandardFilterFactory"/> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <filter class="solr.StandardFilterFactory"/> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> </fieldType> <field name="spell" type="spellText" indexed="true" stored="true" multiValued="true"/> <copyField source="*" dest="spell"/> My solrconfig.xml: <requestHandler name="spellchecker" class="solr.SearchHandler" startup="lazy"> <lst name="defaults"> <str name="spellcheck.dictionary">default</str> <str name="spellcheck.onlyMorePopular">false</str> <str name="spellcheck.extendedResults">false</str> <str name="spellcheck.count">5</str> </lst> <arr name="last-components"> <str>spellcheck</str> </arr> </requestHandler> <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <str name="queryAnalyzerFieldType">spellText</str> <lst name="spellchecker"> <str name="name">default</str> <str name="field">name</str> <!-- the default field in solrconfig; if I change it to the spell field, the dictionary is not created --> <str name="spellcheckIndexDir">./spell</str> <str name="buildOnCommit">true</str> </lst> <!-- a spellchecker that uses a different distance measure --> <lst name="spellchecker"> <str name="name">jarowinkler</str> <str name="field">spell</str> <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str> <str name="spellcheckIndexDir">./spellcheckerjaro</str> </lst> </searchComponent> 1) The problem here is that for the default dictionary the index is created, but if I type jawa the suggestions it gives are data and sata, whereas the expected suggestion is java. I have nearly 20 Java docs indexed. 2) Another problem: if I build the jarowinkler dictionary, which uses the spell field, the dictionary is not created and I only see segments.gen and segments_1 in its directory. Regards, satya
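As a rough check on the first problem, assuming candidates are ranked by an edit-distance measure such as Levenshtein: java is closer to jawa than either data or sata, so one plausible reading is that java is simply not present in the dictionary built from the name field. A short self-contained sketch of the distance calculation follows. It is plain Python written for illustration, not Solr's or Lucene's implementation.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions
    needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute (or keep) the character
                )
            )
        previous = current
    return previous[-1]


for candidate in ("java", "data", "sata"):
    print(candidate, levenshtein("jawa", candidate))
```

If java really is missing from the dictionary, checking which terms the name field actually contains would be a sensible next step.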
spell checking....
Hi all, I am new to Solr and was able to get document indexing working by following the Solr wiki. Now I am trying to add spellchecking. I followed the SpellCheckComponent page in the wiki but I am not getting any suggested spellings. I first built the dictionary with spellcheck.build=true. Here is an example: http://localhost:8080/solr/spell?q=javs&spellcheck=true&spellcheck.collate=true <response> ... </result> <lst name="spellcheck"> <lst name="suggestions"/> </lst> </response> Here the response should suggest java, but it did not. Can anyone guide me on this? I am using Solr 1.4 with Tomcat on Ubuntu. Regards, swarup
Re: spell checking....
This is in solrconfig.xml: <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <lst name="spellchecker"> <str name="name">default</str> <str name="classname">solr.IndexBasedSpellChecker</str> <str name="field">spell</str> <str name="spellcheckIndexDir">./spellchecker</str> <str name="accuracy">0.7</str> <str name="buildOnCommit">true</str> <str name="buildOnOptimize">true</str> </lst> <lst name="spellchecker"> <str name="name">jarowinkler</str> <str name="field">lowerfilt</str> <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str> <str name="spellcheckIndexDir">./spellchecker</str> <str name="buildOnCommit">true</str> <str name="buildOnOptimize">true</str> </lst> <str name="queryAnalyzerFieldType">textSpell</str> </searchComponent> <!-- The SpellingQueryConverter to convert raw (CommonParams.Q) queries into tokens. Uses a simple regular expression to strip off field markup, boosts, ranges, etc. but it is not guaranteed to match an exact parse from the query parser. Optional, defaults to solr.SpellingQueryConverter --> <queryConverter name="queryConverter" class="org.apache.solr.spelling.SpellingQueryConverter"/> I added the following to the standard request handler: <requestHandler name="standard" class="solr.SearchHandler" default="true"> <!-- default values for query parameters --> <lst name="defaults"> <str name="echoParams">explicit</str> <!-- <int name="rows">10</int> <str name="fl">*</str> <str name="version">2.1</str> <!-- Optional, must match spell checker's name as defined above, defaults to "default" --> <str name="spellcheck.dictionary">default</str> <!-- omp = Only More Popular --> <str name="spellcheck.onlyMorePopular">false</str> <!-- exr = Extended Results --> <str name="spellcheck.extendedResults">false</str> <!-- The number of suggestions to return --> <str name="spellcheck.count">1</str> </lst> <arr name="last-components"> <str>spellcheck</str> </arr> </requestHandler>
Re: problem with storing??
Hi all, Solr is working well now. I am working on Ubuntu and I was indexing documents that lacked the necessary permissions, so that was the problem. I thank all of you for your replies to my queries. Thanking you, satya
Re: no response
Hi, I am sorry, the mail you sent was in my sent mail folder and I did not look at it. I am going to check now. I will definitely tell you the entire thing. Regards, satya
Re: problem with storing??
Hi, I checked the admin page and indexing works for other documents. In the log files I don't see anything when I send the documents; I checked the Catalina (Tomcat) log. I changed the dismax handler from q=*:* to q= . I at least get a response when I send PDF/HTML files, but not even that for the .doc files. Regards, swaroop
problem with storing??
Hi all, I am new to Solr; I followed the wiki and got everything going. But when I send any HTML/TXT/PDF documents, the response is as follows: <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">576</int></lst> </response> But when I search in Solr I don't find the document. Can anyone tell me what needs to be done? The curl I used for the above output is: curl 'http://localhost:8080/solr/update/extract?literal.id=doc1000&commit=true&fmap.content=text' -F myfi...@java.pdf Regards, satya
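Documents added through /update/extract only become searchable after a commit. The curl in this message passes commit=true, so a commit should already have happened; to rule that step out, an explicit commit can be posted to /update and the document then searched for by its id. A minimal sketch that prepares (but does not send) such a commit request. The localhost:8080 address is the one used in the message, and build_commit_request is a hypothetical helper.

```python
from urllib.request import Request


def build_commit_request(solr_base):
    """Prepare an HTTP POST that asks Solr to commit pending documents."""
    return Request(
        f"{solr_base}/update",
        data=b"<commit/>",  # XML update message requesting a commit
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


req = build_commit_request("http://localhost:8080/solr")
print(req.get_method(), req.full_url)
# To actually send it against a running Solr instance:
#   from urllib.request import urlopen
#   print(urlopen(req).read().decode("utf-8"))
```

A status of 0 in the response header only says the request was accepted, so searching for id:doc1000 afterwards is the more direct check that the document is in the index.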
Re: problem with storing??
Hi, I sent the commit after adding the documents, but the problem is the same. Regards, satya
no response
Hi all, I have a problem with Solr. When I send .doc documents I am not getting any response. Example: sa...@geodesic-desktop:~/Desktop$ curl http://localhost:8080/solr/update/extract?stream.file=/home/satya/Desktop/InvestmentDecleration.doc&stream.contentType=application/msword; literal.id=Invest.doc sa...@geodesic-desktop:~/Desktop$ Could anybody tell me what to do?
Re: indexing rich documents
Yes, I checked the extracting request handler but couldn't find the information. I installed tika-0.7 and copied the jar files into the Solr home library. When I started sending PDF/HTML files I got a lazy loading error. I am using Tomcat and Solr 1.4.
Re: indexing with pdf files problem
hi, I installed tika and made its jar files into solr home library and also gave the path to the tika configuration file. But the error is same. the tika config file is as follows::: ?xml version=1.0 encoding=UTF-8? properties mimeTypeRepository resource=/opt/tika-0.7/tika-core/target/classes/org/apache/tika/mime/tika-mimetypes.xml magic=false/ parsers parser name=text-xml class=org.apache.tika.parser.xml.XMLParser namespacehttp://purl.org/dc/elements/1.1//namespace mimeapplication/xml/mime extract content name=title xpathSelect=//dc:title/ content name=subject xpathSelect=//dc:subject/ content name=creator xpathSelect=//dc:creator/ content name=description xpathSelect=//dc:description/ content name=publisher xpathSelect=//dc:publisher/ content name=contributor xpathSelect=//dc:contributor/ content name=type xpathSelect=//dc:type/ content name=format xpathSelect=//dc:format/ content name=identifier xpathSelect=//dc:identifier/ content name=language xpathSelect=//dc:language/ content name=rights xpathSelect=//dc:rights/ content name=outLinks regexSelect ![CDATA[ ([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9/](([A-Za-z0-9$_.+!*,;/?:@~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:@~=%-]{0,1000}))?) ]] /regexSelect /content /extract /parser parser name=parse-msword class=org.apache.tika.parser.msword.MsWordParser mimeapplication/msword/mime extract content name=fullText textSelect=fullText/ content name=outLinks regexSelect ![CDATA[ ([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9/](([A-Za-z0-9$_.+!*,;/?:@~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:@~=%-]{0,1000}))?) 
]] /regexSelect /content /extract /parser parser name=parse-msexcel class=org.apache.tika.parser.msexcel.MsExcelParser mimeapplication/vnd.ms-excel/mime extract content name=fullText textSelect=fullText/ content name=outLinks regexSelect ![CDATA[ ([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9/](([A-Za-z0-9$_.+!*,;/?:@~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:@~=%-]{0,1000}))?) ]] /regexSelect /content /extract /parser parser name=parse-mspowerpoint class=org.apache.tika.parser.mspowerpoint.MsPowerPointParser mimeapplication/vnd.ms-powerpoint/mime extract content name=fullText textSelect=fullText/ content name=title textSelect=title/ content name=author textSelect=author/ content name=subject textSelect=subject/ content name=outLinks regexSelect ![CDATA[ ([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9/](([A-Za-z0-9$_.+!*,;/?:@~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:@~=%-]{0,1000}))?) ]] /regexSelect /content /extract /parser parser name=parse-html class=org.apache.tika.parser.html.HtmlParser mimetext/html/mime mimeapplication/x-asp/mime extract content name=fullText textSelect=fullText/ content name=title textSelect=title/ content name=outLinks regexSelect ![CDATA[ ([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9/](([A-Za-z0-9$_.+!*,;/?:@~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:@~=%-]{0,1000}))?) ]] /regexSelect /content /extract /parser !-- parser name=parse-html class=org.apache.tika.parser.html.NekoHtmlParser mimetext/html/mime mimeapplication/x-asp/mime . extract content name=fullText xpathSelect=//*/ content name=title xpathSelect=//title/ content name=outLinks regexSelect ![CDATA[ ([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9/](([A-Za-z0-9$_.+!*,;/?:@~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:@~=%-]{0,1000}))?) 
]] /regexSelect /content /extract /parser -- parser mame=parse-rtf class=org.apache.tika.parser.rtf.RTFParser mimeapplication/rtf/mime extract content name=fullText textSelect=fullText/ content name=outLinks regexSelect ![CDATA[ ([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9/](([A-Za-z0-9$_.+!*,;/?:@~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:@~=%-]{0,1000}))?) ]] /regexSelect /content /extract /parser parser name=parse-pdf class=org.apache.tika.parser.pdf.PDFParser mimeapplication/pdf/mime extract content name=fullText textSelect=fullText/ content name=title textSelect=title/ content name=author textSelect=author/ content name=creator textSelect=creator/ content name=summary textSelect=summary/ content name=keywords textSelect=keywords/ content name=producer textSelect=producer/ content name=subject textSelect=subject/ content name=trapped textSelect=trapped/ content name=creationDate textSelect=creationDate/
indexing rich documents
Hi all, I am new to Solr; I followed the wiki and got the Solr admin running successfully. It works well for XML files, but I am unable to index rich documents. I followed the wiki for rich documents as well, but couldn't get it to work. The error I get when I send a PDF/HTML file is a lazy loading error. Can anyone give a detailed description of how to make rich documents indexable? I use Tomcat and work on Ubuntu. The home directory for Solr is /opt/solr/example and the Catalina home is /opt/tomcat6. Thanks, regards, swaroop
Re: indexing rich documents
Hi, yes, I followed the wiki. Can you now tell me the procedure for it? Regards, swaroop
indexing with pdf files problem
Hi all, I am working with Solr on Tomcat. Indexing works for XML files, but when I send .doc, HTML or PDF files through curl I get a lazy loading error. Can you tell me how to fix this? I am working on Ubuntu; Solr home is /opt/example and Tomcat is in /opt/tomcat6. The output when I send a PDF file is as follows: Apache Tomcat/6.0.26 - Error report HTTP Status 500 - lazy loading error org.apache.solr.common.SolrException: lazy loading error at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:619) Caused by: org.apache.solr.common.SolrException: java.lang.NullPointerException at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:76) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244) ... 16 more Caused by: java.lang.NullPointerException at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:73) at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90) at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:99) at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:84) at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:61) at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:74) ... 
17 more /h1HR size=1 noshade=noshadepbtype/b Status report/ppbmessage/b ulazy loading error org.apache.solr.common.SolrException: lazy loading error at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at