Problem with SynonymFilter and StopFilterFactory
Hi, I have encountered a problem applying StopFilterFactory and SynonymFilterFactory: the SynonymFilter removes the position gaps that were previously introduced by the StopFilterFactory. I'm applying the filters at query time, because users need to change the synonym lists frequently. This is my schema, and an example of the issue:

String: documentacion para agentes

org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_35}
position     1              2     3
term text    documentación  para  agentes
startOffset  0              14    19
endOffset    13             18    26

org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_35}
position     1              2     3
term text    documentación  para  agentes
startOffset  0              14    19
endOffset    13             18    26

org.apache.solr.analysis.StopFilterFactory {words=stopwords_intranet.txt, ignoreCase=true, enablePositionIncrements=true, luceneMatchVersion=LUCENE_35}
position     1              3
term text    documentación  agentes
startOffset  0              19
endOffset    13             26

org.apache.solr.analysis.SynonymFilterFactory {synonyms=sinonimos_intranet.txt, expand=true, ignoreCase=true, luceneMatchVersion=LUCENE_35}
position     1              2
term text    documentación  agente  archivo  agentes
type         SYNONYM  SYNONYM  SYNONYM  SYNONYM
startOffset  0  19  0  19
endOffset    13  26  13  26

As you can see, the positions should be 1 and 3, but the SynonymFilter removes the gap and moves the token from position 3 to position 2. I get the same problem with Solr 3.5 and 4.0. I don't know if it's a bug or an error in my configuration. In other schemas I have worked with, I always put the SynonymFilter before the StopFilter, but here I preferred this order because of the large number of synonyms in the list (i.e. I don't want to generate a lot of synonyms for a word that I actually want to remove).

Thanks,

David Dávila Atienza
AEAT - Departamento de Informática Tributaria
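For reference, the query-time analyzer chain described above corresponds roughly to the following schema.xml field type. The field type name is made up here, and the filter attributes are simply taken from the analysis output, so treat this as an illustration rather than David's actual schema:

<fieldType name="text_intranet" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_intranet.txt" ignoreCase="true" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="sinonimos_intranet.txt" expand="true" ignoreCase="true"/>
  </analyzer>
</fieldType>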
Re: how to make sure all the index docs flushed to the index files
Hi

Another weird problem. When we set up the autocommit properties, we expected that a new index file would be created on every commit, so that the index files would be large enough. We do not want to keep too many small files, as in [1]. How can we control the size of the index files?

[1] ...omitted
548KB index/_28w_Lucene41_0.doc
289KB index/_28w_Lucene41_0.pos
1.1M  index/_28w_Lucene41_0.tim
24K   index/_28w_Lucene41_0.tip
2.1M  index/_28w.fdt
766B  index/_28w.fdx
5KB   index/_28w.fnm
40K   index/_28w.nvd
79K   index/_28w.nvm
364B  index/_28w.si
518KB index/_28x_Lucene41_0.doc
290KB index/_28x_Lucene41_0.pos
1.2M  index/_28x_Lucene41_0.tim
28K   index/_28x_Lucene41_0.tip
2.1M  index/_28x.fdt
843B  index/_28x.fdx
5KB   index/_28x.fnm
40K   index/_28x.nvd
79K   index/_28x.nvm
386B  index/_28x.si
...omitted

2013/9/17 YouPeng Yang yypvsxf19870...@gmail.com

Hi Shawn

Thank you very much for your response. I launch the full-import task on the web page of solr/admin, and I do check the commit option, so the new docs are committed after the operation. The commit option is different from autocommit, right? If the import datasets are too large, that leads to poor performance or other problems, such as [1]. The exceptions indicating "Too many open files" are, we think, because of the ulimit.

[1]
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149d.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149e.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149f.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149g.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149h.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149i.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149j.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149k.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149l.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149m.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149n.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149o.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149p.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149q.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149r.fdx (Too many open files)
java.io.FileNotFoundException: /data/apache-tomcat/webapps/solr/collection1/data/index/_149s.fdx (Too many open files)

2013/9/17 Shawn Heisey s...@elyograg.org

On 9/16/2013 8:26 PM, YouPeng Yang wrote:
I'm using the DIH to import data from an Oracle database with Solr 4.4. Finally I get 2.7GB of index data and 4.1GB of tlog data, and the number of docs was 1090. At first, I moved the 2.7GB index data to another new Solr server in Tomcat 7.
After I start Tomcat, I find the total number of docs is just half of the original number. So I thought that maybe the remaining docs were not committed to the index files, and the tlog needed to be replayed.

You need to turn on autoCommit in your solrconfig.xml so that there are hard commits happening on a regular basis that flush all indexed data to disk and start new transaction log files. I will give you a link with some information about that below.

Subsequently, I moved the 2.7GB index data and the 4.1GB tlog data to the new Solr server in Tomcat 7. After I start Tomcat, an exception comes up as in [1]. Then it halts; I cannot access the Tomcat server URL. I noticed that the CPU utilization was high, using the command: top -d 1 | grep tomcatPid. I thought Solr was replaying the update log, and I waited a long time and it was still replaying. As a result, I gave up.

I don't know what the exception was about, but it is likely that it WAS replaying the log. With 4.1GB
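A minimal solrconfig.xml sketch of the kind of autoCommit setting Shawn is describing here; the 15-second interval and openSearcher=false are illustrative values, not settings taken from this thread:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- periodic hard commit: flushes indexed data to disk and starts a new tlog,
       without opening a new searcher -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>

With hard commits happening regularly, the transaction logs stay small and an index copied to another server should already contain all committed documents.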
few and huge tlogs
Hi

According to http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup, the tlog file should switch to a new one when a hard commit happens. However, my tlogs show something different:

tlog.003 5.16GB
tlog.004 1.56GB
tlog.002 610MB

There are only a few tlogs, where I would expect about ten files, and each one is very huge, even though lots of hard commits have happened. So why does the number of tlog files not increase?

Here are the settings of the DirectUpdateHandler2:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <autoCommit>
    <maxTime>120</maxTime>
    <maxDocs>100</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>60</maxTime>
    <maxDocs>50</maxDocs>
  </autoSoftCommit>
</updateHandler>
Re: how to make sure all the index docs flushed to the index files
On 9/17/2013 12:32 AM, YouPeng Yang wrote:
Hi
Another weird problem. When we set up the autocommit properties, we expected that a new index file would be created on every commit, so that the index files would be large enough. We do not want to keep too many small files, as in [1]. How can we control the size of the index files?

An index segment gets created after every hard commit. In the listing that you sent, all the files starting with _28w are a single segment. All the files starting with _28x are another segment.

Solr should be merging the segments when you get enough of them, unless you have incorrectly set up your merge policy. The default number of segments that get merged is ten. When you get ten segments, they will be merged down to one. This repeats until you have ten merged segments. At that point, those ten merged segments will be merged to make an even larger segment.

You can bump up the number of open files allowed by your operating system. On Linux, this is controlled by the /etc/security/limits.conf file. Here are some example config lines for that file:

elyograg hard nofile 6144
elyograg soft nofile 4096
root     hard nofile 6144
root     soft nofile 4096

Alternatively, you can reduce the required number of files if you turn on the useCompoundFile setting, which is in the indexConfig section. This causes Solr to create a single file per index segment instead of several files per segment. The compound file may be slightly less efficient, but the difference is likely to be very small.

https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
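For illustration, here is a solrconfig.xml sketch of the two knobs Shawn mentions: the merge policy and the compound-file setting. This is a generic Solr 4.x example rather than anything from this thread; the merge values shown are simply the defaults of ten that he describes, and useCompoundFile is set to true only to show how it is turned on:

<indexConfig>
  <!-- write each segment as a single compound file to reduce open file handles -->
  <useCompoundFile>true</useCompoundFile>
  <!-- default tiered merging: roughly ten segments are merged at a time -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>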
Problem indexing windows files
Hi,

I am trying to index my Windows PC files with ManifoldCF version 1.3 and Solr version 4.4. I created an output connection and a repository connection and started a new job that scans my E drive. Everything seems to work OK, but after a few minutes Solr stops getting new files to index; I can see that through the Tomcat log file. In the ManifoldCF crawler UI I see that the job is still running, but after a few minutes I get the following error:

Error: Repeated service interruptions - failure processing document: Read timed out

I can see that the Tomcat process constantly consumes 100% of one CPU (I have two CPUs), even after I get the error message from the ManifoldCF crawler UI. I checked the thread dump in the Solr admin and saw that the following thread takes the most cpu/user time:

http-8080-3 (32)
- java.io.FileInputStream.readBytes(Native Method)
- java.io.FileInputStream.read(FileInputStream.java:236)
- java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
- java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
- java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
- java.io.FilterInputStream.read(FilterInputStream.java:133)
- org.apache.tika.io.TailStream.read(TailStream.java:117)
- org.apache.tika.io.TailStream.skip(TailStream.java:140)
- org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
- org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
- org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
- org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
- org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
- org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
- org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
- org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
- org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
- org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
- org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
- org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
- org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
- org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
- org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
- org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
- org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
- org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
- org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
- org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
- org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
- org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
- org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
- org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
- org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
- org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
- java.lang.Thread.run(Thread.java:679)

Does anyone know what I can do? How do I debug this issue? How can I check which file causes Tika to work so hard? I don't see anything in the log files and I am stuck.

Thanks,
Yossi
Scoring by document size
Hi all,

I have some doubts about the Solr scoring function. I'm using all default configuration, but I'm facing a weird issue with the retrieved scores. In the schema, I'm going to focus on the only field I'm interested in. Its definition is:

<fieldType name="text" class="solr.TextField" sortMissingLast="true" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

<field name="myField" type="text" indexed="true" stored="true" required="false" />

(omitNorms=false; otherwise the document size is not taken into account in the final score)

Then, I index some documents, with the following text in the 'myField' field:

doc1 = A B C
doc2 = A B C D
doc3 = A B C D E
doc4 = A B C D E F
doc5 = A B C D E F G H
doc6 = A B C D E F G H I

Finally, I perform the query 'myField:(A B C)' in order to recover all the documents, but with different scoring (doc1 is more similar to the query than doc2, which is more similar than doc3, ...). All the documents are retrieved (OK), but the scores are like this:

doc1 = 2.590214
doc2 = 2.590214
doc3 = 2.266437
doc4 = 1.94266
doc5 = 1.94266
doc6 = 1.618884

So in conclusion, as you can see the score goes down, but not the way I'd like. Doc1 is getting the same score as Doc2, even though Doc1 matches 3/3 tokens and Doc2 matches 3/4 tokens.

Is this the normal Solr behaviour? Is there any way to get my expected behaviour?

Thanks a lot,
Borja.

--
View this message in context: http://lucene.472066.n3.nabble.com/Scoring-by-document-size-tp4090523.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: dih delete doc per $deleteDocById
What is your question?

On Tue, Sep 17, 2013 at 12:17 AM, andreas owen a.o...@gmx.net wrote:

I am using DIH and want to delete indexed documents by an XML file with ids. I have seen $deleteDocById used in entity query=...

data-config.xml:

<entity name="rec" processor="XPathEntityProcessor"
        url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportDelete.xml"
        forEach="/docs/doc" dataSource="main">
  <field column="$deleteDocById" xpath="//id" />
</entity>

xml-file:

<docs>
  <doc>
    <id>2345</id>
  </doc>
</docs>

--
Regards,
Shalin Shekhar Mangar.
Re: Re-Ranking results based on DocValues with custom function.
Hi!

Thanks for the directions! I got it up and running with a custom ValueSourceParser: http://pastebin.com/cz1rJn4A and a custom ValueSource: http://pastebin.com/j8mhA8e0

It basically allows for searching for text (which is associated with an image) in an index and then getting the distance to a sample image (base64-encoded byte[] array) based on one of five different low-level content-based features stored as DocValues. A sample result is here: http://pastebin.com/V7kL3DJh

So there's one little tiny question I still have ;) When I'm trying to do a sort I'm getting

msg: sort param could not be parsed as a query, and is not a field that exists in the index: lirefunc(cl_hi,FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=),

for the call http://localhost:9000/solr/lire/select?q=*%3A*&sort=lirefunc(cl_hi%2CFQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA%3D)+asc&fl=id%2Ctitle%2Clirefunc(cl_hi%2CFQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA%3D)&wt=json&indent=true

cheers,
Mathias

On Tue, Sep 17, 2013 at 1:01 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: dissimilarity functions). What I want to do is to search using common
: text search and then (optionally) re-rank using some custom function
: like
:
: http://localhost:8983/solr/select?q=*:*&sort=myCustomFunction(var1) asc

can you describe what you want your custom function to look like? it may already be possible using the existing functions provided out of the box - just need to combine them to build up the match expression...

https://wiki.apache.org/solr/FunctionQuery

...if you really want to write your own, just implement ValueSourceParser and register it in solrconfig.xml...

https://wiki.apache.org/solr/SolrPlugins#ValueSourceParser

: I've seen that there are hooks in solrconfig.xml, but I did not find
: an example or some documentation. I'd be most grateful if anyone could
: either point me to one or give me a hint for another way to go :)

when writing a custom plugin like this, the best thing to do is look at the existing examples of that plugin. almost all of the built-in ValueSourceParsers are really trivial, and can be found in tiny anonymous classes right inside ValueSourceParser.java... For example, the function to divide the results of two other functions...

addParser("div", new ValueSourceParser() {
  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    ValueSource a = fp.parseValueSource();
    ValueSource b = fp.parseValueSource();
    return new DivFloatFunction(a, b);
  }
});

...or, if you were trying to bundle that up in your own plugin jar and register it in solrconfig.xml, you might write it something like...

public class DivideValueSourceParser extends ValueSourceParser {
  public DivideValueSourceParser() { }
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    ValueSource a = fp.parseValueSource();
    ValueSource b = fp.parseValueSource();
    return new DivFloatFunction(a, b);
  }
}

and then register it as...

<valueSourceParser name="div" class="com.you.DivideValueSourceParser" />

depending on your needs, you may also want to write a custom ValueSource implementation (ie: instead of DivFloatFunction above), in which case, again, the best examples to look at are all of the existing ValueSource functions...

https://lucene.apache.org/core/4_4_0/queries/org/apache/lucene/queries/function/ValueSource.html

-Hoss

--
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec
Re: Scoring by document size
Have you used debugQuery=true, or fl=*,[explain], or those various functions? It is possible to ask Solr to tell you how it calculated the score, which will enable you to see what is going on in each case. You can probably work it out for yourself then I suspect. Upayavira On Tue, Sep 17, 2013, at 08:40 AM, blopez wrote: Hi all, I have some doubts about the Solr scoring function. I'm using all default configuration, but I'm facing a wired issue with the retrieved scores. In the schema, I'm going to focus in the only field I'm interested in. Its definition is: *fieldType name=text class=solr.TextField sortMissingLast=true omitNorms=false analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.ASCIIFoldingFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.ASCIIFoldingFilterFactory/ /analyzer /fieldType field name=myField type=text indexed=true stored=true required=false /* (omitNorms=false, if not, the document size is not taken into account to the final score) Then, I index some documents, with the following text in the 'myField' field: doc1 = A B C doc2 = A B C D doc3 = A B C D E doc4 = A B C D E F doc5 = A B C D E F G H doc6 = A B C D E F G H I Finally, I perform the query 'myField:(A B C)' in order to recover all the documents, but with different scoring (doc1 is more similar to the query than doc2, which is more similar than doc3, ...). All the documents are retrieved (OK), but the scores are like this: *doc1 = 2,590214 doc2 = 2,590214* doc3 = 2,266437 *doc4 = 1,94266 doc5 = 1,94266* doc6 = 1,618884 So in conclussion, as you can see the score goes down, but not the way I'd like. Doc1 is getting the same scoring than Doc2, even when Doc1 matches 3/3 tokens, and Doc2 matches 3/4 tokens. Is this the normal Solr behaviour? Is there any way to get my expected behaviour? Thanks a lot, Borja. -- View this message in context: http://lucene.472066.n3.nabble.com/Scoring-by-document-size-tp4090523.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Scoring by document size
As the IDF values for A, B and C are minimal (couldn't get any worse than being in any document), the major part of your score comes most likely from the coord(..) part of scoring - which basically computes the overlap of the query and the document. If you want to have a stronger influence you can extend and override the Similarity implementation. You might take a look at http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html cheers, Mathias On Tue, Sep 17, 2013 at 1:59 PM, Upayavira u...@odoko.co.uk wrote: Have you used debugQuery=true, or fl=*,[explain], or those various functions? It is possible to ask Solr to tell you how it calculated the score, which will enable you to see what is going on in each case. You can probably work it out for yourself then I suspect. Upayavira On Tue, Sep 17, 2013, at 08:40 AM, blopez wrote: Hi all, I have some doubts about the Solr scoring function. I'm using all default configuration, but I'm facing a wired issue with the retrieved scores. In the schema, I'm going to focus in the only field I'm interested in. Its definition is: *fieldType name=text class=solr.TextField sortMissingLast=true omitNorms=false analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.ASCIIFoldingFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.ASCIIFoldingFilterFactory/ /analyzer /fieldType field name=myField type=text indexed=true stored=true required=false /* (omitNorms=false, if not, the document size is not taken into account to the final score) Then, I index some documents, with the following text in the 'myField' field: doc1 = A B C doc2 = A B C D doc3 = A B C D E doc4 = A B C D E F doc5 = A B C D E F G H doc6 = A B C D E F G H I Finally, I perform the query 'myField:(A B C)' in order to recover all the documents, but with different scoring (doc1 is more similar to the query than doc2, which is more similar than doc3, ...). All the documents are retrieved (OK), but the scores are like this: *doc1 = 2,590214 doc2 = 2,590214* doc3 = 2,266437 *doc4 = 1,94266 doc5 = 1,94266* doc6 = 1,618884 So in conclussion, as you can see the score goes down, but not the way I'd like. Doc1 is getting the same scoring than Doc2, even when Doc1 matches 3/3 tokens, and Doc2 matches 3/4 tokens. Is this the normal Solr behaviour? Is there any way to get my expected behaviour? Thanks a lot, Borja. -- View this message in context: http://lucene.472066.n3.nabble.com/Scoring-by-document-size-tp4090523.html Sent from the Solr - User mailing list archive at Nabble.com. -- Dr. Mathias Lux Assistant Professor, Klagenfurt University, Austria http://tinyurl.com/mlux-itec
Re: how soft-commit works
Here's a rather long blog post I wrote up that might help:

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Mon, Sep 16, 2013 at 1:43 PM, Shawn Heisey s...@elyograg.org wrote:

On 9/16/2013 7:01 AM, Matteo Grolla wrote:
Can anyone explain to me the following things about soft commit?
- For searches to access new documents: I think a new searcher is opened after a soft commit. How does the near-real-time requirement for soft commit match with the potentially long time taken to warm up caches for the new searcher?
- Is it a good idea to set openSearcher=false in auto commit and rely on soft auto commit to see new data in searches?

That is a very common way for installs requiring NRT updates to get configured. NRTCachingDirectoryFactory, which is the directory class used in the example since 4.0, is a wrapper around MMapDirectoryFactory, which was the default in 3.x. For soft commits, the NRT directory keeps small commits in RAM rather than writing them to disk, which makes the process of opening a new searcher happen a lot faster.

http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/store/NRTCachingDirectory.html

If your indexing rate is very fast or you index large amounts of data, the NRT directory doesn't gain you much over MMap, but because we made it the default in the example, it probably doesn't have any performance detriment.

Thanks,
Shawn
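A sketch of the openSearcher=false plus soft autocommit pattern Shawn describes as the common NRT configuration; the time values below are placeholders, not recommendations from this thread:

<autoCommit>
  <!-- hard commit: flushes to disk and truncates the tlog, but does not open a searcher -->
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <!-- soft commit: makes new documents visible to searches -->
  <maxTime>1000</maxTime>
</autoSoftCommit>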
Re: Dynamic row sizing for documents via UpdateCSV
Well, it's reasonably easy if you have empty columns, in the same order, for _all_ of the possible dynamic fields, but I really doubt you are that fortunate... It's especially ugly in that you have the different dynamic fields scattered around. How is the csv file generated? Could you force every row to have _all_ the possible columns in the same order with spaces or something in the columns that are empty? Otherwise I'd think about parsing them externally and using, say, SolrJ to transmit the individual records to Solr. Best, Erick On Mon, Sep 16, 2013 at 2:47 PM, Utkarsh Sengar utkarsh2...@gmail.comwrote: Hello, I am using UpdateCSV to load data in solr. Currently I load this schema with a static set of values: userid,name,age,location john8322,John,32,CA tom22,Tom,30,NY But now I have this usecase where john8322 might have a state specific dynamic field for example: userid,name,age,location, ca_count_i john8322,John,32,CA, 7 And tom22 might have different dynamic fields: userid,name,age,location, ny_count_i,oh_count_i tom22,Tom,30,NY, 981,11 So is it possible to pass different columns sizes for each row, something like this: john8322,John,32,CA,ca_count_i:7 tom22,Tom,30,NY, ny_count_i:981,oh_count_i:11 I understand that the above syntax is not possible, but is there any other way of solving this problem? -- Thanks, -Utkarsh
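For illustration, this is roughly the "every row has all the possible columns, in the same order" shape that Erick suggests above, reusing the fields from the example in the question; leaving the missing counts blank is an assumption on my part:

userid,name,age,location,ca_count_i,ny_count_i,oh_count_i
john8322,John,32,CA,7,,
tom22,Tom,30,NY,,981,11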
Re: dih delete doc per $deleteDocById
I would like to know how to get this to work and delete documents via XML and DIH.

On 17. Sep 2013, at 1:47 PM, Shalin Shekhar Mangar wrote:

What is your question?

On Tue, Sep 17, 2013 at 12:17 AM, andreas owen a.o...@gmx.net wrote:

I am using DIH and want to delete indexed documents by an XML file with ids. I have seen $deleteDocById used in entity query=...

data-config.xml:

<entity name="rec" processor="XPathEntityProcessor"
        url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportDelete.xml"
        forEach="/docs/doc" dataSource="main">
  <field column="$deleteDocById" xpath="//id" />
</entity>

xml-file:

<docs>
  <doc>
    <id>2345</id>
  </doc>
</docs>

--
Regards,
Shalin Shekhar Mangar.
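The thread does not reach a resolution, but for readability here is the same configuration laid out as a delete-only DIH entity. The absolute XPath /docs/doc/id is an assumption on my part, since XPathEntityProcessor only supports a limited XPath subset and the //id form is worth double-checking; it is not a confirmed fix from this thread:

<entity name="deleter"
        processor="XPathEntityProcessor"
        url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportDelete.xml"
        forEach="/docs/doc"
        dataSource="main">
  <!-- $deleteDocById is a DIH special command: each value produced for this
       column is treated as a document id to delete rather than a field to index -->
  <field column="$deleteDocById" xpath="/docs/doc/id" />
</entity>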
Re: Atomic commit across shards?
There are two things to think about here.

1) If you're issuing the commit manually (i.e. not relying on the settings in solrconfig.xml), then they are atomic: the call doesn't return until all the active nodes have seen the commit.

2) However, autocommits are usually time based. Since servers start up at different times, if you're relying on the settings in solrconfig.xml to do the commits, then there will be slight offsets, since the timers will expire at slightly different times.

Best,
Erick

On Mon, Sep 16, 2013 at 6:44 PM, Damien Dykman damien.dyk...@gmail.com wrote:

Is a commit (hard or soft) atomic across shards? In other words, can I guarantee that any given search on a multi-shard collection will hit the same index generation of each shard?

Thanks,
Damien
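As a concrete illustration of "issuing the commit manually", a commit can be sent as an explicit update request; the host and collection name below are placeholders:

http://localhost:8983/solr/collection1/update?commit=true&waitSearcher=true

The request does not return until the commit has been processed (and, with waitSearcher=true, until the new searcher is open), which matches the behaviour Erick describes.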
Re: few and huge tlogs
Probably because you're indexing a lot of documents very quickly. It's entirely reasonable to have much shorter autoCommit times; all a hard commit does is 1) truncate the transaction log, 2) close the current segment, and 3) start a new segment. That should cut down your tlog files drastically. Try setting your autocommit time to, say, 15000 (15 seconds).

Long blog here:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Tue, Sep 17, 2013 at 5:16 AM, YouPeng Yang yypvsxf19870...@gmail.com wrote:

Hi

According to http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup, the tlog file should switch to a new one when a hard commit happens. However, my tlogs show something different:

tlog.003 5.16GB
tlog.004 1.56GB
tlog.002 610MB

There are only a few tlogs, where I would expect about ten files, and each one is very huge, even though lots of hard commits have happened. So why does the number of tlog files not increase?

Here are the settings of the DirectUpdateHandler2:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <autoCommit>
    <maxTime>120</maxTime>
    <maxDocs>100</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>60</maxTime>
    <maxDocs>50</maxDocs>
  </autoSoftCommit>
</updateHandler>
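Applied to the settings quoted above, Erick's suggestion would amount to something like the following autoCommit block; only the maxTime value reflects his 15000 ms suggestion, and dropping the maxDocs limit here is my own simplification rather than advice from the thread:

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>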
Re: how to make sure all the index docs flushed to the index files
Here's a blog about tlogs and commits: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ And here's Mike's excellent segment merging blog http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html Best, Erick On Tue, Sep 17, 2013 at 6:36 AM, Shawn Heisey s...@elyograg.org wrote: On 9/17/2013 12:32 AM, YouPeng Yang wrote: Hi Another werid problem. When we setup the autocommit properties, we suppose that the index fille will created every commited.So that the size of the index files will be large enough. We do not want to keep too many small files as [1]. How to control the size of the index files. An index segment gets created after every hard commit. In the listing that you sent, all the files starting with _28w are a single segment. All the files starting with _28x are another segment. Solr should be merging the segments when you get enough of them, unless you have incorrectly set up your merge policy. The default number of segments that get merged is ten. When you get ten segments, they will be merged down to one. This repeats until you have ten merged segments. At that point, those ten merged segments will be merged to make an even larger segment. You can bump up the number of open files allowed by your operating system. On Linux, this is controlled by the /etc/security/limits.conf file. Here are some example config lines for that file: elyograghardnofile 6144 elyogragsoftnofile 4096 roothardnofile 6144 rootsoftnofile 4096 Alternatively, you can reduce the required number of files if you turn on the UseCompoundFile setting, which is in the IndexConfig section. This causes Solr to create a single file per index segment instead of several files per segment. The compound file may be slightly less efficient, but the difference is likely to be very small. https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
Re: Scoring by document size
This kind of artificial test is almost always misleading. Some approximations are used, in particular the length of the field is not stored as an exact number, so at various points some fields with slightly different lengths are rounded to the same number, thus the identical scores you're seeing. Unless you have a compelling reason, I wouldn't spend too much time trying to adjust scores in this kind of situation, if your real data exhibits behavior you need to change it's a different story of course. Best, Erick On Tue, Sep 17, 2013 at 3:40 AM, blopez balo...@hotmail.com wrote: Hi all, I have some doubts about the Solr scoring function. I'm using all default configuration, but I'm facing a wired issue with the retrieved scores. In the schema, I'm going to focus in the only field I'm interested in. Its definition is: *fieldType name=text class=solr.TextField sortMissingLast=true omitNorms=false analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.ASCIIFoldingFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.ASCIIFoldingFilterFactory/ /analyzer /fieldType field name=myField type=text indexed=true stored=true required=false /* (omitNorms=false, if not, the document size is not taken into account to the final score) Then, I index some documents, with the following text in the 'myField' field: doc1 = A B C doc2 = A B C D doc3 = A B C D E doc4 = A B C D E F doc5 = A B C D E F G H doc6 = A B C D E F G H I Finally, I perform the query 'myField:(A B C)' in order to recover all the documents, but with different scoring (doc1 is more similar to the query than doc2, which is more similar than doc3, ...). All the documents are retrieved (OK), but the scores are like this: *doc1 = 2,590214 doc2 = 2,590214* doc3 = 2,266437 *doc4 = 1,94266 doc5 = 1,94266* doc6 = 1,618884 So in conclussion, as you can see the score goes down, but not the way I'd like. Doc1 is getting the same scoring than Doc2, even when Doc1 matches 3/3 tokens, and Doc2 matches 3/4 tokens. Is this the normal Solr behaviour? Is there any way to get my expected behaviour? Thanks a lot, Borja. -- View this message in context: http://lucene.472066.n3.nabble.com/Scoring-by-document-size-tp4090523.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to round solr score ?
Hi,

As per this post: http://grokbase.com/t/lucene/solr-user/131jzcg3q2/how-to-round-solr-score

I was able to use my custom function in sort:

defType=func&q=socialDegree(id,1)&fl=score,*&sort=score%20asc   - works

but I can't facet on the same:

defType=func&q=socialDegree(id,1)&fl=score,*&facet=true&facet.field=score   - doesn't work

Exception:

org.apache.solr.common.SolrException: undefined field: score
    at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:965)
    at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:294)
    at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:423)
    at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:205)
    at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)

Is there any way by which we can achieve this?

Thanks,
Mamta.

This email is intended for the person(s) to whom it is addressed and may contain information that is PRIVILEGED or CONFIDENTIAL. Any unauthorized use, distribution, copying, or disclosure by any person other than the addressee(s) is strictly prohibited. If you have received this email in error, please notify the sender immediately by return email and delete the message and any attachments from your system.
Re: spellcheck causing Core Reload to hang
I think they should have it in RC0, because if you search this forum, this issue has been there since version 4.3!

Regards,
Raheel

On Tue, Sep 17, 2013 at 5:58 PM, Erick Erickson erickerick...@gmail.com wrote:

H, do we have a JIRA tracking this and does it seem like any fix will get into 4.5? I think 4.5 RC0 will be cut tomorrow (Wednesday)

Best,
Erick

On Tue, Sep 17, 2013 at 3:04 AM, Raheel Hasan raheelhasan@gmail.com wrote:

I think there is another solution: just hide this entry in solrconfig

<str name="spellcheck.maxCollationTries">...</str>

and instead, pass it in the actual query string that calls your requestHandler (like /select/?q=spellcheck.maxCollationTries=3...)

On Mon, Sep 16, 2013 at 9:37 PM, Jeroen Steggink jer...@stegg-inc.com wrote:

Hi James,

I already had spellcheck.collateExtendedResults=true. Adding spellcheck.collateMaxCollectDocs=0 did the trick. Thanks so much.

Jeroen

On 16-9-2013 18:16, Dyer, James wrote:

If this started with Solr 4.4, I would suspect https://issues.apache.org/jira/browse/SOLR-3240 . Rather than removing spellcheck parameters, can you try adding/changing spellcheck.collateMaxCollectDocs=0 and spellcheck.collateExtendedResults=true ? These two settings effectively disable the optimization made with SOLR-3240.

James Dyer
Ingram Content Group
(615) 213-4311

--
Regards,
Raheel Hasan

--
Regards,
Raheel Hasan
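For reference, a sketch of where the two parameters James suggests would sit in solrconfig.xml; the handler name and the surrounding defaults are illustrative, not taken from Jeroen's actual configuration:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <!-- effectively disable the SOLR-3240 collation optimization -->
    <str name="spellcheck.collateMaxCollectDocs">0</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>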
Re: spellcheck causing Core Reload to hang
Check this thread: http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-td3192748i20.html#a4090320

This issue has been there since 2011.

On Tue, Sep 17, 2013 at 6:35 PM, Raheel Hasan raheelhasan@gmail.com wrote:

I think they should have it in RC0, because if you search this forum, this issue has been there since version 4.3!

Regards,
Raheel

On Tue, Sep 17, 2013 at 5:58 PM, Erick Erickson erickerick...@gmail.com wrote:

H, do we have a JIRA tracking this and does it seem like any fix will get into 4.5? I think 4.5 RC0 will be cut tomorrow (Wednesday)

Best,
Erick

On Tue, Sep 17, 2013 at 3:04 AM, Raheel Hasan raheelhasan@gmail.com wrote:

I think there is another solution: just hide this entry in solrconfig

<str name="spellcheck.maxCollationTries">...</str>

and instead, pass it in the actual query string that calls your requestHandler (like /select/?q=spellcheck.maxCollationTries=3...)

On Mon, Sep 16, 2013 at 9:37 PM, Jeroen Steggink jer...@stegg-inc.com wrote:

Hi James,

I already had spellcheck.collateExtendedResults=true. Adding spellcheck.collateMaxCollectDocs=0 did the trick. Thanks so much.

Jeroen

On 16-9-2013 18:16, Dyer, James wrote:

If this started with Solr 4.4, I would suspect https://issues.apache.org/jira/browse/SOLR-3240 . Rather than removing spellcheck parameters, can you try adding/changing spellcheck.collateMaxCollectDocs=0 and spellcheck.collateExtendedResults=true ? These two settings effectively disable the optimization made with SOLR-3240.

James Dyer
Ingram Content Group
(615) 213-4311

--
Regards,
Raheel Hasan

--
Regards,
Raheel Hasan

--
Regards,
Raheel Hasan
check which file/document cause solr to work hard
Hi,

I am trying to index my Windows PC files with ManifoldCF version 1.3 and Solr version 4.4. A few minutes after I start the crawler job, I see that the Tomcat process constantly consumes 100% of one CPU (I have two CPUs). I checked the thread dump in the Solr admin and saw that the following thread takes the most cpu/user time:

http-8080-3 (32)
- java.io.FileInputStream.readBytes(Native Method)
- java.io.FileInputStream.read(FileInputStream.java:236)
- java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
- java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
- java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
- java.io.FilterInputStream.read(FilterInputStream.java:133)
- org.apache.tika.io.TailStream.read(TailStream.java:117)
- org.apache.tika.io.TailStream.skip(TailStream.java:140)
- org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
- org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
- org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
- org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
- org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
- org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
- org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
- org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
- org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
- org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
- org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
- org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
- org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
- org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
- org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
- org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
- org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
- org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
- org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
- org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
- org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
- org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
- org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
- org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
- org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
- org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
- java.lang.Thread.run(Thread.java:679)

How can I check which file causes Tika to work so hard? I don't see anything in the log files and I am stuck.

Thanks,
Yossi
tlog after commit
Quick question... Should I still see tlog files after a hard commit? I'm trying to test soft commits and hard commits, and I was under the impression that the tlog would be removed after a hard commit, whereas in the case of soft commits I would still see them.

Thanks,
Al
Atomic updates with solr cloud in solr 4.4
Hi,

I am using Solr 4.4 in a SolrCloud configuration. When I try to 'set' a field in a document using the update request handler, I get a 'missing required field' error. However, when I send this query to the specific shard containing the document, the update succeeds. Is this a bug in Solr 4.4, or am I doing something wrong?

I started the shards specifying numShards and have checked that the router used is the compositeId router. Distributed indexing is done based on ids sharing the same domain/prefix, i.e. of the 'customerB!' form, and the documents are distributed across the shards correctly. Querying for documents works as expected and returns all matching documents across shards.

Thanks
Sesha
Re: Atomic updates with solr cloud in solr 4.4
On Tue, Sep 17, 2013 at 10:47 AM, Sesha Sendhil Subramanian seshasend...@indix.com wrote: I am using solr 4.4 in solr cloud configuration. When i try to 'set' a field in a document using the update request handler, I get a 'missing required field' error. Can you show the exact error message you get, and the update you are trying to send? -Yonik http://lucidworks.com
Re: How to round solr score ?
: 'score' is a pseudo-field, i.e., it does not actually exist in
: the index, which is probably why it cannot be faceted on.
: Faceting on a rounded score seems like an unusual use
: case. What requirement are you trying to address?

agreed, more details would be helpful.

FWIW: the only way available to facet on functions is to use facet.query along with the {!frange} parser to create facet constraints based on ranges of function values that you specify. there is no other way i can think of to facet over function values -- there is an open issue where people were discussing it, but i don't think there was ever a functional patch...

https://issues.apache.org/jira/browse/SOLR-1581

-Hoss
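As an illustration of the facet.query/{!frange} approach Hoss describes, these parameters could be appended to the request, reusing the socialDegree function from the original post; the range bounds and the number of buckets are made up for the example:

facet=true
facet.query={!frange l=0 u=1}socialDegree(id,1)
facet.query={!frange l=1 incl=false u=2}socialDegree(id,1)
facet.query={!frange l=2 incl=false u=3}socialDegree(id,1)

Each facet.query returns the count of documents whose function value falls in that range, which approximates faceting on a rounded score.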
Re: Re-Ranking results based on DocValues with custom function.
: It basically allows for searching for text (which is associated to an
: image) in an index and then getting the distance to a sample image
: (base64 encoded byte[] array) based on one of five different low level
: content based features stored as DocValues.

very cool.

: So there one little tiny question I still have ;) When I'm trying to
: do a sort I'm getting
:
: msg: sort param could not be parsed as a query, and is not a field
: that exists in the index:
: lirefunc(cl_hi,FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=),
:
: for the call http://localhost:9000/solr/lire/select?q=*%3A*&sort=lirefunc(cl_hi%2CFQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA%3D)+asc&fl=id%2Ctitle%2Clirefunc(cl_hi%2CFQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA%3D)&wt=json&indent=true

Hmmm... i think the crux of the issue is your string literal. function parsing tries to make life easy for you by not requiring string literals to be quoted unless they conflict with other function names or field names, etc. on top of that, the sort parsing code is kind of heuristic based (because it has to account for functions or field names or wildcards, followed by other sort clauses, etc...), so in that context special characters like '=' in your base64 string literal might be confusing the heuristics.

can you try to quote the string literal and see if that works?

For example, when i try using strdist with your base64 string in a sort param using the example configs i get the same error...

http://localhost:8983/solr/select?q=*:*&sort=strdist%28name,FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=,jw%29+asc

but if i quote the string literal it works fine...

http://localhost:8983/solr/select?q=*:*&sort=strdist%28name,%27FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=%27,jw%29+asc

-Hoss
Re: Solr node goes down while trying to index records
Do you get that error only when indexing? 2013/9/17 neoman harira...@gmail.com Hello everyone, one or more of the nodes in the solrcloud go down randomly when we try to index data using solrj APIs. The nodes do recover. but when we try to index back, they go down again Our configuration: 3 shards Solr 4.4. I see the following exceptions in the log file. 09/17/13 15:33:32:976|localhost-startStop-1-SendThread(10.68.129.119:9080 )|INFO|org.apache.zookeeper.ClientCnxn|Socket connection established to 10.68.129.119/10.68.129.119:9080, initiating session| 09/17/13 15:33:32:978|localhost-startStop-1-SendThread(10.68.129.119:9080 )|INFO|org.apache.zookeeper.ClientCnxn|Unable to reconnect to ZooKeeper service, session 0x34109f9474b0029 has expired, closing socket connection| 09/17/13 15:34:36:080|localhost-startStop-1-EventThread|ERROR|apache.solr.cloud.ZkController|There was a problem making a request to the leader:org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr02-prod.phneaz:8080/solr at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:431) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1421) at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:306) at org.apache.solr.cloud.ZkController.access$100(ZkController.java:86) at org.apache.solr.cloud.ZkController$1.command(ZkController.java:196) at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:117) at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46) at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) We are also getting IOExcpetion in the client side. Adding chunk 122 Total Count 12422 org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr-prod.com:8443/solr/aq-collection at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:409) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54) at com.billmelater.fraudworkstation.data.DataProvider.flushBatch(DataProvider.java:48) at com.billmelater.fraudworkstation.data.AQDBDataProvider.execute(AQDBDataProvider.java:114) at com.billmelater.fraudworkstation.data.AQDBDataProvider.main(AQDBDataProvider.java:244) Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) Your help is appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-node-goes-down-while-trying-to-index-records-tp4090610.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCloud liveness problems
Hello there,

we have the following setup:

SolrCloud 4.4.0 (3 nodes, physical machines)
Zookeeper 3.4.5 (3 nodes, physical machines)

We have a number of rather small collections (~10K or ~100K documents) that we would like to load onto all Solr instances (numShards=1, replication_factor=3) and access through the local network interface, as the load balancing is done in the layers above. We can live (and we actually do it in the test phase) with updating entire collections whenever we need to, switching collection aliases and removing the old collections.

We stumbled across the following problem: as soon as all three Solr nodes become the leader of at least one collection, restarting any node makes it completely unresponsive (timeout), both through the admin interface and for replication. If we restart all Solr nodes, the cluster ends up in some kind of deadlock, and the only remedy we found is a clean Solr installation, removing the ZooKeeper data and re-posting the collections.

Apparently, the leader is waiting for replicas to come up and they try to synchronize but time out on HTTP requests, so everything ends up in some kind of deadlock, maybe related to: https://issues.apache.org/jira/browse/SOLR-5240

Eventually (after a few minutes), the leader takes over and marks the collections active, but it remains blocked on the HTTP interface, so other nodes cannot synchronize.

In further tests, we loaded 4 collections with numShards=1 and replication_factor=2. By chance, one node became the leader for all 4 collections. Restarting the node which was not the leader worked without problems, but when we restarted the leader it happened that:

- the leader shut down, and the other nodes became leaders of 2 collections each
- the leader started up, 3 collections on it became active, one collection remained "down", and the node became unresponsive and timed out on HTTP requests.

As this behavior is completely unexpected for a one-cluster solution, I wonder if somebody else has experienced the same problems or whether we are doing something entirely wrong.

Best regards

--
Vladimir Veljkovic
Senior Java Entwickler

Boxalino AG
vladimir.veljko...@boxalino.com
www.boxalino.com

Tuning Kit for your Online Shop
Product Search - Recommendations - Landing Pages - Data intelligence - Mobile Commerce
Re: Atomic updates with solr cloud in solr 4.4
curl http://localhost:8983/solr/search/update -H 'Content-type:application/json' -d ' [ { id: c8cce27c1d8129d733a3df3de68dd675!c8cce27c1d8129d733a3df3de68dd675, link_id_45454 : {set:abcdegff} } ]' I have two collections search and meta. I want to do an update in the search collection. If i pick a document in same shard : localhost:8983, the update succeeds 15350327 [qtp386373885-19] INFO org.apache.solr.update.processor.LogUpdateProcessor ? [search] webapp=/solr path=/update params={} {add=[6cfcb56ca52b56ccb1377a7f0842e74d!6cfcb56ca52b56ccb1377a7f0842e74d (1446444025873694720)]} 0 5 If i pick a document on a different shard : localhost:7574, the update fails 15438547 [qtp386373885-75] INFO org.apache.solr.update.processor.LogUpdateProcessor ? [search] webapp=/solr path=/update params={} {} 0 1 15438548 [qtp386373885-75] ERROR org.apache.solr.core.SolrCore ? org.apache.solr.common.SolrException: [doc=c8cce27c1d8129d733a3df3de68dd675!c8cce27c1d8129d733a3df3de68dd675] missing required field: variant_count at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:189) at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:73) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:556) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:692) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:392) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:117) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:101) at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:65) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:722) Sesha
Solr node goes down while trying to index records
Hello everyone, one or more of the nodes in the solrcloud go down randomly when we try to index data using solrj APIs. The nodes do recover. but when we try to index back, they go down again Our configuration: 3 shards Solr 4.4. I see the following exceptions in the log file. 09/17/13 15:33:32:976|localhost-startStop-1-SendThread(10.68.129.119:9080)|INFO|org.apache.zookeeper.ClientCnxn|Socket connection established to 10.68.129.119/10.68.129.119:9080, initiating session| 09/17/13 15:33:32:978|localhost-startStop-1-SendThread(10.68.129.119:9080)|INFO|org.apache.zookeeper.ClientCnxn|Unable to reconnect to ZooKeeper service, session 0x34109f9474b0029 has expired, closing socket connection| 09/17/13 15:34:36:080|localhost-startStop-1-EventThread|ERROR|apache.solr.cloud.ZkController|There was a problem making a request to the leader:org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr02-prod.phneaz:8080/solr at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:431) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1421) at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:306) at org.apache.solr.cloud.ZkController.access$100(ZkController.java:86) at org.apache.solr.cloud.ZkController$1.command(ZkController.java:196) at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:117) at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46) at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) We are also getting IOExcpetion in the client side. Adding chunk 122 Total Count 12422 org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr-prod.com:8443/solr/aq-collection at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:409) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54) at com.billmelater.fraudworkstation.data.DataProvider.flushBatch(DataProvider.java:48) at com.billmelater.fraudworkstation.data.AQDBDataProvider.execute(AQDBDataProvider.java:114) at com.billmelater.fraudworkstation.data.AQDBDataProvider.main(AQDBDataProvider.java:244) Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) Your help is appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-node-goes-down-while-trying-to-index-records-tp4090610.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr node goes down while trying to index records
Yes. The nodes go down while indexing; if we stop indexing, they do not go down. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-node-goes-down-while-trying-to-index-records-tp4090610p4090644.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dynamic row sizing for documents via UpdateCSV
Yeah I think the only way to go about it is via SolrJ. The csv file is generated by a pig job which computes the data to be loaded in solr. I think this is what I will endup doing: Load all the possible columns in the csv with a value of 0 if the value doesn't exist for a specific record. I was just trying to avoid it and find an optimal solution with UpdateCSV. Thanks, -Utkarsh On Tue, Sep 17, 2013 at 5:43 AM, Erick Erickson erickerick...@gmail.comwrote: Well, it's reasonably easy if you have empty columns, in the same order, for _all_ of the possible dynamic fields, but I really doubt you are that fortunate... It's especially ugly in that you have the different dynamic fields scattered around. How is the csv file generated? Could you force every row to have _all_ the possible columns in the same order with spaces or something in the columns that are empty? Otherwise I'd think about parsing them externally and using, say, SolrJ to transmit the individual records to Solr. Best, Erick On Mon, Sep 16, 2013 at 2:47 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Hello, I am using UpdateCSV to load data in solr. Currently I load this schema with a static set of values: userid,name,age,location john8322,John,32,CA tom22,Tom,30,NY But now I have this usecase where john8322 might have a state specific dynamic field for example: userid,name,age,location, ca_count_i john8322,John,32,CA, 7 And tom22 might have different dynamic fields: userid,name,age,location, ny_count_i,oh_count_i tom22,Tom,30,NY, 981,11 So is it possible to pass different columns sizes for each row, something like this: john8322,John,32,CA,ca_count_i:7 tom22,Tom,30,NY, ny_count_i:981,oh_count_i:11 I understand that the above syntax is not possible, but is there any other way of solving this problem? -- Thanks, -Utkarsh -- Thanks, -Utkarsh
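If you do go the SolrJ route, the per-record dynamic fields are straightforward, since each SolrInputDocument can carry its own set of *_i fields. A rough sketch using the field names from the example above (URL and collection name are placeholders):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DynamicFieldLoader {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("userid", "tom22");
        doc.addField("name", "Tom");
        doc.addField("age", 30);
        doc.addField("location", "NY");
        // dynamic *_i fields are added only when the record actually has them
        doc.addField("ny_count_i", 981);
        doc.addField("oh_count_i", 11);

        server.add(doc);
        server.commit();
        server.shutdown();
    }
}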
Re: SolrCloud liveness problems
On Sep 17, 2013, at 12:00 PM, Vladimir Veljkovic vladimir.veljko...@boxalino.com wrote: Hello there, we have following setup: SolrCloud 4.4.0 (3 nodes, physical machines) Zookeeper 3.4.5 (3 nodes, physical machines) We have a number of rather small collections (~10K or ~100K of documents), that we would like to load to all Solr instances (numShards=1, replication_factor=3), and access them through local network interface, as the load balancing is done in layers above. We can live (and we actually do it in the test phase) with updating the entire collections whenever we need it, switching collection aliases and removing the old collections. We stumbled across following problem: as soon as all three Solr nodes become a leader to at least one collection, restarting any node makes it completely unresponsive (timeout), both though admin interface and for replication. If we restart all solr nodes the cluster end up in some kind of deadlock and only remedy we found is Solr clean installation, removing ZooKeeper data and re-posting collections. Apparently, leader is waiting for replicas to come up and they try to synchronize but timeout on http requests, so everything ends up in some kind of dead lock, maybe related to: https://issues.apache.org/jira/browse/SOLR-5240 Yup, that sounds exactly what you would expect with SOLR-5240. A fix for that is coming in 4.5, which is a probably a week or so away. Eventually (after few minutes), leader takes over, mark collections active but remains blocked on http interface, so other nodes can not synchronize. In further tests, we loaded 4 collections with numShards=1 and replication_factor=2. By chance, one node become the leader for all 4 collections. Restarting the node which was not the leader is done without the problem, but when we restarted the leader it happened that: - leader shut down, other nodes became leaders of 2 collections each - leader starts up, 3 collections on it become active, one collection remains ”down” and node becomes unresponsive and timeouts on http requests. Hard to say - I'll experiment with 4.5 and see if I can duplicate this. - Mark As this behavior is completely unexpected for one cluster solution, I wonder if somebody else experienced same problems or we are doing something entirely wrong. Best regards -- Vladimir Veljkovic Senior Java Entwickler Boxalino AG vladimir.veljko...@boxalino.com www.boxalino.com Tuning Kit for your Online Shop Product Search - Recommendations - Landing Pages - Data intelligence - Mobile Commerce
Getting a query parameter in a TokenFilter
Hi everyone, We developed a TokenFilter. It should act differently depending on a parameter supplied in the query (for the query chain only, not the index one, of course). We found no way to pass that parameter into the TokenFilter flow. I guess the root cause is that TokenFilter is a pure Lucene object. As a last resort, we tried to pass the parameter as the first term in the query text (q=...) and save it as a member of the TokenFilter instance. Although it is ugly, it might work fine. But the problem is that it is not guaranteed that all the terms of a particular query will be analyzed by the same instance of the TokenFilter, in which case some terms will be analyzed without the required parameter. We can reproduce such a race very easily. How should I overcome this issue? Does anyone have a better solution?
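One workaround that gets suggested for this kind of problem is to stash the request parameter in a ThreadLocal before analysis starts (for example from a custom QParserPlugin or a first-components SearchComponent) and read it inside the filter; analysis of a single query runs on one thread, so every filter instance created for that query sees the same value. A rough sketch only, with made-up class and parameter names:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Holder that your QParserPlugin/SearchComponent fills in from the
// request (e.g. a "mymode" parameter) before the query is parsed.
final class QueryParamHolder {
    static final ThreadLocal<String> MODE = new ThreadLocal<String>();
}

final class ModeAwareFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    protected ModeAwareFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        String mode = QueryParamHolder.MODE.get(); // per-request value, may be null
        if ("upper".equals(mode)) {
            // example behaviour: upper-case terms when the request asked for it
            String upper = termAtt.toString().toUpperCase();
            termAtt.setEmpty().append(upper);
        }
        return true;
    }
}

Remember to clear the ThreadLocal when the request finishes so values don't leak between requests on pooled threads.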
Re: Stop zookeeper from batch
Are you looking for this: https://issues.apache.org/jira/browse/ZOOKEEPER-1122 On Monday, 16 September 2013, Prasi S prasi1...@gmail.com wrote: Hi, We have set up SolrCloud with ZooKeeper and 2 Tomcats. We are using a batch file to start ZooKeeper, upload the config files and start the Tomcats. Now I need to stop ZooKeeper from the batch file. How is this possible? I'm using Windows Server and ZooKeeper 3.4.5. Pls help. Thanks, Prasi
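That issue is about adding a stop option to the scripts; zkServer.cmd on Windows doesn't ship one, so the usual workaround is to kill the ZooKeeper JVM from the batch file. A rough sketch, assuming ZooKeeper is the only java.exe on the box whose command line mentions zookeeper (note the doubled %% inside a .bat file):

rem stop-zookeeper.bat
wmic process where "Name='java.exe' and CommandLine like '%%zookeeper%%'" call terminate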
Some text not indexed in solr4.4
I have a copyField called allText with type text_general: https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68 I have ~100 documents which have the text: dyson and dc44 or dc41 etc. For example: title: Dyson DC44 Animal Digital Slim Cordless Vacuum description: The DC44 Animal is the new Dyson Digital Slim vacuum cleaner the cordless machine that doesn’t lose suction. It has been engineered for floor to ceiling cleaning. DC44 Animal has a detachable long-reach wand which is balanced for floor to ceiling cleaning. The motorized floor tool has twice the power of the DC35 floor tool to drive the bristles deeper into the carpet pile with more force. It attaches to the wand or directly to the machine for cleaning awkward spaces. The brush bar has carbon fiber filaments for removing fine dust from hard floors. DC44 Animal has a run time of 20 minutes or 8 minutes on Boost mode. Powered by the Dyson digital motor DC44 Animal has a fade-free nickel manganese cobalt battery and Root Cyclone technology for constant powerful suction., UPC: 0879957006362 The documents are indexed. Analysis says its indexeD: http://i.imgur.com/O52ino1.png But when I search for allText:dyson dc44 I get no results, response: http://pastie.org/8334220 Any suggestions about the problem? I am out of ideas about how to debug this. -- Thanks, -Utkarsh
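One thing worth ruling out first: if the query really is sent as allText:dyson dc44 (without quotes or parentheses), the Lucene query parser applies the allText: prefix only to the first term, and dc44 is then searched against the default field instead. Grouping or quoting keeps both terms on the copyField:

q=allText:(dyson dc44)
q=allText:"dyson dc44"

Adding debugQuery=true to the request shows exactly which field each term ended up on.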
Re: How to round solr score ?
On 17 September 2013 18:31, Mamta Thakur mtha...@care.com wrote: Hi, As per this post here http://grokbase.com/t/lucene/solr-user/131jzcg3q2/how-to-round-solr-score. I was able to use my custom function in sort (defType=func&q=socialDegree(id,1)&fl=score,*&sort=score%20asc) - that works, but I can't facet on the same function (defType=func&q=socialDegree(id,1)&fl=score,*&facet=true&facet.field=score) - that doesn't work. 'score' is a pseudo-field, i.e., it does not actually exist in the index, which is probably why it cannot be faceted on. Faceting on a rounded score seems like an unusual use case. What requirement are you trying to address? Regards, Gora
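If the underlying need is really "how many documents fall into each band of my function's value", one option that avoids faceting on score itself is facet.query with the frange parser over the same function - a sketch only, assuming socialDegree(id,1) is the function from the post (adjust the bounds as needed):

q=*:*&facet=true
  &facet.query={!frange l=0 u=0.5}socialDegree(id,1)
  &facet.query={!frange l=0.5 u=1 incl=false}socialDegree(id,1)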
Limits of Document Size at SolrCloud and Faced Problems with Large Size of Documents
Currently I have over 50 million documents in my index, and as I mentioned in another question I have some problems while indexing (a Jetty EOF exception). I know that problem may not be about index size, but I want to learn whether there is any practical limit on document count or index size in Solr beyond which I can expect problems. I am not talking about the theoretical limit. What is the maximum index size people run in practice, and what do they do to handle a heavy indexing rate with millions of documents? What tuning strategies do they use? PS: I have 18 machines, 9 shards, each machine has 48 GB RAM, and I use Solr 4.2.1 for my SolrCloud.
Re: Solr node goes down while trying to index records
Could you give some information about your jetty.xml, and more detail about your indexing rate and the RAM usage of your machines? On Tuesday, 17 September 2013, neoman harira...@gmail.com wrote: Yes. The nodes go down while indexing; if we stop indexing, they do not go down. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-node-goes-down-while-trying-to-index-records-tp4090610p4090644.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: tlog after commit
Did you check here: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ On Tuesday, 17 September 2013, Alejandro Calbazana acalbaz...@gmail.com wrote: Quick question... Should I still see tlog files after a hard commit? I'm trying to test soft commits and hard commits, and I was under the impression that the tlog would be removed after a hard commit, whereas in the case of soft commits I would still see them. Thanks, Al
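For what it's worth, the short version of that article: a hard commit does not delete the transaction log, it closes the current tlog and starts a new one, and a few old tlogs are kept around for peer sync/recovery, so still seeing tlog files after a hard commit is normal. The relevant solrconfig.xml pieces look roughly like this (the times are only examples):

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

<autoCommit>
  <maxTime>60000</maxTime>          <!-- hard commit: writes segments and rolls the tlog -->
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>5000</maxTime>           <!-- soft commit: visibility only, leaves the tlog alone -->
</autoSoftCommit>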
Re: Problem indexing windows files
Firstly; This may not be a Solr related problem. Did you check the log file of Solr? Tika mayhave some circumstances at some kind of situations. For example when parsing HTML that has a base64 encoded image it may have some problems. If you find the correct logs you can detect it. On the other take care of Manifold, there may be some problem too. 17 Eylül 2013 Salı tarihinde Yossi Nachum nachum...@gmail.com adlı kullanıcı şöyle yazdı: Hi, I am trying to index my windows pc files with manifoldcf version 1.3 and solr version 4.4. I create output connection and repository connection and started a new job that scan my E drive. Everything seems like it work ok but after a few minutes solr stop getting new files to index. I am seeing that through tomcat log file. On manifold crawler ui I see that the job is still running but after few minutes I am getting the following error: Error: Repeated service interruptions - failure processing document: Read timed out I am seeing that tomcat process is constantly consume 100% of one cpu (I have two cpu's) even after I get the error message from manifolfcf crawler ui. I check the thread dump in solr admin and saw that the following threads take the most cpu/user time http-8080-3 (32) - java.io.FileInputStream.readBytes(Native Method) - java.io.FileInputStream.read(FileInputStream.java:236) - java.io.BufferedInputStream.fill(BufferedInputStream.java:235) - java.io.BufferedInputStream.read1(BufferedInputStream.java:275) - java.io.BufferedInputStream.read(BufferedInputStream.java:334) - org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99) - java.io.FilterInputStream.read(FilterInputStream.java:133) - org.apache.tika.io.TailStream.read(TailStream.java:117) - org.apache.tika.io.TailStream.skip(TailStream.java:140) - org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283) - org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160) - org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193) - org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71) - org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) - org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) - org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) - org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) - org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) - org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) - org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241) - org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) - org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) - org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) - org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) - org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) - org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) - org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) - org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) - org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) - org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) - 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) - org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) - org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857) - org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) - org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) - java.lang.Thread.run(Thread.java:679) does anyone know what can I do? how to debug this issue? how can I check which file cause tika to work so hard? I don't see anything in the log files and I am stuck Thanks, Yossi
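To narrow down which file is making Tika spin, it can help to take ManifoldCF out of the loop and post one suspect file straight to the extracting handler with extractOnly=true (the handler path below assumes the stock /update/extract configuration):

curl "http://localhost:8080/solr/update/extract?extractOnly=true&wt=json" -F "myfile=@/path/to/suspect.mp3"

If a particular MP3 makes that request hang the same way, the problem is in Tika's MP3 parser rather than in ManifoldCF or Solr itself.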
Re: Some text not indexed in solr4.4
To add to it, I see the exact problem with the queries: nikon d7100, nikon d5100, samsung ps-we450 etc. Thanks, -Utkarsh On Tue, Sep 17, 2013 at 2:20 PM, Utkarsh Sengar utkarsh2...@gmail.comwrote: I have a copyField called allText with type text_general: https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68 I have ~100 documents which have the text: dyson and dc44 or dc41 etc. For example: title: Dyson DC44 Animal Digital Slim Cordless Vacuum description: The DC44 Animal is the new Dyson Digital Slim vacuum cleaner the cordless machine that doesn’t lose suction. It has been engineered for floor to ceiling cleaning. DC44 Animal has a detachable long-reach wand which is balanced for floor to ceiling cleaning. The motorized floor tool has twice the power of the DC35 floor tool to drive the bristles deeper into the carpet pile with more force. It attaches to the wand or directly to the machine for cleaning awkward spaces. The brush bar has carbon fiber filaments for removing fine dust from hard floors. DC44 Animal has a run time of 20 minutes or 8 minutes on Boost mode. Powered by the Dyson digital motor DC44 Animal has a fade-free nickel manganese cobalt battery and Root Cyclone technology for constant powerful suction., UPC: 0879957006362 The documents are indexed. Analysis says its indexeD: http://i.imgur.com/O52ino1.png But when I search for allText:dyson dc44 I get no results, response: http://pastie.org/8334220 Any suggestions about the problem? I am out of ideas about how to debug this. -- Thanks, -Utkarsh -- Thanks, -Utkarsh
Re: Some text not indexed in solr4.4
On the other hand did you check here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters what it says about MultiPhraseQuery? 18 Eylül 2013 Çarşamba tarihinde Furkan KAMACI furkankam...@gmail.com adlı kullanıcı şöyle yazdı: Hi; Did you run commit command? 18 Eylül 2013 Çarşamba tarihinde Utkarsh Sengar utkarsh2...@gmail.com adlı kullanıcı şöyle yazdı: To add to it, I see the exact problem with the queries: nikon d7100, nikon d5100, samsung ps-we450 etc. Thanks, -Utkarsh On Tue, Sep 17, 2013 at 2:20 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: I have a copyField called allText with type text_general: https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68 I have ~100 documents which have the text: dyson and dc44 or dc41 etc. For example: title: Dyson DC44 Animal Digital Slim Cordless Vacuum description: The DC44 Animal is the new Dyson Digital Slim vacuum cleaner the cordless machine that doesn't lose suction. It has been engineered for floor to ceiling cleaning. DC44 Animal has a detachable long-reach wand which is balanced for floor to ceiling cleaning. The motorized floor tool has twice the power of the DC35 floor tool to drive the bristles deeper into the carpet pile with more force. It attaches to the wand or directly to the machine for cleaning awkward spaces. The brush bar has carbon fiber filaments for removing fine dust from hard floors. DC44 Animal has a run time of 20 minutes or 8 minutes on Boost mode. Powered by the Dyson digital motor DC44 Animal has a fade-free nickel manganese cobalt battery and Root Cyclone technology for constant powerful suction., UPC: 0879957006362 The documents are indexed. Analysis says its indexeD: http://i.imgur.com/O52ino1.png But when I search for allText:dyson dc44 I get no results, response: http://pastie.org/8334220 Any suggestions about the problem? I am out of ideas about how to debug this. -- Thanks, -Utkarsh -- Thanks, -Utkarsh
Re: SPLITSHARD failure right before publishing the new sub-shards
Never mind. I figured it out. It was due to a NPE on the missing updateLog in solrconfig.xml. My solrconfig.xml is from an older Solr release, which doesn't have certain required sections, etc. After adding them to solrconfig.xml per this official doc, everything started to work. It'd be great if null checks were there to produce informative error on SolrCore.java, so as to make it easier to find the root cause. http://wiki.apache.org/solr/SolrCloud#Required_Config Regards, HaiXin On 09/16/2013 06:44 PM, HaiXin Tie wrote: Hi Solr experts, I am using Solr 4.4 with ZK 3.4.5, trying to split shard1 of a collection named body. There is only one core on one machine for this collection. When I call SPLITSHARD to split this collection, Solr is able to create two sub-shards, but failed with a NPE in SolrCore.java while publishing the new shards. It seems that either the updateHandler or its updateLog is null, though they work fine in the original shard: SolrCore.java if (cc != null cc.isZooKeeperAware() Slice.CONSTRUCTION.equals(cd.getCloudDescriptor().getShardState())) { // set update log to buffer before publishing the core 862: getUpdateHandler().getUpdateLog().bufferUpdates(); cd.getCloudDescriptor().setShardState(null); cd.getCloudDescriptor().setShardRange(null); } Here are the details. Any pointers to aid debugging this issue is greatly appreciated! # curl request/response to split the shard: curl -s http://localhost:8983/solr/admin/collections?action=SPLITSHARDcollection=bodyshard=shard1; ?xml version=1.0 encoding=UTF-8? response lst name=responseHeaderint name=status500/intint name=QTime2688/int/lstlst name=failurestrorg.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'body_shard1_0_replica1': Unable to create core: body_shard1_0_replica1 Caused by: null/str/lststr name=Operation splitshard caused exception:org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: SPLTSHARD failed to create subshard leaders/strlst name=exceptionstr name=msgSPLTSHARD failed to create subshard leaders/strint name=rspCode500/int/lstlst name=errorstr name=msgSPLTSHARD failed to create subshard leaders/strstr name=traceorg.apache.solr.common.SolrException: SPLTSHARD failed to create subshard leaders at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:171) at org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:322) at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:136) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at
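For anyone hitting the same NPE: the "required sections" referred to above are the SolrCloud bits from that wiki page, in particular the update log plus the realtime-get and replication handlers. Roughly:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>

<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
  </lst>
</requestHandler>

<requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy"/>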
Re: Some text not indexed in solr4.4
Hi; Did you run commit command? 18 Eylül 2013 Çarşamba tarihinde Utkarsh Sengar utkarsh2...@gmail.com adlı kullanıcı şöyle yazdı: To add to it, I see the exact problem with the queries: nikon d7100, nikon d5100, samsung ps-we450 etc. Thanks, -Utkarsh On Tue, Sep 17, 2013 at 2:20 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: I have a copyField called allText with type text_general: https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68 I have ~100 documents which have the text: dyson and dc44 or dc41 etc. For example: title: Dyson DC44 Animal Digital Slim Cordless Vacuum description: The DC44 Animal is the new Dyson Digital Slim vacuum cleaner the cordless machine that doesn't lose suction. It has been engineered for floor to ceiling cleaning. DC44 Animal has a detachable long-reach wand which is balanced for floor to ceiling cleaning. The motorized floor tool has twice the power of the DC35 floor tool to drive the bristles deeper into the carpet pile with more force. It attaches to the wand or directly to the machine for cleaning awkward spaces. The brush bar has carbon fiber filaments for removing fine dust from hard floors. DC44 Animal has a run time of 20 minutes or 8 minutes on Boost mode. Powered by the Dyson digital motor DC44 Animal has a fade-free nickel manganese cobalt battery and Root Cyclone technology for constant powerful suction., UPC: 0879957006362 The documents are indexed. Analysis says its indexeD: http://i.imgur.com/O52ino1.png But when I search for allText:dyson dc44 I get no results, response: http://pastie.org/8334220 Any suggestions about the problem? I am out of ideas about how to debug this. -- Thanks, -Utkarsh -- Thanks, -Utkarsh
Re: Updated: CREATEALIAS does not work with more than one collection (Error 503: no servers hosting shard)
Never mind. I figured it out. It was due to a NPE on the missing updateLog in solrconfig.xml. My solrconfig.xml is from an older Solr release, which doesn't have certain required sections, etc. After adding them to solrconfig.xml per this official doc, everything started to work. http://wiki.apache.org/solr/SolrCloud#Required_Config Regards, HaiXin On 09/16/2013 04:55 PM, HaiXin Tie wrote: Sorry but I've fixed some typos, updated text: Hello Solr experts, For some strange reason, collection alias does not work in my Solr instance when more than one collection is used. I would appreciate your help. # Here is my setup, which is quite simple: Zookeeper: 3.4.5 (used to upconfig/linkconfig collections and configs for c1 and c2) Solr: version 4.4.0, with two collections c1 and c2 (solr.xml included) created using remote core API calls # Symptoms: 1. Solr queries to each individual collection works fine: http://localhost:8983/solr/c1/select?q=*:* http://localhost:8983/solr/c2/select?q=*:* 2. CREATEALIAS name=cx for c1 or c2 alone (e.g. 1-1 mapping) works fine: http://localhost:8983/solr/cx/select?q=*:* 3. CREATEALIAS name=cx for c1 and c2 does not work: # Solr request/response to the collection alias (success): curl -s http://localhost:8983/solr/admin/collections?action=CREATEALIASname=cxcollections=c1,c2;http://localhost:8983/solr/admin/collections?action=CREATEALIASname=cxcollections=c1,c2 ?xml version=1.0 encoding=UTF-8? response lst name=responseHeaderint name=status0/intint name=QTime134/int/lst /response # Solr query using the alias fails with Error 503: no servers hosting shard http://localhost:8983/solr/cx/select?q=*:* responselst name=responseHeaderint name=status503/intint name=QTime2/intlst name=paramsstr name=q*:*/str/lst/lstlst name=errorstr name=msgno servers hosting shard: /strint name=code503/int/lst/response # Solr logs: 3503223 [qtp724646150-11] ERROR org.apache.solr.core.SolrCore ? org.apache.solr.common.SolrException: no servers hosting shard: at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 3503224 [qtp724646150-11] INFO org.apache.solr.core.SolrCore ? [c1] webapp=/solr path=/select params={q=*:*} status=503 QTime=2 # solr.xml ?xml version=1.0 encoding=UTF-8 ? 
<solr persistent="true" sharedLib="lib">
  <cores host="${host:}" adminPath="/admin/cores" hostPort="${jetty.port:}" hostContext="${hostContext:solr}">
    <core shard="shard1" instanceDir="c1/" name="c1" collection="c1"/>
    <core shard="shard1" instanceDir="c2/" name="c2" collection="c2"/>
  </cores>
</solr>

# zookeeper alias (same from solr/cloud UI):
[zk: localhost:2181(CONNECTED) 10] get /myroot/aliases.json
{"collection":{"cx":"c1,c2"}}
cZxid = 0x110d
ctime = Fri Sep 13 17:25:18 PDT 2013
mZxid = 0x18d1
mtime = Mon Sep 16 16:31:21 PDT 2013
pZxid = 0x110d
cversion = 0
dataVersion = 19
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 119
numChildren = 0

BTW, I've spent a lot of time figuring out how to make ZooKeeper and Solr work together. The commands are not complex, but making them work sometimes requires a lot of digging online, to figure out missing jars for zkCli.sh, etc. I know a lot of things are changing since Solr 4.0, but I really hope the Solr documentation can be better maintained, so that people won't have to spend tons of hours figuring out simple steps (albeit complex under the hood) like this. Thanks!

This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments) by others is prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete this email and any attachments. No employee or agent of TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed written agreement.
Re: Limits of Document Size at SolrCloud and Faced Problems with Large Size of Documents
Hi 50m docs across 18 servers 48gb RAM ain't much. I doubt you are hitting any limits in lucene or solr. How heavy is your index rate? Otis Solr ElasticSearch Support http://sematext.com/ On Sep 17, 2013 5:25 PM, Furkan KAMACI furkankam...@gmail.com wrote: Currently I hafer over 50+ millions documents at my index and as I mentiod before at another question I have some problems while indexing (jetty EOF exception) I know that problem may not be about index size but just I want to learn that is there any limit for document size at Solr that if I exceed it I can have some problems? I am not talking about the theoretical limit. What are the maximim index size for folks and what they to handle heavy index rate when having millions of documents. What tuning strategies they do? PS: I have 18 machines, 9 shards, each machine has 48 GB RAM and I use Solr 4.2.1 for my SolrCloud.
Re: Some text not indexed in solr4.4
Utkarsh, Check to see if the value is actually indexed into the field by using the Terms request handler: http://localhost:8983/solr/terms?terms.fl=textterms.prefix=d (adjust the prefix to whatever you're looking for) This should get you going in the right direction. Jason On Sep 17, 2013, at 2:20 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: I have a copyField called allText with type text_general: https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68 I have ~100 documents which have the text: dyson and dc44 or dc41 etc. For example: title: Dyson DC44 Animal Digital Slim Cordless Vacuum description: The DC44 Animal is the new Dyson Digital Slim vacuum cleaner the cordless machine that doesn’t lose suction. It has been engineered for floor to ceiling cleaning. DC44 Animal has a detachable long-reach wand which is balanced for floor to ceiling cleaning. The motorized floor tool has twice the power of the DC35 floor tool to drive the bristles deeper into the carpet pile with more force. It attaches to the wand or directly to the machine for cleaning awkward spaces. The brush bar has carbon fiber filaments for removing fine dust from hard floors. DC44 Animal has a run time of 20 minutes or 8 minutes on Boost mode. Powered by the Dyson digital motor DC44 Animal has a fade-free nickel manganese cobalt battery and Root Cyclone technology for constant powerful suction., UPC: 0879957006362 The documents are indexed. Analysis says its indexeD: http://i.imgur.com/O52ino1.png But when I search for allText:dyson dc44 I get no results, response: http://pastie.org/8334220 Any suggestions about the problem? I am out of ideas about how to debug this. -- Thanks, -Utkarsh
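For the schema in this thread that would be something like (assuming the stock /terms handler is registered):

http://localhost:8983/solr/collection1/terms?terms.fl=allText&terms.prefix=dc

If dc44 shows up in the terms list but the query above still returns nothing, the next thing to compare is the query-time parsing and analysis (debugQuery=true will show how allText:dyson dc44 was actually parsed).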
Querying a non-indexed field?
Hello, Is it possible to restrict query results using a non-indexed, stored field? e.g. I might index fewer fields to reduce the index size. I query on a few indexed fields, getting a small # of results. I want to restrict this further based on values from non-indexed, stored fields. I can obviously do this myself, but it would be nice if Solr could do this for me. Thanks, Scott
Re: Querying a non-indexed field?
No. --wunder On Sep 17, 2013, at 5:16 PM, Scott Schneider wrote: Hello, Is it possible to restrict query results using a non-indexed, stored field? e.g. I might index fewer fields to reduce the index size. I query on a few indexed fields, getting a small # of results. I want to restrict this further based on values from non-indexed, stored fields. I can obviously do this myself, but it would be nice if Solr could do this for me. Thanks, Scott
Re: how to make sure all the index docs flushed to the index files
Hi Erick and Shawn Thanks a lot 2013/9/17 Erick Erickson erickerick...@gmail.com Here's a blog about tlogs and commits: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ And here's Mike's excellent segment merging blog http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html Best, Erick On Tue, Sep 17, 2013 at 6:36 AM, Shawn Heisey s...@elyograg.org wrote: On 9/17/2013 12:32 AM, YouPeng Yang wrote: Hi Another werid problem. When we setup the autocommit properties, we suppose that the index fille will created every commited.So that the size of the index files will be large enough. We do not want to keep too many small files as [1]. How to control the size of the index files. An index segment gets created after every hard commit. In the listing that you sent, all the files starting with _28w are a single segment. All the files starting with _28x are another segment. Solr should be merging the segments when you get enough of them, unless you have incorrectly set up your merge policy. The default number of segments that get merged is ten. When you get ten segments, they will be merged down to one. This repeats until you have ten merged segments. At that point, those ten merged segments will be merged to make an even larger segment. You can bump up the number of open files allowed by your operating system. On Linux, this is controlled by the /etc/security/limits.conf file. Here are some example config lines for that file: elyograghardnofile 6144 elyogragsoftnofile 4096 roothardnofile 6144 rootsoftnofile 4096 Alternatively, you can reduce the required number of files if you turn on the UseCompoundFile setting, which is in the IndexConfig section. This causes Solr to create a single file per index segment instead of several files per segment. The compound file may be slightly less efficient, but the difference is likely to be very small. https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
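For reference, Shawn's two suggestions in config form. The limits.conf entries go in /etc/security/limits.conf for whichever user runs Solr (the user name and numbers below are just examples), and useCompoundFile lives in the indexConfig section of solrconfig.xml:

solr  hard  nofile  6144
solr  soft  nofile  4096

<indexConfig>
  <useCompoundFile>true</useCompoundFile>
</indexConfig>

A new ulimit only takes effect once the Solr process is restarted from a fresh login session.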
Solr SpellCheckComponent only shows results with certain fields
I'm trying to get the Solr SpellCheckComponent working but am running into some issues. When I run .../solr/collection1/select?q=%3Awt=jsonindent=true These results are returned { responseHeader: { status: 0, QTime: 1, params: { indent: true, q: *:*, _: 1379457032534, wt: json } }, response: { numFound: 2, start: 0, docs: [ { enterprise_name: because, name: doc1, enterprise_id: 100, _version_: 1446463888248799200 }, { enterprise_name: what, name: RZTEST, enterprise_id: 102, _version_: 1446464432735518700 } ] } } Those are the values that I have indexed. Now when I want to query for spelling I get some weird results. When I run .../solr/collection1/select?q=name%3Arxtestwt=jsonindent=truespellcheck=true The results are accurate and I get { responseHeader:{ status:0, QTime:4, params:{ spellcheck:true, indent:true, q:name:rxtest, wt:json}}, response:{numFound:0,start:0,docs:[] }, spellcheck:{ suggestions:[ rxtest,{ numFound:1, startOffset:5, endOffset:11, suggestion:[rztest]}]}} Anytime I run a query without the name values I get 0 results back. /solr/collection1/select?q=enterprise_name%3Abecauswt=jsonindent=truespellcheck=true { responseHeader:{ status:0, QTime:5, params:{ spellcheck:true, indent:true, q:enterprise_name:becaus, wt:json}}, response:{numFound:0,start:0,docs:[] }, spellcheck:{ suggestions:[]}} My guess is that there is something wrong in my scheme but everything looks fine. Schema.xml field name=name type=text_general indexed=true stored=true/ field name=enterprise_id type=string indexed=true stored=true required=true / field name=enterprise_name type=text_general indexed=true stored=true/ field name=text type=text_general indexed=true stored=false multiValued=true / dynamicField name=*_t type=text_generalindexed=true stored=true/ dynamicField name=*_txt type=text_general indexed=true stored=true multiValued=true/ dynamicField name=attr_* type=text_general indexed=true stored=true multiValued=true/ copyField source=name dest=text/ copyField source=enterprise_name dest=text/ fieldType name=text_general class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt / filter class=solr.LowerCaseFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType solrconfig.xml requestHandler name=/select class=solr.SearchHandler lst name=defaults str name=echoParamsexplicit/str int name=rows10/int str name=dftext/str str name=spellcheck.dictionarydefault/str str name=spellcheck.dictionarywordbreak/str str name=spellcheck.onlyMorePopularfalse/str str name=spellcheck.extendedResultsfalse/str str name=spellcheck.count5/str /lst arr name=last-components strspellcheck/str /arr requestHandler searchComponent name=spellcheck class=solr.SpellCheckComponent lst name=spellchecker str name=namedefault/str str name=classnamesolr.IndexBasedSpellChecker/str str name=fieldname/str str name=spellcheckIndexDir./spellchecker/str str name=accuracy0.5/str float name=thresholdTokenFrequency.0001/float str name=buildOnCommittrue/str /lst lst name=spellchecker str name=namewordbreak/str str name=classnamesolr.WordBreakSolrSpellChecker/str str name=fieldname/str str name=combineWordstrue/str str name=breakWordstrue/str int 
name=maxChanges3/int str name=buildOnCommittrue/str /lst str name=queryAnalyzerFieldTypetext_general/str /searchComponent Any help would be appreciated. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-SpellCheckComponent-only-shows-results-with-certain-fields-tp4090727.html Sent from the Solr - User mailing list archive at Nabble.com.
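One thing that stands out in the config above: both spellcheckers are built on field "name", so the dictionary only ever contains terms from name - which matches the symptom that name:rxtest gets a suggestion while enterprise_name:becaus gets none. Building the dictionary on the copyField target "text" (which receives both name and enterprise_name) should cover both; a sketch of just the changed part:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">text</str>   <!-- was "name"; text is the copyField destination -->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.5</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

The numFound:0 responses themselves are expected, since neither rxtest nor becaus is an indexed term; only the suggestions depend on which field the dictionary is built from.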
Re: SolrCloud liveness problems
SOLR-5243 and SOLR-5240 will likely improve the situation. Both fixes are in 4.5 - the first RC for 4.5 will likely come tomorrow. Thanks to yonik for sussing these out. - Mark On Sep 17, 2013, at 2:43 PM, Mark Miller markrmil...@gmail.com wrote: On Sep 17, 2013, at 12:00 PM, Vladimir Veljkovic vladimir.veljko...@boxalino.com wrote: Hello there, we have following setup: SolrCloud 4.4.0 (3 nodes, physical machines) Zookeeper 3.4.5 (3 nodes, physical machines) We have a number of rather small collections (~10K or ~100K of documents), that we would like to load to all Solr instances (numShards=1, replication_factor=3), and access them through local network interface, as the load balancing is done in layers above. We can live (and we actually do it in the test phase) with updating the entire collections whenever we need it, switching collection aliases and removing the old collections. We stumbled across following problem: as soon as all three Solr nodes become a leader to at least one collection, restarting any node makes it completely unresponsive (timeout), both though admin interface and for replication. If we restart all solr nodes the cluster end up in some kind of deadlock and only remedy we found is Solr clean installation, removing ZooKeeper data and re-posting collections. Apparently, leader is waiting for replicas to come up and they try to synchronize but timeout on http requests, so everything ends up in some kind of dead lock, maybe related to: https://issues.apache.org/jira/browse/SOLR-5240 Yup, that sounds exactly what you would expect with SOLR-5240. A fix for that is coming in 4.5, which is a probably a week or so away. Eventually (after few minutes), leader takes over, mark collections active but remains blocked on http interface, so other nodes can not synchronize. In further tests, we loaded 4 collections with numShards=1 and replication_factor=2. By chance, one node become the leader for all 4 collections. Restarting the node which was not the leader is done without the problem, but when we restarted the leader it happened that: - leader shut down, other nodes became leaders of 2 collections each - leader starts up, 3 collections on it become active, one collection remains ”down” and node becomes unresponsive and timeouts on http requests. Hard to say - I'll experiment with 4.5 and see if I can duplicate this. - Mark As this behavior is completely unexpected for one cluster solution, I wonder if somebody else experienced same problems or we are doing something entirely wrong. Best regards -- Vladimir Veljkovic Senior Java Entwickler Boxalino AG vladimir.veljko...@boxalino.com www.boxalino.com Tuning Kit for your Online Shop Product Search - Recommendations - Landing Pages - Data intelligence - Mobile Commerce
Facet with empty values displayed in output
Hi, I'm using Solr 4.4 for our search. When I query for a keyword, it returns empty-valued facets in the response:

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="Country">
      <int name="">1</int>
      <int name="USA">1</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

I have also tried using the facet.missing parameter, but no change. How can we handle this? Thanks, Prasi
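The unnamed bucket usually means at least one document was indexed with an empty string in Country; facet.missing only counts documents with no value at all, which is why it did not change anything. One way to keep empty strings out of the index in the first place is RemoveBlankFieldUpdateProcessorFactory in the default update chain (a sketch; the affected documents need to be re-indexed afterwards):

<updateRequestProcessorChain name="strip-blanks" default="true">
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>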
how can I use DataImportHandler on multiple MySQL databases with the same schema?
Hi all, Our system has distributed MySQL databases: we create a database for every customer who signs up and distribute it to one of our MySQL hosts. We currently use Lucene core to perform search on these databases, and we write Java code to loop through the databases and convert the data to a Lucene index. Right now we are planning to move to Solr for distribution, and I am investigating it. I tried to use DataImportHandler (http://wiki.apache.org/solr/DataImportHandler) from the wiki page, but I can't figure out a way to use multiple data sources with the same schema. The other question is: we keep the database connection data in one table, so can I build the datasource connection info from it and loop through the databases using DataImporter? If DataImporter isn't workable, is there a way to feed data to Solr using a customized SolrRequestHandler without using SolrJ? If neither of these two ways works, I think I am going to reuse the DAO of the old project and feed the data to Solr using SolrJ, probably with an embedded Solr server. Your help will be much appreciated. http://wiki.apache.org/solr/DataImportHandlerFaq -- All the best, Liu Bo
Re: how can I use DataImportHandler on multiple MySQL databases with the same schema?
You can create multiple entities in DIH definition and they will all run. Means duplicating the mapping definition apart from dataSource name, but is doable. Alternatively, the configuration file is read on every call to DIH. You can edit file between different invocations or autogenerate different files from common template and pass the name as parameter. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Sep 18, 2013 at 10:39 AM, Liu Bo diabl...@gmail.com wrote: Hi all Our system has distributed MySQL databases, we create a database for every customer signed up and distributed it to one of our MySQL hosts. We currently use lucene core to perform search on these databases, and we write java code to loop through these databases and convert the data to lucene index. Right now we are planning to move to Solr for distribution, and I am doing investigation on it. I tried to use DataImportHandler http://wiki.apache.org/solr/DataImportHandler in the wiki page, but I can't figured out a way to use multiple datasoures with the same schema. The other question is, we have the database connection data in one table, can I create datasource connections info from it, and loop through the databases using DataImporter? If DataImporter isn't working, is there a way to feed data to solr using customized SolrRequestHandler without using SolrJ? If neither of these two ways is working, I think I am going to reuse the DAO of the old project and feed the data to solr using SolrJ, probably using embedded Solr server. Your help will be much of my appreciation. http://wiki.apache.org/solr/DataImportHandlerFaq-- All the best Liu Bo
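A rough shape of the data-config.xml this describes, with one dataSource/entity pair per customer database (connection details, table and column names below are placeholders):

<dataConfig>
  <dataSource name="customer1" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://host1:3306/customer1_db" user="solr" password="***"/>
  <dataSource name="customer2" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://host2:3306/customer2_db" user="solr" password="***"/>
  <document>
    <entity name="items_customer1" dataSource="customer1"
            query="SELECT id, title, description FROM items">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="description" name="description"/>
    </entity>
    <entity name="items_customer2" dataSource="customer2"
            query="SELECT id, title, description FROM items">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>

Each entity can also be imported on its own with dataimport?command=full-import&entity=items_customer2.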