Re: what happens with slave during replication?
Hi Amanda, we don't use Solr Cloud yet, just 3 dedicated servers. When it comes to distribution, the choice will be either Solr Cloud or ElasticSearch. But currently we use Unix shell scripts with ssh for the switching. Easy, simple, stable :-) Regards, Bernd

On 21.09.2012 16:03, yangqian_nj wrote: Hi Bernd, You mentioned: Only one slave is online, the other is for backup. The backup gets replicated first. After that the servers are switched and the online one becomes the backup. Could you please let us know how you do the switch? We use SWAP to switch in Solr Cloud. After SWAP, when we query, we can see from the Tomcat log that the query actually goes to both cores for some reason. Thanks, Amanda
Re: Return only matched multiValued field
Hi, this sounds like the highlighting feature.

On 24.09.2012 0:51, Dotan Cohen <dotanco...@gmail.com> wrote: Assuming a multivalued, stored and indexed field with name "comment": when performing a search, I would like to return only the values of "comment" which contain the match. For example, when searching for "gold", instead of getting this result:

  <doc>
    <arr name="comment">
      <str>There's a lady who's sure</str>
      <str>all that glitters is gold</str>
      <str>and she's buying a stairway to heaven</str>
    </arr>
  </doc>

I would prefer to get this result:

  <doc>
    <arr name="comment">
      <str>all that glitters is gold</str>
    </arr>
  </doc>

(pseudo-XML from memory, may not be accurate, but it illustrates the point) Is there any way to do this with a Solr 4 index? The client accessing Solr is on a dial-up connection (no provision for DSL or other high-speed internet), so I'd like to move as little data over the wire as possible. In reality, the array will have tens of fields, so returning only the relevant fields may reduce the data transferred by an order of magnitude. Thanks. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
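A minimal sketch of such a highlighting request (the core URL and /select handler are the stock example ones, an assumption; hl, hl.fl, hl.snippets and hl.fragsize are standard Solr highlighting parameters):

  curl "http://localhost:8983/solr/select?q=comment:gold&fl=id&hl=true&hl.fl=comment&hl.snippets=10&hl.fragsize=0"

Here fl=id keeps the stored documents off the wire, hl.snippets raises the one-snippet-per-field default so every matching value can come back, and hl.fragsize=0 returns whole field values rather than fragments.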
SolrCloud not reachable, and after restart just a "no servers hosting shard"
Hi, I am running SolrCloud 4.0-BETA, and during the weekend it 'crashed' somehow, so that it wasn't reachable. CPU load was 100%. After a restart I couldn't access the data; it just told me: no servers hosting shard. Is there a way to get the data back? Thanks, regards, Daniel
Re: SolrCloud not reachable, and after restart just a "no servers hosting shard"
hi, Can you share a little bit more about your configuration: how many shards, # of replicas, what does your clusterstate.json look like, anything suspicious in the logs? -- Sami Siren

On Mon, Sep 24, 2012 at 11:13 AM, Daniel Brügge <daniel.brue...@gmail.com> wrote: Hi, I am running SolrCloud 4.0-BETA, and during the weekend it 'crashed' somehow, so that it wasn't reachable. CPU load was 100%. After a restart I couldn't access the data; it just told me: no servers hosting shard. Is there a way to get the data back? Thanks, regards, Daniel
Solr - Remove specific punctuation marks
Hi; I am working with apache-solr-3.6.0 on a Windows machine. I would like to remove all punctuation marks before indexing, except the colon and the full stop. I tried:

  <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="[\p{Punct}&&[^\.^\:]]" replacement="" replace="all"/>
    </analyzer>
  </fieldType>

But it didn't work. Any ideas?
solrcloud without realtime
Is it possible to use SolrCloud without the real-time features? In my application I do not need real-time features, and old-style processing should be more efficient.
RE: Nodes cannot recover and become unavailable
It seems my clusterstate.json is still old. Is there a method to recreate it without taking all nodes down at the same time?

-----Original message-----
From: Markus Jelsma <markus.jel...@openindex.io>
Sent: Thu 20-Sep-2012 10:14
To: solr-user@lucene.apache.org
Subject: RE: Nodes cannot recover and become unavailable

Hi - at first I didn't recreate the Zookeeper data, but I got it to work. I'll check the removal of the LOG line. Thanks

-----Original message-----
From: Sami Siren <ssi...@gmail.com>
Sent: Wed 19-Sep-2012 17:45
To: solr-user@lucene.apache.org
Subject: Re: Nodes cannot recover and become unavailable

Also, did you re-create the cluster after upgrading to a newer version? I believe there were some changes made to the clusterstate.json recently that are not backwards compatible. -- Sami Siren

On Wed, Sep 19, 2012 at 6:21 PM, Sami Siren <ssi...@gmail.com> wrote: Hi, I am having trouble understanding the reason for that NPE. First you could try removing line #102 in HttpClientUtil so that logging does not prevent creation of the http client in SyncStrategy. -- Sami Siren

On Wed, Sep 19, 2012 at 5:29 PM, Markus Jelsma <markus.jel...@openindex.io> wrote: Hi, Since the 2012-09-17 11:10:41 build, shards have started to have trouble coming back online. When I restart one node, the slices on the other nodes throw exceptions and cannot be queried. I'm not sure how to remedy the problem, but stopping a node or restarting it a few times seems to help. The problem is that when I restart a node and this happens, I must not restart another node, because that may trigger other slices becoming unavailable. Here are some parts of the log:

2012-09-19 14:13:18,149 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again... core=oi_i
2012-09-19 14:13:25,818 WARN [solr.cloud.RecoveryStrategy] - [main-EventThread] - : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
2012-09-19 14:13:44,497 WARN [solr.cloud.RecoveryStrategy] - [Thread-4] - : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_jcore=oi_j
2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Error while trying to recover. core=oi_i:org.apache.solr.common.SolrException: We are not the leader
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
        at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again... core=oi_i
2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - max retries exceeded. core=oi_i
2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - I give up. core=oi_i
2012-09-19 14:14:00,333 WARN [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : Sync request error: java.lang.NullPointerException
ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : http://nl10.host:8080/solr/oi_i/: Could not tell a replica to recover:java.lang.NullPointerException
        at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
        at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:102)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.<init>(HttpSolrServer.java:155)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.<init>(HttpSolrServer.java:128)
        at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:262)
        at org.apache.solr.cloud.SyncStrategy.requestRecovery(SyncStrategy.java:272)
        at org.apache.solr.cloud.SyncStrategy.syncToMe(SyncStrategy.java:203)
        at org.apache.solr.cloud.SyncStrategy.syncReplicas(SyncStrategy.java:125)
        at org.apache.solr.cloud.SyncStrategy.sync(SyncStrategy.java:87)
        at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:169)
        at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
        at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
        at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:275)
        at
Re: Return only matched multiValued field
  <field name="doctest" type="textmulti" stored="true" indexed="true" multiValued="true"/>
</fields>
<defaultSearchField>doctest</defaultSearchField>

Note that in anonymizing the information, I introduced a typo: the above "doctest" should be "doctext". In any case, the field names in the production application and in the production schema do in fact match! -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Items disappearing from Solr index
Hi, I am running Solr 3.5, using SolrJ with StreamingUpdateSolrServer to index and delete items from Solr. I basically index items from the DB into Solr every night. Existing items can be marked for deletion in the DB, and a delete request is then sent to Solr to delete those items. My process runs as follows every night:

1. Check if items have been marked for deletion and delete them from Solr. I commit and optimize after the entire Solr deletion runs.
2. Index any new items to Solr. I commit and optimize after all the new items have been added.

Recently I started noticing that huge chunks of items that have not been marked for deletion are disappearing from the index. I checked the Solr logs, and they indicate that it is deleting exactly the number of items requested, but still a lot of other items disappear from the index from time to time. Any ideas what might be causing this, or what I am doing wrong? Thanks.
Understanding autoSoftCommit
Hi, On my Windows workstation I have tried to index a document into a SolrCloud instance with the following special configuration:

  <autoCommit>
    <maxTime>1200000</maxTime>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>600000</maxTime>
  </autoSoftCommit>
  ...
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>

That is, commit every 20 minutes and soft commit every 10 minutes. Right after indexing I can find the document using /get (and not using /search), and after 10 minutes I can find it using /search as well. If I stop Solr using Ctrl+C or kill -9 (from my cygwin console) before the 10 minutes have passed and start Solr again, then I can find the document using both /get and /search. Are there any scenarios where I will lose an indexed document before either commit or soft commit is triggered? And does the transaction log have anything to do with this? Thanks in advance. Best regards, Trym
Range operator problems in Chef (automation framework)
Hi Everyone! We're doing some nice stuff with Chef (http://wiki.opscode.com/display/chef/Home). It uses Solr for search, but range queries don't work as expected. Maybe Chef or Solr is just buggy, or I am doing it wrong ;-) In Chef I have a bunch of nodes with a timestamp attribute. Now I want to search for nodes that have a timestamp not older than one hour:

  search(:node, "role:JSlave AND ohai_time:[NOW-1HOUR TO *]")

Is this string in the call a Solr-compliant range expression at all? Unluckily, I have no toys at hand to verify this myself at the moment... but I'm working on it. Thanks for reading! ^^ Kind Regards, Christian Bordis
Performance Degradation on Migrating from 1.3 to solr 3.6.1
Hi, On migrating from 1.3 to 3.6.1, I see the query performance degrading by nearly 2 times for all types of queries. Indexing performance shows a slight degradation over 1.3. For indexing we use our custom scripts that post XML over HTTP. Is there anything that I might have missed? I am thinking that this might be due to the new TieredMergePolicy (over LogByteSizeMergePolicy) creating more segment files, and hence higher query latency. We are using compound files in 1.3, and I have set this to true even in 3.6.1, but it results in more segment files. On optimizing, the query response time improved beyond 1.3. So could it be the merge policy, or am I missing something here? Do let me know. Please find attached the solrconfig.xml. Regards, Sujatha

<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements. See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at
 http://www.apache.org/licenses/LICENSE-2.0
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->
<config>
  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
  <luceneMatchVersion>LUCENE_36</luceneMatchVersion>
  <!-- The DirectoryFactory to use for indexes.
       solr.StandardDirectoryFactory, the default, is filesystem based.
       solr.RAMDirectoryFactory is memory based, not persistent, and
       doesn't work with replication. -->
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

  <indexConfig>
    <!-- Values here affect all index writers and act as a default unless overridden. -->
    <useCompoundFile>true</useCompoundFile>
    <mergeFactor>4</mergeFactor>
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <!-- <mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy"/> -->
    <!-- <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
           <int name="maxMergeAtOnce">4</int>
           <int name="segmentsPerTier">4</int>
         </mergePolicy> -->
    <lockType>single</lockType>
  </indexConfig>

  <jmx/>
  <updateHandler class="solr.DirectUpdateHandler2"/>

  <query>
    <maxBooleanClauses>10</maxBooleanClauses>
    <filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>
    <queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>
    <documentCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="0"/>
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
    <useFilterForSortedQuery>true</useFilterForSortedQuery>
    <queryResultWindowSize>50</queryResultWindowSize>
    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
    <useColdSearcher>false</useColdSearcher>
    <maxWarmingSearchers>2</maxWarmingSearchers>
  </query>

  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048"/>
    <httpCaching never304="true"/>
    <!-- <httpCaching lastModifiedFrom="openTime" etagSeed="Solr"></httpCaching> -->
  </requestDispatcher>

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <!-- default values for query parameters -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <!-- <int name="rows">10</int>
           <str name="fl">*</str>
           <str name="version">2.1</str> -->
    </lst>
  </requestHandler>

  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"/>
  <!-- <requestHandler name="/analysis" class="solr.AnalysisRequestHandler"/> -->
  <requestHandler name="/admin/" class="solr.admin.AdminHandlers"/>
  <requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler"/>
  <requestHandler name="/analysis/document" class="solr.DocumentAnalysisRequestHandler" startup="lazy"/>

  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <str name="q">solrpingquery</str>
    </lst>
    <lst name="defaults">
      <str name="echoParams">all</str>
    </lst>
  </requestHandler>

  <!-- Echo the request contents back to the client -->
  <requestHandler name="/debug/dump" class="solr.DumpRequestHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str> <!-- for all params (including the default etc) use: 'all' -->
      <str
Re: Help with new Join Functionality in Solr 4.0
NP, good luck!

On Sun, Sep 23, 2012 at 3:41 PM, <milen.ti...@materna.de> wrote: Hello Erick, Thanks a lot for your reply! Your suggestion is actually exactly the alternative solution we are thinking about, and with your clarification on Solr's performance we are going to go for it! Many thanks again! Milen

From: Erick Erickson [erickerick...@gmail.com]
Sent: Sunday, 23 September 2012 17:50
To: solr-user@lucene.apache.org
Subject: Re: Help with new Join Functionality in Solr 4.0

The very first thing to try is to flatten your data so you don't have to use joins. I know that goes against your database instincts, but Solr easily handles millions and millions of documents. So if the cross-product of docs and modules isn't prohibitive, that's what I'd do first. Then it's just a matter of forming a search without joins. Joins run into performance issues when the join field has many unique values; unfortunately, the field people often want to join on is something like a uniqueKey (or PK in RDBMS terms), so be aware of that. Best, Erick

On Fri, Sep 21, 2012 at 5:46 AM, <milen.ti...@materna.de> wrote: Dear Solr community, I am rather new to Solr; however, I already find it quite attractive. We are developing a research application which contains a Solr index with three different kinds of documents. Here is the basic idea:

- A document of type "doc" consisting of fields id, docid, doctitle and some other metadata
- A document of type "module" consisting of fields id, modid and text
- A document of type "docmodule" consisting of fields id, docrefid, modrefid and some metadata about the relation between a document and a module; field docrefid refers to the id of a "doc" document, while field modrefid contains the id of a "module" document

In other words, in our model there are documents (type "doc") consisting of several modules, and there is some characterization of each link between a document and a module. Almost all fields of a "doc" document are searchable, as well as the text of a module and the metadata of the "docmodule" entries. We are looking for a fast way to retrieve all modules containing a certain text and associated with a given document, preferably with a single query. This means we want to query the text from a "module" document while we set a restriction on the docrefid from a "docmodule" or the id from a "doc" document. Is this possible by means of the new pseudo-joins? Any ideas are highly appreciated! Thanks in advance!

Milen Tilev
Master of Science
Softwareentwickler
Business Unit Information
MATERNA GmbH Information & Communications
Voßkuhle 37
44141 Dortmund, Deutschland
Telefon: +49 231 5599-8257
Fax: +49 231 5599-98257
E-Mail: milen.ti...@materna.de
www.materna.de | Newsletter: http://www.materna.de/newsletter | Twitter: http://twitter.com/MATERNA_GmbH | XING: http://www.xing.com/companies/MATERNAGMBH | Facebook: http://www.facebook.com/maternagmbh

Registered office of MATERNA GmbH: Voßkuhle 37, 44141 Dortmund
Managing directors: Dr. Winfried Materna, Helmut an de Meulen, Ralph Hartwig
Dortmund District Court, HRB 5839
Re: solrcloud without realtime
I'm pretty sure all you need to do is disable autoSoftCommit. Or rather, don't un-comment it in solrconfig.xml. Best, Erick

On Mon, Sep 24, 2012 at 5:44 AM, Radim Kolar <h...@filez.com> wrote: Is it possible to use SolrCloud without the real-time features? In my application I do not need real-time features, and old-style processing should be more efficient.
Re: Return only matched multiValued field
Hmmm, works for me. What is your entire response packet? And you've covered the bases with indexed and stored, so this seems like it _should_ work. Best, Erick

On Mon, Sep 24, 2012 at 6:12 AM, Dotan Cohen <dotanco...@gmail.com> wrote:

  <field name="doctest" type="textmulti" stored="true" indexed="true" multiValued="true"/>
</fields>
<defaultSearchField>doctest</defaultSearchField>

Note that in anonymizing the information, I introduced a typo: the above "doctest" should be "doctext". In any case, the field names in the production application and in the production schema do in fact match! -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Items disappearing from Solr index
How do you delete items? By ID or by query? My guess is that one of two things is happening:

1) your delete process is deleting too much data.
2) your index process isn't indexing what you think.

I'd add some logging to the SolrJ program to see what it thinks it has deleted from or added to the index and go from there. Best, Erick

On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue <kissue...@gmail.com> wrote: Hi, I am running Solr 3.5, using SolrJ with StreamingUpdateSolrServer to index and delete items from Solr. I basically index items from the DB into Solr every night. Existing items can be marked for deletion in the DB, and a delete request is then sent to Solr to delete those items. My process runs as follows every night: 1. Check if items have been marked for deletion and delete them from Solr. I commit and optimize after the entire Solr deletion runs. 2. Index any new items to Solr. I commit and optimize after all the new items have been added. Recently I started noticing that huge chunks of items that have not been marked for deletion are disappearing from the index. I checked the Solr logs, and they indicate that it is deleting exactly the number of items requested, but still a lot of other items disappear from the index from time to time. Any ideas what might be causing this, or what I am doing wrong? Thanks.
Solr is not indexing after MySQL upgrade
Indexing stops after 'x' documents. I am using Bitnami and upgraded the MySQL server from MySQL 5.1.* to MySQL 5.5.*. After the upgrade, when I ran indexing on Solr, nothing got indexed. I am using a stored procedure in which I find the parent of a child and insert it into a table that uses the MyISAM memory engine. Individually, the procedure works fine in both MySQL 5.1.* and MySQL 5.5.*. In Solr I call the procedure and then execute some SQL statements against the table created by that procedure. When I run the procedure and the queries together, the data does not get indexed; if I run the procedure separately in Solr without executing the queries, it works fine, and if I comment out the procedure and run only the queries, that also works fine. But when I run both together, the data doesn't get indexed. Can anyone recommend a solution?
Splitting up a location to make it searchable
I am using Google for location input. It often spits out something like this:

  Shorewood, Seattle, Wa

Since I am using this index analyzer:

  <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>

it means that if I search for "Sho" or "Shorew" I get the result I want. However, if I search for "Sea" or "Seatt" I get no results. I guess I need to break the location down into "Shorewood" "Seattle" "Wa" instead of "Shorewood, Seattle, Wa". Can this be done easily and efficiently within Solr, perhaps as an index analyzer?
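One possible approach, sketched here rather than taken from the thread: tokenize the raw string into words before the edge n-gram filter, so that "Shorewood", "Seattle" and "Wa" each become their own token and each is n-grammed independently. The field type name is made up for illustration:

  <fieldType name="text_location" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <!-- StandardTokenizer splits "Shorewood, Seattle, Wa" into three word tokens -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

With this, a query for "sea" matches the "sea" gram produced from "seattle".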
RE: Solr - Remove specific punctuation marks
Hi Daisy, I can't see anything wrong with the regex or the XML syntax. One possibility: if it's Arabic you're matching against, you may want to add ARABIC FULL STOP U+06D4 to the set you subtract from \p{Punct}. If you give an example of your input and your expected output, I might be able to help more. Steve

-----Original Message-----
From: Daisy [mailto:omnia.za...@gmail.com]
Sent: Monday, September 24, 2012 5:08 AM
To: solr-user@lucene.apache.org
Subject: Solr - Remove specific punctuation marks

Hi; I am working with apache-solr-3.6.0 on a Windows machine. I would like to remove all punctuation marks before indexing, except the colon and the full stop. I tried:

  <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="[\p{Punct}&&[^\.^\:]]" replacement="" replace="all"/>
    </analyzer>
  </fieldType>

But it didn't work. Any ideas?
RE: Solr - Remove specific punctuation marks
Yes, I am trying to index Arabic documents. There is a problem: the regex couldn't be understood in the Solr schema, and it gives a 500 error code. Here is an example input:

  هذا مثال: للتوضيح (مثال علي علامات الترقيم) انتهي.

I also tried the regex pattern="([\(\)\}\{\,[^.:\s+\S+]])", but I failed to remove the brackets from the text above; when I searched for a bracket I found a result.
RE: Solr - Remove specific punctuation marks
-----Original message-----
From: Daisy <omnia.za...@gmail.com>
Sent: Mon 24-Sep-2012 15:09
To: solr-user@lucene.apache.org
Subject: RE: Solr - Remove specific punctuation marks

> Yes, I am trying to index Arabic documents. There is a problem: the regex couldn't be understood in the Solr schema, and it gives a 500 error code.

The config is XML. Try encoding the ampersand as &amp;

> Here is an example input:
>
>   هذا مثال: للتوضيح (مثال علي علامات الترقيم) انتهي.
>
> I also tried the regex pattern="([\(\)\}\{\,[^.:\s+\S+]])", but I failed to remove the brackets from the text above; when I searched for a bracket I found a result.
Re: Return only matched multiValued field
On Mon, Sep 24, 2012 at 2:16 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, works for me. What is your entire response packet? And you've covered the bases with indexed and stored so this seems like it _should_ work. I'm sorry, reducing the output to rows=1 helped me notice that the highlighted sections come after the main results. The highlighting feature works as expected. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: solrcloud without realtime
On 24.9.2012 14:05, Erick Erickson wrote:
> I'm pretty sure all you need to do is disable autoSoftCommit. Or rather, don't un-comment it in solrconfig.xml.

And what about solr.NRTCachingDirectoryFactory? Is solr.MMapDirectoryFactory faster if there are no NRT search requirements?
RE: Solr - Remove specific punctuation marks
I tried &amp; and it solved the 500 error code. But it can still find punctuation marks. Although the parsed query didn't contain the punctuation mark:

  <str name="rawquerystring">{</str>
  <str name="querystring">{</str>
  <str name="parsedquery">text:</str>
  <str name="parsedquery_toString">text:</str>

numFound still gives 1 result:

  <result name="response" numFound="1" start="0">

and the highlight shows the result for the punctuation mark: <em>{</em>. The steps I did: 1. edit the schema; 2. restart the server; 3. delete the file; 4. index the file.
Re: Return only matched multiValued field
On Mon, Sep 24, 2012 at 9:47 AM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote: Hi, this sounds like the highlighting feature.

Thank you Mikhail. I actually do need the entire matched single entry, not a snippet of it. Looking at the example in the OP, with highlighting on "gold" I would get:

  <em>glitters is gold</em>

Whereas I need:

  <str>all that glitters is gold</str>

Thanks. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Range operator problems in Chef (automation framework)
That looks like a valid Solr date math expression, but you need to make sure that the field type is actually a Solr DateField, as opposed to simply an integer Unix time value. -- Jack Krupansky

-----Original Message-----
From: Christian Bordis
Sent: Monday, September 24, 2012 7:16 AM
To: solr-user@lucene.apache.org
Subject: Range operator problems in Chef (automation framework)

Hi Everyone! We're doing some nice stuff with Chef (http://wiki.opscode.com/display/chef/Home). It uses Solr for search, but range queries don't work as expected. Maybe Chef or Solr is just buggy, or I am doing it wrong ;-) In Chef I have a bunch of nodes with a timestamp attribute. Now I want to search for nodes that have a timestamp not older than one hour:

  search(:node, "role:JSlave AND ohai_time:[NOW-1HOUR TO *]")

Is this string in the call a Solr-compliant range expression at all? Unluckily, I have no toys at hand to verify this myself at the moment... but I'm working on it. Thanks for reading! ^^ Kind Regards, Christian Bordis
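For reference, a sketch of what a proper date field might look like in schema.xml (field and type names are assumptions here, not taken from Chef's actual schema):

  <fieldType name="date" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
  <field name="ohai_time" type="date" indexed="true" stored="true"/>

Against a true date field, ohai_time:[NOW-1HOUR TO *] is evaluated as date math; against a plain numeric Unix-time field, NOW-1HOUR is not parseable as a number and the range will not behave as expected.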
Re: solrcloud without realtime
On Mon, Sep 24, 2012 at 9:21 AM, Radim Kolar <h...@filez.com> wrote:
> And what about solr.NRTCachingDirectoryFactory? Is solr.MMapDirectoryFactory faster if there are no NRT search requirements?

NRTCachingDirectoryFactory is a wrapping directory - it's generally going to use solr.MMapDirectoryFactory as its delegate anyhow. It should not hurt performance if you are not using NRT, though. -- - Mark
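For reference, the directory factory is a one-line setting in solrconfig.xml; a sketch mirroring the stock example config's property fallback:

  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>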
Re: Qtime Vs DebugComponent Timing
And QTime doesn't include the time spent in the container (e.g., Tomcat or Jetty) or network latency. Usually a query benchmark would run from the time the client sent the query request until the time the client received the query results. The debug timing will help you understand which Solr components are consuming the time - for example, the highlighter - but that is still part of overall query processing time. -- Jack Krupansky

-----Original Message-----
From: Sujatha Arun
Sent: Monday, September 24, 2012 8:58 AM
To: solr-user@lucene.apache.org
Subject: Qtime Vs DebugComponent Timing

What's the difference between the QTime that gets returned in the XML results vs. the debugComponent QParser timing break-up, and which one should be considered for benchmarking the performance of Solr? I understand that QTime is the total time taken by Solr to execute a search, in ms. But the QParser break-up in the debugComponent does not exactly reflect this... So which one should be used for benchmark purposes? Regards
Re: Understanding autoSoftCommit
autoCommit (hard commit) is basically just there to reduce how much RAM is needed for the transaction log. You should generally use it with openSearcher=false, and you don't need to use it for visibility. It's also not required for durability, thanks to the transaction log. Soft commit should be used for visibility; it too has nothing to do with durability. For durability, the idea is that if Solr accepts your update, it's in. And yes, the transaction log is part of that. - Mark

On Mon, Sep 24, 2012 at 7:12 AM, Trym R. Møller <t...@sigmat.dk> wrote: Hi, On my Windows workstation I have tried to index a document into a SolrCloud instance with the following special configuration:

  <autoCommit>
    <maxTime>1200000</maxTime>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>600000</maxTime>
  </autoSoftCommit>
  ...
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>

That is, commit every 20 minutes and soft commit every 10 minutes. Right after indexing I can find the document using /get (and not using /search), and after 10 minutes I can find it using /search as well. If I stop Solr using Ctrl+C or kill -9 (from my cygwin console) before the 10 minutes have passed and start Solr again, then I can find the document using both /get and /search. Are there any scenarios where I will lose an indexed document before either commit or soft commit is triggered? And does the transaction log have anything to do with this? Thanks in advance. Best regards, Trym

-- - Mark
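Putting that together, a sketch of the configuration described in this thread (the maxTime values are the 20-minute/10-minute intervals from the original mail; openSearcher=false is the addition suggested above):

  <autoCommit>
    <maxTime>1200000</maxTime>
    <!-- hard commit for durability and to keep the transaction log small; opens no new searcher -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- soft commit controls when documents become visible to /search -->
    <maxTime>600000</maxTime>
  </autoSoftCommit>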
Solr Cell Questions
Hi, I'm currently experimenting with Solr Cell to index files into Solr. During this, some questions came up.

1. Is it possible (and wise) to connect to Solr Cell with multiple threads at the same time, to index several documents at once? This question came up because my program takes about 6 hours to index around 35,000 docs (no production environment, only the example Solr and a little desktop machine, but I think that's very slow, and I know Solr isn't the bottleneck (yet)).
2. If 1 is possible, how many threads should do this, and how much memory does Solr need? I tried it, but I ran into an out-of-memory exception.

Thanks in advance. Best regards, Johannes
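For reference, a single Solr Cell request looks roughly like this (a sketch using the stock /update/extract handler from the example config; the URL, id and file name are placeholders):

  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=false" -F "myfile=@/path/to/doc1.pdf"

Each such request is independent, so issuing them from several client threads and committing once at the end is one way to parallelize, memory permitting.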
Re: Performance Degradation on Migrating from 1.3 to solr 3.6.1
Run a query on both old and new with debugQuery=true on your query request and look at the component timings for possible insight. -- Jack Krupansky

From: Sujatha Arun
Sent: Monday, September 24, 2012 7:26 AM
To: solr-user@lucene.apache.org
Subject: Performance Degradation on Migrating from 1.3 to solr 3.6.1

Hi, On migrating from 1.3 to 3.6.1, I see the query performance degrading by nearly 2 times for all types of queries. Indexing performance shows a slight degradation over 1.3. For indexing we use our custom scripts that post XML over HTTP. Is there anything that I might have missed? I am thinking that this might be due to the new TieredMergePolicy (over LogByteSizeMergePolicy) creating more segment files, and hence higher query latency. We are using compound files in 1.3, and I have set this to true even in 3.6.1, but it results in more segment files. On optimizing, the query response time improved beyond 1.3. So could it be the merge policy, or am I missing something here? Do let me know. Please find attached the solrconfig.xml. Regards, Sujatha
Re: Solrcloud not reachable and after restart just a no servers hosting shard
Right - we need logs, the admin cloud "dump to clipboard" info, anything else to go on.

On Mon, Sep 24, 2012 at 4:36 AM, Sami Siren <ssi...@gmail.com> wrote: hi, Can you share a little bit more about your configuration: how many shards, # of replicas, what does your clusterstate.json look like, anything suspicious in the logs? -- Sami Siren

On Mon, Sep 24, 2012 at 11:13 AM, Daniel Brügge <daniel.brue...@gmail.com> wrote: Hi, I am running SolrCloud 4.0-BETA, and during the weekend it 'crashed' somehow, so that it wasn't reachable. CPU load was 100%. After a restart I couldn't access the data; it just told me: no servers hosting shard. Is there a way to get the data back? Thanks, regards, Daniel

-- - Mark
Re: Items disappearing from Solr index
Hi Erick, Thanks for your reply. Yes, I am using delete-by-query. I am currently logging the number of items to be deleted before handing off to Solr, and from the Solr logs I can see it deleted exactly that number. I will verify further. Thanks.

On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson <erickerick...@gmail.com> wrote: How do you delete items? By ID or by query? My guess is that one of two things is happening: 1) your delete process is deleting too much data; 2) your index process isn't indexing what you think. I'd add some logging to the SolrJ program to see what it thinks it has deleted from or added to the index and go from there. Best, Erick
solrcloud and csv import hangs
Hi, This appears to happen in trunk too. It appears that the add command's request parameters get sent to the nodes. If I comment these out like so for add and commit in core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java:

  -    params = new ModifiableSolrParams(req.getParams());
  +    // params = new ModifiableSolrParams(req.getParams());
  +    params = new ModifiableSolrParams();

then things work as expected. Otherwise params like stream.url get sent to the replica nodes, which causes a failure if the file is missing, or worse, repeated importing of the same file if it exists on a replica. This might not be the right thing to do? ... what should be sent here for a streaming CSV import? Dan

On Thu, Sep 20, 2012 at 4:32 PM, dan sutton <danbsut...@gmail.com> wrote: Hi, I'm using Solr 4.0-BETA and trying to import a CSV file as follows:

  curl http://localhost:8080/solr/core/update -d overwrite=false -d commit=true -d stream.contentType='text/csv;charset=utf-8' -d stream.url=file:///dir/file.csv

I have 2 Tomcat servers running on different machines and a separate ZooKeeper quorum (3 zoo servers, 2 on the same machine). This is a 1-shard core, replicated to the other machine. It seems that for a 255K-line file I have 170 docs on the server that issued the command, but on the other, the index seems to grow unbounded? Has anyone seen this, or been successful in using the CSV import with SolrCloud? Cheers, Dan
Re: need best solution for indexing and searching multiple, related database tables
I'm not sure if this will be relevant for you, but this is roughly what I do. Apologies if it's too basic. I have a complex view that normalizes all the data that I need to be together -- from over a dozen different tables. For one-to-many and many-to-many relationships, I have SQL turn the data into a comma-delimited string, which the DataImportHandler and the RegexTransformer will split into a multi-valued field. So you might have a document like this:

  <id>123</id>
  <name_s>John Smith</name_s>
  <attr_products>
    <str>python</str>
    <str>java</str>
    <str>javascript</str>
  </attr_products>

Often I've found that I don't really need the data together in one Solr core, and it works better to just create a separate core for that schema.
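A sketch of the DataImportHandler side of that approach (the view, column and field names are invented for illustration; splitBy is the RegexTransformer option that does the splitting):

  <entity name="person" transformer="RegexTransformer"
          query="SELECT id, name, products_csv FROM person_view">
    <field column="id"/>
    <field column="name_s" sourceColName="name"/>
    <!-- products_csv arrives as e.g. "python,java,javascript" and is split into a multi-valued field -->
    <field column="attr_products" sourceColName="products_csv" splitBy=","/>
  </entity>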
Re: solrcloud and csv import hangs
On Mon, Sep 24, 2012 at 11:03 AM, dan sutton <danbsut...@gmail.com> wrote: Hi, This appears to happen in trunk too. It appears that the add command's request parameters get sent to the nodes. If I comment these out like so for add and commit in core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java:

  -    params = new ModifiableSolrParams(req.getParams());
  +    // params = new ModifiableSolrParams(req.getParams());
  +    params = new ModifiableSolrParams();

then things work as expected. Otherwise params like stream.url get sent to the replica nodes, which causes a failure if the file is missing, or worse, repeated importing of the same file if it exists on a replica. This might not be the right thing to do? ... what should be sent here for a streaming CSV import? Dan

On Thu, Sep 20, 2012 at 4:32 PM, dan sutton <danbsut...@gmail.com> wrote: Hi, I'm using Solr 4.0-BETA and trying to import a CSV file as follows:

  curl http://localhost:8080/solr/core/update -d overwrite=false -d commit=true -d stream.contentType='text/csv;charset=utf-8' -d stream.url=file:///dir/file.csv

I have 2 Tomcat servers running on different machines and a separate ZooKeeper quorum (3 zoo servers, 2 on the same machine). This is a 1-shard core, replicated to the other machine. It seems that for a 255K-line file I have 170 docs on the server that issued the command, but on the other, the index seems to grow unbounded? Has anyone seen this, or been successful in using the CSV import with SolrCloud?

Yikes! Thanks for investigating, this looks pretty serious. Could you open a JIRA issue for this bug? -Yonik http://lucidworks.com
At a high level how does faceting in SolrCloud work?
I'd like to wrap my head around how faceting in SolrCloud works. Does Solr ask each shard for its maximum value and then use that to determine what else should be asked for from other shards, or does it ask for all values and do the aggregation on the requesting server?
Re: need best solution for indexing and searching multiple, related database tables
Could you supply some sample user queries and some sample data the queries should match? In other words, how do your users expect to view the data? If you are simply trying to replicate full SQL queries in Solr, you're probably going to be disappointed, but if you look at what queries your users are likely to want to enter, maybe it won't be so bad. And maybe Solr's limited join capabilities might be sufficient to bridge the gap between a single flat schema and many relational tables: http://wiki.apache.org/solr/Join Join support is there, but don't leap before you think carefully. -- Jack Krupansky

-----Original Message-----
From: Thomas J. Brennan
Sent: Monday, September 24, 2012 10:23 AM
To: solr-user@lucene.apache.org
Subject: need best solution for indexing and searching multiple, related database tables

I have a requirement to search multiple, related database tables. Sometimes I need to join two tables, sometimes three or four, and possibly more. The tables will generally store structured data related to individuals or organizations. This will include things like company, contact and address tables, and may include other child tables like products, assets, etc. It is something of a moving target. Record counts are commonly in the tens of millions but can be upwards of a few hundred million, or even much more. My understanding is that denormalization is most commonly the preferred solution. For two tables that is pretty straightforward. For three or four or more tables, or many-to-many relationships, and depending on the record counts, this can generate a lot of redundant data, indexing time, etc. Any information on the best way to design a single approach to this problem, or any options I might employ like faceted search, NoSQL (based on my limited research, I am guessing this is not a solution), etc., would be greatly appreciated. Answers that are terribly obvious, even to a newb, are a tiny bit annoying - things like "you should test several scenarios" or "there is no one good solution". I really do appreciate any suggestions that would help me solve this problem. Biff

P.S. - I did search the existing posts and found some related topics, but nothing as specific as I was looking for.
Re: Solr - Remove specific punctuation marks
1. Which query parser are you using?

2. I see the following comment in the Java 6 doc for regex \p{Punct}: "POSIX character classes (US-ASCII only)", so if any of the punctuation is some higher Unicode character code, it won't be matched/removed.

3. It seems very odd that the parsed query has empty terms - normally the query parsers will ignore terms that analyze to zero tokens. Maybe your "{" is not an ASCII left brace code and is (apparently) unprintable in the parsed query. Or maybe there is some encoding problem in the analyzer.

-- Jack Krupansky

-----Original Message-----
From: Daisy
Sent: Monday, September 24, 2012 9:26 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr - Remove specific punctuation marks

I tried &amp; and it solved the 500 error code. But it can still find punctuation marks. Although the parsed query didn't contain the punctuation mark:

  <str name="rawquerystring">{</str>
  <str name="querystring">{</str>
  <str name="parsedquery">text:</str>
  <str name="parsedquery_toString">text:</str>

numFound still gives 1 result:

  <result name="response" numFound="1" start="0">

and the highlight shows the result for the punctuation mark: <em>{</em>. The steps I did: 1. edit the schema; 2. restart the server; 3. delete the file; 4. index the file.
Re: Problem indexing CSV files using post.jar with multivalue fields
Never fails. Take the time to post this message, only to discover the answer on my own a few minutes later. The solution is to surround the -Durl value in double quotes. For example:

  java -Durl="http://localhost:8983/solr/contacts/update/csv?f.address.split=true&f.address.separator=%7C" -Dtype=text/csv -jar post.jar contact_test.csv

works perfectly.
Re: Performance Degradation on Migrating from 1.3 to solr 3.6.1
Thanks Jack. So QTime = sum of all prepare components + sum of all process components - DebugComponent prepare/process time?

In 3.6.1 the process part of the QueryComponent for the following query seems to take 8 times more time. Is anything missing? For most queries, the process part of the QueryComponent takes more time in 3.6.1.

This is the 3.6.1 response:

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">33</int>
  <lst name="params">
    <str name="debugQuery">on</str>
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">differential AND equations AND has AND one AND solution</str>
    <str name="rows">10</str>
    <str name="version">2.2</str>
  </lst>
</lst>

Debug output:

<str name="QParser">LuceneQParser</str>
<lst name="timing">
  <double name="time">33.0</double>
  <lst name="prepare">
    <double name="time">3.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">3.0</double></lst>
    <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
  </lst>
  <lst name="process">
    <double name="time">30.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">26.0</double></lst>
    <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">4.0</double></lst>
  </lst>
</lst>

The same query in Solr 1.3:

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">6</int>
  <lst name="params">
    <str name="debugQuery">on</str>
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">differential AND equations AND has AND one AND solution</str>
    <str name="rows">10</str>
    <str name="version">2.2</str>
  </lst>
</lst>

Debug info:

<lst name="timing">
  <double name="time">6.0</double>
  <lst name="prepare">
    <double name="time">1.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">1.0</double></lst>
    <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
  </lst>
  <lst name="process">
    <double name="time">5.0</double>
    <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">3.0</double></lst>
    <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
    <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">2.0</double></lst>
  </lst>
</lst>

On Mon, Sep 24, 2012 at 7:35 PM, Jack Krupansky <j...@basetechnology.com> wrote: Run a query on both old and new with debugQuery=true on your query request and look at the component timings for possible insight. -- Jack Krupansky

From: Sujatha Arun
Sent: Monday, September 24, 2012 7:26 AM
To: solr-user@lucene.apache.org
Subject: Performance Degradation on Migrating from 1.3 to solr 3.6.1

Hi, On migrating from 1.3 to 3.6.1, I see the query performance degrading by nearly 2 times for all types of queries. Indexing performance shows a slight degradation over 1.3. For indexing we use our custom scripts that post XML over HTTP. Is there anything that I might have missed? I am thinking that this might be due to the new TieredMergePolicy (over LogByteSizeMergePolicy) creating more segment files, and hence higher query latency. We are using compound files in 1.3, and I have set this to true even in 3.6.1, but it results in more segment files. On optimizing, the query response time improved beyond 1.3. So could it be the merge policy, or am I missing something here? Do let me know. Please find attached the solrconfig.xml. Regards, Sujatha
Re: Solr - Remove specific punctuation marks
I tried it, and PRFF is indeed generating an empty token. I don't know how Lucene will index or query an empty term - I mean, what it should do. In any case, it is best to avoid them. You should be using a charFilter to simply filter the raw characters before tokenizing. So, try:

  <charFilter class="solr.PatternReplaceCharFilterFactory" .../>

It has the same pattern and replacement attributes.

-- Jack Krupansky

-----Original Message-----
From: Jack Krupansky
Sent: Monday, September 24, 2012 12:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr - Remove specific punctuation marks

1. Which query parser are you using?

2. I see the following comment in the Java 6 doc for regex \p{Punct}: "POSIX character classes (US-ASCII only)", so if any of the punctuation is some higher Unicode character code, it won't be matched/removed.

3. It seems very odd that the parsed query has empty terms - normally the query parsers will ignore terms that analyze to zero tokens. Maybe your "{" is not an ASCII left brace code and is (apparently) unprintable in the parsed query. Or maybe there is some encoding problem in the analyzer.

-- Jack Krupansky

-----Original Message-----
From: Daisy
Sent: Monday, September 24, 2012 9:26 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr - Remove specific punctuation marks

I tried &amp; and it solved the 500 error code. But it can still find punctuation marks. Although the parsed query didn't contain the punctuation mark:

  <str name="rawquerystring">{</str>
  <str name="querystring">{</str>
  <str name="parsedquery">text:</str>
  <str name="parsedquery_toString">text:</str>

numFound still gives 1 result:

  <result name="response" numFound="1" start="0">

and the highlight shows the result for the punctuation mark: <em>{</em>. The steps I did: 1. edit the schema; 2. restart the server; 3. delete the file; 4. index the file.
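Putting this suggestion together with the original goal (keep only '.' and ':'), the field type might look something like the following sketch. Note that the ampersands of the character-class intersection must be XML-escaped as &amp;, as mentioned earlier in the thread:

  <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- strip punctuation (except . and :) from the raw text, before tokenizing -->
      <charFilter class="solr.PatternReplaceCharFilterFactory"
                  pattern="[\p{Punct}&amp;&amp;[^.:]]" replacement=""/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>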
Re: Solr - Remove specific punctuation marks
How could I know which query parser I am using? Here is the part of my schema that I am using:

  <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="(\()" replacement="" replace="all"/>
    </analyzer>
  </fieldType>

  <field name="text" type="text_ar" indexed="true" stored="true" termVectors="true" multiValued="true"/>

As shown, even if I try to remove only "(", the same thing happens with the parsed query and with numFound.
Re: Solr - Remove specific punctuation marks
Thanks. Finally it works using:

  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\()" replacement="" replace="all"/>

I wonder what the reason for that is, and what is the difference between the filter and the charFilter?
Re: Solr - Remove specific punctuation marks
When I do things like this and want to avoid empty tokens even though previous analysis might result in some - I just throw one of these at the end of my analysis chain:

  <!-- get rid of empty string tokens. max is required, although we don't really care. -->
  <filter class="solr.LengthFilterFactory" min="1" max=""/>

A charFilter that filters raw characters can certainly still result in an empty token, if an initial token was composed solely of chars you wanted to filter out! In that case you probably want the token to be deleted entirely, not still there as an empty token. The above length filter is one way to do that, although it unfortunately requires specifying a 'max' even though I didn't actually want to filter on the high end. Oh well.

On 9/24/2012 1:07 PM, Jack Krupansky wrote: I tried it, and PRFF is indeed generating an empty token. I don't know how Lucene will index or query an empty term - I mean, what it should do. In any case, it is best to avoid them. You should be using a charFilter to simply filter the raw characters before tokenizing. So, try:

  <charFilter class="solr.PatternReplaceCharFilterFactory" .../>

It has the same pattern and replacement attributes. -- Jack Krupansky
Re: Solr - Remove specific punctuation marks
I've had problems with empty tokens. You can remove those with this as a step in the analyzer chain:

  <filter class="solr.LengthFilterFactory" min="1" max="1024"/>

wunder

On Sep 24, 2012, at 10:07 AM, Jack Krupansky wrote: I tried it, and PRFF is indeed generating an empty token. I don't know how Lucene will index or query an empty term - I mean, what it should do. In any case, it is best to avoid them. You should be using a charFilter to simply filter the raw characters before tokenizing. So, try:

  <charFilter class="solr.PatternReplaceCharFilterFactory" .../>

It has the same pattern and replacement attributes. -- Jack Krupansky

-- Walter Underwood wun...@wunderwood.org
Re: Solr - Remove specific punctuation marks
Using solr.LengthFilterFactory was great and also solved the problem with using PatternReplaceFilter. So now I have two solutions. Thanks all for helping me. One thing I would like to know: what is the difference between PatternReplaceFilter and PatternReplaceCharFilter? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Remove-specific-punctuation-marks-tp4009795p4009925.html Sent from the Solr - User mailing list archive at Nabble.com.
omit tf using per-field CustomSimilarity?
I'm trying to configure per-field similarity to disregard term frequency (omitTf) in a 'title' field. I'm trying to follow the example docs without success: my custom similarity doesn't seem to have any effect on 'tf'. Is the NoTfSimilarity function below written correctly? Any advice is much appreciated.

my schema.xml:

<field name="title" type="text_custom_sim" indexed="true" stored="true" omitNorms="true" termVectors="true"/>

<similarity class="solr.SchemaSimilarityFactory"/>

<fieldType name="text_custom_sim" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <similarity class="com.ssww.NoTfSimilarityFactory"/>
    ...
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <similarity class="com.ssww.NoTfSimilarityFactory"/>
    ...
  </analyzer>
</fieldType>

NoTfSimilarityFactory.java:

package com.ssww;

import org.apache.lucene.search.similarities.Similarity;
import org.apache.solr.schema.SimilarityFactory;

public class NoTfSimilarityFactory extends SimilarityFactory {
    @Override
    public Similarity getSimilarity() {
        return new NoTfSimilarity();
    }
}

NoTfSimilarity.java:

package com.ssww;

import org.apache.lucene.search.similarities.DefaultSimilarity;

public final class NoTfSimilarity extends DefaultSimilarity {
    public float tf(int i) {
        return 1;
    }
}

These two files are in a jar in the lib directory of this core. Here are the results of a search for "paint" with custom and default similarity:

Indexed with per-field NoTfSimilarity: 284.5441 = (MATCH) boost(+(title:paint^8.0 | search_keywords:paint | shingle_text:paint^2.0 | description:paint^0.5 | nosyn:paint^5.0 | bullets:paint^0.5) () () () () () (),scale(int(page_views),1.0,3.0)), product of: 280.5598 = (MATCH) sum of: 280.5598 = (MATCH) max of: 280.5598 = (MATCH) weight(title:paint^8.0 in 48) [], result of: 280.5598 = score(doc=48,freq=2.0 = termFreq=2.0 ), product of: 39.83825 = queryWeight, product of: 8.0 = boost 4.979781 = idf(docFreq=187, maxDocs=10059) 1.0 = queryNorm 7.042474 = fieldWeight in 48, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 4.979781 = idf(docFreq=187, maxDocs=10059) 1.0 = fieldNorm(doc=48) 18.217428 = (MATCH) weight(search_keywords:paint in 48) [], result of: 18.217428 = score(doc=48,freq=1.0 = termFreq=1.0 ), product of: 4.268188 = queryWeight, product of: 4.268188 = idf(docFreq=382, maxDocs=10059) 1.0 = queryNorm 4.268188 = fieldWeight in 48, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 4.268188 = idf(docFreq=382, maxDocs=10059) 1.0 = fieldNorm(doc=48) 7.725952 = (MATCH) weight(description:paint^0.5 in 48) [], result of: 7.725952 = score(doc=48,freq=2.0 = termFreq=2.0 ), product of: 1.6527361 = queryWeight, product of: 0.5 = boost 3.3054721 = idf(docFreq=1002, maxDocs=10059) 1.0 = queryNorm 4.6746435 = fieldWeight in 48, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 3.3054721 = idf(docFreq=1002, maxDocs=10059) 1.0 = fieldNorm(doc=48) 106.50396 = (MATCH) weight(nosyn:paint^5.0 in 48) [], result of: 106.50396 = score(doc=48,freq=4.0 = termFreq=4.0 ), product of: 16.317472 = queryWeight, product of: 5.0 = boost 3.2634945 = idf(docFreq=1045, maxDocs=10059) 1.0 = queryNorm 6.526989 = fieldWeight in 48, product of: 2.0 = tf(freq=4.0), with freq of: 4.0 = termFreq=4.0 3.2634945 = idf(docFreq=1045, maxDocs=10059) 1.0 = fieldNorm(doc=48) 1.0142012 =
scale(int(page_views)=18,toMin=1.0,toMax=3.0,fromMin=0.0,fromMax=2535.0) Indexed with DefaultSimilarity: 7.630908 = (MATCH) boost(+(title:paint^8.0 | search_keywords:paint | shingle_text:paint^2.0 | description:paint^0.5 | nosyn:paint^5.0 | bullets:paint^0.5) () () () () () (),scale(int(page_views),1.0,3.0)), product of: 7.524058 = (MATCH) sum of: 7.524058 = (MATCH) max of: 7.524058 = (MATCH) weight(title:paint^8.0 in 3504) [DefaultSimilarity], result of: 7.524058 = fieldWeight in 3504, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 5.3203125 = idf(docFreq=197, maxDocs=14892) 1.0 = fieldNorm(doc=3504) 0.5091842 = (MATCH) weight(search_keywords:paint in 3504) [DefaultSimilarity], result of: 0.5091842 = score(doc=3504,freq=1.0 = termFreq=1.0 ), product of:
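One detail worth flagging in the code above, separate from where the <similarity> element goes: in Lucene 4.x, DefaultSimilarity's tf hook takes a float, so tf(int i) adds an overload that the scorer never calls. A sketch of the override form -- the @Override annotation makes the compiler catch exactly this kind of mismatch:

package com.ssww;

import org.apache.lucene.search.similarities.DefaultSimilarity;

public final class NoTfSimilarity extends DefaultSimilarity {
    // DefaultSimilarity.tf(float) is what scoring calls; without this
    // override the default sqrt(freq) stays in effect.
    @Override
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}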
memory leak in pdfbox--SolrCell needs to call COSName.clearResources?
We've been struggling with solr hangs in the solr process that indexes incoming PDF documents. TLDR; summary is that I'm thinking that PDFBox needs to have COSName.clearResources() called on it if the solr indexer expects to be able to keep running indefinitely. Is that likely? Is there anybody on this list who is doing PDF extraction in a long-running process and having it work?

The thread dump of a hung process often shows lots of threads hanging on this:

java.lang.Thread.State: BLOCKED (on object monitor)
    at java.util.Collections$SynchronizedMap.get(Collections.java:1975)
    - waiting to lock <0x00072551f908> (a java.util.Collections$SynchronizedMap)
    at org.apache.pdfbox.util.PDFOperator.getOperator(PDFOperator.java:68)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:441)

And the heap is almost full:

Heap
  PSYoungGen  total 796416K, used 386330K
    eden space 398208K, 97% used
    from space 398208K,  0% used
    to   space 398208K,  0% used
  PSOldGen
    object space 2389376K, 99% used
  PSPermGen
    object space 53824K, 99% used

Using eclipse's mat to look at the heap dump of a hung process shows one of the chief memory leak suspects is PDFBox's COSName class: "The class org.apache.pdfbox.cos.COSName, loaded by java.net.FactoryURLClassLoader @ 0x725a1a230, occupies 151,183,360 (16.64%) bytes. The memory is accumulated in one instance of java.util.concurrent.ConcurrentHashMap$Segment[] loaded by system class loader."

and the Shortest Paths To the Accumulation Point graph for that looks like this:

Class Name                                               Shallow Heap   Retained Heap
java.util.concurrent.ConcurrentHashMap$Segment[16]                80     151,160,680
  segments java.util.concurrent.ConcurrentHashMap                 48     151,160,728
    nameMap class org.apache.pdfbox.cos.COSName                1,184     151,183,360
[123] java.lang.Object[2560]                                  10,256      26,004,368
  elementData java.util.Vector                                    32      26,004,400
    classes java.net.FactoryURLClassLoader                        72      26,228,440
      classloader class org.apache.pdfbox.cos.COSDocument          8               8
class org.apache.pdfbox.cos.COSDocument                           64       1,703,704
  referent java.lang.ref.Finalizer                                40       1,703,744

And the Dominator Tree chart looks like this:

26.69% org.apache.solr.core.SolrCore
16.64% class org.apache.pdfbox.cos.COSName
 2.89% java.net.FactoryURLClassLoader

Now the implementation of COSName says this:

    /**
     * Not usually needed except if resources need to be reclaimed in a long
     * running process.
     * Patch provided by fles...@gmail.com
     * incorporated 5/23/08, danielwil...@users.sourceforge.net
     */
    public static synchronized void clearResources()
    {
        // Clear them all
        nameMap.clear();
    }

I *don't* see a call to clearResources anywhere in solr or tika, and I think that's the problem. The implementation puts all the COSNames in a class-level static HashMap, which never gets emptied, and apparently keeps growing forever. I suspect the fact that the URLClassLoader is involved in that graph to the COSName class is what's filling up the PermGen space in the heap. Does that sound likely? Possible? Can anyone speak to that? Anyone have suggested next steps for us, besides restarting our solr indexer process every couple of hours?
Persisting dataimport.properties in ZooKeeper directory
Hi, We are working on a DIH for our project and we are persisting the last_modified_date in the ZooKeeper directory. Our understanding is that the properties are uploaded to ZooKeeper when the first SOLR node comes up. When the SOLR nodes are restarted whatever is persisted in the properties is lost. Is there another way of maintaining state? Please let us know. Thanks, Balaji -- View this message in context: http://lucene.472066.n3.nabble.com/Persisting-dataimport-properties-in-ZooKeeper-directory-tp4009965.html Sent from the Solr - User mailing list archive at Nabble.com.
Solved: Re: omit tf using per-field CustomSimilarity?
My problem was that I specified the per-field similarity class INSIDE the analyzer instead of outside it:

<fieldType>
  <analyzer>
    ...
  </analyzer>
  <similarity .../>
</fieldType>

On 09/24/2012 02:56 PM, Carrie Coy wrote: I'm trying to configure per-field similarity to disregard term frequency (omitTf) in a 'title' field. I'm trying to follow the example docs without success: my custom similarity doesn't seem to have any effect on 'tf'. [...]
Re: solrcloud and csv import hangs
https://issues.apache.org/jira/browse/SOLR-3883

-Yonik
http://lucidworks.com

On Mon, Sep 24, 2012 at 11:42 AM, Yonik Seeley yo...@lucidworks.com wrote: On Mon, Sep 24, 2012 at 11:03 AM, dan sutton danbsut...@gmail.com wrote: Hi, This appears to happen in trunk too. It appears that the add command request parameters get sent to the nodes. If I comment these out like so for add and commit:

core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java
- params = new ModifiableSolrParams(req.getParams());
+ //params = new ModifiableSolrParams(req.getParams());
+ params = new ModifiableSolrParams();

then things work as expected. Otherwise params like stream.url get sent to the replica nodes, which causes a failure if the file is missing -- or worse, repeatedly imports the same file if it exists on a replica. This might not be the right thing to do? ... what should be sent here for a streaming CSV import? Dan

On Thu, Sep 20, 2012 at 4:32 PM, dan sutton danbsut...@gmail.com wrote: Hi, I'm using Solr 4.0-BETA and trying to import a CSV file as follows:

curl http://localhost:8080/solr/core/update -d overwrite=false -d commit=true -d stream.contentType='text/csv;charset=utf-8' -d stream.url=file:///dir/file.csv

I have 2 tomcat servers running on different machines and a separate zookeeper quorum (3 zoo servers, 2 on the same machine). This is a 1-shard core, replicated to the other machine. It seems that for a 255K-line file I have 170 docs on the server that issued the command, but on the other, the index seems to grow unbounded? Has anyone seen this, or been successful in using the CSV import with solrcloud?

Yikes! Thanks for investigating, this looks pretty serious. Could you open a JIRA issue for this bug? -Yonik http://lucidworks.com
CQL instead of SQL in Solr data-config
Please see this post here: http://stackoverflow.com/questions/12324837/apache-cassandra-integration-with-apache-solr/12326329#comment16936430_12326329 Does anyone have experience with, or know whether it's possible to combine, the Solr data-config with the Cassandra JDBC driver (http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/) -- that is, to use CQL in data-config instead of SQL and query Cassandra instead of an RDBMS? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/CQL-instead-of-SQL-in-Solr-data-config-tp4009984.html Sent from the Solr - User mailing list archive at Nabble.com.
/solr/dataimport not found
I've been trying to set up Solr with Tomcat, in order to connect to a MySQL database. I've got the admin page up, but I can't get localhost:8080/solr/dataimport/ to work. It returns a 404 error. Been googling high and low, without finding the answer. I've put this in my solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

Created a data-config.xml in the same directory as the file above. This should just connect to the DB for now. And copied the MySQL JDBC connector into the /solr/lib directory. Any suggestions would be much appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-dataimport-not-found-tp4009975.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: /solr/dataimport not found
Hello, John, Assuming this is a single-core instance of Solr, does /solr/admin/dataimport.jsp work?

Michael Della Bitta
Appinions
18 East 41st Street, 2nd Floor | New York, NY 10017-6271
www.appinions.com
Where Influence Isn’t a Game

On Mon, Sep 24, 2012 at 5:11 PM, johnohod john-o...@tyde.no wrote: I've been trying to set up Solr with Tomcat, in order to connect to a MySQL database. I've got the admin page up, but I can't get localhost:8080/solr/dataimport/ to work. It returns a 404 error. [...]
Re: /solr/dataimport not found
: database. I've got the admin page up, but I can't get
: localhost:8080/solr/dataimport/ to work. It returns a 404 error.

1) which version of solr are you using?
2) did you try localhost:8080/solr/dataimport (no trailing slash)?
3) does anything in the admin UI work?

-Hoss
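For what it's worth, a frequent cause of a 404 on /dataimport under Tomcat is the DataImportHandler jar not being on Solr's classpath -- it ships separately from the main solr war. A sketch of the relevant solrconfig.xml pieces, assuming the stock dist/ layout of the Solr 3.x/4.0 download (adjust dir to your install):

<!-- pull in the DIH jars -->
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar"/>

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>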
Re: Range operator problems in Chef (automation framework)
Be a little careful, spaces here can mess you up, particularly around the hyphen in -1HOUR. I.e. NOW -1HOUR is invalid but NOW-1HOUR is OK (note the space between W and - in the first one). There aren't any in your example, but just to be sure...

One other note: you may get better performance out of making this a filter query (fq) so it can be re-used, but you'll want to do some date rounding, see: http://searchhub.org/dev/2012/02/23/date-math-now-and-filter-queries/

But one easy place to look at what eventually gets to Solr is the Solr logs; the queries are all put in that file as they come in (along with a lot of other stuff), so you have a chance to see whether what you're doing in Chef is getting to Solr as you wish...

Best
Erick

On Mon, Sep 24, 2012 at 9:42 AM, Jack Krupansky j...@basetechnology.com wrote: That looks like a valid Solr date math expression, but you need to make sure that the field type is actually a Solr DateField as opposed to simply an integer Unix time value. -- Jack Krupansky

-----Original Message----- From: Christian Bordis Sent: Monday, September 24, 2012 7:16 AM To: solr-user@lucene.apache.org Subject: Range operator problems in Chef (automation framework)

Hi Everyone! We are doing some nice stuff with Chef (http://wiki.opscode.com/display/chef/Home). It uses Solr for search, but range queries don't work as expected. Maybe Chef or Solr is just buggy, or I am doing it wrong ;-) In Chef I have a bunch of nodes with a timestamp attribute. Now I want to search for nodes that have a timestamp not older than one hour:

search(:node, "role:JSlave AND ohai_time:[NOW-1HOUR TO *]")

Is the string in this call a Solr-compliant range expression at all? Unluckily, I have no toys at hand to verify this myself at the moment... but I am working on it. Thanks for reading! ^^ Kind Regards, Christian Bordis
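Following up on the rounding tip above, the rounded form of that clause might look like the line below (assuming ohai_time really is a Solr date field). Rounding to the hour keeps the query text stable between invocations, so the filter cache can actually reuse the entry:

fq=ohai_time:[NOW/HOUR-1HOUR TO *]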
Re: Solr Cell Questions
If you're concerned about throughput, consider moving all the SolrCell (Tika) processing off the server. SolrCell is way cool for showing what can be done, but its downside is you're moving all the processing of the structured documents to the same machine doing the indexing. Pretty soon, especially with significant-size files, you're spending all your CPU cycles parsing the files...

Happens there's a blog about this: http://searchhub.org/dev/2012/02/14/indexing-with-solrj/

By moving the indexing to N clients, you can increase throughput until you make Solr work hard to do the indexing.

Best
Erick

On Mon, Sep 24, 2012 at 10:04 AM, johannes.schwendin...@blum.com wrote: Hi, I'm currently experimenting with Solr Cell to index files to Solr. During this some questions came up. 1. Is it possible (and wise) to connect to Solr Cell with multiple threads at the same time to index several documents at once? This question came up because my program takes about 6 hours to index around 35000 docs. (No production environment, only the example Solr and a little desktop machine, but I think it's very slow, and I know Solr isn't the bottleneck (yet).) 2. If 1 is possible, how many threads should do this and how much memory does Solr need? I've tried it but I ran into an out of memory exception. Thanks in advance Best Regards Johannes
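A bare-bones sketch of the client-side split Erick describes, assuming Tika and SolrJ are on the client's classpath (SolrJ 4.0-style API; the class name and field names are illustrative, not from the thread). Run several of these clients in parallel against the same Solr instance to scale out the parsing:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class ClientSideExtractor {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
    AutoDetectParser parser = new AutoDetectParser();

    for (String path : args) {
      Metadata meta = new Metadata();
      BodyContentHandler text = new BodyContentHandler(-1); // -1 = no write limit
      InputStream in = new FileInputStream(new File(path));
      try {
        parser.parse(in, text, meta); // Tika runs here, on the client, not on the Solr box
      } finally {
        in.close();
      }
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", path);              // illustrative field names
      doc.addField("text", text.toString());
      solr.add(doc);
    }
    solr.commit();
  }
}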
Re: Solr - Remove specific punctuation marks
On 9/24/2012 11:37 AM, Daisy wrote: One thing I would like to know: what is the difference between PatternReplaceFilter and PatternReplaceCharFilter?

The CharFilter version gets applied before anything else, including the Tokenizer. The Filter version gets applied in the order specified in the schema file. I would imagine that if you are allowed to specify multiple CharFilter entries (which I have never tested), they would be applied in the order they occur, all of them before the Tokenizer.

Thanks, Shawn
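A sketch of the ordering Shawn describes, with made-up patterns purely for illustration -- both charFilters run top to bottom over the raw character stream, and only then do the tokenizer and the filter chain see anything:

<analyzer>
  <!-- 1. char filters, applied first, in listed order, to raw characters -->
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="foo" replacement="bar"/>
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="~" replacement=""/>
  <!-- 2. then tokenization -->
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- 3. then token filters, in the order listed -->
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>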
Admin-UI: multiple facet
Hello, Is there a way to provide multiple facet field names in the Admin UI? I have tried spaces, commas and semicolons, to no effect. It would have been nice to be able to push the UI just a tiny bit further before switching to the URL query string directly. Or is a single facet field a limitation of the - otherwise excellent - new UI? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: memory leak in pdfbox--SolrCell needs to call COSName.clearResources?
: We've been struggling with solr hangs in the solr process that indexes
: incoming PDF documents. TLDR; summary is that I'm thinking that
: PDFBox needs to have COSName.clearResources() called on it if the solr
: indexer expects to be able to keep running indefinitely. Is that

I don't know much about tika/pdfbox, but based on the details in your email I think your assessment is correct. Solr (and SolrCell) doesn't directly know about PDFBox at all -- that's all handled under the covers by Tika. So I suspect you'd need to file a Jira with the Tika project to request that Tika somewhere/somehow call this COSName.clearResources() method when using PDFBox -- although based on your description, I'm not sure when/where this would make sense.

A few workarounds I can imagine:

1) if you do a SolrCore RELOAD, all of the plugin classes will be reloaded in a new ClassLoader (assuming you haven't embedded them directly in the solr.war, or asked your servlet container to load them for you) ... this would be marginally better than doing a full server restart.

2) if you are comfortable with java code, you could write a small RequestHandler that did nothing more than call COSName.clearResources() on each request -- you could then ping it on a regular basis, or register it as part of a newSearcher QuerySenderListener to ensure it got called automatically (or implement SolrEventListener directly and you could trigger it on every commit).

3) heck: with the new ScriptUpdateProcessor in Solr 4.0, you could write some javascript in your solrconfig.xml that would call this method as part of the chain's processCommit() method.

-Hoss
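A minimal sketch of option 2, against the Solr 4.0 plugin API and assuming the handler sees the same PDFBox classes Tika uses (i.e. the same ClassLoader); the class name and handler path are made up, and older 3.x releases require a couple more trivial overrides on RequestHandlerBase:

package com.example;

import org.apache.pdfbox.cos.COSName;
import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;

// register in solrconfig.xml:
//   <requestHandler name="/admin/clearPdfbox" class="com.example.ClearPdfboxHandler"/>
public class ClearPdfboxHandler extends RequestHandlerBase {
  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
    // empties PDFBox's static COSName cache -- the nameMap seen growing in the heap dump
    COSName.clearResources();
    rsp.add("status", "pdfbox COSName cache cleared");
  }

  @Override
  public String getDescription() { return "Clears PDFBox COSName static caches"; }

  @Override
  public String getSource() { return ""; }
}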
Re: SolrJ - IOException
I have seen this happening. We retry and that works. Is your solr server stalled?

On Mon, Sep 24, 2012 at 4:50 PM, balaji.gandhi balaji.gan...@apollogrp.edu wrote: Hi, I am encountering this error randomly (under load) when posting to Solr using SolrJ. Has anyone encountered a similar error?

org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8080/solr/profile
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:414)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:122)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:107)
    at

Thanks, Balaji -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-IOException-tp4010026.html Sent from the Solr - User mailing list archive at Nabble.com.
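A sketch of the retry approach, with a hypothetical helper class; the backoff numbers are arbitrary and worth tuning to your load:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

public final class RetryingAdd {
  // SolrJ surfaces transport failures as SolrServerException (often wrapping
  // an IOException, as in the trace above) or as a bare IOException.
  public static void addWithRetry(SolrServer solr, SolrInputDocument doc, int maxRetries)
      throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        solr.add(doc);
        return;
      } catch (SolrServerException e) {
        if (attempt >= maxRetries) throw e;
        Thread.sleep(1000L * (attempt + 1)); // simple linear backoff
      } catch (java.io.IOException e) {
        if (attempt >= maxRetries) throw e;
        Thread.sleep(1000L * (attempt + 1));
      }
    }
  }
}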
How to more gracefully handle field format exceptions?
Greetings, Is there a way to configure more graceful handling of field formatting exceptions when indexing documents? Currently, there is a field being generated in some documents that I am indexing that is supposed to be a float but sometimes slips through as an empty string. (I know, fix the docs, but sometimes bad values slip through, and it would be nice to handle them in a more forgiving manner.) Here's an example of the exception - when this happens, the entire doc is thrown out due to the one malformed field:

---snip---
ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: ERROR: [doc=docidstr] Error adding field 'f_floatfield'=''
...
Caused by: java.lang.NumberFormatException: empty String
00:56:46,288 [SI] WARN com.company.IndexerThread - BAD DOC: a82a2f6a6a42ad3c98a05ddb3f2c382c
01:02:12,713 [SI] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: ERROR: [doc=6ff90020f9ec0f6dd623e9879c3e024d] Error adding field 'f_afloatfield'=''
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:333)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:142)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
    at com.company.IndexerThread.run(IndexerThread.java:55)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.NumberFormatException: empty String
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1011)
    at java.lang.Float.parseFloat(Float.java:452)
    at org.apache.solr.schema.TrieField.createField(TrieField.java:410)
    at org.apache.solr.schema.SchemaField.createField(SchemaField.java:103)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:286)
    ... 12 more
01:02:12,713 [SI] WARN com.company.IndexerThread - BAD DOC: 6ff90020f9ec0f6dd623e9879c3e024d
---snip---

In my thinking (and for this situation), it would be much better to just ignore the malformed field and keep the doc - is there any way to configure this or enable this behavior instead? Thanks, Aaron
Re: How to more gracefully handle field format exceptions?
Hi Aaron, You could catch the error on the client, fix/clean/remove, and retry, no?

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

On Mon, Sep 24, 2012 at 9:21 PM, Aaron Daubman daub...@gmail.com wrote: Greetings, Is there a way to configure more graceful handling of field formatting exceptions when indexing documents? [...]
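A sketch of what that client-side cleanup could look like, assuming the set of float-typed fields is known up front (the helper class is hypothetical; 'f_floatfield' is taken from the error log):

import java.util.Arrays;
import java.util.List;

import org.apache.solr.common.SolrInputDocument;

public final class DocSanitizer {
  // fields the schema types as floats; empty strings in them are what
  // triggers the NumberFormatException in TrieField.createField
  private static final List<String> FLOAT_FIELDS = Arrays.asList("f_floatfield");

  public static void stripEmptyFloats(SolrInputDocument doc) {
    for (String field : FLOAT_FIELDS) {
      Object v = doc.getFieldValue(field);
      if (v instanceof String && ((String) v).trim().isEmpty()) {
        doc.removeField(field); // drop the bad value, keep the rest of the doc
      }
    }
  }
}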
Re: How to more gracefully handle field format exceptions?
Hi Otis, I was just looking at how to implement that, but was hoping for a cleaner method - it seems like I will have to actually parse the error as text to find the field that caused it, then remove/mangle that field and attempt re-adding the document - which seems less than ideal. I would think there would be a flag or an easy way to override the add method that would just drop (or set to a default value) any field that didn't meet expectations. Thanks for the suggestion, Aaron

On Mon, Sep 24, 2012 at 9:24 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Aaron, You could catch the error on the client, fix/clean/remove, and retry, no? [...]
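One server-side option worth checking, if the setup can move to 4.0: the new field-mutating update processors include a RemoveBlankFieldUpdateProcessorFactory that drops empty-string values before they reach the schema. A sketch of a solrconfig.xml chain, unverified against this exact setup:

<updateRequestProcessorChain name="strip-blanks" default="true">
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>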
How can I create about 100000 independent indexes in Solr?
Dear all, The company I'm working in has a website serving more than 100000 customers, and every customer should have its own search category. So I should create an independent index for every customer. The page http://wiki.apache.org/solr/MultipleIndexes gives some solutions for creating multiple indexes. I want to use the multicore solution, but I'm afraid that Solr can't support so many indexes that way. The other solution, Flattening data into a single index, is an option, but I think it's best to keep all indexes independent. Could you tell me how to create about 100000 independent indexes in Solr? Thank you all for replying!
Re: Solr Swap Function doesn't work when using Solr Cloud Beta
Hi Mark, If this can be supported in the future, I think that would be great. It's a really useful feature. For example, a user can use it to refresh with a totally new core: build the index on one core, and after the build is done, swap the old core and the new core, getting a totally new core for search. It can also be used for backup: if one core is crashed, you can easily swap in the backup core and quickly serve search requests again.

Best Regards, Sam

On Sun, Sep 23, 2012 at 2:51 PM, Mark Miller markrmil...@gmail.com wrote: FYI swap is def not supported in SolrCloud right now - even though it may work, it's not been thought about and there are no tests. If you would like to see support, I'd add a JIRA issue along with any pertinent info from this thread about what the behavior needs to be changed to. - Mark

On Sep 21, 2012, at 6:49 PM, sam fang sam.f...@gmail.com wrote: Hi Chris, Thanks for your help. Today I tried again and tried to figure out the reason.

1. Set up an external zookeeper server.

2. Change /opt/solr/apache-solr-4.0.0-BETA/example/solr/solr.xml persistent to true, and run the commands below to upload the configs to zk. (Renamed multicore to solr, and needed to put the zkcli.sh related jar packages in place.)

/opt/solr/apache-solr-4.0.0-BETA/example/cloud-scripts/zkcli.sh -cmd upconfig -confdir /opt/solr/apache-solr-4.0.0-BETA/example/solr/core0/conf/ -confname core0 -z localhost:2181
/opt/solr/apache-solr-4.0.0-BETA/example/cloud-scripts/zkcli.sh -cmd upconfig -confdir /opt/solr/apache-solr-4.0.0-BETA/example/solr/core1/conf/ -confname core1 -z localhost:2181

3. Start the jetty server:

cd /opt/solr/apache-solr-4.0.0-BETA/example
java -DzkHost=localhost:2181 -jar start.jar

4. Publish a message to core0:

cd /opt/solr/apache-solr-4.0.0-BETA/example/solr/exampledocs
cp ../../exampledocs/post.jar ./
java -Durl=http://localhost:8983/solr/core0/update -jar post.jar ipod_video.xml

5. Queries to core0 and core1 are OK.

6. Click swap on the admin page; the query results for core0 and core1 start changing. Previously I saw it sometimes return 0 results and sometimes 1 result. Today core0 seems to still return 1 result and core1 returns 0 results.

7. Then click reload on the admin page and query core0 and core1. Sometimes they return 1 result, and sometimes nothing. You can also see that the zk configuration changed.

8. Restart the jetty server. The queries behave the same as in step 7.

9. Stop the jetty server, then log into zkCli.sh and run the command set /clusterstate.json {}, then start jetty again. Everything is back to normal, i.e. what swap previously did in solr 3.6 or solr 4.0 without cloud.

From my observation, after a swap it puts the shard information into actualShards, and when a user requests a search, it uses all the shard information to do the search. But the user can't see the zk update until clicking the reload button on the admin page. When the web server is restarted, this shard information eventually goes to zk, and the search goes to all shards.

I found there is a distrib option, and used a URL like http://host1:18000/solr/core0/select?distrib=false&q=*%3A*&wt=xml; then I only got the data on core0. (Dug into the code -- the handleRequestBody method in the SearchHandler class -- and it seems to make sense.)

I tried to stop the tomcat server, then use the command set /clusterstate.json {} to clean all cluster state, then use the command cloud-scripts/zkcli.sh -cmd upconfig to upload the config to the zk server, and start the tomcat server. It rebuilds the right shard information in zk, and then the search function is back to normal, like what we saw in 3.6 or 4.0 without cloud. It seems solr always adds shard information to zk.

I tested cloud swap on a single machine: if each core has one shard in zk, then after a swap zk eventually has 2 slices (shards) for that core, because right now it only does the add. So the search goes to both shards. I also tested cloud swap with 2 machines, where each core has 1 shard and 2 slices. Below is the configuration in zk. After the swap, zk eventually has 4 entries for that core, and the search gets messed up.

"core0":{"shard1":{
    "host1:18000_solr_core0":{
      "shard":"shard1",
      "roles":null,
      "leader":"true",
      "state":"active",
      "core":"core0",
      "collection":"core0",
      "node_name":"host1:18000_solr",
      "base_url":"http://host1:18000/solr"},
    "host2:18000_solr_core0":{
      "shard":"shard1",
      "roles":null,
      "state":"active",
      "core":"core0",
      "collection":"core0",
      "node_name":"host2:18000_solr",
      "base_url":"http://host2:18000/solr"}}},

For the previous 2 cases, if I stopped the tomcat/jetty server, then manually uploaded the configuration to zk, then started the tomcat server, zk and search became normal again.

On Fri, Sep 21, 2012 at 3:34 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Below is my solr.xml configuration, and already set persistent to true. ... : Then publish 1 record to test1, and query. it's ok now. Ok, first off --
UIMA for lemmatization
hi I'm new to UIMA. Solr does not have a lemmatization component, and I was thinking of using UIMA for this. Is this a correct choice, and if so, how would I go about it? I see a couple of links for solr uima integration but don't know how that can be used for lemmatization. Any thoughts? -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-for-lemmatization-tp4010056.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance Degradation on Migrating from 1.3 to solr 3.6.1
Any comments on this?

On Mon, Sep 24, 2012 at 10:28 PM, Sujatha Arun suja.a...@gmail.com wrote: Thanks Jack. So QTime = sum of all prepare components + sum of all process components, minus the DebugComponent's own prepare/process time (here 3 + 30 = 33, matching the QTime). In 3.6.1 the process part of the QueryComponent for the following query seems to take about 8 times longer (26 ms vs 3 ms). Am I missing anything? For most queries the process part of the QueryComponent seems to take more time in 3.6.1.

This is 3.6.1:

<response>
<lst name="responseHeader"><int name="status">0</int>
<int name="QTime">33</int>
<lst name="params">
<str name="debugQuery">on</str>
<str name="indent">on</str><str name="start">0</str>
<str name="q">differential AND equations AND has AND one AND solution</str>
<str name="rows">10</str><str name="version">2.2</str></lst></lst>

Debug Output:

<str name="QParser">LuceneQParser</str>
<lst name="timing">
<double name="time">33.0</double>
<lst name="prepare">
<double name="time">3.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">3.0</double></lst>
<lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst></lst>
<lst name="process"><double name="time">30.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">26.0</double></lst>
<lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">4.0</double></lst>

Same query in solr 1.3:

<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">6</int>
<lst name="params">
<str name="debugQuery">on</str><str name="indent">on</str>
<str name="start">0</str><str name="q">differential AND equations AND has AND one AND solution</str>
<str name="rows">10</str><str name="version">2.2</str>

Debug Info:

<lst name="timing">
<double name="time">6.0</double>
<lst name="prepare">
<double name="time">1.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">1.0</double></lst>
<lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst></lst>
<lst name="process">
<double name="time">5.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">3.0</double></lst>
<lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">2.0</double>

On Mon, Sep 24, 2012 at 7:35 PM, Jack Krupansky j...@basetechnology.com wrote: Run a query on both old and new with debugQuery=true on your query request and look at the component timings for possible insight. -- Jack Krupansky

From: Sujatha Arun Sent: Monday, September 24, 2012 7:26 AM To: solr-user@lucene.apache.org Subject: Performance Degradation on Migrating from 1.3 to solr 3.6.1

Hi, On migrating from 1.3 to 3.6.1, I see the query performance degrading by nearly 2 times for all types of queries, and a slight indexing performance degradation over 1.3. For indexing we use our custom scripts that post xml over HTTP. Is there anything that I might have missed? I am thinking that this might be due to the new TieredMergePolicy (over LogByteSize) creating more segment files, and hence more query latency. We are using compound files in 1.3 and I have set this to true even in 3.6.1, but it still results in more segment files. On optimizing, the query response time improved beyond 1.3. So could it be the MP, or am I missing something here? Do let me know. Please find attached the solrconfig.xml. Regards Sujatha
Re: Performance Degradation on Migrating from 1.3 to solr 3.6.1
Hi, Please comment on whether I should consider moving to the old LogByteSize merge policy when moving to 3.6.1 from 1.3, as I see improvements in query performance on optimization. Just to mention, we have a lot of indexes in multiple cores as well as multiple webapps, and that's the reason we went for CFS in 1.3: to avoid the too-many-open-files issue which we had encountered. Regards Sujatha

On Tue, Sep 25, 2012 at 9:55 AM, Sujatha Arun suja.a...@gmail.com wrote: Any comments on this? [...]