Re: Solr Update URI is not found
On 28 Oct 2013, at 01:19, Bayu Widyasanyata bwidyasany...@gmail.com wrote: request: http://localhost:8080/solr/update?wt=javabin&version=2 I think this URL is incorrect: there should be a core name between solr and update.
Re: Solr Update URI is not found
On Mon, Oct 28, 2013 at 1:26 PM, Raymond Wiker rwi...@gmail.com wrote: request: http://localhost:8080/solr/update?wt=javabin&version=2 I think this URL is incorrect: there should be a core name between solr and update.

I changed the Solr URL in the crawl script's options to:

./bin/crawl urls/seed.txt TestCrawl http://localhost:8080/solr/mycollection/2

And the result now is Bad Request. I will look for other misconfigurations...

=
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://localhost:8080/solr/mycollection/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2013-10-28 13:30:02,804 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:123)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:185)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:195)
-- wassalam, [bayu]
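[Editor's note] For reference, the URL shape being suggested above, with a placeholder standing in for whatever the real core is called:

```
http://localhost:8080/solr/<corename>/update?wt=javabin&version=2
```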
One of all shard stopping, all shards stop
Hi. I have a 3-shard SolrCloud (version 4.4.0) with no replication. http://lucene.472066.n3.nabble.com/file/n4098015/ex1.png For example, if one shard (a leader) dies from an OOM, all shards stop. Is that just the way it is? I want to find an option for this problem: if one shard dies, the remaining shards should keep serving requests normally. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/One-of-all-shard-stopping-all-shards-stop-tp4098015.html Sent from the Solr - User mailing list archive at Nabble.com.
Optimal interval for soft commit
Hello, We have a Solr index with about 1M docs. Every day we add 5,000 to 8,000 docs. We have defined a 15 sec interval for soft commit, but to an impatient user 15 secs looks like an eternity. The wiki http://wiki.apache.org/solr/NearRealtimeSearch advises a 1s soft commit interval but warns: "Be sure to pay special attention to cache and autowarm settings as they can have a significant impact on NRT performance". I was looking at CommitWithin (http://wiki.apache.org/solr/CommitWithin, http://stackoverflow.com/questions/17475456/solr-issues-with-soft-auto-commit-near-real-time) as an alternative but have no idea how it works or what the implications are. What would be the best settings to achieve NRT search? Thanks. Mugoma.
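[Editor's note] For reference, CommitWithin is a per-update deadline rather than a server-side interval: each add request carries the maximum time (in milliseconds) within which its documents must become searchable. In the XML update format it looks like this (10000 ms and the field values are arbitrary examples):

```
<add commitWithin="10000">
  <doc>
    <field name="id">example-doc</field>
  </doc>
</add>
```

In SolrJ the equivalent is the add overload that takes a commit-within value, e.g. server.add(doc, 10000).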
Compound words
Hi, I'm an infant in the Solr/Lucene family, just a couple of months old. We are trying to find a way to combine words into a single compound word at index and query time. E.g. if a document has "sea bird" in it, it should be indexed as "seabird", and any query having "sea bird" in it should also look for "seabird", not only in qf but also in the pf, pf2, pf3 fields. We are using the edismax query parser. Our problem is not at index time, which we have handled by writing our own token filter, but at query time. Our token filter takes a dictionary of prefix,suffix pairs from a file and keeps emitting regular and compound tokens as it encounters them. We configured our own filter at query time, but figured that at query time individual clauses like field:sea, field:bird etc. are created first and then sent to the analyzer. First of all, can someone please confirm that this part of my understanding is correct? So, we are forced to emit sea and bird as individual tokens because we are not getting them in sequence at all. Is it possible to achieve this by means other than pre-processing the query before sending it to Solr? Can a CharFilter be used instead; are they applied before creating query clauses? I can keep providing more details as necessary. This mail has already crossed TL;DR limits for many :) Parvesh Garg http://www.zettata.com +91 963 222 5540
Re: Compound words
One more thing, Is there a way to remove my accidentally sent phone number in the signature from the previous mail? aarrrggghhh
Re: SolrCloud: optimizing a core triggers optimizations of all cores in that collection?
Thanks @Mark @Erick Should I create a JIRA issue for this ? - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimization-of-another-tp4097499p4098020.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr search in case the first keyword are not index
I have solved it. Thanks. - Phat T. Dong -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-search-in-case-the-first-keyword-are-not-index-tp4097699p4098021.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Optimal interval for soft commit
How do you add the documents to the index: one by one, or in batches of n? When do you do your commits? Because 8k docs per day is not a lot. Depending on the above, committing with softCommit=true might also be a solution. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Optimal-interval-for-soft-commit-tp4098016p4098022.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: One of all shard stopping, all shards stop
When one of your shards dies, your index becomes incomplete. By default querying is distributed over all shards (distrib=true), and if one of them (shard X) is down, you get an error stating that there are no servers hosting shard X. If the other shards are still up you can query them directly using distrib=false, but the result set will only contain documents from that shard. So you would have to query every active shard individually and then merge the results yourself. If I'm wrong please correct me. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/One-of-all-shard-stopping-all-shards-stop-tp4098015p4098024.html Sent from the Solr - User mailing list archive at Nabble.com.
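[Editor's note] A minimal sketch of that manual fallback, assuming each live shard's distrib=false response has already been parsed into a list of doc dicts carrying a score (the dict shape and the "score" key are assumptions about your client code, not a Solr API):

```python
# Client-side merge of per-shard results fetched with distrib=false.
# Each inner list is one shard's parsed docs; we interleave them by
# descending score, which is what a distributed query would have done
# for a simple score-sorted request.

def merge_shard_results(per_shard_docs, rows=10):
    """Flatten the per-shard result lists and keep the top `rows` docs."""
    merged = [doc for shard in per_shard_docs for doc in shard]
    merged.sort(key=lambda doc: doc["score"], reverse=True)
    return merged[:rows]

shard1 = [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.4}]
shard2 = [{"id": "c", "score": 0.7}]
print(merge_shard_results([shard1, shard2], rows=2))
```

Note that facet counts and numFound cannot be recovered this way; only the merged document list is.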
Re: Solr For
You're describing two different entities: Job and Employee. Since they are clearly different in every way, you will need two different cores with two different schemas. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-For-tp4097928p4098025.html Sent from the Solr - User mailing list archive at Nabble.com.
Data import handler with multi tables
Hi, I want to import many tables from MySQL. Assume that I have two tables:
*** Table 1: tbl_tableA(id, nameA) with data (1, A1), (2, A2), (3, A3).
*** Table 2: tbl_tableB(id, nameB) with data (1, B1), (2, B2), (3, B3), (4, B4), (5, B5).
I configure:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://xx" user="xxx" password="xxx" batchSize="1" />
  <document name="atexpats6">
    <entity name="tableA" query="select * from tbl_tableA">
      <field name="id" column="id"/>
      <field name="nameA" column="nameA" />
    </entity>
    <entity name="tableB" query="select * from tbl_tableB">
      <field name="id" column="id"/>
      <field name="nameA" column="nameA" />
    </entity>
  </document>
</dataConfig>

I define nameA, nameB in schema.xml and id is configured as <uniqueKey>id</uniqueKey>. When I import data via http://localhost:8983/solr/dataimport?command=full-import it is successful, but only the data from tbl_tableB gets indexed. I think this is because id is the unique key: tbl_tableA is imported first and tbl_tableB after, and tbl_tableB has ids that are the same as ids in tableA, so only tableB's data ends up indexed under each unique id. Can anyone help me configure the data import handler so that it indexes all data from two (or more) tables that use the same ids? Thanks. - Phat T. Dong -- View this message in context: http://lucene.472066.n3.nabble.com/Data-import-handler-with-multi-tables-tp4098026.html Sent from the Solr - User mailing list archive at Nabble.com.
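[Editor's note] One common fix (a sketch only; it assumes MySQL's concat function and that nothing else relies on the raw numeric ids) is to make the unique key distinct per table by prefixing it in each entity's query:

```
<entity name="tableA" query="select concat('A-', id) as id, nameA from tbl_tableA">
  <field name="id" column="id"/>
  <field name="nameA" column="nameA"/>
</entity>
<entity name="tableB" query="select concat('B-', id) as id, nameB from tbl_tableB">
  <field name="id" column="id"/>
  <field name="nameB" column="nameB"/>
</entity>
```

With distinct ids ('A-1' vs 'B-1'), rows from the second table no longer overwrite rows from the first.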
error in suggester component in solr
I am working on Solr autocomplete functionality. I am using Solr 4.5.0 to build my application, and I am following this link as a reference: http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-td3998559i20.html My suggest component is something like this:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="storeDir">suggest</str>
    <str name="field">autocomplete_text</str>
    <bool name="exactMatchFirst">true</bool>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">lowerfilt</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">spellchecker</str>
  </lst>
  <str name="queryAnalyzerFieldType">edgytext</str>
</searchComponent>

But I am getting the following error:

org.apache.solr.spelling.suggest.Suggester - Loading stored lookup data failed
java.io.FileNotFoundException: /home/anurag/Downloads/solr-4.4.0/example/solr/collection1/data/suggest/tst.dat (No such file or directory)

It says that some file is missing, but the Solr wiki's suggester component page says it supports these lookupImpls, including org.apache.solr.spelling.suggest.tst.TSTLookup. I don't know what I am doing wrong. Any help will be deeply appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/error-in-suggester-component-in-solr-tp4098028.html Sent from the Solr - User mailing list archive at Nabble.com.
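[Editor's note] A hedged observation: that message is typically logged when storeDir is set but the stored lookup file has not been built yet, so on a fresh index there is nothing at data/suggest/tst.dat to load. Triggering a build once (assuming a /suggest request handler is wired to this component; the handler name and sample prefix here are placeholders) should create the file:

```
http://localhost:8983/solr/suggest?q=s&spellcheck=true&spellcheck.dictionary=suggest&spellcheck.build=true
```

With buildOnCommit=true, as configured above, subsequent commits then keep the stored data up to date.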
Re: Newbie to Solr
Hi Alex, I have been able to run a few simple queries with my own schema.xml and data file. My concern now is that I'm able to run queries like http://localhost:8983/solr/select/?q=*:* and http://localhost:8983/solr/select/?q=*:*&facet=true&facet.field=Name from the URL. However, when I try to run them like this: *:*&facet=true&facet.field=Name from the query string text box, it gives me an error like "undefined field *". Any idea what is going wrong? TIA

On Sun, Oct 27, 2013 at 1:28 PM, Mamta Alshi mamta.al...@gmail.com wrote: Hi Alex, That is what I am suspecting too. Trying to remove the other files from the exampledocs directory is not helping. After removing all files except details.xml, the results still show me data from the other files but not my file. I am making changes to the same path which is displayed in the Web Admin's dashboard. My last option will be to delete Solr, install it again and try. Thanks for your prompt response.

On Sun, Oct 27, 2013 at 1:04 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Maybe your Solr instance is somehow using a different collection directory? In the Web Admin's dashboard section, it shows the path to where it thinks the instance is. Does it match what you expected? If it does, try deleting the core directory, restarting Solr and doing the indexing again. Maybe you have some old stuff there accidentally. Regards, Alex Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Sun, Oct 27, 2013 at 3:45 PM, Mamta Alshi mamta.al...@gmail.com wrote: Hi, On trying to create a new schema.xml it shows the schema from the Solr console. I have created a new data file called details.xml and placed it in the folder exampledocs. I have indexed just this one file from the command prompt. However, on my Solr console, when I query *:* it does not show me the contents of details.xml. It shows me the contents of some other data file. Am I missing out on something? TIA.

On Tue, Oct 1, 2013 at 3:16 PM, Kishan Parmar kishan@gmail.com wrote: Yes, you have to create your own schema, and in the schema file you have to add your XML files' field names. Likewise you can add your own field names, or you can add your fields to the default schema file. Without a schema you cannot add your XML file to Solr. My schema is like this:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
  <fields>
    <field name="No" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="Name" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="Address" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="Mobile" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  </fields>
  <uniqueKey>No</uniqueKey>
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0" />
  </types>
</schema>

and my file is like this:

<add>
  <doc>
    <field name="No">100120107088</field>
    <field name="Name">kishan</field>
    <field name="Address">ghatlodia</field>
    <field name="Mobile">9510077394</field>
  </doc>
</add>

Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !!

On Tue, Oct 1, 2013 at 1:11 AM, mamta mamta.al...@gmail.com wrote: Hi, I want to know: if I have to fire some query through the Solr admin, do I need to create a new schema.xml? Where do I place it in case I have to create a new one? In case I can edit the original schema.xml, can there be two fields named id in my schema.xml? I desperately need help in running queries on the Solr admin, which is configured on a Tomcat server. What preparation will I need to do? Schema.xml? Any docs? Any help will be highly appreciated. Thanks, Mamta -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Newbie to Solr
Put *:* in the q field. Then check the facet check box (look lower, close to the Execute button) and in facet.field insert Name. This should do the trick. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876p4098031.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: AW: AW: auto completion search with solr using NGrams in SOLR
Hi ... I am trying to build autocomplete functionality using your post, but I am getting the following error:

2577 [coreLoadExecutor-3-thread-1] WARN org.apache.solr.spelling.suggest.Suggester - Loading stored lookup data failed
java.io.FileNotFoundException: /home/anurag/Downloads/solr-4.4.0/example/solr/collection1/data/suggest/tst.dat (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:137)
at org.apache.solr.spelling.suggest.Suggester.init(Suggester.java:116)
at org.apache.solr.handler.component.SpellCheckComponent.inform(SpellCheckComponent.java:623)
at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:601)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:830)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:629)

I am using Solr 4.4. Does the suggester component still work in this version? -- View this message in context: http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4098032.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Background merge errors with Solr 4.4.0 on Optimize call
For Tomcat, Solr's output often goes into catalina.out by default, so the log output might be there. You can configure Solr to send the logs almost anywhere you please, but without some specific setup on your part the log output just goes to the servlet container's default. I took a quick glance at the code, but since the merges are happening in the background, there's not much context for where that error is thrown. How much memory is there for the JVM? I'm grasping at straws a bit... Erick

On Sun, Oct 27, 2013 at 9:54 PM, Matthew Shapiro m...@mshapiro.net wrote: I am working on implementing Solr as the search backend for our web system. So far things have been going well, but today I made some schema changes and now things have broken. I updated the schema.xml file and reloaded the core (via the admin interface). No errors were reported in the logs. I then pushed 100 records to be indexed. A call to commit afterwards seemed fine; however, my next call to optimize caused the following errors:

java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1]
null:java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1]

Unfortunately, googling for "background merge hit exception" came up with two things: a corrupt index or not enough free space. The host machine that's hosting Solr has 227 out of 229GB free (according to df -h), so that's not it. I then ran CheckIndex on the index and got the following results: http://apaste.info/gmGU As someone who is new to Solr and Lucene, as far as I can tell this means my index is fine. So I am at a loss. I'm fairly sure I could delete my data directory and rebuild it, but I am more interested in finding out why it is having issues, what is the best way to fix it, and what is the best way to prevent it from happening when this goes into production. Does anyone have any advice that may help? As an aside, I do not have a stacktrace for you because the Solr admin page isn't giving me one. I tried looking in the logs directory under my Solr directory, but it does not contain any logs. I opened up my ~/tomcat/lib/log4j.properties file and saw http://apaste.info/0rTL, which didn't really help me find log files. Doing a 'find . | grep solr.log' didn't really help either. Any help finding the log files (which may help find the actual cause of this) would also be appreciated.
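[Editor's note] On the logging side, a minimal log4j.properties sketch that sends Solr's output to a known file (the file path is a placeholder; adjust to taste):

```
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/solr/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p (%t) [%c] %m%n
```

With this in ~/tomcat/lib/log4j.properties (and log4j plus the slf4j bindings on the classpath, as the standard Solr-on-Tomcat setup requires), stack traces like the background-merge one should land in that file.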
Re: Newbie to Solr
Hi Michael, Thanks for the prompt response. Have a look at my attached admin user interfaces. I do not quite see the options you mention. On Mon, Oct 28, 2013 at 2:18 PM, michael.boom my_sky...@yahoo.com wrote: Put *:* in the q field Then check the facet check box (look lower close to the Execute button) and in the facet.field insert Name. This should do the trick. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876p4098031.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Newbie to Solr
how do I get the solr admin web user interface? On Mon, Oct 28, 2013 at 2:32 PM, Mamta Alshi mamta.al...@gmail.com wrote: Hi Michael, Thanks for the prompt response. Have a look at my attached admin user interfaces. I do not quite see the options you mention. On Mon, Oct 28, 2013 at 2:18 PM, michael.boom my_sky...@yahoo.com wrote: Put *:* in the q field Then check the facet check box (look lower close to the Execute button) and in the facet.field insert Name. This should do the trick. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876p4098031.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: When is/should qf different from pf?
The facetious answer is: when phrases aren't important in the fields. If you're doing a simple boolean match, adding phrase fields adds expense to no good purpose. Phrases on numeric fields seem wrong, etc. FWIW, Erick On Mon, Oct 28, 2013 at 1:03 AM, Amit Nithian anith...@gmail.com wrote: Hi all, I have been using Solr for years but never really stopped to wonder: when using the dismax/edismax handler, when do you have qf different from pf? I have always set them to be the same (maybe with different weights), but I was wondering if there is a situation where you would have a field in qf that is not in pf, or vice versa. My understanding from the docs is that qf is a term-wise hard filter while pf is a phrase-wise boost of documents that made it past the qf filter. Thanks! Amit
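[Editor's note] A concrete shape for that asymmetry (field names and boosts are made up for illustration):

```
defType=edismax
qf=title^2 body sku
pf=title^5 body^2
```

Here sku, an exact-match identifier field, contributes to matching via qf but is left out of pf, since phrase proximity on a single-token id adds cost without improving ranking.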
Re: Solr Update URI is not found
This seems like a better question for the Nutch list. I see hadoop in there, so unless you've specifically configured Solr to use the HDFS directory writer factory, this has to be coming from someplace else. And there are map/reduce tasks in here. BTW, it would be more helpful if you posted the URL that you successfully queried Solr with... What is the /2 on the end for? Do you use that when you query? Best, Erick

On Mon, Oct 28, 2013 at 2:37 AM, Bayu Widyasanyata bwidyasany...@gmail.com wrote: On Mon, Oct 28, 2013 at 1:26 PM, Raymond Wiker rwi...@gmail.com wrote: request: http://localhost:8080/solr/update?wt=javabin&version=2 I think this URL is incorrect: there should be a core name between solr and update. I changed the Solr URL in the crawl script's options to: ./bin/crawl urls/seed.txt TestCrawl http://localhost:8080/solr/mycollection/2 And the result now is Bad Request. I will look for other misconfigurations...

=
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://localhost:8080/solr/mycollection/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2013-10-28 13:30:02,804 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:123)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:185)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:195)
-- wassalam, [bayu]
Re: Newbie to Solr
I don't see the mentioned attachment. Try using http://snag.gy/ to provide it. As for where you find it, the default is http://localhost:8983/solr/collection1/query - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876p4098041.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Optimal interval for soft commit
Hello, "How do you add the documents to the index - one by one, batches of n?" Documents are added one by one using SolrJ. "When do you do your commits?" We have the following settings in solrconfig.xml:

<autoCommit>
  <maxTime>180</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>15000</maxTime>
</autoSoftCommit>

Thanks. Mugoma. On Mon, October 28, 2013 12:22 pm, michael.boom wrote: How do you add the documents to the index - one by one, batches of n? When do you do your commits? Because 8k docs per day is not a lot. Depending on the above, committing with softCommit=true might also be a solution. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Optimal-interval-for-soft-commit-tp4098016p4098022.html Sent from the Solr - User mailing list archive at Nabble.com.
Apache-Solr with Tomcat: displaying the format of search result
Hi All, Recently I integrated Apache Solr with a Tomcat server; everything is working fine. I am displaying the search results using a Velocity template, but here is my problem: I want the search results to display in the same format as the input data. For example, the input data (all contained in a single field) is:

*issue*: description about issue.
*Solution*: solution given by user goes here.

but after indexing, the data displays in the search result in the format below:

*issue*: description about issue.*Solution*: solution given by user goes here.

This is not what I want; I want to display the data in the same format as the input. Can anyone please help with this? Thanks in advance... -- View this message in context: http://lucene.472066.n3.nabble.com/Apache-Solr-with-Tomcat-displaying-the-format-of-search-result-tp4098040.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Compound words
Why did you reject using synonyms? You can have multi-word synonyms just fine at index time, and at query time, since the multiple words are already substituted in the index you don't need to do the same substitution, just query the raw strings. I freely acknowledge you may have very good reasons for doing this yourself, I'm just making sure you know what's already there. See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory Look particularly at the explanations for sea biscuit in that section. Best, Erick On Mon, Oct 28, 2013 at 3:47 AM, Parvesh Garg parv...@zettata.com wrote: One more thing, Is there a way to remove my accidentally sent phone number in the signature from the previous mail? aarrrggghhh
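[Editor's note] For the "sea bird" example from the original question, an index-time-only setup along the lines Erick describes might look like this (a sketch; the field type name, tokenizer choice, and file name are placeholders). In synonyms.txt:

```
seabird, sea bird
```

and in schema.xml, with the synonym filter applied only in the index analyzer and expand="true" so both the compound and the separate words get indexed, letting either query form match without any query-time substitution:

```
<fieldType name="text_compound" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```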
Re: Optimal interval for soft commit
To reply to your original question: when you soft commit, the top-level caches are thrown away, i.e. the filterCache, queryResultCache, documentCache - all the ones in solrconfig.xml. And if you have a high autowarm count on them, you wind up doing a lot of work for no gain. Say your soft commit interval is 1 second: only queries that come in during that one second even _potentially_ use the caches. Here's a long blog with lots of background: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Try this:
1. set your soft commit interval to 1 second
2. set your cache sizes in solrconfig to 5
3. set the autowarm counts on those caches to 0
Try it. If you see unacceptable degradation in query performance, then this is too aggressive and you need some caching. If not, don't bother caching. As always, it's a tradeoff between how fast docs are searchable and how much you can improve things with caching. Best, Erick On Mon, Oct 28, 2013 at 6:42 AM, Mugoma Joseph O. mug...@yengas.com wrote: Hello, How do you add the documents to the index - one by one, batches of n? Documents are added one by one using SolrJ. When do you do your commits? We have the following settings in solrconfig.xml: <autoCommit> <maxTime>180</maxTime> <openSearcher>false</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>15000</maxTime> </autoSoftCommit> Thanks. Mugoma. On Mon, October 28, 2013 12:22 pm, michael.boom wrote: How do you add the documents to the index - one by one, batches of n? When do you do your commits? Because 8k docs per day is not a lot. Depending on the above, committing with softCommit=true might also be a solution. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/Optimal-interval-for-soft-commit-tp4098016p4098022.html Sent from the Solr - User mailing list archive at Nabble.com.
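[Editor's note] Those three steps as a solrconfig.xml sketch (the sizes come from the advice above; the cache declarations mirror the stock example config):

```
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

<filterCache class="solr.FastLRUCache" size="5" initialSize="5" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="5" initialSize="5" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="5" initialSize="5" autowarmCount="0"/>
```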
Re: Solr Update URI is not found
Hi Erick and All, The problem is solved by copying schema-solr4.xml into my collection's Solr conf (renamed to schema.xml). I didn't use hadoop there, and I apologize; I thought it appropriate to post on this Solr list since the problem first appeared at the Solr indexing step. Regarding the /2 option, it's e-mail body evolution, I think :) In my first posting it was the crawl script syntax, as in my case:

# ./bin/crawl urls/seed.txt TestCrawl http://localhost:8080/solr/ 2

2 = the number of rounds. See here: http://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script Again, thanks everyone!

On Mon, Oct 28, 2013 at 5:39 PM, Erick Erickson erickerick...@gmail.com wrote: This seems like a better question for the Nutch list. I see hadoop in there, so unless you've specifically configured Solr to use the HDFS directory writer factory, this has to be coming from someplace else. And there are map/reduce tasks in here. BTW, it would be more helpful if you posted the URL that you successfully queried Solr with... What is the /2 on the end for? Do you use that when you query? Best, Erick

On Mon, Oct 28, 2013 at 2:37 AM, Bayu Widyasanyata bwidyasany...@gmail.com wrote: On Mon, Oct 28, 2013 at 1:26 PM, Raymond Wiker rwi...@gmail.com wrote: request: http://localhost:8080/solr/update?wt=javabin&version=2 I think this URL is incorrect: there should be a core name between solr and update. I changed the Solr URL in the crawl script's options to: ./bin/crawl urls/seed.txt TestCrawl http://localhost:8080/solr/mycollection/2 And the result now is Bad Request. I will look for other misconfigurations...

=
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://localhost:8080/solr/mycollection/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2013-10-28 13:30:02,804 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:123)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:185)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:195)
-- wassalam, [bayu]
-- wassalam, [bayu]
Solr 4.5.1 replication Bug? Illegal to have multiple roots (start tag in epilog?).
We have a similar error as this thread: http://www.mail-archive.com/solr-user@lucene.apache.org/msg90748.html We tried the Tomcat settings from this post, using the exact settings specified there. We merge 500 documents at a time. I am creating a new thread because Michael is using Jetty whereas we use Tomcat. The formdataUploadLimitInKB and multipartUploadLimitInKB limits are set to a very high value, 2GB, as suggested in the following thread: https://issues.apache.org/jira/browse/SOLR-5331 We use out-of-the-box Solr 4.5.1, no customization done. If we merge documents via SolrJ to a single server it works perfectly fine, but as soon as we add another node to the cloud we get the following while merging documents. This is the error we are getting on the server where merging is happening (10.10.10.116 - the IPs are irrelevant, just for clarity). 10.10.10.119 is the new node here. This server gets a RemoteSolrException: shard update error StdNode: http://10.10.10.119:8980/solr/mycore/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Illegal to have multiple roots (start tag in epilog?).
	at [row,col {unknown-source}]: [1,12468]
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:425)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
	at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
	at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:1)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
On the other server, 10.10.10.119, we get the following error:
org.apache.solr.common.SolrException: Illegal to have multiple roots (start tag in epilog?).
	at [row,col {unknown-source}]: [1,12468]
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: com.ctc.wstx.exc.WstxParsingException: Illegal to have multiple roots (start tag in epilog?).
	at [row,col {unknown-source}]: [1,12369]
	at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630)
	at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:461)
	at com.ctc.wstx.sr.BasicStreamReader.handleExtraRoot(BasicStreamReader.java:2155)
	at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2070)
	at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2647)
Field Value depending on another field value
Hello, I'm pretty new to Solr, and I have a question about best practice. I want to handle a Solr collection with products that are available in different shops. For several reasons, the price of a product may be the same or may vary, depending on the shop's location. What I don't know how to handle correctly is the ability to have a price that is a multivalued notion, whose value depends on another field. Imagine the following product in the collection:
{
  "id": 123456,
  "name": "The Wonderful product",
  "SellableInShop": [1, 3],
  "Price": 0,
  "PriceInShop1": 34.99,
  "PriceInShop2": 0,
  "PriceInShop3": 38.99
}
Behaviour I want when the user searches for "wonderful" after selecting shop #3:
/query?q=wonderful AND SellableInShops:3
{
  "id": 123456,
  "name": "The Wonderful product",
  "SellableInShop": [1, 3],
  "Price": 38.99
}
My question is: how do I fill, at query time, the content of a field Price depending on 2 other fields: SellableInShop and PriceInShop3 (PriceInShop2 if SellableInShop == 2, PriceInShop1 if SellableInShop == 1, etc.)? Thanks a lot, Ben -- View this message in context: http://lucene.472066.n3.nabble.com/Field-Value-depending-on-another-field-value-tp4098047.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Data import handler with multi tables
I think because id is unique. When importing, tbl_tableA imports first and tbl_tableB imports after; tbl_tableB has the same ids as tableA, so only the data of tableB ends up indexed under each unique id. That's exactly what happens here :) If the second table had fewer records than the first one, you'd still see records from that table. Can anyone help me configure the data import handler so that it can index all data of two (or more) tables which have the same ids? That requires the use of a compound key (http://en.wikipedia.org/wiki/Compound_key), e.g. if the data comes from table A, make the key A1 instead of (only) 1, then A2, B1, B2, and so on. You can still index the raw ids in another field, but for the unique key you need something like that to get it working. HTH Stefan On Monday, October 28, 2013 at 10:45 AM, dtphat wrote: Hi, I want to import many tables from MySQL. Assume that I have two tables: *** Table 1: tbl_tableA(id, nameA) with data (1, A1), (2, A2), (3, A3). *** Table 2: tbl_tableB(id, nameB) with data (1, B1), (2, B2), (3, B3), (4, B4), (5, B5). I configure:
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://xx" user="xxx" password="xxx" batchSize="1"/>
  <document name="atexpats6">
    <entity name="tableA" query="select * from tbl_tableA">
      <field name="id" column="id"/>
      <field name="nameA" column="nameA"/>
    </entity>
    <entity name="tableB" query="select * from tbl_tableB">
      <field name="id" column="id"/>
      <field name="nameB" column="nameB"/>
    </entity>
  </document>
</dataConfig>
I define nameA and nameB in schema.xml, and id is configured as <uniqueKey>id</uniqueKey>. When I import data via http://localhost:8983/solr/dataimport?command=full-import it is successful, but only the data of tbl_tableB gets indexed. I think because id is unique: when importing, tbl_tableA imports first and tbl_tableB imports after; tbl_tableB has the same ids as tableA, so only the data of tableB ends up indexed under each unique id.
Can anyone help me configure the data import handler so that it can index all data of two (or more) tables which have the same ids? Thanks. - Phat T. Dong -- View this message in context: http://lucene.472066.n3.nabble.com/Data-import-handler-with-multi-tables-tp4098026.html Sent from the Solr - User mailing list archive at Nabble.com (http://Nabble.com).
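Stefan's compound-key suggestion could be sketched in data-config.xml roughly like this — the CONCAT calls and the extra rawId field are illustrative assumptions, not tested configuration:

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://xx" user="xxx" password="xxx" batchSize="1"/>
  <document>
    <!-- Prefix the source table to the id so keys from the two tables never collide -->
    <entity name="tableA"
            query="SELECT CONCAT('A', id) AS id, id AS rawId, nameA FROM tbl_tableA">
      <field name="id" column="id"/>
      <field name="rawId" column="rawId"/>
      <field name="nameA" column="nameA"/>
    </entity>
    <entity name="tableB"
            query="SELECT CONCAT('B', id) AS id, id AS rawId, nameB FROM tbl_tableB">
      <field name="id" column="id"/>
      <field name="rawId" column="rawId"/>
      <field name="nameB" column="nameB"/>
    </entity>
  </document>
</dataConfig>
```

With this, tableA yields ids A1..A3 and tableB yields B1..B5, so all eight rows survive the unique-key check; the original numeric id remains searchable via rawId, which would need its own field definition in schema.xml.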
Re: Compound words
Hi Erick, Thanks for the suggestion. Like I said, I'm an infant. We tried synonyms both ways, "sea biscuit => seabiscuit" and "seabiscuit => sea biscuit", and didn't understand exactly how it worked. But I just checked the analysis tool, and it seems to work perfectly fine at index time. Now I can happily discard my own filter and 4 days of work. I'm happy I got to know a few ways on how/when not to write a Solr filter :) I tried the string "sea biscuit sea bird" with expand=false and the tokens I got were seabiscuit, sea, bird at positions 1, 2 and 3 respectively. But at query time, when I enter the same term "sea biscuit sea bird", using edismax with qf, pf2, and pf3, the parsedQuery looks like this: +((text:sea) (text:biscuit) (text:sea) (text:bird)) ((text:"biscuit sea") (text:"sea bird")) ((text:"seabiscuit sea") (text:"biscuit sea bird")) What I wanted instead was this: +((text:seabiscuit) (text:sea) (text:bird)) ((text:"seabiscuit sea") (text:"sea bird")) (text:"seabiscuit sea bird") Looks like there isn't any other way than to pre-process the query myself and create the compound word. What do you mean by "just query the raw string"? Am I still missing something? Parvesh Garg http://www.zettata.com (This time I did remove my phone number :) ) On Mon, Oct 28, 2013 at 4:14 PM, Erick Erickson erickerick...@gmail.com wrote: Why did you reject using synonyms? You can have multi-word synonyms just fine at index time, and at query time, since the multiple words are already substituted in the index you don't need to do the same substitution, just query the raw strings. I freely acknowledge you may have very good reasons for doing this yourself, I'm just making sure you know what's already there. See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory Look particularly at the explanations for "sea biscuit" in that section.
Best, Erick On Mon, Oct 28, 2013 at 3:47 AM, Parvesh Garg parv...@zettata.com wrote: One more thing, Is there a way to remove my accidentally sent phone number in the signature from the previous mail? aarrrggghhh
Re: One of all shard stopping, all shards stop
Thanks for your reply. If one of the servers has stopped with an error, this option (distrib=false) works well. A similar option is shards.tolerant=true, but I don't want to use it, because the dead server doesn't show an error message; it just returns no data. I want the dead server to show an error message, while the other servers keep working normally. -- View this message in context: http://lucene.472066.n3.nabble.com/One-of-all-shard-stopping-all-shards-stop-tp4098015p4098053.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Data import handler with multi tables
Hi, is there another way to import all the data in this case, instead of only using a compound key? Thanks. - Phat T. Dong -- View this message in context: http://lucene.472066.n3.nabble.com/Re-Data-import-handler-with-multi-tables-tp4098048p4098056.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Compound words
Consider setting expand=true at index time. That puts all the tokens in your index, and then you may not need any synonym processing at query time, since all the variants will already be in the index. As it is, you've replaced the words in the original with synonyms, essentially collapsing them down to a single word, and then you have to do something at query time to get matches. If all the variants are in the index, you shouldn't have to. That's what I meant by "raw". Best, Erick On Mon, Oct 28, 2013 at 8:02 AM, Parvesh Garg parv...@zettata.com wrote: [...]
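For reference, an index-time expansion setup along the lines Erick describes might look like this; the field type name, tokenizer choice, and synonyms.txt contents are illustrative, not the poster's actual configuration:

```
# synonyms.txt — with expand="true", both forms end up in the index
sea biscuit, seabiscuit
```

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the expansion already happened at index time, the query analyzer deliberately has no SynonymFilterFactory, which sidesteps the well-known multi-word synonym problems at query time.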
Re: One of all shard stopping, all shards stop
I think if you set shards.tolerant=true you get information in the return packet if a shard is completely down. The other thing you can do is query the ZooKeeper cluster state directly. But I have to ask why you're not using a replica or two per shard. That should provide automatic fail-over etc. and make the necessity of dealing with this case _much_ less frequent. Personally I'd put more effort into making an always-up cluster than into dealing with when a single node goes down. FWIW, Erick On Mon, Oct 28, 2013 at 8:10 AM, hongkeun.yoo hunter...@naver.com wrote: [...]
return value from SolrJ client to php
Hello All, I have a requirement where I have to connect to Solr using the SolrJ client, and the documents returned by Solr to the SolrJ client have to be returned to PHP. I know it's simple to get documents from Solr to SolrJ, but how do I return documents from SolrJ to PHP? Thanks, Amit Aggarwal
Re: Field Value depending on another field value
Hi Ben, You can actually look at indexing single-valued documents, i.e. a different one for every store, and then grouping on the product id. Have a look at this presentation by Adrian Trenaman from Lucene Revolution earlier this year: Presentation: http://www.slideshare.net/trenaman/personalized-search-on-the-largest-flash-sale-site-in-america Video: http://www.youtube.com/watch?v=kJa-3PEc90g Hope that helps you. On Mon, Oct 28, 2013 at 5:06 PM, bengates benga...@aliceadsl.fr wrote: [...] -- Anshum Gupta http://www.anshumgupta.net
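Concretely, the denormalized modelling Anshum describes would store one document per (product, shop) pair and collapse at query time; all field names below are illustrative, not from the original post:

```
# One doc per product/shop combination:
#   { product_id: 123456, name: "The Wonderful product", shop_id: 3, price: 38.99 }
#   { product_id: 123456, name: "The Wonderful product", shop_id: 1, price: 34.99 }

# Filter by the selected shop, then group so each product appears once,
# carrying its shop-local price:
/query?q=wonderful&fq=shop_id:3&group=true&group.field=product_id
```

Each returned group member is a single-valued document, so "Price" is simply the price field of the matching per-shop document; no query-time field rewriting is needed.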
Re: return value from SolrJ client to php
Hi Amit, I haven't personally tried it, but have a look at the options listed here: http://wiki.apache.org/solr/IntegratingSolr Also, just check that the library you try is known to work with the version of Solr you want to use. Otherwise, how about just using a serialization library for the apps in the 2 languages to talk to each other? On Mon, Oct 28, 2013 at 7:03 PM, Amit Aggarwal amit.aggarwa...@gmail.com wrote: [...] -- Anshum Gupta http://www.anshumgupta.net
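One more option worth noting: Solr ships PHP response writers out of the box, so PHP can often consume Solr output directly without SolrJ in the middle. A sketch, with the URL and core name purely illustrative:

```
# wt=php returns a PHP array literal; wt=phps returns output compatible
# with PHP's unserialize():
http://localhost:8983/solr/collection1/select?q=*:*&wt=phps

# On the PHP side, something along the lines of:
#   $result = unserialize(file_get_contents($url));
```

If SolrJ must stay in the path (e.g. for business logic), the other straightforward route is the one Anshum hints at: serialize QueryResponse.getResults() to JSON on the Java side and decode it with json_decode() in PHP.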
Re: Proposal for new feature, cold replicas, brainstorming
On Sat, 2013-10-26 at 02:14 +0200, Chris Hostetter wrote: I suspect that the most straightforward way to achieve what you folks seem to be describing would be to add a hook into the request distribution processing so that you could have a custom plugin used when Solr does Replica r = pickReplica(shardName), and your implementation of pickReplica() would look something like (all pseudo code):
List<Replica> allInShard = clusterState.getAllLiveReplicas(shardName);
List<Replica> candidates = new List();
for (Replica r : allInShard) {
  if (! r.hasRole(shardIsLastResort)) {
    candidates.add(r);
  }
}
return candidates.isEmpty() ? allInShard : candidates;
I am not very familiar with the distribution code in Solr. I located CloudSolrServer.request(SolrRequest request), which seems to be the place you are talking about? It extracts replica URLs and generates a LBHttpSolrServer.Req with that list, which is immediately used with the LBHttpSolrServer. As I understand it, feeding LBHttpSolrServer.Req with only shards that are primary would mean an exception if those shards do not answer. In order to handle the first search against a failed primary shard gracefully, wouldn't we need to extend LBHttpSolrServer.Req to have two lists, primary and lastResort, instead of one? This would also require a rewrite of the try-retry logic in LBHttpSolrServer. ...if I remember correctly, there is already a hook (or there is an issue about adding a hook) to let you do plugin logic like this -- [...] I did not see one in the code and could not locate a JIRA issue. Not that that means it isn't there. Thank you for your time, Toke Eskildsen
Re: Compound words
Hi Parvesh, I think you should check the following jira: https://issues.apache.org/jira/browse/SOLR-5379. You will find there links to other possible solutions/problems :-) Roman On 28 Oct 2013 09:06, Erick Erickson erickerick...@gmail.com wrote: [...]
Re: Solr - what's the next big thing?
Hi, On Sun, Oct 27, 2013 at 2:57 PM, Saar Carmi saarca...@gmail.com wrote: If I get it right, Solr can store its data files on HDFS but it will not Correct. And can be used to build indices in parallel, using MapReduce, from data living on HDFS. use map reduce to process the data (e.g. evaluating queries). Right. MapReduce jobs are typically not a sub-second process, while search queries typically need to be very quick. That said, one could run a query and then apply MapReduce-based processing on the search results. There is no support for that in Solr today. I was wondering whether Solr could utilize the Hadoop job distribution mechanism to utilize resources better. On the other hand, maybe this is not needed with the availability of SolrCloud. Maybe you are thinking of Solr on YARN? Mark Miller can probably say a word or two or three on this topic. Bill Bell, could you elaborate about complex object indexing? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Sat, Oct 26, 2013 at 10:04 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, On Sat, Oct 26, 2013 at 5:58 AM, Saar Carmi saarca...@gmail.com wrote: LOL, Jack. I can imagine Otis saying that. Funny indeed, but not really. Otis, with this marriage, are we going to see map-reduce based queries? Can you please describe what you mean by that? Maybe with an example. Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Oct 25, 2013 10:03 PM, Jack Krupansky j...@basetechnology.com wrote: But a lot of that big yellow elephant stuff is in 4.x anyway. (Otis: I was afraid that you were going to say that the next big thing in Solr is... Elasticsearch!) -- Jack Krupansky -Original Message- From: Otis Gospodnetic Sent: Friday, October 25, 2013 2:43 PM To: solr-user@lucene.apache.org Subject: Re: Solr - what's the next big thing?
Saar, The marriage with the big yellow elephant is a big deal. It changes the scale. Otis Solr ElasticSearch Support http://sematext.com/ On Oct 25, 2013 5:32 AM, Saar Carmi saarca...@gmail.com wrote: If I am not mistaken the most impressive improvement of Solr 4.0 compared to previous versions was the Solr Cloud architecture. What would be the next big thing in Solr 5.0 ? Saar -- Saar Carmi Mobile: 054-7782417 Email: saarca...@gmail.com
Re: Need idea to standardize keywords - ring tone vs ringtone
Thanks for your response, Erick. Sorry for the confusion. I currently display both 'ring tone' as well as 'ringtone' when the user types in 'r', but I am trying to figure out a way to display just 'ringtone', hence I added 'ring tone' to the stopwords list so that it doesn't get indexed. I have a list of known keywords (more like synonyms) which I am trying to map against the user-entered keywords: ring tone, ringer tone => ringtone -- View this message in context: http://lucene.472066.n3.nabble.com/Need-idea-to-standardize-keywords-ring-tone-vs-ringtone-tp4097794p4098103.html Sent from the Solr - User mailing list archive at Nabble.com.
Replace document title with filename if it's empty
Hi, I just found that some of the PDF files crawled have no (empty) 'title' metadata. How can I define or fetch the filename, and use it to replace the empty 'title' field? I didn't find a filename field in schema.xml, and I don't know how to make a conditional for the above condition (if title is empty then ...). Thanks in advance. -- wassalam, [bayu]
Re: Apache-Solr with Tomcat: displaying the format of search result
On 10/28/2013 4:40 AM, pyramesh wrote: But this is not I want.. I want to display data as same as input format. can anyone please help on this What Solr outputs in its fields for search results is identical to what it receives when data is indexed, unless you have update processors configured that change the data. The analysis chain that you define in schema.xml is *NOT* applied to stored data, only indexed data. If the search results are not coming out in the format that you want, it is either arriving at Solr incorrectly, or you have one or more update processors that are changing it. Thanks, Shawn
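The indexed-vs-stored distinction Shawn describes can be seen in a single schema.xml line; the field and type names here are illustrative:

```xml
<!-- The analyzer chain attached to text_general affects only the *indexed*
     terms used for matching; the *stored* value returned in search results
     is the verbatim input that was sent at index time. -->
<field name="description" type="text_general" indexed="true" stored="true"/>
```

So lowercasing, stemming, tokenization, etc. never alter what comes back in the response; only an update processor (which runs before the document is stored) can do that.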
Re: Need idea to standardize keywords - ring tone vs ringtone
Do you know about the Solr synonym feature? That seems more applicable to what you're describing than stopwords. I'd stay away from stopwords entirely here, and try to do what you want with synonyms. Multi-word synonyms can be tricky; I'm not entirely sure of the right way to do it for this use case. But I think the synonym feature is what you want, not the stopwords feature. On 10/28/13 12:24 PM, Developer wrote: [...]
Re: When is/should qf different from pf?
Thanks Erick. Numeric fields make sense, as I guess would strictly string fields too, since it's one term? In the normal text-searching case, though, does it make sense to have qf and pf differ? Thanks Amit On Oct 28, 2013 3:36 AM, Erick Erickson erickerick...@gmail.com wrote: The facetious answer is when phrases aren't important in the fields. If you're doing a simple boolean match, adding phrase fields will add expense to no good purpose. Phrases on numeric fields seem wrong. FWIW, Erick On Mon, Oct 28, 2013 at 1:03 AM, Amit Nithian anith...@gmail.com wrote: Hi all, I have been using Solr for years but never really stopped to wonder: When using the dismax/edismax handler, when do you have the qf different from the pf? I have always set them to be the same (maybe different weights) but I was wondering if there is a situation where you would have a field in the qf not in the pf or vice versa. My understanding from the docs is that qf is a term-wise hard filter while pf is a phrase-wise boost of documents that made it past the qf filter. Thanks! Amit
Re: When is/should qf different from pf?
There'd be no point having them the same. You're likely to include boosts in your pf, so that docs that match the phrase query as well as the term query score higher than those that just match the term query. Such as: qf=text description&pf=text^2 description^4 Upayavira On Mon, Oct 28, 2013, at 05:44 PM, Amit Nithian wrote: [...]
Solr block join
Hi, The block join feature introduced in Solr 4.5 is really helpful in solving some of the issues in my project. I am able to get it working in simple cases. However, I couldn't figure out how to use it in some more complex cases, and I could find very little reference material about it. 1) How do I return both parent document fields and child document fields in the same result (in SolrJ)? 2) How do I apply 'OR' to multiple child document types (searching for documents that meet the conditions of either child document type 1 or child document type 2)? 3) If result/sort/facet fields come from child documents, how do I define them in the schema? What I can think of is to create a copyField for each of them in the parent documents. Is there a better way? 4) Does block join work for multiple child levels, such as child and grandchild documents, etc.? Has anyone had similar issues and would like to share your solutions? Thanks, Simon -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-block-join-tp4098128.html Sent from the Solr - User mailing list archive at Nabble.com.
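For anyone finding this thread, the 4.5 block join query parsers are used roughly like this; the doc_type discriminator field and the other field names are illustrative assumptions, and the which/of clause must match all (and only) parent documents in the index:

```
# Return parent docs whose children match (block join "to parent"):
q={!parent which="doc_type:parent"}child_field:foo

# Return child docs whose parents match (block join "to child"):
q={!child of="doc_type:parent"}parent_field:bar
```

Parents and their children must have been indexed together as one block (e.g. nested documents in a single update) for either parser to work.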
Re: Solr 4.5.1 replication Bug? Illegal to have multiple roots (start tag in epilog?).
Hey, this is Michael, who was having the exact error on the Jetty side with an update. I've upgraded jetty from the 4.5.1 embedded version (in the example directory) to version 9.0.6, which means I had to upgrade my OpenJDK from 1.6 to 1.7.0_45. Also, I added the suggested (very large) settings to my solrconfig.xml: requestParsers enableRemoteStreaming=true formdataUploadLimitInKB=2048000 multipartUploadLimitInKB=2048000 / but I am still getting the errors when I put a second server in the cloud. Single servers (external zookeeper, but no cloud partner) works just fine. I suppose my next step is to try Tomcat, but according to your post, it will not help! Any help is appreciated, M. - Original Message - From: Sai Gadde gadde@gmail.com To: solr-user@lucene.apache.org Sent: Monday, October 28, 2013 7:10:41 AM Subject: Solr 4.5.1 replication Bug? Illegal to have multiple roots (start tag in epilog?). we have a similar error as this thread. http://www.mail-archive.com/solr-user@lucene.apache.org/msg90748.html Tried tomcat setting from this post. We used exact setting sepecified here. we merge 500 documents at a time. I am creating a new thread because Michael is using Jetty where as we use Tomcat. formdataUploadLimitInKB and multipartUploadLimitInKB limits are set to very high value 2GB. As suggested in the following thread. https://issues.apache.org/jira/browse/SOLR-5331 We use out of the box Solr 4.5.1 no customization done. If we merge documents via SolrJ to a single server it is perfectly working fine. But as soon as we add another node to the cloud we are getting following while merging documents. This is the error we are getting on the server (10.10.10.116 - IP is irrelavent just for clarity)where merging is happening. 10.10.10.119 is the new node here. 
This server gets RemoteSolrException:

shard update error StdNode: http://10.10.10.119:8980/solr/mycore/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Illegal to have multiple roots (start tag in epilog?).
 at [row,col {unknown-source}]: [1,12468]
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:425)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
	at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
	at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:1)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

On the other server (10.10.10.119) we get the following error:

org.apache.solr.common.SolrException: Illegal to have multiple roots (start tag in epilog?).
 at [row,col {unknown-source}]: [1,12468]
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
	at
Re: Compound words
Hi Roman, thanks for the link, will go through it. Erick, will try with expand=true once and check out the results; will update this thread with the findings. I remember we rejected expand=true because of some weird spaghetti problem. Will check it out again. Thanks, Parvesh Garg http://www.zettata.com On Mon, Oct 28, 2013 at 9:01 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Parvesh, I think you should check the following JIRA: https://issues.apache.org/jira/browse/SOLR-5379. You will find links there to other possible solutions/problems :-) Roman On 28 Oct 2013 09:06, Erick Erickson erickerick...@gmail.com wrote: Consider setting expand=true at index time. That puts all the tokens in your index, and then you may not need any synonym processing at query time, since all the variants will already be in the index. As it is, you've replaced the words in the original with synonyms, essentially collapsing them down to a single word, and then you have to do something at query time to get matches. If all the variants are in the index, you shouldn't have to. That's what I meant by raw. Best, Erick On Mon, Oct 28, 2013 at 8:02 AM, Parvesh Garg parv...@zettata.com wrote: Hi Erick, Thanks for the suggestion. Like I said, I'm an infant. We tried synonyms both ways, sea biscuit => seabiscuit and seabiscuit => sea biscuit, and didn't understand exactly how it worked. But I just checked the analysis tool, and it seems to work perfectly fine at index time. Now I can happily discard my own filter and 4 days of work. I'm happy I got to know a few ways on how/when not to write a Solr filter :) I tried the string sea biscuit sea bird with expand=false, and the tokens I got were seabiscuit, sea, and bird at positions 1, 2, and 3 respectively.
But at query time, when I enter the same term sea biscuit sea bird, using edismax with qf, pf2, and pf3, the parsedQuery looks like this: +((text:sea) (text:biscuit) (text:sea) (text:bird)) ((text:\biscuit sea\) (text:\sea bird\)) ((text:\seabiscuit sea\) (text:\biscuit sea bird\)) What I wanted instead was this: +((text:seabiscuit) (text:sea) (text:bird)) ((text:\seabiscuit sea\) (text:\sea bird\)) (text:\seabiscuit sea bird\) Looks like there isn't any other way than to pre-process the query myself and create the compound word. What do you mean by just query the raw string? Am I still missing something? Parvesh Garg http://www.zettata.com (This time I did remove my phone number :) ) On Mon, Oct 28, 2013 at 4:14 PM, Erick Erickson erickerick...@gmail.com wrote: Why did you reject using synonyms? You can have multi-word synonyms just fine at index time, and at query time, since the multiple words are already substituted in the index, you don't need to do the same substitution; just query the raw strings. I freely acknowledge you may have very good reasons for doing this yourself, I'm just making sure you know what's already there. See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory Look particularly at the explanations for sea biscuit in that section. Best, Erick On Mon, Oct 28, 2013 at 3:47 AM, Parvesh Garg parv...@zettata.com wrote: One more thing, Is there a way to remove my accidentally sent phone number in the signature from the previous mail? aarrrggghhh
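Erick's index-time suggestion can be sketched as follows (a sketch only; the synonyms.txt entry and filter placement are assumptions based on the thread, not Parvesh's actual config). An equivalence entry, expanded at index time, puts all variants in the index:

```
# synonyms.txt: comma-separated equivalence set
seabiscuit, sea biscuit
```

applied in the index-time analyzer:

```
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
```

With both forms indexed, the query-side analyzer can drop its synonym filter entirely, so a query for either "seabiscuit" or "sea biscuit" matches without any query pre-processing.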
Single multilingual field analyzed based on other field values
Hello, First some background... I am indexing a multilingual document set where the documents themselves can contain multiple languages. The language(s) within my documents are known ahead of time. I have tried separate fields per language, and due to the poor query performance I'm seeing with that approach (many languages / fields), I'm trying to create a single multilingual field. One approach to this problem is given in Section 14.6.4 (https://docs.google.com/a/basistech.com/file/d/0B3NlE_uL0pqwR0hGV0M1QXBmZm8/edit) of the new Solr In Action book. The approach is to take the document content field and prepend it with the list of contained languages followed by a special delimiter. A new field type is defined that maps languages to sub-field types, and the new type's tokenizer then runs all of the sub-field type analyzers over the field and merges results, adjusts offsets for the prepended data, etc. Due to the tokenizer complexity incurred, I'd like to pursue a more flexible approach, which is to run the various language-specific analyzers not based on prepended codes, but instead based on other field values (i.e., a language field). I don't see a straightforward way to do this, mostly because a field analyzer doesn't have access to the rest of the document. On the flip side, an UpdateRequestProcessor would have access to the document but doesn't really give a path to wind up where I want to be (a single field with different analyzers run dynamically). Finally, my question: is it possible to thread-cache document language(s) during UpdateRequestProcessor execution (where we have access to the full document), so that the analyzer can then read from the cache to determine which analyzer(s) to run? More specifically, if a document is run through its URP chain on thread T, will its analyzer(s) also run on thread T, and will no other documents be run through the URP on that thread in the interim? Thanks, Dave
Re: Single multilingual field analyzed based on other field values
Consider an update processor - it can operate on any field and has access to all fields. You could have one update processor combine all the fields to process into a temporary, dummy field. Then run a language-detection update processor on the combined field. Then process the results and place them in the desired field. And finally remove any temporary fields. -- Jack Krupansky -Original Message- From: David Anthony Troiano Sent: Monday, October 28, 2013 4:47 PM To: solr-user@lucene.apache.org Subject: Single multilingual field analyzed based on other field values Hello, First some background... I am indexing a multilingual document set where the documents themselves can contain multiple languages. The language(s) within my documents are known ahead of time. I have tried separate fields per language, and due to the poor query performance I'm seeing with that approach (many languages / fields), I'm trying to create a single multilingual field. One approach to this problem is given in Section 14.6.4 (https://docs.google.com/a/basistech.com/file/d/0B3NlE_uL0pqwR0hGV0M1QXBmZm8/edit) of the new Solr In Action book. The approach is to take the document content field and prepend it with the list of contained languages followed by a special delimiter. A new field type is defined that maps languages to sub-field types, and the new type's tokenizer then runs all of the sub-field type analyzers over the field and merges results, adjusts offsets for the prepended data, etc. Due to the tokenizer complexity incurred, I'd like to pursue a more flexible approach, which is to run the various language-specific analyzers not based on prepended codes, but instead based on other field values (i.e., a language field). I don't see a straightforward way to do this, mostly because a field analyzer doesn't have access to the rest of the document.
On the flip side, an UpdateRequestProcessor would have access to the document but doesn't really give a path to wind up where I want to be (a single field with different analyzers run dynamically). Finally, my question: is it possible to thread-cache document language(s) during UpdateRequestProcessor execution (where we have access to the full document), so that the analyzer can then read from the cache to determine which analyzer(s) to run? More specifically, if a document is run through its URP chain on thread T, will its analyzer(s) also run on thread T, and will no other documents be run through the URP on that thread in the interim? Thanks, Dave
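Jack's recipe (combine, detect, place, clean up) could be wired up roughly like this. This is a sketch only: the field names title, body, _all_text_tmp, and language are hypothetical, it assumes the langid contrib jar is on the classpath, and the chain still has to be referenced from your update handler:

```
<updateRequestProcessorChain name="langid-combined">
  <!-- 1. Combine the source fields into a temporary, dummy field -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">title</str>
    <str name="source">body</str>
    <str name="dest">_all_text_tmp</str>
  </processor>
  <!-- 2. Run language detection on the combined field -->
  <processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <str name="langid.fl">_all_text_tmp</str>
    <str name="langid.langField">language</str>
  </processor>
  <!-- 3. Drop the temporary field before indexing -->
  <processor class="solr.IgnoreFieldUpdateProcessorFactory">
    <str name="fieldName">_all_text_tmp</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The "process the results and place in the desired field" step would then be custom code (or a further processor) that uses the detected language field to build whatever single multilingual field layout is wanted.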
Re: Index JTS Point in Solr/Lucene index
Just following up on this thread after a round of emails between Shahbaz and me… David Smiley wrote: Ooooh, I see your confusion. You looked at code in an UpdateRequestProcessor and expected it to work on the client in SolrJ. It won't work, for the reason that the code in the URP is creating a non-string object (a Shape subclass), whereas SolrJ expects Strings or numbers. You need to use Shape-formatted strings. If you have a generic Shape and want to serialize it to a String without special-casing Point, etc., then you can use SpatialContext.toString(shape). Shahbaz lodhi wrote: Hi, *Story:* I am trying to index a *JTS point* in the following format, not successfully though: Pt(x=55.76056,y=24.19167) It is the format that I get from ctx.readShape(shapeString). I don't get any error when reading the shape or adding it to the SolrInputDocument, but it prompts *error reading WKT* on adding the document to Solr (i.e. solrServer.add(solrInputDocument)). *Question:* Is it a legal way to index: solrInputDocument.addField(myGeoField, JtsSpatialContext.GEO.readShape(shapeString)); solr.add(solrInputDocument); or will I have to stick to the WKT format? Any help will be highly appreciated. Thanks, Shahbaz - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Index-JTS-Point-in-Solr-Lucene-index-tp4095395p4098139.html Sent from the Solr - User mailing list archive at Nabble.com.
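The practical upshot of David's answer is to send the point as a WKT string rather than a Shape object. A minimal sketch of building that string in plain Java (the WktPoint class and toWkt helper are illustrative; the commented-out SolrJ calls and the myGeoField name come from Shahbaz's snippet):

```java
import java.util.Locale;

public class WktPoint {
    // Build a WKT point string: "POINT(x y)", i.e. longitude then latitude.
    static String toWkt(double x, double y) {
        return String.format(Locale.ROOT, "POINT(%s %s)", x, y);
    }

    public static void main(String[] args) {
        // Same coordinates as the Pt(x=55.76056,y=24.19167) in the thread
        String wkt = toWkt(55.76056, 24.19167);
        System.out.println(wkt); // POINT(55.76056 24.19167)
        // With SolrJ (from Shahbaz's code), index the string instead of a Shape:
        // solrInputDocument.addField("myGeoField", wkt);
        // solrServer.add(solrInputDocument);
    }
}
```

Note the argument order: WKT puts x (longitude) before y (latitude), matching the Pt(x=…,y=…) form that ctx.readShape() printed.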
Re: Global User defined properties - solr.xml from Solr 4.4 to Solr 4.5
Done https://issues.apache.org/jira/browse/SOLR-5398 -- View this message in context: http://lucene.472066.n3.nabble.com/Global-User-defined-properties-solr-xml-from-Solr-4-4-to-Solr-4-5-tp4097740p4098143.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Background merge errors with Solr 4.4.0 on Optimize call
Thanks for your response. You were right, Solr is logging to the catalina.out file for Tomcat. When I click the optimize button in Solr's admin interface, the following logs are written: http://apaste.info/laup About JVM memory, Solr's admin interface is listing JVM memory at 3.1% (221.7MB is dark grey, 512.56MB light grey, and 6.99GB total). On Mon, Oct 28, 2013 at 6:29 AM, Erick Erickson erickerick...@gmail.com wrote: For Tomcat, the Solr output is often put into catalina.out as a default, so the output might be there. You can configure Solr to send the logs most anywhere you please, but without some specific setup on your part the log output just goes to the default for the servlet container. I took a quick glance at the code, but since the merges are happening in the background, there's not much context for where that error is thrown. How much memory is there for the JVM? I'm grasping at straws a bit... Erick On Sun, Oct 27, 2013 at 9:54 PM, Matthew Shapiro m...@mshapiro.net wrote: I am working on implementing Solr as the search backend for our web system. So far things have been going well, but today I made some schema changes and now things have broken. I updated the schema.xml file and reloaded the core (via the admin interface). No errors were reported in the logs. I then pushed 100 records to be indexed. A call to commit afterwards seemed fine; however, my next call to optimize caused the following errors: java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1] null:java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1] Unfortunately, googling for background merge hit exception came up with two things: a corrupt index or not enough free space. The host machine hosting Solr has 227 of 229GB free (according to df -h), so that's not it.
I then ran CheckIndex on the index, and got the following results: http://apaste.info/gmGU As someone who is new to Solr and Lucene, as far as I can tell this means my index is fine, so I am at a loss. I'm fairly sure I could delete my data directory and rebuild it, but I am more interested in finding out why it is having issues, what the best way to fix it is, and what the best way is to prevent it from happening when this goes into production. Does anyone have any advice that may help? As an aside, I do not have a stack trace for you because the Solr admin page isn't giving me one. I tried looking in the logs directory under my Solr directory, but it does not contain any logs. I opened my ~/tomcat/lib/log4j.properties file and saw http://apaste.info/0rTL, which didn't really help me find log files. Doing a 'find . | grep solr.log' didn't really help either. Any help finding the log files (which may help find the actual cause of this) would also be appreciated.
Re: Background merge errors with Solr 4.4.0 on Optimize call
Sorry for reposting right after I just sent a reply, but I looked at the error trace more closely and noticed: Caused by: java.lang.IllegalArgumentException: no such field what The 'what' field was removed at the customer's request, as they wanted the logic behind what gets queried in the what field to live in code instead of in Solr (for easier changes without having to re-index everything; I didn't feel strongly either way, and since they are paying me, I took it out). This makes me wonder if it's crashing while merging because a field that used to be there is now gone. However, this seems odd to me, as Solr doesn't even let me delete the old data and instead leaves my collection in an extremely bad state, with the only remedy I can think of being to nuke the index at the filesystem level. If this is indeed the cause of the crash, is the only way to delete a field to completely empty the index first? On Mon, Oct 28, 2013 at 6:34 PM, Matthew Shapiro m...@mshapiro.net wrote: Thanks for your response. You were right, Solr is logging to the catalina.out file for Tomcat. When I click the optimize button in Solr's admin interface, the following logs are written: http://apaste.info/laup About JVM memory, Solr's admin interface is listing JVM memory at 3.1% (221.7MB is dark grey, 512.56MB light grey, and 6.99GB total). On Mon, Oct 28, 2013 at 6:29 AM, Erick Erickson erickerick...@gmail.com wrote: For Tomcat, the Solr output is often put into catalina.out as a default, so the output might be there. You can configure Solr to send the logs most anywhere you please, but without some specific setup on your part the log output just goes to the default for the servlet container. I took a quick glance at the code, but since the merges are happening in the background, there's not much context for where that error is thrown. How much memory is there for the JVM? I'm grasping at straws a bit...
Erick On Sun, Oct 27, 2013 at 9:54 PM, Matthew Shapiro m...@mshapiro.net wrote: I am working at implementing solr to work as the search backend for our web system. So far things have been going well, but today I made some schema changes and now things have broken. I updated the schema.xml file and reloaded the core (via the admin interface). No errors were reported in the logs. I then pushed 100 records to be indexed. A call to Commit afterwards seemed fine, however my next call for Optimize caused the following errors: java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1] null:java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37 [maxNumSegments=1] Unfortunately, googling for background merge hit exception came up with 2 thing: a corrupt index or not enough free space. The host machine that's hosting solr has 227 out of 229GB free (according to df -h), so that's not it. I then ran CheckIndex on the index, and got the following results: http://apaste.info/gmGU As someone who is new to solr and lucene, as far as I can tell this means my index is fine. So I am coming up at a loss. I'm somewhat sure that I could probably delete my data directory and rebuild it but I am more interested in finding out why is it having issues, what is the best way to fix it, and what is the best way to prevent it from happening when this goes into production. Does anyone have any advice that may help? As an aside, i do not have a stacktrace for you because the solr admin page isn't giving me one. I tried looking in my logs file in my solr directory, but it does not contain any logs. I opened up my ~/tomcat/lib/log4j.properties file and saw http://apaste.info/0rTL, which didnt really help me find log files. Doing a 'find . | grep solr.log' didn't really help either. 
Any help for finding log files (which may help find the actual cause of this) would also be appreciated.
Solr 4.5.1 Overseer error
I am upgrading from 4.4 to 4.5.1. I used to just upload my configurations to ZooKeeper and then install Solr with no default core. Solr would give me an error that no cores were created when I tried to access it, until I ran the Collections API create command to make a collection. However, now when I try to install Solr with no default core, I get a generic error about "path cannot end with /", and I can't create the cores using the Collections API. When I manually copy the files over and create the core through the interface, it all works as expected. Any help would be appreciated. Here is the error I'm seeing: http://pastebin.com/cEfpSEqe here's my solr.xml: http://pastebin.com/kBLv9Vvt and here are my startup arguments: http://pastebin.com/7tCrSpX9 -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-5-1-Overseer-error-tp4098160.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Single multilingual field analyzed based on other field values
Hi David, What version of the Solr in Action MEAP are you looking at (the current version is 12, version 13 is coming out later this week, and prior versions had significant bugs in the code you are referencing)? I added an update processor in the most recent version that can do language identification and prepend the language codes for you (even removing them from the stored version of the field and only including them on the indexed version for text analysis). You could easily modify this update processor to read the value from the language field and use it as the basis of the prepended languages. Otherwise, if you want to do language detection instead of passing in the language manually, the MultiTextField in chapter 14 of Solr in Action and the corresponding MultiTextFieldLanguageIdentifierUpdateProcessor should handle all of the language detection and prepending automatically for you (and also append the identified language to a separate field). If it were easy/possible to have access to the rest of the fields in the document from within a field's Analyzer, then I would certainly have opted for that approach instead of the whole prepending-languages-to-content option. If it is too cumbersome, you could probably rewrite the MultiTextField to pull the language from the field name instead of the content (i.e. <field name="myField|en,fr">blah, blah</field> instead of <field name="myField">en,fr|blah, blah</field> as currently designed). This would make specifying the language much easier (especially at query time, since you only have to specify the languages once instead of on each term), and you could have Solr still search the same underlying field for all languages. Same general idea, though. In terms of your ThreadLocal cache idea... that sounds really scary to me. The Analyzers' TokenStreamComponents are cached in a ThreadLocal context according to the internal ReusePolicy, and I'm skeptical that you'll be able to pull this off cleanly.
It would really be hacking around the Lucene APIs even if you were able to pull it off. -Trey On Mon, Oct 28, 2013 at 5:15 PM, Jack Krupansky j...@basetechnology.com wrote: Consider an update processor - it can operate on any field and has access to all fields. You could have one update processor combine all the fields to process into a temporary, dummy field. Then run a language-detection update processor on the combined field. Then process the results and place them in the desired field. And finally remove any temporary fields. -- Jack Krupansky -Original Message- From: David Anthony Troiano Sent: Monday, October 28, 2013 4:47 PM To: solr-user@lucene.apache.org Subject: Single multilingual field analyzed based on other field values Hello, First some background... I am indexing a multilingual document set where the documents themselves can contain multiple languages. The language(s) within my documents are known ahead of time. I have tried separate fields per language, and due to the poor query performance I'm seeing with that approach (many languages / fields), I'm trying to create a single multilingual field. One approach to this problem is given in Section 14.6.4 (https://docs.google.com/a/basistech.com/file/d/0B3NlE_uL0pqwR0hGV0M1QXBmZm8/edit) of the new Solr In Action book. The approach is to take the document content field and prepend it with the list of contained languages followed by a special delimiter. A new field type is defined that maps languages to sub-field types, and the new type's tokenizer then runs all of the sub-field type analyzers over the field and merges results, adjusts offsets for the prepended data, etc. Due to the tokenizer complexity incurred, I'd like to pursue a more flexible approach, which is to run the various language-specific analyzers not based on prepended codes, but instead based on other field values (i.e., a language field).
I don't see a straightforward way to do this, mostly because a field analyzer doesn't have access to the rest of the document. On the flip side, an UpdateRequestProcessor would have access to the document but doesn't really give a path to wind up where I want to be (a single field with different analyzers run dynamically). Finally, my question: is it possible to thread-cache document language(s) during UpdateRequestProcessor execution (where we have access to the full document), so that the analyzer can then read from the cache to determine which analyzer(s) to run? More specifically, if a document is run through its URP chain on thread T, will its analyzer(s) also run on thread T, and will no other documents be run through the URP on that thread in the interim? Thanks, Dave
Re: Solr 4.5.1 Overseer error
On 10/28/2013 5:50 PM, dboychuck wrote: I am upgrading from 4.4 to 4.5.1 I used to just upload my configurations to zookeeper and then install solr with no default core Solr would give me an error that no cores were created when I tried to access until I ran the collections API create command to make a collection however now when I try to install solr with no default core I get a generic error about path cannot end with / and I can't create the cores using the collections api when I manually copy the files over and create the core through the interface it all works as expected any help would be appreciated Working on IRC, we were able to track this down to a work item in the overseer queue in ZooKeeper. It had a deletecore operation in the queue with the collection parameter set to an empty string: {"operation":"deletecore", "core_node_name":"solr-shard-1.REDACTED.com:__collection1", "core":"collection1", "collection":"", "node_name":"solr-shard-1.REDACTED.com:_"} Basically, the previous version left behind some bad data in ZooKeeper. When dboychuck wiped out all the ZooKeeper data and started over, it all worked. If you are seeing a "Path must not end with / character" error when starting Solr, you may have some bad data in the overseer queue, which is located in ZooKeeper. Would it be worthwhile to file a bug so Solr can deal with these problems automatically and log what it's doing, or at the very least output a better error message? Thanks, Shawn
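For anyone hitting this, the overseer work queue can be inspected by hand before resorting to wiping all of ZooKeeper's data. A sketch of a zkCli session (the znode name qn-0000000001 is illustrative; your queue entries will have different sequence numbers):

```
# From the ZooKeeper installation, connect and inspect the overseer queue
./zkCli.sh -server localhost:2181
ls /overseer/queue
get /overseer/queue/qn-0000000001
```

A single stale work item, such as a deletecore entry with an empty collection, could then be removed with zkCli's delete command on that znode (with Solr stopped) instead of clearing everything.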
Re: how to avoid recover? how to ensure a recover success?
I have had a similar problem before, but the patch included with version 4.1 fixed that... I couldn't reproduce the problem with the patch... Is anyone able to reproduce this exception? - Zeki ama calismiyor... Calissa yapar... ("Smart, but doesn't work... Would manage it if he did...") -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-avoid-recover-how-to-ensure-a-recover-success-tp4096777p4098166.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.5.1 replication Bug? Illegal to have multiple roots (start tag in epilog?).
Hi Michael, I downgraded to Solr 4.4.0 and this issue is gone. No additional settings or tweaks were done. This is not a fix or a real solution, I guess, but in our case we wanted something working and we were running out of time. I will watch this thread for any suggestions, but we will possibly stay with 4.4.0 for some time. Regards Sai On Tue, Oct 29, 2013 at 4:36 AM, Michael Tracey mtra...@biblio.com wrote: Hey, this is Michael, who was having the exact same error on the Jetty side with an update. I've upgraded Jetty from the version embedded in the Solr 4.5.1 example directory to version 9.0.6, which meant I had to upgrade my OpenJDK from 1.6 to 1.7.0_45. I also added the suggested (very large) settings to my solrconfig.xml: <requestParsers enableRemoteStreaming="true" formdataUploadLimitInKB="2048000" multipartUploadLimitInKB="2048000" /> but I am still getting the errors when I put a second server in the cloud. A single server (external ZooKeeper, but no cloud partner) works just fine. I suppose my next step is to try Tomcat, but according to your post, it will not help! Any help is appreciated, M. - Original Message - From: Sai Gadde gadde@gmail.com To: solr-user@lucene.apache.org Sent: Monday, October 28, 2013 7:10:41 AM Subject: Solr 4.5.1 replication Bug? Illegal to have multiple roots (start tag in epilog?). We have an error similar to this thread: http://www.mail-archive.com/solr-user@lucene.apache.org/msg90748.html Tried the Tomcat setting from this post; we used the exact setting specified there. We merge 500 documents at a time. I am creating a new thread because Michael is using Jetty whereas we use Tomcat. The formdataUploadLimitInKB and multipartUploadLimitInKB limits are set to a very high value (2GB), as suggested in the following thread: https://issues.apache.org/jira/browse/SOLR-5331 We use out-of-the-box Solr 4.5.1, no customization done. If we merge documents via SolrJ to a single server it works perfectly fine.
But as soon as we add another node to the cloud we get the following while merging documents. This is the error we get on the server where merging is happening (10.10.10.116; the IP is irrelevant, just for clarity). 10.10.10.119 is the new node here. This server gets RemoteSolrException:

shard update error StdNode: http://10.10.10.119:8980/solr/mycore/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Illegal to have multiple roots (start tag in epilog?).
 at [row,col {unknown-source}]: [1,12468]
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:425)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
	at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
	at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:1)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

On the other server (10.10.10.119) we get the following error:

org.apache.solr.common.SolrException: Illegal to have multiple roots (start tag in epilog?).
 at [row,col {unknown-source}]: [1,12468]
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at
Re: Apache-Solr with Tomcat: displaying the format of search result
Thanks Shawn for the quick response... As suggested, I verified my configuration to check whether any update processors are configured, and found none. I am just wondering how the format is getting changed. Let me explain my problem in detail. I am indexing an .xml file into Solr, and below is the field configuration.

*schema.xml* (given for one field)
=
*1. Field:*
<field name="Resolution" type="text_general" indexed="true" multiValued="true" stored="true"/>

*2. Field type tokenizers:*
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.PositionFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateAll="1"/>
  </analyzer>
</fieldType>

3. *Before indexing, the data is in the below format:*
<field name="Resolution">*Issue:* ID country user X; unable xxx wsdfsdfs sdsdfs *Impact / Suspected Impact*: asa asdasdaav asdffcasdfassd *Rootcause:* asdfas asdfasdwersdvsdv sdfsdfcss (1). test 12, (2).tesst 123</field>

4. *After indexing, the data is displayed in the below format:*
*Issue:* ID country user X; unable xxx wsdfsdfs sdsdfs*Impact / Suspected Impact*: asa asdasdaav asdffcasdfassd *Rootcause:* asdfas asdfasdwersdvsdv sdfsdfcss (1). test 12, (2).tesst 123

Could you please guide me on how to display the search result in the same format as the input, even after indexing.
If any update processors need to be added, please suggest which processors to add. Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/Apache-Solr-with-Tomcat-displaying-the-input-format-in-the-search-results-tp4098040p4098183.html Sent from the Solr - User mailing list archive at Nabble.com.