Re: Solr 3.3 Sorting is not working for long fields
Field type is long and not multi-valued. Using the Solr 3.3 war file; tried on a Solr 1.4.1 index and a Solr 3.3 index, and in both cases it's not working. Query: http://localhost:8091/Group/select/?indent=on&q=studyid:120&sort=studyidasc,groupid asc,subjectid asc&start=0&rows=10 All the ID fields are long. Thanks & Regards, Rajani

On Sun, Nov 13, 2011 at 7:58 AM, Erick Erickson erickerick...@gmail.com wrote: Well, 3.3 has been around for quite a while; I'd suspect that something this fundamental would have been found... Is your field multi-valued? And what kind of field is studyid? You really have to provide more details, input, output, etc. to get reasonable help. It might help to review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick

On Fri, Nov 11, 2011 at 5:52 AM, rajini maski rajinima...@gmail.com wrote: Hi, I have upgraded my Solr from 1.4.1 to 3.3. Now I try to sort on a long field and the documents are not getting sorted by it. Sorting works when we sort facets, e.g. facet=on&facet.sort=studyid. But when I do a simple sort on documents, sort=studyid, the sort doesn't happen. Is there a bug? Regards, Rajani
Dismax, pf and qf
Hi all, In my dismax request handler I usually use both the qf and pf parameters, in order to do phrase and term search with different boosts. Now there are some scenarios where I want just pf active (without qf). Other than surrounding my query with double quotes, is there another way to do that? I mean, I would like to do the following: _query_:"{!dismax pf=author^100}vincent kwner" and have that fire a phrase search only, not also vincent OR kwner, completely ignoring the qf settings. I saw that if I omit the qf parameter, Solr uses the default field and subsequently returns no result, even if the pf query matches a record. Regards, Andrea
Re: Solr 3.3 Sorting is not working for long fields
On 14.11.2011 09:33, rajini maski wrote: query: http://localhost:8091/Group/select/?indent=on&q=studyid:120&sort=studyidasc,groupid asc,subjectid asc&start=0&rows=10 Is it a copy-and-paste error, or did you really sort on studyidasc? I don't think you have a field studyidasc, and Solr should have given an exception that either asc or desc is missing. -Kuli
Re: getting solr to expand Acronym
thanks for the replies... the problem with synonyms is that they would need to be tracked... there could be new words entered that will need to be added to the list on a regular basis... @Otis: As for the option of a custom TokenFilter, how would that work? I have not coded anything into Solr or any custom TokenFilters myself... I am sure there's documentation on this, but how would you think this should work? Thanks. --Tiernan

On Fri, Nov 11, 2011 at 9:01 PM, Brandon Ramirez brandon_rami...@elementk.com wrote: Could this be simulated through synonyms? Could you define CD as a synonym of Compact Disc or vice versa? I'm not sure if that would work, just brainstorming here... Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 Software Engineer II | Element K | www.elementk.com

-----Original Message----- From: Tiernan OToole [mailto:lsmart...@gmail.com] Sent: Friday, November 11, 2011 5:10 AM To: solr-user@lucene.apache.org Subject: getting solr to expand Acronym

Don't know if this is possible, but I need to ask anyway... Say we have a list of acronyms in a database (CD, DVD, CPU) and also a list of their not-so-short names (Compact Disk, Digital Versatile Disk, Central Processing Unit), but they are not linked in any particular way (lots of items, some with full names, some using acronyms). Is it possible for Solr to figure out that CD is an acronym of Compact Disk? I know CD could also mean Central Data, or anything that begins with C and D, but is there a way to tell Solr to look for items that not only match CD, but have adjacent words that begin with C and D... Another example I can think of is IBM: it could be International Business Machines, or Irish Business Machines, or Irish Banking Machines... So, would that be possible? -- Tiernan O'Toole blog.lotas-smartman.net www.geekphotographer.com www.tiernanotoole.ie

-- Tiernan O'Toole blog.lotas-smartman.net www.geekphotographer.com www.tiernanotoole.ie
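For concreteness, Brandon's synonym idea would be wired up roughly like this (a sketch; the entries and the field type name are illustrative, and the list would still need the manual upkeep Tiernan mentions). In synonyms.txt:

    CD, Compact Disc
    DVD, Digital Versatile Disc
    CPU, Central Processing Unit

and in schema.xml, an index-time expansion so either form matches:

    <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Note that multi-word expansions applied at index time require re-indexing whenever the list changes, which is exactly the tracking cost discussed above.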
Re: delta-import of rich documents like word and pdf files!
Thanks for your reply, Mr. Erick. All I want to do is this: I have indexed some of my PDF and DOC files. Now, for any changes I make to them, I want a delta (incremental) import so that I do not have to re-index the whole document set with a full import; only the changes made to these documents should get updated. I am using the DataImportHandler. I have looked in the forums, but all the delta-import questions relate to databases, whereas I am just indexing some DOC and PDF files for now. What should I do in order to achieve that?

Can you provide your data-config.xml?
Re: Delete by Query with limited number of rows
Hi Erick, hi Yury, thanks to your input I found a perfect solution for my case. Even though this is not a Solr-only solution, I will briefly describe how it works, since it might be of interest to others: I have set up a MySQL database holding two tables. The first has only a primary key with auto-increment and nothing else. The second has a primary key without auto-increment, plus fields for the content I store in Solr. Now, before I add something to the Solr core, I add an entry to the first MySQL table. After the insertion, I get the primary key for that action and check whether it is above my limit of documents. If so, I empty the first table and reset the auto-increment to zero. I then insert a MySQL entry into the second table using the primary key taken from the first table (if the primary key exists, I do not add an entry but update the existing one). And finally I have a Solr core which holds my searchable data and has a uniqueKey field. Into this core I add a new document, using the primary key from the first MySQL table for the uniqueKey field. The solution has two main benefits for me: - I can precisely control the number of documents in my Solr core. - I now also have a backup of my data in MySQL. Thank you very much for your help!
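A rough sketch of that flow in code (my own reconstruction of the description above; the table, column, and field names are made up):

    // Assumed schema: counter(id AUTO_INCREMENT); content(id PRIMARY KEY, body TEXT)
    import java.sql.*;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CappedIndexer {
        // Adds one document, keeping the Solr core at no more than maxDocs entries.
        public static void addDocument(Connection db, SolrServer solr,
                                       String body, long maxDocs) throws Exception {
            // 1) draw the next id from the auto-increment table (table 1)
            Statement st = db.createStatement();
            st.executeUpdate("INSERT INTO counter () VALUES ()",
                             Statement.RETURN_GENERATED_KEYS);
            ResultSet rs = st.getGeneratedKeys();
            rs.next();
            long id = rs.getLong(1);
            // 2) past the cap: empty table 1, which also resets AUTO_INCREMENT
            if (id > maxDocs) {
                st.executeUpdate("TRUNCATE TABLE counter");
            }
            // 3) insert-or-update the MySQL backup row keyed by that id (table 2)
            PreparedStatement ps = db.prepareStatement(
                "REPLACE INTO content (id, body) VALUES (?, ?)");
            ps.setLong(1, id);
            ps.setString(2, body);
            ps.executeUpdate();
            // 4) add to Solr with the same id as uniqueKey; re-adding an existing
            //    id overwrites the old document, so the core never grows past maxDocs
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            doc.addField("text", body);
            solr.add(doc);
            solr.commit();
        }
    }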
Counting in facet results
Hi, By counting in facet results I mean solving the following problem. I have 7 documents:

    A1 B1 C1
    A2 B1 C1
    A3 B2 C1
    A4 B2 C2
    A5 B3 C2
    A6 B3 C2
    A7 B3 C2

If I make a facet query on field B, I get the result B1=2, B2=2, B3=3:

    A1 B1 C1
    A2 B1 C1   -> 2 (faceting by B)
    -----------
    A3 B2 C1
    A4 B2 C2   -> 2 (faceting by B)
    -----------
    A5 B3 C2
    A6 B3 C2
    A7 B3 C2   -> 3 (faceting by B)

I want to get additional information, something like a count within the results, by field C. So, how can I query to get a result similar to the following:

    A1 B1 C1
    A2 B1 C1   -> 2, 1 (faceting by B; distinct values of C in the facet results)
    -----------
    A3 B2 C1
    A4 B2 C2   -> 2, 2 (faceting by B; distinct values of C in the facet results)
    -----------
    A5 B3 C2
    A6 B3 C2
    A7 B3 C2   -> 3, 1 (faceting by B; distinct values of C in the facet results)

Thanks
Re: delta-import of rich documents like word and pdf files!
Thanks for your reply... my data-config.xml is:

    <dataConfig>
      <dataSource type="BinFileDataSource" name="bin"/>
      <document>
        <entity name="f" pk="id" processor="FileListEntityProcessor"
                recursive="true" rootEntity="false" dataSource="null"
                baseDir="/var/data/solr"
                fileName=".*\.(DOC)|(PDF)|(XML)|(xml)|(JPEG)|(jpg)|(ZIP)|(zip)|(pdf)|(doc)"
                onError="skip">
          <entity name="tika-test" processor="TikaEntityProcessor"
                  url="${f.fileAbsolutePath}" format="text" dataSource="bin"
                  onError="skip">
            <field column="Author" name="author" meta="true"/>
            <field column="title" name="title" meta="true"/>
            <field column="text" name="text"/>
            <field column="id" name="id"/>
          </entity>
          <field column="file" name="fileName"/>
          <field column="fileAbsolutePath" name="links"/>
        </entity>
      </document>
    </dataConfig>
Re: Counting in facet results
Hi, I think what you are looking for is nested facets, or HierarchicalFaceting: http://wiki.apache.org/solr/HierarchicalFaceting

    Category A - Subcategory A1
    Category A - Subcategory A1
    Category B - Subcategory A1
    Category B - Subcategory B2
    Category A - Subcategory A2

Faceting by Category gives A: 3, B: 2. In addition, pivoting this query gives Cat A=3 (SubCat A1=2 and A2=1) and Cat B=2 (SubCat A1=1 and B2=1). Does this make sense?

On Mon, Nov 14, 2011 at 11:02 AM, LT.thomas t.latu...@itspree.pl wrote: Hi, By counting in facet results I mean solving the following problem. I have 7 documents:

    A1 B1 C1
    A2 B1 C1
    A3 B2 C1
    A4 B2 C2
    A5 B3 C2
    A6 B3 C2
    A7 B3 C2

If I make a facet query on field B, I get the result B1=2, B2=2, B3=3. I want to get additional information, something like a count within the results, by field C, e.g. B1 -> 2, 1; B2 -> 2, 2; B3 -> 3, 1 (faceting by B; distinct values of C in the facet results). Thanks

-- Un saludo, Samuel García.
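For reference, the pivot syntax (facet.pivot, available on trunk/4.0) applied to the original B/C fields would be something like:

    q=*:*&rows=0&facet=true&facet.pivot=B,C

which returns, under each B bucket, the nested counts for C (B1=2 with C1=2; B2=2 with C1=1 and C2=1; B3=3 with C2=3).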
Re: delta-import of rich documents like word and pdf files!
Thanks for your reply... my data-config.xml is:

    <dataConfig>
      <dataSource type="BinFileDataSource" name="bin"/>
      <document>
        <entity name="f" pk="id" processor="FileListEntityProcessor"
                recursive="true" rootEntity="false" dataSource="null"
                baseDir="/var/data/solr"
                fileName=".*\.(DOC)|(PDF)|(XML)|(xml)|(JPEG)|(jpg)|(ZIP)|(zip)|(pdf)|(doc)"
                onError="skip">
          <entity name="tika-test" processor="TikaEntityProcessor"
                  url="${f.fileAbsolutePath}" format="text" dataSource="bin"
                  onError="skip">
            <field column="Author" name="author" meta="true"/>
            <field column="title" name="title" meta="true"/>
            <field column="text" name="text"/>
            <field column="id" name="id"/>
          </entity>
          <field column="file" name="fileName"/>
          <field column="fileAbsolutePath" name="links"/>
        </entity>
      </document>
    </dataConfig>

According to the wiki, the only EntityProcessor which supports delta is SqlEntityProcessor. Maybe you can use the newerThan parameter of FileListEntityProcessor. Issuing a full-import with clean=false may mimic a delta import. You can pass the value of this newerThan parameter in your request: command=full-import&clean=false&myLastModifiedParam=NOW-3DAYS http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
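Wiring that request parameter into the data-config above would look roughly like this (a sketch; myLastModifiedParam is the parameter from the example URL):

    <entity name="f" pk="id" processor="FileListEntityProcessor"
            recursive="true" rootEntity="false" dataSource="null"
            baseDir="/var/data/solr"
            fileName=".*\.(pdf)|(doc)"
            newerThan="${dataimporter.request.myLastModifiedParam}"
            onError="skip">
      ...
    </entity>

Only files modified after the given date are re-read, and with clean=false the rest of the index is left in place.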
Re: TikaEntityProcessor not working?
The earlier issue has been resolved, but I am stuck on something else. Can you tell me which POI jar version works with Tika 0.6? Currently I have poi-3.7.jar. The error I am getting is this:

    SEVERE: Exception while processing: js_logins document : SolrInputDocument[{id=id(1.0)={100984}, complete_mobile_number=complete_mobile_number(1.0)={+91 9600067575}, emailid=emailid(1.0)={vkry...@gmail.com}, full_name=full_name(1.0)={Venkat Ryali}}]: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
    Caused by: java.lang.NoSuchMethodError: org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
        at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.<init>(XWPFWordExtractorDecorator.java:163)
        at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.<init>(XWPFWordExtractorDecorator.java:161)
        at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractTableContent(XWPFWordExtractorDecorator.java:140)
        at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:91)
        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:69)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:51)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
        at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
        ... 7 more
TREC-style IR experiments
Hi, I'm planning to do some information retrieval experiments with Solr. I'd like to compare different IR methods, and I have a test collection with topics and judgements available. I'm considering using Solr (and not Lemur/Indri etc.) for the tests because Solr supports several nice methods out of the box, e.g. n-grams. Finally, I plan to evaluate the different methods and their results with trec_eval or a similar program. What I need is a program which puts Solr results in a suitable format for trec_eval. I think I can get the Solr search results into that format quite easily by using the solr-php-client library. Have any of you run TREC-style IR experiments with Solr, and what are your experiences with that? Do you have any suggestions for that kind of test with Solr? Kind regards, Ismo
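For the output step: trec_eval expects one line per retrieved document, in the form "topic-id Q0 doc-id rank score run-tag". A minimal sketch in SolrJ (my own illustration, not an existing tool; the docno field, topic id, and run tag are assumptions):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrDocument;

    public class TrecRunWriter {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
            String topicId = "301";                    // TREC topic number
            SolrQuery q = new SolrQuery("international organized crime");
            q.setRows(1000);                           // typical TREC run depth
            q.setFields("docno", "score");             // request the score explicitly
            int rank = 1;
            for (SolrDocument d : solr.query(q).getResults()) {
                // topic Q0 docno rank score tag
                System.out.printf("%s Q0 %s %d %s myrun%n",
                    topicId, d.getFieldValue("docno"), rank++, d.getFieldValue("score"));
            }
        }
    }

The same loop, run once per topic, produces a run file you can feed straight to trec_eval together with your qrels.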
Re: Using solr during optimization
Hi Mark, In the above case, what if the index is optimized partially, i.e. by specifying the maximum number of segments we want? We have observed that after optimizing (even partial optimization), indexing as well as searching is faster than with an unoptimized index. Decreasing the merge factor will affect performance, as it will increase indexing time due to the frequent merges. So is it better to optimize partially (say, once a month) rather than decrease the merge factor and affect indexing speed? Also, since we will be sharding, that 100 GB index will be divided across different shards. Thanks, Isan Fulia.

On 14 November 2011 11:28, Kalika Mishra kalika.mis...@germinait.com wrote: Hi Mark, Thanks for your reply. What you're saying is interesting; so are you suggesting that optimization should usually be done when there are not many updates? Also, can you please point out under what further conditions optimization might be beneficial? Thanks.

On 11 November 2011 20:30, Mark Miller markrmil...@gmail.com wrote: I would not optimize - it's very expensive. With 11,000 updates a day, I think it makes sense to completely avoid optimizing. That should be your default move in any case. If you notice performance suffers more than is acceptable (good chance you won't), then I'd use a lower merge factor. It defaults to 10 - lower numbers will lower the number of segments in your index, and essentially amortize the cost of an optimize. Optimize is generally only useful when you will have a mostly static index. - Mark Miller lucidimagination.com

On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote: Hi Mark, We are performing almost 11,000 updates a day, and we have around 50 million docs in the index (I understand we will need to shard); the core segments will get fragmented over a period of time. We will need to optimize every few days or once a month; do you have any reason not to optimize the core? Please let me know. Thanks.

On 11 November 2011 18:51, Mark Miller markrmil...@gmail.com wrote: Do you have something forcing you to optimize, or are you just doing it for the heck of it?

On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote: Hi, I would like to optimize a Solr core which is in Reader Writer mode. Since the Solr cores are huge (above 100 GB), the optimization takes hours to complete. While the optimization is going on, say on the Writer core, the application wants to continue using the indexes for both query and write purposes. What is the best approach to do this? I was thinking of using a temporary index (empty core) to write the documents and using the same Reader to read the documents. (Please note that the temp index and the Reader cannot be made Reader Writer, as the Reader is already set up for the Writer on which optimization is taking place.) But there could be some updates to the temp index which I would like to get reflected in the Reader. What's the best setup to support this? Thanks, Kalika

- Mark Miller lucidimagination.com

-- Thanks & Regards, Kalika

-- Thanks & Regards, Kalika

-- Thanks & Regards, Isan Fulia.
Re: Solr 3.3 Sorting is not working for long fields
There is no error as such. When I do a basic sort on a long field, the sort doesn't happen. The query is: http://blr-ws-195:8091/Solr3.3/select/?q=2%3A104+AND+526%3A27747&version=2.2&start=0&rows=10&indent=on&sort=469%20asc&fl=469 and the response:

    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">3</int>
      <lst name="params">
        <str name="fl">studyid</str>
        <str name="sort">studyid asc</str>
        <str name="indent">on</str>
        <str name="start">0</str>
        <str name="q">*:*</str>
        <str name="rows">100</str>
        <str name="version">2.2</str>
      </lst>
    </lst>
    <result name="response" numFound="216" start="0">
      <doc><long name="studyid">53</long></doc>
      <doc><long name="studyid">18</long></doc>
      <doc><long name="studyid">14</long></doc>
      <doc><long name="studyid">11</long></doc>
      <doc><long name="studyid">7</long></doc>
      <doc><long name="studyid">63</long></doc>
      <doc><long name="studyid">35</long></doc>
      <doc><long name="studyid">70</long></doc>
      <doc><long name="studyid">91</long></doc>
      <doc><long name="studyid">97</long></doc>
    </result>

The same case works with Solr 1.4.1, but it is not working in Solr 3.3. Regards, Rajani

On Mon, Nov 14, 2011 at 2:23 PM, Michael Kuhlmann k...@solarier.de wrote: On 14.11.2011 09:33, rajini maski wrote: query: http://localhost:8091/Group/select/?indent=on&q=studyid:120&sort=studyidasc,groupid asc,subjectid asc&start=0&rows=10 Is it a copy-and-paste error, or did you really sort on studyidasc? I don't think you have a field studyidasc, and Solr should have given an exception that either asc or desc is missing. -Kuli
Re: Counting in facet results
I use Solandra, which integrates Solr 3.4 with Cassandra. So, is there any way to solve this problem with Solr 3.4 (without pivots)? Your results are: Cat A=3, SubCat A1=2 and A2=1; Cat B=2, SubCat A1=1 and B2=1. But I would like to have: Cat A=3, SubCat=2 (losing the information about the counts within A1 and A2; only the distinct count of subcategories), and Cat B=2, SubCat=2 (losing the information about the counts within A1 and B2; only the distinct count of subcategories).
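One client-side workaround that should work on 3.4 (a sketch, not a built-in feature): run one facet request per category value and count the buckets that come back:

    q=*:*&rows=0&fq=Cat:A&facet=true&facet.field=SubCat&facet.mincount=1
    q=*:*&rows=0&fq=Cat:B&facet=true&facet.field=SubCat&facet.mincount=1

The number of entries in the facet_fields/SubCat list of each response is the distinct subcategory count for that category. This costs one request per category, so it only scales to a modest number of category values.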
Re: TREC-style IR experiments
I'm planning to do some information retrieval experiments with Solr. I'd like to compare different IR methods, and I have a test collection with topics and judgements available. I'm considering using Solr (and not Lemur/Indri etc.) for the tests because Solr supports several nice methods out of the box, e.g. n-grams. Finally, I plan to evaluate the different methods and their results with trec_eval or a similar program. What I need is a program which puts Solr results in a suitable format for trec_eval. I think I can get the Solr search results into that format quite easily by using the solr-php-client library. Have any of you run TREC-style IR experiments with Solr, and what are your experiences with that? Do you have any suggestions for that kind of test with Solr?

There are some existing implementations in Lucene's contrib-benchmark: http://lucene.apache.org/java/3_0_2/api/contrib-benchmark/org/apache/lucene/benchmark/quality/trec/package-summary.html
Casesensitive search problem
Hi, Whenever I search with the words OfficeJet or officejet or Officejet or oFiiIcejET, I get different results for each search. I am not able to understand why this is happening. I want to solve this problem in such a way that the search becomes case-insensitive and I get the same result for any combination of capital and small letters. -- Jayanta Sahoo
Re: Solr 3.3 Sorting is not working for long fields
When I do a basic sort on a long field, the sort doesn't happen. The query is: http://blr-ws-195:8091/Solr3.3/select/?q=2%3A104+AND+526%3A27747&version=2.2&start=0&rows=10&indent=on&sort=469%20asc&fl=469 and the response:

    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">3</int>
      <lst name="params">
        <str name="fl">studyid</str>
        <str name="sort">studyid asc</str>
        <str name="indent">on</str>
        <str name="start">0</str>
        <str name="q">*:*</str>
        <str name="rows">100</str>
        <str name="version">2.2</str>
      </lst>
    </lst>
    <result name="response" numFound="216" start="0">
      <doc><long name="studyid">53</long></doc>
      <doc><long name="studyid">18</long></doc>
      <doc><long name="studyid">14</long></doc>
      <doc><long name="studyid">11</long></doc>
      <doc><long name="studyid">7</long></doc>
      <doc><long name="studyid">63</long></doc>
      <doc><long name="studyid">35</long></doc>
      <doc><long name="studyid">70</long></doc>
      <doc><long name="studyid">91</long></doc>
      <doc><long name="studyid">97</long></doc>
    </result>

The same case works with Solr 1.4.1, but it is not working in Solr 3.3.

Can you try with the following type?

    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8"
               omitNorms="true" positionIncrementGap="0"/>

And studyid must be marked as indexed="true".
Re: Casesensitive search problem
Check this: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseFilterFactory

On Mon, Nov 14, 2011 at 3:24 PM, jayanta sahoo jsahoo1...@gmail.com wrote: Hi, Whenever I search with the words OfficeJet or officejet or Officejet or oFiiIcejET, I get different results for each search. I am not able to understand why this is happening. I want to solve this problem in such a way that the search becomes case-insensitive and I get the same result for any combination of capital and small letters. -- Jayanta Sahoo
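Concretely, the field you search against needs something like the following (a stock example; note that the filter has to appear in both the index and query analyzers, and existing documents must be re-indexed after the change, or the index will still hold the original-case terms):

    <fieldType name="text_lc" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>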
Re: Using solr during optimization
On Nov 14, 2011, at 8:27 AM, Isan Fulia wrote:

Hi Mark, In the above case, what if the index is optimized partially, i.e. by specifying the maximum number of segments we want? We have observed that after optimizing (even partial optimization), indexing as well as searching is faster than with an unoptimized index.

Yes, this remains true - searching against fewer segments is faster than searching against many segments. Unless you have a really high merge factor, this is just generally not a big deal IMO. It tends to be something like, a given query is say 10-30% slower. If you have good performance, though, this should often be something like a 50ms query going to 80 or 90ms. You really have to decide/test whether there is a practical difference for your users. You should also pay attention to how long that perf improvement lasts while you are continuously adding more documents. Is it a super high cost for a short perf boost?

Decreasing the merge factor will affect performance, as it will increase indexing time due to the frequent merges.

True - it will essentially amortize the cost of reducing segments. Have you tested lower merge factors, though? Does it really slow down indexing to the point where you find it unacceptable? I've been surprised in the past. Usually you can find a pretty nice balance.

So is it better to optimize partially (say, once a month) rather than decrease the merge factor and affect indexing speed? Also, since we will be sharding, that 100 GB index will be divided across different shards.

Partial optimize is a good option, and optimize is an option. They both exist for a reason ;) Many people pay the price because they assume they have to, though, when they really have no practical need. Generally, the best way to manage the number of segments in your index is through the merge policy IMO - not necessarily optimize calls. I'm pretty sure optimize also blocks adds in previous versions of Solr as well - it grabs the commit lock. It won't do that in Solr 4, but that is another reason I wouldn't recommend it under normal circumstances. I look at optimize as a last option, or for when creating a static index, personally.

Thanks, Isan Fulia.

On 14 November 2011 11:28, Kalika Mishra kalika.mis...@germinait.com wrote: Hi Mark, Thanks for your reply. What you're saying is interesting; so are you suggesting that optimization should usually be done when there are not many updates? Also, can you please point out under what further conditions optimization might be beneficial? Thanks.

On 11 November 2011 20:30, Mark Miller markrmil...@gmail.com wrote: I would not optimize - it's very expensive. With 11,000 updates a day, I think it makes sense to completely avoid optimizing. That should be your default move in any case. If you notice performance suffers more than is acceptable (good chance you won't), then I'd use a lower merge factor. It defaults to 10 - lower numbers will lower the number of segments in your index, and essentially amortize the cost of an optimize. Optimize is generally only useful when you will have a mostly static index. - Mark Miller lucidimagination.com

On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote: Hi Mark, We are performing almost 11,000 updates a day, and we have around 50 million docs in the index (I understand we will need to shard); the core segments will get fragmented over a period of time. We will need to optimize every few days or once a month; do you have any reason not to optimize the core? Please let me know. Thanks.
On 11 November 2011 18:51, Mark Miller markrmil...@gmail.com wrote: Do you have something forcing you to optimize, or are you just doing it for the heck of it?

On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote: Hi, I would like to optimize a Solr core which is in Reader Writer mode. Since the Solr cores are huge (above 100 GB), the optimization takes hours to complete. While the optimization is going on, say on the Writer core, the application wants to continue using the indexes for both query and write purposes. What is the best approach to do this? I was thinking of using a temporary index (empty core) to write the documents and using the same Reader to read the documents. (Please note that the temp index and the Reader cannot be made Reader Writer, as the Reader is already set up for the Writer on which optimization is taking place.) But there could be some updates to the temp index which I would like to get reflected in the Reader. What's the best setup to support this? Thanks, Kalika

- Mark Miller lucidimagination.com

-- Thanks & Regards, Kalika

-- Thanks & Regards, Kalika

-- Thanks & Regards, Isan Fulia.

- Mark Miller lucidimagination.com
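For reference, the two knobs discussed in this thread, sketched against a 3.x solrconfig.xml (the values are illustrative, not recommendations):

    <!-- solrconfig.xml: a lower mergeFactor keeps the segment count down continuously -->
    <indexDefaults>
      <mergeFactor>5</mergeFactor>
    </indexDefaults>

and a partial optimize down to a fixed number of segments:

    curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=4'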
Re: Solr 3.3 Sorting is not working for long fields
On Mon, Nov 14, 2011 at 7:23 PM, Ahmet Arslan iori...@yahoo.com wrote: When I do a basic sort on a long field, the sort doesn't happen. The query is: http://blr-ws-195:8091/Solr3.3/select/?q=2%3A104+AND+526%3A27747&version=2.2&start=0&rows=10&indent=on&sort=469%20asc&fl=469 and the response:

    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">3</int>
      <lst name="params">
        <str name="fl">studyid</str>
        <str name="sort">studyid asc</str>
        <str name="indent">on</str>
        <str name="start">0</str>
        <str name="q">*:*</str>
        <str name="rows">100</str>
        <str name="version">2.2</str>
      </lst>
    </lst>
    <result name="response" numFound="216" start="0">
      <doc><long name="studyid">53</long></doc>
      <doc><long name="studyid">18</long></doc>
      <doc><long name="studyid">14</long></doc>
      <doc><long name="studyid">11</long></doc>
      <doc><long name="studyid">7</long></doc>
      <doc><long name="studyid">63</long></doc>
      <doc><long name="studyid">35</long></doc>
      <doc><long name="studyid">70</long></doc>
      <doc><long name="studyid">91</long></doc>
      <doc><long name="studyid">97</long></doc>
    </result>

The same case works with Solr 1.4.1, but it is not working in Solr 3.3. Can you try with the following type? <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/> And studyid must be marked as indexed="true".

I tried this one:

    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8"
               omitNorms="true" positionIncrementGap="0"/>

It didn't work :( The sort didn't happen.
Re: Solr 3.3 Sorting is not working for long fields
I tried this one: <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/> It didn't work :( The sort didn't happen.

Did you restart Tomcat and perform a re-index?
XSLT caching mechanism
Hello All, I am using XSLT to transform the Solr XML response. When I run a search, I get the warning below:

WARNING [org.apache.solr.util.xslt.TransformerProvider] The TransformerProvider's simplistic XSLT caching mechanism is not appropriate for high load scenarios, unless a single XSLT transform is used and xsltCacheLifetimeSeconds is set to a sufficiently high value.

How can I apply effective XSLT caching in Solr? Thanks, Vishal Parekh
Re: XSLT caching mechanism
Set the cache lifetime high, like it says. Questions: why use the XSLT response writer? What are you transforming the response into, and what is digesting it? Erik

On Nov 14, 2011, at 09:31, vrpar...@gmail.com wrote: Hello All, I am using XSLT to transform the Solr XML response. When I run a search, I get the warning below: WARNING [org.apache.solr.util.xslt.TransformerProvider] The TransformerProvider's simplistic XSLT caching mechanism is not appropriate for high load scenarios, unless a single XSLT transform is used and xsltCacheLifetimeSeconds is set to a sufficiently high value. How can I apply effective XSLT caching in Solr? Thanks, Vishal Parekh
Re: delta-import of rich documents like word and pdf files!
And you cannot update-in-place. That is, you can't update just selected fields in a document; you have to re-index the whole document. Best, Erick

On Mon, Nov 14, 2011 at 6:11 AM, Ahmet Arslan iori...@yahoo.com wrote: Thanks for your reply... my data-config.xml is:

    <dataConfig>
      <dataSource type="BinFileDataSource" name="bin"/>
      <document>
        <entity name="f" pk="id" processor="FileListEntityProcessor"
                recursive="true" rootEntity="false" dataSource="null"
                baseDir="/var/data/solr"
                fileName=".*\.(DOC)|(PDF)|(XML)|(xml)|(JPEG)|(jpg)|(ZIP)|(zip)|(pdf)|(doc)"
                onError="skip">
          <entity name="tika-test" processor="TikaEntityProcessor"
                  url="${f.fileAbsolutePath}" format="text" dataSource="bin"
                  onError="skip">
            <field column="Author" name="author" meta="true"/>
            <field column="title" name="title" meta="true"/>
            <field column="text" name="text"/>
            <field column="id" name="id"/>
          </entity>
          <field column="file" name="fileName"/>
          <field column="fileAbsolutePath" name="links"/>
        </entity>
      </document>
    </dataConfig>

According to the wiki, the only EntityProcessor which supports delta is SqlEntityProcessor. Maybe you can use the newerThan parameter of FileListEntityProcessor. Issuing a full-import with clean=false may mimic a delta import. You can pass the value of this newerThan parameter in your request: command=full-import&clean=false&myLastModifiedParam=NOW-3DAYS http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
Re: Solr 3.3 Sorting is not working for long fields
Yes.

On 11/14/11, Ahmet Arslan iori...@yahoo.com wrote: I tried this one: <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/> It didn't work :( The sort didn't happen. Did you restart Tomcat and perform a re-index?
Re: Solr 3.3 Sorting is not working for long fields
Yes. Did you restart Tomcat and perform a re-index?

Okay, one thing left: HTTP caching may cause a stale response. Clear your browser's cache if you are using a browser to query Solr.
Re: XSLT caching mechanism
In solrconfig.xml, change the xsltCacheLifetimeSeconds property of the XSLTResponseWriter to the desired value (6000 seconds in this example):

    <queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
      <int name="xsltCacheLifetimeSeconds">6000</int>
    </queryResponseWriter>

On Mon, 2011-11-14 at 15:31 +0100, vrpar...@gmail.com wrote: Hello All, I am using XSLT to transform the Solr XML response. When I run a search, I get the warning below: WARNING [org.apache.solr.util.xslt.TransformerProvider] The TransformerProvider's simplistic XSLT caching mechanism is not appropriate for high load scenarios, unless a single XSLT transform is used and xsltCacheLifetimeSeconds is set to a sufficiently high value. How can I apply effective XSLT caching in Solr? Thanks, Vishal Parekh
Easy way to tell if there are pending documents
Hi Solr, Does anyone know of an easy way to tell if there are pending documents waiting for commit? Our application performs operations that are never safe to perform while commits are pending. We make this work by making sure that all indexing operations end in a commit, and stop the unsafe operations from running while a commit is running. This works great most of the time, except when we have enough disk space to add documents to the pending area, but not enough disk space to do a commit - then the indexing operations only error out after they've done all of their adds. It would be nice if the unsafe operation could somehow detect that there are pending documents and abort. In the interim I'll have the unsafe operation perform a commit when it starts, but I've been weeding out useless commits from my app recently and I don't like them creeping back in. Thanks, Antoine
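One avenue worth checking (an assumption on my part, not verified against 3.x): the default update handler (DirectUpdateHandler2) exposes a docsPending statistic, which the admin stats/mbeans handlers can report, e.g.:

    curl 'http://localhost:8983/solr/admin/mbeans?stats=true&cat=UPDATEHANDLER'

If docsPending is greater than zero, there are uncommitted adds, and the unsafe operation could poll that value before starting instead of issuing an extra commit.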
get a total count
Hello everyone, A newbie question: how do I find out how many documents have been indexed across all shards? Thanks much!
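A minimal way to check (a sketch; the hosts and shard names are placeholders): issue a match-all distributed query with rows=0 and read numFound:

    http://host1:8983/solr/select?q=*:*&rows=0&shards=host1:8983/solr,host2:8983/solr

The numFound attribute on the result element is the total document count across the listed shards.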
memory usage keep increase
Hi all, I am seeing an issue where RAM usage keeps increasing when we run queries. After looking at the code, it appears Lucene uses MMapDirectory to map index files into RAM. According to the comments at http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/store/MMapDirectory.html it will use a lot of memory: NOTE: memory mapping uses up a portion of the virtual memory address space in your process equal to the size of the file being mapped. Before using this class, be sure you have plenty of virtual address space, e.g. by using a 64 bit JRE, or a 32 bit JRE with indexes that are guaranteed to fit within the address space. So, is my understanding right that Solr requires physical RAM equal to the index file size? Yongtao
Re: TREC-style IR experiments
I'm planning to do some information retrieval experiments with Solr. There are some existing implementations in Lucene's contrib-benchmark: http://lucene.apache.org/java/3_0_2/api/contrib-benchmark/org/apache/lucene/benchmark/quality/trec/package-summary.html

Have you used that with Solr? How? //Ismo
Help! - ContentStreamUpdateRequest
Could someone take a look at this page: http://wiki.apache.org/solr/ContentStreamUpdateRequestExample ... and tell me what code changes I would need to make to be able to stream a LOT of files at once rather than just one? It has to be something simple, like a collection of some sort, but I just can't figure it out. Maybe I'm using the wrong class altogether? TIA
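A sketch of the usual pattern (based on the wiki example; the /update/extract path and the literal.id parameter are whatever your setup uses): one ContentStreamUpdateRequest per file, looped, with a single commit at the end:

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class BulkStream {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
            File[] files = new File("/path/to/docs").listFiles();
            if (files == null) return;                      // directory missing or unreadable
            for (File f : files) {
                ContentStreamUpdateRequest up =
                    new ContentStreamUpdateRequest("/update/extract");
                up.addFile(f);                              // one stream per request
                up.setParam("literal.id", f.getName());     // unique key per document
                solr.request(up);                           // send without committing
            }
            solr.commit();                                  // single commit at the end
        }
    }

Batching the commit rather than committing per file is what keeps this fast when the file count is large.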
Re: Question about solr caches and warming
: Although I don't have statistics to back my claim, I suspect that the really
: nasty filters don't have as high a hitcount as the ones that are more simple.
: Typically the really nasty filters are used when an employee logs into the
: site. Employees have access to a lot more than customers do, but the search
: still needs to be filtered to be appropriate for whatever search options are
: active.

A low impact change to consider would be to leverage the cache=false local param feature that was added in Solr 3.4... https://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters ...you could add this localparam anytime you know the query is coming from an employee -- or anytime you know the filter query is esoteric.

A higher impact change would be to create a dedicated query slave machine (or just an alternate core name that polls the same master) that is *only* used by employees and has much lower sizes on the caches -- this is the approach I have advocated and seen work very well since the pre-apache days of Solr: dedicated instances for each major user base, with key settings (ie: replication frequencies, cache sizes, cache warming, static warming of sorts, etc...) tuned for that user base. -Hoss
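For reference, the localparam usage is just a prefix on the filter query (the field and values here are illustrative):

    fq={!cache=false}acl_groups:(sales OR managers OR interns)

Such a filter is evaluated for the request but never stored in the filterCache, so the one-off employee filters stop evicting the common customer filters.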
Getting 411 Length required when adding docs
Hello All, I am getting this strange HTTP 411 Length Required error. My Solr is hosted with a third-party hosting company, and it was working fine all this while; I really don't understand why this happened. Attached is the stack trace; any help will be appreciated.

    org.apache.solr.common.SolrException: Length Required Length Required
    request: http://www.listing-social.com/solr/update?wt=javabin&version=1
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
        at org.apache.solr.client.solrj.SolrServer.addBean(SolrServer.java:68)
        at com.listings.solr.service.impl.BulkIndexingServiceImpl.startBulkIndexing(BulkIndexingServiceImpl.java:55)
        at com.listings.action.BulkIndexingAction.execute(BulkIndexingAction.java:42)
        at org.apache.struts.chain.commands.servlet.ExecuteAction.execute(ExecuteAction.java:53)
        at org.apache.struts.chain.commands.AbstractExecuteAction.execute(AbstractExecuteAction.java:64)
        at org.apache.struts.chain.commands.ActionCommandBase.execute(ActionCommandBase.java:48)
        at org.apache.commons.chain.impl.ChainBase.execute(ChainBase.java:190)
        at org.apache.commons.chain.generic.LookupCommand.execute(LookupCommand.java:304)
        at org.apache.commons.chain.impl.ChainBase.execute(ChainBase.java:190)
        at org.apache.struts.chain.ComposableRequestProcessor.process(ComposableRequestProcessor.java:280)
        at org.apache.struts.action.ActionServlet.process(ActionServlet.java:1858)
        at org.apache.struts.action.ActionServlet.doGet(ActionServlet.java:446)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:362)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)

Thanks
Re: Keyword counts
: Thanks for the reply. There are many keyword terms (1000?) and not sure if
: Solr would choke on a query string that long. Perhaps solr is not built to

Did you try it? 1000 facet.query params is not a strain for Solr -- but you may find problems with your servlet container if you try specifying them all in a GET request. If this list isn't going to change very often, it sounds like a perfect use case for specifying them as appends request params on the request handler declaration in your solrconfig.xml; see the comments in solrconfig.xml for examples. -Hoss
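For concreteness, the appends section Hoss describes looks like this in solrconfig.xml (a sketch with made-up facet queries):

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="appends">
        <str name="facet.query">keywords:solr</str>
        <str name="facet.query">keywords:lucene</str>
        <!-- ...one str entry per keyword term... -->
      </lst>
    </requestHandler>

Parameters in the appends block are added to every request that hits this handler, so the long list never has to travel in the URL.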
Index format difference between 4.0 and 3.4
Hi All, We are using Solr 1.4.1 in production and are considering an upgrade to a newer version. It seems that Solr 3.x requires a complete rebuild of the index, as the format seems to have changed. Is the Solr 4.0 index file format compatible with the Solr 3.x format? Please advise. Thanks, Saroj
File based wordlists for spellchecker
Hi, I have a very large index, and I'm trying to add a spell checker for it. I don't want to copy all the text in the index to an extra spell field, since that would be prohibitively big (the index is already close to as big as it reasonably can be), so I just want to extract word frequencies as I index, for offline processing. After some filtering I get something like this (word, frequency):

    a       122958495
    aa      834203
    aaa     175206
            22389
    aaab    1522
    aaai    1050
    aaas    6384
    aab     8109
    aabb    1906
    aac     35100
    aacc    1692
    aachen  11723

I wanted to use FileBasedSpellChecker, but it doesn't support frequencies, so its recommendations are consistently horrible. Increasing the frequency cutoff won't really help that much: it will still suggest less frequent words over equally similar, more frequent words. What's the easiest way to get this working? Presumably I'd need to create a separate index with just these words. How do I get the frequencies in there without actually creating 11723 records containing aachen, etc.? I can do some small Java coding if need be. I'm already using the 3.x branch (mostly for edismax, plus some unrelated minor patches). Thanks, Tomasz
Re: Casesensitive search problem
Hi, Even though I have tried all the possible ways, like <filter class="solr.LowerCaseFilterFactory"/>, I am still getting the same problem. If anyone has faced the same problem before, please let me know how you solved it.