RE: Starts with Query
If you are not searching for a specific digit and want to match all documents that start with any digit, you could, as part of the indexing process, add another field, say startsWithDigit, and set it to true if the title begins with a digit. All you need to do at query time then is query for startsWithDigit:true.

Thanks
Afroz

From: nutchsolruser
Sent: 6/14/2012 11:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Starts with Query

Thanks Jack for the valuable response. Actually, I am trying to match *any* numeric pattern at the start of each document. I don't know the documents in the index; I just want the documents whose title starts with any digit.

--
View this message in context: http://lucene.472066.n3.nabble.com/Starts-with-Query-tp3989627p3989761.html
Sent from the Solr - User mailing list archive at Nabble.com.
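The indexing-side step suggested above can be sketched as follows. This is a minimal illustration, not from the thread: the document structure (plain dicts) and helper name are assumptions; the field name startsWithDigit comes from the suggestion.

```python
# Sketch: derive a startsWithDigit boolean field at indexing time.
# Documents are modeled as plain dicts here for illustration; in a real
# pipeline this would run just before sending docs to Solr.

def add_starts_with_digit(doc):
    """Set startsWithDigit based on the first character of the title."""
    title = doc.get("title", "")
    doc["startsWithDigit"] = bool(title) and title[0].isdigit()
    return doc

docs = [
    {"id": "1", "title": "3 ways to index"},
    {"id": "2", "title": "Solr basics"},
]
docs = [add_starts_with_digit(d) for d in docs]
# At query time, filter with: fq=startsWithDigit:true
```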
Re: Regarding number of documents
Could it be that you are getting records that are not unique on your uniqueKey field? If so, Solr would just overwrite the non-unique documents.

Thanks
Afroz

On Wed, Jun 13, 2012 at 4:50 PM, Swetha Shenoy sshe...@gmail.com wrote:

Note: I don't see any errors in the logs when I run the index.

On Wed, Jun 13, 2012 at 5:48 PM, Swetha Shenoy sshe...@gmail.com wrote:

Hi, I have a data config file that contains the data import query. If I just run the import query against MySQL, I get a certain number of results. I assumed that if I run the full-import, I should get the same number of documents added to the index, but that is not the case: the number of documents added to the index is less than what I see in the MySQL query result. Can anyone tell me whether my assumption is correct, and why the number of documents would be off?

Thanks,
Swetha
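One way to test this hypothesis is to check the MySQL result set for duplicate values in the column mapped to Solr's uniqueKey. A minimal sketch; the column name "id" and the dict-per-row shape are assumptions, adapt them to your schema:

```python
# Sketch: find uniqueKey values that occur more than once in the
# import query's result set. Each duplicate causes Solr to overwrite
# an earlier document, so the index ends up smaller than the result set.
from collections import Counter

def duplicate_keys(rows, key="id"):
    """Return a dict of uniqueKey values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return {k: n for k, n in counts.items() if n > 1}

rows = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": 3}]
print(duplicate_keys(rows))  # -> {2: 2}
```

If this prints anything non-empty, the document-count gap is explained by overwrites rather than indexing errors.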
Re: edismax and untokenized field
In your example, the schema applies the tokenizer and filters only at index time. For your query terms to also pass through the same pipeline, you need to modify the field type and add an analyzer type="query" section, e.g.:

<fieldType name="text_full_match" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="names-synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="names-synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>

I believe this should fix your problem.

Thanks
Afroz

On Mon, Jun 11, 2012 at 10:25 AM, Vijay Ramachandran vijay...@gmail.com wrote:

Thank you for your reply. Sending this as a phrase query does change the results as expected.

On Mon, Jun 11, 2012 at 4:39 PM, Tanguy Moal tanguy.m...@gmail.com wrote:

I think you have to issue a phrase query in such a case, because otherwise each token is searched independently in the merchant field: the query parser splits the query on spaces!

So parsing of the query is dependent in part on the query handling itself, independent of the field definition?

Check the difference between debug outputs when you search for "Jones New York"; you'd get what you expected.

Yes, that gives the expected result. So, I should make a separate query to the merchant field as a phrase?

thanks!
Vijay
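The phrase-query point made in this thread can be sketched as follows: quoting the value keeps the query parser from splitting it on whitespace before it reaches the field's analyzer. The field name merchant comes from the thread; building the parameters with urlencode is illustrative, not a Solr client API.

```python
# Sketch: send a multi-word value as a single phrase so the query
# parser does not split "Jones New York" into three independent terms.
from urllib.parse import urlencode

def phrase_query(field, value):
    """Build Solr query params with the value quoted as an exact phrase."""
    return urlencode({"q": '%s:"%s"' % (field, value)})

print(phrase_query("merchant", "Jones New York"))
# -> q=merchant%3A%22Jones+New+York%22
```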
Re: How to do custom sorting in Solr?
You may want to look at http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html. While it is not the same requirement, it should give you an idea of how to do custom sorting.

Thanks
Afroz

On Sun, Jun 10, 2012 at 4:43 PM, roz dev rozde...@gmail.com wrote:

Yes, these documents have lots of unique values, as the same product can be assigned to lots of other categories, each with a different sort order. We did some evaluation of heap usage and found that with the kind of queries we generate, heap usage was going up to 24-26 GB. I traced it to the fact that fieldCache creates an array of 2M size for each of the sort fields. Since the same products are mapped to multiple categories, we incur significant memory overhead. Therefore, any solution that reduces memory consumption is a good one for me.

In fact, we have situations where the same product is mapped to more than one sub-category in the same category, like

Books
-- Programming
   - Java in a Nutshell
-- Sale (40% off)
   - Java in a Nutshell

So, another thought in my mind is to somehow use a second-pass collector to group books appropriately in the Programming and Sale categories, with the right sort order. But I have no clue about that piece :(

-Saroj

On Sun, Jun 10, 2012 at 4:30 PM, Erick Erickson erickerick...@gmail.com wrote:

2M docs is actually pretty small. Sorting is sensitive to the number of _unique_ values in the sort fields, not necessarily the number of documents. And sorting only works on fields with a single value (i.e. it can't have more than one token after analysis). So for each field you're only talking 2M values at the very maximum, assuming that the field in question has a unique value per document, which I doubt very much given your problem description. So with a corpus that size, I'd just try it.
Best,
Erick

On Sun, Jun 10, 2012 at 7:12 PM, roz dev rozde...@gmail.com wrote:

Thanks Erick for your quick feedback. When products are assigned to a category or sub-category, they can be in any order, and the price type can be regular or markdown. So regular and markdown products are intermingled as per their assignment, but I want to sort them in such a way that all the products which are on markdown end up at the bottom of the list.

I can use these multiple sorts, but I realize that they are costly in terms of heap used, as they rely on FieldCache. I have an index with 2M docs, and the docs are pretty big, so I don't want to use them unless there is no other option.

I am wondering if I can define a custom function query which would:

- check if the product is on markdown
- if yes, change its sort order field to the max value in the given sub-category, say 99
- else, use the sort order of the product in the sub-category

I have been looking at existing function queries but do not have a good handle on how to make one of my own. Another option could be a custom sort comparator, but I am not sure about the way it works.

Any thoughts?

-Saroj

On Sun, Jun 10, 2012 at 5:02 AM, Erick Erickson erickerick...@gmail.com wrote:

Skimming this, two options come to mind:

1. Simply apply primary, secondary, etc. sorts, something like: sort=subcategory asc,markdown_or_regular desc,sort_order asc

2. You could also use grouping to arrange things in groups and sort within those groups. This has the advantage of returning some members of each of the top N groups in the result set, which makes it easier to get some of each group rather than having to analyze the whole list.

But your example is somewhat contradictory.
You say "products which are on markdown are at the bottom of the documents list", but in your examples, products on markdown are intermingled.

Best,
Erick

On Sun, Jun 10, 2012 at 3:36 AM, roz dev rozde...@gmail.com wrote:

Hi All,

I have an index which contains a catalog of products and categories, built with Solr 4.0 from trunk. The data is organized like this:

Category: Books
  Sub Category: Programming
    Products:
      Product # 1, Price: Regular, Sort Order: 1
      Product # 2, Price: Markdown, Sort Order: 2
      Product # 3, Price: Regular, Sort Order: 3
      Product # 4, Price: Regular, Sort Order: 4
      ...
      Product # 100, Price: Regular, Sort Order: 100
  Sub Category: Fiction
    Products:
      Product # 1, Price: Markdown, Sort Order: 1
      Product # 2, Price: Regular, Sort Order: 2
      Product # 3, Price: Regular, Sort Order: 3
      Product # 4, Price: Markdown, Sort Order: 4
      ...
      Product # 70, Price: Regular, Sort Order: 70

I want to query Solr and sort these products within each
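The ordering this thread asks for (markdown products forced to the bottom, everything else keeping its assigned sort order) is equivalent to sorting on a composite key. A minimal sketch outside Solr, with illustrative product data; inside Solr this corresponds to a compound sort such as the sort=...,markdown_or_regular,...,sort_order combination suggested earlier in the thread:

```python
# Sketch: "markdown always last, otherwise keep sort_order".
# Python sorts tuples lexicographically and False < True, so the
# boolean markdown flag becomes the primary key and pushes markdown
# items below all regular ones.
products = [
    {"name": "P1", "price_type": "regular", "sort_order": 1},
    {"name": "P2", "price_type": "markdown", "sort_order": 2},
    {"name": "P3", "price_type": "regular", "sort_order": 3},
]

ranked = sorted(
    products,
    key=lambda p: (p["price_type"] == "markdown", p["sort_order"]),
)
print([p["name"] for p in ranked])  # -> ['P1', 'P3', 'P2']
```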
Re: Problem with field collapsing of patched Solr 1.4
Have you enabled the collapse component in solrconfig.xml?

<searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent" />

Thanks
Afroz

On Fri, Mar 18, 2011 at 8:14 PM, Kai Schlamp-2 kai.schl...@googlemail.com wrote:

Unfortunately I have to use Solr 1.4.x or 3.x, as one of the interfaces used to access Solr is Sunspot (a Ruby Solr library), which doesn't seem to be compatible with 4.x.

Kai

Otis Gospodnetic-2 wrote:

Kai, try SOLR-1086 with Solr trunk instead, if trunk is OK for you.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
From: Kai Schlamp <kai.schl...@googlemail.com>
To: solr-user@lucene.apache.org
Sent: Sun, March 13, 2011 11:58:56 PM
Subject: Problem with field collapsing of patched Solr 1.4

Hello. I just tried to patch Solr 1.4 with the field collapsing patch from https://issues.apache.org/jira/browse/SOLR-236. The patching and build process seemed to be OK (below are the steps I did), but the field collapsing feature doesn't seem to work. When I go to `http://localhost:8982/solr/select/?q=*:*` I correctly get 10 documents as the result. When going to `http://localhost:8982/solr/select/?q=*:*&collapse=true&collapse.field=tag_name_ss&collapse.max=1` (tag_name_ss is definitely a field with content), I get the same 10 docs back, with no further information regarding field collapsing. What am I missing? Do I have to activate it somehow?
* Downloaded [Solr](http://apache.lauf-forum.at//lucene/solr/1.4.1/apache-solr-1.4.1.tgz)
* Downloaded [SOLR-236-1_4_1-paging-totals-working.patch](https://issues.apache.org/jira/secure/attachment/12459716/SOLR-236-1_4_1-paging-totals-working.patch)
* Changed line 2837 of that patch to `@@ -0,0 +1,511 @@` (regarding this [comment](https://issues.apache.org/jira/browse/SOLR-236?focusedCommentId=12932905&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12932905))
* Downloaded [SOLR-236-1_4_1-NPEfix.patch](https://issues.apache.org/jira/secure/attachment/12470202/SOLR-236-1_4_1-NPEfix.patch)
* Extracted the Solr archive
* Applied both patches:
** `cd apache-solr-1.4.1`
** `patch -p0 < ../SOLR-236-1_4_1-paging-totals-working.patch`
** `patch -p0 < ../SOLR-236-1_4_1-NPEfix.patch`
* Built Solr:
** `ant clean`
** `ant example` ... tells me BUILD SUCCESSFUL
* Reindexed everything (using Sunspot Solr)
* The Solr info page correctly shows Solr Specification Version: 1.4.1.2011.03.14.04.29.20

Kai

--
View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-field-collapsing-of-patched-Solr-1-4-tp2678850p2701061.html
Sent from the Solr - User mailing list archive at Nabble.com.