RE: Starts with Query

2012-06-15 Thread Afroz Ahmad
If you are not searching for a specific digit and want to match all
documents whose title starts with any digit, you could, as part of the
indexing process, add another field, say startsWithDigit, and set it to
true if the title begins with a digit. All you need to do at query time
then is query for startsWithDigit:true.
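In rough Python, the index-time step could look like this (a sketch only; the field name startsWithDigit and the document shape are illustrative, not an existing API):

```python
def add_starts_with_digit(doc):
    """Derive a boolean startsWithDigit field from the title at index time.

    Hypothetical helper: run over each document before posting it to Solr.
    """
    title = doc.get("title", "")
    doc["startsWithDigit"] = title[:1].isdigit()
    return doc

doc = add_starts_with_digit({"id": "1", "title": "3D Printing Basics"})
print(doc["startsWithDigit"])  # True
```

At query time, a filter such as fq=startsWithDigit:true then selects exactly those documents.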
Thanks
Afroz


From: nutchsolruser
Sent: 6/14/2012 11:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Starts with Query
Thanks Jack for the valuable response. Actually I am trying to match *any*
numeric pattern at the start of each document. I don't know the documents
in the index in advance; I just want the documents whose title starts with
any digit.

--
View this message in context:
http://lucene.472066.n3.nabble.com/Starts-with-Query-tp3989627p3989761.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Regarding number of documents

2012-06-13 Thread Afroz Ahmad
Could it be that you are getting records that are not unique? If so, Solr
would just overwrite the non-unique documents (the ones sharing the same
uniqueKey value).
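You can see the effect with a small sketch of the uniqueKey semantics (assuming id is your uniqueKey field): the index ends up with one document per distinct key, and the last row seen wins.

```python
# Rows as they might come back from the SQL import query (illustrative data).
rows = [
    {"id": "a", "title": "first"},
    {"id": "b", "title": "second"},
    {"id": "a", "title": "first (duplicate key)"},
]

# Solr keeps only the last document seen for each uniqueKey value.
index = {row["id"]: row for row in rows}

print(len(rows))   # 3 rows from the SQL query
print(len(index))  # 2 documents end up in the index
```

So comparing the SQL row count against `select count(distinct id)` is a quick way to confirm this is what is happening.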

Thanks
Afroz

On Wed, Jun 13, 2012 at 4:50 PM, Swetha Shenoy sshe...@gmail.com wrote:

 Note: I don't see any errors in the logs when I run the index.

 On Wed, Jun 13, 2012 at 5:48 PM, Swetha Shenoy sshe...@gmail.com wrote:

  Hi,
 
  I have a data config file that contains the data import query. If I just
  run the import query against MySQL, I get a certain number of results. I
  assume that if I run the full-import, I should get the same number of
  documents added to the index, but I see that it's not the case: the
  number of documents added to the index is less than what I see in the
  MySQL query result. Can anyone tell me if my assumption is correct and
  why the number of documents would be off?
 
  Thanks,
  Swetha
 



Re: edismax and untokenized field

2012-06-12 Thread Afroz Ahmad
In the example above your schema is applying the tokenizer and filters only
at index time. For your query terms to also pass through the same pipeline
you need to modify the field type and add an <analyzer type="query">
section. I believe this should fix your problem.
Thanks
Afroz
<fieldType name="text_full_match" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
        synonyms="names-synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
        synonyms="names-synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
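As a rough illustration (plain Python, not Solr code; the synonym and possessive filters are left out), the KeywordTokenizer + LowerCaseFilter chain only produces a match when both the indexed value and the query go through it:

```python
def analyze(text):
    # KeywordTokenizerFactory: the whole input is a single token.
    # LowerCaseFilterFactory: lowercase that token.
    return [text.lower()]

indexed_tokens = analyze("Jones New York")

# Same chain applied at query time -> exact match succeeds.
print(analyze("JONES new york") == indexed_tokens)  # True
# Raw, unanalyzed query string -> misses the lowercased token.
print(["Jones New York"] == indexed_tokens)         # False
```

That asymmetry is exactly what an index-only analyzer produces, and why the query-side analyzer section matters.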

On Mon, Jun 11, 2012 at 10:25 AM, Vijay Ramachandran vijay...@gmail.com wrote:

 Thank you for your reply. Sending this as a phrase query does change the
 results as expected.

 On Mon, Jun 11, 2012 at 4:39 PM, Tanguy Moal tanguy.m...@gmail.com
 wrote:

  I think you have to issue a phrase query in such a case, because otherwise
  each token is searched independently in the merchant field: the query
  parser splits the query on spaces!
 
 
 So parsing of query is dependent in part on the query handling itself,
 independent of the field definition?


  Check the difference between the debug outputs when you search for "Jones
  New York"; you'd get what you expected.
 

 Yes, that gives the expected result. So, I should make a separate query to
 the merchant field as a phrase?

 thanks!
 Vijay



Re: How to do custom sorting in Solr?

2012-06-11 Thread Afroz Ahmad
You may want to look at
http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html.
While it is not the same requirement, this should give you an idea of how
to do custom sorting.
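The gist of the approach, boiled down to plain Python (the field names and the 99 sentinel are only illustrative): markdown products sink to the bottom of their sub-category while regular products keep their assigned order.

```python
products = [
    {"name": "P1", "price_type": "regular",  "sort_order": 1},
    {"name": "P2", "price_type": "markdown", "sort_order": 2},
    {"name": "P3", "price_type": "regular",  "sort_order": 3},
]

MARKDOWN_PENALTY = 99  # larger than any real sort_order in the sub-category

def sort_key(p):
    # Markdown products get the max value, pushing them to the bottom;
    # regular products keep their assigned sort order.
    return MARKDOWN_PENALTY if p["price_type"] == "markdown" else p["sort_order"]

ordered = sorted(products, key=sort_key)
print([p["name"] for p in ordered])  # ['P1', 'P3', 'P2']
```

A custom Solr function query or external-file sort source would compute essentially this key on the server side.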

Thanks
Afroz

On Sun, Jun 10, 2012 at 4:43 PM, roz dev rozde...@gmail.com wrote:

 Yes, these documents have lots of unique values as the same product could
 be assigned to lots of other categories and that too, in a different sort
 order.

 We did some evaluation of heap usage and found that with the kind of
 queries we generate, heap usage was going up to 24-26 GB. I could trace it
 to the fact that fieldCache is creating an array of 2M size for each of
 the sort fields.

 Since same products are mapped to multiple categories, we incur significant
 memory overhead. Therefore, any solution where memory consumption can be
 reduced is a good one for me.

 In fact, we have situations where same product is mapped to more than 1
 sub-category in the same category like


 Books
  -- Programming
  - Java in a nutshell
  -- Sale (40% off)
  - Java in a nutshell


 So, another thought in my mind is to somehow use a second-pass collector
 to group books appropriately in the Programming and Sale categories, with
 the right sort order.

 But, i have no clue about that piece :(

 -Saroj


 On Sun, Jun 10, 2012 at 4:30 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  2M docs is actually pretty small. Sorting is sensitive to the number
  of _unique_ values in the sort fields, not necessarily the number of
  documents.
 
  And sorting only works on fields with a single value (i.e. it can't have
  more than one token after analysis). So for each field you're only
 talking
  2M values at the very maximum, assuming that the field in question has
  a unique value per document, which I doubt very much given your
  problem description.
 
  So with a corpus that size, I'd just try it.
 
  Best
  Erick
 
  On Sun, Jun 10, 2012 at 7:12 PM, roz dev rozde...@gmail.com wrote:
   Thanks Erick for your quick feedback
  
   When products are assigned to a category or sub-category they can be
   in any order, and the price type can be regular or markdown.
   So regular and markdown products are intermingled as per their
   assignment, but I want to sort them in such a way that we ensure that
   all the products which are on markdown are at the bottom of the list.
  
   I can use these multiple sorts but I realize that they are costly in
  terms
   of heap used, as they are using FieldCache.
  
   I have an index with 2M docs and docs are pretty big. So, I don't want
 to
   use them unless there is no other option.
  
   I am wondering if I can define a custom function query which can be
 like
   this:
  
  
  - check if the product is on markdown
 - if yes then change its sort order field to be the max value in the
 given sub-category, say 99
 - else, use the sort order of the product in the sub-category
  
   I have been looking at existing function queries but do not have a good
   handle on how to make one of my own.
  
   - Another option could be to use a custom sort comparator, but I am not
   sure about the way it works
  
   Any thoughts?
  
  
   -Saroj
  
  
  
  
   On Sun, Jun 10, 2012 at 5:02 AM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
  
   Skimming this, two options come to mind:
  
   1> Simply apply primary, secondary, etc. sorts. Something like
 sort=subcategory asc,markdown_or_regular desc,sort_order asc
  
   2> You could also use grouping to arrange things in groups and sort
  within
those groups. This has the advantage of returning some members
of each of the top N groups in the result set, which makes it
  easier
   to
get some of each group rather than having to analyze the whole
   list
  
   But your example is somewhat contradictory. You say "products which are
   on markdown are at the bottom of the documents list".
  
   But in your examples, products on markdown are intermingled
  
   Best
   Erick
  
   On Sun, Jun 10, 2012 at 3:36 AM, roz dev rozde...@gmail.com wrote:
Hi All
   
   
I have an index which contains a Catalog of Products and
 Categories,
   with
Solr 4.0 from trunk
   
Data is organized like this:
   
Category: Books
   
Sub Category: Programming
   
Products:
   
 Product # 1,  Price: Regular, Sort Order:1
Product # 2,  Price: Markdown, Sort Order:2
Product # 3   Price: Regular, Sort Order:3
Product # 4   Price: Regular, Sort Order:4

.
...
Product # 100   Price: Regular, Sort Order:100
   
Sub Category: Fiction
   
Products:
   
Product # 1,  Price: Markdown, Sort Order:1
Product # 2,  Price: Regular, Sort Order:2
Product # 3   Price: Regular, Sort Order:3
Product # 4   Price: Markdown, Sort Order:4

.
...
Product # 70   Price: Regular, Sort Order:70
   
   
I want to query Solr and sort these products within each 

Re: Problem with field collapsing of patched Solr 1.4

2011-03-23 Thread Afroz Ahmad
Have you enabled the collapse component in solrconfig.xml?

<searchComponent name="query"
    class="org.apache.solr.handler.component.CollapseComponent" />

Thanks
afroz


On Fri, Mar 18, 2011 at 8:14 PM, Kai Schlamp-2
kai.schl...@googlemail.com wrote:

 Unfortunately I have to use Solr 1.4.x or 3.x as one of the interfaces to
 access Solr uses Sunspot (a Ruby Solr library), which doesn't seem to be
 compatible with 4.x.

 Kai


 Otis Gospodnetic-2 wrote:
 
  Kai, try SOLR-1086 with Solr trunk instead if trunk is OK for you.
 
  Otis
  
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem search :: http://search-lucene.com/
 
 
 
  - Original Message 
  From: Kai Schlamp <kai.schl...@googlemail.com>
  To: solr-user@lucene.apache.org
  Sent: Sun, March 13, 2011 11:58:56 PM
  Subject: Problem with field collapsing of patched Solr 1.4
 
  Hello.
 
  I just tried to patch Solr 1.4 with the field collapsing patch of
  https://issues.apache.org/jira/browse/SOLR-236. The patching and build
  process seemed to be ok (below are the steps I did), but the field
  collapsing feature doesn't seem to work.
  When I go to `http://localhost:8982/solr/select/?q=*:*` I correctly
  get 10 documents as result.
  When going to
  `http://localhost:8982/solr/select/?q=*:*&collapse=true&collapse.field=tag_name_ss&collapse.max=1`
  (tag_name_ss is surely a field with content) I get the same 10 docs as
  result back. No further information regarding the field collapsing.
  What do I miss? Do I have to activate it somehow?
 
  * Downloaded
 [Solr](
 http://apache.lauf-forum.at//lucene/solr/1.4.1/apache-solr-1.4.1.tgz)
  *  Downloaded
 [SOLR-236-1_4_1-paging-totals-working.patch](
 https://issues.apache.org/jira/secure/attachment/12459716/SOLR-236-1_4_1-paging-totals-working.patch
 )
 
  *  Changed line 2837 of that patch to `@@ -0,0 +1,511 @@` (regarding
  this
 [comment](
 https://issues.apache.org/jira/browse/SOLR-236?focusedCommentId=12932905page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12932905
 ))
 
  *  Downloaded
 [SOLR-236-1_4_1-NPEfix.patch](
 https://issues.apache.org/jira/secure/attachment/12470202/SOLR-236-1_4_1-NPEfix.patch
 )
 
  * Extracted the Solr archive
  * Applied both patches:
  ** `cd apache-solr-1.4.1`
  ** `patch -p0 < ../SOLR-236-1_4_1-paging-totals-working.patch`
  ** `patch -p0 < ../SOLR-236-1_4_1-NPEfix.patch`
  * Build Solr
  ** `ant clean`
  ** `ant example` ... tells me BUILD SUCCESSFUL
  * Reindexed everything (using Sunspot Solr)
  * Solr info tells me correctly Solr Specification Version:
  1.4.1.2011.03.14.04.29.20
 
  Kai
 
 


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Problem-with-field-collapsing-of-patched-Solr-1-4-tp2678850p2701061.html
 Sent from the Solr - User mailing list archive at Nabble.com.