PDF indexing
Good day, I'm checking whether Solr would work for indexing PDFs. My requirements are: 1) I must know which page contains which content. 2) Right-to-left search support, such as Hebrew. This has been the trickiest part to achieve. I would also like to know the position of the matched content on the page, but I could live without it. Any info or ideas would be greatly appreciated. Thank you, Jon
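One way to meet the first requirement is to index every page as its own Solr document, so that each hit carries its page number. Below is a minimal Python sketch. It assumes the page texts have already been extracted with some PDF tool, which is not shown. The field names id, file, page and content are hypothetical. The sketch does not address Hebrew analysis or the position of a match on the page.

```python
import xml.etree.ElementTree as ET


def pages_to_solr_xml(filename, pages):
    """Build a Solr XML <add> message with one <doc> per PDF page.

    `pages` holds already-extracted page texts, page 1 first.
    The field names are hypothetical and must exist in schema.xml.
    """
    add = ET.Element("add")
    for number, text in enumerate(pages, start=1):
        doc = ET.SubElement(add, "doc")
        for name, value in (
            ("id", f"{filename}#page{number}"),  # unique key per page
            ("file", filename),                  # which PDF the page came from
            ("page", str(number)),               # 1-based page number
            ("content", text),                   # text is passed through unchanged
        ):
            ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")


message = pages_to_solr_xml("report.pdf", ["first page text", "שלום עולם"])
print(message)
```

A search hit on the content field then returns the page field with it, which answers "which page has what contents".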
How do I format this query with 2 search terms?
I'm using index-time boosting and need to specify every field I want to search (rather than using copy fields), or else the boosting won't work. This query with one search term works fine, and the boosts look good: http://localhost:8983/solr/select/?q=companyName:foo +descriptionTxt:verslun&fl=*%20score&rows=10&start=0 However, if I have two words in the query and write it like this, the boosting doesn't seem to work: http://localhost:8983/solr/select/?q=companyName:foo+bar +descriptionTxt:foo+bar&fl=*%20score&rows=10&start=0 It's probably using the default search field for the second word, which has no boosting configured. How do I go about this? Thanks, Jon
Re: How do I format this query with 2 search terms?
Thanks a lot for that! I wanted to use dismax but hit a wall because I require trailing wildcards in some instances. Methods 1 and 3 do not work in my case. However, on further thought I realized that in the cases where I require a wildcard, I'm only searching one field. So I'll just turn dismax on and off as required. Thanks again :)

On Wed, Nov 17, 2010 at 12:40 PM, Ken Stanley <doh...@gmail.com> wrote:

Jon, you have a few options here, depending on what you want to achieve with your query:

1. If you're trying to do a phrase query, you simply need to ensure that your phrases are quoted. By default, SOLR splits the query text into separate terms. Any term that is not preceded by a field name is searched against the default field. So for your example, SOLR would parse your query into companyName:foo defaultField:bar descriptionTxt:foo defaultField:bar.
2. You can use the dismax query plugin instead of the standard query plugin. You configure the dismax section of your solrconfig.xml to your liking: you define which fields to search, apply any special boosts for your needs, and so on (see http://wiki.apache.org/solr/DisMaxQParserPlugin). Then you feed the query terms without naming your fields (i.e., q=foo+bar) and tell SOLR to use dismax (i.e., qt=whatever_you_named_your_dismax_handler).
3. If phrase queries are not important to you, you can manually prefix each term in your query with the field you wish to search, for example companyName:foo companyName:bar descriptionTxt:foo descriptionTxt:bar.

Whichever way you decide to go, the best thing you can do to understand SOLR and how it's working in your environment is to append debugQuery=on to the end of your URL. This tells SOLR to output information about how it parsed your query, how long each component took to run, and some other useful debugging information. It has come in handy several times here when I wanted to know why SOLR returned (or didn't return) the results I expected.

I hope this helps.
- Ken
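Ken's options mostly come down to building the q parameter in different ways. Below is a minimal Python sketch that uses only urllib.parse from the standard library. The helper names are hypothetical. The field names and the URL come from the thread. my_dismax is a placeholder for whatever the dismax handler is named in solrconfig.xml.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select/"  # URL used in the thread


def per_field_query(terms, fields):
    """Option 3: prefix every term with every field."""
    return " ".join(f"{field}:{term}" for field in fields for term in terms)


def phrase_query(phrase, fields):
    """Option 1: search each field for the whole quoted phrase."""
    return " ".join(f'{field}:"{phrase}"' for field in fields)


def select_url(q, **extra):
    """Build a select URL; debugQuery=on shows how Solr parsed q."""
    params = {"q": q, "fl": "* score", "rows": 10, "start": 0, "debugQuery": "on"}
    params.update(extra)
    return SOLR_SELECT + "?" + urlencode(params)


fields = ["companyName", "descriptionTxt"]
print(per_field_query(["foo", "bar"], fields))
# companyName:foo companyName:bar descriptionTxt:foo descriptionTxt:bar
print(phrase_query("foo bar", fields))
# companyName:"foo bar" descriptionTxt:"foo bar"
print(select_url(per_field_query(["foo", "bar"], fields)))
# Option 2 (dismax): plain terms plus a handler name; "my_dismax" is a placeholder.
print(select_url("foo bar", qt="my_dismax"))
```

urlencode takes care of escaping spaces, colons and the * in fl, so the parameter separators cannot get lost the way they do in a hand-typed URL.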
Index time boosting troubles
Hi, I had index-time boosting working on documents like so: <doc boost="10.0"> Everything was great until I made some changes that I thought were not related to the doc boost, but since then my doc boosting appears to be missing. I'm having a tough time debugging this and didn't have the sense to put it under version control so I would have something to revert to (lesson learned). In schema.xml I have <fieldType name="float" class="solr.FloatField" omitNorms="false"/> Is there something else I should be watching out for? Some query parameter, perhaps? I think wildcards in a query can affect it, but I'm not using any. Could it be some setting in solrconfig.xml or schema.xml? Thanks! Jon
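The boosted update message from the question can also be generated programmatically, which makes it easier to keep under version control. Below is a minimal sketch using Python's xml.etree.ElementTree. The field names and values are illustrative only.

One thing worth checking: as far as I know, in Lucene-based Solr of this era document and field boosts are folded into each field's norm. They therefore only survive on the searched fields whose type keeps norms (omitNorms="false"), not just on the float type. The fieldNorm values in the debugQuery=on explain output show whether a boost arrived.

```python
import xml.etree.ElementTree as ET


def boosted_doc_xml(fields, doc_boost, field_boosts=None):
    """Build a Solr XML <add> message carrying index-time boosts.

    doc_boost is written as <doc boost="...">; field_boosts optionally adds
    boost="..." to individual <field> elements.
    """
    field_boosts = field_boosts or {}
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc", boost=str(doc_boost))
    for name, value in fields.items():
        attrs = {"name": name}
        if name in field_boosts:
            attrs["boost"] = str(field_boosts[name])
        ET.SubElement(doc, "field", attrs).text = value
    return ET.tostring(add, encoding="unicode")


# 'id' and 'companyName' are example field names; the values are made up.
print(boosted_doc_xml({"id": "42", "companyName": "Foo Ltd"}, 10.0, {"companyName": 2.0}))
```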
Re: How to use key with facet.prefix?
Thanks for that. So would the best solution perhaps be to use copyField in the schema to make a subcat field identical to my category field?

On Sat, Aug 8, 2009 at 10:17 AM, Koji Sekiguchi <k...@r.email.ne.jp> wrote:

I think '!key' is only used as a label when displaying the facet result. Because it doesn't change the field name, the parameter f.subcat.facet.prefix=00 is ignored.

Koji
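Following Koji's explanation, per-field parameters such as f.<field>.facet.prefix are keyed on the real field name. A copied field therefore gives each facet its own name to hang a prefix on. Below is a minimal Python sketch of the resulting request parameters. It assumes a hypothetical subcat field with the same field type as category, populated by <copyField source="category" dest="subcat"/> in schema.xml.

```python
from urllib.parse import urlencode


def facet_params(prefix_by_field):
    """One facet.field per field, each with its own f.<field>.facet.prefix."""
    params = [("q", "*:*"), ("facet", "true")]
    for field, prefix in prefix_by_field.items():
        params.append(("facet.field", field))
        params.append((f"f.{field}.facet.prefix", prefix))
    return urlencode(params)


# 'subcat' is assumed to be filled from 'category' by a copyField in schema.xml:
#   <copyField source="category" dest="subcat"/>
query = facet_params({"category": "01", "subcat": "00"})
print(query)
```

A list of pairs is passed to urlencode rather than a dict because facet.field has to appear once per field.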
How to use key with facet.prefix?
I'm trying to facet multiple times on the same field using key. This works fine except when I use prefixes for these facets. What I have so far (not functional): ...&facet=true&facet.field=category&f.category.facet.prefix=01&facet.field={!key=subcat}category&f.subcat.facet.prefix=00 This gives me two facets in the results, one named 'category' and another named 'subcat', as expected. But the prefix for key 'subcat' is ignored, and the other prefix is used for both facets. How do I use key with prefixes, or am I barking up the wrong tree here? Thanks!
Summing sub categories in faceting
Hi, I would really appreciate some help on this. I'm building a category browser for companies, kind of like a yellow pages. For each company I store every category the company is in. For example, Boeing would have 03.03.02, which is a fictional ID for 'Jets'.

At the starting point I display all companies.
My query: ?q=*:*&facet=true&facet.field=categoryID&facet.mincount=1
Desired facet result:
Shops and services (4313) ID = 01
Home and interior (2932) ID = 02
Transportation (1144) ID = 03

I click Transportation, ID = 03.
My query: ?q=*:*&fq=categoryID:03*&facet=true&facet.field=categoryID&facet.mincount=1
Desired facet result:
Land vehicles (708) ID = 03.01
Boats (391) ID = 03.02
Planes (342) ID = 03.03

Under these categories are even more subcategories, and so forth. Faceting like the above gives me a count for every single subcategory, which will be in the hundreds. I really only want the total count for each category at my current level of the hierarchical category tree. Does this make sense?

My solution is to store multiple IDs for each company. For Boeing, that would mean having a categoryFacet field that stores 03, 03.03 and 03.03.02, and skipping the wildcard. It seems kind of bloated. Are there better solutions? Thanks a bunch!
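The proposed solution can be sketched in Python. Every company is indexed with each ancestor of its category ID in a multi-valued categoryFacet field. The facet count for 03.03 then already counts every company anywhere under it. Adding facet.prefix=03. to a request with fq=categoryFacet:03 would limit the facet values to descendants of 03. A small client-side filter then keeps only the direct children. The function names are hypothetical. The counts are merged into one dictionary purely for illustration, and the 03.03.02 count is made up.

```python
def ancestor_ids(category_id):
    """Expand '03.03.02' into ['03', '03.03', '03.03.02'] for indexing."""
    parts = category_id.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def child_counts(facet_counts, parent=None):
    """Keep only facet values exactly one level below `parent`.

    With parent=None the top-level categories are returned.
    """
    prefix = "" if parent is None else parent + "."
    depth = 1 if parent is None else parent.count(".") + 2
    return {
        value: count
        for value, count in facet_counts.items()
        if value.startswith(prefix) and value.count(".") + 1 == depth
    }


# Counts from the example; the 03.03.02 count is made up.
counts = {"01": 4313, "02": 2932, "03": 1144,
          "03.01": 708, "03.02": 391, "03.03": 342, "03.03.02": 120}
print(ancestor_ids("03.03.02"))    # ['03', '03.03', '03.03.02']
print(child_counts(counts))        # {'01': 4313, '02': 2932, '03': 1144}
print(child_counts(counts, "03"))  # {'03.01': 708, '03.02': 391, '03.03': 342}
```

The extra values per company are the price of getting the summed counts directly from Solr rather than adding up hundreds of leaf counts on the client.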
Re: Summing sub categories in faceting
I did a bit more creative searching for a solution and came up with this: http://www.mail-archive.com/solr-user@lucene.apache.org/msg15027.html I'm using a nightly build that's a couple of days old, so unless there is something newer I should know about, I'm going with that method :)
Wildcard and boosting
Hey now! I do index-time boosting for my fields and just discovered that when I search with a trailing wildcard, the boosting is ignored. Will my boosting work with a wildcard if I do it at query time? And if so, is there much of a performance difference? Is there some other method I can use to preserve my boosting? I do not need highlighting. Thanks, Jon Helgi
Re: Wildcard and boosting
I just updated to a nightly build (I was using 1.2), and this no longer seems to be an issue.
Re: How to install a patch?
Thanks for that. The patch in question is this one: http://issues.apache.org/jira/browse/SOLR-469 I found this patching utility for Windows and am going to give it a go: http://gnuwin32.sourceforge.net/packages/patch.htm

On Tue, Jun 10, 2008 at 12:11 PM, Jacob Singh [EMAIL PROTECTED] wrote:

Hi Rusli, Is there a URL you'd like to reference for where you got the patch? That would probably help. For Windows, I suppose you'll have to google around to find a version of patch that runs there. Beyond Compare is a Windows app that has patching capabilities. patch is a program for *nix machines: the user supplies a patch file, and the program uses it to patch an existing file. A patch file uses a specific format that describes the differences between the original and a modified copy. So you already have the file locally, and applying the patch file to it makes the changes needed to turn your copy into the one the author of the patch has. The source of the file you are looking for is probably in the handlers directory of the Solr source. Hope that helps, Jacob

Rusli Ruslakall wrote:

This is a terribly simple question, I bet. I'm running Solr on Windows and would like to use the Data Import RequestHandler patch. I have been trying to figure out how to install this patch but have been unsuccessful so far. How would I go about doing this? Thanks, Jon
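To see what Jacob means by the patch format, Python's standard difflib module can produce the same unified diff format that patch consumes. Below is a minimal sketch with a made-up file name and contents.

```python
import difflib

original = ["line one\n", "line two\n", "line three\n"]
modified = ["line one\n", "line 2\n", "line three\n"]

# unified_diff yields the ---/+++ header lines, a hunk header, and the
# context (' '), removed ('-') and added ('+') lines.
patch_text = "".join(
    difflib.unified_diff(original, modified, fromfile="Example.java", tofile="Example.java")
)
print(patch_text)
```

A saved file containing output like this is what a utility such as the GnuWin32 patch package applies. It is usually run from the directory that the paths in the patch are relative to, for example patch -p0 < SOLR-469.patch. The patch file name here is assumed, and -p0 means no leading path components are stripped.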
Re: Want to drill down facet search result
Thanks for that. I looked into fq, and it will definitely help when I drill into zip codes. However, I'm still having some issues. facet.prefix only got me so far, because sometimes the facet is the second word in the field. I also have another question about this example:

<doc>
<field name="name">Company A</field>
<field name="category_id">1</field>
<field name="category_name">Car</field>
<field name="category_alias">automobile, vehicle</field>
<field name="category_id">2</field>
<field name="category_name">Animals</field>
<field name="category_alias">cat, dog, rat</field>
</doc>

Is there any way I can group category information together, so that I know the category_id for a specific category_name? For example, I want to facet search for 'vehicle' and count how many companies are in the parent category 1, whose name is Car. I can put everything in one string and break it apart with PHP after the fact, but I'm wondering if there is a better way.

On Thu, May 29, 2008 at 5:32 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

On Thu, May 29, 2008 at 12:22 PM, Rusli Ruslakall [EMAIL PROTECTED] wrote:
Searched forever before posting and of course I found it shortly after :) Can use facet.prefix, beautiful!

You can also constrain both results and facets to any arbitrary query via fq=myquery
-Yonik

On Thu, May 29, 2008 at 3:43 PM, Rusli Ruslakall [EMAIL PROTECTED] wrote:

Hi, I index something like this:

<doc>
<field name="name">Company A</field>
<field name="cat">123</field>
<field name="cat">456</field>
<field name="cat">789</field>
</doc>
<doc>
<field name="name">Company B</field>
<field name="cat">129</field>
<field name="cat">123</field>
<field name="cat">987</field>
</doc>

I ONLY want to display the category names starting with '12' and how many companies are in each one. In this example it should output:

name count
123 (2)
129 (1)

What I have now is: http://localhost:8983/solr/select/?q=cat:12&facet=true&facet.limit=-1&facet.field=cat&facet.mincount=1

But with this I get all the categories, which I would rather not:

name count
123 (2)
456 (1) -- Rather not get this information
789 (1) -- Rather not get this information
129 (1)
987 (1) -- Rather not get this information

Is there some way of achieving this in Solr? Thanks a lot! Jon
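The facet.prefix solution found in this thread, and the "combine and split afterwards" idea for keeping a category's ID and name together, can be sketched in Python. prefix_facet_query, apply_prefix and split_combined are hypothetical helpers. The | separator for a combined value such as 1|Car is an assumption. apply_prefix only models on the client what facet.prefix does on the server, using the counts from the example.

```python
from urllib.parse import urlencode


def prefix_facet_query(field, prefix):
    """Query string asking Solr for facet values of `field` that start with `prefix`."""
    return urlencode([
        ("q", "*:*"),              # match all documents
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", prefix),  # only facet values starting with prefix
        ("facet.limit", "-1"),     # no limit on the number of values
        ("facet.mincount", "1"),
    ])


def apply_prefix(counts, prefix):
    """Client-side model of the filtering facet.prefix performs on facet counts."""
    return {value: n for value, n in counts.items() if value.startswith(prefix)}


def split_combined(value, sep="|"):
    """Split a hypothetical combined value such as '1|Car' into (id, name)."""
    category_id, name = value.split(sep, 1)
    return category_id, name


counts = {"123": 2, "456": 1, "789": 1, "129": 1, "987": 1}  # counts from the example
print(prefix_facet_query("cat", "12"))
print(apply_prefix(counts, "12"))   # {'123': 2, '129': 1}
print(split_combined("1|Car"))      # ('1', 'Car')
```

Splitting only on the first separator keeps category names intact even if they happen to contain the separator themselves.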