Re: 2 solr dataImport requests on a single core at the same time
Hi, thank you very much, that solved my problem. Having multiple request handlers will not degrade performance... unless we are sending parallel requests? Am I right? Thanks, Prasad
Re: Getting FileNotFoundException with repl command=backup?
Thanks for the info Peter, I think I ran into the same issue some time ago and could not find out why the backup stopped and also got deleted by Solr. I decided to stop running updates to Solr while a backup is running, and wrote my own backup handler that simply copies the index files to some location and rotates older, unneeded backups. I thought about a cleaner solution where the backup handler would create a LOCK on the index, which would prevent incoming updates from writing to it (the same happens while an index optimize is running). With the LOCK set, a backup could run without any problems, and it would remove the LOCK when done. I was never able to create a working LOCK that prevents incoming updates from being applied, though. -- Alexander Rothenberg, Fotofinder GmbH, Potsdamer Str. 96, 10785 Berlin. Web: http://www.fotofinder.net/ Tel: +49 30 25792890 Fax: +49 30 257928999 USt-IdNr. DE812854514 Geschäftsführer: Ali Paczensky. Amtsgericht: Berlin Charlottenburg (HRB 73099), Sitz: Berlin
Re: Tree Faceting in Solr 1.4
Thanks, I saw the article. As far as I can tell the trunk archives only go back to the middle of March and the 2 patches are from the beginning of the year. Thus: "These approaches can be tried out easily using a single set of sample data and the Solr example application (assumes current trunk codebase and latest patches posted to the respective issues)." is a bit of an over-statement! Regards Eric On Fri, Jul 23, 2010 at 6:22 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Solr does not, yet, at least not simply, as far as I know, but there are ideas and some JIRAs with maybe some patches: http://wiki.apache.org/solr/HierarchicalFaceting From: rajini maski [rajinima...@gmail.com] Sent: Friday, July 23, 2010 12:34 AM To: solr-user@lucene.apache.org Subject: Re: Tree Faceting in Solr 1.4 I am also looking out for the same feature in Solr and am very keen to know whether it supports this feature of tree faceting... Or are we forced to index in a tree faceting format like 1/2/3/4, 1/2/3, 1/2, 1? In case of multilevel faceting it will give only a 2-level tree facet, is what I found. If I give a query such as country:India and state:Karnataka and city:Bangalore, all I want is a facet count 1) for the condition above, 2) the number of states in that country, 3) the number of cities in that state... Like: Country: India, State: Karnataka, City: Bangalore 1; State: Karnataka, Kerala, Tamilnadu, Andhra Pradesh... and so on; City: Mysore, Hubli, Mangalore, Coorg and so on... If I am doing facet=on&facet.field={!ex=State}State&fq={!tag=State}State:Karnataka all it gives me is facets on State, excluding only that filter query. But I was not able to do the same on the third level, like facet.field= give me the counts of cities also in state Karnataka. Let me know the solution for this... Regards, Rajani Maski On Thu, Jul 22, 2010 at 10:13 PM, Eric Grobler impalah...@googlemail.com wrote: Thank you for the link.
I was not aware of the multifaceting syntax - this will enable me to run 1 less query on the main page! However this is not a tree faceting feature. Thanks Eric On Thu, Jul 22, 2010 at 4:51 PM, SR r.steve@gmail.com wrote: Perhaps the following article can help: http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html -S On Jul 22, 2010, at 5:39 PM, Eric Grobler wrote: Hi Solr Community If I have: COUNTRY CITY Germany Berlin Germany Hamburg Spain Madrid Can I do faceting like: Germany Berlin Hamburg Spain Madrid I tried to apply SOLR-792 to the current trunk but it does not seem to be compatible. Maybe there is a similar feature existing in the latest builds? Thanks Regards Eric
Re: Duplicates
Another possibility could be the well-known 'field collapsing' ;-) http://wiki.apache.org/solr/FieldCollapsing Regards, Peter. Thanks. If I set uniqueKey on the field, can I still keep duplicates? I need to remove duplicates only from search results; the ability to keep duplicates in the index should remain. 2010/7/23 Erick Erickson erickerick...@gmail.com If the field is a single token, just define the uniqueKey on it in your schema. Otherwise, this may be of interest: http://wiki.apache.org/solr/Deduplication Haven't used it myself though... best Erick On Thu, Jul 22, 2010 at 6:14 PM, Pavel Minchenkov char...@gmail.com wrote: Hi, Is it possible to remove duplicates in search results by a given field? Thanks. -- Pavel Minchenkov
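For reference, the Deduplication approach Erick links is configured as an update processor chain in solrconfig.xml. This is a rough sketch along the lines of the wiki page, not a tested config for this thread; the signature field name and the field list are placeholders to adapt to your schema:

```
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signatureField</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

Note that with overwriteDupes=true, documents with the same signature overwrite each other at index time, so duplicates are not kept in the index at all; for the stated requirement of keeping duplicates but hiding them from results, field collapsing seems the better fit.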
Re: Duplicates
Thanks. Does it work with Solr 1.4 (Solr 4.0 is mentioned in the article)? What about performance? I only need to delete duplicates (I don't need a count of duplicates or to select a certain duplicate). 2010/7/23 Peter Karich peat...@yahoo.de [...] -- Pavel Minchenkov
Re: Solr on iPad?
Hi Stephan, On the iPad, as with the iPhone, I'm afraid you're stuck with using SQLite if you want any form of database in your app. I suppose if you wanted to get really ambitious and had a lot of time on your hands you could use Xcode to try and compile one of the open-source C-based DBs/indexers, but as with most things in OS X and iOS development, if you're bending over backwards trying to implement something, you're probably doing it wrongly! Also, I wouldn't put it past the AppStore guardians to reject your app purely on the basis of having used something other than SQLite! Apple's cocoa-dev mailing list is very active if you have problems, but do your homework before asking questions or you'll get short shrift. http://lists.apple.com/cocoa-dev Mark On 22 Jul 2010, at 6:12 pm, Stephan Schwab wrote: Dear Solr community, does anyone know whether it may be possible or has already been done to bring Solr to the Apple iPad so that applications may use a local search engine? Greetings, Stephan -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: Duplicates
Hi Pavel! The patch can be applied to 1.4. The performance is OK, but in some situations it could be worse than without the patch. For us it works well, but others reported some exceptions (see the patch site: https://issues.apache.org/jira/browse/SOLR-236). "I need only to delete duplicates" - could you give us an example of what exactly you need? (Maybe you could index each master document of the 'unique' documents with an extra field and query for that field?) Regards, Peter. [...] -- http://karussell.wordpress.com/
Replacing text fields with numeric fields for speed
Hi, One of the things that we were thinking of doing in order to speed up results from Solr search is to convert fixed-text fields (such as values from a drop-down) into numeric fields. The thinking behind this was that searching through numeric values would be faster than searching through text. However, I now feel that we were barking up the wrong tree, as Lucene is probably not doing a text search per se. From some experiments, I see only a small difference between a text search on a field and a numeric search on the corresponding numeric field. This difference can probably be attributed to the additional processing on the text field. Could someone clarify whether one can expect a difference in speed between searching a fixed-text field and its numeric equivalent? I am aware of the benefit of numeric fields for range queries. Regards, Gora
Problem with PDF, Solr 1.4.1 Cell
Hi all, as I saw in this discussion [1] there were many issues with PDF indexing in Solr 1.4 due to the Tika library (version 0.4). In Solr 1.4.1 the Tika library is the same, so I guess the issues are the same. Could anyone who contributed to the previous thread help me in resolving these issues? I need a simple tutorial that could help me to upgrade Solr Cell! Something like this: 1) download Tika core from trunk 2) create a jar with Maven dependencies 3) unjar Solr 1.4.1 and change the Tika library 4) jar the patched Solr 1.4.1 and enjoy! [1] http://markmail.org/message/zbkplnzqho7mxwy3#query:+page:1+mid:gamcxdx34ayt6ccg+state:results Best regards -- -- Benedetti Alessandro Personal Page: http://tigerbolt.altervista.org Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: Solr on iPad?
Hi, unfortunately for iPad developers, it seems that it is not possible to use the Spotlight engine through the SDK: http://stackoverflow.com/questions/3133678/spotlight-search-in-the-application Chantal On Fri, 2010-07-23 at 10:16 +0200, Mark Allan wrote: [...]
Re: Replacing text fields with numeric fields for speed
On Fri, 23 Jul 2010 14:44:32 +0530 Gora Mohanty g...@srijan.in wrote: [...] From some experiments, I see only a small difference between a text search on a field, and a numeric search on the corresponding numeric field. [...] Well, I take that back. Running more rigorous tests with Apache Bench shows a difference of slightly over a factor of 2 between the median search time on the numeric field, and on the text field. The search on the numeric field is, of course, faster. That much of a difference puzzles me. Would someone knowledgeable about Lucene indexes care to comment? Regards, Gora
Re: filter query on timestamp slowing query???
I don't specify any sort order, and I do request the score, so it is ordered based on that. My schema consists of these fields:

<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="timestamp" type="pdate" indexed="true" stored="true" default="NOW" multiValued="false" /> (now changing to tdate)
<field name="type" type="string" indexed="true" stored="true" required="true" />
<field name="contents" type="text" indexed="true" stored="false" termVectors="true" />

and a typical query would be:

fl=id,type,timestamp,score&start=0&q=Coca+Cola+pepsi+-dr+pepper&fq=timestamp:[2010-07-07T00:00:00Z+TO+NOW]+AND+(type:x+OR+type:y)&rows=2000

Thanks again for your time
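Two things worth trying here, offered as a common Solr tip rather than a confirmed diagnosis of this thread: a filter containing bare NOW resolves to the current millisecond, so it can never be reused from the filter cache, and combining both clauses in one fq means they can only be cached as a unit. Splitting the filters and rounding the upper bound with date math might look like this (shown unencoded; escape it when sent over HTTP):

```
fq=timestamp:[2010-07-07T00:00:00Z TO NOW/DAY+1DAY]
fq=type:x OR type:y
```

With this shape, each fq is cached independently, and the date filter stays identical (and thus cacheable) for a whole day instead of changing on every request.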
Re: Duplicates
Pavel, hopefully I understand your use case now :-) but one question: "I need to select always *one* file per folder or select *only* folders that contain matched files (without files)." What do you mean here with 'or'? Do you have 2 use cases, or would one of them be sufficient? Because the second use case could be solved without the patch: you could index folders only; then every prop_N will be a multivalued field, and you don't have the problem of duplicate folders. (If you don't mind ugliness, both use cases could even be handled: after you got the folders, grabbing the files which matched could be done in post-processing.) But I fear the cleanest solution is to use the patch. Hopefully it can be applied without hassles against 1.4 or the trunk. If not, please ask on the patch site for assistance. Regards, Peter.

Thanks, Peter! I'll try collapsing today. Example (sorry if the table is unformatted):

id | type   | prop_1 | ... | prop_N | folderId
0  | folder |        |     |        |
1  | file   | val1   |     | valN1  | 0
2  | file   | val3   |     | valN2  | 0
3  | file   | val1   |     | valN3  | 0
4  | folder |        |     |        |
5  | folder |        |     |        |
6  | file   | val3   |     | valN7  | 6
7  | file   | val4   |     | valN8  | 6
8  | folder |        |     |        |
9  | file   | val2   |     | valN3  | 8
10 | file   | val1   |     | valN2  | 8
11 | file   | val2   |     | valN5  | 8
12 | folder |        |     |        |

I need to select always *one* file per folder or select *only* folders that contain matched files (without files). Query: prop_1:val1 OR prop_2:val2 I need results (document ids): 1, 9 or 0, 8

2010/7/23 Peter Karich peat...@yahoo.de [...]

-- http://karussell.wordpress.com/
Re: Replacing text fields with numeric fields for speed
Gora, just for my interest: does Apache Bench send different queries (e.g. from the logs), or always the same query? If it were always the same query, Solr's cache would kick in and make the response time look super small. I would like to find a tool or script where I can send my logfile to Solr and measure some things, because at the moment we are using fastbench and I would like to replace it ;-) Regards, Peter. On Fri, 23 Jul 2010 14:44:32 +0530 Gora Mohanty g...@srijan.in wrote: [...]
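A log-replay benchmark of the kind Peter asks for can be sketched in a few lines. This is not an existing tool: the file format (one raw query string per line) and the Solr URL below are assumptions to adapt to your setup.

```python
import time
import urllib.request
from statistics import median

SOLR_URL = "http://localhost:8983/solr/select?"  # assumed endpoint


def load_queries(path):
    """Read one raw query string per line, e.g. 'q=ipod&rows=10'."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def replay(queries):
    """Fire each query once against Solr; return elapsed times in ms."""
    times = []
    for q in queries:
        start = time.perf_counter()
        urllib.request.urlopen(SOLR_URL + q).read()
        times.append((time.perf_counter() - start) * 1000.0)
    return times


# Usage (requires a running Solr):
#   times = replay(load_queries("queries.log"))
#   print("n=%d median=%.1f ms max=%.1f ms"
#         % (len(times), median(times), max(times)))
```

Because each logged query is distinct, this avoids the cache-only measurement that repeating a single query with ab produces; reporting the median rather than the mean also keeps a few cold-cache outliers from dominating the result.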
Re: Solr 3.1 dev
On Fri, Jul 23, 2010 at 6:09 AM, Eric Grobler impalah...@googlemail.com wrote: I have a few questions :-) a) Will the next release of solr be 3.0 (instead of 1.5)? The next release will be 3.1 (matching the next lucene version off of the 3x branch). Trunk is 4.0-dev b) How stable/mature is the current 3x version? For features that are not new, it should be very stable. c) Is LocalSolr implemented? where can I find a list of new features? Solr spatial is partly implemented... currently in trunk. http://wiki.apache.org/solr/SpatialSearch d) Is this the correct method to download the latest stable version? svn co https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x The last official Solr release was 1.4.1 Nightly builds aren't official apache releases... but plenty of people do use them in production environments (after appropriate testing of course). -Yonik http://www.lucidimagination.com
Re: Solr 3.1 dev
Hi, is there any wiki/URL of the proposed changes or new features that we should expect with this new release? On Fri, Jul 23, 2010 at 9:20 AM, Yonik Seeley yo...@lucidimagination.com wrote: [...]
Re: Tree Faceting in Solr 1.4
Hi Erik, Thanks for the fast update :-) I will try it soon. Regards Eric On Fri, Jul 23, 2010 at 2:37 PM, Erik Hatcher erik.hatc...@gmail.com wrote: I've updated the SOLR-792 patch to apply to trunk (using the solr/ directory as the root still, not the higher-level trunk/). This one I think is an important one that I'd love to see eventually become part of Solr built-in, but the TODOs in TreeFacetComponent ought to be taken care of first, to generalize this to N field levels and maybe some other must/nice-to-haves. Erik On Jul 23, 2010, at 3:45 AM, Eric Grobler wrote: [...]
solrj occasional timeout on commit
Hey, I recently moved a Solr app from a testing environment into a production environment, and I'm seeing a brand new error which never occurred during testing. I'm seeing this in the SolrJ-based app logs:

org.apache.solr.common.SolrException: com.caucho.vfs.SocketTimeoutException: client timeout
com.caucho.vfs.SocketTimeoutException: client timeout request: http://somehost:8080/solr/live/update?wt=javabin&version=1
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)

This occurs in a service that periodically adds new documents to Solr. There are 4 boxes that could be doing updates in parallel; in testing there were 2. We're running on a new Resin 4 based install in production, whereas we were using Resin 3 in testing. Does anyone have any ideas? Help would be greatly appreciated! Thanks, -Kallin Nagelberg
Re: Tree Faceting in Solr 1.4
"If I am doing facet=on&facet.field={!ex=State}State&fq={!tag=State}State:Karnataka all it gives me is facets on State, excluding only that filter query. But I was not able to do the same on the third level, like facet.field= give me the counts of cities also in state Karnataka. Let me know the solution for this..."

This looks like regular faceting to me.

1. Showing city counts given state: facet=on&fq=State:Karnataka&facet.field=city
2. Showing state counts given country (similar to 1): facet=on&fq=Country:India&facet.field=state
3. Showing city and state counts given country: facet=on&fq=Country:India&facet.field=state&facet.field=city
4. Showing city counts given state, plus counts for all other states not filtered by the current state (http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters): facet=on&fq={!tag=State}state:Karnataka&facet.field={!ex=State}state&facet.field=city
5. Showing state and city counts given country, plus counts for all other countries not filtered by the current country (similar to 4): facet=on&fq={!tag=country}country:India&facet.field={!ex=country}country&facet.field=city&facet.field=state

etc. This has nothing to do with hierarchical faceting as described in SOLR-792, btw, although I understand the possible confusion, as country > state > city can obviously be seen as some sort of hierarchy. The first part of your question seemed to be more about hierarchical faceting as per SOLR-792, but I couldn't quite distill a question from that part. Also, just a suggestion: consider using ids instead of names for filtering; you will get burned sooner or later otherwise. HTH, Geert-Jan 2010/7/23 rajini maski rajinima...@gmail.com [...]
Re: Tree Faceting in Solr 1.4
Hi Erik, I must be doing something wrong :-( I took: svn co https://svn.apache.org/repos/asf/lucene/dev/trunk mytest then I copied SOLR-792.patch to the folder /mytest/solr, then I ran: patch -p1 < SOLR-792.patch but I get "can't find file to patch at input line 5". Is this the correct trunk and patch command? However, if I just manually - copy TreeFacetComponent.java to the folder solr/src/java/org/apache/solr/handler/component - add SimpleOrderedMap<SimpleOrderedMap> _treeFacets; to ResponseBuilder.java - and make the changes to solrconfig.xml I am able to compile and run your test :-) Regards Eric On Fri, Jul 23, 2010 at 2:37 PM, Erik Hatcher erik.hatc...@gmail.com wrote: [...]
Re: Duplicates
I mean two use cases. I can't index folders only, because I have other queries on files. Or I would have to build another index that contains only folders, but then I have to take care of synchronizing folders between the two indexes. Are range, spatial, etc. queries supported on multivalued fields? 2010/7/23 Peter Karich peat...@yahoo.de [...] -- Pavel Minchenkov
Re: Solr on iPad?
Thanks Mark! I'm subscribing to the cocoa-dev list. On Jul 23, 2010, at 10:17 AM, Mark Allan [via Lucene] wrote: Hi Stephan, On the iPad, as with the iPhone, I'm afraid you're stuck with using SQLite if you want any form of database in your app. I suppose if you wanted to get really ambitious and had a lot of time on your hands you could use Xcode to try and compile one of the open-source C-based DBs/indexers, but as with most things in OS X and iOS development, if you're bending over backwards trying to implement something, you're probably doing it wrongly! Also, I wouldn't put it past the AppStore guardians to reject your app purely on the basis of having used something other than SQLite! Apple's cocoa-dev mailing list is very active if you have problems, but do your homework before asking questions or you'll get short shrift. http://lists.apple.com/cocoa-dev Mark On 22 Jul 2010, at 6:12 pm, Stephan Schwab wrote: Dear Solr community, does anyone know whether it may be possible or has already been done to bring Solr to the Apple iPad so that applications may use a local search engine? Greetings, Stephan -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-on-iPad-tp987655p990034.html Sent from the Solr - User mailing list archive at Nabble.com.
Autocommit not happening
Hi! I'm a Solr newbie, and I don't understand why autocommits aren't happening in my Solr installation. My one server running Solr:
- Ubuntu 10.04 (Lucid Lynx), with all the latest updates.
- Solr 1.4.0 running on Tomcat6
- Installation was done via apt-get install solr-common solr-tomcat tomcat6-admin

My solrconfig.xml has:

  <autoCommit>
    <maxDocs>1</maxDocs>
    <maxTime>1</maxTime>
  </autoCommit>

My code can add documents just fine. But after 12 hours, autocommit has never happened! Here's what I see on my Solr Admin pages:

CORE:
name: core  class:  version: 1.0  description: SolrCore
stats: coreName :  startTime : Thu Jul 22 21:38:30 UTC 2010  refCount : 2  aliases : []

name: searcher  class: org.apache.solr.search.SolrIndexSearcher  version: 1.0  description: index searcher
stats: searcherName : searc...@10ed7f5c main  caching : true  numDocs : 0  maxDoc : 0  reader : SolrIndexReader{this=509f662e,r=readonlydirectoryrea...@509f662e,refCnt=1,segments=0}  readerDir : org.apache.lucene.store.NIOFSDirectory@/var/lib/solr/data/index  indexVersion : 1279834591965  openedAt : Thu Jul 22 23:58:28 UTC 2010  registeredAt : Thu Jul 22 23:58:28 UTC 2010  warmupTime : 3

name: searc...@10ed7f5c main  class: org.apache.solr.search.SolrIndexSearcher  version: 1.0  description: index searcher
stats: searcherName : searc...@10ed7f5c main  caching : true  numDocs : 0  maxDoc : 0  reader : SolrIndexReader{this=509f662e,r=readonlydirectoryrea...@509f662e,refCnt=1,segments=0}  readerDir : org.apache.lucene.store.NIOFSDirectory@/var/lib/solr/data/index  indexVersion : 1279834591965  openedAt : Thu Jul 22 23:58:28 UTC 2010  registeredAt : Thu Jul 22 23:58:28 UTC 2010  warmupTime : 3

UPDATE HANDLERS:
name: updateHandler  class: org.apache.solr.update.DirectUpdateHandler2  version: 1.0  description: Update handler that efficiently directly updates the on-disk main lucene index
stats: commits : 2  autocommits : 0  optimizes : 0  rollbacks : 0  expungeDeletes : 0  docsPending : 496590  adds : 496590  deletesById : 0  deletesByQuery : 0  errors : 0
cumulative_adds : 501989  cumulative_deletesById : 0  cumulative_deletesByQuery : 2  cumulative_errors : 0

There are nearly 500K pending documents, accumulated over the past 12 hours. I think we're past the specified autocommit limits. :-) What should I look at to figure out what's preventing autocommits? Thank you all in advance! John
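While autocommit isn't firing, pending documents can be made searchable with an explicit commit against the update handler. A minimal sketch; the URL is an assumption (the stock Solr example listens on localhost:8983, a Tomcat install typically uses a different port/context path):

```python
import urllib.request

# Build (but don't send) an explicit commit request for the update handler.
# URL is hypothetical -- adjust host/port/context for your Tomcat deployment.
url = "http://localhost:8983/solr/update"
req = urllib.request.Request(url, data=b"<commit/>",
                             headers={"Content-Type": "text/xml"})

# urllib.request.urlopen(req)  # uncomment to send against a running Solr
print(req.get_full_url(), req.data)
```

This is only a stopgap while the autocommit configuration problem is being diagnosed.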
RE: filter query on timestamp slowing query???
and a typical query would be: fl=id,type,timestamp,score&start=0&q=Coca+Cola+pepsi+-dr+pepper&fq=timestamp:[2010-07-07T00:00:00Z+TO+NOW]+AND+(type:x+OR+type:y)&rows=2000

My understanding is that this is essentially what the Solr 1.4 trie date fields are made for. I'd use them; it should speed things up. Not sure where the best documentation for them is, but see: http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
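For reference, a trie-based date field in a Solr 1.4 schema.xml looks roughly like this; the field and type names here are illustrative (the example schema shipped with 1.4 defines a similar tdate type):

```xml
<!-- precisionStep="6" indexes extra terms per value so that range
     queries like timestamp:[... TO NOW] touch far fewer terms -->
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
           precisionStep="6" positionIncrementGap="0"/>

<field name="timestamp" type="tdate" indexed="true" stored="true"/>
```

Switching an existing field to a trie type requires reindexing.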
Re: filter query on timestamp slowing query???
I'm in the process of indexing my demo data to test that; I'll have more valid data on whether or not it made a difference in a few days. Thanks

On 23/07/2010, at 19:42, Jonathan Rochkind [via Lucene] ml-node+990234-2085494904-316...@n3.nabble.com wrote:

and a typical query would be: fl=id,type,timestamp,score&start=0&q=Coca+Cola+pepsi+-dr+pepper&fq=timestamp:[2010-07-07T00:00:00Z+TO+NOW]+AND+(type:x+OR+type:y)&rows=2000 My understanding is that this is essentially what the Solr 1.4 trie date fields are made for. I'd use them; it should speed things up. Not sure where the best documentation for them is, but see: http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/

-- View this message in context: http://lucene.472066.n3.nabble.com/filter-query-on-timestamp-slowing-query-tp977280p990337.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Autocommit not happening
On Jul 23, 2010, at 9:37 AM, John DeRosa wrote:

Hi! I'm a Solr newbie, and I don't understand why autocommits aren't happening in my Solr installation. My one server running Solr: - Ubuntu 10.04 (Lucid Lynx), with all the latest updates. - Solr 1.4.0 running on Tomcat6 - Installation was done via apt-get install solr-common solr-tomcat tomcat6-admin My solrconfig.xml has:

  <autoCommit>
    <maxDocs>1</maxDocs>
    <maxTime>1</maxTime>
  </autoCommit>

[snip]

The plot thickens. /var/log/tomcat6/catalina.out contains:

Jul 22, 2010 9:36:32 PM org.apache.solr.update.DirectUpdateHandler2$CommitTracker init
INFO: AutoCommit: disabled

What's stepping in and disabling autocommit? John
Allow custom overrides
I need to implement a search engine that will allow users to override pieces of data and then search against or view that data. For example, a doc that has the following values:

  DocId | Fulltext            | Meta1 | Meta2 | Meta3
  1     | The quick brown fox | foo   | foo   | foo

Now say a user overrides Meta2:

  DocId | Fulltext            | Meta1 | Meta2 | Meta3
  1     | The quick brown fox | foo   | bar   | foo

For that user, if they search for Meta2:bar, I need to hit, but no other user should hit on it. Likewise, if that user searches for Meta2:foo, it should not hit. Also, any searches against that document for that user should return the value 'bar' for Meta2, but should return 'foo' for other users. I'm not sure of the best way to implement this. Maybe I could do this with field collapsing somehow? Or with payloads? Custom analyzer? Any help would be appreciated. - Charlie
RE: filter query on timestamp slowing query???
and a typical query would be: fl=id,type,timestamp,score&start=0&q=Coca+Cola+pepsi+-dr+pepper&fq=timestamp:[2010-07-07T00:00:00Z+TO+NOW]+AND+(type:x+OR+type:y)&rows=2000

On top of using trie dates, you might consider separating the timestamp portion and the type portion of the fq into separate fq parameters -- that will allow them to be stored in the filter cache separately. So for instance, if you include "type:x OR type:y" in queries a lot, but with different date ranges, then when you make a new query, the set for "type:x OR type:y" can be pulled from the filter cache and intersected with the other result set; that portion won't have to be run again. That's probably not where your slowness is coming from, but it shouldn't hurt. Multiple fq's are essentially AND'd together, so whenever you have an fq whose separate clauses are AND'd together, you can always separate them into multiple fq's; it won't affect the result set, but it will affect the caching possibilities.
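The split being suggested -- one fq parameter per independently cacheable clause -- just means sending two fq parameters instead of one combined one. A sketch, with the values taken from the query quoted above (urlencode is used only to illustrate the repeated parameter):

```python
from urllib.parse import urlencode

# One fq per clause: Solr caches each fq's document set separately,
# so the type clause is a cache hit even when the date range changes.
params = [
    ("q", "Coca Cola pepsi -dr pepper"),
    ("fq", "timestamp:[2010-07-07T00:00:00Z TO NOW]"),  # varies per query
    ("fq", "type:x OR type:y"),                          # reusable across queries
    ("fl", "id,type,timestamp,score"),
    ("rows", "2000"),
]
query_string = urlencode(params)
print(query_string)  # contains two separate fq= parameters
```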
RE: Novice seeking help to change filters to search without diacritics
Hi HSingh, Maybe the mapping file I attached to https://issues.apache.org/jira/browse/SOLR-2013 will help? Steve -Original Message- From: HSingh [mailto:hsin...@gmail.com] Sent: Thursday, July 22, 2010 11:30 PM To: solr-user@lucene.apache.org Subject: Re: Novice seeking help to change filters to search without diacritics Hoss, thank you for your helpful response! : i think what's confusing you is that you are using the : MappingCharFilterFactory with that file in your text field type to : convert any ISOLatin1Accent characters to their base characters The problem is that a large range of characters are not getting converted to their base characters. The ASCIIFoldingFilterFactory handles this conversion for the entire Latin character set, including the extended sets, without having to specify individual characters and their equivalent base characters. Is there a way for me to switch to ASCIIFoldingFilterFactory? If so, what changes do I need to make to these files? I would appreciate your help! -- View this message in context: http://lucene.472066.n3.nabble.com/Novice-seeking-help-to-change-filters-to-search-without-diacritics-tp971263p988890.html Sent from the Solr - User mailing list archive at Nabble.com.
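Switching to ASCIIFoldingFilterFactory is a change to the field type's analyzer chain in schema.xml. A hedged sketch -- the type name and the rest of the chain are illustrative, it replaces the MappingCharFilterFactory line in the existing type, and it must appear in both the index and query analyzers (followed by a reindex):

```xml
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- folds the full Latin range, including extended sets,
         down to ASCII base characters -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```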
RE: Spellcheck help
In org.apache.solr.spelling.SpellingQueryConverter, find the line (#84):

  final static String PATTERN = "(?:(?!(" + NMTOKEN + ":|\\d+)))[\\p{L}_\\-0-9]+";

and remove the |\\d+ to make it:

  final static String PATTERN = "(?:(?!" + NMTOKEN + ":))[\\p{L}_\\-0-9]+";

My testing shows this solves your problem. The caution is to test it against all your use cases, because obviously someone thought we should ignore leading digits from keywords. Surely there's a reason why, although I can't think of it. James Dyer E-Commerce Systems Ingram Book Company (615) 213-4311 -Original Message- From: dekay...@hotmail.com [mailto:dekay...@hotmail.com] Sent: Saturday, July 17, 2010 12:41 PM To: solr-user@lucene.apache.org Subject: Re: Spellcheck help Can anybody help me with this? :( -Original Message- From: Marc Ghorayeb Sent: Thursday, July 08, 2010 9:46 AM To: solr-user@lucene.apache.org Subject: Spellcheck help Hello, I've been trying to get rid of a bug when using the spellcheck, but so far with no success :( When searching for a word that starts with a number, for example 3dsmax, I get the results that I want, BUT the spellcheck says it is not correctly spelled AND the collation gives me 33dsmax. Further investigation shows that the spellcheck is actually only checking dsmax, which it considers does not exist, and gives me 3dsmax for better results; but since I have spellcheck.collate = true, the collation that I show is 33dsmax, with the first 3 being the one discarded by the spellchecker... Otherwise, the spellcheck works correctly for normal words... any ideas? :( My spellcheck field is fairly classic: whitespace tokenizer, with lowercase filter... Any help would be greatly appreciated :) Thanks, Marc
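The effect of the |\d+ alternative can be reproduced with a simplified pattern. This is a Python approximation only -- the NMTOKEN field-name branch of the real Java regex is dropped, and [A-Za-z0-9_\-] stands in for [\p{L}_\-0-9]:

```python
import re

# With the \d+ lookahead, a token cannot start on a digit, so "3dsmax"
# is tokenized as "dsmax" -- the spellchecker then "corrects" dsmax to
# 3dsmax, and collation glues the stripped leading 3 back on: 33dsmax.
old_pattern = r"(?!\d+)[A-Za-z0-9_\-]+"   # before the fix
new_pattern = r"[A-Za-z0-9_\-]+"          # after removing |\d+

print(re.findall(old_pattern, "3dsmax"))  # ['dsmax']
print(re.findall(new_pattern, "3dsmax"))  # ['3dsmax']
```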
Re: Replacing text fields with numeric fields for speed
On Fri, 23 Jul 2010 14:33:54 +0200 Peter Karich peat...@yahoo.de wrote: Gora, just for my interest: does apache bench send different queries, or queries from the logs, or always the same query? If it were always the same query, the Solr cache would kick in and make the response time super small. Yes, the way that things are set up currently the query is always the same. My reasoning was that the effect of the Solr cache should be the same for both numeric and text fields. I am going to be trying some more rigorous tests, such as turning off Solr caching, and pre-warming the query before running the tests. I would like to find a tool or script where I can send my logfile to Solr and measure some things ... because at the moment we are using fastbench and I would like to replace it ;-) Not sure what fastbench is, but using Solr logs as a tool to measure search times for typical searches is an interesting idea. Hmm, we will also need to do that, so maybe we can compare notes on this. Regards, Gora
help with a schema design problem
Hi, Let's say I have a table with 3 columns: Document Id, Party Value and Party Type. In this table I have 3 rows. 1st row: Document id: 1, Party Value: Pramod, Party Type: Client. 2nd row: Document id: 1, Party Value: Raj, Party Type: Supplier. 3rd row: Document id: 2, Party Value: Pramod, Party Type: Supplier. Now in this table, with SQL it's easy for me to find all documents with Party Value "Pramod" and Party Type "Client". I need to design a Solr schema so that I can do the same in Solr. If I create 2 fields in the Solr schema, Party Value and Party Type, both of them multivalued, and try to query +Pramod +Supplier, then Solr will return me the first document, even though in the first document Pramod is a Client and not a Supplier. Thanks, Pramod Goyal
RE: help with a schema design problem
I think you just want something like: p_value:Pramod AND p_type:Supplier no? -Kallin Nagelberg -Original Message- From: Pramod Goyal [mailto:pramod.go...@gmail.com] Sent: Friday, July 23, 2010 2:17 PM To: solr-user@lucene.apache.org Subject: help with a schema design problem Hi, Lets say i have table with 3 columns document id Party Value and Party Type. In this table i have 3 rows. 1st row Document id: 1 Party Value: Pramod Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party Type: Supplier. 3rd row Document id:2 Party Value: Pramod Party Type: Supplier. Now in this table if i use SQL its easy for me find all document with Party Value as Pramod and Party Type as Client. I need to design solr schema so that i can do the same in Solr. If i create 2 fields in solr schema Party value and Party type both of them multi valued and try to query +Pramod +Supplier then solr will return me the first document, even though in the first document Pramod is a client and not a supplier Thanks, Pramod Goyal
Sort by index order desc?
Any pointers on how to sort by reverse index order? http://search.lucidimagination.com/search/document/4a59ded3966271ca/sort_by_index_order_desc it seems like it should be easy to do with the function query stuff, but i'm not sure what to sort by (unless I add a new field for indexed time) Any pointers? Thanks Ryan
Re: a bug of solr distributed search
Yonik, why don't we send the output of TermsComponent from every node in the cluster to a Hadoop instance? Since TermsComponent does the map part of the map-reduce concept, Hadoop only needs to reduce the stuff. Maybe we don't even need Hadoop for this. After reducing, every node in the cluster gets the current values to compute the idf. We can store this information in a HashMap-based SolrCache (or something like that) to provide constant-time access. To keep the values up to date, we can repeat that every x minutes. If we had that, it would not matter whether we use doc_X from shard_A or shard_B, since they would all have the same scores. Even with large indices of 10 million or more unique terms, this will only need some megabytes of network traffic. Kind regards, - Mitch Yonik Seeley-2-2 wrote: As the comments suggest, it's not a bug, but just the best we can do for now since our priority queues don't support removal of arbitrary elements. I guess we could rebuild the current priority queue if we detect a duplicate, but that will have an obvious performance impact. Any other suggestions? -Yonik http://www.lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p990506.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: a bug of solr distributed search
On Fri, Jul 23, 2010 at 2:23 PM, MitchK mitc...@web.de wrote: why do we do not send the output of TermsComponent of every node in the cluster to a Hadoop instance? Since TermsComponent does the map-part of the map-reduce concept, Hadoop only needs to reduce the stuff. Maybe we even do not need Hadoop for this. After reducing, every node in the cluster gets the current values to compute the idf. We can store this information in a HashMap-based SolrCache (or something like that) to provide constant-time access. To keep the values up to date, we can repeat that after every x minutes. There's already a patch in JIRA that does distributed IDF. Hadoop wouldn't be the right tool for that anyway... it's for batch oriented systems, not low-latency queries. If we got that, it does not care whereas we use doc_X from shard_A or shard_B, since they will all have got the same scores. That only works if the docs are exactly the same - they may not be. -Yonik http://www.lucidimagination.com
Re: a bug of solr distributed search
... In addition to my previous posting: to keep this in sync we could do two things. Wait for every server, to make sure that everyone uses the same values to compute the score, and then apply them. Or: let's say we collect the new values every 15 minutes. To merge and send them over the network, we declare that this will need 3 additional minutes (we want to keep the network traffic for such actions very low, so we do not send everything instantly). Okay, and now we add 2 more minutes, in case 3 were not enough or something needs a little bit more time than we thought. After those 2 minutes, every node has to apply the new values. Pro: if one node breaks, we do not delay the application of the new values. Con: we need two HashMaps, and both will have roughly the same size. That means we will waste some RAM for this operation, if we do not write the values to disk (which I do not suggest). Thoughts? - Mitch MitchK wrote: Yonik, why do we not send the output of TermsComponent of every node in the cluster to a Hadoop instance? Since TermsComponent does the map-part of the map-reduce concept, Hadoop only needs to reduce the stuff. Maybe we even do not need Hadoop for this. After reducing, every node in the cluster gets the current values to compute the idf. We can store this information in a HashMap-based SolrCache (or something like that) to provide constant-time access. To keep the values up to date, we can repeat that after every x minutes. If we got that, it does not matter whether we use doc_X from shard_A or shard_B, since they will all have got the same scores. Even if we got large indices with 10 million or more unique terms, this will only need some megabyte network-traffic. Kind regards, - Mitch Yonik Seeley-2-2 wrote: As the comments suggest, it's not a bug, but just the best we can do for now since our priority queues don't support removal of arbitrary elements.
I guess we could rebuild the current priority queue if we detect a duplicate, but that will have an obvious performance impact. Any other suggestions? -Yonik http://www.lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p990551.html Sent from the Solr - User mailing list archive at Nabble.com.
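The merge step being proposed in this thread -- collect per-shard document frequencies, sum them, and let every node derive the same global idf -- is small in code. An illustrative Python sketch (shard contents invented; the idf formula is the classic Lucene-style 1 + ln(N/(df+1)), used here only as an example):

```python
import math

# Per-shard document frequencies, e.g. as reported by TermsComponent
# (data invented for illustration).
shard_dfs = {
    "shard_a": {"solr": 120, "lucene": 40},
    "shard_b": {"solr": 80, "search": 300},
}
shard_doc_counts = {"shard_a": 10000, "shard_b": 8000}

# "Reduce" step: sum document frequency per term across shards.
global_df = {}
for dfs in shard_dfs.values():
    for term, df in dfs.items():
        global_df[term] = global_df.get(term, 0) + df

# Every node derives identical idf values from the merged counts.
n_docs = sum(shard_doc_counts.values())
global_idf = {t: 1 + math.log(n_docs / (df + 1)) for t, df in global_df.items()}
print(global_df["solr"], round(global_idf["solr"], 3))
```

With identical global idf on every node, a duplicated doc would score the same regardless of which shard answered for it -- which is the property Mitch is after (assuming, as Yonik notes, the duplicated docs really are identical).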
Re: a bug of solr distributed search
That only works if the docs are exactly the same - they may not be. Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same, shouldn't they? -- View this message in context: http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p990563.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: help with a schema design problem
With the use case you specified, it should work to just index each row, as you described in your initial post, as a separate document. This way p_value and p_type are both singlevalued and you get a correct combination of p_value and p_type. However, this may not go so well with other use cases you have in mind, e.g.: requiring that no multiple results are returned with the same document id. 2010/7/23 Pramod Goyal pramod.go...@gmail.com I want to do that. But if i understand correctly in solr it would store the field like this: p_value: Pramod Raj p_type: Client Supplier When i search p_value:Pramod AND p_type:Supplier it would give me result as document 1. Which is incorrect, since in document 1 Pramod is a Client and not a Supplier. On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin knagelb...@globeandmail.com wrote: I think you just want something like: p_value:Pramod AND p_type:Supplier no? -Kallin Nagelberg -Original Message- From: Pramod Goyal [mailto:pramod.go...@gmail.com] Sent: Friday, July 23, 2010 2:17 PM To: solr-user@lucene.apache.org Subject: help with a schema design problem Hi, Lets say i have table with 3 columns document id Party Value and Party Type. In this table i have 3 rows. 1st row Document id: 1 Party Value: Pramod Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party Type: Supplier. 3rd row Document id:2 Party Value: Pramod Party Type: Supplier. Now in this table if i use SQL its easy for me find all document with Party Value as Pramod and Party Type as Client. I need to design solr schema so that i can do the same in Solr. If i create 2 fields in solr schema Party value and Party type both of them multi valued and try to query +Pramod +Supplier then solr will return me the first document, even though in the first document Pramod is a client and not a supplier Thanks, Pramod Goyal
Performance issues when querying on large documents
Hello, I have an index with lots of different types of documents. One of those types basically contains extracts of PDF docs. Some of those PDFs can have 1000+ pages, so there would be a lot of stuff to search through. I am experiencing really terrible performance when querying. My whole index has about 270k documents, but less than 1000 of those are the PDF extracts. The slow querying occurs when I search only on those PDF extracts (by specifying filters), and return 100 results. The 100 results definitely adds to the issue, but even cutting that down can be slow. Is there a way to improve querying with such large results? To give an idea, querying for a single word can take a little over a minute, which isn't really viable for an application that revolves around searching. For now, I have limited the results to 20, which makes the query execute in roughly 10-15 seconds. However, I would like to have the option of returning 100 results. Thanks a lot. -- View this message in context: http://lucene.472066.n3.nabble.com/Performance-issues-when-querying-on-large-documents-tp990590p990590.html Sent from the Solr - User mailing list archive at Nabble.com.
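With stored documents this large, a common cause of slow responses is reading the big stored fields for every hit rather than the search itself. Two things worth checking: restrict fl to only the fields actually needed for display, and enable lazy field loading in solrconfig.xml so unrequested stored fields are never read from disk. A sketch of the latter (this element lives in the query section; the stock 1.4 example config includes it):

```xml
<query>
  <!-- only deserialize the stored fields named in fl; large
       unrequested fields (e.g. the PDF extract text) are skipped -->
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
</query>
```

Whether this helps depends on whether the large extract field is being returned (or highlighted) in the 100 results; if the full text must be shown, paging in smaller batches is the usual compromise.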
Re: help with a schema design problem
In my case the document id is the unique key( each row is not a unique document ) . So a single document has multiple Party Value and Party Type. Hence i need to define both Party value and Party type as mutli-valued. Is there any way in solr to say p_value[someIndex]=pramod And p_type[someIndex]=client. Is there any other way i can design my schema ? I have some solutions but none seems to be a good solution. One way would be to define a single field in the schema as p_value_type = client pramod i.e. combine the value from both the field and store it in a single field. On Sat, Jul 24, 2010 at 12:18 AM, Geert-Jan Brits gbr...@gmail.com wrote: With the usecase you specified it should work to just index each Row as you described in your initial post to be a seperate document. This way p_value and p_type all get singlevalued and you get a correct combination of p_value and p_type. However, this may not go so well with other use-cases you have in mind, e.g.: requiring that no multiple results are returned with the same document id. 2010/7/23 Pramod Goyal pramod.go...@gmail.com I want to do that. But if i understand correctly in solr it would store the field like this: p_value: Pramod Raj p_type: Client Supplier When i search p_value:Pramod AND p_type:Supplier it would give me result as document 1. Which is incorrect, since in document 1 Pramod is a Client and not a Supplier. On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin knagelb...@globeandmail.com wrote: I think you just want something like: p_value:Pramod AND p_type:Supplier no? -Kallin Nagelberg -Original Message- From: Pramod Goyal [mailto:pramod.go...@gmail.com] Sent: Friday, July 23, 2010 2:17 PM To: solr-user@lucene.apache.org Subject: help with a schema design problem Hi, Lets say i have table with 3 columns document id Party Value and Party Type. In this table i have 3 rows. 1st row Document id: 1 Party Value: Pramod Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party Type: Supplier. 
3rd row Document id:2 Party Value: Pramod Party Type: Supplier. Now in this table if i use SQL its easy for me find all document with Party Value as Pramod and Party Type as Client. I need to design solr schema so that i can do the same in Solr. If i create 2 fields in solr schema Party value and Party type both of them multi valued and try to query +Pramod +Supplier then solr will return me the first document, even though in the first document Pramod is a client and not a supplier Thanks, Pramod Goyal
Re: a bug of solr distributed search
On Fri, Jul 23, 2010 at 2:40 PM, MitchK mitc...@web.de wrote: That only works if the docs are exactly the same - they may not be. Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same, don't they? Documents aren't supposed to be duplicated across shards... so the presence of multiple docs with the same id is a bug anyway. We've chosen to try and handle it gracefully rather than fail hard. Some people have treated this as a feature - and that's OK as long as expectations are set appropriately. -Yonik http://www.lucidimagination.com
Re: Sort by index order desc?
Looks like you can sort by _docid_ to get things in index order or reverse index order: ?sort=_docid_+asc (or ?sort=_docid_+desc for reverse index order). thank you solr! On Fri, Jul 23, 2010 at 2:23 PM, Ryan McKinley ryan...@gmail.com wrote: Any pointers on how to sort by reverse index order? http://search.lucidimagination.com/search/document/4a59ded3966271ca/sort_by_index_order_desc it seems like it should be easy to do with the function query stuff, but i'm not sure what to sort by (unless I add a new field for indexed time) Any pointers? Thanks Ryan
Scoring Search for autocomplete
Hi, I have an autocomplete that is currently working with an NGramTokenizer, so if I search for "Yo", both "New York" and "Toyota" are valid results. However, I'm trying to figure out how to best implement the search so that, score-wise, a match at the beginning of an entire field ranks first, followed by the beginning of a term, and then the middle of a term. For example, if I were searching with "vi", I would want "Virginia" ahead of "West Virginia" ahead of "Five". I think I can do this with three separate fields: one using a whitespace tokenizer and an ngram filter, another using edge-ngram + whitespace, and another using keyword + edge-ngram, then doing an OR on the 3 fields, so that "Virginia" would match all 3 and get a higher score... but this doesn't feel right to me, so I wanted to check for better options. Thanks.
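The three-field approach described above is a common pattern. Here is what the whole-field-prefix variant (the keyword + edge-ngram field) could look like in schema.xml -- the type name and gram sizes are illustrative; the other two fields are built analogously and combined at query time, e.g. with dismax qf boosts like prefix_full^10 prefix_term^5 ngram^1:

```xml
<fieldType name="prefix_full" class="solr.TextField">
  <analyzer type="index">
    <!-- keep the whole value as one token, then emit all its prefixes -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- at query time, match the raw lowercased user input against the prefixes -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With "vi" as input, "Virginia" matches the prefix_full field (highest boost), "West Virginia" only the per-term edge-ngram field, and "Five" only the plain ngram field, giving the ranking asked for.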
RE: filter query on timestamp slowing query???
: On top of using trie dates, you might consider separating the timestamp : portion and the type portion of the fq into separate fq parameters -- : that will allow them to be stored in the filter cache separately. So : for instance, if you include "type:x OR type:y" in queries a lot, but : with different date ranges, then when you make a new query, the set for : "type:x OR type:y" can be pulled from the filter cache and intersected definitely ... that's the one big thing that jumped out at me once you showed us *how* you were constructing these queries. -Hoss
Re: help with a schema design problem
Is there any way in solr to say p_value[someIndex]=pramod And p_type[someIndex]=client. No, I'm 99% sure there is not. One way would be to define a single field in the schema as p_value_type = client pramod i.e. combine the value from both the field and store it in a single field. yep, for the use-case you mentioned that would definitely work. Multivalued of course, so it can contain Supplier Raj as well. 2010/7/23 Pramod Goyal pramod.go...@gmail.com In my case the document id is the unique key( each row is not a unique document ) . So a single document has multiple Party Value and Party Type. Hence i need to define both Party value and Party type as mutli-valued. Is there any way in solr to say p_value[someIndex]=pramod And p_type[someIndex]=client. Is there any other way i can design my schema ? I have some solutions but none seems to be a good solution. One way would be to define a single field in the schema as p_value_type = client pramod i.e. combine the value from both the field and store it in a single field. On Sat, Jul 24, 2010 at 12:18 AM, Geert-Jan Brits gbr...@gmail.com wrote: With the usecase you specified it should work to just index each Row as you described in your initial post to be a seperate document. This way p_value and p_type all get singlevalued and you get a correct combination of p_value and p_type. However, this may not go so well with other use-cases you have in mind, e.g.: requiring that no multiple results are returned with the same document id. 2010/7/23 Pramod Goyal pramod.go...@gmail.com I want to do that. But if i understand correctly in solr it would store the field like this: p_value: Pramod Raj p_type: Client Supplier When i search p_value:Pramod AND p_type:Supplier it would give me result as document 1. Which is incorrect, since in document 1 Pramod is a Client and not a Supplier. 
On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin knagelb...@globeandmail.com wrote: I think you just want something like: p_value:Pramod AND p_type:Supplier no? -Kallin Nagelberg -Original Message- From: Pramod Goyal [mailto:pramod.go...@gmail.com] Sent: Friday, July 23, 2010 2:17 PM To: solr-user@lucene.apache.org Subject: help with a schema design problem Hi, Lets say i have table with 3 columns document id Party Value and Party Type. In this table i have 3 rows. 1st row Document id: 1 Party Value: Pramod Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party Type: Supplier. 3rd row Document id:2 Party Value: Pramod Party Type: Supplier. Now in this table if i use SQL its easy for me find all document with Party Value as Pramod and Party Type as Client. I need to design solr schema so that i can do the same in Solr. If i create 2 fields in solr schema Party value and Party type both of them multi valued and try to query +Pramod +Supplier then solr will return me the first document, even though in the first document Pramod is a client and not a supplier Thanks, Pramod Goyal
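The combined-field workaround this thread converges on can be illustrated with a toy matcher. A Python sketch only -- field contents come from the thread's example, and matching is simplified to exact equality on the paired "type value" token, whereas in Solr each pair would be one value of a multivalued field:

```python
# One multivalued field whose values pair up type and value, so the
# cross-matching between rows (Pramod + Supplier on doc 1) can't happen.
docs = {
    1: {"p_value_type": ["client pramod", "supplier raj"]},
    2: {"p_value_type": ["supplier pramod"]},
}

def search(p_type, p_value):
    """Return doc ids whose combined field holds this exact type/value pair."""
    want = f"{p_type} {p_value}".lower()
    return [doc_id for doc_id, d in docs.items()
            if want in d["p_value_type"]]

print(search("Supplier", "Pramod"))  # [2] -- doc 1's Pramod is a Client
print(search("Client", "Pramod"))    # [1]
```

The price of this encoding is that you can no longer query value and type independently without wildcards, which is why the thread treats it as a compromise rather than a clean solution.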
RE: help with a schema design problem
> When I search p_value:Pramod AND p_type:Supplier it would give me
> document 1 as a result, which is incorrect, since in document 1 Pramod
> is a Client and not a Supplier.

Would it? I would expect it to give you nothing.

-Kal

-----Original Message-----
From: Geert-Jan Brits [mailto:gbr...@gmail.com]
Sent: Friday, July 23, 2010 5:05 PM
To: solr-user@lucene.apache.org
Subject: Re: help with a schema design problem

> Is there any way in solr to say p_value[someIndex]=pramod and
> p_type[someIndex]=client?

No, I'm 99% sure there is not.

> One way would be to define a single field in the schema as
> p_value_type = "client pramod", i.e. combine the values from both
> fields and store them in a single field.

Yep, for the use-case you mentioned that would definitely work.
Multivalued, of course, so it can contain "Supplier Raj" as well.

2010/7/23 Pramod Goyal pramod.go...@gmail.com

In my case the document id is the unique key (each row is not a unique
document), so a single document has multiple Party Values and Party
Types. Hence I need to define both Party Value and Party Type as
multi-valued. Is there any way in solr to say p_value[someIndex]=pramod
and p_type[someIndex]=client? Is there any other way I can design my
schema? I have some solutions, but none seems to be a good one. One way
would be to define a single field in the schema as p_value_type =
"client pramod", i.e. combine the values from both fields and store them
in a single field.

On Sat, Jul 24, 2010 at 12:18 AM, Geert-Jan Brits gbr...@gmail.com wrote:

With the use-case you specified, it should work to just index each row,
as you described in your initial post, as a separate document. This way
p_value and p_type each stay single-valued and you get a correct
combination of p_value and p_type. However, this may not go so well with
other use-cases you have in mind, e.g. requiring that no multiple
results are returned with the same document id.

2010/7/23 Pramod Goyal pramod.go...@gmail.com

I want to do that. But if I understand correctly, solr would store the
fields like this: p_value: Pramod Raj; p_type: Client Supplier. When I
search p_value:Pramod AND p_type:Supplier it would give me document 1 as
a result, which is incorrect, since in document 1 Pramod is a Client and
not a Supplier.

On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin
knagelb...@globeandmail.com wrote:

I think you just want something like p_value:Pramod AND p_type:Supplier,
no?

-Kallin Nagelberg

-----Original Message-----
From: Pramod Goyal [mailto:pramod.go...@gmail.com]
Sent: Friday, July 23, 2010 2:17 PM
To: solr-user@lucene.apache.org
Subject: help with a schema design problem

Hi,
Let's say I have a table with 3 columns: document id, Party Value and
Party Type. In this table I have 3 rows.
1st row: Document id: 1, Party Value: Pramod, Party Type: Client.
2nd row: Document id: 1, Party Value: Raj, Party Type: Supplier.
3rd row: Document id: 2, Party Value: Pramod, Party Type: Supplier.
With this table, in SQL it is easy to find all documents with Party
Value "Pramod" and Party Type "Client". I need to design a solr schema
so that I can do the same in Solr. If I create 2 fields in the solr
schema, Party Value and Party Type, both multi-valued, and query
+Pramod +Supplier, then solr will return the first document, even
though in the first document Pramod is a Client and not a Supplier.

Thanks,
Pramod Goyal
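The combined-field workaround discussed in this thread can be sketched outside Solr. This is plain Python, not Solr API code; the function names are made up for illustration, and the `p_value_type` field name follows the thread's example:

```python
# Sketch of the combined-field workaround: each row contributes one
# "type value" token to a single multivalued field, so a type and a
# value can only match together, never across different rows.

def build_doc(rows):
    """rows: (doc_id, party_value, party_type) tuples sharing one doc id."""
    doc = {"p_value_type": []}
    for doc_id, value, ptype in rows:
        doc["id"] = doc_id
        doc["p_value_type"].append(f"{ptype} {value}")
    return doc

def matches(doc, ptype, value):
    """Analog of the phrase query p_value_type:"Client Pramod"."""
    return f"{ptype} {value}" in doc["p_value_type"]

doc1 = build_doc([(1, "Pramod", "Client"), (1, "Raj", "Supplier")])
print(matches(doc1, "Client", "Pramod"))    # True
print(matches(doc1, "Supplier", "Pramod"))  # False: no cross-row match
```

In Solr itself the same effect would come from concatenating the two column values at index time and querying the combined field with a phrase query.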
Re: filter query on timestamp slowing query???
Just wanted to mention a possible other route, which might be entirely
hypothetical :-)

*If* you could query on the internal docid (I'm not sure that it's
available out-of-the-box, or if you can at all), your original problem,
quoted below, could imo be simplified to asking for the last docid
inserted (that matches the other criteria from your use-case) and, in
the next call, filtering from that docid forward.

> Every 30 minutes, i ask the index what are the documents that were
> added to it, since the last time i queried it, that match a certain
> criteria. From time to time, once a week or so, i ask the index for
> ALL the documents that match that criteria. (i also do this for not
> only one query, but several) This is why i need the timestamp filter.

Again, I'm not entirely sure that querying / filtering on internal
docids is possible (perhaps someone can comment), but if it is, it would
perhaps be more performant. Big IF, I know.

Geert-Jan

2010/7/23 Chris Hostetter hossman_luc...@fucit.org

: On top of using trie dates, you might consider separating the timestamp
: portion and the type portion of the fq into separate fq parameters --
: that will allow them to be stored in the filter cache separately. So
: for instance, if you include type:x OR type:y in queries a lot, but
: with different date ranges, then when you make a new query, the set for
: type:x OR type:y can be pulled from the filter cache and intersected

definitely ... that's the one big thing that jumped out at me once you
showed us *how* you were constructing these queries.

-Hoss
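The suggestion to separate the filters amounts to sending two `fq` parameters instead of one combined clause. A sketch of the difference in request construction (`q` and `fq` are real Solr parameters; the field names, values, and date range here are illustrative):

```python
from urllib.parse import urlencode

# One combined fq: cached as a single filterCache entry keyed on the
# whole string, so every new date range produces a brand-new entry.
combined = [("q", "*:*"),
            ("fq", "(type:x OR type:y) AND timestamp:[NOW-30MINUTES TO NOW]")]

# Separate fq parameters: each clause is cached independently, so the
# "type:x OR type:y" doc set is reused while only the date range varies.
separate = [("q", "*:*"),
            ("fq", "type:x OR type:y"),
            ("fq", "timestamp:[NOW-30MINUTES TO NOW]")]

print(urlencode(combined))
print(urlencode(separate))
```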
RE: Novice seeking help to change filters to search without diacritics
Hi Steve, This is extremely helpful! What is the best way to also preserve/append the diacritics in the index in case someone searches using them? I deeply appreciate your help! -- View this message in context: http://lucene.472066.n3.nabble.com/Novice-seeking-help-to-change-filters-to-search-without-diacritics-tp971263p990949.html Sent from the Solr - User mailing list archive at Nabble.com.
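One common answer to this question is to index both the folded and the original form of each term, so searches with or without diacritics match. In Solr this is normally done in the analysis chain (e.g. a folding filter on one field plus a `copyField` that keeps the original); the rough idea can be sketched in Python. This simulates what such a filter does and is not Solr's actual filter code:

```python
import unicodedata

def fold(term):
    """Strip combining marks: a rough analog of an ASCII-folding filter."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def index_tokens(term):
    """Emit the folded form plus, when it differs, the original form,
    so queries with or without diacritics both match the document."""
    folded = fold(term)
    return [folded] if folded == term else [folded, term]

print(index_tokens("café"))   # ['cafe', 'café']
print(index_tokens("plain"))  # ['plain']
```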
Re: help with a schema design problem
Multiple rows in the OP's example are combined to form one Solr document
(e.g. rows 1 and 2 both have documentid=1). Because of this combining, it
would match p_value from row 1 with p_type from row 2 (or vice versa).

2010/7/23 Nagelberg, Kallin knagelb...@globeandmail.com

> > When i search p_value:Pramod AND p_type:Supplier it would give me
> > result as document 1. Which is incorrect, since in document 1 Pramod
> > is a Client and not a Supplier.
>
> Would it? I would expect it to give you nothing.
>
> -Kal
Re: commit is taking very very long time
> I am not sure why some commits take very long time.

Hmm... because it merges index segments. How large is your index?

> Also is there a way to reduce the time it takes?

You can disable the commit in the DIH call and use autoCommit instead.
It's kind of a hack because you postpone the commit operation and make it
async. Another option is to set optimize=false in the DIH call (it's true
by default). You can also try increasing the mergeFactor parameter, but
that would affect search performance.
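Putting both suggestions together, a DIH import request that defers the commit and skips the optimize could be built like this (`command`, `commit`, and `optimize` are real DataImportHandler request parameters; the host and handler path are illustrative):

```python
from urllib.parse import urlencode

# Hypothetical host and core path for a DIH full-import request.
base = "http://localhost:8983/solr/dataimport"
params = {
    "command": "full-import",
    "commit": "false",    # defer the commit; let autoCommit handle it
    "optimize": "false",  # skip the optimize, which is true by default
}
print(f"{base}?{urlencode(params)}")
```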
Re: 2 solr dataImport requests on a single core at the same time
> having multiple Request Handlers will not degrade the performance

IMO you shouldn't worry unless you have hundreds of them.
Re: commit is taking very very long time
On 7/23/10 5:59 PM, Alexey Serba wrote:

> Another option is to set optimize=false in the DIH call (it's true by
> default).

Ouch - that should really be changed then.

- Mark
Re: Performance issues when querying on large documents
Do you use highlighting?
(http://wiki.apache.org/solr/HighlightingParameters)
Try to disable it and compare performance.

On Fri, Jul 23, 2010 at 10:52 PM, ahammad ahmed.ham...@gmail.com wrote:

Hello,

I have an index with lots of different types of documents. One of those
types basically contains extracts of PDF docs. Some of those PDFs can
have 1000+ pages, so there is a lot of material to search through.

I am experiencing really terrible performance when querying. My whole
index has about 270k documents, but fewer than 1000 of those are the PDF
extracts. The slow querying occurs when I search only on those PDF
extracts (by specifying filters) and return 100 results. Returning 100
results definitely adds to the issue, but even cutting that down can be
slow.

Is there a way to improve querying with such large results? To give an
idea, querying for a single word can take a little over a minute, which
isn't really viable for an application that revolves around searching.
For now, I have limited the results to 20, which makes the query execute
in roughly 10-15 seconds. However, I would like to have the option of
returning 100 results.

Thanks a lot.

-- View this message in context: http://lucene.472066.n3.nabble.com/Performance-issues-when-querying-on-large-documents-tp990590p990590.html Sent from the Solr - User mailing list archive at Nabble.com.
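Highlighting re-reads and re-analyzes the stored text of each returned document, which on 1000-page extracts can dominate query time. A quick A/B test is to toggle the `hl` parameter on the same query (`hl` and `hl.fl` are real Solr parameters; the field names and filter here are illustrative):

```python
# The same query with and without highlighting, for timing comparison.
common = [("q", "body:contract"), ("fq", "doctype:pdf_extract"), ("rows", "20")]
with_hl = common + [("hl", "true"), ("hl.fl", "body")]       # highlight body field
without_hl = common + [("hl", "false")]                      # highlighting off

# Timing the two variants against the same index isolates the cost of
# highlighting from the cost of the search itself.
print(without_hl)
```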
Re: Autocommit not happening
For the sake of any future googlers, I'll report my own clueless but
thankfully brief struggle with autocommit. There are two parts to the
story: in Part One I realized my autoCommit config was not contained
within my updateHandler; in Part Two I realized I had typed "autocommit"
rather than "autoCommit".

--jay

On Fri, Jul 23, 2010 at 2:35 PM, John DeRosa jo...@ipstreet.com wrote:

> On Jul 23, 2010, at 9:37 AM, John DeRosa wrote:
>
>> Hi! I'm a Solr newbie, and I don't understand why autocommits aren't
>> happening in my Solr installation.
>
> [snip]
>
> Never mind... I have discovered my boneheaded mistake. It's so silly,
> I wish I could retract my question from the archives.
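For reference, the working shape Jay describes looks like this in solrconfig.xml: the element is camelCase `autoCommit` and must be nested inside `updateHandler` (the threshold values below are illustrative):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- camelCase "autoCommit", inside updateHandler -->
  <autoCommit>
    <maxDocs>1000</maxDocs>   <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime>  <!-- ...or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```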
Re: help with a schema design problem
: Is there any way in solr to say p_value[someIndex]=pramod
: And p_type[someIndex]=client.

: No, I'm 99% sure there is not.

It's possible in code, by utilizing positions and FieldMaskingSpanQuery...

http://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html

...but there is no QParser or RequestHandler with syntax for exposing it
to clients. It would have to be a custom plugin.

-Hoss
SOLR Memory Usage - Where does it go?
We have been having problems with SOLR on one project lately. Forgive me
for writing a novel here, but it's really important that we identify the
root cause of this issue.

SOLR is becoming unavailable at random intervals, and the problem appears
to be memory related. There are basically two ways it goes:

1) A straight-up OOM error, either from Java or sometimes from the kernel
itself.

2) Instead of throwing an OOM, the memory usage gets very high and then
drops precipitously (say, from 92% (of 20GB) down to 60%). Once the
memory usage is done dropping, SOLR seems to stop responding to requests
altogether.

It started out mostly being version #1 of the problem, but now we're
mostly seeing version #2... and it's getting more and more frequent. In
either scenario the servlet container (Jetty) needs to be restarted to
resume service.

The number of documents in the index is always going up. They are
relatively small in size (1K per piece max - mostly small numeric
strings, with 5 text fields (one each for 5 languages) that are rarely
more than 50-100 characters), and there are about 5 million of them at
the moment (adding around 1000 every day). The machine has 20 GB of RAM,
Xmx is set to 18GB, and SOLR is the only thing this machine / servlet
container does. There are a couple of other cores configured, but they
are minuscule in comparison (one with 20 docs, and two more with 1 doc
apiece). Eliminating these other cores does not seem to make any
significant impact.

This is with the SOLR 1.4.1 release, using the SOLR-236 patch that was
recently released to go with this version.
The patch was slightly modified in order to ensure that paging continued
to work properly - basically, an optimization that eliminated paging was
removed per the instructions in this comment:

https://issues.apache.org/jira/browse/SOLR-236?focusedCommentId=12867680&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12867680

I realize this is not ideal if you want to control memory usage, but the
design requirements of the project preclude us from eliminating either
collapsing or paging. It's also probably worth noting that these problems
did not start with version 1.4.1 or this version of the 236 patch - we
actually upgraded from 1.4 because it reportedly fixed some memory leaks,
hoping that would help solve this problem.

We have some test machines set up, and we have been testing out various
configuration changes. Watching the stats in the admin area, this is what
we've been able to figure out:

1) The fieldValueCache usage stays constant at 23 entries (one for each
faceted field), and takes up a total size of about 750MB altogether.

2) Lowering or just eliminating the filterCache and the queryResultCache
does not seem to have any serious impact - perhaps a difference of a few
percent at the start, but after prolonged usage the memory still goes up,
seemingly uncontrolled. It would appear the queryResultCache does not get
much usage anyway, and even though we have higher eviction rates in the
filterCache, this really doesn't seem to impact performance
significantly.

3) Lowering or eliminating the documentCache also doesn't seem to have
much impact on memory usage, although it does make searches much slower.

4) We followed the instructions for configuring the HashDocSet parameter,
but this doesn't seem to be having much impact either.

5) All the caches, with the exception of the documentCache, are
FastLRUCaches. Switching between FastLRUCache and normal LRUCache in
general doesn't seem to change the memory usage.
6) Glancing through all of the data on memory usage in the Lucene
fieldCache would indicate that this cache is using well under 1GB of RAM
as well.

Basically, when the servlet first starts, it uses very little RAM (4%).
We warm the searcher with a few standard queries that initialize
everything in the fieldValueCache off the bat, and query performance
levels off at a reasonable speed, with memory usage around 10-12%. At
this point, almost all queries execute within a few hundred ms, if not
faster. The very few queries that return large numbers of collapsed
documents, generally 800K up to about 2 million (we have about 5 distinct
queries that do this), will take up to 20 seconds to run the first time,
and up to 10 seconds thereafter. Even after running all these queries,
memory usage stays around 20-30%. At this point, performance is optimal.

We simulate production usage, running queries taken from those logs
through the system at a rate similar to production use. For the most
part, memory usage stays level. Usage will go up as queries are run (this
seems to correspond with when they are being collapsed), but then go back
down as the results are returned. Then, over the course of a few hours,
at seemingly random
Re: Autocommit not happening
I'll see you, and raise. My solrconfig.xml wasn't being copied to the
server by the deployment script.

On Jul 23, 2010, at 3:26 PM, Jay Luker wrote:

> For the sake of any future googlers I'll report my own clueless but
> thankfully brief struggle with autocommit. There are two parts to the
> story: Part One is where I realize my autoCommit config was not
> contained within my updateHandler. In Part Two I realized I had typed
> autocommit rather than autoCommit.
>
> --jay
>
> On Fri, Jul 23, 2010 at 2:35 PM, John DeRosa jo...@ipstreet.com wrote:
>
>> On Jul 23, 2010, at 9:37 AM, John DeRosa wrote:
>>
>>> Hi! I'm a Solr newbie, and I don't understand why autocommits aren't
>>> happening in my Solr installation.
>>
>> [snip]
>>
>> Never mind... I have discovered my boneheaded mistake. It's so silly,
>> I wish I could retract my question from the archives.