RE: Workaround needed to sort on Multivalued fields indexed in SOLR
How are you hoping that sort will work on a multivalued field? Normally, trying to do this makes no sense. For example, if you have two authors for a document:

Smith, John
Jones, Joe

Then would you expect the document to sort under 'S' for Smith, or 'J' for Jones? There's probably not a specific rule to choose one or the other, at least not in a generic sense.

If you wanted (for example) to be able to sort by the first author, then you could index just the first author in a separate, non-multivalued field, purely for the sort (while still having all the authors in your multivalued field).

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com
Join the conversation: Like us on Facebook! Follow us on Twitter!

-----Original Message-----
From: Anupam Bhattacharya [mailto:anupam...@gmail.com]
Sent: Thursday, May 17, 2012 1:13 AM
To: solr-user@lucene.apache.org
Subject: Workaround needed to sort on Multivalued fields indexed in SOLR

I have indexed many documents which have a field for authors which is multivalued:

<field name="authors" type="string" indexed="true" stored="true" multiValued="true"/>

How can I sort (order by) on this kind of multivalued field? Please suggest any workaround.

Thanks
Anupam
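The workaround Bob describes - feeding a separate single-valued field purely for sorting - can be prepared on the client side before documents are posted. A minimal plain-Java sketch (the lowercasing and the empty-string fallback are assumptions, not from the thread):

```java
import java.util.Arrays;
import java.util.List;

public class SortFieldBuilder {
    // Derive a single-valued sort key from a multivalued authors field.
    // Here we simply take the first author; a real indexing feed might
    // also strip punctuation for more consistent collation.
    public static String firstAuthorSortKey(List<String> authors) {
        if (authors == null || authors.isEmpty()) {
            return ""; // sorts before any real name; adjust to taste
        }
        return authors.get(0).trim().toLowerCase();
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("Smith, John", "Jones, Joe");
        System.out.println(firstAuthorSortKey(authors)); // smith, john
    }
}
```

The derived value would be written into the single-valued sort field alongside the untouched multivalued authors field.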
RE: Does Solr fit my needs?
Without speaking directly to the indexing and searching of the specific fields, it is certainly possible to retrieve the xml file. While Solr isn't a DB, it does allow a binary field to be associated with an index document. We store a GZipped XML file in a binary field and retrieve that under certain conditions to get at original document information. We've found that Solr can serve these much faster than our DB can. (We regularly reindex a large portion of our documents, and the XML files are prone to frequent changes.) If you DO keep such a blob in your Solr index, make sure you retrieve that field ONLY when you really want it...

Now - if your XML files are relatively static (i.e. only change rarely, or only have new ones) then it still might make sense to use a real DB to store those, and just keep the primary key to the DB row in the Solr index.

Bob Sandiford | Lead Software Engineer | SirsiDynix
Register for the 2012 COSUGI User Group Conference today for early bird pricing! May 2-5 at Disney's Coronado Springs Resort - Lake Buena Vista, Florida

-----Original Message-----
From: G.Long [mailto:jde...@gmail.com]
Sent: Friday, April 27, 2012 10:32 AM
To: solr-user@lucene.apache.org
Subject: Does Solr fit my needs?

Hi there :)

I'm looking for a way to save xml files into some sort of database and I'm wondering if Solr would fit my needs. The xml files I want to save have a lot of child nodes which also contain child nodes with multiple values. The depth level can be more than 10. After having indexed the files, I would like to be able to query for subparts of those xml files and be able to reconstruct them as xml files with all their children included. However, I'm wondering if it is possible with an index like solr lucene to keep or easily recover the structure of my xml data?

Thanks for your help,
Regards,
Gary
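The GZipped-XML-in-a-binary-field round trip Bob describes can be done with the JDK alone. A sketch (class and method names are hypothetical, not from the thread):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class XmlBlobCodec {
    // Compress an XML document to bytes suitable for storing in a
    // Solr binary field.
    public static byte[] gzip(String xml) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Recover the original XML from the stored bytes at retrieval time.
    public static String gunzip(byte[] blob) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(blob))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

The compressed bytes would go into the binary field at index time, and `gunzip` reverses it when the stored field is fetched.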
RE: UTF-8 encoding
Hi, Henri.

Make sure that the container in which you are running Solr is also set for UTF-8. For example, in Tomcat, in the server.xml file, your Connector definitions should include: URIEncoding="UTF-8"

Bob Sandiford | Lead Software Engineer | SirsiDynix

-----Original Message-----
From: henri.gour...@laposte.net [mailto:henri.gour...@laposte.net]
Sent: Thursday, March 29, 2012 10:42 AM
To: solr-user@lucene.apache.org
Subject: UTF-8 encoding

I can't get UTF-8 encoding to work! I have

<str name="v.contentType">text/html;charset=UTF-8</str>

in my request handler, and

input.encoding=UTF-8
output.encoding=UTF-8

in velocity.properties, in various locations (I may have the wrong ones! at least in the folder where the .vm files reside). What else should I be doing/configuring?

Thanks
Henri
--
View this message in context: http://lucene.472066.n3.nabble.com/UTF-8-encoding-tp3867885p3867885.html
Sent from the Solr - User mailing list archive at Nabble.com.
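On the client side, the matching half of the contract is to percent-encode query parameters as UTF-8 before they reach the container; if Tomcat then decodes the URI as ISO-8859-1 (its old default, fixed by URIEncoding="UTF-8" on the Connector), multi-byte characters get mangled. A small JDK-only illustration (the base URL is a placeholder):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryEncoder {
    // Build a Solr query URL with the q parameter percent-encoded as UTF-8.
    // The container must decode the URI with the same charset, or the
    // bytes are reassembled into the wrong characters.
    public static String buildQueryUrl(String base, String q)
            throws UnsupportedEncodingException {
        return base + "?q=" + URLEncoder.encode(q, "UTF-8");
    }
}
```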
RE: Best Solr escaping?
I won't guarantee this is the 'best algorithm', but here's what we use. (This is in a final class with only static helper methods):

// Set of characters / Strings SOLR treats as having special meaning in a query, and the corresponding escaped versions.
// Note that the actual operators '&&' and '||' don't show up here - we'll just escape the characters '&' and '|' wherever they occur.
private static final String[] SOLR_SPECIAL_CHARACTERS =
    new String[] {"+", "-", "&", "|", "!", "(", ")", "{", "}", "[", "]", "^", "\"", "~", "*", "?", ":", "\\"};
private static final String[] SOLR_REPLACEMENT_CHARACTERS =
    new String[] {"\\+", "\\-", "\\&", "\\|", "\\!", "\\(", "\\)", "\\{", "\\}", "\\[", "\\]", "\\^", "\\\"", "\\~", "\\*", "\\?", "\\:", ""};

/**
 * Escapes all special characters from the search terms, so they don't get confused with
 * the Solr query language special characters.
 * @param value - search term to escape
 * @return - escaped search value, suitable for a Solr q parameter
 */
public static String escapeSolrCharacters(String value) {
    return StringUtils.replaceEach(value, SOLR_SPECIAL_CHARACTERS, SOLR_REPLACEMENT_CHARACTERS);
}

Bob Sandiford | Lead Software Engineer | SirsiDynix

-----Original Message-----
From: Bill Bell [mailto:billnb...@gmail.com]
Sent: Sunday, September 25, 2011 12:22 AM
To: solr-user@lucene.apache.org
Subject: Best Solr escaping?

What is the best algorithm for escaping strings before sending to Solr? Does someone have some code? A few things I have witnessed in q using the DIH handler:

* Double quotes that are not balanced can cause several issues, from an error (strip the double quote?) to no results.
* Should we use + or %20, and what cases make sense: "Dr. Phil Smith" or Dr.+Phil+Smith or Dr.%20Phil%20Smith - also what is the impact of double quotes?
* Unmatched parentheses, i.e. opening ( and not closing: "(Dr. Holstein", Cardiologist+(Dr. Holstein
* Regular encoding of strings does not always work for the whole string due to several issues like white space: white space works better when we escape it with a backslash ("Bill\ Bell"), especially when using facets.

Thoughts? Code? Ideas? Better wikis?
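For readers without commons-lang on the classpath, the same escaping can be sketched with plain JDK string handling (note: unlike the snippet above, which strips a backslash, this version escapes it - adjust to taste):

```java
public class SolrEscaper {
    // Characters the Lucene query syntax treats as special. Escaping '&'
    // and '|' individually also neutralizes the '&&' and '||' operators.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    // Prefix every special character with a backslash so it is treated
    // as a literal in the q parameter.
    public static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```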
RE: select query does not find indexed pdf document
Um - looks like you specified your id value as "pdfy", which is reflected in the results from the *:* query, but your id query is searching for "vpn", hence no matches... What does this query yield?

http://www/SearchApp/select/?q=id:pdfy

Bob Sandiford | Lead Software Engineer | SirsiDynix

-----Original Message-----
From: Michael Dockery [mailto:dockeryjava...@yahoo.com]
Sent: Monday, September 12, 2011 9:56 AM
To: solr-user@lucene.apache.org
Subject: Re: select query does not find indexed pdf document

http://www/SearchApp/select/?q=id:vpn yields this:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">15</int>
    <lst name="params">
      <str name="q">id:vpn</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

http://www/SearchApp/select/?q=*:* yields this:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
    <lst name="params">
      <str name="q">*.*</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="author">doc</str>
      <arr name="content_type"><str>application/pdf</str></arr>
      <str name="id">pdfy</str>
      <date name="last_modified">2011-05-20T02:08:48Z</date>
      <arr name="title"><str>dmvpndeploy.pdf</str></arr>
    </doc>
  </result>
</response>

From: Jan Høydahl jan@cominvent.com
To: solr-user@lucene.apache.org; Michael Dockery dockeryjava...@yahoo.com
Sent: Monday, September 12, 2011 4:59 AM
Subject: Re: select query does not find indexed pdf document

Hi,

What do you get from a query http://www/SearchApp/select/?q=*:* or http://www/SearchApp/select/?q=id:vpn ? You may not have mapped the fields correctly to your schema?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 12. sep. 2011, at 02:12, Michael Dockery wrote:

I am new to solr.
I tried to upload a pdf file via curl to my solr webapp (on tomcat):

curl "http://www/SearchApp/update/extract?stream.file=c:\dmvpn.pdf&stream.contentType=application/pdf&literal.id=pdfy&commit=true"

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">860</int></lst>
</response>

but http://www/SearchApp/select/?q=vpn does not find the document:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params"><str name="q">vpn</str></lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

help is appreciated.

=====

fyi I point my test webapp to the index/solr home via meta-data/context.xml:

<Context crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="c:/solr_home" override="true" />
</Context>

and I had to copy all these jars to my webapp lib dir (to avoid the ClassNotFound): Solr_download\contrib\extraction\lib ...in the future I plan to put them in the tomcat/lib dir. Also, I have not modified conf\solrconfig.xml or schema.xml.
RE: select query does not find indexed pdf document
Hi, Michael.

Well, the stock answer is, 'it depends'.

For example - would you want to be able to search filename without searching file contents, or would you always search both of them together? If both, then copy both the file name and the parsed file content from the pdf into a single search field, and you can set that up as the default search field.

Or - what kind of processing / normalizing do you want on this data? Case insensitive? Accent insensitive? If a 'word' contains camel case (e.g. TheVeryIdea), do you want that split on the case changes? (But then watch out for things like iPad.) If a 'word' contains numbers, do you want them left together, or separated? Do you want stemming (where searching for 'stemming' would also find 'stem', 'stemmed', that sort of thing)? Is this always English, or are other languages involved? Do you want the text processing to be the same for indexing vs searching? Do you want to be able to find hits based on the first few characters of a term (ngrams)? Do you want to be able to highlight text segments where the search terms were found?

Probably you want to read up on the various tokenizers and filters that are available. Do some prototyping and see how it looks. Here's a starting point: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Basically, there is no 'one size fits all' here. Part of the power of Solr / Lucene is its configurability to achieve the results your business case calls for. Part of the drawback of Solr / Lucene - especially for new folks - is its configurability to achieve the results your business case calls for. :)

Anyone got anything else to suggest for Michael?

Bob Sandiford | Lead Software Engineer | SirsiDynix

From: Michael Dockery [mailto:dockeryjava...@yahoo.com]
Sent: Monday, September 12, 2011 1:18 PM
To: Bob Sandiford
Subject: Re: select query does not find indexed pdf document

thank you.
that worked.

Any tips for a very basic setup of the schema.xml? Or is the default basic enough? I basically only want to search on filename and file contents.

From: Bob Sandiford bob.sandif...@sirsidynix.com
To: solr-user@lucene.apache.org; Michael Dockery dockeryjava...@yahoo.com
Sent: Monday, September 12, 2011 10:04 AM
Subject: RE: select query does not find indexed pdf document

Um - looks like you specified your id value as "pdfy", which is reflected in the results from the *:* query, but your id query is searching for "vpn", hence no matches... What does this query yield?

http://www/SearchApp/select/?q=id:pdfy
RE: Sentence aware Highlighter
What if you were to make your field a multi-valued field, and at indexing time, split up the text into sentences, putting each sentence into the solr document as one of the values for the mv field? Then I think the normal highlighting code can be used to pull the entire value (i.e. sentence) of a matching mv instance within your document. I.e. put the 'overhead' into the index step, rather than trying to do it at search time.

Bob Sandiford | Lead Software Engineer | SirsiDynix

-----Original Message-----
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
Sent: Monday, September 05, 2011 10:33 AM
To: solr-user@lucene.apache.org
Subject: Re: Sentence aware Highlighter

(11/09/05 23:09), O. Klein wrote:

Using the regex fragmenter in the old highlighter I had reasonable sentence aware highlighting, but speed is an issue. So I tried to get this working with the FVH, but this obviously didn't work with the regex. So I am looking for ways to get the same behavior but with improved speed, and came across https://issues.apache.org/jira/browse/LUCENE-1824, which at least would be a small improvement, but the last comment confused me, as I thought the FVH was going to be the new highlighter for Solr. So this patch would make some sense if I'm not mistaken. Nonetheless, has anyone managed to make something like a SentenceAwareFragmentsBuilder? Or have some advice on how to realise this?

Sorry for the long delay on the issue! I'd like to take a look into it this week. Hopefully, BreakIterator may be used, which Robert mentioned in the JIRA. Thank you for your patience!

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/
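The index-time sentence splitting Bob suggests can be prototyped with the JDK's BreakIterator - the same class Koji mentions considering for the highlighter itself. A sketch (locale handling and trimming are assumptions):

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitter {
    // Split text into sentences so each one can be indexed as a separate
    // value of a multivalued field; highlighting a matching value then
    // returns a whole sentence.
    public static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }
}
```

Each returned sentence would be added as one value of the multivalued field when building the Solr document.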
RE: performance crossover between single index and sharding
Dumb question time - you are using a 64 bit Java, and not a 32 bit Java?

Bob Sandiford | Lead Software Engineer | SirsiDynix

-----Original Message-----
From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
Sent: Thursday, August 04, 2011 2:39 AM
To: solr-user@lucene.apache.org
Subject: Re: performance crossover between single index and sharding

Hi Shawn,

the 0.05 seconds for search time at peak times (3 qps) is my target for Solr. The numbers for Solr are from Solr's statistics report page. So 39.5 seconds average per request is definitely too long and I have to change to sharding.

For the FAST system the numbers for the search dispatcher are:
0.042 sec elapsed per normal search, on avg.
0.053 sec average uncached normal search time (last 100 queries).
99.898% of searches using <= 1 sec
99.999% of searches using <= 3 sec
0.000% of all requests timed out
22454567.577 sec time up (that is 259 days)

Is there a report page for those numbers for Solr?

About the RAM, the 32GB RAM are physical for each VM and the 20GB RAM are -Xmx for Java. Yesterday I noticed that we are running out of heap during replication, so I have to increase -Xmx to about 22g.

The reported 0.6 average requests per second seems right to me because the Solr system isn't under full load yet. The FAST system is still taking most of the load. I plan to switch completely to Solr after sharding is up and running stable. So there will be an additional 3 qps to Solr at peak times.

I don't know if a controlling master like FAST makes any sense for Solr. The small VMs with heartbeat and haproxy sound great; that must go on my todo list. But the biggest problem currently is how to configure the DIH to split up the content to several indexers. Is there an indexing distributor?

Regards, Bernd

On 03.08.2011 16:33, Shawn Heisey wrote:

Replies inline.
On 8/3/2011 2:24 AM, Bernd Fehling wrote:

To show that I compare apples and oranges, here is my previous FAST Search setup:
- one master server (controlling, logging, search dispatcher)
- six index servers (4.25 mio docs per server, 5 slices per index) (searching and indexing at the same time, indexing once per week during the weekend)
- each server has 4GB RAM, all servers are physical on separate machines
- RAM usage controlled by the processes
- total of 25.5 mio docs (mainly metadata) from 1500 databases worldwide
- index size is about 67GB per indexer -- about 402GB total
- about 3 qps at peak times
- with average search time of 0.05 seconds at peak times

An average query time of 50 milliseconds isn't too bad. If the number from your Solr setup below (39.5) is the QTime, then Solr thinks it is performing better, but Solr's QTime does not include absolutely everything that has to happen. Do you by chance have 95th and 99th percentile query times for either system?

And here is now my current Solr setup:
- one master server (indexing only)
- two slave servers (search only), but only one is online; the second is fallback
- each server has 32GB RAM, all servers are virtual (master on a separate physical machine, both slaves together on a physical machine)
- RAM usage is currently 20GB for the java heap
- total of 31 mio docs (all metadata) from 2000 databases worldwide
- index size is 156GB total
- search handler statistics report 0.6 average requests per second
- average time per request 39.5 (is that seconds?)
- building the index from scratch takes about 20 hours

I can't tell whether you mean that each physical host has 32GB or each VM has 32GB. You want to be sure that you are not oversubscribing your memory. If you can get more memory in your machines, you really should. Do you know whether that 0.6 seconds is most of the delay that a user sees when making a search request, or are there other things going on that contribute more delay?
In our webapp, the Solr request time is usually small compared with everything else the server and the user's browser are doing to render the results page. As much as I hate being the tall pole in the tent, I look forward to the day when the developers can change that balance.

The good thing is I have the ability to compare a commercial enterprise product to open source. I started with my simple Solr setup because of KISS (keep it simple and stupid). Actually it is doing excellently as a single index on a single virtual server. But the average time per request should be reduced now; that's why I started this discussion. While searches with a smaller Solr index size (3 mio docs) showed that it can stand with FAST Search, it now shows that it's time to go with sharding. I think we are already far behind the point of search performance crossover.

What I hope to get with sharding:
- reduce time
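Shawn asks for 95th and 99th percentile query times; if only raw response times are logged, the nearest-rank percentile can be computed in a few lines of Java (a generic sketch, not part of either system's reporting):

```java
import java.util.Arrays;

public class LatencyStats {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are <= it. The input array is not modified.
    public static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

Feeding it the per-request QTimes (or, better, full client-observed latencies) gives the tail figures that averages hide.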
RE: previous and next rows of current record
Well, it sort of depends on what you mean by the 'previous' and the 'next' record. Do you have some type of sequencing built into your concept of your solr / lucene indexes? Do you have sequential ids? I.e. what's the use case, and what's the data available to support your use case?

Bob Sandiford | Lead Software Engineer | SirsiDynix

-----Original Message-----
From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
Sent: Thursday, July 21, 2011 2:18 PM
To: solr-user@lucene.apache.org
Subject: Re: previous and next rows of current record

Please help...

On Thursday, July 21, 2011, Jonty Rhods jonty.rh...@gmail.com wrote:

Hi,

Is there any special query in solr to get the previous and next record of the current record? I am getting a single record's detail using its id from the solr server. I need to get the next and previous records on the detail page.

regards, Jonty
RE: previous and next rows of current record
But - what is it that makes '9' the next id after '5'? Why not '6'? Or '91238412'? Or '4'? I.e. you still haven't answered the question about what 'next' and 'previous' really mean... But - if you already know that '9' is the next page, why not just do another query with id '9' to get the next record?

Bob Sandiford | Lead Software Engineer | SirsiDynix

-----Original Message-----
From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
Sent: Thursday, July 21, 2011 2:33 PM
To: solr-user@lucene.apache.org
Subject: Re: previous and next rows of current record

Hi,

In my case there is no id sequence. Ids are generated sequentially across all categories, but when we filter by category the ids within that category become effectively random. If I'm on a detail page with id 5 and the next id (in the filtered list) is 9, then on that same page my requirement is to get that next id, 9.

On Thursday, July 21, 2011, Bob Sandiford bob.sandif...@sirsidynix.com wrote:

Well, it sort of depends on what you mean by the 'previous' and the 'next' record. Do you have some type of sequencing built into your concept of your solr / lucene indexes? Do you have sequential ids? I.e. what's the use case, and what's the data available to support your use case?
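If 'next' simply means the neighbouring document in the currently filtered, sorted result list, one client-side option is to keep the ordered id list for the query and look up neighbours in it. A sketch (the list-based lookup is an assumption; a real app might instead re-query with rows=1 at an offset just above or below the current position):

```java
import java.util.List;
import java.util.Optional;

public class Pager {
    // Given the ordered list of ids for the current query/filter and the
    // id currently being viewed, return the following id, if any.
    public static Optional<String> next(List<String> orderedIds, String currentId) {
        int i = orderedIds.indexOf(currentId);
        return (i >= 0 && i + 1 < orderedIds.size())
                ? Optional.of(orderedIds.get(i + 1)) : Optional.empty();
    }

    // And the preceding id, if any.
    public static Optional<String> previous(List<String> orderedIds, String currentId) {
        int i = orderedIds.indexOf(currentId);
        return (i > 0) ? Optional.of(orderedIds.get(i - 1)) : Optional.empty();
    }
}
```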
RE: Restricting the Solr Posting List (retrieved set)
A good answer may also depend on WHY you are wanting to restrict to 500K documents. Are you seeking to reduce the time spent by Solr in determining the doc count? Are you just wanting to prevent people from moving too far into the result set? Is it the case that you can only display 6 digits for your return count? :)

If Solr is performing adequately, you could always just artificially restrict the result set. Solr doesn't actually 'return' all 5M documents - it only returns the number you have specified in your query (as well as having some cache for the next results in anticipation of a subsequent query). So, if the total count returned exceeds 500K, then just report 500K as the number of results, and similarly restrict how far a user can page through the results... (And - you can (and it sounds like you should) sort your results by descending post date so that you do in fact get the most recent ones coming back first...)

Bob Sandiford | Lead Software Engineer | SirsiDynix

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Monday, July 11, 2011 7:43 AM
To: solr-user@lucene.apache.org
Subject: Re: Restricting the Solr Posting List (retrieved set)

We want to search in an index in such a way that even if a clause has a long posting list, Solr should stop collecting documents for the clause after receiving X documents that match the clause. For example, if for the query "India" Solr can return 5M documents, we would like to restrict the set to only 500K documents. The assumption is that since we are posting chronologically, we would like the X most recent documents to be matched for the clause only. Is it possible anyway?

Looks like your use-case is suitable for time based sharding.
http://wiki.apache.org/solr/DistributedSearch

Let's say you divide your shards according to months. You will have a separate core for each month.
http://wiki.apache.org/solr/CoreAdmin

When a query comes in, you will hit the most recent core. If you don't obtain enough results, add a new value (the previous month's core) to the shards= parameter.
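Bob's suggestion of artificially capping the reported count needs only a clamp on the client side. A sketch (the 500K constant mirrors the example in the thread; the paging helper is an assumption):

```java
public class ResultCap {
    private static final long MAX_REPORTED = 500_000L;

    // Report at most 500K hits even when Solr's numFound is larger.
    public static long reportedCount(long numFound) {
        return Math.min(numFound, MAX_REPORTED);
    }

    // Clamp how far a user may page: the last permissible start offset
    // for a given page size, so paging never passes the reported cap.
    public static long maxStartOffset(long rowsPerPage) {
        return MAX_REPORTED - rowsPerPage;
    }
}
```

Combined with sorting by descending post date, the user sees the most recent documents first and never pages past the cap.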
RE: updating existing data in index vs inserting new data in index
What are you using as the unique id in your Solr index? It sounds like you may have one value as your Solr index unique id, which bears no resemblance to a unique[1] id derived from your data... Or - another way to put it - what is it that makes these two records in your Solr index 'the same', and what are the unique ids for those two entries in the Solr index? How are those ids related to your original data?

[1] Not only unique, but immutable. I.e. if you update a row in your database, the unique id derived from that row has to be the same as it would have been before the update. Otherwise, there's nothing for Solr to recognize as a duplicate entry, and do a 'delete' and 'insert' instead of just an 'insert'.

Bob Sandiford | Lead Software Engineer | SirsiDynix

-----Original Message-----
From: Mark juszczec [mailto:mark.juszc...@gmail.com]
Sent: Thursday, July 07, 2011 9:15 AM
To: solr-user@lucene.apache.org
Subject: updating existing data in index vs inserting new data in index

Hello all

I'm using Solr 3.2 and am confused about updating existing data in an index.

According to the DataImportHandler wiki:

*delta-import*: For incremental imports and change detection run the command `http://host:port/solr/dataimport?command=delta-import . It supports the same clean, commit, optimize and debug parameters as the full-import command.

I know delta-import will find new data in the database and insert it into the index. My problem is how it handles updates, where I've got a record that exists in the index and the database, the database record is changed, and I want to incorporate those changes in the existing record in the index. IOW I don't want to insert it again. I've tried this and wound up with 2 records with the same key in the index. The first contains the original db values found when the index was created; the 2nd contains the db values after the record was changed.
I've also found this:
http://search.lucidimagination.com/search/out?u=http%3A%2F%2Flucene.472066.n3.nabble.com%2FDelta-import-with-solrj-client-tp1085763p1086173.html
The subject is 'Delta-import with solrj client':

Greetings. I have a *solrj* client for fetching data from a database. I am using *delta*-*import* for fetching data. If a column is changed in the database, using a timestamp with *delta*-*import* I get the latest column indexed, but there are *duplicate* values in the index similar to the column, where the data is older. This works when cleaning the index, but I want to update the index without cleaning it. Is there a way to just update the index with the updated column without having *duplicate* values? Appreciate any feedback. Hando

There are 2 responses:

Short answer is no, there isn't a way. *Solr* doesn't have the concept of 'Update' to an indexed document. You need to add the full document (all 'columns') each time any one field changes. If doing that in your DataImportHandler logic is difficult you may need to write a separate Update Service that does:
1) Read UniqueID, UpdatedColumn(s) from database
2) Using UniqueID retrieve document from *Solr*
3) Add/Update field(s) with updated column(s)
4) Add document back to *Solr*
Although, if you use DIH to do a full *import*, using the same query in your *Delta*-*Import* to get the whole document shouldn't be that difficult.

and

Hi, Make sure you use a proper ID field, which does *not* change even if the content in the database changes. In this way, when your *delta*-*import* fetches changed rows to index, they will update the existing rows in your index.

I have an ID field that doesn't change. It is the primary key field from the database table I am trying to index and I have verified it is unique.

So, does Solr allow updates (not inserts) of existing records? Is anyone able to do this?

Mark
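The 'proper ID field' advice quoted above usually comes down to deriving the Solr unique id from something immutable in the source row, e.g. table name plus primary key. A trivial sketch (the separator and naming convention are assumptions):

```java
public class DocId {
    // Derive an immutable Solr unique id from the source table and its
    // primary key. Because this key never changes when the row is
    // updated, re-adding the document replaces the old copy in the
    // index instead of creating a duplicate.
    public static String of(String table, long primaryKey) {
        return table + ":" + primaryKey;
    }
}
```

The same value must be produced by both the full import and the delta import, or Solr has no way to match the new version of a document to the old one.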
RE: updating existing data in index vs inserting new data in index
Hi, Mark. I haven't used DIH myself - so I'll need to leave comments on your set up to others who have done so. Another question - after your initial index create (and after each delta), do you run a 'commit'? Do you run an 'optimize'? (Without the optimize, 'deleted' records still show up in query results...) Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Mark juszczec [mailto:mark.juszc...@gmail.com] Sent: Thursday, July 07, 2011 10:04 AM To: solr-user@lucene.apache.org Subject: Re: updating existing data in index vs inserting new data in index Bob Thanks very much for the reply! I am using a unique integer called order_id as the Solr index key. My query, deltaQuery and deltaImportQuery are below: entity name=item1 pk=ORDER_ID query=select 1 as TABLE_ID , orders.order_id, orders.order_booked_ind, orders.order_dt, orders.cancel_dt, orders.account_manager_id, orders.of_header_id, orders.order_status_lov_id, orders.order_type_id, orders.approved_discount_pct, orders.campaign_nm, orders.approved_by_cd,orders.advertiser_id, orders.agency_id from orders deltaImportQuery=select 1 as TABLE_ID, orders.order_id, orders.order_booked_ind, orders.order_dt, orders.cancel_dt, orders.account_manager_id, orders.of_header_id, orders.order_status_lov_id, orders.order_type_id, orders.approved_discount_pct, orders.campaign_nm, orders.approved_by_cd,orders.advertiser_id, orders.agency_id from orders where orders.order_id = '${dataimporter.delta.ORDER_ID}' deltaQuery=select orders.order_id from orders where orders.change_dt to_date('${dataimporter.last_index_time}','-MM-DD HH24:MI:SS') /entity The test I am running is two part: 1. After I do a full import of the index, I insert a brand new record (with a never existed before order_id) in the database. The delta import picks this up just fine. 2. 
After the full import, I modify a record with an order_id that already shows up in the index. I have verified there is only one record with this order_id in both the index and the db before I do the delta update. I guess the question is, am I screwing myself up by defining my own Solr index key? I want to, ultimately, be able to search on ORDER_ID in the Solr index. However, the docs say (I think) a field does not have to be the Solr primary key in order to be searchable. Would I be better off letting Solr manage the keys? Mark On Thu, Jul 7, 2011 at 9:24 AM, Bob Sandiford bob.sandif...@sirsidynix.comwrote: What are you using as the unique id in your Solr index? It sounds like you may have one value as your Solr index unique id, which bears no resemblance to a unique[1] id derived from your data... Or - another way to put it - what is it that makes these two records in your Solr index 'the same', and what are the unique id's for those two entries in the Solr index? How are those id's related to your original data? [1] not only unique, but immutable. I.E. if you update a row in your database, the unique id derived from that row has to be the same as it would have been before the update. Otherwise, there's nothing for Solr to recognize as a duplicate entry, and do a 'delete' and 'insert' instead of just an 'insert'. Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Mark juszczec [mailto:mark.juszc...@gmail.com] Sent: Thursday, July 07, 2011 9:15 AM To: solr-user@lucene.apache.org Subject: updating existing data in index vs inserting new data in index Hello all I'm using Solr 3.2 and am confused about updating existing data in an index. According to the DataImportHandler Wiki: *delta-import* : For incremental imports and change detection run the command `http://host:port/solr/dataimport?command=delta-import . 
It supports the same clean, commit, optimize and debug parameters as full-import command. I know delta-import will find new data in the database and insert it into the index. My problem is how it handles updates where I've got a record that exists in the index and the database, the database record is changed and I want to incorporate those changes in the existing record in the index. IOW I don't want to insert it again. I've tried this and wound up with 2 records with the same key in the index. The first contains the original db values found when the index was created, the 2nd contains the db values after the record was changed. I've also found this http://search.lucidimagination.com/search/out?u=http%3A%2F%2Flucene.4720 66.n3.nabble.com%2FDelta-import-with-solrj-client- tp1085763p1086173.html the subject is 'Delta-import
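A sketch of the schema.xml piece Bob's questions point at (field type and name assumed from the thread, not taken from the poster's actual schema): declaring ORDER_ID as the uniqueKey is what makes a re-imported row replace the existing document instead of producing a second document with the same key.

```xml
<!-- schema.xml sketch - assumes ORDER_ID is the unique, immutable key -->
<field name="ORDER_ID" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>ORDER_ID</uniqueKey>
```

Note that a field does not need to be the uniqueKey to be searchable; the uniqueKey only controls the delete-then-insert behavior when a document with the same key is re-added.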
Solr just 'hangs' under load test - ideas?
Hi, all. I'm hoping someone has some thoughts here. We're running Solr 3.1 (with the patch for SolrQueryParser.java to not do the getLuceneVersion() calls, but use luceneMatchVersion directly). We're running in a Tomcat instance, 64 bit Java. CATALINA_OPTS are: -Xmx7168m -Xms7168m -XX:MaxPermSize=256M We're running 2 Solr cores, with the same schema. We use SolrJ to run our searches from a Java app running in JBoss. JBoss, Tomcat, and the Solr Index folders are all on the same server. In case it's relevant, we're using JMeter as a load test harness. We're running on Solaris, a 16 processor box with 48GB physical memory. I've run a successful load test at a 100 user load (at that rate there are about 5-10 solr searches / second), and solr search responses were coming in under 100ms. When I tried to ramp up, as far as I can tell, Solr is just hanging. (We have some logging statements around the SolrJ calls - just before, we log how long our query construction takes, then we run the SolrJ query and log the search times. We're getting a number of the query construction logs, but no corresponding search time logs.) Symptoms: The Tomcat and JBoss processes show as well under 1% CPU, and they are still the top processes. CPU states show around 99% idle. RES usage for the two Java processes is around 3GB each. LWP under 120 for each. STATE just shows as sleep. JBoss is still 'alive', as I can get into a piece of software that talks to our JBoss app to get data. We set things up to use log4j logging for Solr - the log isn't showing any errors or exceptions. We're not indexing - just searching. Back in January, we did load testing on a prototype, and had no problems (though that was Solr 1.4 at the time). It ramped up beautifully - bottlenecks were our apps, not Solr. What I'm benchmarking now is a descendant of that prototype - a bit more complex on searches and more fields in the schema, but the same basic search logic as far as SolrJ usage. Any ideas? What else to look at? 
Ringing any bells? I can send more details if anyone wants specifics... Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.comhttp://www.sirsidynix.com/
RE: Solr just 'hangs' under load test - ideas?
OK - I figured it out. It's not solr at all (and I'm not really surprised). In the prototype benchmarks, we used a different instance of tomcat than we're using for production load tests. Our prototype tomcat instance had no maxThreads value set, so was using the default value of 200. The production tomcat environment has a maxThreads value of 15 - we were just running out of threads and getting connection refused exceptions thrown when we ramped up the Solr hits past a certain level. Thanks for considering, Yonik (and any others waiting to see any reply I made)... (As others have said - this listserv is great!) Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Wednesday, June 29, 2011 12:18 PM To: solr-user@lucene.apache.org Subject: Re: Solr just 'hangs' under load test - ideas? Can you get a thread dump to see what is hanging? -Yonik http://www.lucidimagination.com On Wed, Jun 29, 2011 at 11:45 AM, Bob Sandiford bob.sandif...@sirsidynix.com wrote: Hi, all. I'm hoping someone has some thoughts here. We're running Solr 3.1 (with the patch for SolrQueryParser.java to not do the getLuceneVersion() calls, but use luceneMatchVersion directly). We're running in a Tomcat instance, 64 bit Java. CATALINA_OPTS are: -Xmx7168m -Xms7168m -XX:MaxPermSize=256M We're running 2 Solr cores, with the same schema. We use SolrJ to run our searches from a Java app running in JBoss. JBoss, Tomcat, and the Solr Index folders are all on the same server. In case it's relevant, we're using JMeter as a load test harness. We're running on Solaris, a 16 processor box with 48GB physical memory. I've run a successful load test at a 100 user load (at that rate there are about 5-10 solr searches / second), and solr search responses were coming in under 100ms. 
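For anyone hitting the same wall: the ceiling Bob describes lives on the Connector element in Tomcat's conf/server.xml. A sketch, with port and timeout values purely illustrative (not taken from the thread):

```xml
<!-- conf/server.xml sketch: maxThreads caps concurrent request threads;
     the default is 200, and a low value refuses connections under load -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="200"
           connectionTimeout="20000"/>
```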
RE: MultiValued facet behavior question
the facet field name. 3) You'll want to read up on the MemoryIndex class to see more about how it works, rather than me re-iterating that here. [1] Caveats 1) We didn't do anything with the date type faceting, or with any ranges. 2) We didn't do anything with facet prefix handling - it may or may not work if you need prefixes. 3) Anything else that facets do that we didn't handle - or at least, didn't test :) As I say, it's a very special case for us, and this is in no way intended to be a general solution or fit for 'prime time' submission as a Solr enhancement. Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Bill Bell [mailto:billnb...@gmail.com] Sent: Wednesday, June 22, 2011 3:49 AM To: solr-user@lucene.apache.org Subject: Re: MultiValued facet behavior question You can type q=cardiology and match on cardiologist. If stemming did not work you can just add a synonym: cardiology,cardiologist But that is not the issue. The issue is around multiValued fields and facets. You would expect a user who is searching on the multiValued field to match on some values in there. For example, they type Cardiologist and it matches on the value Cardiologist. So it matches in the multiValued field. So that part works. Then when I output the facet, I need a different behavior than the default. I need the facet to only output the value that matches (scored) - NOT ALL VALUES in the multiValued field. I think it makes sense? On 6/22/11 1:42 AM, Michael Kuhlmann s...@kuli.org wrote: Am 22.06.2011 05:37, schrieb Bill Bell: It can get more complicated. Here is another example: q=cardiology&defType=dismax&qf=specialties (Cardiology and cardiologist are stems)... But I don't really know which value in Cardiologist matches perfectly. 
Again, I only want it to return: Cardiologist: 3 You would never get Cardiologist: 3 as the facet result, because if Cardiologist would be in your index, it's impossible to find it when searching for cardiology (except when you manage to write some strange tokenizer that translates cardiology to Cardiologist on query time, including the upper case letter). Facets are always taken from the index, so they usually match exactly or never when querying for it. -Kuli
RE: difficult sort
What if you were to set up a new field, which is the concatenation of your 'field' and 'category group', and then facet on that? How many combinations would we be talking about here? And - against what field(s) do you run your query? We did something a bit similar, where we wanted an 'author' search, where 'author' is a field in our documents. We have a field set up based on 'author' to search against, as well as a field based on 'author' for faceting. We search against the author field, return 0 results but all the facet values, and then display the facet values with their counts, and when the users select one, then we issue a new query to return all documents with that author facet value. Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: lee carroll [mailto:lee.a.carr...@googlemail.com] Sent: Friday, June 17, 2011 5:47 AM To: solr-user@lucene.apache.org Subject: difficult sort Is this possible in 1.4.1 Return a result set sorted by a field but within Categorical groups, limited to 1 record per group Something like: group1 xxx (bottom of sorted field within group) group2 xxx (bottom of sorted field within group) etc is the only approach to issue multiple queries and collate in the front end app ? cheers lee c
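Bob's concatenation idea can be sketched in the schema like this (field name hypothetical; Solr 1.4.1 has no built-in index-time concatenation, so the indexing client has to write the combined "group|value" string itself):

```xml
<!-- schema.xml sketch - one combined facet field per document, populated
     by the indexing client with e.g. "group1|xxx", "group2|yyy" -->
<field name="group_sort_facet" type="string" indexed="true" stored="false" multiValued="true"/>
```

The front end would then facet on it (facet.field=group_sort_facet, facet.sort=lex) and keep the first entry seen for each group prefix, which yields one record per group without issuing multiple queries.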
RE: Copying few field using copyField to non multiValued field
Omri - you need to indicate to Solr that your at_location field can accept multiple values. Add this to the field declaration: multiValued="true" See this reference for more information / options: http://wiki.apache.org/solr/SchemaXml Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Omri Cohen [mailto:omri...@gmail.com] Sent: Wednesday, June 15, 2011 8:00 AM To: solr-user@lucene.apache.org Subject: Copying few field using copyField to non multiValued field Hello all, in my schema.xml I have these fields:

<field name="at_location" type="text" indexed="true" stored="true" required="false"/>
<field name="at_country" type="text" indexed="true" stored="true" required="false"/>
<field name="at_city" type="text" indexed="true" stored="true" required="false"/>
<field name="at_state" type="text" indexed="true" stored="true" required="false"/>

I am trying to do the following:

<copyField source="at_city" dest="at_location"/>
<copyField source="at_state" dest="at_location"/>
<copyField source="at_country" dest="at_location"/>

I am getting the next exception: ERROR: multiple values encountered for non multiValued copy field at_location Does someone have any idea how I can solve this without changing at_location to multiValued? Thanks *Omri Cohen* Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 | +972-3-6036295 My profiles: LinkedIn http://www.linkedin.com/in/omric Twitter http://www.twitter.com/omricohe WordPress http://omricohen.me 
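For reference, the declaration with Bob's suggestion applied would look like this (note that indexed takes a boolean, so "true" rather than the "index" value shown in the original post):

```xml
<field name="at_location" type="text" indexed="true" stored="true" required="false" multiValued="true"/>
```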
RE: Copying few field using copyField to non multiValued field
Oops - sorry - missed that... Well, the multiValued setting is explicitly to allow multiple values. So - what's your actual use case - i.e. why do you want multiple values in a field, but not want it to be multiValued? What's the problem you're trying to solve here? Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Omri Cohen [mailto:omri...@gmail.com] Sent: Wednesday, June 15, 2011 8:42 AM To: solr-user@lucene.apache.org Subject: Re: Copying few field using copyField to non multiValued field thanks for the quick response, though as I said in my original post: *some one has any idea, how I solve this without changing at_location to multiField? * thank you very much though * * *Omri Cohen* Co-founder @ yotpo.com | o...@yotpo.com | +972-50-7235198 | +972-3- 6036295 My profiles: [image: LinkedIn] http://www.linkedin.com/in/omric [image: Twitter] http://www.twitter.com/omricohe [image: WordPress]http://omricohen.me Please consider your environmental responsibility. Before printing this e-mail message, ask yourself whether you really need a hard copy. IMPORTANT: The contents of this email and any attachments are confidential. They are intended for the named recipient(s) only. If you have received this email by mistake, please notify the sender immediately and do not disclose the contents to anyone or make copies thereof. Signature powered by http://www.wisestamp.com/email- install?utm_source=extensionutm_medium=emailutm_campaign=footer WiseStamphttp://www.wisestamp.com/email- install?utm_source=extensionutm_medium=emailutm_campaign=footer On Wed, Jun 15, 2011 at 3:21 PM, Bob Sandiford bob.sandif...@sirsidynix.com wrote: Omri - you need to indicate to Solr that your at_location field can accept multiple values. 
RE: Text field case sensitivity problem
Unfortunately, wildcard search terms don't get processed by the analyzers. One suggestion that's fairly common is to make sure you lower-case your wildcard search terms yourself before issuing the query. Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Jamie Johnson [mailto:jej2...@gmail.com] Sent: Tuesday, June 14, 2011 5:13 PM To: solr-user@lucene.apache.org Subject: Re: Text field case sensitivity problem Also of interest to me is that this returns results: http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kristine On Tue, Jun 14, 2011 at 5:08 PM, Jamie Johnson jej2...@gmail.com wrote: I am using the following for my text field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. Add enablePositionIncrements=true
         in both the index and query analyzers to leave a 'gap' for more
         accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

I have a field defined as:

<field name="Person_Name" type="text" stored="true" indexed="true"/>

When I go to the following URL I get results: http://localhost:8983/solr/select?defType=lucene&q=Person_Name:kris* but if I do http://localhost:8983/solr/select?defType=lucene&q=Person_Name:Kris* I get nothing. I thought the LowerCaseFilterFactory would have handled lowercasing both the query and what is being indexed - am I missing something?
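Since wildcard terms bypass the analysis chain entirely, the usual fix is a couple of lines on the client side. A minimal sketch (class and method names are mine, not from the thread):

```java
import java.util.Locale;

public class WildcardLowercaser {
    // Wildcard terms are not run through the field's analyzers, so the
    // LowerCaseFilterFactory in the chain never sees them. Lower-case the
    // raw user term client-side before building the query string.
    public static String toWildcardQuery(String field, String userTerm) {
        return field + ":" + userTerm.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Kris*" and "kris*" now produce the same query string
        System.out.println(toWildcardQuery("Person_Name", "Kris*")); // Person_Name:kris*
    }
}
```
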
Odd (i.e. wrong) File Names in 3.1 distro source zip
Hi, all. I just downloaded the apache-solr-3.1.0-src.gz file and unzipped that. Inside, I see an apache-solr-3.1.0-src file, and tried unzipping that. There weren't any errors, but as I look inside the apache-solr-3.1.0-src folder, I see that not all the java code (for example) ended up being unzipped with a .java extension. For example, in the path apache-solr-3.1.0\lucene\backwards\src\test\org\apache\lucene\analysis\tokenattributes I see two files: TestSimpleAtt100644 TestTermAttri100644 Any ideas? Is there some specific tool I should be using to expand these? I'm doing this in Windows XP. Thanks! Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com Join the conversation - you may even get an iPad or Nook out of it! Like us on Facebook! Follow us on Twitter!
RE: highlighting in multiValued field
What is your actual query? Did you look at the hl.snippets parameter? Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com Join the conversation - you may even get an iPad or Nook out of it! Like us on Facebook! Follow us on Twitter! -Original Message- From: Jeffrey Chang [mailto:jclal...@gmail.com] Sent: Thursday, May 26, 2011 11:10 PM To: solr-user@lucene.apache.org Subject: highlighting in multiValued field Hi All, I am having a problem with search highlighting for multiValued fields and am wondering if someone can point me in the right direction. I have in my schema a multiValued field as such:

<field name="description" type="text" stored="true" indexed="true" multiValued="true"/>

When I search for term Tel, it returns me the correct doc:

<doc>
  ...
  <arr name="description">
    <str>Tel to talent 1</str>
    <str>Tel to talent 2</str>
  </arr>
  ...
</doc>

When I enable highlighting, it returns me the following highlight with only one vector returned:

<lst name="highlighting">
  <lst name="1">
    <arr name="description">
      <str><em>Tel</em> to talent 1</str>
    </arr>
  </lst>
</lst>

What I'm expecting is actually both vectors to be returned, as such:

<lst name="highlighting">
  <lst name="1">
    <arr name="description">
      <str><em>Tel</em> to talent 1</str>
      <str><em>Tel</em> to talent 2</str>
    </arr>
  </lst>
</lst>

Am I doing something wrong in my config or query (I'm using defaults)? Any help is appreciated. Thanks, Jeff
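A sketch of the request Bob's hint points at, with hl.snippets raised high enough to cover every value in the multiValued field (parameter values illustrative):

```
http://localhost:8983/solr/select?q=description:Tel&hl=true&hl.fl=description&hl.snippets=10
```

hl.snippets defaults to 1, which is why only one highlighted value comes back.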
RE: highlighting in multiValued field
The only thing I can think of is to post-process your snippets. I.e., strip the highlighting tags out of each snippet, look for the resulting plain string among your result's description values, and when you find a match, replace that description with the original snippet (i.e. with the highlight tags still in place). Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com Join the conversation - you may even get an iPad or Nook out of it! Like us on Facebook! Follow us on Twitter! -Original Message- From: Jeffrey Chang [mailto:jclal...@gmail.com] Sent: Friday, May 27, 2011 12:16 AM To: solr-user@lucene.apache.org Subject: Re: highlighting in multiValued field Hi Bob, I have no idea how I missed that! Thanks for pointing me to hl.snippets - that did the magic! Please allow me to squeeze in one more question along the same line. Now that I'm able to display multiple snippets, what I'm trying to achieve is to determine which highlighted snippet maps back to which position in the original document. E.g., if I search for Tel with highlighting and hl.snippets=2, it returns me:

<doc>
  ...
  <arr name="descID">
    <str>1</str>
    <str>2</str>
    <str>3</str>
  </arr>
  <arr name="description">
    <str>Tel to talent 1</str>
    <str>Tel to talent 2</str>
    <str>Tel to talent 3</str>
  </arr>
  ...
</doc>

<lst name="highlighting">
  <lst name="1">
    <arr name="description">
      <str><em>Tel</em> to talent 1</str>
      <str><em>Tel</em> to talent 2</str>
    </arr>
  </lst>
  ...

Is there a way for me to figure out which highlighted snippet belongs to which descID, so I can also display the non-highlighted rows in my search results? Or is this not how highlighting is designed to be used? Thanks so much, Jeff [snip]
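Bob's post-processing idea can be sketched like this (class and method names are mine; it assumes the default <em>...</em> highlight markup and that each snippet covers a whole field value):

```java
import java.util.Arrays;
import java.util.List;

public class SnippetMapper {
    // Strip the highlight tags and look the plain text up in the stored
    // multiValued field to recover the value's position (and hence which
    // descID it belongs to). Returns -1 if the snippet was fragmented
    // and no longer matches a stored value exactly.
    public static int positionOf(String snippet, List<String> values) {
        String plain = snippet.replaceAll("</?em>", "");
        return values.indexOf(plain);
    }

    public static void main(String[] args) {
        List<String> description = Arrays.asList(
            "Tel to talent 1", "Tel to talent 2", "Tel to talent 3");
        System.out.println(positionOf("<em>Tel</em> to talent 2", description)); // 1
    }
}
```
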
RE: Document match with no highlight
Don't you need to include your unique id field in your 'fl' parameter? It will be needed anyway so you can match up the highlight fragments with the result docs once highlighting is working... Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com Join the conversation - you may even get an iPad or Nook out of it! Like us on Facebook! Follow us on Twitter! -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Thursday, May 12, 2011 7:10 AM To: solr-user@lucene.apache.org Subject: Re: Document match with no highlight URL: http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0&rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1 XML:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">19</int>
  <lst name="params">
    <str name="explainOther"/>
    <str name="indent">on</str>
    <str name="hl.fl">DOC_TEXT</str>
    <str name="wt">standard</str>
    <str name="hl.maxAnalyzedChars">-1</str>
    <str name="hl">on</str>
    <str name="rows">10</str>
    <str name="version">2.2</str>
    <str name="debugQuery">on</str>
    <str name="fl">DOC_TEXT,score</str>
    <str name="start">0</str>
    <str name="q">DOC_TEXT:"3 1 15"</str>
    <str name="qt">standard</str>
    <str name="fq"/>
  </lst>
</lst>
<result name="response" numFound="1" start="0" maxScore="0.035959315">
  <doc>
    <float name="score">0.035959315</float>
    <arr name="DOC_TEXT"><str> ... </str></arr>
  </doc>
</result>
<lst name="highlighting">
  <lst name="123456"/>
</lst>
<lst name="debug">
  <str name="rawquerystring">DOC_TEXT:"3 1 15"</str>
  <str name="querystring">DOC_TEXT:"3 1 15"</str>
  <str name="parsedquery">PhraseQuery(DOC_TEXT:"3 1 15")</str>
  <str name="parsedquery_toString">DOC_TEXT:"3 1 15"</str>
  <lst name="explain">
    <str name="123456">
0.035959315 = fieldWeight(DOC_TEXT:"3 1 15" in 0), product of:
  1.0 = tf(phraseFreq=1.0)
  0.92055845 = idf(DOC_TEXT: 3=1 1=1 15=1)
  0.0390625 = fieldNorm(field=DOC_TEXT, doc=0)
    </str>
  </lst>
  <str name="QParser">LuceneQParser</str>
  <arr name="filter_queries"><str/></arr>
  <arr name="parsed_filter_queries"/>
  <lst name="timing"> ... </lst>
</lst>
</response>

Nothing looks suspicious. Can you provide two things more: the fieldType of DOC_TEXT, and the field definition of DOC_TEXT. Also, do you get a snippet from the same doc when you remove the quotes from your query?
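For instance, assuming the unique key field is called id (the thread doesn't name it; the explain output keys on 123456), Bob's suggestion would turn the fl parameter into:

```
fl=id,DOC_TEXT,score
```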
Test Post
Hi, all. Sorry for the 'spam' - I'm just testing that my posts are actually being seen. I've sent a few queries over the past couple of weeks and haven't had a single response :( Anyways - if one or two would respond to this, I'd appreciate it - just to let me know that I'm being ignored, vs unseen :) Thanks! Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com Join the conversation - you may even get an iPad or Nook out of it! Like us on Facebook! Follow us on Twitter!
Problems with Spellchecker in 3.1
Oops. Sorry. I'm hijacking my own thread to put a real Subject in place... Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Bob Sandiford Sent: Monday, April 25, 2011 5:34 PM To: solr-user@lucene.apache.org Subject: Hi, all. We're having some troubles with the Solr Spellcheck Response. We're running version 3.1. Overview: If we search for something really ugly like: kljhklsdjahfkljsdhf book rck then when we get back the response, there's a suggestions list for 'rck', but no suggestions list for the other two words. For 'book', that's fine, because it is 'spelled correctly' (i.e. we got hits on the word) and there shouldn't be any suggestions. For the ugly thing, though, there aren't any hits. The problem is that when we're handling the result, we can't tell the difference between no suggestions for a 'correctly spelled' term, and no suggestions for something that's odd like this. (Now - this is happening with searches that aren't as obviously garbage - this was just to illustrate the point.) Our setup: We're running multiple shards, which may be part of the issue. For example, 'book' might be found in one of the shards, but not another. I don't *think* this has anything to do with our schema, since it's really about how the search suggestions are being returned to us. What we'd really like to see is the response coming back with an indication that a word wasn't found / had no suggestions. We've hacked around in the code a little bit to do this, but were wondering if anyone has come across this, and what approaches you've taken. Here's the xml we're getting back from the search:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">56</int>
  <lst name="params">
    <str name="spellcheck">true</str>
    <str name="facet">true</str>
    <str name="sort">score desc, RELEVANCE_SORT_nsort desc</str>
    <str name="shards.qt">spellcheckedStandard</str>
    <str name="hl.mergeContiguous">true</str>
    <str name="facet.limit">1000</str>
    <str name="hl">true</str>
    <str name="fl">ELECTRONIC_ACCESS_display ISBN_display TITLE_boost FORMAT_display score MEDIA_TYPE_display AUTHOR_boost LOCALURL_display UPC_display id DOC_ID_display CHILD_SITE_display DS_EC PRIMARY_AUTHOR_boost PRIMARY_TITLE_boost DS_ID TOPIC_display ASSET_NAME_display OCLC_display</str>
    <str name="shards">localhost:8983/solr/SD_ILS/,localhost:8983/solr/SD_ASSET/</str>
    <arr name="facet.field">
      <str>AUTHOR_facet</str>
      <str>FORMAT_facet</str>
      <str>LANGUAGE_facet</str>
      <str>PUBDATE_nfacet</str>
      <str>SUBJECT_facet</str>
      <str>ABCDEF_cfacet</str>
    </arr>
    <str name="qt">spellcheckedStandard</str>
    <arr name="fq">
      <str>ACCESS_LEVEL_nfacet:0</str>
      <str>CLEARANCE_nfacet:0</str>
      <str>NEED_TO_KNOWS_facet:@@EMPTY@@</str>
      <str>CITIZENSHIPS_facet:@@EMPTY@@</str>
      <str>RESTRICTIONS_facet:@@EMPTY@@</str>
    </arr>
    <str name="facet.mincount">1</str>
    <str name="indent">true</str>
    <str name="hl.fl">*</str>
    <str name="rows">12</str>
    <str name="hl.snippets">5</str>
    <str name="start">0</str>
    <str name="q">TITLE_boost:"kljhklsdjahfkljsdhf book rck"~100^200.0 OR PRIMARY_AUTHOR_boost:"kljhklsdjahfkljsdhf book rck"~100^100.0 OR DOC_TEXT:"kljhklsdjahfkljsdhf book rck"~100^2 OR PRIMARY_TITLE_boost:"kljhklsdjahfkljsdhf book rck"~100^1000.0 OR AUTHOR_boost:"kljhklsdjahfkljsdhf book rck"~100^20.0 OR textFuzzy:kljhklsdjahfkljsdhf~0.7 AND textFuzzy:book~0.7 AND textFuzzy:rck~0.7</str>
  </lst>
</lst>
<result name="response" numFound="0" start="0" maxScore="0.0"/>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="AUTHOR_facet"/>
    <lst name="FORMAT_facet"/>
    <lst name="LANGUAGE_facet"/>
    <lst name="PUBDATE_nfacet"/>
    <lst name="SUBJECT_facet"/>
    <lst name="ABCDEF_cfacet"/>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>
<lst name="highlighting"/>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="rck">
      <int name="numFound">5</int>
      <int name="startOffset">362</int>
      <int name="endOffset">365</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst><str name="word">rock</str><int name="freq">24000</int></lst>
        <lst><str name="word">rick</str><int name="freq">6048</int></lst>
        <lst><str name="word">rack</str><int name="freq">84</int></lst>
        <lst><str name="word">reck</str><int name="freq">78</int></lst>
        <lst><str name="word">ruck</str><int name="freq">30</int></lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
  </lst>
</lst>
</response>

Thanks! Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com
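For context, a sketch of the request-handler defaults behind this kind of response (component wiring assumed, not the poster's actual solrconfig.xml): spellcheck.extendedResults is what produces the per-term origFreq values and the overall correctlySpelled flag shown above.

```xml
<!-- solrconfig.xml sketch - handler and component names assumed -->
<requestHandler name="/spellcheckedStandard" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```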
Solr - upgrade from 1.4.1 to 3.1 - finding AbstractSolrTestCase binaries - help please?
Hi, all. I'm working on upgrading from 1.4.1 to 3.1, and I'm having some trouble with some of the unit test code for our custom Filters. We wrote the tests to extend AbstractSolrTestCase, and I've been reading the thread about the test-harness elements not being present in the 3.1 distributables. [1] So, I have checked out the 3.1 branch code and built that (ant generate-maven-artifacts), and I've found the lucene-test-framework-3.1-xxx.jar(s). However, these contain only the Lucene-level framework elements, and none of the Solr ones. Did the Solr test framework actually get built and embedded in one of the Solr jars somewhere? Or, if not, is there some way to build a jar that contains the Solr portion of the test harnesses? [1] SOLR-2061: Generate jar containing test classes. https://issues.apache.org/jira/browse/SOLR-2061 Thanks! Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com
RE: Understanding multi-field queries with q and fq
Have you looked at the 'qf' parameter? Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: mrw [mailto:mikerobertsw...@gmail.com] Sent: Wednesday, March 02, 2011 2:28 PM To: solr-user@lucene.apache.org Subject: Re: Understanding multi-field queries with q and fq Anyone understand how to do boolean logic across multiple fields? Dismax is nice for searching multiple fields, but doesn't necessarily support our syntax requirements. eDismax appears not to be available until Solr 3.1. In the meantime, it looks like we need to support applying the user's query to multiple fields, so if the user enters "led zeppelin merle" we need to be able to do the logical equivalent of fq=field1:"led zeppelin merle" OR field2:"led zeppelin merle" Any ideas? :) mrw wrote: After searching this list, Google, and looking through the Pugh book, I am a little confused about the right way to structure a query. The Packt book uses the example of the MusicBrainz DB full of song metadata. What if they also had the song lyrics in English and German as files on disk, and wanted to index them along with the metadata, so that each document would basically have song title, artist, publisher, date, ..., All_Metadata (copy field of all metadata fields), Text_English, and Text_German fields? There can only be one default field, correct?
So if we want to search for all songs containing (zeppelin AND (dog OR merle)), do we repeat the entire query text for all three major fields in the 'q' clause (assuming we don't want to use the cache): q=(+All_Metadata:(zeppelin AND (dog OR merle)) +Text_English:(zeppelin AND (dog OR merle)) +Text_German:(zeppelin AND (dog OR merle))) or repeat the entire query text for all three major fields in the 'fq' clause (assuming we want to use the cache): q=*:*&fq=(+All_Metadata:(zeppelin AND (dog OR merle)) +Text_English:(zeppelin AND (dog OR merle)) +Text_German:(zeppelin AND (dog OR merle))) ? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Understanding-multi-field-queries-with-q-and-fq-tp2528866p2619700.html Sent from the Solr - User mailing list archive at Nabble.com.
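To illustrate the 'qf' suggestion from Bob's reply: with the dismax query parser, one user query is applied across several fields in a single request, instead of repeating the clause per field. This is only a sketch; the field names are taken from the question, and the boosts are invented for illustration:

```text
q=zeppelin dog merle
&defType=dismax
&qf=All_Metadata Text_English^2 Text_German^2
&mm=1
```

The per-field boolean repetition in the question is what 'qf' is designed to replace, though dismax does not support full boolean syntax inside q, which is the syntax limitation mrw mentions.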
RE: Solr multi cores or not
Hmmm. Maybe I'm not understanding what you're getting at, Jonathan, when you say 'There is no good way in Solr to run a query across multiple Solr indexes'. What about the 'shards' parameter? That allows searching across multiple cores in the same instance, or shards across multiple instances. There are certainly implications here (like relevance not being consistent across cores / shards), but it works pretty well for us... Thanks! Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, February 16, 2011 4:09 PM To: solr-user@lucene.apache.org Cc: Thumuluri, Sai Subject: Re: Solr multi cores or not Solr multi-core essentially just lets you run multiple separate, distinct Solr indexes in the same running Solr instance. It does NOT let you run queries across multiple cores at once. The cores are just like completely separate Solr indexes; they are just conveniently running in the same Solr instance. (Which can be easier and more compact to set up than actually setting up separate Solr instances. And they can share some config more easily. And it _may_ have implications on JVM usage, not sure.) There is no good way in Solr to run a query across multiple Solr indexes; whether they are multi-core or single cores in separate Solr instances doesn't matter. Your first approach should be to try and put all the data in one Solr index (one Solr 'core'). Jonathan On 2/16/2011 3:45 PM, Thumuluri, Sai wrote: Hi, I have a need to index multiple applications using Solr, and I also have the need to share indexes or run a search query across these application indexes. Is Solr multi-core the way to go? My server config is 2 virtual CPUs @ 1.8 GHz and about 32GB of memory. What is the recommendation? Thanks, Sai Thumuluri
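For reference, the 'shards' parameter Bob mentions is just a request parameter listing the cores to fan the query out to; the core that receives the request merges the results. Host and core names below are illustrative:

```text
http://localhost:8983/solr/core1/select?q=some+query&shards=localhost:8983/solr/core1,localhost:8983/solr/core2
```

The SD_ILS / SD_ASSET response earlier on this page uses the same pattern. As Bob notes, scores are computed per shard, so relevance ranking across shards is not strictly comparable.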
RE: Using terms and N-gram
I don't suppose it's something silly like the fact that your indexing chain includes 'words=stopwords.txt', and your query chain does not? Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: openvictor Open [mailto:openvic...@gmail.com] Sent: Thursday, February 03, 2011 12:02 AM To: solr-user@lucene.apache.org Subject: Using terms and N-gram Dear all, I am trying to implement an autocomplete system for research. But I am stuck on some problems that I can't solve. Here is my problem: given text like "the cat is black", I want to explore all 1-grams to 8-grams of all the text that is passed: the, cat, is, black, the cat, cat is, is black, etc... In order to do that I have defined the following fieldtype in my schema:

<!-- Custom fieldtype -->
<fieldType name="ngram_field" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true" maxGramSize="8" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" ignoreCase="true" maxGramSize="8" minGramSize="1"/>
  </analyzer>
</fieldType>

Then the following field:

<field name="p_title_ngram" type="ngram_field" indexed="true" stored="true"/>

Then I fed Solr some phrases and I was really surprised to see that Solr didn't behave as expected. I went to the schema browser to see the result for the very profound query "the cat is black and it rains". The results are quite deceiving: first, 1-grams are not found. Some 2-grams are found, like the_cat, and_it, etc... But not what I expected. Is there something I am missing here?
(By the way, I also tried removing the minGramSize and maxGramSize, and even the words attribute.) Thank you, Victor Kabdebon
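If the mismatch Bob points out is the culprit, making the two analyzer chains identical would look like the sketch below (assumption: same stopwords file on both sides). Note also that CommonGramsFilterFactory only combines adjacent tokens around common words from the words file; it does not take min/maxGramSize parameters and will not emit arbitrary 1-8 word grams, which likely explains why only pairs like the_cat and and_it appeared:

```xml
<analyzer type="query">
  <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
</analyzer>
```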
RE: match count per shard and across shards
Or - you could add a standard field to each shard, populate with a distinct value for each shard, and facet on that field. Then look at the facet counts of the value that corresponds to a shard, and, hey-presto, you're done... Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Upayavira [mailto:u...@odoko.co.uk] Sent: Saturday, January 29, 2011 6:52 PM To: solr-user@lucene.apache.org Subject: Re: match count per shard and across shards To my knowledge, the distributed search functionality is intended to be transparent, thus no details deriving from it are exposed (e.g. what docs come from which shard), so, no, I don't believe it to be possible. The only way I know right now that you could achieve it is by two (sets of) queries. One would be a distributed search across all shards, and the other would be a single hit to every shard. To fake such a facet, this second set of queries would only need to ask for totals, so it could use a rows=0. Otherwise you'd have to enhance the distributed indexing code to expose some of this information in its response. Upayavira On Sat, 29 Jan 2011 03:48 -0800, csj christiansonnejen...@gmail.com wrote: Hi, Is it possible to construct a Solr query that will return the total number of hits there across all shards, and at the same time getting the number of hits per shard? I was thinking along the lines of a faceted search, but I'm not deep enough into Solr capabilities and query parameters to figure it out. Regards, Christian Sonne Jensen -- View this message in context: http://lucene.472066.n3.nabble.com/match-count-per-shard-and-across- shards-tp2369627p2369627.html Sent from the Solr - User mailing list archive at Nabble.com. --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
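A sketch of Bob's distinct-value-per-shard idea (the field name and values are made up; the trick is that each shard's schema declares its own default, so every document on that shard gets tagged without any indexing-code changes):

```xml
<!-- In shard A's schema.xml: -->
<field name="shard_name" type="string" indexed="true" stored="false" default="shardA"/>
<!-- In shard B's schema.xml the default would be "shardB", and so on. -->
```

A single distributed query with rows=0 then returns the per-shard hit counts as facet counts:

```text
...&q=your+query&rows=0&facet=true&facet.field=shard_name
```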
RE: Will Result Grouping return documents that don't contain the specified group.field?
What if you put in a default value for the group_id field in the solr schema - would that work for you? e.g. something like 'unknown' Then you'll get all those with no original group_id value still grouped together, and you can figure out at display time what you want to do with them. Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Andy [mailto:angelf...@yahoo.com] Sent: Thursday, January 06, 2011 3:06 PM To: solr-user@lucene.apache.org Subject: Will Result Grouping return documents that don't contain the specified group.field? I want to group my results by a field named group_id. However, some of my documents don't contain the field group_id. But I still want these documents to be returned as part of the results as long as they match the main query q. Do I need to do anything to tell Solr that I want those documents? Thanks.
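In schema.xml, the default-value suggestion might look like this (field name taken from the question, 'unknown' as the placeholder value):

```xml
<field name="group_id" type="string" indexed="true" stored="true" default="unknown"/>
```

Documents indexed without a group_id would then all land in the 'unknown' group rather than being dropped from the grouped results; note this only applies to documents indexed after the schema change.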
Special Parent / Child relationship - advice / observations welcome on how to approach this
to dive into the Solr / Lucene code if that's what it will take - I'd just like an indication of what people think would be a good / possible approach before I get into that level... e.g. some way of providing to the Indexer a tuple of each found combination of the 5 values, and then doing something (what?) with searching for the facet queries. Thanks! Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com
RE: Empty value/string matching
One possibility to consider - if you really need documents with specifically empty or non-defined values (if that's not an oxymoron :)), and you have control over the values you send into the indexing, you could set a special value that means 'no value'. We've done that in a similar vein, using something like '@@EMPTY@@' for a given field, meaning that the original document didn't actually have a value for that field. I.e. it is something very unlikely to be a 'real' value - and then we can easily select documents by querying for field:@@EMPTY@@ instead of the negated form of the select... However, we haven't considered things like what it does to index size. It's relatively rare for us (that there not be a value), so our 'gut feel' is that it's not impacting the indexes very much size-wise or performance-wise. Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Viswa S [mailto:svis...@hotmail.com] Sent: Saturday, November 20, 2010 5:38 PM To: solr-user@lucene.apache.org Subject: RE: Empty value/string matching Erick, Thanks for the quick response. The output I showed is from a test instance I created to simulate this issue. I intentionally tried to create documents with no values by creating XML nodes like <field name="fieldName"></field>, but having values in the other fields of the document. Are you saying that there is no way to have a field with no value? With text fields that seems to make more sense than for string? You are right on the fieldName:[* TO *] results, which basically returned all the documents, including the couple of documents in question. -Viswa Date: Sat, 20 Nov 2010 17:20:53 -0500 Subject: Re: Empty value/string matching From: erickerick...@gmail.com To: solr-user@lucene.apache.org I don't think that's correct. The documents wouldn't be showing up in the facets if they had no value for the field.
So I think you're being misled by the printout from the faceting. Perhaps you have unprintable characters in there or some such. Certainly the name=" " is actually a value, admittedly just a space. As for the other, I suspect something similar. What results do you get back when you just search for FieldName:[* TO *]? I'm betting you get all the docs back, but I've been very wrong before. Best, Erick On Sat, Nov 20, 2010 at 5:02 PM, Viswa S svis...@hotmail.com wrote: Yes, I do have a couple of documents with no values and one with an empty string. Find below the output of a facet on the fieldName. Thanks, Viswa

<int name="">2</int>
<int name="CASTIGO.430">2</int>
<int name="GDOGPRODY.424">2</int>
<int name="QMAGIC.412">2</int>
<int name=" ">1</int>

Date: Sat, 20 Nov 2010 15:29:06 -0500 Subject: Re: Empty value/string matching From: erickerick...@gmail.com To: solr-user@lucene.apache.org Are you absolutely sure your documents really don't have any values for FieldName? Because your results are perfectly correct if every doc has a value for FieldName. Or are you saying there is no such field as FieldName? Best, Erick On Sat, Nov 20, 2010 at 3:12 PM, Viswa S svis...@hotmail.com wrote: Folks, I'm trying to query documents which have no values present. I have used the following constructs, and they don't seem to work on the Solr dev tip (as of 09/22) or the 1.4 builds:

1. (*:* AND -FieldName:[* TO *]) - returns no documents; parsedquery was +MatchAllDocsQuery(*:*) -FieldName:[* TO *]
2. -FieldName:[* TO *] - returns no documents; parsedquery was -FieldName:[* TO *]
3. FieldName:"" - returns no documents; parsedquery was empty (<str name="parsedquery"/>)

The field is type string, using the LuceneQParser. I have also tried FieldName:[* TO *] to see if the documents with no terms are ignored, and that didn't seem to be the case; the result set was everything. Any help would be appreciated. -Viswa
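The sentinel approach from Bob's reply can be sketched as a small helper applied at indexing time (the class and method names here are invented for illustration; the '@@EMPTY@@' marker is the one from the reply):

```java
// Substitute a sentinel for missing/empty values before sending documents to
// Solr, so "no value" documents can later be found with a positive query
// (field:@@EMPTY@@) instead of a negated range query.
public class EmptyMarker {
    public static final String EMPTY = "@@EMPTY@@";

    // Return the sentinel when the source document has no usable value.
    public static String valueOrMarker(String raw) {
        return (raw == null || raw.trim().isEmpty()) ? EMPTY : raw;
    }

    public static void main(String[] args) {
        System.out.println(valueOrMarker(null));      // @@EMPTY@@
        System.out.println(valueOrMarker("  "));      // @@EMPTY@@
        System.out.println(valueOrMarker("CASTIGO")); // CASTIGO
    }
}
```

At query time, fq=fieldName:@@EMPTY@@ then selects exactly the documents that had no real value.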
RE: Dynamic creating of cores in solr
(); } } And that's about it. You could adjust the above so there's only one core per index that you want - if you don't do complete reindexes, and don't need the index to always be searchable. Hope that helps... Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Nizan Grauer [mailto:niz...@yahoo-inc.com] Sent: Tuesday, November 09, 2010 3:36 AM To: solr-user@lucene.apache.org Subject: Dynamic creating of cores in solr Hi, I'm not sure this is the right mail to write to, hopefully you can help or direct me to the right person I'm using solr - one master with 17 slaves in the server and using solrj as the java client Currently there's only one core in all of them (master and slaves) - only the cpaCore. I thought about using multi-cores solr, but I have some problems with that. I don't know in advance which cores I'd need - When my java program runs, I call for documents to be index to a certain url, which contains the core name, and I might create a url based on core that is not yet created. For example: Calling to index - http://localhost:8080/cpaCore - existing core, everything as usual Calling to index - http://localhost:8080/newCore - server realizes there's no core newCore, creates it and indexes to it. After that - also creates the new core in the slaves Calling to index - http://localhost:8080/newCore - existing core, everything as usual What I'd like to have on the server side to do is realize by itself if the cores exists or not, and if not - create it One other restriction - I can't change anything in the client side - calling to the server can only make the calls it's doing now - for index and search, and cannot make calls for cores creation via the CoreAdminHandler. All I can do is something in the server itself What can I do to get it done? Write some RequestHandler? REquestProcessor? Any other option? Thanks, nizan
RE: Dynamic creating of cores in solr
Why not use replication? Call it inexperience... We're really early into working with and fully understanding Solr and the best way to approach various issues. I did mention that this was a prototype and non-production code, so I'm covered, though :) We'll take a look at the replication feature... Thanks! Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, November 10, 2010 3:26 PM To: solr-user@lucene.apache.org Subject: Re: Dynamic creating of cores in solr You could use the actual built-in Solr replication feature to accomplish that same function -- complete re-index to a 'master', and then when finished, trigger replication to the 'slave', with the 'slave' being the live index that actually serves your applications. I am curious whether there was any reason you chose to roll your own solution using SolrJ and dynamic creation of cores, instead of simply using the replication feature. Were there any downsides of using the replication feature for this purpose that you ameliorated through your solution? Jonathan Bob Sandiford wrote: We also use SolrJ, and have a dynamically created Core capability - where we don't know in advance what the Cores will be that we require. We almost always do a complete index build, and if there's a previous instance of that index, it needs to be available during a complete index build, so we have two cores per index, and switch them as required at the end of an indexing run. Here's a summary of how we do it (we're in an early prototype / implementation right now - this isn't production quality code - as you can tell from our voluminous javadocs on the methods...) 1) Identify if the core exists, and if not, create it:

/**
 * This method instantiates two SolrServer objects, solr and indexCore. It requires that
 * indexName be set before calling.
 */
private void initSolrServer() throws IOException {
    String baseUrl = "http://localhost:8983/solr/";
    solr = new CommonsHttpSolrServer(baseUrl);

    String indexCoreName = indexName + SolrConstants.SUFFIX_INDEX; // SUFFIX_INDEX = "_INDEX"
    String indexCoreUrl = baseUrl + indexCoreName;

    // Here we create two cores for the indexName, if they don't already exist - the live core used
    // for searching and a second core used for indexing. After indexing, the two will be switched so the
    // just-indexed core will become the live core. The way that core swapping works, the live core will
    // always be named [indexName] and the indexing core will always be named [indexName]_INDEX, but the
    // dataDir of each core will alternate between [indexName]_1 and [indexName]_2.
    createCoreIfNeeded(indexName, indexName + "_1", solr);
    createCoreIfNeeded(indexCoreName, indexName + "_2", solr);

    indexCore = new CommonsHttpSolrServer(indexCoreUrl);
}

/**
 * Create a core if it does not already exist. Returns true if a new core was created, false otherwise.
 */
private boolean createCoreIfNeeded(String coreName, String dataDir, SolrServer server) throws IOException {
    boolean coreExists = true;
    try {
        // SolrJ provides no direct method to check if a core exists, but getStatus will
        // return an empty list for any core that doesn't.
        CoreAdminResponse statusResponse = CoreAdminRequest.getStatus(coreName, server);
        coreExists = statusResponse.getCoreStatus(coreName).size() > 0;
        if (!coreExists) {
            // Create the core
            LOG.info("Creating Solr core: " + coreName);
            CoreAdminRequest.Create create = new CoreAdminRequest.Create();
            create.setCoreName(coreName);
            create.setInstanceDir(".");
            create.setDataDir(dataDir);
            create.process(server);
        }
    } catch (SolrServerException e) {
        e.printStackTrace();
    }
    return !coreExists;
}

2) Do the index, clearing it first if it's a complete rebuild:

[snip]
if (fullIndex) {
    try {
        indexCore.deleteByQuery("*:*");
    } catch (SolrServerException e) {
        e.printStackTrace(); // To change body of catch statement use File | Settings | File Templates.
    }
}
[snip]

various logic, then (we submit batches of 100):

[snip]
List<SolrInputDocument> docList = b.getSolrInputDocumentList();
UpdateResponse rsp;
try
RE: Facet showing MORE results than expected when its selected?
Shouldn't the second query have the clause: fq=themes_raw:"Hotel en Restaurant" instead of: fq=themes:"Hotel en Restaurant" Otherwise you're mixing apples (themes_raw) and oranges (themes). (Notice how I cleverly extended the restaurant theme to be food related :)) Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: PeterKerk [mailto:vettepa...@hotmail.com] Sent: Wednesday, November 10, 2010 4:34 PM To: solr-user@lucene.apache.org Subject: Facet showing MORE results than expected when its selected? A facet shows the number of results that match that facet, e.g. New York (433). So when the facet is clicked, you'd expect that number of results (433). However, I have a facet Hotel en Restaurant (321) that, when clicked, shows 370 results! :s 1st query: http://localhost:8983/solr/db/select/?indent=on&facet=true&q=*:*&start=0&rows=25&fl=id,title,themes&facet.field=themes_raw&facet.mincount=1 This is (part of) the result set of my first query:

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="themes_raw">
      <int name="Hotel en Restaurant">321</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

Now when I click the facet Hotel en Restaurant, it fires my second query: http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:Hotel en Restaurant&q=*:*&start=0&rows=25&fl=id,title,themes&facet.field=themes_raw&facet.mincount=1 I would expect 321, however I get 370! schema.xml:

<field name="themes" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="themes_raw" type="string" indexed="true" stored="true" multiValued="true"/>
<copyField source="themes" dest="themes_raw"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/Facet-showing-MORE-results-than-expected-when-its-selected-tp1878828p1878828.html Sent from the Solr - User mailing list archive at Nabble.com.
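With Bob's correction applied - filtering on the same field that was faceted on, and phrase-quoting the multi-word value so it matches the single string term in themes_raw - the second query would read:

```text
http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes_raw:"Hotel en Restaurant"&q=*:*&start=0&rows=25&fl=id,title,themes&facet.field=themes_raw&facet.mincount=1
```

Filtering the tokenized themes field instead matches any document whose analyzed text contains those words, which is why the count (370) exceeds the facet count on the exact string (321).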
RE: Natural string sorting
Well, you could do a magnitude-notation approach. It depends on how complex the strings are, but based on your examples, this would work: 1) Identify each run of digits in the string. (This assumes runs are no more than 9 digits long.) 2) Insert the number of digits into the string just before the run itself. So - for sorting - you would have: string1 -> string11, string10 -> string210, string2 -> string12, which will then sort as string11, string12, string210, but use the original strings as the displays you want. Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] Sent: Friday, October 29, 2010 4:33 AM To: solr-user@lucene.apache.org Subject: Re: Natural string sorting I think string10 comes before string2 in lexicographic order? On 29 October 2010 09:18, RL rl.subscri...@gmail.com wrote: Just a quick question about natural sorting of strings. I have a simple dynamic field in my schema:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<field name="nameSort_en" type="string" indexed="true" stored="false" omitNorms="true"/>

There are 3 indexed strings, for example string1, string2, string10. Executing a query and sorting by this field leads to the unnatural sort order: string1, string10, string2. (Some time ago I used Lucene and I was pretty sure that Lucene used a natural sort, so I expected the same from Solr.) Is there a way to sort in a natural order? Config option? Plugin? Expected output would be: string1, string2, string10. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Natural-string-sorting-tp1791227p1791227.html Sent from the Solr - User mailing list archive at Nabble.com.
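Bob's magnitude-notation scheme can be sketched as a small key-building function (a hypothetical helper, not Solr code - in practice you would apply it when populating a separate, non-displayed sort field):

```java
// Build a sort key by prefixing each run of digits with its length
// ("magnitude notation"), so that plain lexicographic ordering of the keys
// matches natural numeric ordering, for digit runs of up to 9 digits.
public class NaturalSortKey {
    public static String key(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (Character.isDigit(s.charAt(i))) {
                int start = i;
                while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                out.append(i - start);   // digit-count prefix; assumes runs <= 9 digits
                out.append(s, start, i); // the original digit run
            } else {
                out.append(s.charAt(i++));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(key("string1"));  // string11
        System.out.println(key("string2"));  // string12
        System.out.println(key("string10")); // string210
    }
}
```

Sorting on the keys gives string11 < string12 < string210, i.e. string1, string2, string10 - the natural order RL asked for, while the original strings remain available for display.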