Re: Filter to cut out all zeros?
Won't this replace *all* 0s? I.e., 1024 will become 124?

_
{Beto|Norberto|Numard} Meijome
"The only people that never change are the stupid and the dead" - Jorge Luis Borges
I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.

On 11 March 2010 03:24, Sebastian F <qba...@yahoo.com> wrote:
> Yes, thank you. That was exactly what I was looking for! Great help!
>
> From: Ahmet Arslan <iori...@yahoo.com>
> To: solr-user@lucene.apache.org
> Sent: Tue, March 9, 2010 7:26:46 PM
> Subject: Re: Filter to cut out all zeros?
>
>> I'm trying to figure out the best way to cut out all zeros of an input
>> string like "01.10." or "022.300". Is there such a filter in Solr, or
>> anything similar that I can adapt to do the task?
>
> With solr.MappingCharFilterFactory [1] you can replace all zeros with ""
> before the tokenizer:
>
>   <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
>
> The SolrHome/conf/mapping.txt file will contain this line:
>
>   "0" => ""
>
> So that "01.10." becomes "1.1." and "022.300" becomes "22.3". Is that what
> you want?
>
> [1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.MappingCharFilterFactory
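The caveat is easy to check: the mapping rule "0" => "" removes every zero character, regardless of position. A quick Python simulation of what that char filter rule does to the input (not Solr code, just an illustration):

```python
def map_zeros(text: str) -> str:
    # Simulates the mapping.txt rule  "0" => ""  applied by
    # MappingCharFilterFactory: every '0' character is dropped.
    return text.replace("0", "")

print(map_zeros("01.10."))   # -> 1.1.
print(map_zeros("022.300"))  # -> 22.3
print(map_zeros("1024"))     # -> 124  (the caveat raised above)
```

If internal zeros must survive (i.e. 1024 should stay 1024), a blanket character mapping won't do; you would need a regex-style normalization that only strips leading zeros and trailing fractional zeros before indexing.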
Re: weird problem with letters S and T
On Wed, 28 Oct 2009 19:20:37 -0400, Joel Nylund <jnyl...@yahoo.com> wrote:
> Well, I tried removing those 2 letters from stopwords; that didn't seem to
> help. I also tried changing the field type to text_ws; that didn't seem to
> work either. Any other ideas?

Hi Joel,
If your stop word filter was applied at index time, you will have to reindex
(at least those documents containing S and T). If your stop filter was *only*
applied at query time, then it should work after you reload your app.

b
Re: 99.9% uptime requirement
On Mon, 3 Aug 2009 13:15:44 -0700, Robert Petersen <rober...@buy.com> wrote:
> Thanks all, I figured there would be more talk about daemontools if there
> were really a need. I appreciate the input; for starters we'll put two
> slaves behind a load balancer and grow it from there.

Robert, not taking away from daemontools, but daemontools won't help you if
your whole server goes down. Don't put all your eggs in one basket: use
several servers behind a load balancer (hardware load balancers x 2, haproxy,
etc.), and sure, use daemontools to keep your services running within each
server.

B
Re: Updating Solr index from XML files
On Tue, 7 Jul 2009 22:16:04 -0700, Francis Yakin <fya...@liquid.com> wrote:
> I have the following curl cmd to update and commit to Solr (I have 10 XML
> files just for testing) [...]

Hello,
DIH supports XML, right? Not sure if it works with n files, but it's worth
looking at. Alternatively, you can write a relatively simple Java app that
will pick each file up and post it for you using SolrJ.

b
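For what it's worth, the DIH route can walk a directory of XML files with a data-config along these lines. This is a sketch only: the paths, field names, and XPath expressions are hypothetical and would need to match the actual `<add><doc>` structure of the files being imported:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- outer entity lists the files; inner entity parses each one -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/xml-files" fileName=".*\.xml" rootEntity="false">
      <entity name="doc" processor="XPathEntityProcessor"
              url="${files.fileAbsolutePath}" forEach="/add/doc">
        <field column="id"   xpath="/add/doc/field[@name='id']"/>
        <field column="name" xpath="/add/doc/field[@name='name']"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```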
Re: Is there any other way to load the index beside using http connection?
On Sun, 5 Jul 2009 21:36:35 +0200, Marcus Herou <marcus.he...@tailsweep.com> wrote:
> Sharing some of our exports from DB to Solr. Note: many of the statements
> below might not work due to clip-clip.

Thanks Marcus - but that's a DIH config, right? :)

b
Re: Is there any other way to load the index beside using http connection?
On Sun, 5 Jul 2009 10:28:16 -0700, Francis Yakin <fya...@liquid.com> wrote:
> [...]
>> upload the file to your SOLR server? Then the data file is local to your
>> SOLR server; you will bypass any WAN and firewall you may be having. (or
>> some variation of it: sql -> SOLR server as file, etc.)
>
> How do we upload the file? Do we need to convert the data file to a Lucene
> index first? And is there documentation on how we do this?

Pick your poison... rsync? ftp? scp?

B
Re: Is there any other way to load the index beside using http connection?
On Mon, 6 Jul 2009 09:56:03 -0700, Francis Yakin <fya...@liquid.com> wrote:
> Norberto, thanks. I think my question is:
>> why not generate your SQL output directly on your Oracle server as a file
> What type of file is this?

A file in a format that you can then import into SOLR.
Re: Is there any other way to load the index beside using http connection?
On Thu, 2 Jul 2009 11:28:51 -0700, Francis Yakin <fya...@liquid.com> wrote:
> Norberto,

Hi Francis,
Please reply to the list, or keep it in CC.

> You said:
>> Other alternatives are to transform the XML into CSV and import it that way
> How do you transfer that CSV file to Solr?

http://wiki.apache.org/solr/UpdateCSV

There actually is a LOT of information in the wiki, as well as in the mailing
list archives.

Good luck,
B
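If you go the CSV route, the file's first line names the Solr fields. A small Python sketch of shaping DB rows into that format (the field names `id` and `name` are assumed for illustration, not from the thread):

```python
import csv
import io

# Hypothetical rows pulled from the database.
rows = [
    {"id": "1", "name": "first doc"},
    {"id": "2", "name": "second, doc"},  # embedded comma gets quoted
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name"])
writer.writeheader()   # field names on line 1, matching schema.xml
writer.writerows(rows)
print(buf.getvalue())
```

The resulting file can then be posted to the CSV update handler described on the wiki page above.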
Re: Is there any other way to load the index beside using http connection?
On Thu, 2 Jul 2009 11:02:28 -0700, Francis Yakin <fya...@liquid.com> wrote:
> Norberto, thanks for your input. What do you mean with:
>> Have you tried connecting to SOLR over HTTP from localhost, therefore
>> avoiding any firewall issues and network latency? It should work a LOT
>> faster than from a remote site.
>
> Here is how our servers are laid out:
> 1) Database (Oracle) is running on a separate machine
> 2) Solr master is running on a separate machine by itself
> 3) 6 Solr slaves (these 6 pull the index from the master using rsync)
>
> We have a SQL (Oracle) script to post the data/index from the Oracle
> database machine to the Solr master over HTTP. Someone on our Oracle
> database administration team wrote those scripts.

You said in your other email that you are having issues with slow transfers
between 1) and 2). Your subject relates to the data transfer between 1) and
2); the transfer between 2) and 3) is irrelevant to this part. My question
(what you quoted above) relates to the point you made about it being slow
(WHY is it slow?), and the issues with opening so many connections through
the firewall. So, I'll rephrase my question (see below...)

> We can not use localhost since Solr is not running on the Oracle machine.

Why not generate your SQL output directly on your Oracle server as a file,
then upload the file to your SOLR server? Then the data file is local to
your SOLR server, and you bypass any WAN and firewall issues you may be
having (or some variation of it: sql -> SOLR server as file, etc.). Any
speed issues that are rooted in the fact that you are posting via HTTP (vs.
embedded Solr or DIH) aren't going to go away, but it's the simpler approach
without changing too much of your current setup.

> Another alternative we thought of is to transform the XML into CSV and
> import/export it. How about LuSql - someone mentioned it? Is it a free
> (open source) application? Do you have any experience with it?

Not I, sorry. Have you looked into DIH? It's designed for this kind of work.
B
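For reference, the DIH route pulls straight from Oracle over JDBC with a data-config roughly like the sketch below. The connection details, table, and column names are placeholders, not anything from this thread:

```xml
<dataConfig>
  <dataSource driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@dbhost:1521:ORCL"
              user="solr_reader" password="..."/>
  <document>
    <entity name="item" query="SELECT id, title, body FROM items">
      <field column="ID"    name="id"/>
      <field column="TITLE" name="title"/>
      <field column="BODY"  name="body"/>
    </entity>
  </document>
</dataConfig>
```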
Re: Is there any other way to load the index beside using http connection?
On Wed, 1 Jul 2009 15:07:12 -0700, Francis Yakin <fya...@liquid.com> wrote:
> We have several thousand XML files in a database that we load into the Solr
> master. The database uses an HTTP connection to transfer those files to the
> Solr master; Solr then translates the XML files into its index. We are
> experiencing issues with close/open connections in the firewall, and it is
> very, very slow. Is there any other way to load the data/index from the
> database to the Solr master besides an HTTP connection - i.e., can we just
> scp/ftp the XML files from the database system to the Solr master and let
> Solr convert those to Lucene indexes?

Francis,
After reading the whole thread, it seems you have:
- Data source: Oracle DB, in a separate location from your SOLR.
- Data format: XML output.

DIH is definitely a great option, but since you are on 1.2 it is not
available to you (you should look into upgrading if you can!). Have you
tried connecting to SOLR over HTTP from localhost, therefore avoiding any
firewall issues and network latency? It should work a LOT faster than from a
remote site. Also make sure not to commit until you really need to.

Other alternatives are to transform the XML into CSV and import it that way,
or to write a simple app that will parse the XML and post it directly using
the embedded Solr method. Plenty of options, all of them documented @ Solr's
site.

Good luck,
b
Re: Is it problem? I use solr to search and index is made by lucene. (not EmbeddedSolrServer(wiki is old))
On Thu, 2 Jul 2009 16:12:58 +0800, James liu <liuping.ja...@gmail.com> wrote:
> I use Solr to search, and the index is made by Lucene (not
> EmbeddedSolrServer - the wiki is old). Is it a problem when I use Solr to
> search? What is the difference between an index made by Lucene and one made
> by Solr?

Hi James,
Make sure the version of Lucene used to create your index is the same as the
libraries included in your version of SOLR; then it should work. It may be
that an older Lucene index works with the newer Lucene libs provided in
Solr, but after using it you may not be able to go back - I am not sure of
the details. Probably an FAQ by now - check the archives :)

Good luck,
B
Re: Solr document security
On Wed, 24 Jun 2009 23:20:26 -0700 (PDT), pof <melbournebeerba...@gmail.com> wrote:
> Hi, I want to add document-level security that works as follows: an
> external process makes a query to the index, and depending on the user's
> security allowances (based on a login id), a list of hits is returned,
> minus any documents the user isn't meant to know even exist. I was thinking
> maybe a custom filter with a JDBC connection to check the security of the
> user vs. the document. I'm not sure how I would add the filter, how to
> write it, or how to get the login id from a GET parameter. Any suggestions,
> comments, etc.?

Hi Brett,
(Keeping in mind that I've been away from SOLR for 8 months, but I don't
think this was added of late.) The standard approach is to manage security @
your application layer, not @ SOLR. I.e., search, return documents (which
should contain some kind of data to identify their ACL), and then decide
whether to show each one or not.

HIH
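A sketch of that application-layer approach, assuming each indexed document carries an "acl" field listing the groups allowed to see it (the field name and shape are an assumption for illustration, not a Solr feature):

```python
def filter_visible(hits, user_groups):
    """Drop any hit the user's groups are not allowed to see.

    hits: result dicts from the search layer, each carrying an
    "acl" list of group names (hypothetical field).
    """
    allowed = set(user_groups)
    return [h for h in hits if allowed.intersection(h.get("acl", []))]

hits = [
    {"id": "doc1", "acl": ["staff", "admin"]},
    {"id": "doc2", "acl": ["admin"]},
]
print(filter_visible(hits, ["staff"]))  # only doc1 survives
```

Filtering after the search keeps Solr itself security-agnostic; the trade-off is that facet counts and pagination are computed before the ACL cut.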
Re: How can i indexing MS-Outlook files?
On Sun, 14 Dec 2008 19:22:00 -0800 (PST), Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> Perhaps an easier alternative is to index not the MS-Outlook files
> themselves, but email messages pulled from the IMAP or POP servers, if
> that's where the original emails live.

PST files ('Outlook files') are local to the end user, and quite possibly
their contents aren't available on the server anymore. Another alternative
could be to access, from Exchange's file system itself, the files that
represent each object... I don't know whether this is still possible in
Exchange 2007, or whether it is 'sanctioned' by MS. Possibly some kind of
object interface with Exchange itself would be most desirable.
Re: Using Solr for indexing emails
On Tue, 25 Nov 2008 03:59:31 +0200, Timo Sirainen [EMAIL PROTECTED] wrote:
>>> would it be faster to say q=user:user AND highestuid:[* TO *] ?
>> Now that I read again what fq really did, yes, sounds like you're right.

You may want to compare them both to see which one is better... I just went
from memory :P

>>> (and I guess you'd sort DESC and return 1 record only).
>> No, I'd use the above for getting the highestuid value for all mailboxes
>> (there should be only one record per mailbox - each mailbox has separate
>> uid values, and so a separate highestuid value), so I can look at the
>> returned highestuid values to see which mailboxes aren't fully indexed yet.

Gotcha. It is an interesting use of SOLR, I must say... I, for one, am not
used to having to deal with up-to-the-second update needs.

Good luck,
B
Re: port of Nutch CommonGrams to Solr for help with slow phrase queries
On Mon, 24 Nov 2008 13:31:39 -0500, Burton-West, Tom [EMAIL PROTECTED] wrote:
> The approach to this problem used by Nutch looks promising. Has anyone
> ported the Nutch CommonGrams filter to Solr? It constructs n-grams for
> frequently occurring terms and phrases while indexing, and optimizes phrase
> queries to use the n-grams. Single terms are still indexed too, with
> n-grams overlaid.
> http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html

Tom,
I haven't used Nutch's implementation, but I used the current implementation
(1.3) of n-grams and shingles to address exactly the same issue (a database
of music albums and tracks). We didn't notice any severe performance hit,
but:
- the data set isn't huge (ca. 1 MM docs).
- we reindex nightly via DIH from MS-SQL, so we can use a separate cache
  layer to lower the number of hits to SOLR.

B
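For reference, a shingle-based field type along the lines mentioned above might look roughly like this. The tokenizer chain and parameter values here are illustrative, not the exact production config:

```xml
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit word pairs alongside the single terms, so phrase queries
         can match on the pre-built two-word shingles -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```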
Re: port of Nutch CommonGrams to Solr for help with slow phrase queries
On Wed, 26 Nov 2008 10:08:03 +1100, Norberto Meijome [EMAIL PROTECTED] wrote:
> We didn't notice any severe performance hit, but:
> - the data set isn't huge (ca. 1 MM docs).
> - we reindex nightly via DIH from MS-SQL, so we can use a separate cache
>   layer to lower the number of hits to SOLR.

To make this clear - there was a noticeable hit when we removed stop words,
but the nature of the beast forced our hand.

b
Re: Using Solr for indexing emails
On Mon, 24 Nov 2008 20:21:17 +0200, Timo Sirainen [EMAIL PROTECTED] wrote:
> I think I gave enough reasons above for why I don't like this solution. :)
> I also don't like adding new shared global state databases just for Solr.
> Solr should be the one shared global state database..

Fair enough - it makes more sense to me now :)

[...]

> Store the per-mailbox highest indexed UID in a new unique field created
> like user/uidvalidity/mailbox. Always update it by deleting the old one
> first and then adding the new one.

You mean delete, commit, add, commit? If you replace the record, simply
submitting the new document and committing would do (of course, you must
ensure the value of the uniqueKey field matches, so SOLR replaces the old
doc).

> So to find out the highest indexed UID for a mailbox, just look it up using
> its unique field. For finding the highest indexed UID for all of a user's
> mailboxes, do a single query:
> - fl=highestuid
> - q=highestuid:[* TO *]
> - fq=user:user

Would it be faster to say q=user:user AND highestuid:[* TO *]? (And I guess
you'd sort DESC and return 1 record only.)

> If messages are being simultaneously indexed by multiple processes, the
> highest-uid value may sometimes (rarely) be set too low, but that doesn't
> matter. The next search will try to re-add some of the messages that were
> already in the index, but because they'll have the same unique IDs as what
> already exists, they won't get added again. The highest-uid gets updated
> and all is well.

B
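The replace-by-uniqueKey behaviour referred to above can be illustrated with a toy model of the index - plain Python standing in for Solr's add-then-commit, not actual Solr client code:

```python
# Toy model: Solr keeps at most one live document per uniqueKey value,
# so re-adding a doc with the same key overwrites the old one - no
# explicit delete step is needed.
index = {}

def add(doc, unique_key="id"):
    index[doc[unique_key]] = doc

add({"id": "user1/uidv/INBOX", "highestuid": 100})
add({"id": "user1/uidv/INBOX", "highestuid": 123})  # replaces, not duplicates

print(len(index))                               # 1
print(index["user1/uidv/INBOX"]["highestuid"])  # 123
```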
Re: [VOTE] Community Logo Preferences
On Sun, 23 Nov 2008 11:59:50 -0500, Ryan McKinley [EMAIL PROTECTED] wrote:
> Please submit your preferences for the solr logo.

https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394263/apache_solr_a_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg

Thanks!!
B
Re: Using Solr for indexing emails
On Sun, 23 Nov 2008 16:02:16 +0200, Timo Sirainen [EMAIL PROTECTED] wrote:
> Hi,

Hi Timo,

[...]

> The main problem is that before doing the search, I first have to check if
> there are any unindexed messages and then add them to Solr. This is done
> using a query like:
> - fl=uid
> - rows=1
> - sort=uid desc
> - q=uidv:uidvalidity box:mailbox user:user

So, if I understand correctly, the process is:
1. user sends search query Q to the search interface
2. interface checks the highest indexed uid in SOLR
3. interface checks in the IMAP store whether the mailbox has any objects
   ('emails') newer than the uid from 2.
4. anything found in 3. is processed, submitted to SOLR, and committed
5. interface submits search query Q to the index, gets results
6. results are presented / returned to the user

It strikes me that this may work OK in some situations but may not scale. I
would decouple the {find new documents / submit / commit} process from the
{search / presentation} layer - ESPECIALLY if you plan to have several
mailboxes in play.

> So it returns the highest IMAP UID field (which is an always-ascending
> integer) for the given mailbox (you can ignore the uidvalidity). I can then
> add all messages with higher UIDs to Solr before doing the actual search.
> When searching multiple mailboxes, the above query would have to be sent to
> every mailbox separately.

Hmm... not sure what you mean by "query would have to be sent to every
MAILBOX"...

> That really doesn't seem like the best solution, especially when there are
> a lot of mailboxes. But I don't think Solr has a way to return the highest
> uid field for each box:mailbox?

Hmmm... maybe you can use facets on 'box'...? Though you'd still have to
query for each box, I think...

> Is that above query even efficient for a single mailbox?

I don't think so.

> I did consider using separate documents for storing the highest UID for
> each mailbox, but that causes annoying desynchronization possibilities.
> Especially because currently I can just keep sending documents to Solr
> without locking and let it drop duplicates automatically (should be rare).
> With per-mailbox highest-uid documents, I can't really see a way to do this
> without locking, or without allowing duplicate fields to be added and later
> having some garbage collection delete all but the one highest value
> (annoyingly complex).

I have a feeling the issues arise from serialising the whole process (as I
described above...). It makes more sense (to me) to implement something
similar to DIH, where you load data as needed (even a 'delta query', which
would only return new data)... I am not sure whether you could use DIH (RSS
feed from IMAP store?).

> I could of course also keep track of what's indexed on Dovecot's side, but
> that could also lead to desynchronization issues, and I'd like to avoid
> them. I guess the ideal solution would be if it was somehow possible to
> create a SQL-like trigger that updates the per-mailbox highest-uid document
> whenever adding a new document with a higher UID value.

I am not sure how much effort you want to put into this... but I would think
that writing a lean app that periodically (for a period that makes sense for
your hardware and users' expectations... 5 minutes? 10? 1?) crawls the IMAP
stores for UIDs, processes them and submits to SOLR, and keeps its own state
(dbm or sqlite) may be a more flexible approach. Or, if Dovecot supports
this, a 'plugin / hook' that sends a msg to your indexing app every time a
new document is created.

I am interested to hear what you decide to go with, and why.

Cheers,
B
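The 'lean app' idea could be structured like the sketch below. `fetch_msgs` and `submit` are hypothetical stand-ins for the IMAP crawl and the Solr post, and the `state` dict stands in for the dbm/sqlite store:

```python
state = {}  # mailbox -> highest UID already submitted to Solr

def poll_mailbox(mailbox, fetch_msgs, submit):
    """One pass of the decoupled indexer: push anything newer than
    the recorded high-water mark, then advance the mark."""
    last = state.get(mailbox, 0)
    new_msgs = sorted(
        (m for m in fetch_msgs(mailbox) if m["uid"] > last),
        key=lambda m: m["uid"],
    )
    for msg in new_msgs:
        submit(msg)
        state[mailbox] = msg["uid"]  # advance only after a successful submit
    return len(new_msgs)

# Tiny fake run with in-memory stand-ins:
msgs = {"INBOX": [{"uid": 1}, {"uid": 2}, {"uid": 3}]}
sent = []
poll_mailbox("INBOX", msgs.get, sent.append)  # first pass sends all 3
poll_mailbox("INBOX", msgs.get, sent.append)  # second pass sends nothing new
print(len(sent), state["INBOX"])              # 3 3
```

Because the mark advances per message, a crash mid-pass only causes the same safe re-submission that the uniqueKey dedupe already tolerates.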
Re: How can i protect the SOLR Cores?
On Wed, 19 Nov 2008 22:58:52 -0800 (PST), RaghavPrabhu [EMAIL PROTECTED] wrote:
> I'm using multiple cores, and all I need to do is make each core secure. If
> I am accessing a particular core via URL, it should ask for and validate
> credentials (say username/password) for each core.

You should be able to handle this @ the servlet container level. What I did,
using Jetty + starting from the example app, was:

1) Modify web.xml (part of the sources of solr.war, which you'll have to
rebuild) to define the authentication constraints you want:

[...]
<!-- block by default -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Default</web-resource-name>
    <url-pattern>/</url-pattern>
  </web-resource-collection>
  <auth-constraint/> <!-- BLOCK! -->
</security-constraint>

<!-- this constraint has no auth constraint or data constraint =>
     allows without auth -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>AllowedQueries</web-resource-name>
    <url-pattern>/core1/select/*</url-pattern>
    <url-pattern>/core2/select/*</url-pattern>
    <url-pattern>/core3/select/*</url-pattern>
  </web-resource-collection>
</security-constraint>

<!-- this constraint allows access to admin pages, with basic auth -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Admin</web-resource-name>
    <!-- the admin for cores management -->
    <url-pattern>/admin/*</url-pattern>
    <!-- the admin for each individual core -->
    <url-pattern>/core1/admin/*</url-pattern>
    <url-pattern>/core2/admin/*</url-pattern>
    <url-pattern>/core3/admin/*</url-pattern>
    <!-- the test core, full access to it -->
    <url-pattern>/_test_/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <!-- roles of users are defined in the properties file -->
    <!-- we allow users with admin-only access -->
    <role-name>Admin-role</role-name>
    <!-- we allow users with full access -->
    <role-name>FullAccess-role</role-name>
  </auth-constraint>
</security-constraint>

<!-- this constraint allows access to modify the data in the SOLR
     service, with basic auth -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>RW</web-resource-name>
    <!-- the dataimport handler for each individual core -->
    <url-pattern>/core1/dataimport</url-pattern>
    <url-pattern>/core2/dataimport</url-pattern>
    <url-pattern>/core3/dataimport</url-pattern>
    <!-- the update handler (XML over HTTP) for each individual core -->
    <url-pattern>/core1/update/*</url-pattern>
    <url-pattern>/core2/update/*</url-pattern>
    <url-pattern>/core3/update/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <!-- roles of users are defined in the properties file -->
    <!-- we allow users with rw-only access -->
    <role-name>RW-role</role-name>
    <!-- we allow users with full access -->
    <role-name>FullAccess-role</role-name>
  </auth-constraint>
</security-constraint>

<!-- the Realm for this app. Ideally we should have different realms for
     each security-constraint, but I can't get it to work properly -->
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>SearchSvc</realm-name>
</login-config>

<security-role>
  <role-name>Admin-role</role-name>
</security-role>
<security-role>
  <role-name>FullAccess-role</role-name>
</security-role>
<security-role>
  <role-name>RW-role</role-name>
</security-role>
[...]

2) In Jetty's jetty.xml (or in a context... I just used jetty.xml), define
where to get the auth details from:

[...]
<Set name="UserRealms">
  <Array type="org.mortbay.jetty.security.UserRealm">
    <Item>
      <New class="org.mortbay.jetty.security.HashUserRealm">
        <Set name="name">SearchSvc</Set>
        <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/searchsvc_access.properties</Set>
        <!-- <Set name="reloadInterval">10</Set> -->
        <!-- <Call name="start"/> -->
      </New>
    </Item>
  </Array>
</Set>
[...]

3) Read in Jetty's documentation how to create the .properties file with the
auth info.

I am not sure if this is the BEST way...
Re: Use SOLR like the MySQL LIKE
On Tue, 18 Nov 2008 14:26:02 +0100, Aleksander M. Stensby [EMAIL PROTECTED] wrote:
> Well, then I suggest you index the field in two different ways if you want
> both possible ways of searching. One, where you treat the entire name as
> one token (in lowercase) - then you can search for "avera*" and match, for
> instance, "average joe", etc. And then another field where you tokenize on
> whitespace, for instance, if you want/need that possibility as well. Look
> at the Solr copy fields and try it out; it works like a charm :)

You should also make extensive use of analysis.jsp to see how data in your
field (1) is tokenized, filtered and indexed, and how your search terms are
tokenized, filtered and matched against (1).

Hint 1: check all the checkboxes ;)
Hint 2: you don't need to reindex all your data - just enter test data in
the form and give it a go. You will, of course, have to tweak schema.xml and
restart your service when you do this.

Good luck,
B
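The two-field setup described above might look like this in schema.xml. The field and type names here are illustrative; the point is one keyword-tokenized lowercase type plus one whitespace-tokenized type, tied together with copyField:

```xml
<!-- whole name as a single lowercase token: "avera*" matches "average joe" -->
<fieldType name="name_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- whitespace-tokenized variant for per-word matching -->
<fieldType name="name_words" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name"       type="name_exact" indexed="true" stored="true"/>
<field name="name_split" type="name_words" indexed="true" stored="false"/>
<copyField source="name" dest="name_split"/>
```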
Re: Solr Core Size limit
On Tue, 11 Nov 2008 10:25:07 -0800 (PST), Otis Gospodnetic [EMAIL PROTECTED] wrote:
> Doc ID gaps are zapped during segment merges and index optimization.

Thanks Otis :)

b
Re: Solr Core Size limit
On Tue, 11 Nov 2008 20:39:32 -0800 (PST), Otis Gospodnetic [EMAIL PROTECTED] wrote:
> With Distributed Search you are limited to # of shards * Integer.MAX_VALUE.

Yeah, makes sense. And since this is PER INDEX, I suspect it applies to each
core only (so you could have n cores in m shards for n * m *
Integer.MAX_VALUE docs).
Re: Solr Core Size limit
On Mon, 10 Nov 2008 10:24:47 -0800 (PST), Otis Gospodnetic [EMAIL PROTECTED] wrote:
> I don't think there is a limit other than your hardware and the internal
> Doc ID, which limits you to 2B docs on 32-bit machines.

Hi Otis,
Just curious: is this internal doc ID reused when an optimise happens? Or
are gaps left and re-filled when 2B is reached?

Cheers,
b
Re: How to use multicore feature in JBOSS
On Tue, 4 Nov 2008 23:45:40 -0800 (PST), con [EMAIL PROTECTED] wrote:
> But for the first question, I am still not clear. I think to use the
> multicore feature we should inform the server. With Jetty, we start the
> server using:
>   java -Dsolr.solr.home=multicore -jar start.jar
> Once the server is started, I think it takes its parameters from
> multicore/solr.xml. But I am confused about how and where to pass this
> argument to JBoss.

Con,
Sorry, I don't have a JBoss available to test... What happens if you use the
standard configuration (with solr.xml @ the top level of your Solr
directory, NOT in multicore/), launch it, and look @ the debug messages to
see which cores are picked up (from the admin page)? FWIW, by having
{solr_installation_directory}/solr.xml, I never had to tell Jetty where
solr.xml was. IIRC, multicore/solr.xml is the layout in the example app
because the default config is 1-core only.

b
Re: How to use multicore feature in JBOSS
On Tue, 4 Nov 2008 09:55:38 -0800 (PST), con [EMAIL PROTECTED] wrote:
> 1) Which files do I need to edit to use the multicore feature?
> 2) Also, where can I specify the index directory, so that we can point the
> indexed documents to a custom folder instead of jboss/bin?

Con,
Please check the wiki - the answers should be there.
1) = solr.xml (previously multicore.xml)
2) look in solrconfig.xml for each core
Re: Solr Searching on other fields which are not in query
On Thu, 30 Oct 2008 15:50:58 -0300 Jorge Solari [EMAIL PROTECTED] wrote: `<copyField source="*" dest="text"/>` in the schema file, or use the Dismax query handler. b _ {Beto|Norberto|Numard} Meijome Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
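A sketch of the copyField approach: copy everything into a catch-all field and make that the default search field (the field name text is just the convention from the example schema):

```xml
<!-- catch-all field; a search against "text" covers every copied field -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="*" dest="text"/>
```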
Re: DIH and rss feeds
On Thu, 30 Oct 2008 20:46:16 -0700 Lance Norskog [EMAIL PROTECTED] wrote: Now: a few hours later there are a different 100 latest documents. How do I add those to the index so I will have 200 documents? 'full-import' throws away the first 100. 'delta-import' is not implemented. What is the special trick here? I'm using the Solr-1.3.0 release. Lance, 1) DIH has a clean parameter that, when set to true (the default, I think), will delete all existing docs in the index. 2) ensure your new documents have different values in the field defined as uniqueKey (schema.xml). let us know how it goes, B _ {Beto|Norberto|Numard} Meijome Lack of planning on your part does not constitute an emergency on ours. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
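For 1), the clean flag can be passed on the import request itself - something like this, assuming the default example host/port and handler path:

```
http://localhost:8983/solr/dataimport?command=full-import&clean=false
```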
Re: search not working correctly
On Mon, 20 Oct 2008 03:24:36 -0700 (PDT) prerna07 [EMAIL PROTECTED] wrote: Yes, we want to search on these incomplete words. Look into the NGram token factory - it works a treat. I don't think it's explained a lot in the wiki, but it has been discussed on this list in the past, and you also have the JavaDoc and the source itself. FWIW, I had problems getting it to work properly with minGramSize != maxGramSize - analysis.jsp showed a match, but it didn't work in the QH. It could *definitely* have been me or the code @ the time I tested it (pre 1.3 release)... I'll test again to see if it still happens and log a bug if needed. B _ {Beto|Norberto|Numard} Meijome There are two kinds of stupid people. One kind says,'This is old and therefore good'. The other kind says, 'This is new, and therefore better.' John Brunner, 'The Shockwave Rider'. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
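A sketch of an n-gram field type using the tokenizer factory mentioned above (the type name and gram sizes are just examples):

```xml
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer>
    <!-- emits every 3-character substring of the input, so partial words can match -->
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="3"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```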
Re: Synonym format not working
On Mon, 20 Oct 2008 00:08:07 -0700 (PDT) prerna07 [EMAIL PROTECTED] wrote: The issue with synonyms arises when I have a number in the synonym definition: ccc => 1,2 gives the following result with debugQuery=true : <str name="parsedquery">MultiPhraseQuery(all: (1 ) (2 ccc ) 3)</str> <str name="parsedquery_toString">all: (1 ) (2 ccc ) 3</str> However fooaaa => fooaaa, baraaa, bazaaa gives correct synonym results: <str name="parsedquery">all:fooaaa all:baraaa all:bazaaa</str> <str name="parsedquery_toString">all:fooaaa all:baraaa all:bazaaa</str> Any pointers to solve the issue with numbers in synonyms? Prerna, in your first email you show your field type has: [...] <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> [...] generateNumberParts="1" will, AFAIK, generate a different token on a number, so ccc1 will be indexed as ccc, 1 . If you use admin/analysis.jsp you can see the step-by-step process taken by the tokenizer + filters for your data type - you can then tweak it as necessary until you are happy with the results. b _ {Beto|Norberto|Numard} Meijome Immediate success shouldn't be necessary as a motivation to do the right thing. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: query parsing issue + behavior as OR (solr 1.4-dev)
On Mon, 20 Oct 2008 06:21:06 -0700 (PDT) Sunil Sarje [EMAIL PROTECTED] wrote: I am working with nightly build of Oct 17, 2008 and found the issue that something wrong with LuceneQParserPlugin; It takes + as OR Sunil, please do not hijack the thread : http://en.wikipedia.org/wiki/Thread_hijacking thanks, B _ {Beto|Norberto|Numard} Meijome He could be a poster child for retroactive birth control. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: solr1.3 - testing language ?
On Mon, 20 Oct 2008 06:25:09 -0700 (PDT) sunnyfr [EMAIL PROTECTED] wrote: I implemented multi language search, but I didn't finished the website in PHP, how can I check it works properly? maybe by sending to SOLR the queries you plan your PHP frontend to generate ? _ {Beto|Norberto|Numard} Meijome Always do right. This will gratify some and astonish the rest. Mark Twain I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Sorting performance
On Mon, 20 Oct 2008 16:28:23 +0300 christophe [EMAIL PROTECTED] wrote: Hmm, this means I have to wait before I index new documents and avoid indexing when they are created (I have about 50 000 new documents created each day and I was planning to make those searchable ASAP). you can always index + optimize out of band in a 'master' / RW server, and then send the updated index to your slave (the one actually serving the requests). This *will NOT* remove the need to refresh your cache, but it will remove any delay introduced by commit/indexing + optimise. Too bad there is no way to have a centralized cache that can be shared AND updated when new documents are created. hmm, not sure it makes sense like that... but maybe along the lines of having an active cache that is used to serve queries, with new ones being prepared and then swapped in when ready. Speaking of which (or not :P), has anyone thought about / done any work on using memcached for these internal solr caches? I guess it would make sense for setups with several slaves (or even a master updating memcached too...)... though for a setup with shards it would be slightly more involved (although it *could* be used to support several slaves per 'data shard'). All the best, B _ {Beto|Norberto|Numard} Meijome RTFM and STFW before anything bad happens. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Advice on analysis/filtering?
On Thu, 16 Oct 2008 16:09:17 +0200 Jarek Zgoda [EMAIL PROTECTED] wrote: They came to such expectations seeing Solr's own Spellcheck at work - if it can suggest correct versions, it should be able to sanitize broken words in documents and search them using sanitized input. For me, this seemed reasonable request (of course, if this can be achieved reasonably abusing solr's spellcheck component). don't forget that the solr spellchecker finds its suggestions based on your corpus. so if you don't have a correctly spelt version of wordA , you won't receive back wordA as a 'spellchecked' version of that word. I think that's how it works by default (which is all I've needed so far). I *think* there is a way to use an external spellchecker (component or list) - so you could have your full list of Polish words in a file, i guess I agree playing with analysis.jsp is the best approach to solving these problems ( tick all the boxes and see how the changes to your terms take place). good luck - let us know what you come up with :) B _ {Beto|Norberto|Numard} Meijome You can discover what your enemy fears most by observing the means he uses to frighten you. Eric Hoffer (1902 - 1983) I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
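If it helps, the 1.3 spellcheck component can read suggestions from a plain word list rather than from the index - roughly like this in solrconfig.xml (the file names and paths here are placeholders):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <!-- one correctly spelt word per line, e.g. a full Polish dictionary -->
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>
```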
Re: solr1.3 - testing language ?
On Mon, 20 Oct 2008 08:16:50 -0700 (PDT) sunnyfr [EMAIL PROTECTED] wrote: ok so straight by the admin part ! Hi Johanna - not sure what you mean by 'the admin part'. it should work... so if it doesn't, tell us what you did (what URL you called), what you expected to receive back (sample of your indexed data) and what you got instead, and we may be able to offer better answers... b _ {Beto|Norberto|Numard} Meijome Two things have come out of Berkeley, Unix and LSD. It is uncertain which caused the other. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: dismax and long phrases
On Tue, 07 Oct 2008 09:27:30 -0700 Jon Drukman [EMAIL PROTECTED] wrote: Yep, you can fake it by only using fieldsets (qf) that have a consistent set of stopwords. Does that mean changing the query or changing the schema? Jon, - you change schema.xml to define which type each field is. The fieldType says whether you have stopwords or not. - you change solrconfig.xml to define which fields dismax will query on. I don't think you should have to change your query. b _ {Beto|Norberto|Numard} Meijome Mix a little foolishness with your serious plans; it's lovely to be silly at the right moment. Horace I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
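Roughly, in solrconfig.xml (the field names here are made up - the point is that every field listed in qf should share the same stopword handling):

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- only fields whose types apply the same stopword filter -->
    <str name="qf">title^2.0 body</str>
  </lst>
</requestHandler>
```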
Re: Problem in using Unique key
On Wed, 8 Oct 2008 03:45:20 -0700 (PDT) con [EMAIL PROTECTED] wrote: But in that case, while doing a full-import I am getting the following error: org.apache.solr.common.SolrException: QueryElevationComponent requires the schema to have a uniqueKeyField Con, if you don't use the Query Elevation component, you can disable it in solrconfig.xml. Not sure why a uniqueKeyField is needed for it, though. b _ {Beto|Norberto|Numard} Meijome First they ignore you, then they laugh at you, then they fight you, then you win. Mahatma Gandhi. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
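For completeness, declaring a unique key is a two-line change in schema.xml (the field name id is just the usual convention):

```xml
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>
```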
Re: Dismax , query phrases
On Tue, 30 Sep 2008 11:43:57 -0700 (PDT) Chris Hostetter [EMAIL PROTECTED] wrote: : That's why I was wondering how Dismax breaks it all apart. It makes sense...I : suppose what I'd like to have is a way to tell dismax which fields NOT to : tokenize the input for. For these fields, it would pass the full q instead of : each part of it. Does this make sense? would it be useful at all? the *goal* makes sense, but the implementation would be ... problematic. you have to remember the DisMax parser's whole way of working is to make each chunk of input match against any qf field, and find the highest scoring field for each chunk, with this input... q = some phase qf = a b c ...you get... ( (a:some | b:some | c:some) (a:phrase | b:phrase | c:phrase) ) ...even if dismax could tell that c was a field that should only support exact matches, thanks Hoss, it would by a configuration option. how would it fit c:some phrase into that structure? does this make sense? ( (a:some | b:some ) (a:phrase | b:phrase) ( c:some phrase) ) I've already kinda forgotten how this thread started ... trying to get *exact* matches to always score higher using dismax - keeping in mind that I have multiple exact fields, with different boosts... but would it make sense to just use your exact fields in the pf, and have inexact versions of them in the qf? then docs that match your input exactly should score at the top, but less exact matches will also still match. aha! right, i think that makes sense...i obviously haven't got my head properly around all the different functionality of dismax. I will try it when I'm back @ work... right now, i seem to have solved the problem by using shingles -the fields are artists, song albumtitles ,so high matching on shingles is quite approximate to exact matching - except that I had to remove stopwords, so that impacts on performance. Thanks again :) B _ {Beto|Norberto|Numard} Meijome Which is worse: ignorance or apathy? Don't know. Don't care. 
I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
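Reusing the field names from this thread, Hoss's qf/pf split might look something like this in the dismax defaults (the boosts are illustrative only):

```xml
<!-- inexact (tokenized) versions in qf; exact variants only in pf,
     so exact matches rise to the top without excluding looser matches -->
<str name="qf">title^6.0 artist^4.0</str>
<str name="pf">title_exact^200.0 artist_exact^100.0</str>
```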
Re: Dismax , query phrases
On Fri, 26 Sep 2008 10:42:42 -0700 (PDT) Chris Hostetter [EMAIL PROTECTED] wrote: : <tokenizer class="solr.KeywordTokenizerFactory" /> <!-- The LowerCase TokenFilter does : Now, when I search with ?q=the doors , all the terms in my q= aren't used : together to build the dismaxQuery , so I never get a match on the _exact fields: The query parser (even the dismax queryparser) does its whitespace chunking before handing any input off to the analyzer for the appropriate field, so with [[ ?q=the doors ]] the and doors are going to get analyzed separately ... which is why you see artist_exact:the^100.0 and artist_exact:doors^100.0 in your parsedquery -- *BUT* since you used KeywordTokenizer at index time, you'll never get a match for either of those on any document (unless the artist is just the or doors) Hi Hoss :) thanks for the feedback - I arrived @ the same conclusion. The biz requirement is that these *_exact fields match exactly the original contents of the field. Right now we are using Dismax, and changing this means rewriting a lot of the queries, which isn't possible. That's why I was wondering how Dismax breaks it all apart. It makes sense... I suppose what I'd like to have is a way to tell dismax which fields NOT to tokenize the input for. For these fields, it would pass the full q instead of each part of it. Does this make sense? would it be useful at all? : I've tried with other queries that don't include stopwords (smashing pumpkins, : for example), and in all cases, if I don't use , only the LAST word is used : with my _exact fields ( tried with 1, 2 and 3 words, always the same against my : _exact fields..) this LAST word part doesn't make sense to me ... you can see 'the' making it into your query on the *_exact fields in the first DisjunctionMaxQuery, do you have toStrings for these other queries we could see to understand what you mean? I agree, it makes sense as you say... I must have missed the initial tokens.
I can't confirm atm, so I'll follow the common sense path :) As usual, thanks for your time and insights :) B _ {Beto|Norberto|Numard} Meijome Humans die and turn to dust, but writing makes us remembered 4000-year-old words of an Egyptian scribe I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Create Indexes
On Fri, 26 Sep 2008 18:58:14 +0530 Dinesh Gupta [EMAIL PROTECTED] wrote: Please tell me where to upload the files. anywhere you have access to... your own website, somewhere anyone on the list can access the files you want to share to address your problems :) b _ {Beto|Norberto|Numard} Meijome Science Fiction...the only genuine consciousness expanding drug Arthur C. Clarke I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: How to select one entity at a time?
On Fri, 26 Sep 2008 00:46:07 -0700 (PDT) con [EMAIL PROTECTED] wrote: To be more specific: I have the data-config.xml just like: <dataConfig> <dataSource ** /> <document> <entity name="user" query="select * from USER"></entity> <entity name="manager" query="select * from MANAGERS"></entity> <entity name="both" query="select * from MANAGERS,USER where MANAGERS.userID=USER.userID"></entity> </document> </dataConfig> Con, I may be confused here... are you asking how to load only data from your USERS SQL table into SOLR, or how to search in your SOLR index for data about 'USERS'? data-config.xml is only relevant for the Data Import Handler... but your following question: I have 3 search conditions. when the client wants to search all the users, only the entity, 'user' must be executed. And if he wants to search all managers, the entity, 'manager' must be executed. How can I accomplish this through the URL? *seems* to indicate you want to search on this. If you want to search on a particular field from your SOLR schema, DIH is not involved. If you use the standard QH, you say ?q=user:Bob If I misunderstood your question, please explain... cheers, b _ {Beto|Norberto|Numard} Meijome Everything is interesting if you go into it deeply enough Richard Feynman I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
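(If the question was actually about running the import for just one entity, DIH accepts an entity parameter on the request - assuming the handler is registered at /dataimport:)

```
http://localhost:8983/solr/dataimport?command=full-import&entity=user
```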
Re: Create Indexes
On Fri, 26 Sep 2008 16:32:05 +0530 Dinesh Gupta [EMAIL PROTECTED] wrote: Is it OK to create whole index by Solr web-app? If not than ,How can I create index? I have attached some file that create index now. Dinesh, you sent the same email 2 1/2 hours ago. sending it again will not give you more answers. If you have a file you want to share, you should upload it to a webserver and share the URL - most mailing lists drop any file attachments. _ {Beto|Norberto|Numard} Meijome Never take Life too seriously, no one gets out alive anyway. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: How to select one entity at a time?
On Fri, 26 Sep 2008 02:35:18 -0700 (PDT) con [EMAIL PROTECTED] wrote: What you meant is correct only. Please excuse for that I am new to solr. :-( Con, have a read here : http://www.ibm.com/developerworks/java/library/j-solr1/ it helped me pick up the basics a while back. it refers to 1.2, but the core concepts are relevant to 1.3 too. b _ {Beto|Norberto|Numard} Meijome Hildebrant's Principle: If you don't know where you are going, any road will get you there. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: How to select one entity at a time?
On Fri, 26 Sep 2008 02:35:18 -0700 (PDT) con [EMAIL PROTECTED] wrote: What you meant is correct only. Please excuse for that I am new to solr. :-( hi Con, nothing to be excused for.. but you may want to read the wiki, as it provides quite a lot of information that should answer your questions. DIH is great, but I wouldn't go near it until you understand how to create your own schema.xml and solrconfig.xml . http://wiki.apache.org/solr/FrontPage is the wiki ( everyone else ... is there a guide on getting started on SOLR ? step by step, taking the example and changing it for your own use? ) I want to index all the query results. (I think this will be done by the data-config.xml) hmm... terminology :-) you index documents (similar to records in a database). when you send a query to Solr, you will get results if your query matches any of them. Now while accessing this indexed data, i need this filtering. ie. Either user or manager. I tried your suggestion: http://localhost:8983/solr/select/?q=user:bob&version=2.2&start=0&rows=10&indent=on&wt=json the URL LOOKS ok. do you have any document in your index with field user containing 'bob'? try this to get all results (xml format, first 3 results only)... http://localhost:8983/solr/select/?q=*:*&rows=3 then, find a field with a value, then search for that value and see if you get that document back - it should work... (with lots of caveats, yes).. If you send us the result we can help you understand better why it isn't working as you intend.. b _ {Beto|Norberto|Numard} Meijome First they ignore you, then they laugh at you, then they fight you, then you win. Mahatma Gandhi. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Dismax , query phrases
On Wed, 24 Sep 2008 08:34:57 -0700 (PDT) Otis Gospodnetic [EMAIL PROTECTED] wrote: What happens if you change ps from 100 to 1 and comment out that ord function? Otis, I think what I am after is what Hoss described in his last paragraph in his reply to your email last year : http://www.nabble.com/DisMax-and-REQUIRED-OR-REQUIRED-query-rewrite-td13395349.html#a13395349 ie, I want everything that Dismax does, BUT , on certain fields, I want it to search for all the terms in my q= , as a phrase. I am thinking of modifying dismax to allow this to be passed as a configuration ( eg, fieldsSearchExact=artist_exact, title_exact), but if I can avoid it that'd be great :). any other ideas, anyone?? thanks! B _ {Beto|Norberto|Numard} Meijome Nature doesn't care how smart you are. You can still be wrong. Richard Feynman I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Shingles , min size?
hi guys, I may have missed it ,but is it possible to tell the solr.ShingleFilterFactory the minimum number of grams to generate per shingle? Similar to NGramTokenizerFactory's minGramSize=3 maxGramSize=3 thanks! B _ {Beto|Norberto|Numard} Meijome Ask not what's inside your head, but what your head's inside of. J. J. Gibson I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Using Shingles to Increase Phrase Search Performance
On Sat, 16 Aug 2008 15:39:44 -0700 Chris Harris [EMAIL PROTECTED] wrote: [...] So finally I modified the Lucene ShingleFilter class to add an outputUnigramIfNoNgram option. Basically, if you set that option, and also set outputUnigrams=false, then the filter will tokenize just as in Exhibit B, except that if the query is only one word long, it will return a corresponding single token, rather than zero tokens. In other words, [Exhibit C] please - please Things were still zippy. And, so far, I think I have seriously improved my phrase search performance without ruining anything. hi Chris, is this change part of 1.3 ? I've tried <fieldType name="shingle4_mark2" class="solr.TextField"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory" /> <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="false" outputUnigramIfNoNgram="true" /> <filter class="solr.LowerCaseFilterFactory" /> </analyzer> </fieldType> but analysis.jsp shows no tokens generated when there is only 1 word. thanks! B _ {Beto|Norberto|Numard} Meijome I sense much NT in you. NT leads to Bluescreen. Bluescreen leads to downtime. Downtime leads to suffering. NT is the path to the darkside. Powerful Unix is. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Dismax , query phrases
Hello, I've seen references to this in the list, but not completely explained...my apologies if this is FAQ (and for the length of the email). I am using dismax across a number of fields on an index with data about music albums songs - the fields are quite full of stop words. I am trying to boost 'exact' matches - ie, if you search for 'The Doors', those documents with 'The Doors' should be first. I've created the following fieldType and I use it for fields artist_exact and title_exact: fieldType name=lowerCaseString class=solr.TextField sortMissingLast=true omitNorms=true analyzer !-- KeywordTokenizer does no actual tokenizing, so the entire input string is preserved as a single token -- tokenizer class=solr.KeywordTokenizerFactory / !-- The LowerCase TokenFilter does what you expect, which can be when you want your sorting to be case insensitive -- filter class=solr.LowerCaseFilterFactory / !-- The TrimFilter removes any leading or trailing whitespace -- filter class=solr.TrimFilterFactory / /analyzer /fieldType I then give artist_exact and title_exact pretty high boosts ( title_exact^200.0 artist_exact^100.0 ) Now, when I search with ?q=the doors , all the terms in my q= aren't used together to build the dismaxQuery , so I never get a match on the _exact fields: (there are a few other fields involved...pretty self explanatory) str name=rawquerystringthe doors/str str name=querystringthe doors/str ___ str name=parsedquery +((DisjunctionMaxQuery((title_ngram2:th he^0.1 | artist_ngram2:th he^0.1 | title_ngram3:the^4.5 | artist_ngram3:the^3.5 | artist_exact:the^100.0 | title_exact:the^200.0)~0.01) DisjunctionMaxQuery((genre:door^0.2 | title_ngram2:do oo or rs^0.1 | artist_ngram2:do oo or rs^0.1 | title_ngram3:doo oor ors^4.5 | title:door^6.0 | artist_ngram3:doo oor ors^3.5 | artist:door^4.0 | artist_exact:doors^100.0 | title_exact:doors^200.0)~0.01))~2) DisjunctionMaxQuery((title:door^2.0 | artist:door^0.8)~0.01) FunctionQuery((ord(release_year))^0.5) /str str 
name=parsedquery_toString +(((title_ngram2:th he^0.1 | artist_ngram2:th he^0.1 | title_ngram3:the^4.5 | artist_ngram3:the^3.5 | artist_exact:the^100.0 | title_exact:the^200.0)~0.01 (genre:door^0.2 | title_ngram2:do oo or rs^0.1 | artist_ngram2:do oo or rs^0.1 | title_ngram3:doo oor ors^4.5 | title:door^6.0 | artist_ngram3:doo oor ors^3.5 | artist:door^4.0 | artist_exact:doors^100.0 | title_exact:doors^200.0)~0.01)~2) (title:door^2.0 | artist:door^0.8)~0.01 (ord(release_year))^0.5 but, if I build my search as ?q=the doors str name=parsedquery +DisjunctionMaxQuery((genre:door^0.2 | title_ngram2:th he e d do oo or rs^0.1 | artist_ngram2:th he e d do oo or rs^0.1 | title_ngram3:the he e d do doo oor ors^4.5 | title:door^6.0 | artist_ngram3:the he e d do doo oor ors^3.5 | artist:door^4.0 | artist_exact:the doors^100.0 | title_exact:the doors^200.0)~0.01) DisjunctionMaxQuery((title:door^2.0 | artist:door^0.8)~0.01) FunctionQuery((ord(release_year))^0.5) /str str name=parsedquery_toString +(genre:door^0.2 | title_ngram2:th he e d do oo or rs^0.1 | artist_ngram2:th he e d do oo or rs^0.1 | title_ngram3:the he e d do doo oor ors^4.5 | title:door^6.0 | artist_ngram3:the he e d do doo oor ors^3.5 | artist:door^4.0 | artist_exact:the doors^100.0 | title_exact:the doors^200.0)~0.01 (title:door^2.0 | artist:door^0.8)~0.01 (ord(release_year))^0.5 I've tried with other queries that don't include stopwords (smashing pumpkins, for example), and in all cases, if I don't use , only the LAST word is used with my _exact fields ( tried with 1, 2 and 3 words, always the same against my _exact fields..) What is the reason for this behaviour? 
my full dismax config is : str name=mm2-1 5-2 690%/str str name=spellchecktrue/str str name=spellcheck.extendedResultstrue/str str name=tie0.01/str str name=qf title_exact^200.0 artist_exact^100.0 title^6.0 title_ngram3^4.5 artist^4.0 artist_ngram3^3.5 title_ngram2^0.1 artist_ngram2^0.1 genre^0.2 /str str name=q.alt*:*/str str name=spellcheck.collatetrue/str str name=defTypedismax/str str name=spellcheck.onlyMorePopulartrue/str str name=rows10/str str name=pftitle^2.0 artist^0.8/str str name=echoParamsall/str str name=fl*,score/str str name=bford(release_year)^0.5/str str name=spellcheck.count1/str str name=ps100/str /lst TIA! B _ {Beto|Norberto|Numard} Meijome Never offend people with style when you can offend them with substance. Sam Brown I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: help required: how to design a large scale solr system
On Wed, 24 Sep 2008 07:46:57 -0400 Mark Miller [EMAIL PROTECTED] wrote: Yes. You will definitely see a speed increase by avoiding HTTP (especially doc-at-a-time HTTP) and using the direct CSV loader. http://wiki.apache.org/solr/UpdateCSV and the obvious reason: if, for whatever reason, something breaks while you are indexing directly from memory, can you restart the import? It may be just easier to keep the data on disk and keep track of where you are up to in adding to the index... B _ {Beto|Norberto|Numard} Meijome Sysadmins can't be sued for malpractice, but surgeons don't have to deal with patients who install new versions of their own innards. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
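For reference, the CSV loader from that wiki page can pull a file straight off the server's disk - something like the following, where the file path is a placeholder:

```
http://localhost:8983/solr/update/csv?stream.file=/path/to/docs.csv&stream.contentType=text/plain;charset=utf-8
```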
Re: Dismax , query phrases
On Wed, 24 Sep 2008 08:34:57 -0700 (PDT) Otis Gospodnetic [EMAIL PROTECTED] wrote: What happens if you change ps from 100 to 1 and comment out that ord function? Otis Hi Otis, no luck - without : str name=rawquerystringsmashing pumpkins/str str name=querystringsmashing pumpkins/str str name=parsedquery +((DisjunctionMaxQuery((genre:smash^0.2 | title_ngram2:sm ma as sh hi in ng^0.1 | artist_ngram2:sm ma as sh hi in ng^0.1 | title_ngram3:sma mas ash shi hin ing^4.5 | title:smash^6.0 | artist_ngram3:sma mas ash shi hin ing^3.5 | artist:smash^4.0 | artist_exact:smashing^100.0 | title_exact:smashing^200.0)~0.01) DisjunctionMaxQuery((genre:pumpkin^0.2 | title_ngram2:pu um mp pk ki in ns^0.1 | artist_ngram2:pu um mp pk ki in ns^0.1 | title_ngram3:pum ump mpk pki kin ins^4.5 | title:pumpkin^6.0 | artist_ngram3:pum ump mpk pki kin ins^3.5 | artist:pumpkin^4.0 | artist_exact:pumpkins^100.0 | title_exact:pumpkins^200.0)~0.01))~2) DisjunctionMaxQuery((title:smash pumpkin~1^2.0 | artist:smash pumpkin~1^0.8)~0.01) /str ___ str name=parsedquery_toString +(((genre:smash^0.2 | title_ngram2:sm ma as sh hi in ng^0.1 | artist_ngram2:sm ma as sh hi in ng^0.1 | title_ngram3:sma mas ash shi hin ing^4.5 | title:smash^6.0 | artist_ngram3:sma mas ash shi hin ing^3.5 | artist:smash^4.0 | artist_exact:smashing^100.0 | title_exact:smashing^200.0)~0.01 (genre:pumpkin^0.2 | title_ngram2:pu um mp pk ki in ns^0.1 | artist_ngram2:pu um mp pk ki in ns^0.1 | title_ngram3:pum ump mpk pki kin ins^4.5 | title:pumpkin^6.0 | artist_ngram3:pum ump mpk pki kin ins^3.5 | artist:pumpkin^4.0 | artist_exact:pumpkins^100.0 | title_exact:pumpkins^200.0)~0.01)~2) (title:smash pumpkin~1^2.0 | artist:smash pumpkin~1^0.8)~0.01 Still OK if I include ... I am trying on another setup, with same data, to work with shingles rather than on 'exact' ... dismax seems to handle it much better...but it may be that I haven't added to that config all the ngram3 ngram3 fields for substring matching... 
the resulting params were : str name=mm2-1 5-2 690%/str str name=spellchecktrue/str str name=spellcheck.extendedResultstrue/str str name=tie0.01/str str name=trstore_albums.xsl/str ___ str name=qf title_exact^200.0 artist_exact^100.0 title^6.0 title_ngram3^4.5 artist^4.0 artist_ngram3^3.5 title_ngram2^0.1 artist_ngram2^0.1 genre^0.2 /str str name=q.alt*:*/str str name=spellcheck.collatetrue/str str name=wtxml/str str name=defTypedismax/str str name=rows10/str str name=spellcheck.onlyMorePopulartrue/str str name=pftitle^2.0 artist^0.8/str str name=echoParamsall/str str name=fl*,score/str str name=spellcheck.count1/str str name=ps1/str str name=debugQuerytrue/str str name=echoParamsall/str str name=wtxml/str str name=qsmashing pumpkins/str thanks, B _ {Beto|Norberto|Numard} Meijome Don't remember what you can infer. Harry Tennant I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: help required: how to design a large scale solr system
On Wed, 24 Sep 2008 11:45:34 -0400 Mark Miller [EMAIL PROTECTED] wrote: Nothing to stop you from breaking up the tsv/csv files into multiple tsv/csv files. Absolutely agreeing with you ... in one system where I implemented SOLR, I have a process run through the file system and lazily pick up new files as they come in.. if something breaks (and it will,as the files are user generated in many cases...), report it / leave it for later...move on. b _ {Beto|Norberto|Numard} Meijome I used to hate weddings; all the Grandmas would poke me and say, You're next sonny! They stopped doing that when i started to do it to them at funerals. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Defining custom schema
On Wed, 24 Sep 2008 04:42:42 -0700 (PDT) con [EMAIL PROTECTED] wrote: In the table we will be having various column names like CUSTOMER_NAME, CUSTOMER_PHONE etc. If we use the default schema.xml, we have to map these values to some of the default fields like cat, features etc. This will cause difficulty when we need to process the output. Instead, can we set the column name and column type dynamically in the schema.xml, so that the output will show something like <CUSTOMER_NAME>markrmiller</CUSTOMER_NAME>? Con, the default schema you refer to is from the example application. You should definitely edit it and define your own fields. b _ {Beto|Norberto|Numard} Meijome In my opinion, we don't devote nearly enough scientific research to finding a cure for jerks. Calvin I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Any way to extract most used keywords from an index (or a random set)
On Mon, 22 Sep 2008 15:46:54 +0530 Jacob Singh [EMAIL PROTECTED] wrote: Hi, I'm trying to write a testing suite to gauge the performance of solr searches. To do so, I'd like to be able to find out what keywords will get me search results. Is there any way to programmatically do this with luke? I'm trying to figure out what all it exposes, but I'm not seeing this. Hi Jacob, are you after something that the following URL doesn't provide? http://host/solr/core/admin/luke?wt=xslt&tr=luke.xsl but I actually prefer the schema browser (1.3) to see the top n terms per field... b _ {Beto|Norberto|Numard} Meijome If it's there, and you can see it, it's real. If it's not there, and you can see it, it's virtual. If it's there, and you can't see it, it's transparent. If it's not there, and you can't see it, you erased it. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Special character matching 'x' ?
On Thu, 18 Sep 2008 10:53:39 +0530 Sanjay Suri [EMAIL PROTECTED] wrote: One of my field values has the name Räikkönen, which contains special characters. Strangely, as I see it anyway, it matches on the search query 'x'? Can someone explain or point me to the solution/documentation? hi Sanjay, Akshay should have given you an answer for this. In a more general way, if you want to know WHY something is matching the way it is, run the query with debugQuery=true. There are a few pages in the wiki which explain other debugging techniques. b _ {Beto|Norberto|Numard} Meijome Ask not what's inside your head, but what your head's inside of. J. J. Gibson I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: about boost weight
On Sat, 13 Sep 2008 16:17:12 + zzh [EMAIL PROTECTED] wrote: I think this is a stupid method, because the search conditions are too long, and the search efficiency will be low; we hope you can help me to solve this problem. Hi, IMHO, a long set of conditions doesn't make it stupid. You may not be going the best way about it, though. You may find http://wiki.apache.org/solr/DisMaxRequestHandler an interesting and useful read :) B _ {Beto|Norberto|Numard} Meijome Quality is never an accident, it is always the result of intelligent effort. John Ruskin (1819-1900) I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Regarding Indexing
On Fri, 29 Aug 2008 00:31:13 -0700 (PDT) sanraj25 [EMAIL PROTECTED] wrote: But still I can't maintain two indexes. Please help me with how to create two cores in solr. What specific problem do you have? B _ {Beto|Norberto|Numard} Meijome Always listen to experts. They'll tell you what can't be done, and why. Then do it. Robert A. Heinlein I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Regarding Indexing
On Fri, 29 Aug 2008 02:37:10 -0700 (PDT) sanraj25 [EMAIL PROTECTED] wrote: I want to store two independent data sets in the solr index, so I decided to create two indexes. But that's not possible, so I went for the multicore concept in solr. Can you give me a step-by-step procedure to create multicore in solr? Hi, without specific questions, I doubt I or others can give you any other information than the documentation, which can be found at: http://wiki.apache.org/solr/CoreAdmin Please make sure you are using (a recent version of) 1.3. B _ {Beto|Norberto|Numard} Meijome Your reasoning is excellent -- it's only your basic assumptions that are wrong. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Storing two different files
On Thu, 28 Aug 2008 02:01:05 -0700 (PDT) sanraj25 [EMAIL PROTECTED] wrote: I want to index two different files in solr. (for ex) I want to store two tables, like job_post and job_profile, in solr. But now both are stored in the same place in solr. When I get data from job_post, data comes from job_profile also. So I want to maintain the data of job_post and job_profile separately. hi :) you need to have 2 separate schemas, and therefore 2 separate indexes. You should read about MultiCore in the wiki. B _ {Beto|Norberto|Numard} Meijome Unix is very simple, but it takes a genius to understand the simplicity. Dennis Ritchie I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
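For reference, a minimal multicore setup along the lines the CoreAdmin wiki describes looks like the sketch below (Solr 1.3). The core names and directories here are just examples matching this thread, not required values; each instanceDir needs its own conf/schema.xml and conf/solrconfig.xml:

```xml
<!-- solr.xml in the Solr home directory; core names/paths are examples. -->
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="job_post" instanceDir="job_post" />
    <core name="job_profile" instanceDir="job_profile" />
  </cores>
</solr>
```

Each core is then queried independently, e.g. at /solr/job_post/select and /solr/job_profile/select, so the two data sets never mix.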
Re: Question about search suggestion
On Tue, 26 Aug 2008 15:15:21 +0300 Aleksey Gogolev [EMAIL PROTECTED] wrote: Hello. I'm new to solr and I need to make a search suggest (like google suggestions). Hi Aleksey, please search the archives of this list for subjects containing 'autocomplete' or 'auto-suggest'. that should give you a few ideas and starting points. best, B _ {Beto|Norberto|Numard} Meijome The more I see the less I know for sure. John Lennon I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Multicore and snapshooter / snappuller
On Fri, 22 Aug 2008 12:21:53 -0700 Lance Norskog [EMAIL PROTECTED] wrote: Apparently the ZFS (Silicon Graphics originally) is great for really huge files. hi Lance, You may be confusing Sun's ZFS with SGI's XFS. The OP referred, I think, to ZFS. B _ {Beto|Norberto|Numard} Meijome The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well-meaning but without understanding. Justice Louis D. Brandeis I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: dataimporthandler and mysql connector jar
On Mon, 25 Aug 2008 17:11:47 +0200 Walter Ferrara [EMAIL PROTECTED] wrote: Launching a multicore solr with dataimporthandler using a mysql driver (driver=com.mysql.jdbc.Driver) works fine if the mysql connector jar (mysql-connector-java-5.0.7-bin.jar) is in the classpath, either the jdk classpath or inside the solr.war lib dir. But putting the mysql-connector-java-5.0.7-bin.jar in the core0/lib directory, or in the multicore shared lib dir (specified in the sharedLib attribute in solr.xml), results in an exception, even if the jar is correctly loaded by the classloader: Hi Walter, As of the nightly build of August 19th, the DIH failing to connect to the data source on SOLR's startup does *not* kill SOLR anymore. I haven't tested yesterday's... it could be a regression bug, but I doubt it - the error used to be different to yours (about connectivity, not failure in a document). For what it's worth, I only have one copy of the jdbc jar (MS SQL in my case), in SOLR's lib directory, used by several cores' own DIH. You can check whether it's picked up by SOLR's classpath in the Java Info page under admin/. You may also want to try with a valid but empty document definition in data-config.xml to rule out syntax issues. B _ {Beto|Norberto|Numard} Meijome Any society that would give up a little liberty to gain a little security will deserve neither and lose both. Benjamin Franklin I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Querying Question
On Thu, 21 Aug 2008 18:09:11 -0700 Jake Conk [EMAIL PROTECTED] wrote: I thought if I used <copyField/> to copy my string field to a text field then I could search for words within it and not be limited to the entire content. Did I misunderstand that? but you need to search on the fields that are defined with the text fieldType... it seems you are searching on the string fields. B _ {Beto|Norberto|Numard} Meijome He has the attention span of a lightning bolt. Robert Redford I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: hello, a question about solr.
On Wed, 20 Aug 2008 10:58:50 -0300 Alexander Ramos Jardim [EMAIL PROTECTED] wrote: A tiny but real explanation can be found here http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters thanks Alexander - indeed, quite short, and focused on shingles... which, if I understand correctly, are groups of terms of size n... the ngram tokenizer creates tokens of n characters from your input. Searching for ngram or n-gram in the archives should bring up more relevant information, which isn't in the wiki yet. B _ {Beto|Norberto|Numard} Meijome All that is necessary for the triumph of evil is that good men do nothing. Edmund Burke I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: hello, a question about solr.
On Mon, 18 Aug 2008 15:33:02 +0800 finy finy [EMAIL PROTECTED] wrote: the name field is text, which is analysed. i use the query name:ibmT63notebook why do you search with no spaces? is this free text entered by a user, or is it part of a link which you control? PS: please don't top-post _ {Beto|Norberto|Numard} Meijome Commitment is active, not passive. Commitment is doing whatever you can to bring about the desired result. Anything less is half-hearted. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
.wsdl for example....
hi :) does anyone have a .wsdl definition for the example bundled with SOLR? if nobody has it, would it be useful to have one ? cheers, B _ {Beto|Norberto|Numard} Meijome Intelligence: Finding an error in a Knuth text. Stupidity: Cashing that $2.56 check you got. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: hello, a question about solr.
On Mon, 18 Aug 2008 23:07:19 +0800 finy finy [EMAIL PROTECTED] wrote: because i use chinese character, for example ibm___ solr will parse it into a term ibm and a phrase _ __ can i use solr to query with a term ibm and a term _ and a term __? Hi finy, you should look into n-gram tokenizers. Not sure if it is documented in the wiki, but it has been discussed on the mailing list quite a few times. In short, an n-gram tokenizer breaks your input into blocks of characters of size n, which are then used for matching against the index. I think for Chinese, bi-gram is the favoured approach. good luck, B _ {Beto|Norberto|Numard} Meijome I used to hate weddings; all the Grandmas would poke me and say, You're next sonny! They stopped doing that when i started to do it to them at funerals. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
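To illustrate the idea (a plain-Python sketch, not Solr's actual tokenizer code), an n-gram tokenizer slides a window of n characters over the input:

```python
def ngrams(text, n=2):
    """Break text into overlapping blocks of n characters (n-grams)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# With bi-grams, a query term and an indexed term can match on shared
# 2-character blocks, with no word boundaries needed:
print(ngrams("solr", 2))  # ['so', 'ol', 'lr']
```

This is why bi-grams work for unsegmented text like Chinese: matching happens on character blocks rather than on whitespace-delimited words.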
Re: .wsdl for example....
On Mon, 18 Aug 2008 19:08:24 -0300 Alexander Ramos Jardim [EMAIL PROTECTED] wrote: Do you want a full web service for the SOLR example? How will a .wsdl help you? Why don't you use the HTTP interface SOLR provides? Anyway, if you need to develop a web service (SOAP compliant) to access SOLR, just remember to use an embedded core in your webservice. On Mon, 18 Aug 2008 15:37:24 -0400 Erik Hatcher [EMAIL PROTECTED] wrote: WSDL? surely you jest. Erik :D I obviously said something terribly stupid, oh well, not the first time and most likely won't be the last one either. Anyway, the reason for my asking is: - I've put together a SOLR search service with a few cores. Nothing fancy, it works great as is. - the .NET developer I am working with on this asked for a .wsdl (or .asmx) file to import into Visual Studio... yes, he can access the service directly, but he seems to prefer a more 'well defined' interface (haven't really decided whether it is worth the effort, but that is another question altogether). The way I see it, SOLR is a RESTful service. I am not looking into wrapping the whole thing behind SOAP (I actually much prefer REST to SOAP, but that is entering into quasi-religious grounds...) - which should be able to be defined with a .wsdl (v 1.1 should suffice as only GET + POST are supported in SOLR anyway). Am I missing anything here? thanks in advance for your time + thoughts, B _ {Beto|Norberto|Numard} Meijome He has no enemies, but is intensely disliked by his friends. Oscar Wilde I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: .wsdl for example....
On Tue, 19 Aug 2008 11:23:48 +1000 Norberto Meijome [EMAIL PROTECTED] wrote: [...] To be clear, I don't suggest we should have a .wsdl for the example, simply asking if there would be any use in having one. But given the responses I got, I'm curious now to understand what I have gotten wrong :) Best, B _ {Beto|Norberto|Numard} Meijome I sense much NT in you. NT leads to Bluescreen. Bluescreen leads to downtime. Downtime leads to suffering. NT is the path to the darkside. Powerful Unix is. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Clarification on facets
On Tue, 19 Aug 2008 10:18:12 +1200 Gene Campbell [EMAIL PROTECTED] wrote: Is this interpreted as meaning there are 10 documents that will match with 'car' in the title, and likewise 6 'boat' and 2 'bike'? Correct. If so, is there any way to get counts for the *number of times* a value is found in a document? I'm looking for a way to determine the number of times 'car' is repeated in the title, for example. Not sure - I would suggest that a field with a term repeated several times would receive a higher score when searching for that term, but I'm not sure how you could get the information you seek... maybe with the Luke handler? (but on a per-document basis... slow...?) B _ {Beto|Norberto|Numard} Meijome Computers are like air conditioners; they can't do their job properly if you open windows. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
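To make the distinction concrete, here is a plain-Python sketch (not Solr code, and the documents are made up): facet counts are document counts, while what Gene is asking for is the within-document term frequency:

```python
from collections import Counter

# Hypothetical title fields for two indexed documents.
docs = {
    1: "car boat car",
    2: "car bike",
}

# Facet-style count: how many documents contain the term at all.
doc_count = sum(1 for text in docs.values() if "car" in text.split())

# What the question asks for: occurrences of the term inside one document.
term_freq = Counter(docs[1].split())["car"]

print(doc_count, term_freq)  # 2 2
```

Faceting only ever reports the first number; the second lives in the index's term-frequency data, which is what the Luke handler exposes per document.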
[SOLVED...]Re: Problems using saxon for XSLT transforms
On Tue, 12 Aug 2008 23:36:32 +1000 Norberto Meijome [EMAIL PROTECTED] wrote: hi :) I'm trying to use SAXON instead of the default XSLT parser. I was pretty sure i had it running fine on 1.2, but when I repeated the same steps (as per the wiki) on latest nightly build, i cannot see any sign of it being loaded or use, although the classpath seems to be pointing to them (see below) [...] well, although no explicit information is present about whether it IS using saxon, it obviously dies when saxon isn't present- I moved lib/saxon* out of the way, and any transformation dies with : HTTP ERROR: 500 Provider net.sf.saxon.TransformerFactoryImpl not found javax.xml.transform.TransformerFactoryConfigurationError: Provider net.sf.saxon.TransformerFactoryImpl not found at javax.xml.transform.TransformerFactory.newInstance(TransformerFactory.java:108) at org.apache.solr.util.xslt.TransformerProvider.init(TransformerProvider.java:45) at org.apache.solr.util.xslt.TransformerProvider.clinit(TransformerProvider.java:43) at org.apache.solr.request.XSLTResponseWriter.getTransformer(XSLTResponseWriter.java:117) at org.apache.solr.request.XSLTResponseWriter.getContentType(XSLTResponseWriter.java:65) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:250) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1088) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:206) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:324) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:829) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:450) RequestURI=/solr/tracks/select/ I guess not as clear as what I'd had hoped for, but should do for now :) cheers, B _ {Beto|Norberto|Numard} Meijome Computers are like air conditioners; they can't do their job properly if you open windows. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
DataImportHandler : more forgiving initialisation possible?
hi guys, First of all, thanks for DIH - it's great :) One thing I noticed during my tests (nightly, 2008-08-16) is that, if the DB is not available during SOLR startup time, the whole core won't initialise - the error is shown below. I was wondering, 1) would it be possible to have DIH bomb out in this situation, but not bring down the whole core? I think it would be desirable, with a big warning, possibly... thoughts? 2) How hard would it be to handle this more gracefully - for example, in case of error, leave the handler in a non-init state, and when it is accessed, repeat the whole init process (and bomb out if it fails again, of course)... Thanks for your time on this email + DIH + all other features :) B [...] Aug 17, 2008 11:25:48 PM org.apache.solr.handler.dataimport.DataImportHandler processConfiguration INFO: Processing configuration from solrconfig.xml: {config=data-config.xml} Aug 17, 2008 11:25:48 PM org.apache.solr.handler.dataimport.DataImporter loadDataConfig INFO: Data Configuration loaded successfully Aug 17, 2008 11:25:48 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity an_artist with URL: jdbc:sqlserver://a.b.c.d:1433;databaseName=DBNAME;user=usrname;password=magicpassword;responseBuffering=adaptive; Aug 17, 2008 11:25:48 PM org.apache.solr.handler.dataimport.DataImportHandler inform SEVERE: Exception while loading DataImporter org.apache.solr.handler.dataimport.DataImportHandlerException: Failed to initialize DataSource: null Processing Documemt # at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:306) at org.apache.solr.handler.dataimport.DataImporter.addDataSource(DataImporter.java:273) at org.apache.solr.handler.dataimport.DataImporter.initEntity(DataImporter.java:228) at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:98) at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:294) at org.apache.solr.core.SolrCore.init(SolrCore.java:473) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:295) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:107) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39) at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:593) at org.mortbay.jetty.servlet.Context.startContext(Context.java:140) at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1220) at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:513) at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39) at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130) at org.mortbay.jetty.Server.doStart(Server.java:222) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39) at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:977) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at 
java.lang.reflect.Method.invoke(Method.java:597) at org.mortbay.start.Main.invokeMain(Main.java:183) at org.mortbay.start.Main.start(Main.java:497) at org.mortbay.start.Main.main(Main.java:115) Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to create database connection Processing Documemt # at org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:67) at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:303) ... 34 more Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host has failed. java.net.ConnectException: Connection refused at
DIH - calling spellchecker rebuild...
Guys + gals, just a question of form - would DIH itself be the right place to implement a URL to call after successfully completing a DIH full or partial load - for example, to rebuild the spellchecker when new items have been added? Or should that be part of my external process (cron - shell script, for example) that calls DIH in the first place? cheers B _ {Beto|Norberto|Numard} Meijome If you find a solution and become attached to it, the solution may become your next problem. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: DIH - calling spellchecker rebuild...
On Sun, 17 Aug 2008 20:22:26 +0530 Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: If it is only SpellCheckComponent that you are interested in, then see SOLR-622. You can add this to your SCC config to rebuild SCC after every commit: <str name="buildOnCommit">true</str> ah great stuff, thanks Shalin. B _ {Beto|Norberto|Numard} Meijome Truth has no special time of its own. Its hour is now -- always. Albert Schweitzer I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
DIH - commit / optimize
Hi again, I see in the DIH wiki page: [...] full-import [..] commit: (default 'true'). Tells whether to commit+optimize after the operation [...] but nothing for delta-import... I think it would be useful to have a 'commit' (default=true) and an 'optimize' (default=false) option for delta-import - these should most probably be separate ones, I think. - for full-import, wouldn't it make sense to split commit + optimize into 2 different options? Granted, if I do a clean=true, I'd probably want (need!) an optimize... even then, optimize may be too slow / use too much memory at that point in time...? (not too sure about this argument..) cheers, B _ {Beto|Norberto|Numard} Meijome Never take Life too seriously, no one gets out alive anyway. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: DIH - commit / optimize
On Mon, 18 Aug 2008 10:14:32 +0800 finy finy [EMAIL PROTECTED] wrote: i use solr for 3 months, and i find some question follow: Please do not hijack mail threads. http://en.wikipedia.org/wiki/Thread_hijacking _ {Beto|Norberto|Numard} Meijome Ask not what's inside your head, but what your head's inside of. J. J. Gibson I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: DIH - commit / optimize
On Mon, 18 Aug 2008 09:34:56 +0530 Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Actually we have commit and optimize as separate request parameters, defaulting to true, for both full-import and delta-import. You can add a request parameter optimize=false for delta-import if you want to commit but not optimize the index. ah, now it makes perfect sense :) sorry, I should have checked the src myself. thanks so much again :) B _ {Beto|Norberto|Numard} Meijome What you are afraid to do is a clear indicator of the next thing you need to do. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
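So a delta-import that commits but skips the optimize would be requested with a URL along these lines. A small sketch that builds it (host, port and core name are examples, not from the thread):

```python
from urllib.parse import urlencode

# commit/optimize are the DIH request parameters discussed above;
# localhost:8983 and core0 are placeholder values.
params = {"command": "delta-import", "commit": "true", "optimize": "false"}
url = "http://localhost:8983/solr/core0/dataimport?" + urlencode(params)
print(url)
# http://localhost:8983/solr/core0/dataimport?command=delta-import&commit=true&optimize=false
```

The same pattern works for full-import: pass commit=false and/or optimize=false to override the defaults.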
Re: Best way to index without diacritics
(2 in 1 reply) On Wed, 13 Aug 2008 09:59:21 -0700 Walter Underwood [EMAIL PROTECTED] wrote: Stripping accents doesn't quite work. The correct translation is language-dependent. In German, o-dieresis should turn into oe, but in English, it should be o (as in coöperate or Mötley Crüe). In Swedish, it should not be converted at all. Hi Walter, understood. This goes back to the question of language-specific field definitions / parsers... more on this below. There are other character-to-string conversions: ae-ligature to ae, ß to ss, and so on. Luckily, those are independent of language. wunder On 8/13/08 9:16 AM, Steven A Rowe [EMAIL PROTECTED] wrote: Hi Norberto, https://issues.apache.org/jira/browse/LUCENE-1343 hi Steve, thanks for the pointer. this is a Lucene entry... I thought the Latin-filter was a SOLR feature? I, for one, definitely meant a SOLR filter. Given what Walter rightly pointed out about differences in language, I suspect it would be a SOLR-level thing - <fieldType name="textDE" language="DE"> would apply the filter of unicode chars to {ascii?} with the appropriate mapping for German, etc. Or is this something that Lucene would / should take care of? B _ {Beto|Norberto|Numard} Meijome I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause. Dostoevsky I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
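As a sketch of the language-independent approach under discussion (not Solr's filter, just the underlying idea), Unicode decomposition can strip combining marks - and it exhibits exactly the limits Walter describes: ö becomes o rather than the German oe, and ß is untouched because it has no decomposition, so it would need an explicit character-to-string mapping:

```python
import unicodedata

def fold_accents(text):
    """Naive, language-unaware folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(fold_accents("Räikkönen"))  # Raikkonen
print(fold_accents("Straße"))     # Straße -- ß survives; needs its own mapping
```

A truly correct filter would have to layer per-language mapping tables (oe for German ö, no conversion for Swedish) on top of this kind of fallback.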
Re: Best way to index without diacritics
On Thu, 14 Aug 2008 11:34:47 -0400 Steven A Rowe [EMAIL PROTECTED] wrote: [...] The kind of filter Walter is talking about - a generalized language-aware character normalization Solr/Lucene filter - does not yet exist. My guess is that if/when it does materialize, both the Solr and the Lucene projects will want to have it. Historically, most functionality shared by Solr and Lucene is eventually hosted by Lucene, since Solr has a Lucene dependency, but not vice-versa. So, yes, Solr would be responsible for hosting configuration for such a filter, but the responsibility for doing something with the configuration would be Lucene's responsibility, assuming that Lucene would (eventually) host the filter and Solr would host a factory over the filter. Steve thanks for the thorough explanation ,Steve . B _ {Beto|Norberto|Numard} Meijome Throughout the centuries there were [people] who took first steps down new paths armed only with their own vision. Ayn Rand I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Searching Questions
On Tue, 12 Aug 2008 13:26:26 -0700 Jake Conk [EMAIL PROTECTED] wrote: 1) I want to search only within a specific field, for instance `category`. Is there a way to do this? of course. Please see http://wiki.apache.org/solr/SolrQuerySyntax (in particular, follow the link to Lucene syntax..) 2) When searching for multiple results, are the following identical since *_facet and *_facet_mv both have their types set to string? /select?q=tag_facet:%22John+McCain%22+OR+tag_facet:%22Barack+Obama%22 /select?q=tag_facet_mv:%22John+McCain%22+OR+tag_facet_mv:%22Barack+Obama%22 Erik H. already answered this question in another of your emails. Check your mailbox or the list archives. 3) If I'm searching for something that is in a text field but I specify it as a facet string rather than a text type, would it still search within text fields or would it just limit the search to string fields? I am not sure what you mean by 'a facet string'. You facet on fields; SOLR automatically creates facets on those fields based on the results of your query. 4) Is there a page that will show me different querying combinations, or can someone post some more examples? Have you checked the wiki? Which page do you suggest needs more examples? 5) Anyone else notice that returning the data in php (wt=phps) doesn't unserialize? I am using PHP 5.3 w/ a nightly copy of Solr from last week. sorry, I haven't used PHP + SOLR cheers, B _ {Beto|Norberto|Numard} Meijome All that is necessary for the triumph of evil is that good men do nothing. Edmund Burke I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
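For point 1, restricting a query to one field is just Lucene's field:term syntax, as the tag_facet examples in the question already show. A small sketch building such a request URL (the field name, value, host and port here are placeholders, not from the thread):

```python
from urllib.parse import urlencode

# category:"mountain bikes" restricts the match to the category field;
# urlencode handles the :, quote and space escaping seen in the examples.
url = "http://localhost:8983/solr/select?" + urlencode(
    {"q": 'category:"mountain bikes"'}
)
print(url)  # http://localhost:8983/solr/select?q=category%3A%22mountain+bikes%22
```

Unqualified terms fall back to the schema's defaultSearchField, which is why field prefixes matter when you only want one field searched.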
Problems using saxon for XSLT transforms
hi :) I'm trying to use Saxon instead of the default XSLT processor. I was pretty sure I had it running fine on 1.2, but when I repeated the same steps (as per the wiki) on the latest nightly build, I cannot see any sign of it being loaded or used, although the classpath seems to be pointing to the Saxon jars (see below). In my logs, I see: INFO: created xslt: org.apache.solr.request.XSLTResponseWriter Aug 12, 2008 11:20:07 PM org.apache.solr.request.XSLTResponseWriter init INFO: xsltCacheLifetimeSeconds=5 which is the RH itself, then, on a hit that triggers the transform: Aug 12, 2008 11:21:25 PM org.apache.solr.util.xslt.TransformerProvider init WARNING: The TransformerProvider's simplistic XSLT caching mechanism is not appropriate for high load scenarios, unless a single XSLT transform is used and xsltCacheLifetimeSeconds is set to a sufficiently high value. This is where I would expect to see Saxon... right? I'm running SOLR 1.3, nightly from 2008-08-11, under FreeBSD 7 (stable), JDK 1.6. I have 4 cores defined in this test environment. I start my service with: java -Xms64m -Xmx1024m -server -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl -jar start.jar The /admin/get-properties.jsp shows [...] javax.xml.transform.TransformerFactory = net.sf.saxon.TransformerFactoryImpl java.specification.version = 1.6 [...] 
java.class.path = /solrhome:/solrhome/lib/saxon9-s9api.jar:/solrhome/lib/jetty-6.1.11.jar:/solrhome/lib/saxon9-jdom.jar:/solrhome/lib/saxon9-sql.jar:/solrhome/lib/servlet-api-2.5-6.1.11.jar:/solrhome/lib/saxon9-xqj.jar:/solrhome/lib/saxon9.jar:/solrhome/lib/jetty-util-6.1.11.jar:/solrhome/lib/saxon9-xom.jar:/solrhome/lib/saxon9-dom4j.jar:/solrhome/lib/saxon9-xpath.jar:/solrhome/lib/saxon9-dom.jar:/solrhome/lib/jsp-2.1/core-3.1.1.jar:/solrhome/lib/jsp-2.1/ant-1.6.5.jar:/solrhome/lib/jsp-2.1/jsp-2.1.jar:/solrhome/lib/jsp-2.1/jsp-api-2.1.jar:/solrhome/lib/management/jetty-management-6.1.11.jar:/solrhome/lib/naming/jetty-naming-6.1.11.jar:/solrhome/lib/naming/activation-1.1.jar:/solrhome/lib/naming/mail-1.4.jar:/solrhome/lib/plus/jetty-plus-6.1.11.jar:/solrhome/lib/xbean/jetty-xbean-6.1.11.jar:/solrhome/lib/annotations/geronimo-annotation_1.0_spec-1.0.jar:/solrhome/lib/annotations/jetty-annotations-6.1.11.jar:/solrhome/lib/ext/jetty-java5-threadpool-6.1.11.jar:/solrhome/lib/ext/jetty-sslengine-6.1.11.jar:/solrhome/lib/ext/jetty-servlet-tester-6.1.11.jar:/solrhome/lib/ext/jetty-ajp-6.1.11.jar:/solrhome/lib/ext/jetty-setuid-6.1.11.jar:/solrhome/lib/ext/jetty-client-6.1.11.jar:/solrhome/lib/ext/jetty-html-6.1.11.jar [...] Any pointers to where I should check to confirm Saxon is being used, or how to address the problem, will be greatly appreciated. TIA, B _ {Beto|Norberto|Numard} Meijome Nature doesn't care how smart you are. You can still be wrong. Richard Feynman
Re: adds / delete within same 'transaction'..
On Tue, 12 Aug 2008 11:21:50 -0700 Mike Klaas [EMAIL PROTECTED] wrote: will delete happen first, and then the add, or could it be that the add happens before delete, in which case i end up with no more doc id=1 ? As long as you are sending these requests on the same thread, they will occur in order. -Mike right, that is GREAT to know then :) cheers, b _ {Beto|Norberto|Numard} Meijome Life is not measured by the number of breaths we take, but by the moments that take our breath away.
Re: adds / delete within same 'transaction'..
On Tue, 12 Aug 2008 20:53:12 -0400 Yonik Seeley [EMAIL PROTECTED] wrote: On Tue, Aug 12, 2008 at 1:48 AM, Norberto Meijome [EMAIL PROTECTED] wrote: What happens if I issue: <delete><id>1</id></delete> <add><doc><id>1</id><name>new</name></doc></add> <commit/> will delete happen first, and then the add, or could it be that the add happens before delete Doesn't matter... it's an implementation detail. Solr used to buffer deletes, and if it crashed at the right time one could get duplicates. Now, Lucene does the buffering of deletes (internally Lucene does the adds first and buffers the deletes until a segment flush) and it should be impossible to see more than one doc 1, or no doc 1 at all. Thanks Yonik. I wasn't asking about the specific details, but about the consequence. I seem to remember (incorrectly, or v1.2 only maybe) that if one wanted assurances that the case above happened in the right order, one had to commit after the deletes, and once more after the adds. This not being the case, I am happy :) Thanks again, B _ {Beto|Norberto|Numard} Meijome He has Van Gogh's ear for music. Billy Wilder
Re: Best way to index without diacritics
On Tue, 12 Aug 2008 11:44:42 -0400 Steven A Rowe [EMAIL PROTECTED] wrote: Solr is Unicode aware. The ISOLatin1AccentFilterFactory handles diacritics for the ISO Latin-1 section of the Unicode character set. UTF (do you mean UTF-8?) is a (set of) Unicode serialization(s), and once Solr has deserialized it, it is just Unicode characters (Java's in-memory UTF-16 representation). So as long as you're only concerned about removing diacritics from the set of Unicode characters that overlaps ISO Latin-1, and not about other Unicode characters, then ISOLatin1AccentFilterFactory should work for you. Hi, do you know if anyone has implemented a similar filter using ICU, mapping (a lot more of) Unicode to ASCII? B _ {Beto|Norberto|Numard} Meijome He has the attention span of a lightning bolt. Robert Redford
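[Editorial aside: for what it's worth, here is a rough stdlib-Python sketch of the idea being asked about - NFKD decomposition followed by dropping combining marks. This is my illustration, not the ICU-based filter in question; a real ICU transliterator covers many more cases (ligatures, non-Latin scripts, etc.).]

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining marks.

    Covers far more of Unicode than ISOLatin1AccentFilter, though an
    actual ICU transliterator goes further still.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(strip_accents("café résumé"))  # -> cafe resume
```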
Re: Still no results after removing from stopwords
On Sun, 10 Aug 2008 19:58:24 -0700 (PDT) SoupErman [EMAIL PROTECTED] wrote: I needed to run a search with a query containing the word not, so I removed not from the stopwords.txt file. Which seemed to work, at least as far as parsing the query. It was now successfully searching for that keyword, as noted in the query debugger. However it isn't returning any results where not is in the query, which suggests not hasn't been indexed. However looking at the listing for a particular item, not is listed as one of the keywords, so it should be finding it? Hi Michael, did you reindex your documents after 1) changing your settings and 2) restarting SOLR (to allow your settings to come into effect)? B _ {Beto|Norberto|Numard} Meijome Real Programmers don't comment their code. If it was hard to write, it should be hard to understand and even harder to modify.
Re: unique key
On Wed, 6 Aug 2008 12:25:34 +1000 Norberto Meijome [EMAIL PROTECTED] wrote: On Tue, 5 Aug 2008 14:41:08 -0300 Scott Swan [EMAIL PROTECTED] wrote: I currently have multiple documents that i would like to index but i would like to combine two fields to produce the unique key. the documents either have 1 or the other fields so by combining the two fields i will get a unique result. is this possible in the solr schema? Hi Scott, you can't do that by the schema - you need to do it when you generate your document, before posting it to SOLR. Hi again, after reading the DataImportHandler documentation, you could do this too with specific configuration in DIH itself. Of course, you have to be using DIH to load data into your SOLR ;) B _ {Beto|Norberto|Numard} Meijome Intellectual: 'Someone who has been educated beyond his/her intelligence' Arthur C. Clarke, from 3001, The Final Odyssey, Sources.
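[Editorial aside: a minimal sketch of the client-side approach suggested above - derive the unique key from whichever of the two candidate fields a document carries, before posting it to Solr. The field names 'isbn' and 'internal_id' are invented for illustration; substitute the two fields your documents actually have.]

```python
def make_unique_key(doc: dict) -> str:
    """Pick whichever of the two candidate fields is present as the key."""
    key = doc.get("isbn") or doc.get("internal_id")  # hypothetical field names
    if key is None:
        raise ValueError("document has neither candidate key field")
    return str(key)

# Each document has one field or the other, so the derived key is unique.
print(make_unique_key({"isbn": "978-0"}))     # -> 978-0
print(make_unique_key({"internal_id": 42}))   # -> 42
```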
Re: Can't Delete Record
On Mon, 11 Aug 2008 06:48:05 -0700 (PDT) Vj Ali [EMAIL PROTECTED] wrote: i also sends coomit tag as well. maybe you need <commit/> instead of <coomit/>? _ {Beto|Norberto|Numard} Meijome With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead. [RFC1925 - section 2, subsection 3]
adds / delete within same 'transaction'..
Hello :) I *think* I know the answer, but I'd like to confirm. Say I have <doc><id>1</id><name>old</name></doc> already indexed and committed (ie, 'live'). What happens if I issue: <delete><id>1</id></delete> <add><doc><id>1</id><name>new</name></doc></add> <commit/> will the delete happen first, and then the add, or could it be that the add happens before the delete, in which case I end up with no more doc id=1? thanks!! B _ {Beto|Norberto|Numard} Meijome Anyone who isn't confused here doesn't really understand what's going on.
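[Editorial aside: a sketch - mine, not from the thread - of building those delete/add update messages as well-formed XML with Python's standard library. Note that real Solr update XML wraps field values in <field name="..."> elements; the bare <id>/<name> tags in the question are shorthand.]

```python
import xml.etree.ElementTree as ET

def delete_msg(doc_id: str) -> str:
    """Build a <delete><id>...</id></delete> message."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")

def add_msg(doc_id: str, name: str) -> str:
    """Build an <add><doc>...</doc></add> message with two fields."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for fname, value in (("id", doc_id), ("name", name)):
        ET.SubElement(doc, "field", name=fname).text = value
    return ET.tostring(add, encoding="unicode")

print(delete_msg("1"))
print(add_msg("1", "new"))
```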
Re: case preserving for data but not for indexing
On Wed, 6 Aug 2008 21:35:47 -0700 (PDT) Otis Gospodnetic [EMAIL PROTECTED] wrote: <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.StandardTokenizerFactory"/> 2 Tokenizers? I wondered about that too, but didn't have the time to test... B _ {Beto|Norberto|Numard} Meijome Always listen to experts. They'll tell you what can't be done, and why. Then do it. Robert A. Heinlein
Re: HTML Standard Strip filter word boundary bug
On Thu, 7 Aug 2008 00:50:59 -0700 (PDT) matt connolly [EMAIL PROTECTED] wrote: Where do I file a bug report? https://issues.apache.org/jira thanks! B _ {Beto|Norberto|Numard} Meijome Contrary to popular belief, Unix is user friendly. It just happens to be very selective about who it decides to make friends with.
Re: Solr Logo thought
On Tue, 05 Aug 2008 16:02:51 -0400 Stephen Weiss [EMAIL PROTECTED] wrote: My issue with the logos presented was they made solr look like a school project instead of the powerful tool that it is. The tricked out font or whatever just usually doesn't play well with the business types... they want serious-looking software. First impressions are everything. While the fiery colors are appropriate for something named Solr, you can play with that without getting silly - take a look at: Couldn't agree more. The current logo needs improvement, but I think it can be done much better... in particular, thinking of small icons, print, etc... http://www.ascsolar.com/images/asc_solar_splash_logo.gif http://www.logostick.com/images/EOS_InvestmentingLogo_lg.gif (Luckily there are many businesses that do solar energy!) They have the same elements but with a certain simplicity and elegance. I know probably some people don't care if it makes the boss or client happy, but, these are the kinds of seemingly insignificant things that Indeed - the way I see it, if you don't care either way, then you should be happy to have a professional looking one :P B _ {Beto|Norberto|Numard} Meijome Caminante no hay camino, se hace camino al andar ("Traveller, there is no path; the path is made by walking") Antonio Machado
Re: Diagnostic tools
On Tue, 5 Aug 2008 11:43:44 -0500 Kashyap, Raghu [EMAIL PROTECTED] wrote: Hi, Hi Kashyap, please don't hijack topic threads. http://en.wikipedia.org/wiki/Thread_hijacking thanks!! B _ {Beto|Norberto|Numard} Meijome Software QA is like cleaning my cat's litter box: Sift out the big chunks. Stir in the rest. Hope it doesn't stink.
Re: unique key
On Tue, 5 Aug 2008 14:41:08 -0300 Scott Swan [EMAIL PROTECTED] wrote: I currently have multiple documents that i would like to index but i would like to combine two fields to produce the unique key. the documents either have 1 or the other fields so by combining the two fields i will get a unique result. is this possible in the solr schema? Hi Scott, you can't do that by the schema - you need to do it when you generate your document, before posting it to SOLR. btw, please don't hijack topic threads. http://en.wikipedia.org/wiki/Thread_hijacking thanks!! B _ {Beto|Norberto|Numard} Meijome Law of Conservation of Perversity: we can't make something simpler without making something else more complex
Re: Sum of one field
On Tue, 05 Aug 2008 18:58:42 -0300 Leonardo Dias [EMAIL PROTECTED] wrote: So I'm looking for a Ferrari. CarStore says that there are 5 ads for Ferrari, but one ad has 2 Ferraris being sold, the other ad has 3 Ferraris and all the others have 1 Ferrari each, meaning that there are 5 ads and 8 Ferraris. And yes, I'm doing an example with Fibonacci numbers. ;) why not create one separate document per car? It'll make it easier (for the client) to manage too when one of the cars is sold but not the other 4 B _ {Beto|Norberto|Numard} Meijome With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead. [RFC1925 - section 2, subsection 3]
Re: Solr Logo thought
On Mon, 4 Aug 2008 09:29:30 -0700 Ryan McKinley [EMAIL PROTECTED] wrote: If there is still room for new logo design for Solr and the community is open to it then I can try to come up with some proposal. Doing the logo for Mahout was a really interesting experience. In my opinion, yes I'd love to see more effort put towards the logo. I have stayed out of this discussion since I don't really think any of the logos under consideration are complete. (I begged some friends to do two of the three logos under consideration) I would love to refine them, but time... oooh time. +1 If we are going to change what we have, I'd love to see some more options, or better quality - no offence meant, but those logos aren't really a huge improvement or departure from the current one. I think whatever we change to, we'll be wanting to use it for a long time. B _ {Beto|Norberto|Numard} Meijome If you find a solution and become attached to it, the solution may become your next problem.
Re: solr 1.3 ??
On Mon, 4 Aug 2008 21:13:09 -0700 (PDT) Vicky_Dev [EMAIL PROTECTED] wrote: Can we get the solr 1.3 release as soon as possible? Otherwise some interim release (1.2.x) containing DataImportHandler will also be a good option. Any Thoughts? Have you tried one of the nightly builds? I've been following it every so often... sometimes there is a problem, but hardly ever... you can find a build you are comfortable with, and it'll be far closer to the actual 1.3 when released than 1.2 is. B _ {Beto|Norberto|Numard} Meijome Quantum Logic Chicken: The chicken is distributed probabalistically on all sides of the road until you observe it on the side of your course.
Re: performance implications on using lots of values in fq
On Wed, 23 Jul 2008 11:28:49 -0700 (PDT) briand [EMAIL PROTECTED] wrote: I have documents in SOLR such that each document contains one to many points (latitude and longitudes). Currently we store the multiple points for a given document in the db and query the db to find all of the document ids around a given point first. Once we have the list of ids, we populate the fq with those ids and the q value and send that off to SOLR to do a search. In the longest query to SOLR we're populating about 450 ids into the fq parameter at this time. I was wondering if anyone knows the performance implications of passing so many ids into the fq and when it would potentially be a problem for SOLR? Currently the query passing in 450 ids is not a problem at all and returns in less than a second. Thanks. Hey Brian, sorry, I can't answer your question, but I wonder if you have tried PostgreSQL + PostGIS extensions, and what your experience has been compared to Lucene/SOLR. thanks :) b _ {Beto|Norberto|Numard} Meijome Computers are like air conditioners; they can't do their job properly if you open windows.
Re: Duplicate content
On Tue, 15 Jul 2008 13:15:41 +0530 Sunil [EMAIL PROTECTED] wrote: 1) I don't want duplicate content. SOLR uses the field you define as the unique key to determine whether a document should be replaced or added. The rest of the fields are in your hands. You could devise a setup whereby the document id is generated by hashing all the other fields in your schema, thereby ensuring that a unique document id means unique content (of course, for a meaning of 'unique' that amounts to 'different bytes' ;) ) 2) I don't want to overwrite old content with new one. Means, if I add duplicate content in solr and the content already exists, the old content should not be overwritten. Before inserting a new document, query the index - if you get a result back, then don't insert. I don't know of any other way. b _ {Beto|Norberto|Numard} Meijome The real voyage of discovery consists not in seeking new landscapes, but in having new eyes. Marcel Proust
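[Editorial aside: a sketch - mine, not from the thread - of the hashing idea suggested above. Fields are folded into the digest in sorted order so the derived id does not depend on field ordering; identical content therefore always yields the same document id and replaces itself instead of duplicating.]

```python
import hashlib

def content_id(doc: dict) -> str:
    """Derive the unique key by hashing every other field of the document."""
    digest = hashlib.sha1()
    for field in sorted(doc):  # stable order regardless of dict ordering
        digest.update(f"{field}={doc[field]}\x00".encode("utf-8"))
    return digest.hexdigest()

a = content_id({"title": "Ferrari", "body": "fast"})
b = content_id({"body": "fast", "title": "Ferrari"})
print(a == b)  # same content -> same id -> no duplicates in the index
```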
Re: Duplicate content
On Tue, 15 Jul 2008 10:48:14 +0200 Jarek Zgoda [EMAIL PROTECTED] wrote: 2) I don't want to overwrite old content with new one. Means, if I add duplicate content in solr and the content already exists, the old content should not be overwritten. before inserting a new document, query the index - if you get a result back, then don't insert. I don't know of any other way. This operation is not atomic, so you get a race condition here. Other than that, it seems fine. ;) of course - but I am not sure you can control atomicity at the SOLR level (yet? ;) ) for the /update handler - so it'd have to either be a custom handler, or your app being the only one accessing and controlling write access to it that way. It definitely gets more interesting if you start adding shards ;) _ {Beto|Norberto|Numard} Meijome All parts should go together without forcing. You must remember that the parts you are reassembling were disassembled by you. Therefore, if you can't get them together again, there must be a reason. By all means, do not use hammer. IBM maintenance manual, 1975
Re: Filter by Type increases search results.
On Tue, 15 Jul 2008 18:07:43 +0530 Preetam Rao [EMAIL PROTECTED] wrote: When I say filter, I meant q=fish&fq=type:idea btw, this *seems* to only work for me with the standard search handler. dismax and fq don't seem to get along nicely... but maybe it is just late and I'm not testing it properly.. _ {Beto|Norberto|Numard} Meijome Mix a little foolishness with your serious plans; it's lovely to be silly at the right moment. Horace
Re: Wiki for 1.3
On Mon, 14 Jul 2008 15:52:35 + sundar shankar [EMAIL PROTECTED] wrote: Hi Hoss, I was talking about classes like EdgeNGramFilterFactory, PatternReplaceFilterFactory etc. I didn't find these in the 1.2 jar. Where do I find the wiki for these and the specific classes introduced for 1.3? Sundar, as explained in my email on 12 July, the Wiki covers all classes. The ones that are 1.3 specific will say so at the top of the page. If you want to know what classes were introduced in 1.3, why not check out both trees and compare? b _ {Beto|Norberto|Numard} Meijome Which is worse: ignorance or apathy? Don't know. Don't care.