Re: Replication and querying
Hi,

it would be possible to add that to the main Solr, but here is the problem. Let's take an example: we have about 1.5 million documents in the Solr master. These documents are books, with fields like title, IDs, numbers, authors and more. This Solr is global. Now: the slave Solr is for a local library which has all these books, but wants to sort them in another way and wants to add its own fields for sorting and output (these fields don't need to be indexed or searched). So we try to replicate the whole database but have a slightly different schema.xml in the slaves.

Secondly, for another project we need to know if it is possible to change data on insert/on update, so that the replicated data gets edited before it is really inserted. Is there some kind of hook? As an example, take the book case from above: on replication the slave gets an updated document set, but before it is updated in the slave's DB we would like to add fields which come from another database, or replace strings in some fields, and such things. Is that possible?

Thanks for any answers.

On 09.02.2010 at 16:53, Jan Høydahl / Cominvent wrote:

Hi,

Index replication in Solr makes an exact copy of the original index. Is it not possible to add the 6 extra fields to both instances? An alternative to replication is to feed two independent Solr instances - full control :)

Please elaborate on your specific use case if this is not a useful answer to you.

--
Jan Høydahl - search architect
Cominvent AS - www.cominvent.com

On 9. feb. 2010, at 13.21, Julian Hille wrote:

Hi,

I'd like to know if it is possible to have a Solr server with a schema and, let's say, 10 fields indexed. I now want to replicate this whole index to another Solr server which has a slightly different schema: there are 6 additional fields, and these fields change the sort order for a product whose base is our Solr database. Is this kind of replication possible?

Is there another way to interact with data in Solr? We'd like to calculate some fields when they are added. I can't seem to find good documentation about the possible calls in the query itself, nor documentation about queries/calculations which should be done on update.

so far,
Julian Hille

Kind regards,
Julian Hille

---
NetImpact KG
Altonaer Straße 8
20357 Hamburg
Tel: 040 / 6738363 2
Mail: jul...@netimpact.de
Managing Director: Tarek Müller
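For background on the "hook" question: Solr's index replication copies the master's segment files verbatim, so there is no per-document hook on the slave during replication. A hook does exist for documents that arrive through the normal update handlers: an UpdateRequestProcessor, wired in via an updateRequestProcessorChain in solrconfig.xml. A minimal sketch, assuming Solr 1.4-era packages and made-up field names and helper:

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

// Enriches/rewrites each incoming document before it is indexed.
public class LocalFieldsProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // replace strings in an existing field...
        Object title = doc.getFieldValue("title");
        if (title != null) {
          doc.setField("title", title.toString().replace("foo", "bar"));
        }
        // ...or add a field looked up in the library's own database
        doc.addField("localSortKey", lookupLocalSortKey(doc.getFieldValue("id")));
        super.processAdd(cmd); // continue down the chain
      }
    };
  }

  // hypothetical helper standing in for the external-database lookup
  private String lookupLocalSortKey(Object id) {
    return String.valueOf(id);
  }
}

So for the use case described above, feeding the slave through its own update chain (rather than via file replication) would be the way to get this kind of rewriting.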
Re: after flush: fdx size mismatch on query during writes
Yes, more details would be great... Is this easily repeated?

The exists?=false is particularly spooky. It means, somehow, a new segment was being flushed, containing 1285 docs, but then after closing the doc stores, the stored fields index file (_X.fdx) had been deleted.

Can you turn on IndexWriter.setInfoStream, get this error to happen again, and then post the output? Thanks.

Mike

On Wed, Feb 10, 2010 at 12:59 AM, Lance Norskog goks...@gmail.com wrote:

We need more information. How big is the index in disk space? How many documents? How many fields? What's the schema? What OS? What Java version? Do you run this on a local hard disk or is it over an NFS mount? Does this software commit before shutting down? If you run with asserts on, do you get errors before this happens? Use -ea:org.apache.lucene... as a JVM argument.

On Tue, Feb 9, 2010 at 5:08 PM, Acadaca ph...@acadaca.com wrote:

We are using Solr 1.4 in a multi-core setup with replication. Whenever we write to the master we get the following exception:

java.lang.RuntimeException: after flush: fdx size mismatch: 1285 docs vs 0 length in bytes of _gqg.fdx file exists?=false
        at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:97)
        at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:50)

Has anyone had any success debugging this one? thx.

--
Lance Norskog
goks...@gmail.com
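In case it helps, the Solr 1.4 example solrconfig.xml appears to expose Mike's suggestion as a config option (hedged; check the indexDefaults section of the config in use):

<indexDefaults>
  ...
  <infoStream file="INFOSTREAM.txt">true</infoStream>
</indexDefaults>

which writes the low-level IndexWriter logging to the named file so it can be posted back to the list.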
Solr-JMX/Jetty agentId
Hi,

I am (still) trying to get JMX to work. I have finally managed to get a Jetty installation running with the right parameters to enable JMX. Now the next problem appeared: I need to get Solr to register its MBeans with the Jetty MBeanServer. Using

<jmx serviceUrl="service:jmx:rmi:///jndi/rmi:///jettymbeanserver" />

Solr doesn't complain on loading, but the MBeans simply don't show up in JConsole, so I would like to use

<jmx agentId="agentId" />

But where do I get the agentId? And what exactly does this Id represent? Does it change every time I restart Jetty?

Thanks in advance!
Jan-Simon Winkelmann
spellcheck
Hello, all!

I have a problem with spellcheck. I downloaded, built and connected a dictionary (~500,000 words), and it works fine. But I get suggestions for every word (even correct words). Is it possible to get suggestions only for wrong words?
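For reference, a hedged example of the knobs involved (assuming the Solr 1.4 SpellCheckComponent): with spellcheck.onlyMorePopular=true the component suggests "more popular" alternatives even for words that are in the dictionary, so turning that off and asking for extended results lets the client act only on words flagged as misspelled:

http://localhost:8983/solr/select?q=word&spellcheck=true&spellcheck.onlyMorePopular=false&spellcheck.extendedResults=true

With extendedResults, the response includes a correctlySpelled flag and per-word frequencies, so suggestions for correctly spelled words can simply be ignored.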
How to not limit maximum number of documents?
Hi all,

I'm working with Solr 1.4 and came across the point that Solr limits the number of documents retrieved in a response. This number can be changed with the common query parameter 'rows'. In my scenario it is very important that the response contains ALL documents in the index! I played around with the 'rows' parameter but couldn't find a way to do it, and I was not able to find any hint in the mailing list.

Thanks a lot in advance.

Cheers,
Egon
Re: Solr-JMX/Jetty agentId
2010/2/10 Jan Simon Winkelmann winkelm...@newsfactory.de:
"I need to get Solr to register its MBeans with the Jetty MBeanServer. But where do I get the agentId?"

I just have <jmx /> in solrconfig.xml. On the command line I start Solr with this:

$ java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port= -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -jar start.jar

In jconsole I can browse the Solr beans just fine.

/Tim
RE: analysing wild carded terms
hello *, quick question: what would I have to change in the query parser to allow wildcarded terms to go through text analysis?

I believe it is illogical. Wildcarded terms will go through the terms enumerator.
Getting max/min dates from solr index
How can we get the max and min date from the Solr index? I would need these dates to draw a graph (for example a timeline graph).

Also, can we use date faceting to show how many documents are indexed every month? Consider that I need to draw a timeline graph for the current year showing how many records are indexed each month, with months on the X axis and number of documents on the Y axis. What would be a good approach to designing a schema to achieve this functionality?

Any suggestions would be appreciated. Thanks.

--
Nipen Mark
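A sketch of one way to do both with stock Solr 1.4 query parameters, assuming an indexed date field named "timestamp" (the field name is an assumption):

http://host:port/solr/select?q=*:*&rows=1&fl=timestamp&sort=timestamp+asc   (oldest document = min date)
http://host:port/solr/select?q=*:*&rows=1&fl=timestamp&sort=timestamp+desc  (newest document = max date)

and for the per-month counts, date faceting:

http://host:port/solr/select?q=*:*&rows=0&facet=true&facet.date=timestamp&facet.date.start=2010-01-01T00:00:00Z&facet.date.end=2011-01-01T00:00:00Z&facet.date.gap=%2B1MONTH

facet.date.gap=+1MONTH (the '+' must be URL-encoded as %2B) gives one bucket per month for the Y axis of the timeline graph.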
RE: How to not limit maximum number of documents?
I was just thinking along similar lines.

As far as I can tell you can use the parameters start & rows in combination to control the retrieval of query results. So

http://host:port/solr/select/?q=query

will retrieve up to results 1..10, and

http://host:port/solr/select/?q=query&start=10&rows=10

will retrieve results 11..20 (start is 0-based). So it is up to your application to control result traversal/pagination.

Question - does this mean that http://host:port/solr/select/?q=query&start=10&rows=10 runs the query a 2nd time? And so on?

Regards
Stefan Maric
Cannot get like exact searching to work
I am using SOLR 1.3 and my server is embedded and accessed using SOLRJ. I would like to set up my searches so that exact matches are the first results returned, followed by near matches, and finally token-based matches. For example, if I have a summary field in the schema which is created using copyField from a bunch of other fields: "My item title, keyword, other, stuff"

I want this search to match the item above first and foremost:
1) "My item title*"
Then this one:
2) "my item*"
and finally this one should also work:
3) "my title"

I tried creating a field to hold exact match data (summaryExact) which actually works if I paste in the precise text, but stops working as soon as I add any wildcard to it. In other words I get no matches for "My item title*" but I get 1 match for "My item title". I also tried this: (summary:"my item" || summaryExact:"my item*"^3) but that results in 0 matches as well.

I could not quite figure out which tokenizer to use if I don't want any tokens created but just want to trim and lowercase the string, so let me know if you have ideas on this. Basically, I want something similar to DB "like" matching without case sensitivity, and probably trimmed as well. I don't really want the field to be tokenized though. I am attaching my schema in case that helps.

I have spent a few days reading through the SOLR documentation and forums and trying various things to get this to work, but I just end up making the matching worse when I make changes. I appreciate any pointers, links, or ideas. Thanks!
-AZ

--
Aaron Zeckoski (azeckoski (at) vt.edu)
Senior Research Engineer - CARET - University of Cambridge
https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile

<?xml version="1.0" encoding="UTF-8" ?>
<!-- This is the Solr schema file. This file should be named "schema.xml" and should be in the conf directory under the solr home (i.e. ./solr/conf/schema.xml by default) or located where the classloader for the Solr webapp can find it. For more information on how to customize this file, please see http://wiki.apache.org/solr/SchemaXml -->
<!-- Steeple Portal project schema - Aaron Zeckoski (aa...@caret.cam.ac.uk) -->
<schema name="steeple" version="1.1">
  <!-- this is a unified schema of multiple types since the searches need to be combined, not completely sure if this is required -->
  <types>
    <!-- omitNorms - If you have tokenized fields of variable size and you want the field length to affect the relevance score, then you do not want to omit norms. Omitting norms is good for fields where length is of no importance (e.g. gender=Male vs. gender=Female). Omitting norms saves you heap/RAM, one byte per doc per field without norms, I believe.
         positionIncrementGap - Used for multivalued fields. With a position increment gap of 0, a phrase query of "doe bob" would be a match. But often it is undesirable for that kind of match across different field values. A position increment gap controls the virtual space between the last token of one field instance and the first token of the next instance. With a gap of 100, this prevents phrase queries (even with a modest slop factor) from matching across instances.
         Comma delimited splitter (maybe for keywords if they are delimited)
         <analyzer class="org.apache.lucene.analysis.PatternTokenizerFactory" pattern=", *" /> -->
    <!-- The identifier should always be extremely simple so there are no filters on it -->
    <fieldType name="identifier" class="solr.StrField" sortMissingLast="true" omitNorms="true" compressed="false" indexed="true" stored="true" />
    <!-- special field for exact text matches, no processing -->
    <fieldType name="exact" class="solr.TextField" compressed="false" indexed="true" stored="true" />
    <!-- name indicates names, titles, and summaries, these are not tokenized but are flattened (html and special chars) to make searches easier -->
    <fieldType name="name" class="solr.StrField" sortMissingLast="true" omitNorms="true" compressed="false" indexed="true" stored="true">
      <analyzer type="index">
        <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
        <!-- splits things up <filter class="solr.StandardFilterFactory"/> -->
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="year" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true" compressed="false" indexed="true" stored="true" />
    <fieldtype name="keywords" class="solr.TextField" positionIncrementGap="10" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
      </analyzer>
    </fieldtype>
    <!-- standard field
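On the "trim and lowercase but don't tokenize" question above, a sketch of a field type that might do it, using solr.KeywordTokenizerFactory (which keeps the whole value as a single token); the type name is made up:

<fieldType name="exactMatch" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

One caveat that also explains the wildcard behaviour described above: wildcarded terms are not analyzed at query time, so for "My item title*" to hit the lowercased index the application has to lowercase the input itself before appending the *.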
Re: How to not limit maximum number of documents?
Hi Stefan,

you are right. I noticed this page-based result handling too. For web pages it is handy to maintain a number-of-results-per-page parameter together with an offset to browse result pages. Both can be done with Solr's 'start' and 'rows' parameters. But as I don't use Solr in a web context, it's important for me to get all results in one go.

While waiting for answers I was working on a work-around and came across the LukeRequestHandler (http://wiki.apache.org/solr/LukeRequestHandler). It allows you to query the index and obtain meta information about it. I found a parameter in the response called 'numDocs' which seems to contain the current number of index rows. So I was thinking about first asking for the number of index rows via the LukeRequestHandler and then setting the 'rows' parameter to this value. Obviously, this is quite expensive, as one front-end query always leads to two back-end queries. So I'm still searching for a better way to do this!

Cheers,
Egon
RE: How to not limit maximum number of documents?
Egon

If you first run your query with q=query&rows=0, then you get back an indication of the total number of docs:

<result name="response" numFound="53" start="0"/>

Now your app can query again to get the 1st n rows & manage forward|backward traversal of results by subsequent queries.

Regards
Stefan Maric
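For SolrJ users, the same two-pass pattern might look like the sketch below (Solr 1.4-era SolrJ; the URL and page size are assumptions); paging in chunks avoids one enormous response:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocumentList;

public class FetchAll {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // pass 1: rows=0 just to read numFound
    long total = server.query(new SolrQuery("*:*").setRows(0))
                       .getResults().getNumFound();

    // pass 2: walk the full result set in pages
    int pageSize = 1000;
    for (int start = 0; start < total; start += pageSize) {
      SolrDocumentList page = server.query(
          new SolrQuery("*:*").setStart(start).setRows(pageSize)).getResults();
      // process page here...
    }
  }
}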
Re: How to not limit maximum number of documents?
just set the rows to a very large number, larger than the number of documents available

it is useful to set the fl parameter with the fields required, to avoid memory problems if each document contains a lot of information
Re: Solr-JMX/Jetty agentId
Tim wrote:
"I just have <jmx /> in solrconfig.xml. [...] In jconsole I can browse the Solr beans just fine."

Thanks for that, it appears my thinking was just too complicated here. Works fine now :)

Best
Jan
Re: How to not limit maximum number of documents?
Setting the 'rows' parameter to a number larger than the number of documents available requires that you know how many are available. That's what I intended to retrieve via the LukeRequestHandler. Anyway, nice approach Stefan, I'm afraid I had forgotten this 'numFound' aspect. :)

But still, it feels like a hack. Originally I was searching more for something like q=query&rows=-1, which leaves the API to do the job (efficiently!). :) The question is: does Solr support something like this? Or should we write a feature request?

Cheers,
Egon
RE: How to not limit maximum number of documents?
Yes, I tried q=query&rows=-1 the other day and gave up.

But as you say it wouldn't help, because you might get
a) timeouts, because you have to wait a 'long' time for the large set of results to be returned
b) exceptions being thrown, because you're retrieving too much info to be thrown around the system

Regards
Stefan Maric
Re: How to not limit maximum number of documents?
Solr will not do this efficiently. Getting all rows will be very slow. Adding a parameter will not make it fast.

Why do you want to do this?

wunder
Re: How to not limit maximum number of documents?
Okay. So we have to leave this question open for now. There might be other (more advanced) users who can answer it. For sure, the solution we found is not that good. In the meantime, I will look for a way to submit a feature request. :)
Re: How to not limit maximum number of documents?
I meant available in total, not just what satisfies the particular query. You should have at least an estimate of the total number of documents, even if it grows daily. And if you are talking about millions of rows and you are trying to retrieve them all, IMHO, not getting all of them will be the least of your problems.
delete via DIH
Hi,

There is a solution to update via DIH, but is there also a way to define a query that fetches IDs for documents that should be removed?

regards,
Lukas Kahwe Smith
m...@pooteeweet.org
question/suggestion for Solr-236 patch
I have been able to apply and use the solr-236 patch (field collapsing) successfully. Very, very cool and powerful. My one comment/concern is that the collapseCount and aggregate function values in the collapse_counts list only represent the collapsed documents (ie the ones that are not shown in results). Are there any plans to include the non-collapsed (?) document in the collapseCount and aggregate function values (ie so that it includes ALL documents, not just the collapsed ones)? Possibly via some parameter like collapse.includeAll? I think this would be a great addition to the collapse code (and solr functionality) via what I would think is a small change.
Re: analysing wild carded terms
sorry, what I meant to say is: apply text analysis to the part of the query that is wildcarded. For example, if a term with latin1 diacritics is wildcarded, I'd still like to run it through the ISOLatin1Filter.

On Wed, Feb 10, 2010 at 4:59 AM, Fuad Efendi f...@efendi.ca wrote:
"I believe it is illogical. Wildcarded terms will go through the terms enumerator."
Re: question/suggestion for Solr-236 patch
Joe Calderon-2 wrote:
"you can do that very easily yourself in a post processing step after you receive the solr response"

true (and am already doing so). was thinking that having this done as part of the field collapsing code might be faster than doing so via post processing (ie no need to navigate the xml results for two different values for each collapsed set, adding the numbers to get the total, etc).

it was just a suggestion. field collapsing is a great feature.
RE: analysing wild carded terms
Hi Joe,

See this recent thread from a user with a very similar issue:

http://old.nabble.com/No-wildcards-with-solr.ASCIIFoldingFilterFactory--td24162104.html

In the above thread, Mark Miller mentions that Lucene's AnalyzingQueryParser should do the trick, but would need to be integrated into Solr. Down at the bottom of the thread, the original poster has a patch file implementing Solr integration that he says worked for him.

Steve
Re: Distributed search and haproxy and connection build up
Thanks, I bypassed haproxy as a test and it did reduce the number of connections - but it did not seem as though these connections were hurting anything.

Ian.

On Tue, Feb 9, 2010 at 11:01 PM, Lance Norskog goks...@gmail.com wrote:

This goes through the Apache Commons HTTP client library: http://hc.apache.org/httpclient-3.x/ We used 'balance' at another project and did not have any problems.

On Tue, Feb 9, 2010 at 5:54 AM, Ian Connor ian.con...@gmail.com wrote:

I have been using distributed search with haproxy but noticed that I am suffering a little from TCP connections building up waiting for the OS-level closing/timeout:

netstat -a
...
tcp6 1 0 10.0.16.170%34654:53789 10.0.16.181%363574:8893 CLOSE_WAIT
tcp6 1 0 10.0.16.170%34654:43932 10.0.16.181%363574:8890 CLOSE_WAIT
tcp6 1 0 10.0.16.170%34654:43190 10.0.16.181%363574:8895 CLOSE_WAIT
tcp6 0 0 10.0.16.170%346547:8984 10.0.16.181%36357:53770 TIME_WAIT
tcp6 1 0 10.0.16.170%34654:41782 10.0.16.181%363574: CLOSE_WAIT
tcp6 1 0 10.0.16.170%34654:52169 10.0.16.181%363574:8890 CLOSE_WAIT
tcp6 1 0 10.0.16.170%34654:55947 10.0.16.181%363574:8887 CLOSE_WAIT
tcp6 0 0 10.0.16.170%346547:8984 10.0.16.181%36357:54040 TIME_WAIT
tcp6 1 0 10.0.16.170%34654:40030 10.0.16.160%363574:8984 CLOSE_WAIT
...

Digging a little into the haproxy documentation, it seems that they do not support persistent connections. Does Solr normally persist the connections between shards (would this problem happen even without haproxy)?

Ian.

--
Lance Norskog
goks...@gmail.com

--
Regards,
Ian Connor
dismax and multi-language corpus
Hello list,

I have a corpus with 3 languages, so I set up a text content field (with no stemming) and 3 text-[en|it|de] fields with specific snowball stemmers. I copyField the text to my language-aware fields. Then I set up this dismax searchHandler:

<requestHandler name="content" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="pf">title^1.2 content-en^0.8 content-it^0.8 content-de^0.8</str>
    <str name="bf">title^1.2 content-en^0.8 content-it^0.8 content-de^0.8</str>
    <str name="qf">title^1.2 content-en^0.8 content-it^0.8 content-de^0.8</str>
    <float name="tie">0.1</float>
  </lst>
</requestHandler>

but I get this error:

HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Expected ',' at position 7 in 'content-en'

type: Status report
message: org.apache.lucene.queryParser.ParseException: Expected ',' at position 7 in 'content-en'
description: The request sent by the client was syntactically incorrect (org.apache.lucene.queryParser.ParseException: Expected ',' at position 7 in 'content-en').

Any idea?

TIA

Claudio

--
Claudio Martella
Digital Technologies Unit Research & Development - Analyst
TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax +39 0471 068 129
claudio.marte...@tis.bz.it http://www.tis.bz.it
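If the hyphens turn out to be the problem (the function/boost-query parser treats '-' as an operator, which would explain "Expected ',' at position 7", i.e. right at the dash), a hedged workaround is to rename the fields with underscores, e.g.:

<str name="qf">title^1.2 content_en^0.8 content_it^0.8 content_de^0.8</str>
<str name="pf">title^1.2 content_en^0.8 content_it^0.8 content_de^0.8</str>

Note also that bf takes boost functions rather than a field^boost list like qf/pf, so the bf line above is likely the one triggering the parse error and can probably just be dropped.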
RE: Indexing / querying multiple data types
Lance, after a bit more reading & cleaning up my configuration (case sensitivity corrected, but it didn't appear to be affecting the indexing & I don't use the atomID field for querying anyhow), I've added a docType field when I index my data and now use the fq parameter to filter on that new field.

-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: 10 February 2010 03:28
To: solr-user@lucene.apache.org
Subject: Re: Indexing / querying multiple data types

A couple of minor problems: The qt parameter (Que Tee) selects the parser for the q (Q for query) parameter. I think you mean 'qf': http://wiki.apache.org/solr/DisMaxRequestHandler#qf_.28Query_Fields.29

Another problem with atomID, atomId, atomid: Solr field names are case-sensitive. I don't know how this plays out.

Now, to the main part: the <entity name="name1"> part does not create a column named name1. The two queries only populate the same namespace of four fields: id, atomID, name, description. If you want data from each entity to have a constant field distinguishing it, you have to create a new field with a constant value. You do this with the TemplateTransformer: http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer

Add this as an entity attribute to both entities: transformer=TemplateTransformer, and add this as a column to each entity: <field column="name" template="name1" /> (and then name2). You may have to do something else for these to appear in the document.

On Tue, Feb 9, 2010 at 12:41 AM, stefan.ma...@bt.com wrote:

Sven

In my data-config.xml I have the following:

<document>
  <entity name="name1" query="select id, atomID, name, description from v_1" />
  <entity name="name2" query="select id, atomID, name, description from V_2" />
</document>

In my schema.xml I have:

<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text" indexed="true" stored="true"/>
<field name="atomId" type="string" indexed="false" stored="true" required="true" />
<field name="description" type="text" indexed="true" stored="true" />

And in my solrconfig.xml I have:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

<requestHandler name="name1" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">name^1.5 description^1.0</str>
  </lst>
</requestHandler>

<requestHandler name="contacts" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">name^1.5 description^1.0</str>
  </lst>
</requestHandler>

And the <requestHandler name="dismax" class="solr.SearchHandler"> has been untouched.

So when I run http://localhost:7001/solr/select/?q=food&qt=name1 I was expecting to get results from the data that had been indexed by <entity name="name1">.

Regards
Stefan Maric

--
Lance Norskog
goks...@gmail.com
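Putting Lance's advice together with the docType approach, the data-config.xml might end up looking like this sketch (table and field names taken from the thread, the rest assumed):

<document>
  <entity name="name1" transformer="TemplateTransformer"
          query="select id, atomID, name, description from v_1">
    <field column="docType" template="name1"/>
  </entity>
  <entity name="name2" transformer="TemplateTransformer"
          query="select id, atomID, name, description from V_2">
    <field column="docType" template="name2"/>
  </entity>
</document>

plus a matching <field name="docType" type="string" indexed="true" stored="true"/> in schema.xml, after which queries can filter with fq=docType:name1.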
DataImportHandler - too many connections MySQL error after upgrade to Solr 1.4 release
Hi all,

I had DataImportHandler working perfectly on a Solr 1.4 nightly build from June 2009. I upgraded Solr to the 1.4 release and started getting errors:

Caused by: com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException: Server connection failure during transaction. Due to underlying exception: 'com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException: Too many connections'.

This is the same machine, the same setup (except the new Solr) that never had problems. The error doesn't pop up at the beginning; DIH runs for a few hours and then breaks (after a few million rows are processed). Solr is the only process using MySQL, and max_connections on MySQL is set to 100, so it seems like there might be a connection leak in DIH.

A few more details on the setup:
MySQL version: 5.0.67
driver: mysql-connector-java-5.0.8-bin.jar
Java: 1.6.0_14
connection URL parameters: autoReconnect=true, batchSize=-1
OS: CentOS 5.2

Did anyone else have similar problems with the 1.4 release?

Regards
implementing profanity detector
FYI this does not work. It appears that the update runs on a different thread to the analysis, perhaps because the update is done when the commit happens? I'm sending the document XML with commitWithin=6. I would appreciate any other ideas; I'm drawing a blank on how to implement this efficiently with Lucene/Solr.

mike

On Thu, Jan 28, 2010 at 4:31 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

How about this crazy idea - a custom TokenFilter that stores the "safe" flag in a ThreadLocal?

----- Original Message -----
From: Mike Perham mper...@onespot.com
To: solr-user@lucene.apache.org
Sent: Thu, January 28, 2010 4:46:54 PM
Subject: implementing profanity detector

We'd like to implement a profanity detector for documents during indexing. That is, given a file of profane words, we'd like to be able to mark a document as safe or not safe if it contains any of those words, so that we can have something similar to Google's safe search. I'm trying to figure out how best to implement this with Solr 1.4:

- An UpdateRequestProcessor would allow me to dynamically populate a "safe" boolean field, but requires me to pull out the content, tokenize it and run each token through my set of profanities, essentially running the analysis pipeline again. That's a lot of overhead AFAIK.
- A TokenFilter would allow me to tap into the existing analysis pipeline so I get the tokens for free, but I can't access the document.

Any suggestions on how to best implement this? Thanks in advance,

mike
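For what it's worth, a minimal sketch of the UpdateRequestProcessor route from the original mail, using a crude whole-word scan instead of a second analysis pass (the word list and field names are placeholders; Solr 1.4-era API):

import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

// Marks each incoming document safe/unsafe before it is indexed.
class SafeSearchProcessor extends UpdateRequestProcessor {
  private static final Set<String> PROFANE =
      new HashSet<String>(Arrays.asList("badword1", "badword2"));

  SafeSearchProcessor(UpdateRequestProcessor next) { super(next); }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    boolean safe = true;
    Object content = doc.getFieldValue("content");
    if (content != null) {
      for (String tok : content.toString().toLowerCase().split("\\W+")) {
        if (PROFANE.contains(tok)) { safe = false; break; }
      }
    }
    doc.setField("safe", safe);
    super.processAdd(cmd); // hand the (now flagged) document down the chain
  }
}

Since this runs on the update thread before analysis, it sidesteps the TokenFilter/ThreadLocal timing problem entirely, at the cost of cruder tokenization than the real analyzer.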
Re: dismax and multi-language corpus
Claudio - fields with '-' in them can be problematic.

Side comment: do you really want to search across all languages at once? If not, maybe 3 different dismax configs would make your searches better.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
Need a bit of help, Solr 1.4: type text.
I'm using the standard "text" type for a field, and part of the data being indexed is "13th", as in "Friday the 13th". I can't seem to get it to match when I'm querying for "Friday the 13th", either quoted or not. One thing that does match is "13 th", if I send the search query with a space in between... Any suggestions?

I know this is short on detail, but it's been a long day... time to get outta here. Thanks for any and all help.

-Dan
Re: Need a bit of help, Solr 1.4: type text.
Check out the configuration of WordDelimiterFilterFactory in your schema.xml. Depending on your settings, it's probably tokenizing "13th" into "13" and "th". You can also have them concatenated back into a single token, but I can't remember the exact parameter. I think it could be catenateAll.

--
“When nothing seems to help, I go look at a stonecutter hammering away at his rock perhaps a hundred times without as much as a crack showing in it. Yet at the hundred and first blow it will split in two, and I know it was not that blow that did it, but all that had gone before.” — Jacob Riis
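A hedged example of what that section of schema.xml might look like; which catenate/preserve options to enable depends on the rest of the field type, and the index must be rebuilt after changing them:

<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1"
        catenateAll="1" preserveOriginal="1"/>

With preserveOriginal="1" (or catenateAll="1") on the index side, "13th" survives as a single token alongside the "13"/"th" parts, so a query for "Friday the 13th" can match again.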
Re: How to configure multiple data import types
: Subject: How to configure multiple data import types : In-Reply-To: 4b6c0de5.8010...@zib.de http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking -Hoss
Re: Indexing / querying multiple data types
: Subject: Indexing / querying multiple data types : In-Reply-To: 8cf3f00d0572f8479efcd0783be11eb1927...@xmb-rcd-104.cisco.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking -Hoss
Re: Faceting
: NOTE: Please start a new email thread for a new topic (See
: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking)

FWIW: I'm the most nit-picky person I know about Thread-Hijacking, but I don't see any MIME headers to indicate that Jose did that.

: If i follow this path can i then facet on email and/or link ? For
: example combining facet field with facet value params?

Any indexed field can be faceted on ... it's hard to be sure what exactly your goal is, but if you ultimately want to be able to have a list of search results, and then display facet info like "Number of results containing an email address" and "Number of results containing a URL", then yes: as long as you have a way of extracting that metadata and including it in an indexed field, you can facet on it ... you could use Field Faceting on something like a "properties" field (where all the indexed values are contains_email and contains_url, etc...) or you could use facet queries to check arbitrary criteria (ie: facet.query=has_email:true & facet.query=urls:[* TO *], etc...)

-Hoss
Re: How to not limit maximum number of documents?
: Okay. So we have to leave this question open for now. There might be : other (more advanced) users that can answer this question. It's for : sure, the solution we found is not quite good. The question really isn't open, it's a FAQ... http://wiki.apache.org/solr/FAQ#How_can_I_get_ALL_the_matching_documents_back.3F_..._How_can_I_return_an_unlimited_number_of_rows.3F -Hoss
Query elevation based on field
Is it possible to do query elevation based on field? Basically, I would like to search the same term on three different fields: q=field1:term OR field2:term OR field3:term and I would like to sort the results by fourth field sort=field4+asc However, I would like to elevate all of field1 matches to be at the beginning, with those matches sorted ascending and the rest of the field2 and field3 matches sorted ascending. Is this possible? Thanks.
RE: Index Corruption after replication by new Solr 1.4 Replication
Hi All,

I found out there is a file corruption issue when using both EmbeddedSolrServer & Solr 1.4 Java-based replication together in a slave server. In my slave server, I have 2 webapps in a tomcat instance:
1) a multicore webapp with slave config
2) my custom webapp using EmbeddedSolrServer to query Solr index data.

Both webapps were set up according to the instructions from the Solr wiki. However, I found a multi-threading issue which causes index file corruption. The following is the root cause: EmbeddedSolrServer requires a CoreContainer object as a parameter. During the creation of the CoreContainer object, the process loads the slave Solr configuration, which silently creates an extra ReplicationHandler (SnapPuller) in the background. However, there is already a ReplicationHandler (SnapPuller) created by the multicore webapp because of the slave configuration. As a result, there are 2 threads doing file replication at the same time. This causes index corruption with different IOExceptions.

After I replaced the usage of EmbeddedSolrServer with CommonsHttpSolrServer (i.e. stopped creating a CoreContainer object in the slave server), Solr 1.4 Java-based replication works perfectly without any file corruption issue.

In order to use EmbeddedSolrServer in a slave server, I think we need a way to create a CoreContainer object with the slave configuration without creating an extra thread to replicate files. Should I file a bug?

Thanks,
Osborn

-----Original Message-----
From: Osborn Chan [mailto:oc...@shutterfly.com]
Sent: Friday, January 15, 2010 12:35 PM
To: solr-user@lucene.apache.org
Subject: RE: Index Corruption after replication by new Solr 1.4 Replication

Hi Otis,

Thanks. There is no NFS anymore, and all index files are local. We migrated to the new Solr 1.4 replication in order to avoid all the NFS Stale Exceptions.

Thanks,
Osborn

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Friday, January 15, 2010 12:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Index Corruption after replication by new Solr 1.4 Replication

This is not a direct answer to your question, but can you avoid NFS? My first guess would be that NFS somehow causes this problem. If you check the ML archives for: NFS lock, you will see what I mean.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

----- Original Message -----
From: Osborn Chan oc...@shutterfly.com
To: solr-user@lucene.apache.org
Sent: Fri, January 15, 2010 3:23:21 PM
Subject: Index Corruption after replication by new Solr 1.4 Replication

Hi all,

I have migrated to the new Solr 1.4 replication feature with multicore support, from Solr 1.2 with NFS mounting. The following exceptions appear in catalina.log from time to time, and there are some EOF exceptions which I believe mean the slave index files are corrupted after replication from the index server. I have the following configuration with Solr 1.4; please correct me if it is configured incorrectly. (The index files are not corrupted on the master servers, but they are corrupted on slave servers. Usually only one of the slave servers is corrupted with an EOF exception, but not all.)

1 Master Server (Index Server):
- 8 indexes with multicore configuration.
- All indexes are configured to replicateAfter optimize only.
- The size of the index data varies. The smallest index only has 2.5 MB. The biggest index has ~100 MB.
- There are infrequent optimize calls to indexes (an optimize call every ~30 mins to 6 hours, depending on the index).
- There are many commit calls to all indexes. (But there is no concurrent commit and optimize for any index.)
- Did not configure commitReserveDuration in ReplicationHandler - using default values.

4 Slave Servers (Search Servers):
- 8 indexes with multicore configuration.
- All indexes are configured to poll every ~15 minutes.
- All update handler configuration is removed in solrconfig-slave.xml (solrconfig.xml) in order to prevent add/commit/optimize calls. (Search slave servers are only responsible for search operations.)
- ... removed.
- ... removed.
- <... class="solr.BinaryUpdateRequestHandler" /> removed.

A) FileNotFoundException

INFO: Total time taken for download : 1 secs
Jan 15, 2010 10:34:16 AM org.apache.solr.handler.ReplicationHandler doFetch
SEVERE: SnapPull failed org.apache.solr.common.SolrException: Index fetch failed :
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:264)
        at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:280)
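For reference, the slave-side change described above is small; a sketch (URL and core name assumed):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SlaveSearchClient {
  public static SolrServer connect() throws Exception {
    // Instead of building a CoreContainer (which re-reads the slave config
    // and silently starts a second SnapPuller), query the existing multicore
    // webapp in the same Tomcat over HTTP:
    return new CommonsHttpSolrServer("http://localhost:8080/solr/core0");
  }
}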
Re: source tree for lucene
: i want to recompile lucene with
: http://issues.apache.org/jira/browse/LUCENE-2230, but im not sure
: which source tree to use, i tried using the implied trunk revision
: from the admin/system page but solr fails to build with the generated
: jars, even if i exclude the patches from 2230...

Hmmm... I think the problem you are running into is that the Lucene Implementation Version information that Solr displays only tells you the svn revision number -- but not the branch. If you note the Solr 1.4 CHANGES.txt it says...

Versions of Major Components
Apache Lucene 2.9.1 (r832363 on 2.9 branch)
Apache Tika 0.4
Carrot2 3.1.0

...so the key is to check out the 2.9 branch. (None of which guarantees that any patches you try will actually compile.)

-Hoss
The Riddle of the Underscore and the Dollar Sign . . .
I am perplexed by the behavior I am seeing of the Solr Analyzer and Filters with regard to underscores. I am trying to get rid of underscores ('_') when shingling, but seem unable to do so with a Stopwords Filter. And yet underscores are being removed when I am not even trying, by the WordDelimiter Filter. Conversely, I would like to retain dollar signs ('$') when they are adjacent to numbers, but seem unable to without having to accept all forms of other syntax.

1) How can I get rid of underscores ('_') without using the WordDelimiter Filter (which gets rid of other syntax I need)?
2) How can I stop the WordDelimiter Filter from removing dollar signs ('$')?

Most grateful for any guidance,
Christopher
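On question 1, one hedged option that avoids the WordDelimiter Filter entirely: strip underscores with a PatternReplaceFilterFactory placed in the chain before the shingle filter (the pattern here is an assumption about the intended behaviour):

<filter class="solr.PatternReplaceFilterFactory"
        pattern="_" replacement="" replace="all"/>

On question 2, Solr 1.4's WordDelimiterFilterFactory has no per-character type overrides, so keeping '$' next to numbers typically means either turning off the offending split options or doing the tokenization with a pattern-based tokenizer instead.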
RE: HTTP caching and distributed search
: I tried your suggestion, Hoss, but committing to the new coordinator : core doesn't change the indexVersion and therefore the ETag value isn't : changed. Hmmm... so the empty commit doesn't change the indexVersion? ... I didn't realize that. Well, I suppose you could replace your empty commit with an update to a bogus document ... it's hackish, but it should work... http://host/solr/coordinator/update?stream.body=<add><doc><field name="bogus">bogus</field></doc></add>&commit=true -Hoss
Re: Which schema changes are incompatible?
: http://wiki.apache.org/solr/FAQ#How_can_I_rebuild_my_index_from_scratch_if_I_change_my_schema.3F : : but it is not clear about the times when this is needed. So I wonder, do I : need to do it after adding a field, removing a field, changing a field type, : or changing indexed/stored/multiValued properties? What happens if I don't do : it, will Solr die? There is no simple answer to that question ... if you add a field you don't need to rebuild (unless you want to ensure every doc gets a value indexed, or if you are depending on Solr to apply a default value). If you remove a field you don't need to rebuild (but none of the space taken up by that field in the index will be reclaimed, and if it's stored it will still be included in the response). Changing a field type is one of the few situations where we can categorically say you *HAVE* to reindex everything. : Also, the FAQ entry notes that one can delete all documents, change the : schema.xml file, and then reload the core. Would it be possible to instead : change schema.xml, reload the core, and then rebuild the index -- in effect : slowly deleting the old documents, but never ending up with a completely : empty index? I realize that some weird search results could happen during : such a rebuild, but that may be preferable to having no search results at : all. The end result won't be 100% equivalent from an index standpoint -- when you delete all, Solr is actually able to completely start over with an empty index, absent all low-level metadata about fields that used to exist -- if you incrementally delete, some of that low-level metadata will still be in the index. It probably won't be something that will ever affect you, but it is a distinction. -Hoss
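For reference, the delete-everything step the FAQ describes is usually done by posting a delete-by-query followed by a commit to the update handler (a sketch; the host and core path are placeholders):

    <delete><query>*:*</query></delete>
    <commit/>

posted to http://host:8983/solr/update. After the commit, the index is effectively empty and the schema change plus full reindex can proceed.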
Re: dismax and multi-language corpus
Claudio - fields with '-' in them can be problematic. Why's that? On Wed, Feb 10, 2010 at 2:38 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Claudio - fields with '-' in them can be problematic. Side comment: do you really want to search across all languages at once? If not, maybe 3 different dismax configs would make your searches better. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: Claudio Martella claudio.marte...@tis.bz.it To: solr-user@lucene.apache.org Sent: Wed, February 10, 2010 3:15:40 PM Subject: dismax and multi-language corpus Hello list, I have a corpus in 3 languages, so I set up a text content field (with no stemming) and 3 text-[en|it|de] fields with specific snowball stemmers. I copyField the text to my language-aware fields. So, I set up a dismax searchHandler with defType dismax, the field list title^1.2 content-en^0.8 content-it^0.8 content-de^0.8 for qf (the same list repeated for two more parameters), and tie 0.1, but I get this error: HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Expected ',' at position 7 in 'content-en' (The request sent by the client was syntactically incorrect.) Any idea? TIA Claudio -- Claudio Martella Digital Technologies Unit Research & Development - Analyst TIS innovation park Via Siemens 19 | Siemensstr. 19 39100 Bolzano | 39100 Bozen Tel. +39 0471 068 123 Fax +39 0471 068 129 claudio.marte...@tis.bz.it http://www.tis.bz.it
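The usual fix (a sketch, not from the thread; the handler name is made up) is to rename the fields with underscores, which the dismax field^boost parser handles without complaint, and adjust the copyField targets to match:

    <requestHandler name="/multilang" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <str name="qf">title^1.2 content_en^0.8 content_it^0.8 content_de^0.8</str>
        <str name="pf">title^1.2 content_en^0.8 content_it^0.8 content_de^0.8</str>
        <str name="tie">0.1</str>
      </lst>
    </requestHandler>

The hyphen appears to be what trips the parser: position 7 of 'content-en' is exactly the '-', which the boost parser does not expect inside a field name.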
Question on Solr Scalability
Suppose I am indexing very large data (5 billion rows in a database). Now I want to use the Solr core feature to split the index into manageable chunks. However, I have two questions: 1. Can cores reside on different physical servers? 2. When a query comes in, will the query be answered by the index in 1 core, or will the query be sent to all the cores? My desire is to have a system which from the outside appears as a single large index... but inside it is multiple small indexes running on different hardware machines. -- View this message in context: http://old.nabble.com/Question-on-Solr-Scalability-tp27543068p27543068.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question on Solr Scalability
To scale Solr, take a look at this article: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr Juan Pedro Danculovic CTO - www.linebee.com On Thu, Feb 11, 2010 at 4:12 AM, abhishes abhis...@gmail.com wrote: Suppose I am indexing very large data (5 billion rows in a database). Now I want to use the Solr core feature to split the index into manageable chunks. However, I have two questions: 1. Can cores reside on different physical servers? 2. When a query comes in, will the query be answered by the index in 1 core, or will the query be sent to all the cores? My desire is to have a system which from the outside appears as a single large index... but inside it is multiple small indexes running on different hardware machines. -- View this message in context: http://old.nabble.com/Question-on-Solr-Scalability-tp27543068p27543068.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question on Solr Scalability
Hi, I think your needs would be better met by Distributed Search: http://wiki.apache.org/solr/DistributedSearch It allows shards to live on different servers and searches across all of those shards when a query comes in. There are a few patches which will hopefully be available in the Solr 1.5 release that will improve this, including distributed tf-idf across shards. Regards, David On 11 Feb 2010, at 07:12, abhishes abhis...@gmail.com wrote: Suppose I am indexing very large data (5 billion rows in a database). Now I want to use the Solr core feature to split the index into manageable chunks. However, I have two questions: 1. Can cores reside on different physical servers? 2. When a query comes in, will the query be answered by the index in 1 core, or will the query be sent to all the cores? My desire is to have a system which from the outside appears as a single large index... but inside it is multiple small indexes running on different hardware machines. -- View this message in context: http://old.nabble.com/Question-on-Solr-Scalability-tp27543068p27543068.html Sent from the Solr - User mailing list archive at Nabble.com.
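For reference, a distributed query in Solr 1.4 is an ordinary search request with a shards parameter listing the cores to fan out to; the node that receives the request merges the per-shard results. The host names below are made up:

    http://host1:8983/solr/select?q=*:*&shards=host1:8983/solr,host2:8983/solr,host3:8983/solr

Each shard holds a disjoint slice of the documents, so from the outside the cluster behaves like the single large index the original poster asked for.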