Re: C++ being filtered (please help)
I have a field which may take values like "C++,PHP MySql,C#". I want to tokenize it on commas, whitespace, and other word-delimiting characters only, not on the plus sign, so that the result after tokenization is: C++ PHP MySql C#. But the result I am getting is: c php mysql c. Please give me some pointers as to which analyzer and tokenizer to use.

You can use this analyzer:

<analyzer>
  <charFilter class="solr.MappingCharFilterFactory" mapping="mappings.txt"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

with a mappings.txt file containing:

"," => " "

You can add more characters (to the mappings.txt file) that you want to break words at.
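The effect of this chain can be sketched in a few lines of Python (a simulation of the three stages, not Solr code; the mapping table is the assumed content of mappings.txt):

```python
# Simulates the three stages of the analyzer above (not Solr code):
# the char filter rewrites characters before tokenization, the whitespace
# tokenizer splits only on spaces (so "+" and "#" survive), and the
# lowercase filter normalizes case.
MAPPINGS = {",": " "}  # assumed content of mappings.txt: "," => " "

def analyze(text):
    for src, dst in MAPPINGS.items():   # solr.MappingCharFilterFactory
        text = text.replace(src, dst)
    tokens = text.split()               # solr.WhitespaceTokenizerFactory
    return [t.lower() for t in tokens]  # solr.LowerCaseFilterFactory

print(analyze("C++,PHP MySql,C#"))  # ['c++', 'php', 'mysql', 'c#']
```

Because StandardTokenizer strips the "+" and "#" characters, swapping in the whitespace tokenizer plus the char filter is what keeps them intact.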
RE: Solr response extremely slow
Hey,

Can anyone say which is the latest stable version? We are using 1.2:

Solr Specification Version: 1.2.0
Solr Implementation Version: 1.2.0 - Yonik - 2007-06-02 17:35:12
Lucene Specification Version: 2007-05-20_00-04-53
Lucene Implementation Version: build 2007-05-20
Current Time: Wed Feb 03 03:45:56 EST 2010

Regards
Prakash

-Original Message-
From: Vijayant Kumar [mailto:vijay...@websitetoolbox.com]
Sent: Wednesday, February 03, 2010 1:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr response extremely slow

Hi Rajat,

You can find the version of solr by http://localhost:8983/solr/admin/registry.jsp

--
Thank you,
Vijayant Kumar
Software Engineer
Website Toolbox Inc.
http://www.websitetoolbox.com
1-800-921-7803 x211

Java version is - java version 1.5.0_18 Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_18-b02) Java HotSpot(TM) Server VM (build 1.5.0_18-b02, mixed mode) Not sure how to find solr version. Can you tell me how to look it up? Also, i don't have a dedicated server to run this on.

-- View this message in context: http://old.nabble.com/Solr-response-extremely-slow-tp27432229p27432419.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr response extremely slow
On Wed, Feb 3, 2010 at 2:18 PM, Doddamani, Prakash prakash.doddam...@corp.aol.com wrote: Hey Can any one say which is the latest and stable version, We are using 1.2 Solr Specification Version: 1.2.0 Solr Implementation Version: 1.2.0 - Yonik - 2007-06-02 17:35:12 Lucene Specification Version: 2007-05-20_00-04-53 Lucene Implementation Version: build 2007-05-20 Current Time: Wed Feb 03 03:45:56 EST 2010 Solr 1.4 is the latest stable release. In future, please don't reply to an unrelated email thread. Start a new thread instead. -- Regards, Shalin Shekhar Mangar.
Re: Deploying Solr 1.3 in JBoss 5
Apparently, that worked! I had never realized that the order of the elements in XML is significant, nice to see. As always, problems lead to other problems, so now I'm facing a Xerces ClassCastException with JDK 6.

org.jboss.xb.binding.JBossXBRuntimeException: Failed to create a new SAX parser
  at org.jboss.xb.binding.UnmarshallerFactory$UnmarshallerFactoryImpl.newUnmarshaller(UnmarshallerFactory.java:100)
  at org.jboss.web.tomcat.service.deployers.JBossContextConfig.processContextConfig(JBossContextConfig.java:549)
  at org.jboss.web.tomcat.service.deployers.JBossContextConfig.init(JBossContextConfig.java:536)
  at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:279)
  at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
  at org.apache.catalina.core.StandardContext.init(StandardContext.java:5436)
  at org.apache.catalina.core.StandardContext.start(StandardContext.java:4148)
  at org.jboss.web.tomcat.service.deployers.TomcatDeployment.performDeployInternal(TomcatDeployment.java:310)
  at org.jboss.web.tomcat.service.deployers.TomcatDeployment.performDeploy(TomcatDeployment.java:142)
  at org.jboss.web.deployers.AbstractWarDeployment.start(AbstractWarDeployment.java:461)
  at org.jboss.web.deployers.WebModule.startModule(WebModule.java:118)
  at org.jboss.web.deployers.WebModule.start(WebModule.java:97)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:157)
  at org.jboss.mx.server.Invocation.dispatch(Invocation.java:96)
  at org.jboss.mx.server.Invocation.invoke(Invocation.java:88)
  at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
  at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:668)
  at org.jboss.system.microcontainer.ServiceProxy.invoke(ServiceProxy.java:206)
  at $Proxy38.start(Unknown Source)
  at org.jboss.system.microcontainer.StartStopLifecycleAction.installAction(StartStopLifecycleAction.java:42)
  at org.jboss.system.microcontainer.StartStopLifecycleAction.installAction(StartStopLifecycleAction.java:37)
  at org.jboss.dependency.plugins.action.SimpleControllerContextAction.simpleInstallAction(SimpleControllerContextAction.java:62)
  at org.jboss.dependency.plugins.action.AccessControllerContextAction.install(AccessControllerContextAction.java:71)
  at org.jboss.dependency.plugins.AbstractControllerContextActions.install(AbstractControllerContextActions.java:51)
  at org.jboss.dependency.plugins.AbstractControllerContext.install(AbstractControllerContext.java:348)
  at org.jboss.system.microcontainer.ServiceControllerContext.install(ServiceControllerContext.java:297)
  at org.jboss.dependency.plugins.AbstractController.install(AbstractController.java:1633)
  at org.jboss.dependency.plugins.AbstractController.incrementState(AbstractController.java:935)
  at org.jboss.dependency.plugins.AbstractController.resolveContexts(AbstractController.java:1083)
  at org.jboss.dependency.plugins.AbstractController.resolveContexts(AbstractController.java:985)
  at org.jboss.dependency.plugins.AbstractController.change(AbstractController.java:823)
  at org.jboss.dependency.plugins.AbstractController.change(AbstractController.java:553)
  at org.jboss.system.ServiceController.doChange(ServiceController.java:688)
  at org.jboss.system.ServiceController.start(ServiceController.java:460)
  at org.jboss.system.deployers.ServiceDeployer.start(ServiceDeployer.java:163)
  at org.jboss.system.deployers.ServiceDeployer.deploy(ServiceDeployer.java:99)
  at org.jboss.system.deployers.ServiceDeployer.deploy(ServiceDeployer.java:46)
  at org.jboss.deployers.spi.deployer.helpers.AbstractSimpleRealDeployer.internalDeploy(AbstractSimpleRealDeployer.java:62)
  at org.jboss.deployers.spi.deployer.helpers.AbstractRealDeployer.deploy(AbstractRealDeployer.java:50)
  at org.jboss.deployers.plugins.deployers.DeployerWrapper.deploy(DeployerWrapper.java:171)
  at org.jboss.deployers.plugins.deployers.DeployersImpl.doDeploy(DeployersImpl.java:1440)
  at org.jboss.deployers.plugins.deployers.DeployersImpl.doInstallParentFirst(DeployersImpl.java:1158)
  at org.jboss.deployers.plugins.deployers.DeployersImpl.doInstallParentFirst(DeployersImpl.java:1179)
  at org.jboss.deployers.plugins.deployers.DeployersImpl.install(DeployersImpl.java:1099)
  at
how to stress test solr
Before stress testing, should I close SolrCache? Which tool do you use? How do I do a stress test correctly? Any pointers? -- regards j.L ( I live in Shanghai, China)
Re: DataImportHandler - convertType attribute
One thing I find awkward about convertType is that it is JdbcDataSource specific, rather than field-specific. Isn't the current implementation far too broad?

Erik

On Feb 3, 2010, at 1:16 AM, Noble Paul നോബിള് नोब्ळ् wrote: Implicit conversion can cause problems when Transformers are applied. It is hard for the user to guess the type of the field by looking at the schema.xml. In Solr, String is the most commonly used type. If you wish to do numeric operations on a field, convertType will cause problems. If it is explicitly set, the user knows why the type got changed.

On Tue, Feb 2, 2010 at 6:38 PM, Alexey Serba ase...@gmail.com wrote: Hello, I encountered a blob indexing problem and found the convertType solution in the FAQ: http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5 I was wondering why it is not enabled by default and found the following comment in the mailing list: http://www.lucidimagination.com/search/document/169e6cc87dad5e67/dataimporthandler_and_blobs#169e6cc87dad5e67 "We used to attempt type conversion from the SQL type to the field's given type. We found that it was error prone and switched to using ResultSet#getObject for all columns (making the old behavior a configurable option, convertType, in JdbcDataSource)." Why is it error prone? Is it safe enough to enable convertType for all JDBC data sources by default? What are the side effects? Thanks in advance, Alex

-- - Noble Paul | Systems Architect | AOL | http://aol.com
wildcards in stopword list
Hi, I am wondering if there is some way to maintain a stopword list with wildcards, ignoring anything that starts with foo: foo*

I am doing some funky hackery inside DIH via javascript to make my autosuggest work. I basically split phrases and store them together with the full phrase. The phrase Foo Bar becomes: Foo Bar foo bar {foo}Foo_Bar {bar}Foo_Bar The phrase Foo-Bar becomes: Foo-Bar foo-bar {foo}Foo-Bar {bar}Foo-Bar

However, if bar is a stop word, I would like to simply ignore all tokens that start with {bar}. Obviously I could have this logic inside my DIH script, but then I would need to read in the stopwords.txt file in the script, which I would like to avoid; then again it would probably be the more efficient approach.

regards, Lukas Kahwe Smith m...@pooteeweet.org
Re: DataImportHandler - convertType attribute
On Wed, Feb 3, 2010 at 3:31 PM, Erik Hatcher erik.hatc...@gmail.com wrote: One thing I find awkward about convertType is that it is JdbcDataSource specific, rather than field-specific. Isn't the current implementation far too broad?

It is a feature of JdbcDataSource, and no other dataSource offers it. We offer it because JDBC drivers have a mechanism to do type conversion. What do you mean by "it is too broad"?

Erik On Feb 3, 2010, at 1:16 AM, Noble Paul നോബിള് नोब्ळ् wrote: Implicit conversion can cause problems when Transformers are applied. It is hard for the user to guess the type of the field by looking at the schema.xml. In Solr, String is the most commonly used type. If you wish to do numeric operations on a field, convertType will cause problems. If it is explicitly set, the user knows why the type got changed. On Tue, Feb 2, 2010 at 6:38 PM, Alexey Serba ase...@gmail.com wrote: Hello, I encountered a blob indexing problem and found the convertType solution in the FAQ: http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5 I was wondering why it is not enabled by default and found the following comment in the mailing list: http://www.lucidimagination.com/search/document/169e6cc87dad5e67/dataimporthandler_and_blobs#169e6cc87dad5e67 "We used to attempt type conversion from the SQL type to the field's given type. We found that it was error prone and switched to using ResultSet#getObject for all columns (making the old behavior a configurable option, convertType, in JdbcDataSource)." Why is it error prone? Is it safe enough to enable convertType for all JDBC data sources by default? What are the side effects? Thanks in advance, Alex

-- - Noble Paul | Systems Architect | AOL | http://aol.com
Lucene User Group Meetup in Amsterdam
Hi All, On 17th February we'll host the first Dutch Lucene User Group Meetup. This meet-up will be split into two parts: - The first part will be dedicated to the user group itself. We'll have an introduction to the members and have an open discussion about the goals of the user group and the expectations from it. - In the second part, Anne Veling (http://www.beyondtrees.com) will give a session about his latest experiences with large scale Solr deployments. Of course, you will not only get food for thought, but also food for your stomach: we'll have a pizza break between the parts and of course beer during and after.

Date: 17th February 2010
Time: 17:00
Location: Frederiksplein 1, 1017 XK Amsterdam, The Netherlands

For more information or questions, please visit: http://www.lucene-nl.org/first_meetup Hope to see you there! Cheers, Uri
Re: DataImportHandler - convertType attribute
On Feb 3, 2010, at 5:36 AM, Noble Paul നോബിള് नोब्ळ् wrote: On Wed, Feb 3, 2010 at 3:31 PM, Erik Hatcher erik.hatc...@gmail.com wrote: One thing I find awkward about convertType is that it is JdbcDataSource specific, rather than field-specific. Isn't the current implementation far too broad? It is a feature of JdbcDataSource, and no other dataSource offers it. We offer it because JDBC drivers have a mechanism to do type conversion. What do you mean by "it is too broad"?

I mean the convertType flag is not field-specific (or at least field-overridable). Conversions occur on a per-field basis, but the setting is for the entire data source and thus all fields.

Erik
RE: Basic indexing question
Thanks, that was it. I've now configured a dismax requestHandler that suits my needs.

-Original Message-
From: Joe Calderon [mailto:calderon@gmail.com]
Sent: 03 February 2010 00:20
To: solr-user@lucene.apache.org
Subject: Re: Basic indexing question

See http://wiki.apache.org/solr/SchemaXml#The_Default_Search_Field for details on the default field. Most people use the dismax handler when handling queries from users; see http://wiki.apache.org/solr/DisMaxRequestHandler for more details. If you don't have many fields you can write your own query using the lucene query parser as I mentioned before; the syntax can be found at http://lucene.apache.org/java/2_9_1/queryparsersyntax.html Hope this helps --joe

On Tue, Feb 2, 2010 at 3:59 PM, Stefan Maric sma...@ntlworld.com wrote: Thanks for the quick reply. I will have to see if the default query mechanism will suffice for most of my needs. I have skimmed through most of the Solr documentation and didn't see anything describing this. I can easily change my DB view so that I only source Solr with a single string plus my id field (as my application making the search will have to collate associated information into a presentable screen anyhow, so I'm not too worried about info being returned by Solr as such). Would that be a reasonable way of using Solr?

-Original Message-
From: Joe Calderon [mailto:calderon@gmail.com]
Sent: 02 February 2010 23:42
To: solr-user@lucene.apache.org
Subject: Re: Basic indexing question

By default solr will only search the default fields. You have to either query all fields (field1:(ore) or field2:(ore) or field3:(ore)) or use a different query parser like dismax.

On Tue, Feb 2, 2010 at 3:31 PM, Stefan Maric sma...@ntlworld.com wrote: I have got a basic configuration of Solr up and running and have loaded some data to experiment with. When I run a query for 'ore' I get 3 results when I'm expecting 4. Dataimport is pulling the expected number of rows in from my DB view. In my schema.xml I have:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="atomId" type="string" indexed="true" stored="true" required="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="description" type="text" indexed="true" stored="true"/>

and the defaults:

<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="name" dest="text"/>

From an SQL point of view, I am expecting a search for 'ore' to retrieve 4 results (which the following does):

select * from v_sm_search_sectors where description like '% ore%' or name like '% ore%';

121 B0.010.010 Mining and quarrying Mining of metal ore, stone, sand, clay, coal and other solid minerals
1000144 E0.030 Metal and metal ores wholesale (null)
1000145 E0.030.010 Metal and metal ores wholesale (null)
1000146 E0.030.020 Metal and metal ores wholesale agents (null)

From a Solr query for 'ore' I get the following response:

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
    <str name="rows">10</str>
    <str name="start">0</str>
    <str name="indent">on</str>
    <str name="q">ore</str>
    <str name="version">2.2</str>
  </lst>
</lst>
<result name="response" numFound="3" start="0">
  <doc>
    <str name="atomId">E0.030</str>
    <str name="id">1000144</str>
    <str name="name">Metal and metal ores wholesale</str>
  </doc>
  <doc>
    <str name="atomId">E0.030.010</str>
    <str name="id">1000145</str>
    <str name="name">Metal and metal ores wholesale</str>
  </doc>
  <doc>
    <str name="atomId">E0.030.020</str>
    <str name="id">1000146</str>
    <str name="name">Metal and metal ores wholesale agents</str>
  </doc>
</result>
</response>

So I don't retrieve the document where 'ore' is in the description field (and NOT the name field). It would seem that Solr is ONLY returning results based on what has been put into the "text" field by the <copyField source="name" dest="text"/>. Any hints as to what I've missed?

Regards
Stefan Maric

No virus found in this incoming message. Checked by AVG - www.avg.com Version: 8.5.435 / Virus Database: 271.1.1/2663 - Release Date: 02/02/10 07:35:00
Another basic question
I have got a basic configuration of Solr up and running and have loaded some data to experiment with. Dataimport is pulling the expected number of rows in from my DB view. If I query for Beekeeping I get one result returned (as expected). If I query for bee I get no results; similarly for Bee, etc. What areas of Solr configuration do I need to look into? Thanks Stefan Maric
Re: Another basic question
I have got a basic configuration of Solr up and running and have loaded some data to experiment with. Dataimport is pulling the expected number of rows in from my DB view. If I query for Beekeeping I get one result returned (as expected). If I query for bee I get no results; similarly for Bee, etc.

Do you want the query (bee) to return documents containing beekeeping? You can use the prefix query bee*, but I think DisMax does not support it. Alternatively you can use index time synonym expansion:

<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>

with index_synonyms.txt:

beekeeping, bee keeping, bee-keeping
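To see why expand=true helps here, a small Python sketch of index-time synonym expansion (illustrative only; the real SynonymFilterFactory also handles multi-word synonyms and token positions, which are simplified away here):

```python
# Each indexed token is expanded to its whole synonym group (expand=true),
# so a document containing "beekeeping" also gets the terms "bee" and
# "keeping" and will match a plain query for bee.
SYNONYMS = {  # simplified reading of: beekeeping, bee keeping, bee-keeping
    "beekeeping": ["beekeeping", "bee", "keeping", "bee-keeping"],
}

def expand(tokens):
    out = []
    for t in tokens:
        out.extend(SYNONYMS.get(t, [t]))  # unknown tokens pass through
    return out

print(expand(["beekeeping"]))  # ['beekeeping', 'bee', 'keeping', 'bee-keeping']
```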
query all filled field?
Hi all, Is it possible to query some field in order to get only not empty documents? All documents where field x is filled? Thanks, Frederico
Re: DataImportHandler - convertType attribute
On Wed, Feb 3, 2010 at 4:16 PM, Erik Hatcher erik.hatc...@gmail.com wrote: On Feb 3, 2010, at 5:36 AM, Noble Paul നോബിള് नोब्ळ् wrote: On Wed, Feb 3, 2010 at 3:31 PM, Erik Hatcher erik.hatc...@gmail.com wrote: One thing I find awkward about convertType is that it is JdbcDataSource specific, rather than field-specific. Isn't the current implementation far too broad? It is a feature of JdbcDataSource, and no other dataSource offers it. We offer it because JDBC drivers have a mechanism to do type conversion. What do you mean by "it is too broad"? I mean the convertType flag is not field-specific (or at least field-overridable). Conversions occur on a per-field basis, but the setting is for the entire data source and thus all fields.

Yes, that is true. First of all, this is not very widely used, so fine-tuning did not make sense.

Erik

-- - Noble Paul | Systems Architect | AOL | http://aol.com
How can I make my solr admin Password Protected
Hi, Can anyone help me: how can I make my solr admin password protected so that only authorized people can access it? -- Thank you, Vijayant Kumar Software Engineer Website Toolbox Inc. http://www.websitetoolbox.com 1-800-921-7803 x211
Re: How can I make my solr admin Password Protected
There's some basic info for Jetty and Resin here: http://wiki.apache.org/solr/SolrSecurity Keep in mind the various URLs that Solr exposes though; if you aren't protecting /solr completely you'll want to be aware that /update can add/update/delete anything, and so on.

Erik

On Feb 3, 2010, at 6:40 AM, Vijayant Kumar wrote: Hi, Can anyone help me: how can I make my solr admin password protected so that only authorized people can access it? -- Thank you, Vijayant Kumar Software Engineer Website Toolbox Inc. http://www.websitetoolbox.com 1-800-921-7803 x211
Re: Indexing an oracle warehouse table
What would be the right way to point out which field contains the term searched for? I would use highlighting for all of these fields and then post-process the Solr response in order to check the highlighting tags. But I don't have so many fields usually, and I don't know if it's possible to configure Solr to highlight fields using '*' as dynamic fields.

On Wed, Feb 3, 2010 at 2:43 AM, caman aboxfortheotherst...@gmail.com wrote: Thanks all. I am on track. Another question: What would be the right way to point out which field contains the term searched for? e.g. If I search for SOLR and the term exists in field788 for a document, how do I pinpoint which field has the term? I copied all the fields into a field called 'body', which makes searching easier, but it would be nice to show the field which has that exact term. thanks

caman wrote: Hello all, hope someone can point me in the right direction. I am trying to index an oracle warehouse table (TableA) with 850 columns. Of the structure, about 800 fields are CLOBs and are good candidates to enable full-text searching. I also have a few columns which have relational links to other tables. I am clear on how to create a root entity and then pull data from the other relational links as child entities. Most columns in TableA are named field1, field2 ... field800. Now my question is how to organize the schema efficiently:

First option: if my query is 'select * from TableA', do I define <field name="attr1" column="FIELD1"/> for each of those 800 columns? Seems cumbersome. Maybe I can write a script to generate the XML instead of handwriting both data-config.xml and schema.xml.

OR

Don't define any <field name="attr1" column="FIELD1"/>, so that the column in SOLR will be the same as in the database table. But the questions then are: 1) How do I define the unique field in this scenario? 2) How do I copy all the text fields to a common field for easy searching?

Any help is appreciated. Please feel free to suggest any alternative way.
Thanks -- View this message in context: http://old.nabble.com/Indexing-an-oracle-warehouse-table-tp27414263p27429352.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: wildcards in stopword list
I am wondering if there is some way to maintain a stopword list with wildcards: ignoring anything that starts with foo: foo*

A custom TokenFilterFactory derived from StopFilterFactory can remove a token if it matches a java.util.regex.Pattern. The list of patterns can be loaded from a file in a similar fashion to stopwords.txt.

i am doing some funky hackery inside DIH via javascript to make my autosuggest work. i basically split phrases and store them together with the full phrase: the phrase: Foo Bar becomes: Foo Bar foo bar {foo}Foo_Bar {bar}Foo_Bar

What is the benefit of storing {foo}Foo_Bar and {bar}Foo_Bar? Then how are you querying this to auto-suggest?
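A sketch of that idea in Python (the custom factory class itself is hypothetical; this just shows the token-dropping logic such a filter would apply after loading its patterns):

```python
import re

# Drop any token that fully matches one of the wildcard patterns, the way a
# custom StopFilterFactory variant could after loading patterns from a file.
PATTERNS = [re.compile(p) for p in (r"foo.*", r"\{bar\}.*")]

def filter_tokens(tokens):
    return [t for t in tokens
            if not any(p.fullmatch(t) for p in PATTERNS)]

print(filter_tokens(["foo", "foobar", "{bar}Foo_Bar", "baz"]))  # ['baz']
```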
RE: query all filled field?
Ok, if anyone needs it: I tried fieldX:[* TO *] and I think this is correct. In my case I found out that I was not indexing this field correctly, because they are all empty. :)

-Original Message-
From: Frederico Azeiteiro [mailto:frederico.azeite...@cision.com]
Sent: Wednesday, 3 February 2010 11:34
To: solr-user@lucene.apache.org
Subject: query all filled field?

Hi all, Is it possible to query some field in order to get only non-empty documents? All documents where field x is filled? Thanks, Frederico
Re: wildcards in stopword list
On 03.02.2010, at 13:07, Ahmet Arslan wrote: i am doing some funky hackery inside DIH via javascript to make my autosuggest work. i basically split phrases and store them together with the full phrase: the phrase: Foo Bar becomes: Foo Bar foo bar {foo}Foo_Bar {bar}Foo_Bar What is the benefit of storing {foo}Foo_Bar and {bar}Foo_Bar? Then how are you querying this to auto-suggest? this way i can do a prefix facet search for the term foo or bar and in both cases i can show the user Foo Bar with a bit of frontend logic to split off the payload aka original data. regards, Lukas Kahwe Smith m...@pooteeweet.org
Re: query all filled field?
Is it possible to query some field in order to get only non-empty documents? All documents where field x is filled? Yes. q=x:[* TO *] will bring back documents that have a non-empty x field.
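A tiny model of what x:[* TO *] selects (illustrative Python, not Solr internals): documents with at least one indexed value in field x. One assumption worth checking, relevant to string fields, is that an empty-string value may still be indexed as a term, in which case such documents also match.

```python
# Match documents that have at least one indexed term in field x.
# The "indexed term" test here is simplified: empty or missing means none.
docs = [
    {"id": 1, "x": "hello"},  # has a value: matches
    {"id": 2, "x": ""},       # a text field would index nothing for ""
    {"id": 3},                # field absent: no match
]

def matches_open_range(doc, field):
    return bool(doc.get(field))

print([d["id"] for d in docs if matches_open_range(d, "x")])  # [1]
```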
Re: wildcards in stopword list
this way i can do a prefix facet search for the term foo or bar and in both cases i can show the user Foo Bar with a bit of frontend logic to split off the payload aka original data. So you have a list of phrases (pre-extracted) to be used for auto-suggest? Or you are using bi-gram shingles?
Re: wildcards in stopword list
On 03.02.2010, at 13:41, Ahmet Arslan wrote: this way i can do a prefix facet search for the term foo or bar and in both cases i can show the user Foo Bar with a bit of frontend logic to split off the payload aka original data. So you have a list of phrases (pre-extracted) to be used for auto-suggest? Or you are using bi-gram shingles? For the actual search I am using bi-gram shingles for phrase boosting. However for autosuggest this is not practical. The issue is that I have multiple fields of data (names, address etc) that should all be relevant for the auto suggest. Furthermore a phrase entered can either match on one field or any combination of fields. Phrase in this context means separated by spaces or dash. For this I found the above approach the only feasible solution. regards, Lukas Kahwe Smith m...@pooteeweet.org
RE: query all filled field?
Hum, strange.. I reindexed some docs with the field corrected. Now I'm sure the field is filled, because fieldX:(*a*) returns docs. But fieldX:[* TO *] is returning the same as *:* (all results). I tried with -fieldX:[* TO *] and I get no results at all. I wonder if someone has tried this before with success? The field is indexed as string, indexed=true and stored=true. Thanks, Frederico

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Wednesday, 3 February 2010 11:48
To: solr-user@lucene.apache.org
Subject: Re: query all filled field?

Is it possible to query some field in order to get only non-empty documents? All documents where field x is filled? Yes. q=x:[* TO *] will bring back documents that have a non-empty x field.
Re: wildcards in stopword list
Actually I plan to write a bigger blog post about the approach. In order to match the different fields I actually have a separate core with an index dedicated to auto suggest alone, where I merge all fields together via some javascript code. This way I can then use terms for a single word entered, and a facet prefix search with the last term as the prefix and the rest as the query for multi term entries into the auto suggest box. The idea is that I can then enter any part of any of the fields, but I will then be suggested the entire phrase in that field: So if I have a field: Foo Bar Ding Dong and I enter ding into the search box, I would get a suggestion of Foo Bar Ding Dong

If I am not wrong you have a list of suggestion candidates indexed in a separate core dedicated to auto suggest alone. I think you can use this field type for suggestion:

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With this field type, the query ding or din or di would return Foo Bar Ding Dong. Also you do not need to index all combinations like:

{foo}Foo Bar Ding Dong
{bar}Foo Bar Ding Dong
{ding}Foo Bar Ding Dong
{dong}Foo Bar Ding Dong
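What the index-side EdgeNGramFilterFactory produces can be sketched in Python (a simulation, not Lucene's implementation): every lowercased token is expanded into its prefixes, so the query-side token din, which is not n-grammed, matches one of the indexed grams of Ding.

```python
# Simulate the index analyzer above: whitespace tokenize, lowercase, then
# emit edge n-grams (all prefixes between minGramSize and maxGramSize).
def index_analyze(text, min_gram=1, max_gram=20):
    grams = []
    for token in text.lower().split():
        for n in range(min_gram, min(max_gram, len(token)) + 1):
            grams.append(token[:n])
    return grams

terms = index_analyze("Foo Bar Ding Dong")
print("din" in terms, "di" in terms)  # True True
```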
Re: wildcards in stopword list
On 03.02.2010, at 14:34, Ahmet Arslan wrote: Actually I plan to write a bigger blog post about the approach. In order to match the different fields I actually have a separate core with an index dedicated to auto suggest alone, where I merge all fields together via some javascript code. This way I can then use terms for a single word entered, and a facet prefix search with the last term as the prefix and the rest as the query for multi term entries into the auto suggest box. The idea is that I can then enter any part of any of the fields, but I will then be suggested the entire phrase in that field: So if I have a field: Foo Bar Ding Dong and I enter ding into the search box, I would get a suggestion of Foo Bar Ding Dong If I am not wrong you have a list of suggestion candidates indexed in a separate core dedicated to auto suggest alone. I think you can use this field type for suggestion.

First up: I very much appreciate your input!

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With this field type, the query ding or din or di would return Foo Bar Ding Dong.

hmm wouldn't it return foo bar ding dong? Obviously I have to decide how important it is for me to get the original mixed-case string for auto suggest, but it does matter a bit more over here in Europe than in the US, for example. If I were to index both the original mixed case and the lower-case version and remove the solr.LowerCaseFilterFactory in both analyzer sections, then it should work, as long as terms that contain upper-case letters usually start with an upper-case letter. Let me try this out ..

regards, Lukas Kahwe Smith m...@pooteeweet.org
Re: ContentStreamUpdateRequest addFile fails to close Stream
Hey Christoph, Could you give the patch at https://issues.apache.org/jira/browse/SOLR-1744 a try and let me know how it works out for you?

-- - Mark http://www.lucidimagination.com

Mark Miller wrote: Christoph Brill wrote: I tried to fix it in CommonsHttpSolrServer but I wasn't sure how to do it. I tried to close the stream after the method got executed, but somehow getContent() always returned null (see attached patch against solr 1.4 for my non-working attempt). Who's responsible for closing a stream? CommonsHttpSolrServer? The caller? FileStream? I'm unsure because I don't know solrj in depth. Regards, Chris

On 02.02.2010 14:37, Mark Miller wrote: Broken by design? How about we just fix BinaryUpdateRequestHandler (and possibly CommonsHttpSolrServer) to close the stream it gets? That class is a little messy to follow, but I'd try just assigning the stream to a local stream that's available through the whole method, and then at the very bottom finally block, if stream != null, close it. I think we also want to close it if the exception that causes a retry happens:

catch( NoHttpResponseException r ) {
    // This is generally safe to retry on
    method.releaseConnection();
    method = null;
    // If out of tries then just rethrow (as normal error).
    if( ( tries < 1 ) ) {
        throw r;
    }
    //log.warn( "Caught: " + r + ". Retrying..." );
}
Re: wildcards in stopword list
With this field type, the query ding or din or di would return Foo Bar Ding Dong. hmm wouldn't it return foo bar ding dong? No, it will return the original string. In this method you are not using faceting anymore. You are just querying and requesting a field: q=suggest_field:di&fl=suggest_field
Re: wildcards in stopword list
On 03.02.2010, at 15:19, Ahmet Arslan wrote: With this field type, the query ding or din or di would return Foo Bar Ding Dong. hmm wouldn't it return foo bar ding dong? No, it will return the original string. In this method you are not using faceting anymore. You are just querying and requesting a field: q=suggest_field:di&fl=suggest_field

Yeah, I just realized that while I was trying it out. :-) Still testing ..

regards, Lukas Kahwe Smith m...@pooteeweet.org
Re: how to stress test solr
I like to use JMeter with a large queries file. This way you can measure response times with lots of requests at the same time. Having JConsole open at the same time, you can check the memory status. James liu-2 wrote: Before the stress test, should I close SolrCache? Which tool do you use? How to do a stress test correctly? Any pointers? -- regards j.L ( I live in Shanghai, China) -- View this message in context: http://old.nabble.com/how-to-stress-test-solr-tp27433733p27437524.html Sent from the Solr - User mailing list archive at Nabble.com.
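JMeter is the easy route; as a quick self-contained alternative, here is a minimal Python sketch of the same idea - fire many queries from a file concurrently and record per-request latency. The Solr URL, the queries file, and the worker count are illustrative assumptions to adapt, not part of the thread.

```python
# Minimal concurrent load-test sketch. The base URL and worker count
# below are illustrative assumptions, not a fixed recipe.
import concurrent.futures
import time
import urllib.parse
import urllib.request

def build_urls(base_url, queries):
    """Turn raw query strings into full Solr select URLs."""
    return [base_url + urllib.parse.urlencode({"q": q}) for q in queries]

def timed_get(url):
    """Fetch one URL and return its latency in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    return time.perf_counter() - start

def run_load(base_url, queries, workers=10):
    """Fire all queries concurrently; return the list of latencies."""
    urls = build_urls(base_url, queries)
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed_get, urls))
```

Something like `run_load("http://localhost:8983/solr/select?", open("queries.txt").read().splitlines())` then gives you a latency list to summarize, while JConsole watches memory as described above.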
Any idea what could be wrong with this fq value?
Following is my Solr URL: http://hostname:port/solr/entities/select/?version=2.2&start=0&indent=on&qt=dismax&rows=60&fq=statusName:(Open OR Cancelled)&debugQuery=true&q=dev&fq=groupName:Infrastructure "groupName" is one of the attributes I create fq (filterQuery) on. This field (groupName) is being indexed and stored. When I search for anything other than "Infrastructure" in the fq on groupName, Solr brings me back correct results. When I pass in "Infrastructure" in fq=groupName:Infrastructure it never brings anything back. If I remove "fq" completely it will bring back all results, including records with groupName:Infrastructure. Something is wrong only with this "Infrastructure" value in the fq. Any idea what could be going wrong? Clearly this is only related to the value Infrastructure in the filter query. Thanks, -- View this message in context: http://old.nabble.com/Any-idea-what-could-be-wrong-with-this-fq-value--tp27437723p27437723.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Any idea what could be wrong with this fq value?
is groupName a string field? If not, it probably should be. My hunch is that you're analyzing that field and it is lowercased in the index, and maybe even stemmed. Try q=*:*&facet=on&facet.field=groupName to see all the *indexed* values of the groupName field. Erik On Feb 3, 2010, at 10:05 AM, javaxmlsoapdev wrote: Following is my Solr URL: http://hostname:port/solr/entities/select/?version=2.2&start=0&indent=on&qt=dismax&rows=60&fq=statusName:(Open OR Cancelled)&debugQuery=true&q=dev&fq=groupName:Infrastructure "groupName" is one of the attributes I create fq (filterQuery) on. This field (groupName) is being indexed and stored. When I search for anything other than "Infrastructure" in the fq on groupName, Solr brings me back correct results. When I pass in "Infrastructure" in fq=groupName:Infrastructure it never brings anything back. If I remove "fq" completely it will bring back all results, including records with groupName:Infrastructure. Something is wrong only with this "Infrastructure" value in the fq. Any idea what could be going wrong? Clearly this is only related to the value Infrastructure in the filter query. Thanks, -- View this message in context: http://old.nabble.com/Any-idea-what-could-be-wrong-with-this-fq-value--tp27437723p27437723.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: wildcards in stopword list
On 03.02.2010, at 14:34, Ahmet Arslan wrote: Actually I plan to write a bigger blog post about the approach. In order to match the different fields, I actually have a separate core with an index dedicated to auto suggest alone, where I merge all fields together via some javascript code. This way I can then use terms for a single word entered, and a facet prefix search with the last term as the prefix and the rest as the query for multi-term entries into the auto suggest box. The idea is that I can then enter any part of any of the fields, but I will then be suggested the entire phrase in that field. So if I have a field "Foo Bar Ding Dong" and I enter "ding" into the search box, I would get a suggestion of "Foo Bar Ding Dong". If I am not wrong, you have a list of suggestion candidates indexed in a separate core dedicated to auto suggest alone. I think you can use this field type for suggestion:

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

hmm .. not sure yet if I like this approach better. It seems I cannot use dismax here, at least it's not finding matches, which means I need to parse the query to prevent people from doing crazy stuff. Also, in the old approach I was suggesting single words first and then slowly helping people towards a full phrase. With this approach I immediately end up with full phrases, which severely limits the usefulness. At the same time, I am not sure if the index will really be significantly smaller with this approach than with my hack. And since there can also be matches inside words that have no real meaning, I am also not sure if this really gets me better quality on this level either.
will play around with this some more tough. regards, Lukas Kahwe Smith m...@pooteeweet.org
Re: Any idea what could be wrong with this fq value?
thanks Erik for the pointer. I had this field as "text" and after changing it to "string" it started working as expected. I am still not sure why this particular value (Infrastructure) was failing to bring back results. Other values like Network, Information etc. worked fine when the field was of type "text" as well. I tried (when groupName was of type "text") q=*:*&facet=on&facet.field=groupName and it brought back Infrastructure correctly. Can you explain how Solr internally indexed this attribute differently, and why changing from "text" to "string" made it work? Thanks, Erik Hatcher-4 wrote: is groupName a string field? If not, it probably should be. My hunch is that you're analyzing that field and it is lowercased in the index, and maybe even stemmed. Try q=*:*&facet=on&facet.field=groupName to see all the *indexed* values of the groupName field. Erik On Feb 3, 2010, at 10:05 AM, javaxmlsoapdev wrote: Following is my Solr URL: http://hostname:port/solr/entities/select/?version=2.2&start=0&indent=on&qt=dismax&rows=60&fq=statusName:(Open OR Cancelled)&debugQuery=true&q=dev&fq=groupName:Infrastructure "groupName" is one of the attributes I create fq (filterQuery) on. This field (groupName) is being indexed and stored. When I search for anything other than "Infrastructure" in the fq on groupName, Solr brings me back correct results. When I pass in "Infrastructure" in fq=groupName:Infrastructure it never brings anything back. If I remove "fq" completely it will bring back all results, including records with groupName:Infrastructure. Something is wrong only with this "Infrastructure" value in the fq. Any idea what could be going wrong? Clearly this is only related to the value Infrastructure in the filter query. Thanks, -- View this message in context: http://old.nabble.com/Any-idea-what-could-be-wrong-with-this-fq-value--tp27437723p27437723.html Sent from the Solr - User mailing list archive at Nabble.com.
-- View this message in context: http://old.nabble.com/Any-idea-what-could-be-wrong-with-this-fq-value--tp27437723p27439279.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing an oracle warehouse table
Thanks. I will give this a shot. Alexey-34 wrote: What would be the right way to point out which field contains the term searched for? I would use highlighting for all of these fields and then post-process the Solr response in order to check the highlighting tags. But I usually don't have that many fields, and I don't know if it's possible to configure Solr to highlight fields using '*' as dynamic fields. On Wed, Feb 3, 2010 at 2:43 AM, caman aboxfortheotherst...@gmail.com wrote: Thanks all. I am on track. Another question: What would be the right way to point out which field contains the term searched for? e.g. if I search for SOLR and the term exists in field788 for a document, how do I pinpoint which field has the term? I copied all the fields into a field called 'body' which makes searching easier, but it would be nice to show the field which has that exact term. thanks caman wrote: Hello all, hope someone can point me in the right direction. I am trying to index an Oracle warehouse table (TableA) with 850 columns. About 800 of the columns are CLOBs and are good candidates for full-text searching. I also have a few columns with relational links to other tables. I am clear on how to create a root entity and then pull data from the relational links as child entities. Most columns in TableA are named field1, field2, ... field800. Now my question is how to organize the schema efficiently. First option: if my query is 'select * from TableA', do I define <field name="attr1" column="FIELD1" /> for each of those 800 columns? Seems cumbersome. Maybe I can write a script to generate the XML instead of handwriting both data-config.xml and schema.xml. OR: don't define any <field name="attr1" column="FIELD1" />, so that the column name in Solr will be the same as in the database table. But then the questions are: 1) How do I define a unique field in this scenario? 2) How do I copy all the text fields to a common field for easy searching? Any help is appreciated. Please feel free to suggest any alternative way.
Thanks -- View this message in context: http://old.nabble.com/Indexing-an-oracle-warehouse-table-tp27414263p27429352.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/Indexing-an-oracle-warehouse-table-tp27414263p27439611.html Sent from the Solr - User mailing list archive at Nabble.com.
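For the second option above, a sketch of what the schema side could look like - the names `field*`, `all_text`, and the `text` type here are illustrative assumptions, not taken from the thread:

```xml
<!-- schema.xml sketch: one dynamicField covers field1..field800 instead of
     800 explicit declarations; all names here are illustrative. -->
<field name="id" type="string" indexed="true" stored="true" required="true" />
<dynamicField name="field*" type="text" indexed="true" stored="true" />

<!-- catch-all field for easy searching (question 2), filled via copyField -->
<field name="all_text" type="text" indexed="true" stored="false" multiValued="true" />
<copyField source="field*" dest="all_text" />

<!-- question 1: the unique key is just a normal declared field -->
<uniqueKey>id</uniqueKey>
```

One caveat: with no explicit `<field column=...>` mappings, DIH maps by column name, and depending on how the JDBC driver reports column case (Oracle tends to return FIELD1 rather than field1), a transformer or explicit mapping may still be needed for the dynamicField pattern to match.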
SOLR Performance Tuning: Fuzzy Search
I was lucky to contribute an excellent solution: http://issues.apache.org/jira/browse/LUCENE-2230 Even the 2nd edition of Lucene in Action advocates using fuzzy search only in exceptional cases. Another solution would be 2-step indexing (it may work for many use cases), but it is not a spellchecker:
1. Create a regular index
2. Create a dictionary of terms
3. For each term, find the nearest terms (for instance, stick with distance=2)
4. Use copyField in Solr, or something similar to a synonym dictionary; or, for instance, generate a specific query parser...
5. Of course, a custom request handler, etc.
It may work well (but only if the query contains a term from the dictionary; it can't work as a spellchecker). Combining the two algorithms can boost performance extremely... Fuad Efendi +1 416-993-2060 http://www.linkedin.com/in/liferay Tokenizer Inc. http://www.tokenizer.ca/ Data Mining, Vertical Search
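Step 3 above is plain edit-distance neighbor finding. As a toy illustration (pure Python, not the LUCENE-2230 code; a real implementation would walk Lucene's term dictionary with a faster automaton), building the distance<=2 neighbor map could look like this:

```python
# Toy sketch of step 3: map each term to its near neighbors.
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # min of deletion, insertion, substitution/match
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def near_terms(term, dictionary, max_dist=2):
    """All dictionary terms within max_dist edits of `term` (excluding itself)."""
    return sorted(t for t in dictionary
                  if t != term and edit_distance(term, t) <= max_dist)
```

The resulting map is what you would then feed into a synonym-style expansion (step 4), so the expensive distance computation happens once at index-build time instead of per query.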
Re: distributed search and failed core
My only suggestion is to put haproxy in front of two replicas and then have haproxy do the failover. If a shard fails, the whole search will fail unless you do something like this. On Fri, Jan 29, 2010 at 3:31 PM, Joe Calderon calderon@gmail.comwrote: hello *, in distributed search when a shard goes down, an error is returned and the search fails, is there a way to avoid the error and return the results from the shards that are still up? thx much --joe -- Regards, Ian Connor
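For reference, a minimal sketch of such an haproxy front end - the hostnames, ports, and ping handler path are illustrative assumptions to adapt:

```
# haproxy.cfg sketch; hostnames/ports/paths are illustrative
frontend solr_front
    bind *:8983
    default_backend solr_replicas

backend solr_replicas
    option httpchk GET /solr/admin/ping
    server replica1 solr1:8983 check
    server replica2 solr2:8983 check backup
```

With `check` enabled, haproxy stops routing to a replica whose health check fails, so a dead shard replica no longer takes the whole distributed search down.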
Re: Solr response extremely slow
Here you go - Solr Specification Version: 1.3.0 Solr Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12 11:06:47 Lucene Specification Version: 2.4-dev Lucene Implementation Version: 2.4-dev 691741 - 2008-09-03 15:25:16 -- View this message in context: http://old.nabble.com/Solr-response-extremely-slow-tp27432229p27441205.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: distributed search and failed core
On Fri, Jan 29, 2010 at 3:31 PM, Joe Calderon calderon@gmail.com wrote: hello *, in distributed search when a shard goes down, an error is returned and the search fails, is there a way to avoid the error and return the results from the shards that are still up? The SolrCloud branch has load-balancing capabilities for distributed search amongst shard replicas. http://wiki.apache.org/solr/SolrCloud -Yonik http://www.lucidimagination.com
Re: distributed search and failed core
thx guys, I ended up using a mix of code from the SOLR-1143 and SOLR-1537 patches. Now, whenever there is an exception, there is a section in the results indicating the result is partial and also listing the failed core(s). We've added some monitoring to check for that output as well, to alert us when a shard has failed. On Wed, Feb 3, 2010 at 10:55 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Fri, Jan 29, 2010 at 3:31 PM, Joe Calderon calderon@gmail.com wrote: hello *, in distributed search when a shard goes down, an error is returned and the search fails, is there a way to avoid the error and return the results from the shards that are still up? The SolrCloud branch has load-balancing capabilities for distributed search amongst shard replicas. http://wiki.apache.org/solr/SolrCloud -Yonik http://www.lucidimagination.com
autosuggest via solr.EdgeNGramFilterFactory (was: Re: wildcards in stopword list)
Hi Ahmet, Well after some more testing I am now convinced that you rock :) I like the solution because it's obviously way less hacky and, more importantly, I expect it to be a lot faster and less memory intensive, since instead of a facet prefix or terms search I am doing an equality comparison on tokens (albeit a fair number of them, but each much smaller). I can also have more control over the ordering of the results. I can also make full use of the stopword filter, which again should improve the sort order (e.g. if I have a stopword "ag" and a word starts with "ag", it will not be overpowered by tons of strings containing "ag" as a single word). Obviously there is one limitation if people enter search terms longer than 20 characters, but I think I can safely ignore this case. Even with long German words, 15 letters should be enough to find what the user is looking for, and if a word needs more characters, then it's probably a meaningless postfix like "versicherungsgesellschaft", which just means insurance agency, and the user is just being stupid. I do lose the nice numbers telling the user how often a given term matched, which has some merit for street/city names, less so for the names of people, and close to none for company names. There is also a minor niggle with how the data is returned, which I discuss at the end of the email.
I am using the following in my schema.xml:

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
  </analyzer>
</fieldType>

<field name="name" type="prefix_token" indexed="true" stored="true" />
<field name="firstname" type="prefix_token" indexed="true" stored="true" />
<field name="email" type="prefix_token" indexed="true" stored="true" />
<field name="city" type="prefix_token" indexed="true" stored="true" />
<field name="street" type="prefix_token" indexed="true" stored="true" />
<field name="telefon" type="prefix_token" indexed="true" stored="true" />
<field name="id" type="string" indexed="true" stored="true" required="true" />

and finally the following in my solrconfig.xml:

<requestHandler name="auto" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="qf">name firstname email^0.5 telefon^0.5 city^0.6 street^0.6</str>
    <str name="fl">name,firstname,telefon,email,city,street</str>
  </lst>
</requestHandler>

This all works well. There is just one minor ugliness, which might still be solvable inside Solr, but I fixed it in the PHP frontend logic.
The issue is that I obviously get all the fields for each document returned, and I need to figure out for which of them I actually had a match to be presented in the autosuggest. Is there some Solr magic that will do this work for me?

$query = new SolrQuery($searchstring);
$response = $this->solrClientAuto->query($query);
$numFound = empty($response->response->numFound) ? 0 : $response->response->numFound;
$data = array('results' => array(), 'numFound' => $numFound);
if (!empty($response->response->docs)) {
    $p = str_replace('', '', substr($searchstring, strpos($searchstring, ' ')));
    foreach ($response->response->docs as $doc) {
        foreach ((array)$doc as $value) {
            if (stripos($value, $p) === 0 || stripos($value, ' '.$p)) {
                $data['results'][$value] = 1;
            }
        }
    }
}

Then again, I have to review with the UI guys whether we will always just show the name anyway and replace the entire user-entered term with the name, which should be sufficiently unique in most cases to get a small enough result set. regards, Lukas
The Riddle of the Underscore and the Dollar Sign
I am perplexed by the behavior I am seeing of the Solr Analyzer and Filters with regard to underscores. 1) I am trying to get rid of them when shingling, but seem unable to do so with a Stopwords Filter - and yet they are being removed, when I am not even trying to, by the WordDelimiter Filter. 2) Conversely, I would like to retain '$' symbols when they are adjacent to numbers, but seem unable to without having to accept all forms of other syntax. My simple example configuration, test data, and results are below. Most grateful for any guidance, Christopher

Test Data:
<doc>
  <field name="id">StopWordTestData</field>
  <field name="conSubSec-text_dc">PreShingled ThisIsNotAStopWord ThisIsAStopWord ThisIsAlsoAStopWord beforeaperiod. beforeacomma, beforeacollan: under_Score don't Peter's s $1.00 $1 $1,000 $200 $3,000,000 $3m - # -#- --#-- Yes X No _ __ ___ a and also about</field>
</doc>

Field 1 - Delimited_text:
Index Analyzer: org.apache.solr.analysis.TokenizerChain
Tokenizer: org.apache.solr.analysis.WhitespaceTokenizerFactory
Filters:
1. org.apache.solr.analysis.WordDelimiterFilterFactory args: {splitOnCaseChange: 1, generateNumberParts: 0, catenateWords: 1, generateWordParts: 0, catenateAll: 1, catenateNumbers: 1}
2. org.apache.solr.analysis.LowerCaseFilterFactory args: {}

Field 1 - Resulting Index Terms (term, count):
100 (2), 1000 (2), 200 (2), 3 (2), 300 (2), 3m (2), a (2), about (2), also (2), and (2), beforeacollan (2), beforeacomma (2), beforeaperiod (2), dont (2), m (2), no (2), peter (2), preshingled (2), s (2), thisisalsoastopword (2), thisisastopword (2), thisisnotastopword (2), underscore (2), x (2), yes (2), 1 (2)

Field 2 - Shingled_Text:
Index Analyzer: org.apache.solr.analysis.TokenizerChain
Tokenizer: org.apache.solr.analysis.WhitespaceTokenizerFactory
Filters:
1. org.apache.solr.analysis.WordDelimiterFilterFactory args: {splitOnCaseChange: 1, generateNumberParts: 0, catenateWords: 1, stemEnglishPossessive: 0, generateWordParts: 0, catenateAll: 0, catenateNumbers: 1}
2. org.apache.solr.analysis.StopFilterFactory args: {words: StopWords-PreShingled.txt, ignoreCase: true, enablePositionIncrements: true}
3. org.apache.solr.analysis.LowerCaseFilterFactory args: {}
4. org.apache.solr.analysis.ShingleFilterFactory args: {outputUnigrams: false, maxShingleSize: 5}

File StopWords-PreShingled.txt:
s
_
PreShingled
__
ThisIsAStopWord
ThisIsAlsoAStopWord

Field 2 - Resulting Index Terms (Sample): Term # _ 100 1 _ 100 1 1000 1 _ _ 1 _ _ beforeaperiod beforeacomma 1 _ beforeaperiod 1 _ beforeaperiod beforeacomma beforeacollan 1 _ thisisnotastopword 1 _ thisisnotastopword _ _ 1
Re: Search wihthout diacritics
On Feb 2, 2010, at 8:53 PM, Olala wrote: Hi all! I have a problem with Solr, and I hope somebody here can help me :) I want to search text without diacritics, but Solr responds with both diacritic text and text without diacritics. For example, when I query "solr index", it responds with "solr index", "sôlr index", "sòlr index", "sólr indèx", ... I tried ASCIIFoldingFilter and ISOLatin1AccentFilterFactory but it is not correct :( What's not correct? Can you provide more detail? Is it not indexed correctly? You might look at the Analysis tool under the Solr admin area to see how it is processing your content during indexing and searching. My schema config:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

You probably should strip diacritics during query time, too. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
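To make that last suggestion concrete: the query analyzer in the schema above has no folding filter, while the index analyzer does, so the two sides analyze accented text inconsistently. A sketch of a query analyzer that mirrors the index side (this is an illustration of the suggestion, not a config taken from the thread) would be:

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
  <!-- added: fold diacritics at query time, matching the index analyzer -->
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
</analyzer>
```

Note that with folding on both sides, accented and unaccented forms become the same indexed term, so they will match each other; if the goal is instead to keep accented documents *out* of unaccented searches, folding should be removed from both analyzers rather than added.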
Re: Guidance on Solr errors
Inline below. On Feb 2, 2010, at 8:40 PM, Vauthrin, Laurent wrote: Hello, I'm trying to troubleshoot a problem that occurred on a few Solr slave Tomcat instances and wanted to run it by the list to see if I'm on the right track. The setup involves 1 master replicating to three slaves (I don't know what the replication interval is at this time). These instances have been running fine for a while (from what I understand) but ran into problems just today during peak site usage. The following two exceptions were observed (partially stripped stack traces): WARNING: [] Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later. Feb 1, 2010 10:00:31 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later. at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:941) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2. java:368) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpd ateProcessorFactory.java:77) Feb 1, 2010 10:29:36 AM org.apache.solr.common.SolrException log SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:734) at org.apache.lucene.index.MultiSegmentReader$MultiTermDocs.termDocs(MultiS egmentReader.java:612) at org.apache.lucene.index.MultiSegmentReader$MultiTermDocs.termDocs(MultiS egmentReader.java:605) at org.apache.lucene.index.MultiSegmentReader$MultiTermDocs.read(MultiSegme ntReader.java:570) at org.apache.lucene.search.TermScorer.next(TermScorer.java:106) at org.apache.lucene.search.DisjunctionSumScorer.initScorerDocQueue(Disjunc tionSumScorer.java:105) at org.apache.lucene.search.DisjunctionSumScorer.next(DisjunctionSumScorer. 
java:144) at org.apache.lucene.search.BooleanScorer2.next(BooleanScorer2.java:352) at org.apache.lucene.search.DisjunctionSumScorer.initScorerDocQueue(Disjunc tionSumScorer.java:105) at org.apache.lucene.search.DisjunctionSumScorer.next(DisjunctionSumScorer. java:144) at org.apache.lucene.search.BooleanScorer2.next(BooleanScorer2.java:352) at org.apache.lucene.search.ConjunctionScorer.init(ConjunctionScorer.java:8 0) at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:4 8) at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:319) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:137) at org.apache.lucene.search.Searcher.search(Searcher.java:126) at org.apache.lucene.search.Searcher.search(Searcher.java:105) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher. java:920) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.j ava:838) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:2 69) Here's the config for the caches: filterCache: size=15000 initialSize=5000 autowarmCount=5000 queryResultCache: size=15000 initialSize=5000 autowarmCount=15000 documentCache: size=15000 initialSize=5000 From what I understand, the first exception indicates that multiple replications are being processed at the same time. Is that correct or could it be something else? You are probably committing/replicating faster than Solr can open up the new index and warm the new searcher. Does the second exception indicate that Solr is having problems handling the query load (possibly due to a commit happening at the same time)? This is likely caused by the first problem b/c you are running out of memory Does anyone have any insight that might help here? I sort of suspect that the autowarm counts are too large but I may be off there. I can provide more details (as I get them) about this if needed. You probably should start smaller, yes. 
Bigger is not always better when it comes to caches, especially when GC is factored in. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
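To make "start smaller" concrete, a solrconfig.xml sketch with far more conservative sizes than the 15000/5000 values above - the numbers here are illustrative starting points, not tuned recommendations for this particular index:

```xml
<!-- Sketch only: sizes are illustrative; grow them based on hit-rate stats
     from the admin pages rather than starting large. -->
<filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
<documentCache    class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
```

Large autowarm counts in particular stretch out the warming of each new searcher, which feeds directly into the maxWarmingSearchers error seen above when commits arrive faster than warming completes.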
Slow QueryComponent.process() when queries have numbers in them
According to my logs, org.apache.solr.handler.component.QueryComponent.process() takes a significant amount of time (5s, but I've seen up to 15s) when a query has an odd pattern of numbers in it, e.g.:

neodymium megagauss-oersteds (MGOe) (1 MG·Oe = 7,958·10³ T·A/m = 7,958 kJ/m³
myers 8e psychology chapter 9
JOHN PIPER 1 TIMOTEO 3:1?
lab 2.6.2: using wireshark to view protocol data units
malha de aço 3x3 6mm - peso m2

or even something that looks like it could be a query: An experiment has two outcomes, A and A. If A is three time as likely to occur as , what is P(A)?

other params were:
fl: *,score
fq: +num_pages:[2 TO *] AND +language:1
hl: true
hl.fl: content title description
hl.simple.post: </strong>
hl.simple.pre: <strong>
hl.snippets: 2
qf: title^1.5 content^0.8
qs: 2
qt: dismax
rows: 10
sort: score desc
start: 0
wt: json

is this just something I'm going to have to put up with? Or is there something I can do to mitigate it? If it's a bug, any suggestions on how to start patching it?
need help with feb 3/2010 trunk and solr-236
I got latest trunk (feb3/2010) using svn and applied solr-236. did an ant clean and it seems to build fine with no errors or warnings. however, when I start solr I get an error (here is a snippet): SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.so lr.handler.component.CollapseComponent' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader. java:373) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:422) at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:444) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1499) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1493) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1526) at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:824) ... I can see CollapseComponent.class in apache-solr-1.5-dev.war inside apache-solr-core-1.5-dev.jar\org\apache\solr\handler\component, so it seems to be finding and building the java file ok. any ideas? -- View this message in context: http://old.nabble.com/need-help-with-feb-3-2010-trunk-and-solr-236-tp27446001p27446001.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need help with feb 3/2010 trunk and solr-236
gdeconto wrote: I got latest trunk (feb 3/2010) using svn and applied SOLR-236. I tried the latest patch with Solr trunk yesterday, no problems there. did an ant clean and it seems to build fine with no errors or warnings. Did you run "ant example"? I think "ant clean" will delete everything... Koji -- http://www.rondhuit.com/en/
Using solr to store data
Hi all, I work on search at Scoopler.com, a real-time search engine which uses Solr. We currently use Solr for indexing but then fetch data from our CouchDB cluster using the IDs Solr returns. We are now considering storing a larger portion of the data in Solr's index itself, so we don't have to hit the DB too. Assuming that we are still storing data in the DB (for backend and backup purposes), are there any significant disadvantages to using Solr as a data store too? We currently run a master-slave setup on EC2, using x-large slave instances to allow the disk cache to use as much memory as possible. I imagine we would definitely have to add more slave instances to accommodate the extra data we're storing (and make sure it stays in memory). Any tips would be really helpful. -- AJ Asver Co-founder, Scoopler.com +44 (0) 7834 609830 / +1 (415) 670 9152 a...@scoopler.com Follow me on Twitter: http://www.twitter.com/_aj Add me on Linkedin: http://www.linkedin.com/in/ajasver or YouNoodle: http://younoodle.com/people/ajmal_asver My Blog: http://ajasver.com
Re: Solr response extremely slow
On Feb 3, 2010, at 1:38 PM, Rajat Garg wrote: Solr Specification Version: 1.3.0 Solr Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12 11:06:47 There's the problem right there... that grantingersoll guy :) (kidding) Sounds like you're just hitting cache warming which can take a while. Have you tried Solr 1.4? Faceting performance, for example, is dramatically improved, among many other improvements. Erik
Re: Solr usage with Auctions/Classifieds?
This field type (solr.ExternalFileField) allows you to have an external file that gives a float value for a field. You can only use functions on it. On Sat, Jan 30, 2010 at 7:05 AM, Jan Høydahl / Cominvent jan@cominvent.com wrote: A follow-up on the auction use case. How do you handle the need for frequent updates of only one field, such as the last-bid field (needed for sort on price, facets or ranges)? For high-traffic sites, the document update rate becomes very high if you re-send the whole document every time the bid price changes. -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 10. des. 2009, at 19.52, Grant Ingersoll wrote: On Dec 8, 2009, at 6:37 PM, regany wrote: hello! just wondering if anyone is using Solr as their search for an auction / classified site, and if so how have you managed your setup in general? ie. searching against listings that may have expired etc. I know several companies using Solr for classifieds/auctions. Some remove the old listings while others leave them in and filter them, or even allow users to see old stuff (but often for reasons other than users finding them, i.e. SEO). For those that remove, it's typically a batch operation that takes place at night. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- Lance Norskog goks...@gmail.com
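A sketch of that setup, assuming solr.ExternalFileField is the field type in question (the field and file names here are illustrative):

```xml
<!-- schema.xml sketch: last_bid values live outside the index -->
<fieldType name="extfile" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="last_bid" type="extfile"/>
```

The values then come from a plain-text file in the index data directory (named after the field, e.g. external_last_bid) with one `key=value` line per document, such as `auction123=99.50`. Updating a bid means rewriting a line in that file rather than re-indexing the document; the trade-off, as noted above, is that the field is usable only through function queries, not normal field queries or facets.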
Solr 1.4: Full import FileNotFoundException
Hello, I have noticed that when I run concurrent full-imports using DIH in Solr 1.4, the index ends up getting corrupted. I see the following in the log files (a snippet):

2010-02-03T17:54:24 SEVERE org.apache.solr.handler.dataimport.SolrWriter commit (thread 25): Exception while solr commit.
java.io.FileNotFoundException: /solrserver/apache-solr-1.3.0/example/multicore/RET/data/index/_5.cfs (No such file or directory)
    at java.io.RandomAccessFile.open
    at java.io.RandomAccessFile.<init> (line 212)
    at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init> (line 552)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.<init> (line 582)
    at org.apache.lucene.store.FSDirectory.openInput (line 488)
    at org.apache.lucene.index.CompoundFileReader.<init> (line 70)
    at org.apache.lucene.index.SegmentReader.initialize (line 319)
    at org.apache.lucene.index.SegmentReader.get (line 304)
    at org.apache.lucene.index.SegmentReader.get (line 234)
    at org.apache.solr.handler.dataimport.DataImporter$1.run (line 377)

Could this be because the concurrent full-imports are stepping on each other's toes? It seems like one full-import request ends up deleting another's segment files. Is there a way to avoid this? Perhaps a config option? I would like to retain the flexibility to issue concurrent full-import requests.
I found some documentation on this issue at: http://old.nabble.com/FileNotFoundException-on-index-td25717530.html But I looked at: http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html and was under the impression that this issue was fixed in Solr 1.4. Kindly advise. Ranjit. -- View this message in context: http://old.nabble.com/Solr-1.4%3A-Full-import-FileNotFoundException-tp27446982p27446982.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: weird behaviour when setting negative boost with bq using dismax
In the standard query parser, bq=-field_a:54^1 means: remove all entries in which field_a = 54. Generally speaking, by convention boosts in Lucene have unity at 1.0, not 0.0. So a negative boost is usually done with boosts between 0 and 1. For this case, maybe a boost of 0.1 is what you want? On Mon, Feb 1, 2010 at 8:04 AM, Marc Sturlese marc.sturl...@gmail.com wrote: I already asked about this long ago but the answer doesn't seem to work... I am trying to set a negative query boost to send the results that match field_a:54 to a lower position. I have tried it in 2 different ways: bq=(*:* -field_a:54^1) bq=-field_a:54^1 Neither of them seems to work. What seems to happen is that results that match field_a:54 are excluded, just like doing: fq=-field_a:54 Any idea what could be happening? Has anyone experienced this behaviour before? Thanks in advance -- View this message in context: http://old.nabble.com/weird-behabiour-when-setting-negative-boost-with-bq-using-dismax-tp27406614p27406614.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
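A hedged illustration of the distinction discussed above (the boost value is illustrative, not from the thread): a pure negative bq clause acts like an exclusion under dismax, while wrapping the negation in a match-all query demotes instead of excludes:

```
# effectively excludes field_a:54 matches (the behaviour observed):
bq=-field_a:54^1

# boosts everything that does NOT match, so field_a:54 documents
# stay in the result set but sink toward the bottom:
bq=(*:* -field_a:54)^5
```

This works because bq adds an optional clause to the main query; documents failing the wrapped clause simply receive no extra score rather than being filtered out.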
Re: ClassCastException setting date.formats in ExtractingRequestHandler
Please file a bug for this in JIRA: http://issues.apache.org/jira/browse/SOLR Please add all details. Thanks! -- Forwarded message -- From: Christoph Brill christoph.br...@chamaeleon.de Date: Tue, Feb 2, 2010 at 4:11 AM Subject: ClassCastException setting date.formats in ExtractingRequestHandler To: solr-user@lucene.apache.org Hi list, I tried to add the following to my solrconfig.xml (inside the <requestHandler name="/update/extract" ...> block):

<lst name="date.formats">
  <str>-MM-dd</str>
</lst>

which is described on the wiki page of the ExtractingRequestHandler [1]. After doing so I always get a ClassCastException once the lazy init of the handler happens. This is a stock Solr 1.4 with no modifications. The exception is: org.apache.solr.common.util.NamedList$1$1 cannot be cast to java.lang.String Is this a known bug? Or am I doing something wrong? Thanks in advance, Chris [1] http://wiki.apache.org/solr/ExtractingRequestHandler#Configuration -- Lance Norskog goks...@gmail.com
Re: Search without diacritics
You need to add ASCIIFoldingFilter to the query path as well as the indexing path. The solr/admin/analysis.jsp page lets you explore how these analysis stacks work. On Tue, Feb 2, 2010 at 5:53 PM, Olala hthie...@gmail.com wrote: Hi all! I have a problem with Solr, and I hope everybody here can help me :) I want to search text without diacritics, but Solr returns both diacritic and non-diacritic text. For example, when I query "solr index", it returns "solr index", "sôlr index", "sòlr index", "sólr indèx", ... I tried ASCIIFoldingFilter and ISOLatin1AccentFilterFactory but it is not correct :( My schema config:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

-- View this message in context: http://old.nabble.com/Search-wihthout-diacritics-tp27430345p27430345.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
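Following Lance's advice, a sketch of the corrected query analyzer: it is the same chain as in the question, with solr.ASCIIFoldingFilterFactory inserted so the query side folds diacritics the same way the index side already does:

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0"
          catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
  <!-- added: without this, accented query terms never match the folded index terms -->
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
</analyzer>
```

After changing the analyzer, the documents must be reindexed (or at least the change verified on analysis.jsp) for index and query terms to line up.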
Re: Using solr to store data
If you're happy with disk sizes and indexing/search performance, there are still holes: Updates happen per document, not per field, so when you have a million documents that say German and should say French, you have to reindex a million documents. There are no tools for managing distributed indexes, so you're on your own. Distributed TF/IDF is coming, but will never be perfect. So managing your own distributed relevance strategies is a must. On Wed, Feb 3, 2010 at 5:41 PM, AJ Asver a...@scoopler.com wrote: Hi all, I work on search at Scoopler.com, a real-time search engine which uses Solr. We currently use Solr for indexing but then fetch data from our CouchDB cluster using the IDs Solr returns. We are now considering storing a larger portion of data in Solr's index itself so we don't have to hit the DB too. Assuming that we are still storing data on the db (for backend and back-up purposes) are there any significant disadvantages to using Solr as a data store too? We currently run a master-slave setup on EC2 using x-large slave instances to allow for the disk cache to use as much memory as possible. I imagine we would definitely have to add more slave instances to accommodate the extra data we're storing (and make sure it stays in memory). Any tips would be really helpful. -- AJ Asver Co-founder, Scoopler.com +44 (0) 7834 609830 / +1 (415) 670 9152 a...@scoopler.com Follow me on Twitter: http://www.twitter.com/_aj Add me on Linkedin: http://www.linkedin.com/in/ajasver or YouNoodle: http://younoodle.com/people/ajmal_asver My Blog: http://ajasver.com -- Lance Norskog goks...@gmail.com
Re: Solr response extremely slow
Is it possible that the virtual machine does not give clean system millisecond numbers? On Wed, Feb 3, 2010 at 5:43 PM, Erik Hatcher erik.hatc...@gmail.com wrote: On Feb 3, 2010, at 1:38 PM, Rajat Garg wrote: Solr Specification Version: 1.3.0 Solr Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12 11:06:47 There's the problem right there... that grantingersoll guy :) (kidding) Sounds like you're just hitting cache warming which can take a while. Have you tried Solr 1.4? Faceting performance, for example, is dramatically improved, among many other improvements. Erik -- Lance Norskog goks...@gmail.com
Re: java.lang.NullPointerException with MySQL DataImportHandler
I just tested this with a DIH that does not use database input. If the DataImportHandler JDBC code does not support a schema that has optional fields, that is a major weakness. Noble/Shalin, is this true? On Tue, Feb 2, 2010 at 8:50 AM, Sascha Szott sz...@zib.de wrote: Hi, since some of the fields used in your DIH configuration aren't mandatory (e.g., keywords and tags are defined as nullable in your db table schema), add a default value to all optional fields in your schema configuration (e.g., default=""). Note that Solr does not understand the db-related concept of null values. Solr's log output SolrInputDocument[{keywords=keywords(1.0)={Dolce}, name=name(1.0)={Dolce & Gabbana D&G Neckties designer Tie for men 543}, productID=productID(1.0)={220213}}] indicates that there aren't any tags or descriptions stored for the item with productID 220213. Since no default value is specified, Solr raises an error when creating the index document. -Sascha Jean-Michel Philippon-Nadeau wrote: Hi, Thanks for the reply.
On Tue, 2010-02-02 at 16:57 +0100, Sascha Szott wrote: * the output of MySQL's describe command for all tables/views referenced in your DIH configuration

mysql> describe products;
+----------------+------------------+------+-----+---------+----------------+
| Field          | Type             | Null | Key | Default | Extra          |
+----------------+------------------+------+-----+---------+----------------+
| productID      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| skuCode        | varchar(320)     | YES  | MUL | NULL    |                |
| upcCode        | varchar(320)     | YES  | MUL | NULL    |                |
| name           | varchar(320)     | NO   |     | NULL    |                |
| description    | text             | NO   |     | NULL    |                |
| keywords       | text             | YES  |     | NULL    |                |
| disqusThreadID | varchar(50)      | NO   |     | NULL    |                |
| tags           | text             | YES  |     | NULL    |                |
| createdOn      | int(10) unsigned | NO   |     | NULL    |                |
| lastUpdated    | int(10) unsigned | NO   |     | NULL    |                |
| imageURL       | varchar(320)     | YES  |     | NULL    |                |
| inStock        | tinyint(1)       | YES  | MUL | 1       |                |
| active         | tinyint(1)       | YES  |     | 1       |                |
+----------------+------------------+------+-----+---------+----------------+
13 rows in set (0.00 sec)

mysql> describe product_soldby_vendor;
+-----------------+------------------+------+-----+---------+-------+
| Field           | Type             | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+-------+
| productID       | int(10) unsigned | NO   | MUL | NULL    |       |
| productVendorID | int(10) unsigned | NO   | MUL | NULL    |       |
| price           | double           | NO   |     | NULL    |       |
| currency        | varchar(5)       | NO   |     | NULL    |       |
| buyURL          | varchar(320)     | NO   |     | NULL    |       |
+-----------------+------------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql> describe products_vendors_subcategories;
+----------------------------+------------------+------+-----+---------+----------------+
| Field                      | Type             | Null | Key | Default | Extra          |
+----------------------------+------------------+------+-----+---------+----------------+
| productVendorSubcategoryID | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| productVendorCategoryID    | int(10) unsigned | NO   |     | NULL    |                |
| labelEnglish               | varchar(320)     | NO   |     | NULL    |                |
| labelFrench                | varchar(320)     | NO   |     | NULL    |                |
+----------------------------+------------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

mysql> describe products_vendors_categories;
+-------------------------+------------------+------+-----+---------+----------------+
| Field                   | Type             | Null | Key | Default | Extra          |
+-------------------------+------------------+------+-----+---------+----------------+
| productVendorCategoryID | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| labelEnglish            | varchar(320)     | NO   |     | NULL    |                |
| labelFrench             | varchar(320)     | NO   |     | NULL    |                |
+-------------------------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> describe product_vendor_in_subcategory;
+-------------------+------------------+------+-----+---------+-------+
| Field             | Type             | Null | Key | Default | Extra |
+-------------------+------------------+------+-----+---------+-------+
| productVendorID   | int(10) unsigned | NO   | MUL | NULL    |       |
| productCategoryID | int(10) unsigned | NO   | MUL | NULL    |       |
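A sketch of Sascha's suggestion in schema.xml terms — each nullable column gets an explicit default so that rows with NULL keywords or tags still produce valid documents (the field types here are assumptions, not taken from the thread):

```xml
<!-- default="" substitutes an empty string when the DB column is NULL,
     so DIH never hands Solr a document missing these fields -->
<field name="keywords" type="text" indexed="true" stored="true" default=""/>
<field name="tags"     type="text" indexed="true" stored="true" default=""/>
```

Alternatively, the SQL query in the DIH configuration could COALESCE the nullable columns to '' so the substitution happens on the database side instead.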
Re: Using solr to store data
Hey AJ, For simplicity's sake, I am using Solr as both storage and search for http://researchwatch.net. The dataset is 110K NSF grants from 1999 to 2009. The faceting is all dynamic fields and I use a catch-all to copy all fields to a default text field. All fields are also stored and used for the individual grant view. The performance seems fine for my purposes; I haven't done any extensive benchmarking with it. The site was built using a light RoR/rsolr layer on a small EC2 instance. Feel free to bang against the site with jmeter if you want to stress test a sample server to failure. :) -- Tommy Chheng Developer UC Irvine Graduate Student http://tommy.chheng.com On 2/3/10 5:41 PM, AJ Asver wrote: Hi all, I work on search at Scoopler.com, a real-time search engine which uses Solr. We currently use Solr for indexing but then fetch data from our CouchDB cluster using the IDs Solr returns. We are now considering storing a larger portion of data in Solr's index itself so we don't have to hit the DB too. Assuming that we are still storing data on the db (for backend and back-up purposes) are there any significant disadvantages to using Solr as a data store too? We currently run a master-slave setup on EC2 using x-large slave instances to allow for the disk cache to use as much memory as possible. I imagine we would definitely have to add more slave instances to accommodate the extra data we're storing (and make sure it stays in memory). Any tips would be really helpful. -- AJ Asver Co-founder, Scoopler.com +44 (0) 7834 609830 / +1 (415) 670 9152 a...@scoopler.com Follow me on Twitter: http://www.twitter.com/_aj Add me on Linkedin: http://www.linkedin.com/in/ajasver or YouNoodle: http://younoodle.com/people/ajmal_asver My Blog: http://ajasver.com
Re: query all filled field?
Queries that start with a minus or NOT don't work on their own. You have to do this: *:* AND -fieldX:[* TO *] On Wed, Feb 3, 2010 at 5:04 AM, Frederico Azeiteiro frederico.azeite...@cision.com wrote: Hmm, strange... I reindexed some docs with the field corrected. Now I'm sure the field is filled because fieldX:(*a*) returns docs. But fieldX:[* TO *] is returning the same as *:* (all results). I tried with -fieldX:[* TO *] and I get no results at all. I wonder if someone has tried this before with success? The field is indexed as string, indexed=true and stored=true. Thanks, Frederico -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Wednesday, 3 February 2010 11:48 To: solr-user@lucene.apache.org Subject: Re: query all filled field? Is it possible to query some field in order to get only non-empty documents? All documents where field x is filled? Yes. q=x:[* TO *] will bring back documents that have a non-empty x field. -- Lance Norskog goks...@gmail.com
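To summarize the exchange above (fieldX is the hypothetical field from the question):

```
# documents where fieldX has some indexed value:
q=fieldX:[* TO *]

# documents where fieldX is absent; a pure negative query must be
# anchored with *:*, as Lance notes:
q=*:* AND -fieldX:[* TO *]
```

One possible explanation for the behaviour Frederico sees, offered as a guess: if documents were indexed with fieldX set to an empty string rather than omitted entirely, fieldX:[* TO *] matches all of them (the empty string is still a value), and the negated form then matches nothing.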
Re: The Riddle of the Underscore and the Dollar Sign
Please reframe how you present the various fields and tests - it's hard to follow in this email. On Wed, Feb 3, 2010 at 12:50 PM, Christopher Ball christopher.b...@metaheuristica.com wrote: I am perplexed by the behavior I am seeing of the Solr Analyzer and Filters with regard to underscores. 1) I am trying to get rid of them when shingling, but seem unable to do so with a Stopwords Filter. And yet they are being removed when I am not even trying to, by the WordDelimiter Filter. 2) Conversely, I would like to retain '$' symbols when they are adjacent to numbers, but seem unable to without having to accept all forms of other syntax. My simple example configuration, test data and results are below. Most grateful for any guidance, Christopher

Test data:

<doc>
  <field name="id">StopWordTestData</field>
  <field name="conSubSec-text_dc">PreShingled ThisIsNotAStopWord ThisIsAStopWord ThisIsAlsoAStopWord beforeaperiod. beforeacomma, beforeacollan: under_Score don't Peter's s $1.00 $1 $1,000 $200 $3,000,000 $3m - # -#- --#-- Yes X No _ __ ___ a and also about</field>
</doc>

Field 1 - Delimited_text:
Index analyzer: org.apache.solr.analysis.TokenizerChain
Tokenizer class: org.apache.solr.analysis.WhitespaceTokenizerFactory
Filters:
1. org.apache.solr.analysis.WordDelimiterFilterFactory args: {splitOnCaseChange: 1, generateNumberParts: 0, catenateWords: 1, generateWordParts: 0, catenateAll: 1, catenateNumbers: 1}
2. org.apache.solr.analysis.LowerCaseFilterFactory args: {}

Field 1 - Resulting index terms (each with frequency 2): 100, 1000, 200, 3, 300, 3m, a, about, also, and, beforeacollan, beforeacomma, beforeaperiod, dont, m, no, peter, preshingled, s, thisisalsoastopword, thisisastopword, thisisnotastopword, underscore, x, yes, 1

Field 2 - Shingled_Text:
Index analyzer: org.apache.solr.analysis.TokenizerChain
Tokenizer class: org.apache.solr.analysis.WhitespaceTokenizerFactory
Filters:
1. org.apache.solr.analysis.WordDelimiterFilterFactory args: {splitOnCaseChange: 1, generateNumberParts: 0, catenateWords: 1, stemEnglishPossessive: 0, generateWordParts: 0, catenateAll: 0, catenateNumbers: 1}
2. org.apache.solr.analysis.StopFilterFactory args: {words: StopWords-PreShingled.txt, ignoreCase: true, enablePositionIncrements: true}
3. org.apache.solr.analysis.LowerCaseFilterFactory args: {}
4. org.apache.solr.analysis.ShingleFilterFactory args: {outputUnigrams: false, maxShingleSize: 5}

File StopWords-PreShingled.txt: s _ PreShingled __ ThisIsAStopWord ThisIsAlsoAStopWord

Field 2 - Resulting index terms (sample, terms with counts): _ 100 1 _ 100 1 1000 1 _ _ 1 _ _ beforeaperiod beforeacomma 1 _ beforeaperiod 1 _ beforeaperiod beforeacomma beforeacollan 1 _ thisisnotastopword 1 _ thisisnotastopword _ _ 1

-- Lance Norskog goks...@gmail.com
Re: java.lang.NullPointerException with MySQL DataImportHandler
On Thu, Feb 4, 2010 at 10:50 AM, Lance Norskog goks...@gmail.com wrote: I just tested this with a DIH that does not use database input. If the DataImportHandler JDBC code does not support a schema that has optional fields, that is a major weakness. Noble/Shalin, is this true? The problem is obviously not with DIH. DIH blindly passes on all the fields it could obtain from the DB. If some field is missing, DIH does not do anything with it. On Tue, Feb 2, 2010 at 8:50 AM, Sascha Szott sz...@zib.de wrote: Hi, since some of the fields used in your DIH configuration aren't mandatory (e.g., keywords and tags are defined as nullable in your db table schema), add a default value to all optional fields in your schema configuration (e.g., default=""). Note that Solr does not understand the db-related concept of null values. Solr's log output SolrInputDocument[{keywords=keywords(1.0)={Dolce}, name=name(1.0)={Dolce & Gabbana D&G Neckties designer Tie for men 543}, productID=productID(1.0)={220213}}] indicates that there aren't any tags or descriptions stored for the item with productID 220213. Since no default value is specified, Solr raises an error when creating the index document. -Sascha Jean-Michel Philippon-Nadeau wrote: Hi, Thanks for the reply.
On Tue, 2010-02-02 at 16:57 +0100, Sascha Szott wrote: * the output of MySQL's describe command for all tables/views referenced in your DIH configuration

mysql> describe products;
+----------------+------------------+------+-----+---------+----------------+
| Field          | Type             | Null | Key | Default | Extra          |
+----------------+------------------+------+-----+---------+----------------+
| productID      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| skuCode        | varchar(320)     | YES  | MUL | NULL    |                |
| upcCode        | varchar(320)     | YES  | MUL | NULL    |                |
| name           | varchar(320)     | NO   |     | NULL    |                |
| description    | text             | NO   |     | NULL    |                |
| keywords       | text             | YES  |     | NULL    |                |
| disqusThreadID | varchar(50)      | NO   |     | NULL    |                |
| tags           | text             | YES  |     | NULL    |                |
| createdOn      | int(10) unsigned | NO   |     | NULL    |                |
| lastUpdated    | int(10) unsigned | NO   |     | NULL    |                |
| imageURL       | varchar(320)     | YES  |     | NULL    |                |
| inStock        | tinyint(1)       | YES  | MUL | 1       |                |
| active         | tinyint(1)       | YES  |     | 1       |                |
+----------------+------------------+------+-----+---------+----------------+
13 rows in set (0.00 sec)

mysql> describe product_soldby_vendor;
+-----------------+------------------+------+-----+---------+-------+
| Field           | Type             | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+-------+
| productID       | int(10) unsigned | NO   | MUL | NULL    |       |
| productVendorID | int(10) unsigned | NO   | MUL | NULL    |       |
| price           | double           | NO   |     | NULL    |       |
| currency        | varchar(5)       | NO   |     | NULL    |       |
| buyURL          | varchar(320)     | NO   |     | NULL    |       |
+-----------------+------------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql> describe products_vendors_subcategories;
+----------------------------+------------------+------+-----+---------+----------------+
| Field                      | Type             | Null | Key | Default | Extra          |
+----------------------------+------------------+------+-----+---------+----------------+
| productVendorSubcategoryID | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| productVendorCategoryID    | int(10) unsigned | NO   |     | NULL    |                |
| labelEnglish               | varchar(320)     | NO   |     | NULL    |                |
| labelFrench                | varchar(320)     | NO   |     | NULL    |                |
+----------------------------+------------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

mysql> describe products_vendors_categories;
+-------------------------+------------------+------+-----+---------+----------------+
| Field                   | Type             | Null | Key | Default | Extra          |
+-------------------------+------------------+------+-----+---------+----------------+
| productVendorCategoryID | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| labelEnglish            | varchar(320)     | NO   |     | NULL    |                |
| labelFrench             | varchar(320)     | NO   |     | NULL    |                |
+-------------------------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> describe product_vendor_in_subcategory;
+-------------------+------------------+------+-----+---------+-------+
| Field             | Type             | Null | Key | Default |