Re: Rules engine and Solr
Thanks for the revert, Ravi.

> I am currently working on some kind of rules in front (application side) of our Solr instance. These rules are application-specific and not general, like deciding which fields to facet on, which fields to return in the response, which fields to highlight, and the boost value for each field (both at query time and at index time). The approach I have taken is to define a database table which holds these field parameters, which are then interpreted by my application to decide the query to be sent to Solr. This allows tweaking the Solr fields on the fly and hence influencing the search results.

I guess this is the usual usage of a Solr server; in my case it is no different. Search queries have a personalized experience, which means behaviors for facets, highlighting, etc. are customizable. We pull it off using databases and Java data structures.

> I will be interested to hear from you about the kind of rules you talk about and your approach towards it. Are these rules like a regular expression that, when matched with the user query, executes a specific Solr query?

http://en.wikipedia.org/wiki/Business_rules_engine

Cheers
Avlesh

On Wed, Jan 6, 2010 at 12:12 PM, Ravi Gidwani ravi.gidw...@gmail.com wrote:
> Avlesh: I am currently working on some kind of rules in front (application side) of our Solr instance. These rules are application-specific and not general, like deciding which fields to facet on, which fields to return in the response, which fields to highlight, and the boost value for each field (both at query time and at index time). The approach I have taken is to define a database table which holds these field parameters, which are then interpreted by my application to decide the query to be sent to Solr. This allows tweaking the Solr fields on the fly and hence influencing the search results. I will be interested to hear from you about the kind of rules you talk about and your approach towards it.
> Are these rules like a regular expression that, when matched with the user query, executes a specific Solr query?

~Ravi

On Tue, Jan 5, 2010 at 8:25 PM, Avlesh Singh avl...@gmail.com wrote:
> > Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341
>
> Hahaha, that's classic, Hoss! Thanks for introducing me to the XY problem. Had I known the two completely, I wouldn't have posted it on the mailing list. And I wasn't looking for a solution either. Anyways, as I replied back earlier, I'll get back with questions once I get more clarity.
>
> Cheers
> Avlesh
>
> On Wed, Jan 6, 2010 at 2:02 AM, Chris Hostetter hossman_luc...@fucit.org wrote:
> > : I am planning to build a rules engine on top of search. The rules are database
> > : driven and can't be stored inside solr indexes. These rules would ultimately
> > : do two things -
> > :
> > :    1. Change the order of Lucene hits.
> > :    2. Add/remove some results to/from the Lucene hits.
> > :
> > : What should be my starting point? Custom search handler?
> >
> > This smells like an XY problem ... can you elaborate on the types of rules/conditions/situations when you want #1 and #2 listed above to happen?
> >
> > http://people.apache.org/~hossman/#xyproblem
> >
> > XY Problem: Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341
> >
> > -Hoss
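To make the database-driven approach discussed above concrete: a minimal sketch of turning per-field boosts loaded from a rules table into a dismax-style qf parameter value. The field names, boost values, and the buildQf helper are all made up for illustration; they are not part of Solr's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldRulesDemo {
    // Builds a dismax-style qf parameter value ("field^boost field^boost ...")
    // from per-field boosts an application might load from a rules table.
    static String buildQf(Map<String, Double> fieldBoosts) {
        StringBuilder qf = new StringBuilder();
        for (Map.Entry<String, Double> e : fieldBoosts.entrySet()) {
            if (qf.length() > 0) qf.append(' ');
            qf.append(e.getKey()).append('^').append(e.getValue());
        }
        return qf.toString();
    }

    public static void main(String[] args) {
        // Pretend these rows came from the rules table (hypothetical names).
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("title", 10.0);
        boosts.put("body", 2.0);
        System.out.println(buildQf(boosts)); // prints: title^10.0 body^2.0
    }
}
```

The resulting string would be sent as the qf parameter of a dismax query; changing a row in the table changes the boost on the next request without touching the index.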
readOnly=true IndexReader
In the wiki page http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, I've found: "Open the IndexReader with readOnly=true. This makes a big difference when multiple threads are sharing the same reader, as it removes certain sources of thread contention." How do I open the IndexReader with readOnly=true? I can't find anything related to this parameter.

Do the JVM parameters -Dslave=disabled or -Dmaster=disabled have any effect on Solr with a standard solrconfig.xml?

Thank you for your answers.
Patrick.
Re: readOnly=true IndexReader
On Wed, Jan 6, 2010 at 4:26 PM, Patrick Sauts patrick.via...@gmail.com wrote:
> In the wiki page http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, I've found: "Open the IndexReader with readOnly=true. This makes a big difference when multiple threads are sharing the same reader, as it removes certain sources of thread contention." How do I open the IndexReader with readOnly=true? I can't find anything related to this parameter.

Solr always opens the IndexReader with readOnly=true. It was added with SOLR-730 and released in Solr 1.3.

-- Regards, Shalin Shekhar Mangar.
Re: readOnly=true IndexReader
On Wed, Jan 6, 2010 at 4:26 PM, Patrick Sauts patrick.via...@gmail.com wrote:
> In the wiki page http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, I've found: "Open the IndexReader with readOnly=true. This makes a big difference when multiple threads are sharing the same reader, as it removes certain sources of thread contention." How do I open the IndexReader with readOnly=true? I can't find anything related to this parameter.
>
> Do the JVM parameters -Dslave=disabled or -Dmaster=disabled have any effect on Solr with a standard solrconfig.xml?

These are not variables used by Solr itself. They are just substituted into solrconfig.xml and probably consumed by the ReplicationHandler (they are not standard).

> Thank you for your answers.
> Patrick.

-- - Noble Paul | Systems Architect | AOL | http://aol.com
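For what it's worth, the usual trick behind flags like -Dmaster=disabled is that the property is substituted into the *name* of the handler's configuration block, so renaming it makes the ReplicationHandler ignore that section. A sketch, assuming your solrconfig.xml is written this way (the property names are whatever the config author chose, not Solr built-ins, and the URLs/values here are placeholders):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <!-- Starting the JVM with -Dmaster=disabled renames this block, so the
       handler no longer finds a section called "master". -->
  <lst name="${master:master}">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <!-- Likewise, -Dslave=disabled turns off the slave section. -->
  <lst name="${slave:slave}">
    <str name="masterUrl">http://master_host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

With no -D flags, the defaults after the colon keep both sections active.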
schema.xml and Xinclude
As the <types/> section in schema.xml is the same across all our indexes, I'd like to make it an XInclude, so I tried:

<?xml version="1.0" encoding="UTF-8"?>
<schema name="example" version="1.2" xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="solr-types.xml"/>
  <fields>
    ...
  </fields>
</schema>

My syntax might not be correct? Or is it not possible (yet)?

Thank you again for your time.
Patrick.
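One detail that often bites people with XInclude: the included file must itself be a well-formed XML document, and its single root element is what gets merged in place of the xi:include element. A sketch of what solr-types.xml might contain (the field types shown are just the stock Solr 1.x examples; whether Solr's schema parser resolves XIncludes depends on the Solr version, so treat this as untested):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- solr-types.xml: one root element, which replaces the xi:include -->
<types>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</types>
```

The href is resolved relative to the including document, so a plain filename expects solr-types.xml in the same conf directory as schema.xml.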
Yankee's Solr integration
Hello everybody,

I was wondering how Yankee (http://www.yankeegroup.com/search.do?searchType=advancedSearch) manages to provide the ability to Create Alerts, Save Searches, and generate an RSS feed out of a custom search using Solr. Do you have any idea?

Thanks a lot, best regards, and happy new year!
Nicolas
Re: Basic sentence parsing with the regex highlighter fragmenter
Hmmm, the name WordDelimiterFilterFactory might be leading you astray. Its purpose isn't to break things up into words that have anything to do with grammatical rules; rather, its purpose is to break up strings of funky characters into searchable stuff. See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

In the grammatical sense, PowerShot should just be PowerShot, not "power shot" (which is what WordDelimiterFilterFactory gives you, options permitting). So I think you probably want one of the other analyzers. Have you tried any other analyzers? StandardAnalyzer might be more friendly.

HTH
Erick

On Tue, Jan 5, 2010 at 5:18 PM, Caleb Land caleb.l...@gmail.com wrote:
> I've tracked this problem down to the fact that I'm using the WordDelimiterFilter. I don't quite understand what's happening, but if I add preserveOriginal=1 as an option, everything looks fine. I think it has to do with the period being stripped in the token stream.
>
> On Tue, Jan 5, 2010 at 2:05 PM, Caleb Land caleb.l...@gmail.com wrote:
> > Hello, I'm using Solr 1.4, and I'm trying to get the regex fragmenter to parse basic sentences, and I'm running into a problem. I'm using the default regex specified in the example Solr configuration:
> >
> > [-\w ,/\n\']{20,200}
> >
> > But I am using a larger fragment size (140) with a slop of 1.0. Given the passage:
> >
> > Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla a neque a ipsum accumsan iaculis at id lacus. Sed magna velit, aliquam ut congue vitae, molestie quis nunc.
> >
> > When I search for "Nulla" (the first word of the second sentence) and grab the first highlighted snippet, this is what I get:
> >
> > . <em>Nulla</em> a neque a ipsum accumsan iaculis at id lacus
> >
> > As you can see, there's a leading period from the previous sentence and the period from the current sentence is missing. I understand this regex isn't that advanced, but I've tried everything I can think of, regex-wise, to get this to work, and I always end up with this problem.
> > For example, I've tried:
> >
> > \w[^.!?]{0,200}[.!?]
> >
> > which seems like it should include the ending punctuation, but it doesn't, so I think I'm missing something. Does anybody know a regex that works?
> >
> > -- Caleb Land
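For what it's worth, a pattern like \w[^.!?]{0,200}[.!?] does capture the terminating punctuation when run outside Solr, which supports the theory that the analysis chain (e.g. WordDelimiterFilter stripping the period from the token stream) rather than the regex is to blame. A quick stdlib check, with the final terminator made optional so a trailing unterminated fragment still matches:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceRegexDemo {
    // Start at a word character, take up to 200 non-terminator characters,
    // then an optional sentence terminator.
    static final Pattern SENTENCE = Pattern.compile("\\w[^.!?]{0,200}[.!?]?");

    static List<String> fragments(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) out.add(m.group());
        return out;
    }

    public static void main(String[] args) {
        String text = "Lorem ipsum dolor sit amet. Nulla a neque a ipsum accumsan iaculis at id lacus.";
        // Prints each sentence, including its trailing period.
        for (String f : fragments(text)) System.out.println(f);
    }
}
```

Against plain text this yields "Lorem ipsum dolor sit amet." and "Nulla a neque a ipsum accumsan iaculis at id lacus." as separate fragments, periods intact; inside Solr the fragmenter sees tokens, not the raw string, so the filter chain matters.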
Re: performance question
> Strictly speaking there are some insignificant distinctions in performance related to how a field name is resolved -- Grant alluded to this earlier in this thread -- but it only comes into play when you actually refer to that field by name and Solr has to look it up in the metadata. So, for example, if your request referred to 100 different field names in the q, fq, and facet.field params, there would be a small overhead for any of those 100 fields that existed because of <dynamicField/> declarations that would not exist for any of those fields that were declared using <field/> -- but there would be no added overhead to that query if there were 999 other fields that existed in your index because of that same <dynamicField/> declaration.
>
> But frankly: we're talking about seriously ridiculous pico-optimizing at this point ... if you find yourself with performance concerns, there are probably 500 other things worth worrying about before this should ever cross your mind.

Thanks for the follow-up. I've converted our schema to required fields only, with every other field being a dynamic field. The only negative that I've found so far is that you lose the copyField capability, so it makes my ingest a little bigger, since I have to manually copy the values myself.

-- A. Steven Anderson Independent Consultant st...@asanderson.com
Re: performance question
You don't lose copyField capability with dynamic fields. You can copy dynamic fields into a fixed field name, like *_s => text, or dynamic fields into another dynamic field, like *_s => *_t.

Erik

On Jan 6, 2010, at 9:35 AM, A. Steven Anderson wrote:
> [...]
ord on TrieDateField always returning max
Hi everyone, I've been trying to add a date-based boost to my queries. I have a field like:

<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
<field name="datetime" type="tdate" indexed="true" stored="true" required="true"/>

When I look at the datetime field in the Solr schema browser, I can see that there are 9051 distinct dates. When I add the parameter bf=ord(datetime) to my query (a dismax query), I always get 9051 as the result of the function. I see this in the debug data:

1698.6041 = (MATCH) FunctionQuery(top(ord(datetime))), product of:
  9051.0 = 9051
  1.0 = boost
  0.18767032 = queryNorm

It is exactly the same for every result, even though each result has a different value for datetime. Does anyone have any suggestions as to why this could be happening? I have done extensive googling with no luck.

Thanks, Kallin Nagelberg.
replication -- missing field data file
I set up replication between 2 cores on one master and 2 cores on one slave. Before doing this the master was working without issues, and I stopped all indexing on the master. Now that replication has synced the index files, an .fdt file is suddenly missing on both the master and the slave. Pretty much every operation (core reload, commit, add document) fails with an error like the one posted below. How could this happen? How can one recover from such an error? Is there any way to regenerate the .fdt file without re-indexing everything?

This brings me to a question about backups. If I run the replication?command=backup command, where is this backup stored? I've tried this a few times and get an OK response from the machine, but I don't see the backup generated anywhere.

Thanks, Gio.

org.apache.solr.common.SolrException: Error handling 'reload' action
    at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425)
    at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409)
    ... 18 more
Caused by: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(Unknown Source)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:78)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:108)
    at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65)
    at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:104)
    at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599)
    at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:103)
    at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27)
    at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:73)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:68)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:403)
    at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
    at org.apache.solr.core.SolrCore.getSearcher(So
Re: ord on TrieDateField always returning max
Besides using up a lot more memory, ord() isn't even going to work for a field with multiple tokens indexed per value (like tdate). I'd recommend using a function on the date value itself. http://wiki.apache.org/solr/FunctionQuery#ms

-Yonik
http://www.lucidimagination.com

On Wed, Jan 6, 2010 at 10:52 AM, Nagelberg, Kallin knagelb...@globeandmail.com wrote:
> [...]
RE: ord on TrieDateField always returning max
Thanks Yonik, I was just looking at that actually. Trying something like recip(ms(NOW,datetime),3.16e-11,1,1)^10 now. My 'inspiration' for the ord method was actually the Solr 1.4 Enterprise Search Server book; page 126 has a section on using reciprocals and rord with dates. You should let those guys know what's up!

Thanks, Kallin.

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Wednesday, January 06, 2010 11:23 AM
To: solr-user@lucene.apache.org
Subject: Re: ord on TrieDateField always returning max

> [...]
Re: ord on TrieDateField always returning max
On Wed, Jan 6, 2010 at 11:26 AM, Nagelberg, Kallin knagelb...@globeandmail.com wrote:
> Thanks Yonik, I was just looking at that actually. Trying something like recip(ms(NOW,datetime),3.16e-11,1,1)^10 now.

I'd also recommend looking into a multiplicative boost - IMO they normally make more sense. http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents

-Yonik
http://www.lucidimagination.com

> [...]
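As a sanity check on the recip parameters discussed in this thread: per the Solr FunctionQuery wiki, recip(x,m,a,b) computes a/(m*x+b). With x = ms(NOW,datetime) and m = 3.16e-11 (roughly 1 divided by the number of milliseconds in a year), a brand-new document gets a boost of 1.0 and the boost decays toward 0 with age:

```java
public class RecipBoostDemo {
    // recip(x, m, a, b) = a / (m*x + b), as documented on the FunctionQuery wiki.
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        double msPerYear = 365.0 * 24 * 60 * 60 * 1000; // ~3.15e10 ms
        System.out.println(recip(0.0, 3.16e-11, 1, 1));           // brand new doc: 1.0
        System.out.println(recip(msPerYear, 3.16e-11, 1, 1));     // ~1 year old: ~0.50
        System.out.println(recip(5 * msPerYear, 3.16e-11, 1, 1)); // ~5 years old: ~0.17
    }
}
```

So the 3.16e-11 constant is what makes "one year old" land near half the boost of "new"; scaling m stretches or compresses the decay curve.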
Re: performance question
> You don't lose copyField capability with dynamic fields. You can copy dynamic fields into a fixed field name, like *_s => text, or dynamic fields into another dynamic field, like *_s => *_t.

Ahhh... I missed that little detail. Nice! OK, so there are no negatives to using dynamic fields then. ;-)

Thanks for all the info!

-- A. Steven Anderson Independent Consultant st...@asanderson.com
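For reference, the copyField declarations Erik describes would look like this in schema.xml (the *_s/*_t dynamic fields and the text field are the stock example-schema names; substitute your own):

```xml
<!-- Copy every *_s dynamic field into the fixed "text" field... -->
<copyField source="*_s" dest="text"/>

<!-- ...or into the matching *_t dynamic field (docname_s -> docname_t). -->
<copyField source="*_s" dest="*_t"/>
```

This keeps the ingest side simple: clients only send the *_s fields, and Solr does the copying at index time.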
Re: replication -- missing field data file
On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote:
> [...]
> This brings me to a question about backups. If I run the replication?command=backup command, where is this backup stored? I've tried this a few times and get an OK response from the machine, but I don't see the backup generated anywhere.

The backup is done asynchronously, so it always gives an OK response immediately. The backup is created in the data dir itself.

> Thanks, Gio.
> [quoted stack trace from the previous message snipped]
solr and patch - SOLR-64 SOLR-792
hi,

I tried to apply patches to solr-1.4. Here is the result:

javad...@javadev5:~/Java/apache-solr-1.4.0$ patch -p0 < SOLR-64.patch
patching file src/java/org/apache/solr/schema/HierarchicalFacetField.java
patching file src/common/org/apache/solr/common/params/FacetParams.java
Hunk #1 FAILED at 108.
1 out of 1 hunk FAILED -- saving rejects to file src/common/org/apache/solr/common/params/FacetParams.java.rej
patching file example/solr/conf/schema.xml
Hunk #1 FAILED at 144.
Hunk #2 FAILED at 417.
2 out of 2 hunks FAILED -- saving rejects to file example/solr/conf/schema.xml.rej
patching file src/java/org/apache/solr/request/SimpleFacets.java
Hunk #1 FAILED at 33.
Hunk #2 FAILED at 227.
Hunk #3 FAILED at 238.
Hunk #4 FAILED at 484.
Hunk #5 FAILED at 541.
5 out of 5 hunks FAILED -- saving rejects to file src/java/org/apache/solr/request/SimpleFacets.java.rej

javad...@javadev5:~/Java/apache-solr-1.4.0$ patch -p0 < SOLR-792.patch
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/example/solr/conf/solrconfig.xml b/example/solr/conf/solrconfig.xml
|index e2c0a48..1cd18bc 100755
|--- a/example/solr/conf/solrconfig.xml
|+++ b/example/solr/conf/solrconfig.xml
--------------------------
File to patch:

It failed on Windows XP too. What's wrong with what I'm doing?

Thanks,
t.
Re: solr and patch - SOLR-64 SOLR-792
You probably aren't doing anything wrong, other than those patches being a bit out of date with trunk. You might have to fight through getting them current a bit, or wait until I or someone else can get to updating them.

Erik

On Jan 6, 2010, at 11:52 AM, Thibaut Lassalle wrote:
> [...]
RE: replication -- missing field data file
How can you differentiate between the backup and the normal index files?

-----Original Message-----
From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble Paul
Sent: Wednesday, January 06, 2010 11:52 AM
To: solr-user
Subject: Re: replication -- missing field data file

> [...]
org.apache.solr.common.SolrException: Error handling 'reload' action at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at 
org.apache.solr.core.SolrCore.<init>(SolrCore.java:579) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425) at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486) at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409) ... 18 more Caused by: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(Unknown Source) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:78) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:108) at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65) at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:104) at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599) at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:103) at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27) at
Re: replication -- missing field data file
the index dir is the one named "index"; others will be stored as index<date-as-number> On Wed, Jan 6, 2010 at 10:31 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: How can you differentiate between the backup and the normal index files?
Re: Solr Cell - PDFs plus literal metadata - GET or POST ?
On Tue, Jan 5, 2010 at 2:25 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: Really? Doesn't it have to be delimited differently, if both the file contents and the document metadata will be part of the POST data? How does Solr Cell tell the difference between the literals and the start of the file? I've tried this before and haven't had any luck with it. Thanks Shalin. And Giovanni, yes it definitely works. This will set literal.mydata to the contents of mydata.txt: curl "http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true" -F myfi...@tutorial.html -F "literal.mydata=<mydata.txt" Unfortunately I could not get the UTF-8 encoding to work properly. It's probably a curl or o/s configuration issue. I tried mydata.txt with and without BOM, and I can do a `more mydata.txt` command and the special characters display correctly on my terminal set to UTF-8, but they get screwed up when indexed. I gave up in the end and went back to putting it urlencoded in the url. Ross -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Monday, January 04, 2010 4:28 AM To: solr-user@lucene.apache.org Subject: Re: Solr Cell - PDFs plus literal metadata - GET or POST ? On Wed, Dec 30, 2009 at 7:49 AM, Ross tetr...@gmail.com wrote: Hi all I'm experimenting with Solr. I've successfully indexed some PDFs and all looks good but now I want to index some PDFs with metadata pulled from another source. I see this example in the docs. curl "http://localhost:8983/solr/update/extract?literal.id=doc4&captureAttr=true&defaultField=text&capture=div&fmap.div=foo_t&boost.foo_t=3&literal.blah_s=Bah" -F tutori...@tutorial.pdf I can write code to generate a script with those commands substituting my own literal.whatever. My metadata could be up to a couple of KB in size. Is there a way of making the literal a POST variable rather than a GET? With Curl? Yes, see the man page. Will Solr Cell accept it as a POST?
Yes, it will. -- Regards, Shalin Shekhar Mangar.
Strange Behavior When Using CSVRequestHandler
The problem: Not all of the documents that I expect to be indexed are showing up in the index. The background: I start off with an empty index based on a schema with a single field named 'query', marked as unique and using the following analyzer:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

My input is a utf-8 encoded file with one sentence per line. Its total size is about 60MB. I would like each line of the file to correspond to a single document in the solr index. If I print the number of unique lines in the file (using cat | sort | uniq | wc -l), I get a little over 2M. Printing the total number of lines in the file gives me around 2.7M. I use the following to start indexing: curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=\' When this command completes, I see numDocs is approximately 470k (which is what I find strange) and maxDocs is approximately 890k (which is fine since I know I have around 700k duplicates). Even more confusing is that if I run this exact command a second time without performing any other operations, numDocs goes up to around 610k, and a third time brings it up to about 750k. Can anyone tell me what might cause Solr not to index everything in my input file the first time, and why it would be able to index new documents the second and third times?
I also have this line in solrconfig.xml, if it matters: <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" /> Thanks, Dan -- View this message in context: http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-tp27026926p27026926.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 1.4 - stats page slow
Sorry, I know I'm a little late in replying, but the LukeRequestHandler tip was just what I needed! Thank you so much. -- Steve On Dec 25, 2009, at 2:03 AM, Chris Hostetter wrote: : I've noticed this as well, usually when working with a large field cache. I : haven't done in-depth analysis of this yet, but it seems like when the stats : page is trying to pull data from a large field cache it takes quite a long : time. In Solr 1.4, the stats page was modified to start reporting stats on the FieldCache (using the new FieldCache introspection API added by Lucene Java 2.9) so that may be what you are seeing. : more than 10 seconds. We call this programmatically to retrieve the last : commit date so that we can keep users from committing too frequently. This : means some of our administration pages are now taking a long time to load. i'm not really following this ... what piece of data from the stats.jsp are you using to compute/infer a commit date? if you are looking at registration date of the SolrIndexSearcher you can also get that from the LukeRequestHandler which is much more efficient (it has options for limiting the work it does)... http://localhost:8983/solr/admin/luke?numTerms=0&fl=BOGUS -Hoss
How to ignore term frequency 1? Field-specific Similarity class?
Hi, I want to modify scoring to ignore term frequency 1. This is useful for short fields like titles or subjects, where the number of times a term appears does not correspond to relevancy. I found several discussions of this problem, and also an implementation that changes the Similarity class to achieve this (http://osdir.com/ml/solr-user.lucene.apache.org/2009-09/msg00672.html). However, this change is global, but I only need the behavior for some fields. What's the best way to do this? Is there a way to use a field-specific similarity class, or to evaluate field names/parameters inside a Similarity class? Thanks! Andreas
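The linked thread's approach overrides Similarity's tf() so that any non-zero frequency scores the same. The core idea can be sketched in plain Java, independent of Lucene's actual Similarity API (method names here are illustrative, not Lucene's):

```java
public class FlatTfDemo {
    // Lucene's default tf contribution is sqrt(frequency).
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // "Ignore term frequency": any occurrence counts the same as one.
    static float flatTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        // A title mentioning the term 3 times no longer outscores
        // one mentioning it once (all else being equal).
        System.out.println(defaultTf(3)); // about 1.73
        System.out.println(flatTf(3));    // 1.0
        System.out.println(flatTf(1));    // 1.0
    }
}
```

Since a Similarity instance in that Lucene version is set globally per IndexSearcher and tf() does not receive the field name, making this per-field is the hard part Andreas describes; tf() itself cannot branch on the field.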
Re: Basic sentence parsing with the regex highlighter fragmenter
I've looked at the docs/source for WordDelimiterFilter, and I understand what it does now. Here is my configuration: http://gist.github.com/270590 I've tried the StandardTokenizerFactory instead of the WhitespaceTokenizerFactory, but I get the same problem as before: the period from the previous sentence shows up and the period from the current sentence is cut off of highlighter fragments. I've tried the WhitespaceTokenizer with the StandardFilter, and this kinda works, but to match a word at the end of a sentence, you need to search for the period at the end of the sentence (the period is being tokenized along with the word). In any case, if I use the WordDelimiterFilter or add preserveOriginal=1, everything seems to work. (If I remove the WordDelimiterFilter, the periods are indexed with the word they're connected to, and searching for those words doesn't match unless the user includes the period) I'm trying to go through the code to understand how this works. On Wed, Jan 6, 2010 at 9:13 AM, Erick Erickson erickerick...@gmail.comwrote: Hmmm, the name WordDelimiterFilterFactory might be leading you astray. Its purpose isn't to break things up into words that have anything to do with grammatical rules. Rather, its purpose is to break up strings of funky characters into searchable stuff. see: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory In the grammatical sense, PowerShot should just be PowerShot, not power shot (which is what WordDelimiterFactory gives you, options permitting). So I think you probably want one of the other analyzers Have you tried any other analyzers? StandardAnalyzer might be more friendly HTH Erick On Tue, Jan 5, 2010 at 5:18 PM, Caleb Land caleb.l...@gmail.com wrote: I've tracked this problem down to the fact that I'm using the WordDelimiterFilter. I don't quite understand what's happening, but if I add preserveOriginal=1 as an option, everything looks fine.
I think it has to do with the period being stripped in the token stream. On Tue, Jan 5, 2010 at 2:05 PM, Caleb Land caleb.l...@gmail.com wrote: Hello, I'm using Solr 1.4, and I'm trying to get the regex fragmenter to parse basic sentences, and I'm running into a problem. I'm using the default regex specified in the example solr configuration: [-\w ,/\n\']{20,200} But I am using a larger fragment size (140) with a slop of 1.0. Given the passage: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla a neque a ipsum accumsan iaculis at id lacus. Sed magna velit, aliquam ut congue vitae, molestie quis nunc. When I search for Nulla (the first word of the second sentence) and grab the first highlighted snippet, this is what I get: . <em>Nulla</em> a neque a ipsum accumsan iaculis at id lacus As you can see, there's a leading period from the previous sentence and the period from the current sentence is missing. I understand this regex isn't that advanced, but I've tried everything I can think of, regex-wise, to get this to work, and I always end up with this problem. For example, I've tried: \w[^.!?]{0,200}[.!?] Which seems like it should include the ending punctuation, but it doesn't, so I think I'm missing something. Does anybody know a regex that works? -- Caleb Land -- Caleb Land -- Caleb Land
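Candidate fragmenter regexes can be checked outside Solr with plain java.util.regex. This sketch (not the highlighter itself, so token offsets may still change the outcome inside Solr) shows a pattern that requires the match to start on a non-space, non-terminator character and to consume the terminating punctuation:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceRegexDemo {
    public static void main(String[] args) {
        String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
                    + "Nulla a neque a ipsum accumsan iaculis at id lacus.";
        // Start at a non-space, non-terminator character, run through
        // non-terminators, then include the terminator in the match.
        Pattern p = Pattern.compile("[^\\s.!?][^.!?]{0,200}[.!?]");
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println("[" + m.group() + "]");
        }
        // prints each sentence bracketed, with its trailing period and
        // without the previous sentence's leftover punctuation
    }
}
```

Whether the regex fragmenter reproduces this exactly also depends on the fragment size and slop settings, which can force fragments to be cut shorter than a full regex match.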
No Analyzer, tokenizer or stemmer works at Solr
I have tested a lot and all the time I thought I set wrong options for my custom analyzer. Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer. It seems like it only stores the original input. I am using the example-configuration of the current Solr 1.4 release. What's wrong? Thank you! -- View this message in context: http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27026959.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: replication -- missing field data file
How can you tell when the backup is done? -Original Message- From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble Paul Sent: Wednesday, January 06, 2010 12:23 PM To: solr-user Subject: Re: replication -- missing field data file the index dir is the one named "index"; others will be stored as index<date-as-number> On Wed, Jan 6, 2010 at 10:31 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: How can you differentiate between the backup and the normal index files?
Re: Strange Behavior When Using CSVRequestHandler
I think the root of your problem is that unique fields should NOT be multivalued. See http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=(unique)|(key) In this case, since you're tokenizing, your query field is implicitly multi-valued; I don't know what the behavior will be. But there's another problem: all the filters in your analyzer definition will mess up the correspondence between the Unix uniq count and numDocs, even if you got by the above. I.e. StopFilter would make the lines "a problem" and "the problem" identical. WordDelimiter would do all kinds of interesting things. LowerCaseFilter would make "Myproblem" and "myproblem" identical. RemoveDuplicatesFilter would make "interesting interesting" and "interesting" identical. You could define a second field, make *that* one unique and NOT analyze it in any way... You could hash your sentences and define the hash as your unique key. You could... HTH Erick On Wed, Jan 6, 2010 at 1:06 PM, danben dan...@gmail.com wrote: The problem: Not all of the documents that I expect to be indexed are showing up in the index. The background: I start off with an empty index based on a schema with a single field named 'query', marked as unique and using the following analyzer:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

My input is a utf-8 encoded file with one sentence per line. Its total size is about 60MB. I would like each line of the file to correspond to a single document in the solr index.
If I print the number of unique lines in the file (using cat | sort | uniq | wc -l), I get a little over 2M. Printing the total number of lines in the file gives me around 2.7M. I use the following to start indexing: curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=\' When this command completes, I see numDocs is approximately 470k (which is what I find strange) and maxDocs is approximately 890k (which is fine since I know I have around 700k duplicates). Even more confusing is that if I run this exact command a second time without performing any other operations, numDocs goes up to around 610k, and a third time brings it up to about 750k. Can anyone tell me what might cause Solr not to index everything in my input file the first time, and why it would be able to index new documents the second and third times? I also have this line in solrconfig.xml, if it matters: <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" /> Thanks, Dan -- View this message in context: http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-tp27026926p27026926.html Sent from the Solr - User mailing list archive at Nabble.com.
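Erick's hashing suggestion can be sketched like this (plain JDK; using the hash as the value of a separate, un-analyzed uniqueKey field is the assumed setup): digest the raw line so that only byte-identical sentences collide, regardless of what the analyzer later does to the `query` field.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LineHash {
    // Hash the raw (un-analyzed) sentence so that only byte-identical
    // lines produce the same uniqueKey value.
    static String md5Hex(String line) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(line.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Lines that StopFilter/LowerCaseFilter would conflate
        // still get distinct keys here.
        System.out.println(md5Hex("a problem"));
        System.out.println(md5Hex("the problem"));
    }
}
```

The hash column would be added to the CSV alongside the sentence, so the analyzed field and the dedup key are decoupled.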
Re: Basic sentence parsing with the regex highlighter fragmenter
Hmmm, I'll have to defer to the highlighter experts here Erick On Wed, Jan 6, 2010 at 3:23 PM, Caleb Land redhatd...@gmail.com wrote: I've looked at the docs/source for WordDelimiterFilter, and I understand what it does now. Here is my configuration: http://gist.github.com/270590 I've tried the StandardTokenizerFactory instead of the WhitespaceTokenizerFactory, but I get the same problem as before: the period from the previous sentence shows up and the period from the current sentence is cut off of highlighter fragments. I've tried the WhitespaceTokenizer with the StandardFilter, and this kinda works, but to match a word at the end of a sentence, you need to search for the period at the end of the sentence (the period is being tokenized along with the word). In any case, if I use the WordDelimiterFilter or add preserveOriginal=1, everything seems to work. (If I remove the WordDelimiterFilter, the periods are indexed with the word they're connected to, and searching for those words doesn't match unless the user includes the period) I'm trying to go through the code to understand how this works. On Wed, Jan 6, 2010 at 9:13 AM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, the name WordDelimiterFilterFactory might be leading you astray. Its purpose isn't to break things up into words that have anything to do with grammatical rules. Rather, its purpose is to break up strings of funky characters into searchable stuff. see: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory In the grammatical sense, PowerShot should just be PowerShot, not power shot (which is what WordDelimiterFactory gives you, options permitting). So I think you probably want one of the other analyzers Have you tried any other analyzers? StandardAnalyzer might be more friendly HTH Erick On Tue, Jan 5, 2010 at 5:18 PM, Caleb Land caleb.l...@gmail.com wrote: I've tracked this problem down to the fact that I'm using the WordDelimiterFilter.
I don't quite understand what's happening, but if I add preserveOriginal=1 as an option, everything looks fine. I think it has to do with the period being stripped in the token stream. On Tue, Jan 5, 2010 at 2:05 PM, Caleb Land caleb.l...@gmail.com wrote: Hello, I'm using Solr 1.4, and I'm trying to get the regex fragmenter to parse basic sentences, and I'm running into a problem. I'm using the default regex specified in the example solr configuration: [-\w ,/\n\']{20,200} But I am using a larger fragment size (140) with a slop of 1.0. Given the passage: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla a neque a ipsum accumsan iaculis at id lacus. Sed magna velit, aliquam ut congue vitae, molestie quis nunc. When I search for Nulla (the first word of the second sentence) and grab the first highlighted snippet, this is what I get: . <em>Nulla</em> a neque a ipsum accumsan iaculis at id lacus As you can see, there's a leading period from the previous sentence and the period from the current sentence is missing. I understand this regex isn't that advanced, but I've tried everything I can think of, regex-wise, to get this to work, and I always end up with this problem. For example, I've tried: \w[^.!?]{0,200}[.!?] Which seems like it should include the ending punctuation, but it doesn't, so I think I'm missing something. Does anybody know a regex that works? -- Caleb Land -- Caleb Land -- Caleb Land
Re: No Analyzer, tokenizer or stemmer works at Solr
Well, I have noticed that Solr isn't using ANY analyzer How do you know this? Because it's highly unlikely that SOLR is completely broken on that level. Erick On Wed, Jan 6, 2010 at 3:48 PM, MitchK mitc...@web.de wrote: I have tested a lot and all the time I thought I set wrong options for my custom analyzer. Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer. It seems like it only stores the original input. I am using the example-configuration of the current Solr 1.4 release. What's wrong? Thank you! -- View this message in context: http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27026959.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: No Analyzer, tokenizer or stemmer works at Solr
On Jan 6, 2010, at 3:48 PM, MitchK wrote: I have tested a lot and all the time I thought I set wrong options for my custom analyzer. Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer. It seems like it only stores the original input. The stored value is always the original input. The *indexed* values are transformed by analysis. If you really need to store the analyzed fields, that may be possible with an UpdateRequestProcessor. also see: https://issues.apache.org/jira/browse/SOLR-314 ryan
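Ryan's distinction can be illustrated with a toy example (plain Java, not Lucene's analysis chain; the lowercase-and-split step merely stands in for whatever the configured analyzer does):

```java
import java.util.Arrays;
import java.util.List;

public class StoredVsIndexedDemo {
    public static void main(String[] args) {
        String original = "Running QUICKLY";
        // Toy "analysis": roughly what a whitespace tokenizer plus a
        // lowercase filter might emit. Only these tokens are searchable.
        List<String> indexedTokens = Arrays.asList(original.toLowerCase().split("\\s+"));
        // What comes back in search results is always the stored original:
        System.out.println("stored:  " + original);
        System.out.println("indexed: " + indexedTokens);
    }
}
```

So a query for "running" matches (the indexed token exists) even though the response still shows "Running QUICKLY", which is why inspecting stored values can make it look as if no analyzer ran.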
How to set User.dir or CWD for Solr during Tomcat startup
Is there any way to force the cwd that Solr starts up in when using the standard startup scripts for Tomcat? I'm working on Solaris, and using SMF to start and stop Tomcat sets the path to /root. I've been doing a bunch of googling and haven't seen if there is a parameter to set within Tomcat other than the solr/home, which is set up in the solr.xml under $CATALINA_HOME/conf/Catalina/localhost/. I've had one person give me instructions using the GUI on Windows, but I'm at a loss on which configuration file would set that or which environment variable can or should be defined. Any help would be appreciated. Thanks Robbin
Search query log using solr
Hi All: I am currently using Solr 1.4 as the search engine for my application. I am planning to add a search query log that will capture all the search queries (and more information like IP, user info, date/time, etc.). I understand I can easily do this on the application side, capturing all the search requests and logging them in a DB/file before sending them to Solr for execution. But I wanted to check with the forum if there was any better approach, best practices, or anything that has been added to Solr for such a requirement. The idea is then to use this search log for statistical analysis as well as for improving the search results. Please share your experience/ideas. TIA ~Ravi.
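One common application-side shape for this (a sketch, not a Solr feature; the file name and field layout are made up for illustration) is to append one structured record per search before forwarding the query to Solr:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.Instant;

public class QueryLog {
    // One tab-separated record per search: timestamp, client IP, user, raw query.
    static String formatLine(Instant when, String clientIp, String user, String query) {
        return String.join("\t", when.toString(), clientIp, user, query) + "\n";
    }

    static void append(Path logFile, String line) throws IOException {
        Files.write(logFile, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path log = Paths.get("search-queries.log");
        append(log, formatLine(Instant.now(), "10.0.0.5", "ravi", "q=solr+logging"));
    }
}
```

A tab-separated flat file like this is easy to load into a database later for the statistical analysis; the servlet container's access log (or Solr's own request log) can serve as a cross-check.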
Re: DisMaxRequestHandler bf configuration
On Wed, Jan 6, 2010 at 2:43 AM, Andy angelf...@yahoo.com wrote: I'd like to boost every query using {!boost b=log(popularity)}. But I'd rather not have to prepend that to every query. It'd be much cleaner for me to configure Solr to use that as the default. My plan is to make DisMaxRequestHandler the default handler and add the following to solrconfig.xml:

  <requestHandler name="dismax" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="bf">log(popularity)</str>
    </lst>
  </requestHandler>

Is this the correct way to do it? bf adds in the function query; {!boost} multiplies by the function query. In the new edismax (which may replace dismax soon) you can specify the multiplicative boost via boost=log(popularity) -Yonik http://www.lucidimagination.com
Re: DisMaxRequestHandler bf configuration
So if I want to configure Solr to turn every query q=foo into q={!boost b=log(popularity)}foo, dismax wouldn't work but edismax would? If that's the case, can you tell me how to set up/use edismax? I can't find much documentation on it. Is it recommended for production use? --- On Wed, 1/6/10, Yonik Seeley yo...@lucidimagination.com wrote: bf adds in the function query; {!boost} multiplies by the function query. In the new edismax (which may replace dismax soon) you can specify the multiplicative boost via boost=log(popularity) -Yonik http://www.lucidimagination.com
Re: DisMaxRequestHandler bf configuration
On Wed, Jan 6, 2010 at 7:43 PM, Andy angelf...@yahoo.com wrote: So if I want to configure Solr to turn every query q=foo into q={!boost b=log(popularity)}foo, dismax wouldn't work but edismax would? You can do it with dismax it's just that the syntax is slightly more convoluted. Check out the section on boosting newer documents: http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents If that's the case, can you tell me how to set up/use edismax? I can't find much documentation on it. Is it recommended for production use? It's in trunk (not 1.4). -Yonik http://www.lucidimagination.com
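[For the archives, a sketch of what the edismax setup might look like; the handler name and qf fields are assumptions, and edismax was in trunk rather than in 1.4 at the time of this thread:]

```xml
<!-- Sketch: handler name and qf fields are assumptions; edismax is trunk-only here. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^2 body</str>
    <!-- multiplicative boost, per Yonik's note in this thread -->
    <str name="boost">log(popularity)</str>
  </lst>
</requestHandler>
```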
Re: SOLR or Hibernate Search?
Hi! Thanks for the answers. These were crucial to my decision. I've adopted Solr in my application. On Wed, Dec 30, 2009 at 2:00 AM, Ryan McKinley ryan...@gmail.com wrote: If you need to search via the Hibernate API, then use Hibernate Search. If you need a scalable HTTP (REST) interface, then Solr may be the way to go. Also, I don't think Hibernate has anything like the faceting / complex query stuff etc. On Dec 29, 2009, at 3:25 PM, Márcio Paulino wrote: Hey Everyone! I was making a comparison of both technologies (SOLR and Hibernate Search) and I see many things are equal. Could anyone tell me when I should use SOLR and when I should use Hibernate Search? In my project I will have: 1. Queries on indexed fields (Strings) and on non-indexed fields (Integer, Float, Date). [In Hibernate Search or in SOLR, I must search on the index and, with the results of that query, search on the database (I can't search in both places at the same time).] I will have searches like: Give me all Registers Where Value > 190 And Name Contains 'JAVA' 2. My client needs to process a lot of email (20,000 per day) and I must index all fields (excluding sentDate), including attachments, and performance is a requirement of my system 3. My application is multi-client, and I need to separate the index by client. In this scenario, what's the best solution? SOLR or Hibernate Search? I see SOLR is a dedicated server and has good performance tests. I don't see advantages to using Hibernate Search in comparison with SOLR (except the fact that it integrates with my mapped objects). Thanks for the help -- att, Márcio Paulino Campo Grande - MS MSN / Gtalk: mcopaul...@gmail.com ICQ: 155897898
Re: DisMaxRequestHandler bf configuration
I meant can I do it with dismax without modifying every single query? I'm accessing Solr through haystack and all queries are generated by haystack. I'd much rather not have to go under haystack to modify the generated queries. Hence I'm trying to find a way to boost every query by default. --- On Wed, 1/6/10, Yonik Seeley yo...@lucidimagination.com wrote: From: Yonik Seeley yo...@lucidimagination.com Subject: Re: DisMaxRequestHandler bf configuration To: solr-user@lucene.apache.org Date: Wednesday, January 6, 2010, 7:48 PM On Wed, Jan 6, 2010 at 7:43 PM, Andy angelf...@yahoo.com wrote: So if I want to configure Solr to turn every query q=foo into q={!boost b=log(popularity)}foo, dismax wouldn't work but edismax would? You can do it with dismax it's just that the syntax is slightly more convoluted. Check out the section on boosting newer documents: http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
Re: DisMaxRequestHandler bf configuration
On Wed, Jan 6, 2010 at 8:24 PM, Andy angelf...@yahoo.com wrote: I meant can I do it with dismax without modifying every single query? I'm accessing Solr through haystack and all queries are generated by haystack. I'd much rather not have to go under haystack to modify the generated queries. Hence I'm trying to find a way to boost every query by default. If you can get haystack to pass through the user query as something like qq, then yes - just use something like the last link I showed at http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents and set defaults for everything except qq. -Yonik http://www.lucidimagination.com
Re: DisMaxRequestHandler bf configuration
Let me make sure I understand you. I'd get my regular query from haystack as qq=foo rather than q=foo. Then I put this in solrconfig.xml within the dismax section:

  <str name="q.alt">{!boost b=$popularityboost v=$qq}</str>
  <str name="popularityboost">log(popularity)</str>

Is that what you meant? --- On Wed, 1/6/10, Yonik Seeley yo...@lucidimagination.com wrote: If you can get haystack to pass through the user query as something like qq, then yes - just use something like the last link I showed at http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents and set defaults for everything except qq. -Yonik http://www.lucidimagination.com
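[For reference, spelled out per the boosting-newer-documents pattern on the wiki linked in this thread and adapted to the popularity boost discussed here; the qf line is an assumption. Note the {!boost} query goes in q (parsed by the default lucene parser, with dismax only inside the local params), so the top-level defType is left unset:]

```xml
<!-- Sketch following the SolrRelevancyFAQ pattern; qf fields are assumptions.
     Each request then supplies only qq=<user query>. -->
<lst name="defaults">
  <str name="q">{!boost b=$popularityboost v=$qq defType=dismax}</str>
  <str name="popularityboost">log(popularity)</str>
  <str name="qf">title^2 body</str>
</lst>
```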
Re: No Analyzer, tokenizer or stemmer works at Solr
Hello Erick, thank you for answering. I can do whatever I want - Solr does nothing. For example: if I use the textgen fieldtype, which is predefined, nothing happens to the text. Even the StopFilter is not working - no stopword from stopword.txt was removed. I think that this only affects the index, because if I query for "for", it returns nothing, which is quite correct, due to the work of the StopFilter. Everything works fine on analysis.jsp, but not in reality. If you have got any testcase data you want me to add, please tell me and I will show you the saved data afterwards. Thank you. Mitch Erick Erickson wrote: How do you know this? Because it's highly unlikely that Solr is completely broken on that level. -- View this message in context: http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27055510.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: No Analyzer, tokenizer or stemmer works at Solr
Hello Ryan, thank you for answering. In my schema.xml I am defining the field as indexed = true. The problem is: nothing works; even the original predefined analyzers don't work. Please have a look at my response to Erick. Mitch P.S. Oh, I see what you mean. The field is indexed = true. My language was a little bit tricky ;). ryantxu wrote: The stored value is always the original input. The *indexed* values are transformed by analysis. -- View this message in context: http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27055512.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: No Analyzer, tokenizer or stemmer works at Solr
Mitch, Again, I think you're misunderstanding what analysis does. You must be expecting, we think (though you've not provided exact duplication steps to be sure), that the value you get back from Solr is the analyzer-processed output. It's not; it's exactly what you provide. Internally, for searching, the analysis takes place and writes to the index in an inverted fashion, but the stored stuff is left alone. There's some thinking going on about implementing it such that analyzed output is stored. You can, however, use the analysis request handler componentry to get analyzed stuff back as you see it in analysis.jsp, on a per-document or per-field-text basis - if you're looking to leverage the analyzer output in that fashion from a client. Erik On Jan 7, 2010, at 1:21 AM, MitchK wrote: Everything works fine on analysis.jsp, but not in reality.