Re: Nested table support ability
Hi Otis, Thanks for the update. My parametric search has to span the customer table and 30 child tables. We have close to 1 million customers. Do you think Lucene/Solr is the right solution for such requirements, or would database search be more optimal? Regards, Amit -- View this message in context: http://lucene.472066.n3.nabble.com/Nested-table-support-ability-tp905253p916087.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field missing when use distributed search + dismax
Hi Lance, thanks for replying. Yes, I specifically checked the schema.xml and did another simple test. The broker is running on localhost:7499/solr and a Solr instance is running on localhost:7498/solr; for this test I only use these two instances. 7499's index is empty, 7498 has 12 documents in its index, and I copied the schema.xml from 7498 to 7499 before the test.

1. http://localhost:7498/solr/select returns:

<result name="response" numFound="12" start="0">
  <doc>
    <str name="id">gppost_6179</str>
    <str name="type">gppost</str>
  </doc>
  ...
</result>

2. http://localhost:7499/solr/select returns:

<result name="response" numFound="0" start="0"/>

3. http://localhost:7499/solr/select?shards=localhost:7498/solr returns:

<result name="response" numFound="12" start="0">
  <doc><str name="id">gppost_6179</str></doc>
  <doc><str name="id">gppost_6282</str></doc>
  ...
</result>

So strange! I then checked with the standard search handler.

1. http://localhost:7499/solr/select?shards=localhost:7498/solr&q=marship returns:

<result name="response" numFound="1" start="0">
  <doc>
    <str name="id">member_marship11</str>
    <str name="type">member</str>
    <date name="date">2010-01-21T00:00:00Z</date>
  </doc>
</result>

And 2. http://localhost:7499/solr/select?shards=localhost:7498/solr&q=marship&qt=dismax returns:

<result name="response" numFound="1" start="0">
  <doc><str name="id">member_marship11</str></doc>
</result>

So strange! On Wed, Jun 23, 2010 at 11:12 AM, Lance Norskog goks...@gmail.com wrote: Do all of the Solr instances, including the broker, use the same schema.xml? On 6/22/10, Scott Zhang macromars...@gmail.com wrote: Hi all. I was using distributed search over 30 Solr instances; the previous setup used the standard query handler and the result was returned correctly, each result having 2 fields, ID and type. Today I wanted to search with dismax. I tried searching each instance with dismax and it works correctly, returning ID and type for each result. The strange thing is that when I use distributed search, the results only have ID; the type field disappeared. I need that type to know what the ID refers to. Why does Solr eat my type? Thanks. Regards.
Scott -- Lance Norskog goks...@gmail.com
Re: Nested table support ability
Amit - unless you test it, it would not be apparent. The key piece is, as Otis mentioned, to flatten everything. This requires effort on your side to create documents in a manner suitable for your searches: the relationships need to be merged into the document. To avoid storing text representations, you may want to store just the identifier and use the front end to translate between human-readable text and the stored identifier. Taking your case further: rather than storing ADMIN, store just a representation, perhaps a smallint, with the customer information. On Wed, Jun 23, 2010 at 11:30 AM, amit_ak amit...@mindtree.com wrote: Hi Otis, Thanks for the update. My parametric search has to span the customer table and 30 child tables. We have close to 1 million customers. Do you think Lucene/Solr is the right solution for such requirements, or would database search be more optimal? Regards, Amit
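A minimal sketch of the flattening Govind and Otis describe, independent of Solr itself: each customer row is merged with its child-table rows into one flat document, storing only identifiers for the related rows. The field names and data here are hypothetical, purely for illustration.

```python
# Sketch: denormalize a customer row plus its child-table rows into one
# flat document suitable for indexing. All names/values are made up.
def flatten(customer, child_rows):
    doc = {"id": customer["id"], "name": customer["name"]}
    for table, rows in child_rows.items():
        # one multiValued field per child table, holding identifiers only;
        # the front end translates identifiers back to display text
        doc[table + "_ids"] = [r["id"] for r in rows]
    return doc

customer = {"id": "cust_1", "name": "Acme"}
children = {"order": [{"id": 7}, {"id": 9}], "role": [{"id": 2}]}

doc = flatten(customer, children)
print(doc)  # {'id': 'cust_1', 'name': 'Acme', 'order_ids': [7, 9], 'role_ids': [2]}
```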
Re: Field Collapsing SOLR-236
Hi, patching did work, but when I build the trunk I get the following exception:

[SolrTrunk]# ant compile
Buildfile: /testWorkspace/SolrTrunk/build.xml
init-forrest-entities:
    [mkdir] Created dir: /testWorkspace/SolrTrunk/build
    [mkdir] Created dir: /testWorkspace/SolrTrunk/build/web
compile-lucene:
BUILD FAILED
/testWorkspace/SolrTrunk/common-build.xml:207: /testWorkspace/modules/analysis/common does not exist.

Regards, Raakhi On Wed, Jun 23, 2010 at 2:39 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: What exactly did not work? Patching, compiling or running it? On 22 June 2010 16:06, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, I tried checking out the latest code (rev 956715); the patch did not work on it. In fact I even tried hunting for the revision mentioned earlier in this thread (i.e. rev 955615) but cannot find it in the repository (it has revision 955569 followed by revision 955785). Any pointers? Regards, Raakhi On Tue, Jun 22, 2010 at 2:03 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Oh in that case is the code stable enough to use it for production? - Well, this feature is a patch and I think that says it all. Although bugs are fixed, it is definitely an experimental feature and people should keep that in mind when using one of the patches. Does it support features which Solr 1.4 normally supports? - As far as I know, yes. I am using facets as a workaround but then I am not able to sort on any other field; is there any workaround to support this feature? - Maybe http://wiki.apache.org/solr/Deduplication prevents adding duplicates to your index, but then you miss the collapse counts and other computed values. On 21 June 2010 09:04, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, Oh in that case is the code stable enough to use it for production? Does it support features which Solr 1.4 normally supports? I am using facets as a workaround but then I am not able to sort on any other field. Is there any workaround to support this feature? Regards, Raakhi On Fri, Jun 18, 2010 at 6:14 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Rakhi, the patch is not compatible with 1.4. If you want to work with the trunk, you'll need to get the src from https://svn.apache.org/repos/asf/lucene/dev/trunk/ Martijn On 18 June 2010 13:46, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi Moazzam, where did you get the src code from? I am downloading it from https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4 and the latest revision in this location is 955469, so applying the latest patch (dated 17th June 2010) on it still generates errors. Any pointers? Regards, Raakhi On Fri, Jun 18, 2010 at 1:24 AM, Moazzam Khan moazz...@gmail.com wrote: I knew it wasn't me! :) I found the patch just before I read this and applied it to the trunk and it works! Thanks Mark and Martijn for all your help! - Moazzam On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: I've added a new patch to the issue, so building the trunk (rev 955615) with the latest patch should not be a problem. Due to recent changes in the Lucene trunk the patch was not compatible. On 17 June 2010 20:20, Erik Hatcher erik.hatc...@gmail.com wrote: On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote: p.s. I'd be glad to contribute our Maven build re-organization back to the community to get Solr properly Mavenized so that it can be distributed and released more often. For us the benefit of this structure is that we will be able to overlay addons such as RequestHandlers and other third-party support without having to rebuild Solr from scratch. But you don't have to rebuild Solr from scratch to add a new request handler or other plugins - simply compile your custom stuff into a JAR and put it in solr-home/lib (or point to it with a lib directive in solrconfig.xml).
Ideally, a Maven Archetype could be created that would allow one to rapidly produce a Solr webapp and fire it up in Jetty in mere seconds. How's that any different from cd example; java -jar start.jar? Or do you mean a Solr client webapp? Finally, with projects such as Bobo, integration with Spring would make configuration more consistent and require significantly less Java coding just to add new capabilities every time someone authors a new RequestHandler. It's one line of config to add a new request handler. How many ridiculously ugly confusing lines of Spring XML would it take? The biggest thing I learned
Re: Field Collapsing SOLR-236
Oops, this is probably because I didn't check out the modules directory from the trunk. Doing that right now :) Regards, Raakhi On Wed, Jun 23, 2010 at 1:12 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, patching did work, but when I build the trunk I get the following exception: BUILD FAILED /testWorkspace/SolrTrunk/common-build.xml:207: /testWorkspace/modules/analysis/common does not exist.
Re: Field missing when use distributed search + dismax
Hi all, I found out more about the missing-fields issue. I tried the default distributed search example, which configures 2 instances, one on 8983 and another on 7574. When I search with the standard query handler, the result fields are all right. When I search with the default dismax, some fields disappear. Not sure why. Can anyone test this and confirm the reason? Thanks. Regards. On Wed, Jun 23, 2010 at 2:50 PM, Scott Zhang macromars...@gmail.com wrote: Hi Lance, thanks for replying. Yes, I specifically checked the schema.xml and did another simple test.
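One thing worth trying while debugging this (an assumption on my part, not a confirmed fix): a dismax handler configured in solrconfig.xml may carry its own default fl that differs from the standard handler's, so passing fl explicitly on the distributed request makes the expected fields unambiguous. A sketch of building such a request URL, using the host/port values from the examples above:

```python
from urllib.parse import urlencode

# Sketch: distributed dismax query asking for id and type explicitly
# via the fl parameter. Ports follow Scott's test setup above.
params = {
    "shards": "localhost:7498/solr",
    "q": "marship",
    "qt": "dismax",
    "fl": "id,type",
}
url = "http://localhost:7499/solr/select?" + urlencode(params)
print(url)
```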
Re: Searching across multiple repeating fields
Cheers, Geert-Jan, that's very helpful. We won't always be searching with dates and we wouldn't want duplicates to show up in the results, so your second suggestion looks like a good workaround if I can't solve the actual problem. I didn't know about FieldCollapsing, so I'll definitely keep it in mind. Thanks, Mark On 22 Jun 2010, at 3:44 pm, Geert-Jan Brits wrote: Perhaps my answer is useless, because I don't have an answer to your direct question, but you *might* want to consider whether your concept of a Solr document is on the correct granular level, i.e. your problem as posted could be tackled (afaik) by defining a document to be a 'sub-event' with only 1 date range. So each event-doc you have now is replaced by several sub-event docs in this proposed situation. Additionally, each sub-event doc gets a field 'parent-eventid' which maps to something like an event-id (which you're probably using), so several sub-event docs can point to the same event-id. Lastly, all sub-event docs belonging to a particular event carry all the other fields that you may have stored in that particular event-doc. Now you can query for events based on date ranges like you envisioned, but instead of returning events you return sub-event docs. However, since all data of the original event (except the multiple date ranges) is available in the sub-event doc, this shouldn't really bother the client. If you need to display all dates of an event (the only info missing from the returned Solr doc) you could easily store them in an RDB and fetch them using the defined parent-eventid. The only caveat I see is that multiple sub-events with the same 'parent-eventid' might get returned for a particular query. This however depends on the type of queries you envision, i.e.: 1) If you always issue queries with date-filters, and *assuming* that sub-events of a particular event don't temporally overlap, you will never get multiple sub-events returned.
2) If 1) doesn't hold, and assuming you *do* mind multiple sub-events of the same actual event, you could try to use Field Collapsing on 'parent-eventid' to only return the first sub-event per parent-eventid that matches the rest of your query. (Note, however, that Field Collapsing is a patch at the moment: http://wiki.apache.org/solr/FieldCollapsing) Not sure if this helped you at all, but at the very least it was a nice conceptual exercise ;-) Cheers, Geert-Jan 2010/6/22 Mark Allan mark.al...@ed.ac.uk Hi all, Firstly, I apologise for the length of this email, but I need to describe properly what I'm doing before I get to the problem! I'm working on a project just now which requires the ability to store and search on temporal coverage data - i.e. a field which specifies a date range during which a certain event took place. I hunted around for a few days and couldn't find anything which seemed to fit, so I had a go at writing my own field type based on solr.PointType. It's used as follows:

schema.xml:
<fieldType name="temporal" class="solr.TemporalCoverage" dimension="2" subFieldSuffix="_i"/>
<field name="daterange" type="temporal" indexed="true" stored="true" multiValued="true"/>

data.xml:
<add>
  <doc>
    ...
    <field name="daterange">1940,1945</field>
  </doc>
</add>

Internally, this gets stored as:

<arr name="daterange"><str>1940,1945</str></arr>
<int name="daterange_0_i">1940</int>
<int name="daterange_1_i">1945</int>

In due course, I'll declare the subfields as a proper date type, but in the meantime this works absolutely fine. I can search for an individual date and Solr will check (queryDate >= daterange_0 AND queryDate <= daterange_1), and the correct documents are returned. My code also allows the user to input a date range in the query, but I won't complicate matters with that just now! The problem arises when a document has more than one daterange field (imagine a news broadcast which covers a variety of topics and hence time periods). A document with two daterange fields:

<doc>
  ...
  <field name="daterange">19820402,19820614</field>
  <field name="daterange">1990,2000</field>
</doc>

gets stored internally as:

<arr name="daterange"><str>19820402,19820614</str><str>1990,2000</str></arr>
<arr name="daterange_0_i"><int>19820402</int><int>1990</int></arr>
<arr name="daterange_1_i"><int>19820614</int><int>2000</int></arr>

In this situation, searching for 1985 should yield zero results, as it is contained within neither daterange; however, the above document is returned in the result set. What Solr is doing is checking that the queryDate (1985) is greater than *any* of the values in daterange_0 AND the queryDate is less than *any* of the values in daterange_1. How can I get Solr to respect the positions of each item in the daterange_0 and _1 arrays? Ideally I'd like the search to use the following logic, thus preventing the above document from being returned in a search for 1985:
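The mismatch Mark describes can be sketched in a few lines, independent of Solr: the flattened check compares the query date against the pooled start values and the pooled end values separately, while the desired check keeps each (start, end) pair together. This is an illustration of the logic only, not Solr code; year-only values are used for simplicity.

```python
def flattened_match(query, starts, ends):
    # What the split subfields give you: any start below AND any end above,
    # even if they come from different ranges.
    return any(query >= s for s in starts) and any(query <= e for e in ends)

def paired_match(query, ranges):
    # Desired behaviour: the query date must fall inside one single range.
    return any(s <= query <= e for s, e in ranges)

ranges = [(1940, 1945), (1990, 2000)]
starts = [s for s, _ in ranges]
ends = [e for _, e in ranges]

print(flattened_match(1985, starts, ends))  # True  (the spurious match)
print(paired_match(1985, ranges))           # False (correct: 1985 is in neither range)
```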
Re: OOM on sorting on dynamic fields
Hi to all, we moved Solr with the patched Lucene FieldCache into our production environment. During tests we noticed random ConcurrentModificationExceptions when calling the getCacheEntries method, due to this bug: https://issues.apache.org/jira/browse/LUCENE-2273 We applied that patch as well, and added an abstract int getCacheSize() method to the FieldCache abstract class, with its implementation in the abstract Cache inner class in FieldCacheImpl, which returns the cache size without instantiating a CacheEntry array. Response times are slower on cache purging, but acceptable from the user's point of view. Regards, Matteo On 22 June 2010 22:41, Matteo Fiandesio matteo.fiande...@gmail.com wrote: The fields I'm sorting on are dynamic, so one query sorts on erick_time_1 and erick_timeA_1, another sorts on erick_time_2, and so on. What we see in the heap are a lot of arrays, most of them filled with 0s, maybe due to the fact that these timestamp fields are not present in all the documents. By the way, I have a script that generates the OOM in 10 minutes on our Solr instance, and with the temporary patch it ran without any problems. The side effect is that when the cache is purged, the next query that regenerates the cache is a little bit slower. I'm aware that the solution is inelegant and we are investigating solving the problem in another way. Regards, Matteo On 22 June 2010 19:25, Erick Erickson erickerick...@gmail.com wrote: Hmmm, I'm missing something here then. Sorting over 15 fields of type long shouldn't use much memory, even if all the values are unique. When you say 12-15 dynamic fields, are you talking about 12-15 fields per query out of XXX total fields? And is XXX large? At a guess, how many different fields do you think you're sorting over cumulatively by the time you get your OOM? Note that if you sort over the field erick_time in 10 different queries, I'm only counting that as 1 field. I guess another way of asking this is: how many dynamic fields are there in total?
If this is really a sorting issue, you should be able to force it to happen almost immediately by firing off enough sort queries at the server. It'll tell you a lot if you can't make this happen, even on a relatively small test machine. Best, Erick On Tue, Jun 22, 2010 at 12:59 PM, Matteo Fiandesio matteo.fiande...@gmail.com wrote: Hi Erick, the index is quite small (1691145 docs) but sorting is massive and often on unique timestamp fields. OOMs occur after somewhere between three and four hours, depending also on whether users browse certain parts of the application. We use solrj to make the queries, so we do not use Reader objects directly. Without sorting we don't see the problem. Regards, Matteo On 22 June 2010 17:01, Erick Erickson erickerick...@gmail.com wrote: Hmmm. A couple of details I'm wondering about. How many documents are we talking about in your index? Do you get OOMs when you start fresh or does it take a while? You've done some good investigation, so it seems like there could well be something else going on here than just the usual suspects of sorting. I'm wondering if you aren't really closing readers somehow. Are you updating your index frequently and re-opening readers often? If so, how? I'm assuming that if you do NOT sort on all these fields, you don't have the problem; is that true? Best, Erick On Fri, Jun 18, 2010 at 10:52 AM, Matteo Fiandesio matteo.fiande...@gmail.com wrote: Hello, we are experiencing OOM exceptions in our single-core Solr instance (on a (huge) Amazon EC2 machine). We investigated a lot in the mailing list and through jmap/jhat dump analysis, and the problem resides in the Lucene FieldCache, which fills the heap and blows up the server. Our index is quite small, but we have a lot of sort queries on fields that are dynamic, of type long representing timestamps, and not present in all the documents. Those queries apply sorting on 12-15 of those fields.
We are using Solr 1.4 in production, and the dump shows a lot of Integer/Character and byte arrays filled up with 0s. With Solr's trunk code things do not change. In the mailing list we saw a lot of messages related to this issue: we tried truncating the dates to day precision, using missingSortLast=true, changing the field type from slong to long, setting autowarming to different values, and disabling and enabling caches with different values, but we did not manage to solve the problem. We were thinking of implementing an LRUFieldCache field type to manage the FieldCache as an LRU and prevent it from growing without bound, but before starting a new development we want to be sure that we are not doing anything wrong in the Solr configuration or in the index generation. Any help would be appreciated. Regards, Matteo
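The LRU idea Matteo mentions can be sketched independently of Lucene (this is not the FieldCache API, just the eviction policy he is proposing): keep a bounded map of per-field sort arrays and evict the least recently used entry once the cap is exceeded.

```python
from collections import OrderedDict

class LRUCache:
    """Bounded cache that evicts the least recently used entry.
    Keys stand in for sort-field names, values for their cached arrays."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # mark as recently used
            return self.data[key]
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the oldest entry

cache = LRUCache(2)
cache.put("erick_time_1", [0] * 4)
cache.put("erick_time_2", [0] * 4)
cache.get("erick_time_1")          # touch, so erick_time_2 becomes the victim
cache.put("erick_time_3", [0] * 4)
print(list(cache.data))  # ['erick_time_1', 'erick_time_3']
```

The trade-off is exactly what Matteo observed with purging: the first query after an eviction pays the cost of rebuilding the array.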
Re: Field Collapsing SOLR-236
fieldType: analyzer without class or tokenizer filter list seems to point to the config - you may want to correct it. On Wed, Jun 23, 2010 at 3:09 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, I checked out modules and lucene from the trunk and performed a build using the following commands:

ant clean
ant compile
ant example

which compiled successfully. I then put my existing index (using schema.xml from solr1.4.0/conf/solr/) in the multicore folder, configured solr.xml and started the server. When I go to http://localhost:8983/solr I get the following error:

org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType: analyzer without class or tokenizer filter list
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:168)
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:480)
    at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:122)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:429)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:286)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:198)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:123)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
    at org.mortbay.jetty.Server.doStart(Server.java:224)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.mortbay.start.Main.invokeMain(Main.java:194)
    at org.mortbay.start.Main.start(Main.java:534)
    at org.mortbay.start.Main.start(Main.java:441)
    at org.mortbay.start.Main.main(Main.java:119)
Caused by: org.apache.solr.common.SolrException: analyzer without class or tokenizer filter list
    at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:908)
    at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:60)
    at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:450)
    at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:435)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:142)
    ... 32 more

I then picked up an existing index (schema.xml from solr1.3/solr/conf), put it in the multicore folder, configured solr.xml and restarted; collapsing worked fine. Any pointers on which part of schema.xml (Solr 1.4) is causing this exception? Regards, Raakhi On Wed, Jun 23, 2010 at 1:35 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: Oops, this is probably because I didn't check out the modules directory from the trunk.
doing that right now :) Regards, Raakhi
TermsComponent - AutoComplete - Multiple Term Suggestions Inclusive Search?
Hi, I'm using the TermsComponent to set up an autocomplete feature based on a String field. Here are the params I'm using: terms=true&terms.fl=type&terms.lower=cat&terms.prefix=cat&terms.lower.incl=false With the above params, I've been able to get suggestions for terms that start with the specified prefix. I'm wondering whether it's possible to: - have inclusive search, i.e., by typing cat, we get category, subcategory, etc.? - start suggestions from any word in the field, i.e., by typing cat, we get The best category...? Thanks! -Saïd
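What is being asked for here is infix matching, which TermsComponent's prefix mode does not do by itself; common workarounds (an assumption, not advice from the thread) are n-gram analysis at index time or filtering the term list client-side. The two desired behaviours can be sketched client-side; the term list is made up for illustration:

```python
terms = ["category", "subcategory", "The best category", "cattle", "dog"]

def contains_match(typed, terms):
    # "inclusive" search: the typed text may appear anywhere in the term
    return [t for t in terms if typed in t.lower()]

def word_start_match(typed, terms):
    # match when any word within the term starts with the typed text
    return [t for t in terms
            if any(w.startswith(typed) for w in t.lower().split())]

print(contains_match("cat", terms))    # everything except 'dog'
print(word_start_match("cat", terms))  # 'category', 'The best category', 'cattle'
```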
Re: Field Collapsing SOLR-236
Hi, but there are almost no settings in my config; here's a snapshot of what I have in my solrconfig.xml:

<config>
  <updateHandler class="solr.DirectUpdateHandler2" />
  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
  </requestDispatcher>
  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
  <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
  <!-- config for the admin interface -->
  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>
  <!-- config for field collapsing -->
  <searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent" />
</config>

Am I going wrong anywhere? Regards, Raakhi On Wed, Jun 23, 2010 at 3:28 PM, Govind Kanshi govind.kan...@gmail.com wrote: fieldType: analyzer without class or tokenizer filter list seems to point to the config - you may want to correct it.
Import XML files different format?
Hi, I'm new to Solr. It looks great. I would like to add an XML document in the following format to Solr:

<?xml version="1.0" encoding="utf-8"?>
<race>
  <go>
    <id><![CDATA[...]]></id>
    <title><![CDATA[...]]></title>
    <url><![CDATA[...]]></url>
    <content><![CDATA[...]]></content>
    <city><![CDATA[...]]></city>
    <postcode><![CDATA[...]]></postcode>
    <contract><![CDATA[...]]></contract>
    <category><![CDATA[...]]></category>
    <date><![CDATA[...]]></date>
    <time><![CDATA[...]]></time>
  </go>
  etc...
</race>

Is there a way to do this? If yes, how? Or do I need to convert it with some scripts to this:

<add>
  <doc>
    <field name="authors">Patrick Eagar</field>
    <field name="subject">Sports</field>
    etc...

Thanks for your help. Regards
Re: Import XML files different format?
You can use DataImportHandler's XML/XPath capabilities to do this: http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource or you could, of course, convert your XML to Solr's XML format. Another fine option, given what this data looks like, is CSV format. I'd imagine you have the original data in a relational database, though? Erik
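For concreteness, a minimal data-config.xml for the <race>/<go> format above might look something like this (the file path and the assumption that each child element maps straight to a same-named Solr field are mine, not tested against the poster's data):

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- each <go> element becomes one Solr document -->
    <entity name="race" processor="XPathEntityProcessor"
            url="/path/to/race.xml" forEach="/race/go">
      <field column="id"       xpath="/race/go/id"/>
      <field column="title"    xpath="/race/go/title"/>
      <field column="url"      xpath="/race/go/url"/>
      <field column="content"  xpath="/race/go/content"/>
      <field column="city"     xpath="/race/go/city"/>
      <field column="postcode" xpath="/race/go/postcode"/>
      <field column="category" xpath="/race/go/category"/>
      <field column="date"     xpath="/race/go/date"/>
    </entity>
  </document>
</dataConfig>
```

The CDATA sections should come through as plain text; only the field names above would need to exist in schema.xml.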
Re: Import XML files different format?
Thanks Erik for your answer. I'll try to use DIH via data-config.xml, as I might index other content with a different XML structure in the future... Will I need a different data-config for each XML structure, and then manually change between them?
Re: TermsComponent - AutoComplete - Multiple Term Suggestions Inclusive Search?
Hi Saïd, I think your problem is the field's type: String. You have to use a TextField and apply tokenizers/filters that will find subcategory if you put in cat. (Not sure which filter does that, though. I wouldn't think the PorterStemmer cuts off prefix syllables of that kind?) If, however, you search on an analyzed version of the field, it should return hits as usual according to the analyzer chain, and you can then use the values of that field listed in the hits as suggestions. Example: input: potter; field type: solr.TextField (with porter stemmer); finds: Harry Potter and Whatever and also Potters and Plums. Cheers, Chantal

On Wed, 2010-06-23 at 13:17 +0200, Saïd Radhouani wrote: Hi, I'm using the Terms Component to set up the autocomplete feature based on a String field. Here are the params I'm using: terms=true&terms.fl=type&terms.lower=cat&terms.prefix=cat&terms.lower.incl=false With the above params, I've been able to get suggestions for terms that start with the specified prefix. I'm wondering whether it's possible to: - have inclusive search, i.e., by typing cat, we get category, subcategory, etc.? - start suggestion from any word in the field, i.e., by typing cat, we get The best category...? Thanks! -Saïd
Alphabetic range
Hello all, I have been trying for several days to build up an alphabetical range. I will explain all the steps (I have the Solr 1.4 Enterprise Search Server book written by Smiley and Pugh). I want to get all artists beginning with the first two letters. If I request mi, I want to have as response michael jackson and all artist names beginning with mi. I defined a field type similar to Smiley and Pugh's example p.148:

<fieldType name="bucketFirstTwoLetters" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="^([a-zA-Z])([a-zA-Z]).*" group="2"/> <!-- the first two letters -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

I defined the field ArtistSort like:

<field name="ArtistSort" type="bucketFirstTwoLetters" stored="true" multivalued="false"/>

To the request:
http://localhost:8983/solr/music/select?indent=on&q=yu&qt=standard&wt=standard&facet=on&facet.field=ArtistSort&facet.sort=lex&facet.missing=on&facet.method=enum&fl=ArtistSort
I get: http://lucene.472066.n3.nabble.com/file/n916716/select.xml select.xml

I don't understand why the pattern doesn't match exactly. For example An An Yu matches, but I only want artists whose names begin with yu. And I know that an artist named ReYu would match, because ReYu would be interpreted as Re Yu (as two words). I also tried to make another type of query like:
http://localhost:8983/solr/music/select?indent=on&version=2.2&q=ArtistSort:mi*&fq=&start=0&rows=10&fl=ArtistSort&qt=standard&wt=standard&explainOther=&hl.fl=
I get exactly what I want. I made several tries; I get only artist names which begin with the right first two letters.
But I get very few responses, see here:

<result name="response" numFound="6" start="0">
  <doc><str name="ArtistSort">mike manne and tiger blues</str></doc>
  <doc><str name="ArtistSort">mimika</str></doc>
  <doc><str name="ArtistSort">miduno</str></doc>
  <doc><str name="ArtistSort">milue macïro</str></doc>
  <doc><str name="ArtistSort">mister pringle</str></doc>
  <doc><str name="ArtistSort">mimmai</str></doc>
</result>

In my index there are more than 80,000 artists... I really don't understand why I can't get more responses. I have been thinking about the problem for days and days and now my brain freezes. Thank you in advance. Sophie -- View this message in context: http://lucene.472066.n3.nabble.com/Alphabetic-range-tp916716p916716.html Sent from the Solr - User mailing list archive at Nabble.com.
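One thing that may be worth trying (an untested sketch, my own suggestion rather than the book's recipe): capture both letters in a single group, so the index analyzer emits exactly one two-letter token per name, and lowercase both sides so that mi also buckets Michael:

```xml
<fieldType name="bucketFirstTwoLetters" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <!-- one capture group for the first two letters: "michael jackson" yields the single token "mi" -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="^([a-zA-Z]{2}).*" group="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With group="2" and two separate single-letter groups, the tokenizer is not emitting the two-letter prefix you intend, which could be part of why the buckets don't line up.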
Setting up Eclipse with merged Lucene Solr source tree
Hi, I'm trying to set up an Eclipse environment for the combined Lusolr tree. I've created a Lucene project containing /trunk/lusolr/lucene and /trunk/lusolr/modules as one project and /trunk/lusolr/solr as another. I've added the Lucene project as a dependency of the Solr project and removed the Solr libs from the Lucene project. The Lucene source tree is fine, but in the Solr tree I get 5 errors:

The method getTextContent() is undefined for the type Node - TestConfig.java /Solr/src/test/org/apache/solr/core line 91
The method getTextContent() is undefined for the type Node - TestConfig.java /Solr/src/test/org/apache/solr/core line 94
The method setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory - Config.java /Solr/src/java/org/apache/solr/core line 113
The method setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory - DataImporter.java /Solr/contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport line
The method setXIncludeAware(boolean) is undefined for the type Object - TestXIncludeConfig.java /Solr/src/test/org/apache/solr/core line 32

Is this the correct way to set up Eclipse after the source tree merge? Thanks in advance, Ukyo
dataimport.properties is not updated on delta-import
Hello! I am having some difficulties getting dataimport (DIH) to behave correctly in Solr 1.4.0. Indexing itself works just as it is supposed to, with both full-import and delta-import adding modified or newly created records to the index. The problem, however, is that the date and time of the last delta-import is not updated in the dataimport.properties file. The only time the file gets updated is when performing a full-import. Now, this is not a huge problem since delta-import will simply disregard records already imported (due to the primary key), but it seems wasteful to fetch records which have already been added on previous runs. Also, as the database grows the delta-imports will take longer and longer. Does anyone know of anything I might have overlooked, or of known bugs? Thanks in advance! Johan Andersson -- View this message in context: http://lucene.472066.n3.nabble.com/dataimport-properties-is-not-updated-on-delta-import-tp916753p916753.html Sent from the Solr - User mailing list archive at Nabble.com.
Indexing Rich Format Documents using Data Import Handler (DIH) and the TikaEntityProcessor
Please refer to this thread for history: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201006.mbox/%3c4c1b6bb6.7010...@gmail.com%3e I'm trying to integrate the TikaEntityProcessor as suggested. I'm using Solr version 1.4.0 and getting the following error:

java.lang.ClassNotFoundException: Unable to load BinURLDataSource or org.apache.solr.handler.dataimport.BinURLDataSource

curl -s http://test.html | curl http://localhost:9080/solr/update/extract?extractOnly=true --data-binary @- -H 'Content-type:text/html'

... works fine, so presumably my Tika processor is working. My data-config.xml looks like this:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@whatever:12345:whatever" user="me" name="ds-db" password="secret"/>
  <dataSource type="BinURLDataSource" name="ds-url"/>
  <document>
    <entity name="my_database" dataSource="ds-db" query="select * from my_database where rownum &lt;= 2">
      <field column="CONTENT_ID"  name="content_id"/>
      <field column="CMS_TITLE"   name="cms_title"/>
      <field column="FORM_TITLE"  name="form_title"/>
      <field column="FILE_SIZE"   name="file_size"/>
      <field column="KEYWORDS"    name="keywords"/>
      <field column="DESCRIPTION" name="description"/>
      <field column="CONTENT_URL" name="content_url"/>
    </entity>
    <entity name="my_database_url" dataSource="ds-url" query="select CONTENT_URL from my_database where content_id='${my_database.CONTENT_ID}'">
      <entity processor="TikaEntityProcessor" dataSource="ds-url" format="text" url="http://www.mysite.com/${my_database.content_url}">
        <field column="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>

I added the my_database_url entity to an existing (working) database entity to have Tika index the content pointed to by content_url. Is there anything obviously wrong with what I've tried so far? It keeps rolling back with the error above. Thanks - Tod
Re: TermsComponent - AutoComplete - Multiple Term Suggestions Inclusive Search?
To build your autocompletion, you can use the NGramFilterFactory. If you type cat, it will match subcategory and the best category. If you change your mind and no longer want to match subcategory, you can use the EdgeNGramFilterFactory instead. -- View this message in context: http://lucene.472066.n3.nabble.com/TermsComponent-AutoComplete-Multiple-Term-Suggestions-Inclusive-Search-tp916530p916769.html Sent from the Solr - User mailing list archive at Nabble.com.
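A field type along these lines should give the behaviour described (the type name and gram sizes are illustrative, so treat it as a sketch):

```xml
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- NGramFilterFactory indexes substrings anywhere inside a token, so "cat" matches "subcategory";
         swap in solr.EdgeNGramFilterFactory to match token prefixes only -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note the query analyzer deliberately omits the n-gram filter, so the typed prefix is matched as-is against the indexed grams.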
Re: dataimport.properties is not updated on delta-import
Hi, what I have experienced is that the primary key seems to be case sensitive for the delta queries, at least for some JDBC drivers... see http://lucene.472066.n3.nabble.com/Problem-with-DIH-delta-import-on-JDBC-tp763469p765262.html ... so make sure you specify it with the correct case (e.g. ID instead of id) in your db-data-config.xml. Maybe that's the problem... Cheers, Stefan

-- *** Stefan Moises, Senior Softwareentwickler, shoptimax GmbH, Guntherstraße 45 a, 90461 Nürnberg, Amtsgericht Nürnberg HRB 21703, GF Friedrich Schreieck, Tel.: 0911/25566-25, Fax: 0911/25566-29, moi...@shoptimax.de, http://www.shoptimax.de ***
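For reference, a delta-capable entity usually looks something like this (table and column names are invented; the point is that pk and the ${dataimporter.delta...} reference must match the case the driver reports):

```xml
<entity name="item" pk="ID"
        query="select * from item"
        deltaQuery="select ID from item where last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select * from item where ID = '${dataimporter.delta.ID}'"/>
```

If pk is declared as lowercase id while the driver returns ID, the delta bookkeeping can silently fail, which would also explain dataimport.properties never being rewritten.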
fuzzy query performance
Hi! How can I improve the performance of a fuzzy search like mihchael~0.7 over a relatively large index (~1 million docs)? It currently takes over 15 seconds when run against the normal text search field. I searched the web and the JIRA and couldn't find anything related. Any pointers or ideas would be appreciated! Regards, Peter.
Re: fuzzy query performance
Solr trunk should have much improved fuzzy speeds (due to some very cool work that was done in Lucene) - are you using 1.4? -- - Mark http://www.lucidimagination.com
Re: Setting up Eclipse with merged Lucene Solr source tree
Did you see this page? http://wiki.apache.org/solr/HowToContribute Especially the section Development Environment Tips, down near the end. HTH, Erick
Re: Help with highlighting
Here's my request: q=ASA+AND+minisite_id%3A36&version=1.3&json.nl=map&rows=10&start=0&wt=json&hl=true&hl.fl=%2A&hl.simple.pre=%3Cspan+class%3D%22hl%22%3E&hl.simple.post=%3C%2Fspan%3E&hl.fragsize=0&hl.mergeContiguous=false And here's what happened: It didn't return results, even when I applied an asterisk for which fields to highlight. I tried other fields and that didn't work either; all_text is the only one that works. Any other ideas why the other fields won't highlight? Thanks. -Original Message- From: Erik Hatcher erik.hatc...@gmail.com Sent: Tuesday, June 22, 2010 9:49pm To: solr-user@lucene.apache.org Subject: Re: Help with highlighting You need to share with us the Solr request you made, and any custom request handler settings that might map to. Chances are you just need to twiddle with the highlighter parameters (see wiki for docs) to get it to do what you want. Erik On Jun 22, 2010, at 4:42 PM, n...@frameweld.com wrote: Hi, I need help with highlighting fields that would match a query. So far, my results only highlight if the field is from all_text, and I would like it to use other fields. It simply isn't the case if I just turn highlighting on. Any ideas why it only applies to all_text? Here is my schema: <?xml version="1.0" ?>
<schema name="Search" version="1.1">
  <types>
    <!-- Basic Solr Bundled Data Types -->
    <!-- Rudimentary types -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true" />
    <!-- Non-sortable numeric types -->
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
    <!-- Sortable numeric types -->
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
    <!-- Date/Time types -->
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
    <!-- Pseudo types -->
    <fieldType name="random" class="solr.RandomSortField" indexed="true" />
    <!-- Analyzing types -->
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> -->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter
Re: Help with highlighting
It looks to me like a tokenisation issue: the all_text content and the query text will match, but the string fieldtype fields 'might not' and therefore will not be highlighted.
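If the string-vs-text diagnosis in this thread is right, a common workaround is to index an analyzed copy of each field you want highlighted and point hl.fl at the copies (field names here are illustrative, not from the poster's schema):

```xml
<!-- keep the exact string field for sorting/faceting, add an analyzed copy for search and highlighting -->
<field name="title"   type="string" indexed="true" stored="true"/>
<field name="title_t" type="text"   indexed="true" stored="true"/>
<copyField source="title" dest="title_t"/>
```

The highlighter can only mark up terms the field's analyzer actually matched, so querying and highlighting against title_t should produce the missing snippets.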
remove from list
Hey SOLR folks -- There's too much info for me to digest, so please remove me from the email threads. However, if we can build you a forum, bulletin board or other web-based tool, please let us know. For that matter, we would be happy to build you a new website. Bill O'Connor is our CTO and the Drupal.org SOLR Redesign Lead. So we love SOLR! Let us know how we can support your efforts. Susan Rust VP of Client Services If you wish to travel quickly, go alone. If you wish to travel far, go together. Achieve Internet 1767 Grand Avenue, Suite 2 San Diego, CA 92109 800-618-8777 x106 858-453-5760 x106 Susan-Rust (skype) @Susan_Rust (twitter) @Achieveinternet (twitter) @drupalsandiego (San Diego Drupal Users' Group Twitter) This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. On Jun 23, 2010, at 1:52 AM, Mark Allan wrote: Cheers, Geert-Jan, that's very helpful. We won't always be searching with dates and we wouldn't want duplicates to show up in the results, so your second suggestion looks like a good workaround if I can't solve the actual problem. I didn't know about FieldCollapsing, so I'll definitely keep it in mind.
Thanks Mark On 22 Jun 2010, at 3:44 pm, Geert-Jan Brits wrote: Perhaps my answer is useless, bc I don't have an answer to your direct question, but: You *might* want to consider if your concept of a solr-document is on the correct granular level, i.e: your problem posted could be tackled (afaik) by defining a document being a 'sub-event' with only 1 daterange. So for each event-doc you have now, this is replaced by several sub-event docs in this proposed situation. Additionally each sub-event doc gets an additional field 'parent-eventid' which maps to something like an event-id (which you're probably using). So several sub-event docs can point to the same event-id. Lastly, all sub-event docs belonging to a particular event implement all the other fields that you may have stored in that particular event-doc. Now you can query for events based on date-ranges like you envisioned, but instead of returning events you return sub-event-docs. However since all data of the original event (except the multiple dateranges) is available in the subevent-doc this shouldn't really bother the client. If you need to display all dates of an event (the only info missing from the returned solr-doc) you could easily store it in an RDB and fetch it using the defined parent-eventid. The only caveat I see is that possibly multiple sub-events with the same 'parent-eventid' might get returned for a particular query. This however depends on the type of queries you envision, i.e: 1) If you always issue queries with date-filters, and *assuming* that sub-events of a particular event don't temporally overlap, you will never get multiple sub-events returned. 2) if 1) doesn't hold and assuming you *do* mind multiple sub-events of the same actual event, you could try to use Field Collapsing on 'parent-eventid' to only return the first sub-event per parent-eventid that matches the rest of your query. (Note however, that Field Collapsing is a patch at the moment.
http://wiki.apache.org/solr/FieldCollapsing) Not sure if this helped you at all, but at the very least it was a nice conceptual exercise ;-) Cheers, Geert-Jan 2010/6/22 Mark Allan mark.al...@ed.ac.uk Hi all, Firstly, I apologise for the length of this email but I need to describe properly what I'm doing before I get to the problem! I'm working on a project just now which requires the ability to store and search on temporal coverage data - ie. a field which specifies a date range during which a certain event took place. I hunted around for a few days and couldn't find anything which seemed to fit, so I had a go at writing my own field type based on solr.PointType. It's used as follows:

schema.xml
<fieldType name="temporal" class="solr.TemporalCoverage" dimension="2" subFieldSuffix="_i"/>
<field name="daterange" type="temporal" indexed="true" stored="true" multiValued="true"/>

data.xml
<add>
  <doc>
    ...
    <field name="daterange">1940,1945</field>
  </doc>
</add>

Internally, this gets stored as: <arr
RE: remove from list
If you want to unsubscribe, then you can do so [1] without trying to sell something ;) [1]: http://lucene.apache.org/solr/mailing_lists.html Cheers! -Original message- From: Susan Rust su...@achieveinternet.com Sent: Wed 23-06-2010 18:23 To: solr-user@lucene.apache.org; Erik Hatcher erik.hatc...@gmail.com; Subject: remove from list Hey SOLR folks -- There's too much info for me to digest, so please remove me from the email threads. However, if we can build you a forum, bulletin board or other web- based tool, please let us know. For that matter, we would be happy to build you a new website. Bill O'Connor is our CTO and the Drupal.org SOLR Redesign Lead. So we love SOLR! Let us know how we can support your efforts. Susan Rust VP of Client Services If you wish to travel quickly, go alone If you wish to travel far, go together Achieve Internet 1767 Grand Avenue, Suite 2 San Diego, CA 92109 800-618-8777 x106 858-453-5760 x106 Susan-Rust (skype) @Susan_Rust (twitter) @Achieveinternet (twitter) @drupalsandiego (San Diego Drupal Users' Group Twitter) This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. On Jun 23, 2010, at 1:52 AM, Mark Allan wrote: Cheers, Geert-Jan, that's very helpful. 
We won't always be searching with dates and we wouldn't want duplicates to show up in the results, so your second suggestion looks like a good workaround if I can't solve the actual problem. I didn't know about FieldCollapsing, so I'll definitely keep it in mind. Thanks Mark On 22 Jun 2010, at 3:44 pm, Geert-Jan Brits wrote: Perhaps my answer is useless, bc I don't have an answer to your direct question, but: You *might* want to consider if your concept of a solr-document is on the correct granular level, i.e: your problem posted could be tackled (afaik) by defining a document being a 'sub-event' with only 1 daterange. So for each event-doc you have now, this is replaced by several sub- event docs in this proposed situation. Additionally each sub-event doc gets an additional field 'parent- eventid' which maps to something like an event-id (which you're probably using) . So several sub-event docs can point to the same event-id. Lastly, all sub-event docs belonging to a particular event implement all the other fields that you may have stored in that particular event-doc. Now you can query for events based on data-rages like you envisioned, but instead of returning events you return sub-event-docs. However since all data of the original event (except the multiple dateranges) is available in the subevent-doc this shouldn't really bother the client. If you need to display all dates of an event (the only info missing from the returned solr-doc) you could easily store it in a RDB and fetch it using the defined parent-eventid. The only caveat I see, is that possibly multiple sub-events with the same 'parent-eventid' might get returned for a particular query. This however depends on the type of queries you envision. i.e: 1) If you always issue queries with date-filters, and *assuming* that sub-events of a particular event don't temporally overlap, you will never get multiple sub-events returned. 
2) if 1) doesn't hold, and assuming you *do* mind multiple sub-events of the same actual event, you could try to use Field Collapsing on 'parent-eventid' to only return the first sub-event per parent-eventid that matches the rest of your query. (Note, however, that Field Collapsing is a patch at the moment. http://wiki.apache.org/solr/FieldCollapsing) Not sure if this helped you at all, but at the very least it was a nice conceptual exercise ;-) Cheers, Geert-Jan 2010/6/22 Mark Allan mark.al...@ed.ac.uk Hi all, Firstly, I apologise for the length of this email but I need to describe properly what I'm doing before I get to the problem! I'm working on a project just now which requires the ability to store and search on temporal coverage data - i.e. a field which specifies a date range during which a certain event took place. I hunted around for a few days and couldn't find anything which seemed to fit, so I had a go at writing my
Re: remove from list
Will do -- but wasn't selling -- trying to donate! Susan Rust VP of Client Services Achieve Internet On Jun 23, 2010, at 9:30 AM, Markus Jelsma wrote: If you want to unsubscribe, then you can do so [1] without trying to sell something ;) [1]: http://lucene.apache.org/solr/mailing_lists.html
Re: Help with highlighting
Thanks, that's exactly the problem. I've tried different types, even a fieldType that had no tokenizers, and that didn't work. However, text just gives me my results as wanted. -Original Message- From: dan sutton danbsut...@gmail.com Sent: Wednesday, June 23, 2010 12:06pm To: solr-user@lucene.apache.org Subject: Re: Help with highlighting It looks to me like a tokenisation issue: all_text content and the query text will match, but the string fieldtype fields 'might not' and therefore will not be highlighted. On Wed, Jun 23, 2010 at 4:40 PM, n...@frameweld.com wrote: Here's my request: q=ASA+AND+minisite_id%3A36version=1.3json.nl=maprows=10start=0wt=jsonhl=truehl.fl=%2Ahl.simple.pre=%3Cspan+class%3D%22hl%22%3Ehl.simple.post=%3C%2Fspan%3Ehl.fragsize=0hl.mergeContiguous=false And here's what happened: It didn't return results, even when I applied an asterisk for which fields to highlight. I tried other fields and that didn't work either; however, all_text is the only one that works. Any other ideas why the other fields won't highlight? Thanks. -Original Message- From: Erik Hatcher erik.hatc...@gmail.com Sent: Tuesday, June 22, 2010 9:49pm To: solr-user@lucene.apache.org Subject: Re: Help with highlighting You need to share with us the Solr request you made, and any custom request handler settings that might map to it. Chances are you just need to twiddle with the highlighter parameters (see wiki for docs) to get it to do what you want. Erik On Jun 22, 2010, at 4:42 PM, n...@frameweld.com wrote: Hi, I need help with highlighting fields that would match a query. So far, my results only highlight if the field is from all_text, and I would like it to use other fields. It simply isn't the case if I just turn highlighting on. Any ideas why it only applies to all_text? Here is my schema: <?xml version="1.0" ?>
<schema name="Search" version="1.1">
  <types>
    <!-- Basic Solr Bundled Data Types -->
    <!-- Rudimentary types -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <!-- Non-sortable numeric types -->
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
    <!-- Sortable numeric types -->
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
    <!-- Date/Time types -->
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
    <!-- Pseudo types -->
    <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
    <!-- Analyzing types -->
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> -->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
Highlight question
I just started working with highlighting. I am using the default configurations. I have a field where I can get a single highlight to occur, marking the data. What I would like to do is this: given a word, say 'tumor', and the sentence 'the lower tumor grew 1.5 cm. blah blah blah we need to remove the tumor in the next surgery', I would like to get '<em>the lower tumor grew 1.5 cm</em>. blah blah blah we need to ...<em>remove the tumor in the next</em> surgery' - thus finding multiple references to the word and only grabbing a few words around each. In the solrconfig.xml I have been able to change the hl.simple.pre/post variable, but when I try to change the hl.regex pattern or the hl.snippets they don't have any effect. I thought hl.snippets would allow me to find more than one occurrence and highlight it, and I tried a bunch of regex patterns but they didn't do anything. Here is a snippet of the config file. Any help is appreciated. Gregg

<!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
<fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
  <lst name="defaults">
    <!-- slightly smaller fragsizes work better because of slop -->
    <int name="hl.snippets">4</int>
    <int name="hl.fragsize">70</int>
    <!-- allow 50% slop on fragment sizes -->
    <float name="hl.regex.slop">0.2</float>
    <!-- a basic sentence pattern -->
    <str name="hl.regex.pattern">[-\w ,/\n\']{1,1}</str>
  </lst>
</fragmenter>

<!-- Configure the standard formatter -->
<formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
  <lst name="defaults">
    <int name="hl.snippets">4</int>
    <int name="hl.fragsize">100</int>
    <str name="hl.simple.pre"><![CDATA[...<em>]]></str>
    <str name="hl.simple.post"><![CDATA[</em>]]></str>
  </lst>
</formatter>
Help with sorting
Hi everyone, I'm stuck on sorting with solr. I have documents from some institutions, differentiated by an id named 'instanta'. I indexed all those documents and, among other things, I put in the index the date the document was created and the id of the institution. When I want to sort the documents which contain a certain word by date or by institution, all I get is an order that I don't understand.

<field name="datecreated" type="date" indexed="true" stored="false"/>
<field name="instanta" type="int" indexed="true" stored="false" required="true"/>

QueryOptions options = new QueryOptions {
    Rows = resultsPerPage,
    Start = (pageNumber - 1) * resultsPerPage,
    OrderBy = new[] { new SortOrder("instanta", Order.DESC) }
};

Thank you in advance. jud. Adrian Neacsu Presedinte Tribunalul Vrancea http://www.adrianneacsu.jurindex.ro www.jurisprudenta.org www.societateapentrujustitie.ro (+40) 0721949875 ; (+40) 0749182508 fax 0337814221
DIH and dynamicField
I am new to the list, so any coaching on asking questions is much appreciated. I am having a problem where importing with DIH and attempting to use dynamicField produces no result. I get no error, nor do I get a message in the log. I found this: https://issues.apache.org/jira/browse/SOLR-742 which says the issue was closed in bulk for the 1.4 release. The messages above seem to indicate the patch was in/out/good/bad, so I am not sure if the issue was fixed, as we are seeing the same behavior described in the bug. Has this issue, in fact, been resolved? Is anyone using DIH and dynamicField successfully together? Solr is truly fantastic (so is DIH, for that matter). Thank you! Boyd Hemphill
Re: fuzzy query performance
Hi Mark! Solr trunk should have much improved fuzzy speeds (due to some very cool work that was done in Lucene) - you using 1.4? yes. So, you mean I should try it out here: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/ or some 'more stable' branch? http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.5-dev/ What would you choose? Regards, Peter. Hi! How can I improve the performance of a fuzzy search like: mihchael~0.7 through a relatively large index (~1 million docs)? It takes over 15 seconds at the moment if we perform it on the normal text search field. I searched the web and the jira and couldn't find anything related to that. Any pointers or ideas would be appreciated! Regards, Peter. Solr trunk should have much improved fuzzy speeds (due to some very cool work that was done in Lucene) - you using 1.4?
Stemmed and/or unStemmed field
Hello all, One quick question; trying to find out what scenario would work best. We have a huge free-text dataset containing product titles and descriptions. Unfortunately, we don't have the data categorized, so we rely heavily on 'search relevancy + synonyms' to categorize. Here is what I am trying to do: someone clicks on 'Comforters Pillows', and we would want the results to be filtered where the title has the keyword 'Comforter' or 'Pillows', but we have been getting results with the word 'comfort' in the title. I assume it is because of stemming. What is the right way to handle this? I am thinking to create another unstemmed field, 'title_unstemmed', which stores the data unstemmed. So basically, with dismax, I could boost the score on the unstemmed field. I can think of other scenarios where stemming would be needed, so the stemmed field would still match. Does that sound like something that will work? Any suggestions please? Much appreciated
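The dual-field approach described above can be sketched in schema.xml roughly like this; the type name text_unstemmed, the copyField, and the exact filter chain are illustrative assumptions, not from the original message:

```xml
<!-- Hypothetical unstemmed companion type: same analysis as 'text' minus the stemmer -->
<fieldType name="text_unstemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no stemming filter here, so 'comfort' will not match 'Comforters' -->
  </analyzer>
</fieldType>

<field name="title" type="text" indexed="true" stored="true"/>
<field name="title_unstemmed" type="text_unstemmed" indexed="true" stored="false"/>
<copyField source="title" dest="title_unstemmed"/>
```

With dismax, something like qf=title title_unstemmed^2 would then favor exact-form matches while still letting the stemmed field recall variants.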
Can solr return pretty text as the content?
When I feed pretty text into solr for indexing from lucene and search for it, the content is always returned as one long line of text. Is there a way for solr to return the pretty formatted text to me? -- View this message in context: http://lucene.472066.n3.nabble.com/Can-solr-return-pretty-text-as-the-content-tp917912p917912.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Can solr return pretty text as the content?
Define 'pretty text'. 1) Are you talking about the XML/JSON returned by SOLR not being pretty? If yes, try indent=on with your query params. 2) Or are you talking about data in a certain field? Solr returns what you feed it. Look at your filters for that field type; your filters/tokenizer may be stripping the formatting. From: JohnRodey [via Lucene] Sent: Wednesday, June 23, 2010 1:19 PM To: caman Subject: Can solr return pretty text as the content? When I feed pretty text into solr for indexing from lucene and search for it, the content is always returned as one long line of text. Is there a way for solr to return the pretty formatted text to me? -- View this message in context: http://lucene.472066.n3.nabble.com/Can-solr-return-pretty-text-as-the-content-tp917912p917966.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Highlight question
In the solrconfig.xml I have been able to change the hl.simple.pre/post variable, but when I try to change the hl.regex pattern or the hl.snippets they don't have any effect. I thought the hl.snippets would allow me to find more than one and highlight it, and well I tried a bunch of regex patterns but they didn't do anything. The <int name="hl.snippets">4</int> param should go under the defaults section of your default SearchHandler:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <int name="hl.snippets">4</int>
  </lst>
</requestHandler>

Also, the hl.fragmenter=regex parameter is required to activate the regular-expression-based fragmenter.
Re: Help with sorting
When I want to sort the documents which contain a certain word by date or by institution, all I get is an order that I don't understand. <field name="datecreated" type="date" indexed="true" stored="false"/> <field name="instanta" type="int" indexed="true" stored="false" required="true"/> You need to use a sortable type: sint with solr 1.3; tint with solr 1.4: <field name="instanta" type="tint" .../>
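For context, a sketch of what the tint type behind that field definition typically looks like in a Solr 1.4 schema; the precisionStep value is the stock example default, not taken from this thread:

```xml
<!-- Trie-based int type: numerically ordered, so sorting behaves as expected -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<field name="instanta" type="tint" indexed="true" stored="false" required="true"/>
```

After reindexing, a plain sort=instanta desc query parameter should then return the order you expect.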
Re: fuzzy query performance
On Wed, Jun 23, 2010 at 3:34 PM, Peter Karich peat...@yahoo.de wrote: So, you mean I should try it out here: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/ yes, the speedups are only in trunk. -- Robert Muir rcm...@gmail.com
Re: DIH and dynamicField
Boyd Hemphill-2 wrote: I am having a problem where importing with DIH and attempting to use dynamicField produces no result. I get no error, nor do I get a message in the log. It would help if you posted the relevant parts of your data-config.xml and schema.xml. If you are doing a straight column-to-name mapping, my first guess would be you could have those backwards or there is some misconfiguration in your schema.xml. For example, if you have a database column foo and you want to add it to the foo_dynamic field you should be using something like this: schema.xml: <dynamicField name="*_dynamic" ... /> data-config.xml: <field column="foo" name="foo_dynamic"/> Hope this helps. - Robert Zotter -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-and-dynamicField-tp917823p918189.html Sent from the Solr - User mailing list archive at Nabble.com.
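A slightly fuller sketch of that mapping, with a made-up entity, table, and column names for illustration:

```xml
<!-- data-config.xml: hypothetical entity mapping database columns to Solr fields -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db" user="user"/>
  <document>
    <entity name="item" query="SELECT id, foo FROM item">
      <field column="id" name="id"/>
      <!-- lands in the *_dynamic dynamicField declared in schema.xml -->
      <field column="foo" name="foo_dynamic"/>
    </entity>
  </document>
</dataConfig>
```

The matching declaration in schema.xml would be something like <dynamicField name="*_dynamic" type="string" indexed="true" stored="true"/>.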
Re: Stemmed and/or unStemmed field
On Wed, Jun 23, 2010 at 3:58 PM, Vishal A. aboxfortheotherst...@gmail.comwrote: Here is what I am trying to do : Someone clicks on 'Comforters Pillows' , we would want the results to be filtered where title has keyword 'Comforter' or 'Pillows' but we have been getting results with word 'comfort' in the title. I assume it is because of stemming. What is the right way to handle this? from your examples, it seems a more lightweight stemmer might be an easy option: https://issues.apache.org/jira/browse/LUCENE-2503 -- Robert Muir rcm...@gmail.com
RE: Stemmed and/or unStemmed field
Ahh, perfect. Will take a look. Thanks. From: Robert Muir [via Lucene] Sent: Wednesday, June 23, 2010 4:17 PM To: caman Subject: Re: Stemmed and/or unStemmed field On Wed, Jun 23, 2010 at 3:58 PM, Vishal A. wrote: Here is what I am trying to do: Someone clicks on 'Comforters Pillows', we would want the results to be filtered where the title has keyword 'Comforter' or 'Pillows', but we have been getting results with the word 'comfort' in the title. I assume it is because of stemming. What is the right way to handle this? from your examples, it seems a more lightweight stemmer might be an easy option: https://issues.apache.org/jira/browse/LUCENE-2503 -- Robert Muir -- View this message in context: http://lucene.472066.n3.nabble.com/Stemmed-and-or-unStemmed-field-tp917876p918309.html Sent from the Solr - User mailing list archive at Nabble.com.
Some minor Solritas layout tweaks
I grabbed the latest greatest from trunk, and then had to make a few minor layout tweaks. 1. In main.css, the .query-box input { height} isn't tall enough (at least on my Mac 10.5/FF 3.6 config), so character descenders get clipped. I bumped it from 40px to 50px, and that fixed the issue for me. 2. The constraint text (for removing facet constraints) overlaps with the Solr logo. It looks like the div that contains this anchor text is missing a class=constraints, as I see a .constraints in the CSS. I added this class name, and also (to main.css): .constraints { margin-top: 10px; } But IANAWD, so this is probably not the best way to fix the issue. 3. And then I see a .constraints-title in the CSS, but it's not used. Was the intent of this to set the '' character to gray? 4. It seems silly to open JIRA issues for these types of things, but I also don't want to add to noise on the list. Which approach is preferred? Thanks, -- Ken Ken Krugler +1 530-210-6378 http://bixolabs.com e l a s t i c w e b m i n i n g
Multiple Solr Webapps in Glassfish with JNDI
Does anybody know how to setup multiple Solr webapps in Glassfish with JNDI? -Kelly -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-Solr-Webapps-in-Glassfish-with-JNDI-tp918383p918383.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Setting up Eclipse with merged Lucene Solr source tree
I have found it easier to make these projects in my Eclipse workspace and make remote links to the parts that I really want. This cuts the total stuff in the project - cuts build times, 'search everywhere' times, menus full of classes named '*file*', etc. But git may have problems with this, and git is a lifesaver for playing with patches etc. Lance On Wed, Jun 23, 2010 at 8:03 AM, Erick Erickson erickerick...@gmail.com wrote: Did you see this page? http://wiki.apache.org/solr/HowToContribute Especially down near the end, the section Development Environment Tips HTH Erick On Wed, Jun 23, 2010 at 8:57 AM, Ukyo Virgden ukyovirg...@gmail.com wrote: Hi, I'm trying to set up an eclipse environment for the combined Lusolr tree. I've created a Lucene project containing /trunk/lusolr/lucene and /trunk/lusolr/modules as one project and /trunk/lusolr/solr as another. I've added the lucene project as a dependency to the Solr project, removed solr libs from the lucene project and added the Lucene project to the dependencies of the Solr project. The Lucene source tree is fine, but in the Solr tree I get 5 errors: The method getTextContent() is undefined for the type Node TestConfig.java /Solr/src/test/org/apache/solr/core line 91 The method getTextContent() is undefined for the type Node TestConfig.java /Solr/src/test/org/apache/solr/core line 94 The method setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory Config.java /Solr/src/java/org/apache/solr/core line 113 The method setXIncludeAware(boolean) is undefined for the type DocumentBuilderFactory DataImporter.java /Solr/contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport line The method setXIncludeAware(boolean) is undefined for the type Object TestXIncludeConfig.java /Solr/src/test/org/apache/solr/core line 32 Is this the correct way to set up eclipse after the source tree merge? Thanks in advance Ukyo -- Lance Norskog goks...@gmail.com
Re: DIH and dynamicField
A side comment about patches and JIRA - the second-to-last comment on SOLR-742 says 'Committed'. That means one of the committers (Shalin in this case) committed the fix. It was in 2008, so it's in Solr 1.4. https://issues.apache.org/jira/browse/SOLR-742?focusedCommentId=12643747&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12643747 But, yes, Robert is right: post what you can of your config files. -- Lance Norskog goks...@gmail.com
Re: Multiple Solr Webapps in Glassfish with JNDI
Hi Kelly, I'm not much of a Glassfish user, but have you tried following the JNDI instructions for Tomcat? Maybe that works for Glassfish, too. http://search-lucene.com/?q=jndi&fc_project=Solr Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Kelly Taylor wired...@hotmail.com To: solr-user@lucene.apache.org Sent: Wed, June 23, 2010 8:03:48 PM Subject: Multiple Solr Webapps in Glassfish with JNDI Does anybody know how to setup multiple Solr webapps in Glassfish with JNDI? -Kelly -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-Solr-Webapps-in-Glassfish-with-JNDI-tp918383p918383.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: fuzzy query performance
Btw. here you can see Robert's presentation on what he did to speed up fuzzy queries: http://www.slideshare.net/otisg Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Original Message From: Robert Muir rcm...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, June 23, 2010 5:13:10 PM Subject: Re: fuzzy query performance On Wed, Jun 23, 2010 at 3:34 PM, Peter Karich peat...@yahoo.de wrote: So, you mean I should try it out here: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/ yes, the speedups are only in trunk. -- Robert Muir rcm...@gmail.com
Re: Alphabetic range
Sophie, Go to your Solr Admin page, look for the Analysis page link, go there, enter some artist names, enter the query, check the verbose checkboxes, and submit. This will tell you what is going on with your analysis at index and at search time. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Sophie M. sop...@beezik.com To: solr-user@lucene.apache.org Sent: Wed, June 23, 2010 8:56:39 AM Subject: Alphabetic range Hello all, I have been trying for several days to build up an alphabetical range. I will explain all the steps (I have the Solr 1.4 Enterprise Search Server book written by Smiley and Pugh). I want to get all artists beginning with the first two letters. If I request mi, I want to have as response michael jackson and all artist names beginning with mi. I defined a field type similar to Smiley and Pugh's example p.148:

<fieldType name="bucketFirstTwoLetters" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyser type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="^([a-zA-Z])([a-zA-Z]).*" group="2"/>
    <!-- the first two letters -->
  </analyser>
  <analyser type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyser>
</fieldType>

I defined the field ArtistSort like:

<field name="ArtistSort" type="bucketFirstTwoLetters" stored="true" multivalued="false"/>

To the request http://localhost:8983/solr/music/select?indent=on&q=yu&qt=standard&wt=standard&facet=on&facet.field=ArtistSort&facet.sort=lex&facet.missing=on&facet.method=enum&fl=ArtistSort I get: http://lucene.472066.n3.nabble.com/file/n916716/select.xml select.xml I don't understand why the pattern doesn't match exactly.
For example An An Yu matches, but I only want artists whose name begins with yu. And I know that an artist named ReYu would match because ReYu would be interpreted as Re Yu (as two words). I also tried to make another type of query, like: http://localhost:8983/solr/music/select?indent=on&version=2.2&q=ArtistSort:mi*&fq=&start=0&rows=10&fl=ArtistSort&qt=standard&wt=standard&explainOther=&hl.fl= I get exactly what I want. I made several tries; I get only artist names which begin with the good first two letters. But I get very few responses, see there:

<result name="response" numFound="6" start="0">
  <doc><str name="ArtistSort">mike manne and tiger blues</str></doc>
  <doc><str name="ArtistSort">mimika</str></doc>
  <doc><str name="ArtistSort">miduno</str></doc>
  <doc><str name="ArtistSort">milue macïro</str></doc>
  <doc><str name="ArtistSort">mister pringle</str></doc>
  <doc><str name="ArtistSort">mimmai</str></doc>
</result>

In my index there are more than 80,000 artists... I really don't understand why I can't get more responses. I have been thinking about the problem for days and days and now my brain freezes. Thank you in advance. Sophie -- View this message in context: http://lucene.472066.n3.nabble.com/Alphabetic-range-tp916716p916716.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance related question on DISMAX handler..
BB, Dismax could be slower than standard, depending on what kinds of queries you throw at either handler. Millions of docs is a bit imprecise (2M or 22M or 222M or 999M, tweet-sized docs or book-sized docs), but given adequate hardware and proper treatment it shouldn't be a problem. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: bbarani bbar...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, June 22, 2010 2:27:05 PM Subject: Performance related question on DISMAX handler.. Hi, I just want to know if there will be any overhead / performance degradation if I use the Dismax search handler instead of the standard search handler? We are planning to index millions of documents and I'm not sure if using Dismax will slow down the search performance. Would be great if someone can share their thoughts. Thanks, BB -- View this message in context: http://lucene.472066.n3.nabble.com/Performance-related-question-on-DISMAX-handler-tp914892p914892.html Sent from the Solr - User mailing list archive at Nabble.com.
Spatial types and DIH
I'm using solr 4.0-2010-06-23_08-05-33 and can't figure out how to add the spatial types (LatLon, Point, GeoHash or SpatialTile) using DataImportHandler. My lat/lngs from the database are in separate fields. Does anyone know how to do this? Eric
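Until someone with DIH spatial experience chimes in, here is a sketch of one approach (untested; the entity name, SQL, and column/field names are all invented for illustration) using DIH's TemplateTransformer to concatenate the two database columns into the single "lat,lon" string that a LatLonType field parses:

```xml
<!-- Sketch only: entity, query, and field names are assumptions -->
<entity name="loc" transformer="TemplateTransformer"
        query="SELECT id, lat, lng FROM locations">
  <field column="id" name="id"/>
  <!-- LatLonType expects one "lat,lon" token; build it from the two columns -->
  <field column="store" name="store" template="${loc.lat},${loc.lng}"/>
</entity>
```

The schema side would then declare something like `<field name="store" type="location" indexed="true" stored="true"/>`, where "location" is a LatLonType fieldType.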
Re: Field missing when use distributed search + dismax
Make sure you list it in ...fl=ID,type or set it in the defaults section of your handler. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Scott Zhang macromars...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, June 22, 2010 11:04:07 AM Subject: Field missing when use distributed search + dismax Hi. All. I was using distributed search over 30 Solr instances; previously I was using the standard query handler, and the results were returned correctly: each result has 2 fields, ID and type. Today I wanted to search with dismax. I tried searching each instance with dismax; it works correctly, returning ID and type for each result. The strange thing is that when I use distributed search, the result only has ID. The field "type" disappeared. I need that type to know what the ID refers to. Why does Solr eat my "type"? Thanks. Regards. Scott
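For the archives, the "defaults section" route Otis mentions looks roughly like this in solrconfig.xml (a sketch; the handler name and field names are placeholders for your setup):

```xml
<!-- Sketch: make the dismax handler return id and type by default -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="fl">id,type</str>
  </lst>
</requestHandler>
```

Alternatively, just append &fl=id,type to the distributed request itself.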
Re: anyone use hadoop+solr?
Marc is referring to the very informative write-up by Ted Dunning from maybe a month or so ago. For what it's worth, we just used Hadoop Streaming, JRuby, and EmbeddedSolr to speed up indexing by parallelizing it. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Marc Sturlese marc.sturl...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, June 22, 2010 12:43:27 PM Subject: Re: anyone use hadoop+solr? Well, the patch consumes the data from a CSV. You have to modify the input to use TableInputFormat (I don't remember if it's called exactly like that) and it will work. Once you've done that, you have to specify as many reducers as the number of shards you want. I know two ways to index using Hadoop:

method 1 (SOLR-1301, Nutch):
- Map: just gets data from the source and creates key-value pairs
- Reduce: does the analysis and indexes the data
So the index is built on the reducer side.

method 2 (Hadoop Lucene index contrib):
- Map: does analysis and opens an IndexWriter to add docs
- Reduce: merges the small indexes built in the map
So the indexes are built on the map side.

method 2 has no good integration with Solr at the moment. In the JIRA (SOLR-1301) there's a good explanation of the advantages and disadvantages of indexing on the map or reduce side. I recommend you read all the comments on the JIRA in detail to know exactly how it works. -- View this message in context: http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914625.html Sent from the Solr - User mailing list archive at Nabble.com.
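The "method 1" flow Marc describes (route documents in the map phase, build each shard's index in the reduce phase) can be sketched in a few lines of plain Python. No Hadoop here, just the shape of the data flow; all function names are invented for illustration:

```python
# Toy sketch (plain Python, no Hadoop) of "method 1" from SOLR-1301:
# the map phase only routes documents to shards; the reduce phase does
# the analysis/indexing, so each reducer builds exactly one shard.
from collections import defaultdict

def map_phase(docs, num_shards):
    # Emit (shard_id, doc) pairs; hashing the id keeps routing deterministic
    # within a run, so a doc always lands on the same shard.
    for doc in docs:
        yield hash(doc["id"]) % num_shards, doc

def shuffle(pairs):
    # Hadoop groups map output by key before handing it to reducers.
    grouped = defaultdict(list)
    for shard, doc in pairs:
        grouped[shard].append(doc)
    return grouped

def reduce_phase(shard, docs):
    # In the real patch this is where EmbeddedSolr / an IndexWriter would
    # analyze and index; here we just report what each reducer would build.
    return {"shard": shard, "indexed": len(docs)}

docs = [{"id": "doc%d" % i} for i in range(10)]
shards = shuffle(map_phase(docs, num_shards=3))
results = [reduce_phase(s, d) for s, d in sorted(shards.items())]
print(sum(r["indexed"] for r in results))  # 10 (every doc indexed exactly once)
```

The design point Marc makes follows from this shape: with indexing in the reducer, the number of reducers dictates the number of shards; with indexing in the mapper (method 2), the reducers instead merge many small indexes.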
Re: solr with hadoop
I don't think it's ever been discussed - your Q below is the #1 hit currently: http://search-lucene.com/?q=%2B%28dih+OR+dataimporthandler%29+hdfs Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Jon Baer jonb...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, June 22, 2010 12:47:14 PM Subject: Re: solr with hadoop I was playing around w/ Sqoop the other day; it's a simple Cloudera tool for imports (mysql -> hdfs): http://www.cloudera.com/developers/downloads/sqoop/ It seems to me it would be pretty efficient to dump to HDFS and have something like DataImportHandler be able to read from hdfs:// directly... Has this route been discussed / developed before (i.e. DIH w/ an hdfs:// handler)? - Jon On Jun 22, 2010, at 12:29 PM, MitchK wrote: I wanted to add a JIRA issue about exactly what Otis is asking here. Unfortunately, I don't have time for it because of my exams. However, I'd like to add a question to Otis' ones: If you distribute the indexing process this way, are you able to replicate the different documents correctly? Thank you. - Mitch Otis Gospodnetic-2 wrote: Stu, Interesting! Can you provide more details about your setup? By "load balance the indexing stage" you mean distribute the indexing process, right? Do you simply take your content to be indexed, split it into N chunks where N matches the number of TaskNodes in your Hadoop cluster, and provide a map function that does the indexing? What does the reduce function do? Does it call IndexWriter.addAllIndexes, or do you do that outside Hadoop?
Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Stu Hood stuh...@webmail.us To: solr-user@lucene.apache.org Sent: Monday, January 7, 2008 7:14:20 PM Subject: Re: solr with hadoop As Mike suggested, we use Hadoop to organize our data en route to Solr. Hadoop allows us to load balance the indexing stage, and then we use the raw Lucene IndexWriter.addAllIndexes method to merge the data to be hosted on Solr instances. Thanks, Stu -Original Message- From: Mike Klaas mike.kl...@gmail.com Sent: Friday, January 4, 2008 3:04pm To: solr-user@lucene.apache.org Subject: Re: solr with hadoop On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote: I have a huge index (about 110 million documents, 100 fields each), but the size of the index is reasonable, about 70 GB. All I need is to increase performance, since some queries which match a big number of documents are running slow. So I was wondering: is there any benefit to using Hadoop for this? And if so, what direction should I go? Has anybody done something to integrate Solr with Hadoop? Does it give any performance boost? Hadoop might be useful for organizing your data en route to Solr, but I don't see how it could be used to boost performance over a huge Solr index. To accomplish that, you need to split it up over two machines (for which you might find Hadoop useful). -Mike -- View this message in context: http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914589.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Nested table support ability
Amit, I'd say it depends on the types of queries you need to run. Maybe you mentioned that already, but your reply cut it off (Nabble). I can say this with certainty: 1M is a small number and 30 fields is not a big deal. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: amit_ak amit...@mindtree.com To: solr-user@lucene.apache.org Sent: Wed, June 23, 2010 2:00:50 AM Subject: Re: Nested table support ability Hi Otis, Thanks for the update. My parametric search has to span the customer table and 30 child tables. We have close to 1 million customers. Do you think Lucene/Solr is the right solution for such requirements, or would database search be more optimal? Regards, Amit -- View this message in context: http://lucene.472066.n3.nabble.com/Nested-table-support-ability-tp905253p916087.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Non-prefix, hierarchical autocomplete? Would SOLR-1316 work? Solritas?
Hi Andy, I haven't checked out SOLR-1316 yet, other than looking at the comments. It sounds more complicated than it should be, but maybe it's great and I really need to try it. Solritas uses TermsComponent, which should work well for individual terms (which country and city names are not, unless you tokenize them as single tokens). I don't think there is anything that will do everything you need out of the box. You can get autocompletion on the country field, but you then need to do a bit of JS work to restrict cities to the country specified in the country field. Actually, now that I've written this, I think we did something very much like that with http://sematext.com/products/autocomplete/index.html . Finally, for dealing with commas or spaces as tag separators, you can peek at the JS in a service like delicious.com and see how they do it. Their implementation of tag entry is nice. And here is another slick autocomplete with extra niceness in the search form itself, from one of our customers: http://www.etsy.com/explorer Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Andy angelf...@yahoo.com To: solr-user@lucene.apache.org Sent: Sat, June 19, 2010 3:28:15 AM Subject: Non-prefix, hierarchical autocomplete? Would SOLR-1316 work? Solritas? Hi, I've seen some posts on using SOLR-1316 or Solritas for autocomplete. I wondered what the best solution is for my use case: 1) I would like to have a hierarchical autocomplete. For example, I have a Country dropdown list and a City textbox. A user would select a country from the dropdown list and then type out the city in the textbox. Based on which country he selected, I want to limit the autocomplete suggestions to cities that are relevant for the selected country. This hierarchy could be multi-level. For example, there may be a Neighborhood textbox.
The autocomplete suggestions for Neighborhood would be limited to neighborhoods that are relevant for the city entered by the user in the City textbox. 2) I want autocomplete suggestions that include non-prefix matches. For example, if the user types "auto", the autocomplete suggestions should include terms such as "automata" and "build automation". 3) I'm doing autocomplete for tags. I would like to allow multi-word tags and use a comma (,) as the separator between tags. So when the user hits the space bar, he is still typing out the same tag, but when he hits the comma key, he's starting a new tag. Would SOLR-1316 or Solritas work for the above requirements? If they do, how do I set them up? I can't really find much documentation on SOLR-1316 or Solritas in this area. Thanks.
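Requirements 1 and 2 above boil down to a small piece of logic: restrict suggestions by the parent selection (country restricts city) and match anywhere in the term, not just at the start. A toy sketch in plain Python with invented data (in Solr you would get the same effect with an fq on the country field plus an n-gram-analyzed suggestion field):

```python
# Toy sketch of hierarchical, non-prefix autocomplete logic.
# CITIES is invented sample data standing in for the index.
CITIES = {
    "France": ["Paris", "Marseille", "Lyon"],
    "USA": ["Paris", "New York", "Los Angeles"],
}

def suggest(country, typed):
    """Suggest only cities in the selected country, matching the typed
    fragment anywhere in the name (non-prefix matching)."""
    typed = typed.lower()
    return [c for c in CITIES.get(country, []) if typed in c.lower()]

print(suggest("USA", "os"))     # ['Los Angeles'] - substring, not prefix
print(suggest("France", "ar"))  # ['Paris', 'Marseille'] - scoped to France
```

Note that "Paris" appears under both countries, so the country filter is what keeps the suggestions unambiguous; that is exactly the JS-side restriction Otis describes above.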
Re: Indexing Different Types
Stephen, Sure, multiple cores, one for each type, is one approach. Another is just adding a 'type' field and restricting auto-completion by type. In our AC implementation we have a piece made for very similar situations, where you have multiple types of entities but want a single input field (search box) to give you suggestions from all entity types, with suggestions for the different types visually grouped together. I don't think we have a demo of that anywhere, though you can see AC in action on http://search-lucene.com/, for example. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Divine Mercy itsl...@hotmail.com To: solr-user@lucene.apache.org Sent: Mon, June 21, 2010 4:59:55 PM Subject: Indexing Different Types Hi, I have a requirement and I am wondering what the best way to handle it through Solr is. I have different types of unrelated data, for example categories, tags, and some address information. I would like to implement auto-complete on this information, so there would be an auto-complete form for each one. What would be the best way to implement this using Solr? Would it be using multiple indexes: one index each for tags, categories, and addresses? Regards, Stephen