RE: Is Solr right for my business situation ?
"Staging" the data in a non-Solr store sounds like a potentially reasonable idea to me. You might want to consider a NoSQL store of some kind like MongoDB perhaps, instead of an rdbms. The way to think about Solr is not as a store or a database -- it's an index for serving your application. That's also the way to think about how to get your multiple tables in there -- denormalize, denormalize, denormalize. You need to think about what you actually need to search over, and build your index to serve that efficiently, rather than thinking about normalization or data modelling the way we are used to with rdbms's, it's a different way of thinking. A Solr index basically gives you one collection of documents. But the documents can all have different fields -- so you _could_ (but probably don't want to) essentially put all your tables in there with unique fields --they're all in the same index, they're all just "documents", but some have a table1_title and table1_author, and others have no data in those fields but a table2_productName and a table2_price. Then if you want to query on just one type of thing, you just query on those fields. Except... you don't get any joins. Which is why you probably don't want to do that after all, it probably won't serve your needs. Figuring out the right way to model your data in Solr can be tricky, and it is sometimes hard to do exactly what you want. Solr isn't an rdbms, and in some ways isn't as powerful as an rdbms -- in the sense of being as flexible with what kinds of queries you can run on any given data. What it does is give you very fast access to inverted index lookups and set combinations and facetting that would be very hard to do efficiently in an rdbms. It is a trade-off. But there's not really a general answer to "how do I take these dozen rdbms tables and store them in Solr the best way?" -- it depends on what kinds of searching you need to support and the nature of your data. 
From: Sharma, Raghvendra [sraghven...@corelogic.com] Sent: Tuesday, September 28, 2010 2:15 AM To: solr-user@lucene.apache.org Subject: RE: Is Solr right for my business situation ?

Thanks for the responses, people.

@Grant 1. Can you show me some direction on that.. loading data from an incoming stream.. do I need some third-party tools, or do I need to build something myself? 4. I am basically attempting to build a very fast search interface for the existing data. The volume I mentioned is more like a static one (the data is already there). The sql statements I mentioned are daily updates coming in. The good thing is that the history is not there, so the overall volume is not growing, but I need to apply the update statements. One workaround I had in mind (though not so great for performance) is to apply the updates to a copy of the rdbms, and then feed the rdbms extract to solr. Sounds like overkill, but I don't have another idea right now. Perhaps business discussions would yield something.

@All - Some more questions, guys. 1. I have about 3-5 tables. Designing schema.xml for a single table looks ok, but what's the direction for handling multiple table structures is something I am not sure about. Would it be like one big huge xml, wherein those three tables (assuming it's three) would show up as three different tag-trees, nullable? My source provides me a single flat file per table (tab delimited). 2. Further, loading into solr could use some perf tuning.. any tips? best practices? 3. Also, is there a way to specify an xslt at the server side, and make it the default, i.e. whenever a response is returned, that xslt is applied to the response automatically? 4. And last question for the day :) there was one post saying that the spatial support is really basic in solr and is going to be improved in the next versions... Can you people help me get a definitive yes or no on spatial support... in its current form, does it work or not?
I would store lat and long, and would need to make them searchable... Looks like I'm close to my solution.. :) --raghav -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Tuesday, September 28, 2010 1:05 AM To: solr-user@lucene.apache.org Subject: Re: Is Solr right for my business situation ? Inline. On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote: > When do you need to deploy? > > As I understand it, the spatial search in Solr is being rewritten and is > slated for Solr 4.0, the release after next. It will be in 3.x, the next release > > The existing spatial search has some serious problems and is deprecated. > > Right now, I think the only way to get spatial search in Solr is to deploy a > nightly snapshot from the active development on trunk. If you are deploying a > year from now, that might change. > > There is not any support for SQL-like statements or for joins. The best > practice for Solr is to think of your data as a single table, essentially creating a view from your database.
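Regarding question 3 in the message above (a server-side default XSLT): Solr ships an XSLTResponseWriter, and making it the default for a handler is a matter of handler defaults in solrconfig.xml. A hedged sketch -- the handler name and stylesheet file are examples, and the "xslt" response writer must already be registered in your solrconfig:

```xml
<!-- solrconfig.xml sketch (names are illustrative, not prescriptive).
     Every response from this handler is transformed by the stylesheet,
     which Solr looks up in the core's conf/xslt/ directory. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">xslt</str>
    <str name="tr">example.xsl</str>
  </lst>
</requestHandler>
```

A client can still override it per request with an explicit wt parameter.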
RE: Is Solr right for my business situation ?
Thanks for the responses, people.

@Grant 1. Can you show me some direction on that.. loading data from an incoming stream.. do I need some third-party tools, or do I need to build something myself? 4. I am basically attempting to build a very fast search interface for the existing data. The volume I mentioned is more like a static one (the data is already there). The sql statements I mentioned are daily updates coming in. The good thing is that the history is not there, so the overall volume is not growing, but I need to apply the update statements. One workaround I had in mind (though not so great for performance) is to apply the updates to a copy of the rdbms, and then feed the rdbms extract to solr. Sounds like overkill, but I don't have another idea right now. Perhaps business discussions would yield something.

@All - Some more questions, guys. 1. I have about 3-5 tables. Designing schema.xml for a single table looks ok, but what's the direction for handling multiple table structures is something I am not sure about. Would it be like one big huge xml, wherein those three tables (assuming it's three) would show up as three different tag-trees, nullable? My source provides me a single flat file per table (tab delimited). 2. Further, loading into solr could use some perf tuning.. any tips? best practices? 3. Also, is there a way to specify an xslt at the server side, and make it the default, i.e. whenever a response is returned, that xslt is applied to the response automatically? 4. And last question for the day :) there was one post saying that the spatial support is really basic in solr and is going to be improved in the next versions... Can you people help me get a definitive yes or no on spatial support... in its current form, does it work or not? I would store lat and long, and would need to make them searchable... Looks like I'm close to my solution..
:) --raghav -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Tuesday, September 28, 2010 1:05 AM To: solr-user@lucene.apache.org Subject: Re: Is Solr right for my business situation ? Inline. On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote: > When do you need to deploy? > > As I understand it, the spatial search in Solr is being rewritten and is > slated for Solr 4.0, the release after next. It will be in 3.x, the next release > > The existing spatial search has some serious problems and is deprecated. > > Right now, I think the only way to get spatial search in Solr is to deploy a > nightly snapshot from the active development on trunk. If you are deploying a > year from now, that might change. > > There is not any support for SQL-like statements or for joins. The best > practice for Solr is to think of your data as a single table, essentially > creating a view from your database. The rows become Solr documents, the > columns become Solr fields. There is now group-by capabilities in trunk as well, which may or may not help. > > wunder > > On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote: > >> I am sure these kind of questions keep coming to you guys, but I want to >> raise the same question in a different context...my own business situation. >> I am very very new to solr and though I have tried to read through the >> documentation, I have nowhere near completing the whole read. >> >> The need is like this - >> >> We have a huge rdbms database/table. A single table perhaps houses 100+ >> million rows. Though oracle is doing a fine job of handling the insertion >> and updation of data, the querying is where our main concerns lie. Since we >> have spatial data, the index building takes hours and hours for such tables. >> >> That's when we thought of moving away from standard rdbms and thought of >> trying something different and fast. 
>> My last week has been spent in a journey reading through bigtable to hadoop >> to hbase, to hive and then finally landed on solr. As far as I am in my >> tests, it looks pretty good, but I have a few unanswered questions still. >> Trying this group for them :) (I am sure I can find some answers if I >> read/google more on the topic, but now I'm being lazy and feel asking the >> people who are already using it/or perhaps developing it is a better bet). >> >> 1. Can I get my solr instance to load data (fresh data for indexing) from a >> stream (imagine a mq kind of queue, or similar) ? Yes, with a little bit of work. >> 2. Can I host my solr instance to use hbase as the database/file system >> (read HDFS) ? Probably, but I doubt it will be fast. Local disk is usually the best. 100+ M rows is large but not unreasonable. >> 3. are there somewhere any reports available (as in benchmarks) for a solr >> instance's performance ? You can probably search the web for these. I've personally seen several installs w/ 1B+ docs and subsecond search and faceting and heard of others. You might look at the stuff the Hathi trust has put up.
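On question 1 (loading from a stream), the "little bit of work" is typically a small consumer that turns each queue message into Solr's update XML and POSTs it to the /update handler. A minimal sketch of the payload-building half -- the endpoint URL and field names are assumptions, and the queue loop is only described in the comment:

```python
import xml.sax.saxutils as su

def to_solr_add(doc):
    """Render one document (a dict of field name -> value) as the XML
    payload Solr's /update handler accepts. POST it with
    Content-Type: text/xml; charset=utf-8."""
    fields = "".join(
        '<field name=%s>%s</field>' % (su.quoteattr(k), su.escape(str(v)))
        for k, v in sorted(doc.items())
    )
    return "<add><doc>%s</doc></add>" % fields

# A queue consumer would then loop: pull a message, call to_solr_add(),
# POST to e.g. http://localhost:8983/solr/update, and send <commit/>
# periodically (not per document) for throughput.
```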
Re: Solr UIMA integration
Hi Maheshkumar, I attached a patch for inclusion of this project as a Solr contrib module [1]; there you can find the patch to apply to the Solr trunk along with the needed jars (attached as a zip archive). I think that your issue could be related to the fact that the GC project dependency is on Solr 1.4.1, not on trunk, so the patch should fix it. Hope this helps, Tommaso [1] : https://issues.apache.org/jira/browse/SOLR-2129 2010/9/27 maheshkumar > > Hi Tommaso, > > All UIMA dependencies (uima-core, AlchemyAPIAnnotator, OpenCalaisAnnotator, > Tagger, WhitespaceTokenizer) are 2.3.1-SNAPSHOT. All are checked out from svn > > AlchemyAPIAnnotator: > http://svn.apache.org/repos/asf/uima/sandbox/trunk/AlchemyAPIAnnotator > OpenCalaisAnnotator: > http://svn.apache.org/repos/asf/uima/sandbox/trunk/OpenCalaisAnnotator > Tagger: http://svn.apache.org/repos/asf/uima/sandbox/trunk/Tagger > WhitespaceTokenizer: > http://svn.apache.org/repos/asf/uima/sandbox/trunk/WhitespaceTokenizer > > solr-uima: http://solr-uima.googlecode.com/svn/trunk/solr-uima > > I am using the latest Solr version checked out from svn; I guess it is > greater than 1.4.1. > > Tommaso, is it possible for you to upload all the dependency jars @ > http://code.google.com/p/solr-uima/downloads/list. > > Thanks > Mahesh > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-UIMA-integration-tp1528253p1587660.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Search Interface
Hi everybody, I'm implementing my first solr engine for conceptual tests. I'm crawling my wiki intranet to make some searches, and the engine is working fine already, but I need some interface to run my searches. Does somebody know where I can find a search interface, just for customization? Thanks -- Claudio Devecchi flickr.com/cdevecchi
Re: FieldType for storing date
: I was wondering what would be the best FieldType for storing date with a : millisecond precision that would allow me to sort and run range queries : against this field. We would like to achieve the best query performance, : minimal heap - fieldcache - requirements, good indexing throughput and : minimal index size in that order. if you don't need sortMissingLast or sortMissingFirst then TrieDateField should be exactly what you are looking for. : We could probably use TrieLongField, however, as we understand, this : doubles the heap requirements for fieldcache. Was wondering if there is : a clever way of achieving this without adding to the heap. TrieDateField uses the long[] FieldCache, I'm not sure what you mean by "doubles the heap requirements" ... unless you are comparing to "int" ? In that case: using TrieIntField seems like what you want? (but if you are comparing to DateField, the FieldCache for TrieDateField is going to be a lot smaller) -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
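As a concrete schema sketch of the TrieDateField suggestion above -- the names and the precisionStep value are example choices, not prescriptions:

```xml
<!-- schema.xml sketch: a trie-encoded date field with millisecond
     precision, sortable and efficient for range queries.
     precisionStep trades index size for range-query speed; 6 is the
     value used in Solr's example schema. -->
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6"
           omitNorms="true" positionIncrementGap="0"/>
<field name="timestamp" type="tdate" indexed="true" stored="true"/>
```

Sorting on "timestamp" then populates a single long[] FieldCache entry, as Hoss describes.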
Re: Grouping in solr ?
: References: : : In-Reply-To: : : Subject: Grouping in solr ? http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
Re: Renaming Solr mbean
: In our setup, we run several instances of Solr under one instance of : Tomcat. I simply rename the WAR to something we use internally - : solr-people, solr-connections, solr-companies, etc etc. This part works : fine and lets us have, use, and maintain individual instances. ... : What I'm finding is every instance reporting to /solr, which skews my : queries. Although I can somewhat predict which searcher is which, it's : not reliable enough to be able to associate statistics with our named : versions of our indices. Take a look at the commit associated with this Jira issue... https://issues.apache.org/jira/browse/SOLR-1843 It's available on the trunk, but is not yet available in a released version of solr. http://svn.apache.org/viewvc?view=revision&revision=942292 -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
Re: DIH ConcurrentModificationException
Is this fixed in solr-1.4.1? I have seen a ConcurrentModificationException during search operations using EmbeddedSolrServer, when tested with jmeter with more than one concurrent user. best, Reuben

On 5/5/2009 2:25 AM, Shalin Shekhar Mangar wrote: This is fixed in trunk. 2009/5/5 Noble Paul നോബിള്‍ नोब्ळ् hi Walter, it needs synchronization. I shall open a bug. On Mon, May 4, 2009 at 7:31 PM, Walter Ferrara wrote: I've got a ConcurrentModificationException during a cron-ed delta import of DIH. I'm using multicore solr nightly from hudson 2009-04-02_08-06-47. I don't know if this stacktrace may be useful to you, but here it is:

java.util.ConcurrentModificationException
  at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(Unknown Source)
  at java.util.LinkedHashMap$EntryIterator.next(Unknown Source)
  at java.util.LinkedHashMap$EntryIterator.next(Unknown Source)
  at org.apache.solr.handler.dataimport.DataImporter.getStatusMessages(DataImporter.java:384)
  at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:210)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
  at org.mortbay.jetty.Server.handle(Server.java:285)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
  at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
  at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
  at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Of course, due to the nature of this exception, I doubt it can be reproduced easily (this is the only one I've got, and the cron job ran a lot of times), but maybe a synchronized should be put somewhere? ciao, Walter -- - Noble Paul | Principal Engineer | AOL | http://aol.com --
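For readers less familiar with this class of bug: the trace shows an iterator over DIH's status LinkedHashMap failing because the map is mutated mid-iteration. A Python analogue (purely illustrative -- this is not Solr's code) shows the same failure mode and the snapshot-style fix:

```python
def read_status_unsafely(d):
    # Mutating a dict while iterating it fails, just like Java's
    # LinkedHashMap throwing ConcurrentModificationException.
    out = []
    for key in d:
        out.append((key, d[key]))
        d["Time Elapsed"] = "0:0:5.0"   # a "writer" mutates mid-iteration
    return out

def read_status_safely(d):
    # Take a snapshot before iterating (or synchronize reader and
    # writer on the same lock) -- the kind of fix the bug report asks for.
    return list(d.items())
```

In Java the equivalent fixes are synchronizing both sides on the map, or copying it before iterating.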
Re: Is Solr right for our project?
Solr will match this in version 3.1 which is the next major release. Read this page: http://wiki.apache.org/solr/SolrCloud for feature descriptions Coming to a trunk near you - see https://issues.apache.org/jira/browse/SOLR-1873 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 27. sep. 2010, at 17.44, Mike Thomsen wrote: > (I apologize in advance if I missed something in your documentation, > but I've read through the Wiki on the subject of distributed searches > and didn't find anything conclusive) > > We are currently evaluating Solr and Autonomy. Solr is attractive due > to its open source background, following and price. Autonomy is > expensive, but we know for a fact that it can handle our distributed > search requirements perfectly. > > What we need to know is if Solr has capabilities that match or roughly > approximate Autonomy's Distributed Search Handler. What it does it > acts as a front-end for all of Autonomy's IDOL search servers (which > correspond in this scenario to Solr shards). It is configured to know > what is on each shard, which servers hold each shard and intelligently > farms out queries based on that configuration. There is no need to > specify which IDOL servers to hit while querying; the DiSH just knows > where to go. Additionally, I believe in cases where an index piece is > mirrored, it also monitors server health and falls back intelligently > on other backup instances of a shard/index piece based on that. > > I'd appreciate it if someone can give me a frank explanation of where > Solr stands in this area. > > Thanks, > > Mike
Re: Need help with spellcheck city name
No, I checked -- there is a city called Swan in Iowa. So it is getting it from the city index, and so is Clark. But why does it favor Swan over San? Spellcheck gets weird after I treat the city name as one token. If I do it the old way, it lets San go, and corrects Jos as Ojos instead of Jose, because Ojos is ranked #1 and Jose is in the middle. Any more suggestions? Ranking by frequency first and then score doesn't work either. From: Erick Erickson To: solr-user@lucene.apache.org Sent: Mon, September 27, 2010 5:24:25 PM Subject: Re: Need help with spellcheck city name Hmmm, did you rebuild your spelling index after the config changes? And it really looks like somehow you're getting results from a field other than city. Are you also sure that your cityname field is of type autocomplete1? Shooting in the dark here, but these results are so weird that I suspect it's something fundamental Best Erick On Mon, Sep 27, 2010 at 8:05 PM, Savannah Beckett < savannah_becket...@yahoo.com> wrote: > No, it doesn't work, I got weird result. I set my city name field to be > parsed > as a token as following: > > positionIncrementGap="100"> > > > > > > > > > > > I got following result for spellcheck: > > > - > - > 1 > 0 > 3 > - > swan > > > - > 1 > 4 > 8 > > clark > > > > > > > > > > From: Tom Hill > To: solr-user@lucene.apache.org > Sent: Mon, September 27, 2010 3:52:48 PM > Subject: Re: Need help with spellcheck city name > > Maybe process the city name as a single token? > > On Mon, Sep 27, 2010 at 3:25 PM, Savannah Beckett > wrote: > > Hi, > > I have city name as a text field, and I want to do spellcheck on it. I > use > > setting in http://wiki.apache.org/solr/SpellCheckComponent > > > > If I setup city name as text field and do spell check on "San Jos" for > San > >Jose, > > I get suggestion for Jos as "ojos". I checked the extendedresult and I > found > > that Jose is in the middle of all 10 suggestions in term of score and > > frequency.
I then set city name as string field, and spell check again, > I got > > Van for San and Ross for Jos, which is weird because San is correct. > > > > > > How do you setup spellchecker to spellcheck city names? City name can > have > > multiple words. > > Thanks. > > > > > > > > > > >
Re: Need help with spellcheck city name
Hmmm, did you rebuild your spelling index after the config changes? And it really looks like somehow you're getting results from a field other than city. Are you also sure that your cityname field is of type autocomplete1? Shooting in the dark here, but these results are so weird that I suspect it's something fundamental Best Erick On Mon, Sep 27, 2010 at 8:05 PM, Savannah Beckett < savannah_becket...@yahoo.com> wrote: > No, it doesn't work, I got weird result. I set my city name field to be > parsed > as a token as following: > > positionIncrementGap="100"> > > > > > > > > > > > I got following result for spellcheck: > > > - > - > 1 > 0 > 3 > - > swan > > > - > 1 > 4 >8 > > clark > > > > > > > > > > From: Tom Hill > To: solr-user@lucene.apache.org > Sent: Mon, September 27, 2010 3:52:48 PM > Subject: Re: Need help with spellcheck city name > > Maybe process the city name as a single token? > > On Mon, Sep 27, 2010 at 3:25 PM, Savannah Beckett > wrote: > > Hi, > > I have city name as a text field, and I want to do spellcheck on it. I > use > > setting in http://wiki.apache.org/solr/SpellCheckComponent > > > > If I setup city name as text field and do spell check on "San Jos" for > San > >Jose, > > I get suggestion for Jos as "ojos". I checked the extendedresult and I > found > > that Jose is in the middle of all 10 suggestions in term of score and > > frequency. I then set city name as string field, and spell check again, > I got > > Van for San and Ross for Jos, which is weird because San is correct. > > > > > > How do you setup spellchecker to spellcheck city names? City name can > have > > multiple words. > > Thanks. > > > > > > > > > > >
Re: Need help with spellcheck city name
No, it doesn't work, I got a weird result. I set my city name field to be parsed as a token as follows: [the fieldType definition was not preserved by the list archive] I got the following result for spellcheck: suggestions "swan" and "clark" [the response XML was not preserved by the archive]. From: Tom Hill To: solr-user@lucene.apache.org Sent: Mon, September 27, 2010 3:52:48 PM Subject: Re: Need help with spellcheck city name Maybe process the city name as a single token? On Mon, Sep 27, 2010 at 3:25 PM, Savannah Beckett wrote: > Hi, > I have city name as a text field, and I want to do spellcheck on it. I use > setting in http://wiki.apache.org/solr/SpellCheckComponent > > If I setup city name as text field and do spell check on "San Jos" for San >Jose, > I get suggestion for Jos as "ojos". I checked the extendedresult and I found > that Jose is in the middle of all 10 suggestions in term of score and > frequency. I then set city name as string field, and spell check again, I got > Van for San and Ross for Jos, which is weird because San is correct. > > > How do you setup spellchecker to spellcheck city names? City name can have > multiple words. > Thanks. > > >
Re: Need help with spellcheck city name
Maybe process the city name as a single token? On Mon, Sep 27, 2010 at 3:25 PM, Savannah Beckett wrote: > Hi, > I have city name as a text field, and I want to do spellcheck on it. I use > setting in http://wiki.apache.org/solr/SpellCheckComponent > > If I setup city name as text field and do spell check on "San Jos" for San > Jose, > I get suggestion for Jos as "ojos". I checked the extendedresult and I found > that Jose is in the middle of all 10 suggestions in term of score and > frequency. I then set city name as string field, and spell check again, I got > Van for San and Ross for Jos, which is weird because San is correct. > > > How do you setup spellchecker to spellcheck city names? City name can have > multiple words. > Thanks. > > >
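Tom's "single token" suggestion, as a hedged schema.xml sketch -- the type and field names are examples, not from the thread:

```xml
<!-- schema.xml sketch: KeywordTokenizerFactory keeps "San Jose" as one
     term, so a spellchecker built on this field suggests whole city
     names rather than correcting "Jos" and "San" as separate words. -->
<fieldType name="city_token" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="cityname" type="city_token" indexed="true" stored="true"/>
```

The spellcheck component's field (and any copyField feeding it) would then point at this single-token field, and the spelling index must be rebuilt after the change.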
Need help with spellcheck city name
Hi, I have city name as a text field, and I want to do spellcheck on it. I use setting in http://wiki.apache.org/solr/SpellCheckComponent If I setup city name as text field and do spell check on "San Jos" for San Jose, I get suggestion for Jos as "ojos". I checked the extendedresult and I found that Jose is in the middle of all 10 suggestions in term of score and frequency. I then set city name as string field, and spell check again, I got Van for San and Ross for Jos, which is weird because San is correct. How do you setup spellchecker to spellcheck city names? City name can have multiple words. Thanks.
DIH XML Entity Help (Newbie)
I am trying to configure the data-config.xml using the XPathEntityProcessor to index nested xml entities such as the following: "Drug / fentanyl sublingual spray" and "Other / questionnaire administration" [the sample XML markup was not preserved by the list archive]. The data-config.xml looks like this: [the configuration was not preserved by the archive] but it only indexes the first occurrence of intervention_type_t and intervention_name_t, and they are placed as children of the root entity instead of being children of intervention. I would appreciate your help! Thanks in advance, Aurelia -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-XML-Entity-Help-Newbie-tp1592723p1592723.html Sent from the Solr - User mailing list archive at Nabble.com.
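Since the archive dropped the poster's XML, here is a sketch of the usual shape of an XPathEntityProcessor config over repeated nested elements. The file name and xpaths are guesses at the poster's layout, and the "first occurrence only" symptom usually also requires the target fields to be multiValued="true" in schema.xml:

```xml
<dataConfig>
  <dataSource type="FileDataSource"/>
  <document>
    <entity name="study" processor="XPathEntityProcessor"
            url="study.xml" forEach="/clinical_study" stream="true">
      <!-- With intervention_type_t / intervention_name_t declared
           multiValued="true" in schema.xml, every repeated
           <intervention> child is captured, not just the first. -->
      <field column="intervention_type_t"
             xpath="/clinical_study/intervention/intervention_type"/>
      <field column="intervention_name_t"
             xpath="/clinical_study/intervention/intervention_name"/>
    </entity>
  </document>
</dataConfig>
```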
Re: Question Related to sorting on Date
Hi Ahson, You'll really want to store an additional date field (make it a TrieDateField type) that has only the date, and in the reverse order from how you've shown it. You can still keep the one you've got, just use it only for 'human viewing' rather than sorting. Something like: 20080205 if your example is 5 Feb, or 20080502 for May 2nd. This way, the parsing is most efficient, you won't have to do any tricky parsing at sort time, and, when your index gets large, your sorted searches will remain fast. On Mon, Sep 27, 2010 at 7:45 PM, Ahson Iqbal wrote: > hi all > > I have a question related to sorting of date field i have Date field that is > indexed like a string and look like "5/2/2008 4:33:30 PM" i want to do > sorting > on this field on the basis of date, time does not matters. any suggestion > how i > could ignore the time part from this field and just sort on the date? > > >
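The re-encoding step the reply describes is a one-liner at index time. A sketch assuming US-style month/day/year input like the example in the question (if the source is day/month/year, swap the first two format codes):

```python
from datetime import datetime

def sortable_date(s):
    """Turn a display string like '5/2/2008 4:33:30 PM' into a yyyymmdd
    key that sorts correctly even as a plain string, dropping the time.
    Assumption: month/day/year order."""
    return datetime.strptime(s, "%m/%d/%Y %I:%M:%S %p").strftime("%Y%m%d")
```

You would keep the original field for display and index this derived value in the field used for sorting.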
Re: Is Solr right for my business situation ?
Ah, I completely overlooked that news: spatial search in 3.x! :-D :-D Any idea yet when this will be released? Awesome to hear that it has been moved forward! :) -- View this message in context: http://lucene.472066.n3.nabble.com/Is-Solr-right-for-our-project-tp1589927p1592448.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is Solr right for my business situation ?
Wow, that is a relief! I was going to have to look at ElasticSearch instead. Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Mon, 9/27/10, Grant Ingersoll wrote: > From: Grant Ingersoll > Subject: Re: Is Solr right for my business situation ? > To: solr-user@lucene.apache.org > Date: Monday, September 27, 2010, 12:35 PM > Inline. > > On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote: > > > When do you need to deploy? > > > > As I understand it, the spatial search in Solr is > being rewritten and is slated for Solr 4.0, the release > after next. > > It will be in 3.x, the next release > > > > > The existing spatial search has some serious problems > and is deprecated. > > > > Right now, I think the only way to get spatial search > in Solr is to deploy a nightly snapshot from the active > development on trunk. If you are deploying a year from now, > that might change. > > > > There is not any support for SQL-like statements or > for joins. The best practice for Solr is to think of your > data as a single table, essentially creating a view from > your database. The rows become Solr documents, the columns > become Solr fields. > > There is now group-by capabilities in trunk as well, which > may or may not help. > > > > > wunder > > > > On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra > wrote: > > > >> I am sure these kind of questions keep coming to > you guys, but I want to raise the same question in a > different context...my own business situation. > >> I am very very new to solr and though I have tried > to read through the documentation, I have nowhere near > completing the whole read. > >> > >> The need is like this - > >> > >> We have a huge rdbms database/table. A single > table perhaps houses 100+ million rows. Though oracle is > doing a fine job of handling the insertion and updation of > data, the querying is where our main concerns lie. 
> Since we have spatial data, the index building takes hours > and hours for such tables. > >> > >> That's when we thought of moving away from > standard rdbms and thought of trying something different and > fast. > >> My last week has been spent in a journey reading > through bigtable to hadoop to hbase, to hive and then > finally landed on solr. As far as I am in my tests, it looks > pretty good, but I have a few unanswered questions still. > Trying this group for them :) (I am sure I can > find some answers if I read/google more on the topic, but > now I m being lazy and feel asking the people who are > already using it/or perhaps developing it is a better bet). > >> > >> 1. Can I get my solr instance to load data (fresh > data for indexing) from a stream (imagine a mq kind of > queue, or similar) ? > > Yes, with a little bit of work. > > >> 2. Can I host my solr instance to use hbase as the > database/file system (read HDFS) ? > > Probably, but I doubt it will be fast. Local disk is > usually the best. 100+ M rows is large but not > unreasonable. > > >> 3. are there somewhere any reports available (as > in benchmarks ) for a solr instance's performance ? > > You can probably search the web for these. I've > personally seen several installs w/ 1B+ docs and subsecond > search and faceting and heard of others. You might > look at the stuff the Hathi trust has put up. > > >> 4. are there any APIs available which might help > me apply ANSI sql kind of statements to my solr data ? > > No. Question back? What kinds of things are you > trying to do? > > >> > >> It would be great if people could help share their > experience in the area... if it's too much trouble writing > all of it, perhaps url would be easier... I welcome all > kinds of help here... any advice/suggestions are good ... > >> > >> Looking forward to your viewpoints.. > >> > >> --raghav.. 
> -- > Grant Ingersoll > http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
resources for relevancy score tuning
Can someone share some good resources (books, articles, links, etc.) for tuning relevancy scores with multiple factors? I'm playing with different fields and boosts in my 'qf', 'pf', and 'bf' defaults but I feel like I'm shooting in the dark. http://wiki.apache.org/solr/SolrRelevancyCookbook has a couple of individual tips, but I need some help devising a good combination of boosts across multiple fields for scoring. E.g., I want to tweak scoring derived from a primary identifier field, a name field, a description field, a rating field, and a "number of downloads" field. But it seems when I adjust any single factor, it affects too many others. Thanks, -L
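Not from the original thread, but one way to build intuition for how 'qf', 'pf', and 'bf' interact is to model the combination outside Solr. The sketch below is a simplified, hypothetical version of dismax-style scoring (Solr's actual formula also involves IDF, length norms, and a tie-breaker): the max-over-fields 'qf' part and the additive 'bf' part compete, which is why adjusting a single factor shifts everything else. Scaling the function-query boost (here, 0.2 * log10(downloads)) so it stays small relative to typical text scores is one common tactic.

```python
import math

def dismax_score(field_scores, qf, downloads=0, bf_scale=0.0):
    """Simplified dismax: take the best field match (weighted by qf),
    then add a function-query boost derived from a download count (bf)."""
    text_score = max(score * qf[field] for field, score in field_scores.items())
    boost = bf_scale * math.log10(downloads) if downloads > 0 else 0.0
    return text_score + boost

qf = {"name": 2.0, "description": 1.0}      # hypothetical field boosts
doc_a = {"name": 0.8, "description": 0.3}   # strong name match, few downloads
doc_b = {"name": 0.5, "description": 0.9}   # weaker match, very popular

# Text relevancy alone: doc_a wins (1.6 vs 1.0)
print(dismax_score(doc_a, qf), dismax_score(doc_b, qf))

# A modest download boost flips the ranking (1.8 vs 2.0)
print(dismax_score(doc_a, qf, downloads=10, bf_scale=0.2),
      dismax_score(doc_b, qf, downloads=100_000, bf_scale=0.2))
```

Playing with the scale factor in a toy model like this is cheaper than re-indexing and re-querying for every boost tweak.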
Re: Is Solr right for my business situation ?
@Walter Underwood: Walter Underwood wrote: > > Right now, I think the only way to get spatial search in Solr is to deploy > a nightly snapshot from the active development on trunk. > Could you give me the link to this trunk, I need it very much! Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Is-Solr-right-for-our-project-tp1589927p1592330.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is Solr right for my business situation ?
Right, I know, I was curious about its current closeness to being in the main distro, not a patch. Among other things, when those who know better decide it goes in the core distro, that makes me more comfortable that they've decided it works acceptably, and also more comfortable that it will continue to be supported in _future_ versions without someone having to prepare a new patch. Ravi Julapalli wrote: > Field collapsing is available in 1.4 by applying patch https://issues.apache.org/jira/browse/SOLR-236
Re: Is Solr right for my business situation ?
Hi Jonathan, Field collapsing is available in 1.4 by applying patch https://issues.apache.org/jira/browse/SOLR-236 -Ravi From: Jonathan Rochkind To: "solr-user@lucene.apache.org" Sent: Mon, September 27, 2010 9:18:20 PM Subject: Re: Is Solr right for my business situation ? Grant Ingersoll wrote: > > There is now group-by capabilities in trunk as well, which may or may not help. > Really, the field collapsing stuff has been committed to trunk finally? Or are you talking about something else? If it's the field collapsing stuff, and it's been committed to trunk, does that mean it'll be in the 3.0 release? Jonathan >
Re: Is Solr right for my business situation ?
Grant Ingersoll wrote: There is now group-by capabilities in trunk as well, which may or may not help. Really, the field collapsing stuff has been committed to trunk finally? Or are you talking about something else? If it's the field collapsing stuff, and it's been committed to trunk, does that mean it'll be in the 3.0 release? Jonathan
Re: Is Solr right for my business situation ?
Inline. On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote: > When do you need to deploy? > > As I understand it, the spatial search in Solr is being rewritten and is > slated for Solr 4.0, the release after next. It will be in 3.x, the next release > > The existing spatial search has some serious problems and is deprecated. > > Right now, I think the only way to get spatial search in Solr is to deploy a > nightly snapshot from the active development on trunk. If you are deploying a > year from now, that might change. > > There is not any support for SQL-like statements or for joins. The best > practice for Solr is to think of your data as a single table, essentially > creating a view from your database. The rows become Solr documents, the > columns become Solr fields. There is now group-by capabilities in trunk as well, which may or may not help. > > wunder > > On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote: > >> I am sure these kind of questions keep coming to you guys, but I want to >> raise the same question in a different context...my own business situation. >> I am very very new to solr and though I have tried to read through the >> documentation, I have nowhere near completing the whole read. >> >> The need is like this - >> >> We have a huge rdbms database/table. A single table perhaps houses 100+ >> million rows. Though oracle is doing a fine job of handling the insertion >> and updation of data, the querying is where our main concerns lie. Since we >> have spatial data, the index building takes hours and hours for such tables. >> >> That's when we thought of moving away from standard rdbms and thought of >> trying something different and fast. >> My last week has been spent in a journey reading through bigtable to hadoop >> to hbase, to hive and then finally landed on solr. As far as I am in my >> tests, it looks pretty good, but I have a few unanswered questions still. 
>> Trying this group for them :) (I am sure I can find some answers if I >> read/google more on the topic, but now I m being lazy and feel asking the >> people who are already using it/or perhaps developing it is a better bet). >> >> 1. Can I get my solr instance to load data (fresh data for indexing) from a >> stream (imagine a mq kind of queue, or similar) ? Yes, with a little bit of work. >> 2. Can I host my solr instance to use hbase as the database/file system >> (read HDFS) ? Probably, but I doubt it will be fast. Local disk is usually the best. 100+ M rows is large but not unreasonable. >> 3. are there somewhere any reports available (as in benchmarks ) for a solr >> instance's performance ? You can probably search the web for these. I've personally seen several installs w/ 1B+ docs and subsecond search and faceting and heard of others. You might look at the stuff the Hathi trust has put up. >> 4. are there any APIs available which might help me apply ANSI sql kind of >> statements to my solr data ? No. Question back? What kinds of things are you trying to do? >> >> It would be great if people could help share their experience in the area... >> if it's too much trouble writing all of it, perhaps url would be easier... I >> welcome all kinds of help here... any advice/suggestions are good ... >> >> Looking forward to your viewpoints.. >> >> --raghav..
-- Grant Ingersoll http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
Question Related to sorting on Date
Hi all, I have a question related to sorting on a date field. I have a Date field that is indexed as a string and looks like "5/2/2008 4:33:30 PM". I want to sort on this field by date; the time does not matter. Any suggestion how I could ignore the time part of this field and just sort on the date?
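Not part of the original message, but the usual fix is to index a second, sort-only copy of the field normalized to a lexicographically sortable form such as yyyyMMdd, dropping the time. A sketch of that normalization step, assuming the M/d/yyyy h:mm:ss AM/PM format shown above:

```python
from datetime import datetime

def sort_key(raw):
    """Normalize strings like "5/2/2008 4:33:30 PM" to a yyyyMMdd key;
    a plain string sort on this key equals chronological date order."""
    return datetime.strptime(raw, "%m/%d/%Y %I:%M:%S %p").strftime("%Y%m%d")

dates = ["5/2/2008 4:33:30 PM", "12/1/2007 9:00:00 AM", "5/2/2008 1:00:00 AM"]
print(sorted(dates, key=sort_key))
# both 5/2/2008 entries get the same key, so the time part never affects the order
```

Doing this once at index time (into a separate sort field) is much cheaper than trying to strip the time at query time.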
Re: The search response time is too long
2010/9/27 newsam : > I have setup a SOLR searcher instance with Tomcat 5.5.21. However, the > response time is too long. Here is my scenario: > 1. The index file is 8.2G. The doc num is 6110745. > 2. DELL Server: Intel(R) Xeon(TM) CPU (4 cores) 3.00GHZ, 6G Mem. > > I used "Key:*" to query all records by localhost:8080. The response time is > 68703 milliseconds. The cpu load is 50% and mem useage is over 400M. If you want to get all records, use q=*:* instead of Key:* - that should give you faster results, way faster :) Why are you actually requesting all results, and how many of them are you fetching? Maybe it would be a good idea to explain your use case / problem first. simon > > Any comments are welcomed. > > >
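A side note, not in the original reply: if the point of the Key:* query was just to see how many documents are in the index, asking for q=*:* with rows=0 returns numFound without streaming any documents back, which avoids the response-writing cost entirely. A sketch of building such a request URL (the host, port, and path here are made up for illustration):

```python
from urllib.parse import urlencode, parse_qs

# match-all query, fetch zero rows: we only want the numFound count
params = {"q": "*:*", "rows": 0, "wt": "json"}
url = "http://localhost:8080/solr/select?" + urlencode(params)
print(url)

# round-trip check: the match-all query survives URL encoding intact
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["q"][0])
```

The explicit encoding matters because *:* contains characters that must be escaped when the query is sent over HTTP by hand.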
RE: bi-grams for common terms - any analyzers do that?
Hi Yonik, >> If the new "autoGeneratePhraseQueries" is off, position doesn't matter, and the query will >> be treated as "index" OR "reader". Just wanted to make sure: in Solr, does autoGeneratePhraseQueries = "off" treat the query with the *default* query operator as set in solrconfig, rather than necessarily using the Boolean "OR" operator? i.e. if the default operator is AND and autoGeneratePhraseQueries = off, then "IndexReader" -> "index" "reader" -> "index" AND "reader" Tom
RE: bi-grams for common terms - any analyzers do that?
Hi Jonathan, >> I'm afraid I'm having trouble understanding "if the analyzer returns more than one position back from a "queryparser token" >> I'm not sure if "the queryparser forms a phrase query without explicit phrase quotes" is a problem for me, I had no idea it happened until now, never noticed, and still don't really understand in what circumstances it happens.

The problem I had was for a Boolean query "l'art AND historie": the WordDelimiterFilter tokenized "l'art" as two tokens, "l" at position 1 and "art" at position 2. So the query parser decided this means a phrase query for "l" followed immediately by "art". See http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance for details. This would happen whenever any token filter splits a token into more than one token, for example a filter that splits "foo-bar" into "foo" "bar".

The exception is SynonymFilter or something like it. In the case of SynonymFilter, it's not really a case of "splitting" one token into multiple tokens; given one token of input, it outputs all the synonyms of the term, but all the tokens have the same position attribute. (See: http://www.lucidimagination.com/search/document/CDRG_ch05_5.6.19?q=synonym%20filter)

So for example, for the string "the small thing", if you had a synonym list for small, "small=>tiny,teeny", the input

position | 1   | 2     | 3
token    | the | small | thing

would come out as

position | 1   | 2     | 2    | 2     | 3
token    | the | small | tiny | teeny | thing

In this case, when the query parser gets back "small tiny teeny" at the same position, they are not turned into a phrase query.

For "l'art", the input

position | 1
token    | l'art

comes out as

position | 1 | 2
token    | l | art

Here there are two tokens with different positions, so it treats them as a phrase query.

Tom Burton-West
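To make the position rule above concrete, here is a small illustrative model (not Solr code) of the decision the query parser makes. Tokens are (term, positionIncrement) pairs: an increment of 0 means "stacked at the same position as the previous token" (the SynonymFilter case), while increments of 1 mean consecutive positions (the WordDelimiterFilter case):

```python
def classify(tokens):
    """tokens: list of (term, position_increment) pairs.
    If every token after the first stacks at the same position, the
    parser treats them as alternatives (ORed synonyms); if positions
    advance, it builds an implicit phrase query."""
    if len(tokens) <= 1:
        return "term"
    increments = [inc for _, inc in tokens[1:]]
    if all(inc == 0 for inc in increments):
        return "synonyms (ORed, same position)"
    return "implicit phrase query"

# SynonymFilter output for "small": three terms, one position
print(classify([("small", 1), ("tiny", 0), ("teeny", 0)]))
# WordDelimiterFilter output for "l'art": two terms, two positions
print(classify([("l", 1), ("art", 1)]))
```

The model is a simplification, but it captures why the same "one token in, several tokens out" situation behaves so differently depending on the position attributes the filter assigns.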
Re: Is Solr right for my business situation ?
When do you need to deploy? As I understand it, the spatial search in Solr is being rewritten and is slated for Solr 4.0, the release after next. The existing spatial search has some serious problems and is deprecated. Right now, I think the only way to get spatial search in Solr is to deploy a nightly snapshot from the active development on trunk. If you are deploying a year from now, that might change. There is not any support for SQL-like statements or for joins. The best practice for Solr is to think of your data as a single table, essentially creating a view from your database. The rows become Solr documents, the columns become Solr fields. wunder On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote: > I am sure these kind of questions keep coming to you guys, but I want to > raise the same question in a different context...my own business situation. > I am very very new to solr and though I have tried to read through the > documentation, I have nowhere near completing the whole read. > > The need is like this - > > We have a huge rdbms database/table. A single table perhaps houses 100+ > million rows. Though oracle is doing a fine job of handling the insertion and > updation of data, the querying is where our main concerns lie. Since we have > spatial data, the index building takes hours and hours for such tables. > > That's when we thought of moving away from standard rdbms and thought of > trying something different and fast. > My last week has been spent in a journey reading through bigtable to hadoop > to hbase, to hive and then finally landed on solr. As far as I am in my > tests, it looks pretty good, but I have a few unanswered questions still. > Trying this group for them :) (I am sure I can find some answers if I > read/google more on the topic, but now I m being lazy and feel asking the > people who are already using it/or perhaps developing it is a better bet). > > 1. 
Can I get my solr instance to load data (fresh data for indexing) from a > stream (imagine a mq kind of queue, or similar) ? > 2. Can I host my solr instance to use hbase as the database/file system (read > HDFS) ? > 3. are there somewhere any reports available (as in benchmarks ) for a solr > instance's performance ? > 4. are there any APIs available which might help me apply ANSI sql kind of > statements to my solr data ? > > It would be great if people could help share their experience in the area... > if it's too much trouble writing all of it, perhaps url would be easier... I > welcome all kinds of help here... any advice/suggestions are good ... > > Looking forward to your viewpoints.. > > --raghav..
Is Solr right for my business situation ?
I am sure these kind of questions keep coming to you guys, but I want to raise the same question in a different context...my own business situation. I am very very new to solr and though I have tried to read through the documentation, I have nowhere near completing the whole read. The need is like this - We have a huge rdbms database/table. A single table perhaps houses 100+ million rows. Though oracle is doing a fine job of handling the insertion and updation of data, the querying is where our main concerns lie. Since we have spatial data, the index building takes hours and hours for such tables. That's when we thought of moving away from standard rdbms and thought of trying something different and fast. My last week has been spent in a journey reading through bigtable to hadoop to hbase, to hive and then finally landed on solr. As far as I am in my tests, it looks pretty good, but I have a few unanswered questions still. Trying this group for them :) (I am sure I can find some answers if I read/google more on the topic, but now I m being lazy and feel asking the people who are already using it/or perhaps developing it is a better bet). 1. Can I get my solr instance to load data (fresh data for indexing) from a stream (imagine a mq kind of queue, or similar) ? 2. Can I host my solr instance to use hbase as the database/file system (read HDFS) ? 3. are there somewhere any reports available (as in benchmarks ) for a solr instance's performance ? 4. are there any APIs available which might help me apply ANSI sql kind of statements to my solr data ? It would be great if people could help share their experience in the area... if it's too much trouble writing all of it, perhaps url would be easier... I welcome all kinds of help here... any advice/suggestions are good ... Looking forward to your viewpoints.. --raghav.. 
** This message may contain confidential or proprietary information intended only for the use of the addressee(s) named above or may contain information that is legally privileged. If you are not the intended addressee, or the person responsible for delivering it to the intended addressee, you are hereby notified that reading, disseminating, distributing or copying this message is strictly prohibited. If you have received this message by mistake, please immediately notify us by replying to the message and delete the original message and any copies immediately thereafter. Thank you. ** CLLD
Is Solr right for our project?
(I apologize in advance if I missed something in your documentation, but I've read through the Wiki on the subject of distributed searches and didn't find anything conclusive) We are currently evaluating Solr and Autonomy. Solr is attractive due to its open source background, following and price. Autonomy is expensive, but we know for a fact that it can handle our distributed search requirements perfectly. What we need to know is if Solr has capabilities that match or roughly approximate Autonomy's Distributed Search Handler. What it does is act as a front-end for all of Autonomy's IDOL search servers (which correspond in this scenario to Solr shards). It is configured to know what is on each shard and which servers hold each shard, and it intelligently farms out queries based on that configuration. There is no need to specify which IDOL servers to hit while querying; the DiSH just knows where to go. Additionally, I believe in cases where an index piece is mirrored, it also monitors server health and falls back intelligently on other backup instances of a shard/index piece based on that. I'd appreciate it if someone can give me a frank explanation of where Solr stands in this area. Thanks, Mike
Re: urgent SOLR query server request hangs
On Mon, Sep 27, 2010 at 11:09 AM, Bharat Jain wrote: > We are running into issues with SOLR queries. Our solr queries just hang. Are you perhaps using distributed search and accidentally set up an infinite loop? Do *not* configure a default "shards" param on your /select handler. Other than that - you'll need to get some thread dumps from Solr to see why it's hanging, and provide an example of what requests you are sending. -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
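To illustrate Yonik's warning, this is the shape of configuration to avoid (a hypothetical solrconfig.xml fragment with made-up hostnames): a default shards parameter on the /select handler makes every incoming query fan out to the listed shards, and since those shard requests themselves hit /select, each one fans out again, looping forever.

```xml
<!-- DANGEROUS: do not put a default "shards" on the /select handler -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">host1:8080/solr,host2:8080/solr</str>
  </lst>
</requestHandler>

<!-- Safer: keep /select plain and use a separate handler for distributed requests -->
<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">host1:8080/solr,host2:8080/solr</str>
  </lst>
</requestHandler>
```

With the second arrangement, the shard-to-shard sub-requests go to the plain /select handler and the loop cannot form.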
urgent SOLR query server request hangs
Hi, We are running into issues with SOLR queries. Our solr queries just hang. We are using SOLR 1.3 and below is the stack trace from threaddump. We are clueless about what can be causing this issue. We are in the midst of firefighting with our customer and any help is appreciated. Thanks,Bharat "TP-Processor113" daemon prio=3 tid=0x071c3400 nid=0x134 runnable [0xfd7ed72a..0xfd7ed72a3920] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read1(BufferedInputStream.java:258) at java.io.BufferedInputStream.read(BufferedInputStream.java:317) - locked <0xfd7f26c1caf0> (a java.io.BufferedInputStream) at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064) - locked <0xfd7f2a260c50> (a sun.net.www.protocol.http.HttpURLConnection) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:373) at com.xxx..search.solr.SolrSearchServiceImpl.query(SolrSearchServiceImpl.java:271) at com.xxx..search.Searchable.query(Searchable.java:460) at com.xxx..search.JobReqSearchObject.query(JobReqSearchObject.java:903) Thanks Bharat Jain
Re: The search response time is too long
Also, how many rows are you requesting at one time? I've seen cases where the query time is blazing fast and the response writing is terribly slow because of too many documents being sent in the response. On Mon, Sep 27, 2010 at 6:37 AM, kenf_nc wrote: > > "mem usage is over 400M", do you mean Tomcat mem size? If you don't give > your > cache sizes enough room to grow you will choke the performance. You should > adjust your Tomcat settings to let the cache grow to at least 1GB or better > would be 2GB. You may also want to look into > http://wiki.apache.org/solr/SolrCaching warming the cache to make the > first > time call a little faster. > > For comparison, I also have about 8GB in my index but only 2.8 million > documents. My search query times on a smaller box than you specify are 6533 > milliseconds on an unwarmed (newly rebooted) instance. > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Re-The-search-response-time-is-too-loong-tp1587395p1588554.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: The search response time is too long
"mem usage is over 400M", do you mean Tomcat mem size? If you don't give your cache sizes enough room to grow you will choke the performance. You should adjust your Tomcat settings to let the cache grow to at least 1GB or better would be 2GB. You may also want to look into http://wiki.apache.org/solr/SolrCaching warming the cache to make the first time call a little faster. For comparison, I also have about 8GB in my index but only 2.8 million documents. My search query times on a smaller box than you specify are 6533 milliseconds on an unwarmed (newly rebooted) instance. -- View this message in context: http://lucene.472066.n3.nabble.com/Re-The-search-response-time-is-too-loong-tp1587395p1588554.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Concurrent DB updates and delta import misses few records
You could get it from Solr, yes. That didn't even occur to me because when I was designing my scripts, I didn't yet have a fully integrated Solr index. :) With hindsight, I still wouldn't get it from Solr. I would lose some flexibility and ease of administration. It's certainly possible to store all build-related tracking information in the database. The build system for our old search product did it that way. I decided to go with simple text files in an NFS-mounted directory for the rewrite. It's easier for me to administer, just ssh to a server and examine or modify simple one-line text files. On the script side, the files get read into a Perl hash. With the old system, I found it cumbersome to go through the database interfaces. The only thing that's still in the database is the delete table, because it is populated by triggers on the metadata table. On 9/23/2010 12:48 AM, Shashikant Kore wrote: Thanks for the pointer, Shawn. It, definitely, is useful. I am wondering if you could retrieve minDid from the solr rather than storing it externally. Max id from Solr index and max id from DB should define the lower and upper thresholds, respectively, of the delta range. Am I missing something?
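As an illustration of the scheme Shawn describes (the original scripts were Perl; this sketch is Python, and the file name is hypothetical): each one-line text file in the shared directory holds a single tracking value, and the whole directory is read into a hash/dict at the start of a build run.

```python
import os, tempfile

def read_tracking(directory):
    """Read each one-line file in the directory into a dict:
    file name -> stripped value (the Perl-hash equivalent)."""
    tracking = {}
    for name in os.listdir(directory):
        with open(os.path.join(directory, name)) as f:
            tracking[name] = f.readline().strip()
    return tracking

# demo with a throwaway directory standing in for the NFS mount
d = tempfile.mkdtemp()
with open(os.path.join(d, "minDid"), "w") as f:
    f.write("104857600\n")
print(read_tracking(d))
```

The appeal is exactly what the post says: any value can be inspected or fixed from a shell with cat and echo, with no database client in the way.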
Multi-lingual auto-complete?
I want to provide auto-complete to users when they're inputting tags. The auto-complete tag suggestions would be based on tags that are already in the system. Multiple tags are separated by commas. A single tag could contain multiple words such as "Apple computer". One issue is that a tag could be in multiple languages, including both languages (e.g. English, French) that use whitespace as word separator and languages that don't (e.g. CJK) An example of such a multi-lingual tag is "Apple 电脑". If a user types "apple", I'd like the autocomplete suggestions to include both "Apple computer" (ie. matches are case insensitive) and "green apple" (ie. matches aren't restricted to prefixes). And a user typing "电脑" should match "Apple 电脑". Is it possible to do that? I read the article: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ In that article KeywordTokenizerFactor is used. If I changed it to CJKTokenizer would that work? With an input of "Apple 电脑", what would CJKTokenizer produce? -is it "Apple", "电", "脑" ? or - is it "A", "p", "p", "l", "e", "电", "脑" ? Any help would be greatly appreciated. Andy
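Not an answer about CJKTokenizer specifically, but the matching behavior is easy to prototype outside Solr to see what a given tokenization buys you. The sketch below (plain Python, illustrative only) lower-cases, splits on whitespace, and emits edge n-grams per token, so "apple" matches both "Apple computer" and the non-prefix "green apple", and typing "电" matches "Apple 电脑". Whether 电脑 arrives at the n-gram stage as one token or two single characters depends on which tokenizer you choose in the analysis chain.

```python
def edge_ngrams(text, min_len=1):
    """Lower-case, split on whitespace, emit edge n-grams per token."""
    grams = set()
    for token in text.lower().split():
        for i in range(min_len, len(token) + 1):
            grams.add(token[:i])
    return grams

tags = ["Apple computer", "green apple", "Apple 电脑"]
index = {tag: edge_ngrams(tag) for tag in tags}

def suggest(prefix):
    prefix = prefix.lower()
    return [tag for tag in tags if prefix in index[tag]]

print(suggest("apple"))   # matches all three tags, case-insensitively
print(suggest("电"))       # matches only the CJK tag
```

Because grams are generated per token rather than for the whole tag string, a match on any word of a multi-word tag succeeds, which is the "green apple" requirement from the question.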
Re: Solr UIMA integration
Hi Tommaso, All UIMA dependencies (uima-core, AlchemyAPIAnnotator, OpenCalaisAnnotator, Tagger, WhitespaceTokenizer) are 2.3.1-SNAPSHOT. All were checked out from svn. AlchemyAPIAnnotator: http://svn.apache.org/repos/asf/uima/sandbox/trunk/AlchemyAPIAnnotator OpenCalaisAnnotator: http://svn.apache.org/repos/asf/uima/sandbox/trunk/OpenCalaisAnnotator Tagger: http://svn.apache.org/repos/asf/uima/sandbox/trunk/Tagger WhitespaceTokenizer: http://svn.apache.org/repos/asf/uima/sandbox/trunk/WhitespaceTokenizer solr-uima: http://solr-uima.googlecode.com/svn/trunk/solr-uima I am using the latest Solr version checked out from svn; I guess it is greater than 1.4.1. Tommaso, is it possible for you to upload all the dependency jars at http://code.google.com/p/solr-uima/downloads/list? Thanks Mahesh -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-UIMA-integration-tp1528253p1587660.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: TokenFilter that removes payload ?
Robert & Erik, I appreciate your suggestions, but we use Type for another purpose. Also, the product is out and we can't change the design so easily. So it seems the conclusion is that there is no such TokenFilter. I'll write one. Thanks. On Sep 27, 2010, at 1:00 PM, Robert Muir wrote: > On Sun, Sep 26, 2010 at 11:49 PM, Teruhiko Kurosaka wrote: > >> >> As I understand it, payloads go to the Lucene index. >> In most cases, the part-of-speech tags are not used if >> retrieved by the search applications. So they shouldn't >> go to the index. So I'd like to know if there is an >> existing TokenFilter that does this. Otherwise, I'd like >> to write one. >> > > I agree with Erick, I think a better approach would be to put the part of > speech tags into another attribute. > > For example, you can put them in TypeAttribute, which is not stored in the > index by default. > Then, if the user wants to store them in the index, they just add > TypeAsPayloadTokenFilterFactory, which copies the type into the payload... > but otherwise they would not be stored. > > -- > Robert Muir > rcm...@gmail.com T. "Kuro" Kurosaka, 415-227-9600x122, 617-386-7122(direct)
RE: spellcheck on multiple fields?
You can use copyField to get multiple fields in the field you use for spell checking, don't forget to set it to multiValued. -Original message- From: Savannah Beckett Sent: Mon 27-09-2010 10:08 To: solr-user@lucene.apache.org; Subject: spellcheck on multiple fields? Is it possible to do spellcheck on multiple fields in my solr index? If so, how? The following setup works for only one field: default solr.IndexBasedSpellChecker myfield ./spellchecker1 0.5 true Thanks.
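A hypothetical schema.xml fragment for the copyField approach (field and type names here are made up): both source fields are copied into one multiValued aggregate field, and the spellchecker's field setting is then pointed at that field instead of myfield.

```xml
<!-- schema.xml: aggregate spelling source (illustrative names) -->
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="spell"/>
<copyField source="body"  dest="spell"/>
```

The destination must be multiValued because each copyField adds another value to it for every document that has both sources.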
Re: The search response time is too long
We used SOLR 1.4. All queries were executed in the SOLR back-end. I guess that I/O operations consume too much time. > From: "newsam" > To: solr-user@lucene.apache.org > Subject: Re: The search response time is too long > Date: Mon, 27 Sep 2010 16:05:49 +0800 > > I have setup a SOLR searcher instance with Tomcat 5.5.21. However, the > response time is too long. Here is my scenario: > 1. The index file is 8.2G. The doc num is 6110745. > 2. DELL Server: Intel(R) Xeon(TM) CPU (4 cores) 3.00GHZ, 6G Mem. > > I used "Key:*" to query all records by localhost:8080. The response time is > 68703 milliseconds. The cpu load is 50% and mem usage is over 400M. > > Any comments are welcomed. > > >
The search response time is too long
I have set up a SOLR searcher instance with Tomcat 5.5.21. However, the response time is too long. Here is my scenario: 1. The index file is 8.2G. The doc num is 6110745. 2. DELL Server: Intel(R) Xeon(TM) CPU (4 cores) 3.00GHZ, 6G Mem. I used "Key:*" to query all records by localhost:8080. The response time is 68703 milliseconds. The cpu load is 50% and mem usage is over 400M. Any comments are welcomed.
spellcheck on multiple fields?
Is it possible to do spellcheck on multiple fields in my solr index? If so, how? The following setup works for only one field: default solr.IndexBasedSpellChecker myfield ./spellchecker1 0.5 true Thanks.