Re: No group by? looking for an alternative.
Thanks for your response. Unfortunately, I don't think it'll be enough. I have many products other than shoes in my index, with many other facet fields. I simplified my schema: in reality the facets are dynamic fields. -- View this message in context: http://lucene.472066.n3.nabble.com/No-group-by-looking-for-an-alternative-tp1022738p1025256.html Sent from the Solr - User mailing list archive at Nabble.com.
Multiple Facet Dates
Hi, I saw this post: http://lucene.472066.n3.nabble.com/Multiple-Facet-Dates-td495480.html I didn't see any work in progress or plans for this feature on the list or in the bug tracker. Has someone already created a patch, proof of concept, etc. that I wasn't able to find? From my naïve point of view, the usefulness-to-added-complexity ratio seems high. My use case is to provide, in one request:
- the result count for each of several years (with tag-based exclusion)
- the result count for each month of a given year
- the result count for each day of a given month and year
I'm pretty sure someone here has already run into this, hasn't anyone?
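Until multi-level date facets are supported natively, one workaround is to issue one request per drill-down level. A rough Python sketch of computing the facet.date parameters for each level (the field name and the overall year range here are made-up placeholders):

```python
import datetime

def facet_window(field, year=None, month=None):
    """Date-facet params for one drill-down level: years, then months of a
    chosen year, then days of a chosen month. One request per level is the
    workaround while multi-level date facets are not supported in one call."""
    fmt = "%Y-%m-%dT00:00:00Z"
    if year is None:
        # no year chosen yet: facet per year over an assumed fixed range
        start, end, gap = datetime.date(2005, 1, 1), datetime.date(2011, 1, 1), "+1YEAR"
    elif month is None:
        start, end, gap = datetime.date(year, 1, 1), datetime.date(year + 1, 1, 1), "+1MONTH"
    else:
        start = datetime.date(year, month, 1)
        end = datetime.date(year + (month == 12), month % 12 + 1, 1)
        gap = "+1DAY"
    return {"facet": "true", "facet.date": field,
            "facet.date.start": start.strftime(fmt),
            "facet.date.end": end.strftime(fmt),
            "facet.date.gap": gap}
```

Each returned dict is appended to the main query's parameters; tag-based exclusion of the year/month filters would still be applied on the fq side.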
RE: Indexing fieldvalues with dashes and spaces
@Michael, @Erick, You both mention interesting things that got me thinking.

@Erick: The page you referenced is very useful. It seems the whitespace tokenizer under text_ws is causing the issues. You also mention another interesting thing: "And do be aware that fields you get back from a request (i.e. a search) are the stored fields, NOT what's indexed." On the page you provided I see this under the Analyzers section: "Analyzers are components that pre-process input text at index time and/or at search time." So I don't completely understand how that sentence is in line with your comment.

@Michael: You say: "use the tokenized field to return results, but have a duplicate field of fieldtype=string to show the untokenized results. E.g. facet on that field." I think your comment applies to my requirement: a city field is something that I want users to search via text input, so let's say "New Yo" would give the results for "New York". But there is also a Cities facet in which "New York" is just one of the clickable cities. The other facet is theme, which in my example holds values like "Gemeentehuis" and "Strand Zee"; those would not be searched via manual input but ARE clickable.

Could you please indicate (just for the above fields) what needs to be changed in my schema.xml, and if so, how that affects the way my request is built up? Thanks so much for getting me started! This is my schema.xml:

<?xml version="1.0" encoding="UTF-8"?>
<schema name="db" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
      </analyzer>
    </fieldType>
    <fieldtype name="ignored" stored="false"
Re: how to take a value from the query result
You should parse the XML and extract the value. Lots of libraries undoubtedly exist for PHP to help you with that (I don't know PHP). Moreover, if all you want from the result is AUC_CAT, you should consider using the fl= parameter, like:

http://172.16.17.126:8983/search/select/?q=AUC_ID:607136&fl=AUC_CAT

to return a document of the form:

<doc>
  <int name="AUC_CAT">576</int>
</doc>

which is more efficient. You still have to parse the doc as XML, though.

2010/8/5 twojah e...@tokobagus.com:
this is my query in the browser navigation toolbar
http://172.16.17.126:8983/search/select/?q=AUC_ID:607136
and this is the result in the browser page:
...
<doc>
  <int name="AP_AUC_PHOTO_AVAIL">1</int>
  <double name="AUC_AD_PRICE">1.0</double>
  <int name="AUC_CAT">576</int>
  <int name="AUC_CLIENT_ID">27017</int>
  <str name="AUC_DESCR_SHORT">Bracket Ceiling untuk semua merk projector, panjang 60-90 cm Bahan Besi Cat Hitam = 325rb Bahan Sta</str>
  <str name="AUC_HTML_DIR_NL">/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html</str>
  <int name="AUC_ID">607136</int>
  <str name="AUC_ISNEGO">Nego</str>
  <int name="AUC_LOCATION">7</int>
  <str name="AUC_PHOTO">270/27017/bracket_lcd_plasma_3a-1274291780.JPG</str>
  <str name="AUC_START">2010-05-19 17:56:45</str>
  <str name="AUC_TITLE">[UPDATE] BRACKET Projector dan LCD/PLASMA TV</str>
  <int name="AUC_TYPE">21</int>
  <int name="PRO_BACKGROUND">0</int>
  <int name="PRO_BOLD">0</int>
  <int name="PRO_COLOR">0</int>
  <int name="PRO_GALLERY">0</int>
  <int name="PRO_LINK">0</int>
  <int name="PRO_SPONSOR">0</int>
  <int name="cat_id_sub">0</int>
  <int name="sectioncode">28</int>
</doc>
I want to get the AUC_CAT value (576) and use it in my PHP; how can I get that value? Please help. Thanks.
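Any XML library can do the extraction (SimpleXML in PHP, for instance). A minimal sketch of the idea in Python, with a made-up sample response in the same shape:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <int name="AUC_CAT">576</int>
      <int name="AUC_ID">607136</int>
    </doc>
  </result>
</response>"""

def field_value(response_xml, name):
    """Return the text of the first field in the first <doc> whose name
    attribute matches, or None when the field is absent."""
    root = ET.fromstring(response_xml)
    node = root.find(".//doc/*[@name='%s']" % name)
    return None if node is None else node.text
```

field_value(SAMPLE, "AUC_CAT") gives the string "576"; cast it to an integer in your own code as needed.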
Re: DIH and Cassandra
On Thu, Aug 5, 2010 at 3:07 AM, Dennis Gearon gear...@sbcglobal.net wrote: If data is stored in the index, isn't the index of Solr pretty much already a 'Big/Cassandra Table', except with tokenized columns to make searching easier? How are Cassandra/Big/Couch DBs doing text/weighted searching? It seems a real duplication to use Cassandra AND Solr. OTOH, I don't know how many 'tables'/indexes one can make using Solr; I'm still a newbie. I don't think Mark wants to duplicate Solr's functionality through Cassandra. He is just asking if he can use DIH to import data from Cassandra into Solr. -- Regards, Shalin Shekhar Mangar.
Re: Load cores without restarting/reloading Solr
Can someone please answer this: is there a way to create/add a core and start it without having to reload Solr?
RE: Re: Load cores without restarting/reloading Solr
http://wiki.apache.org/solr/CoreAdmin -Original message- From: Karthik K karthikkato...@gmail.com Sent: Thu 05-08-2010 12:00 To: solr-user@lucene.apache.org; Subject: Re: Load cores without restarting/reloading Solr Can some one please answer this. Is there a way of creating/adding a core and starting it without having to reload Solr ?
Re: Auto suggest with spell check
Given below are the steps for auto-suggest and spellcheck in a single query. Make this change to the TermsComponent part of solrconfig.xml:

<searchComponent name="termsComponent" class="org.apache.solr.handler.component.TermsComponent"/>

<requestHandler name="/terms" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
    <str>spellcheck</str> <!-- Added for using spellcheck with the terms component -->
  </arr>
</requestHandler>

Use the query format given below for getting auto-suggest and spellcheck suggestions:

http://localhost:8983/solr/terms?terms.fl=text&terms.prefix=computr&spellcheck.q=computr&spellcheck=true
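A small helper for building that request URL from the client side (a Python sketch; the base URL and field name are placeholders for your own setup):

```python
from urllib.parse import urlencode

def suggest_url(base, field, user_input):
    """Combine TermsComponent auto-suggest and spellcheck params in one request."""
    params = {
        "terms.fl": field,          # field to pull term suggestions from
        "terms.prefix": user_input, # prefix typed so far
        "spellcheck.q": user_input, # same text, also spell-checked
        "spellcheck": "true",
    }
    return "%s/terms?%s" % (base.rstrip("/"), urlencode(params))
```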
Re: DIH and Cassandra
That is not 100% true. I would think RDBMS and XML would be the most common importers, but the real flexibility is with the TikaEntityProcessor [1] that comes with DIH ... http://wiki.apache.org/solr/TikaEntityProcessor I'm pretty sure it would be able to handle any type of serde (in the case of Cassandra I believe it is Thrift) on its own with the dependency libraries. I find the TEP to be underutilized sometimes; I think it's because the DIH docs lack info on what it can do. [1] - http://tika.apache.org - Jon

On Aug 4, 2010, at 3:00 PM, Andrei Savu wrote: DIH only works with relational databases and XML files [1]; you need to write custom code in order to index data from Cassandra. It should be pretty easy to map documents from Cassandra to Solr. There are a lot of client libraries available [2] for Cassandra. [1] http://wiki.apache.org/solr/DataImportHandler [2] http://wiki.apache.org/cassandra/ClientOptions

On Wed, Aug 4, 2010 at 6:41 PM, Mark static.void@gmail.com wrote: Is it possible to use DIH with Cassandra, either out of the box or with something more custom? Thanks -- Indekspot -- http://www.indekspot.com -- Managed Hosting for Apache Solr
Using solr response json
Hi, I want to query Solr and convert my response object to a JSON string using SolrJ. When I query from my browser (with wt=json) I get the following result:

{"responseHeader":{"status":0,"QTime":0},"response":{"numFound":0,"start":0,"docs":[]}}

At the moment I am using google-gson (a third-party API) to directly convert an object into a JSON string, but when I try converting a QueryResponse object into a JSON string I get:

{"_header":{"nvPairs":["status",0,"QTime",1]},"_results":[],"elapsedTime":121,"response":{"nvPairs":["responseHeader",{"nvPairs":["status",0,"QTime",1]},"response",[]]}}

Any pointers? Regards, Raakhi.
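That odd output is gson serializing SolrJ's internal NamedList fields. One option is to request wt=json and parse the raw body yourself instead of serializing the QueryResponse. A Python sketch of the idea (in SolrJ you would similarly walk getResults() and serialize plain maps/lists):

```python
import json

def summarize(body):
    """Pull the useful parts out of a raw wt=json Solr response body."""
    data = json.loads(body)
    return {
        "status": data["responseHeader"]["status"],
        "numFound": data["response"]["numFound"],
        "docs": data["response"]["docs"],
    }
```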
Process entire result set
Hi everybody, I would like to know whether it makes sense to use Solr in the following scenario: - searches over a large amount of data (like 1,000, 10,000, or 100,000 records) - each record contains four or five fields (strings and integers) - every request needs the entire result set (I could paginate the results, but it would be much better to get everything at once) - we need to process the entire set in order to decide which records will be returned - this kind of request will happen frequently, from several machines (several transactions per second) - the Solr machines and the requesting machines will be in the same cluster - we would like to get the entire result set in less than 500 ms. Thanks in advance, Eloi
Re: Load cores without restarting/reloading Solr
On 8/5/10 5:59 AM, Karthik K wrote: Can some one please answer this. Is there a way of creating/adding a core and starting it without having to reload Solr ? Yes, see http://wiki.apache.org/solr/CoreAdmin - Mark lucidimagination.com
word delimiter
I have UPPER12-lower and would like to be able to find it with the queries UPPER or lower. What should break this up for the index? A tokenizer, or a filter such as WordDelimiterFilterFactory? I have tried various combinations of parameters to WordDelimiterFilterFactory and can't get it to split properly. Here are the results from using the standard tokenizer followed directly by the WordDelimiterFilterFactory markup below (from analysis.jsp):

position 1     | position 2
UPPER12-lower  | lower
UPPER          |
12             |

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
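For reference, here is a rough Python model of the word parts that generateWordParts=1 with splitOnCaseChange=1 is expected to emit; this only mimics the splitting rules, it is not Solr's implementation:

```python
import re

def word_parts(token):
    """Split on non-alphanumerics, case changes, and letter/digit boundaries,
    in the spirit of WordDelimiterFilterFactory's generateWordParts and
    splitOnCaseChange options."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+|[0-9]+", token)
```

With those parts indexed (plus preserveOriginal keeping the full token), queries for UPPER or lower should match once both sides are lowercased by the analyzer chain.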
Re: No group by? looking for an alternative.
I've got only one document per shoe, whatever its size or color. My first try was to create one document per model/size/color, but when I search for 'converse', for example, the same shoe is retrieved several times, and I want to show only one record for each model. But I haven't succeeded in grouping results by shoe model. Look at http://www.amazon.com/s/ref=nb_sb_noss?url=node%3D679255011field-keywords=Converse+All+Star+Leather+Hi+Chuck+Taylor+x=0y=0ih=1_0_0_0_0_0_0_0_0_0.4136_1fsc=-1 amazon for "Converse All Star Leather Hi Chuck Taylor". They show the shoe only one time, but if you go to the product details, it exists in several colors and sizes. Now if you filter on color, there are fewer sizes available.
get-colt
Hi - I am trying to compile the Solr source, and during the ant dist step the build times out on get-colt:

[get] Getting: http://repo1.maven.org/maven2/colt/colt/1.2.0/colt-1.2.0.jar
[get] To: /opt/solr/apache-solr-1.4.0/contrib/clustering/lib/downloads/colt-1.2.0.jar

After a while the step fails with the following message:

BUILD FAILED
/opt/solr/apache-solr-1.4.0/common-build.xml:356: The following error occurred while executing this line:
/opt/solr/apache-solr-1.4.0/common-build.xml:219: The following error occurred while executing this line:
/opt/solr/apache-solr-1.4.0/contrib/clustering/build.xml:79: java.net.ConnectException: Connection timed out

Any help is greatly appreciated. Sai Thumuluri
RE: get-colt
This is the message I am getting Error getting http://repo1.maven.org/maven2/colt/colt/1.2.0/colt-1.2.0.jar -Original Message- From: sai.thumul...@verizonwireless.com [mailto:sai.thumul...@verizonwireless.com] Sent: Thursday, August 05, 2010 1:15 PM To: solr-user@lucene.apache.org Subject: get-colt Hi - I am trying to compile Solr source and during ant dist step, the build times out on get-colt: [get] Getting: http://repo1.maven.org/maven2/colt/colt/1.2.0/colt-1.2.0.jar [get] To: /opt/solr/apache-solr-1.4.0/contrib/clustering/lib/downloads/colt-1.2.0. jar After a while - the steps fails giving the following message BUILD FAILED /opt/solr/apache-solr-1.4.0/common-build.xml:356: The following error occurred while executing this line: /opt/solr/apache-solr-1.4.0/common-build.xml:219: The following error occurred while executing this line: /opt/solr/apache-solr-1.4.0/contrib/clustering/build.xml:79: java.net.ConnectException: Connection timed out Any help is greatly appreciated? Sai Thumuluri
Re: question about relevance
Thank you for all the help, greatly appreciated. I have seen the related issues, and I see a lot of patches in the JIRA issues mentioned. I am really confused about which patch to use (please excuse my ignorance). Also, are the patches production ready? I will greatly appreciate it if you can point me to the correct patch, or is it that I have to apply all the patches to make it work? Can I apply the patch to Solr 1.3? Thanks Bharat Jain

On Sat, Jul 31, 2010 at 2:16 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: May I suggest looking at some of the related issues, say SOLR-1682. This issue is related to: SOLR-1682 Implement CollapseComponent SOLR-1311 pseudo-field-collapsing LUCENE-1421 Ability to group search results by field SOLR-1773 Field Collapsing (lightweight version) SOLR-237 Field collapsing Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: Bharat Jain bharat.j...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, July 30, 2010 10:40:19 AM Subject: Re: question about relevance Hi, Thanks a lot for the info and your time. I think field collapse will work for us. I looked at https://issues.apache.org/jira/browse/SOLR-236 but which file should I use for the patch? We use Solr 1.3. Thanks Bharat Jain

On Fri, Jul 30, 2010 at 12:53 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : 1. There are user records of type A, B, C etc. (userId field in index is : common to all records) : 2. A user can have any number of A, B, C etc (e.g. think of A being a : language then user can know many languages like french, english, german etc) : 3. Records are currently stored as a document in index. : 4. A given query can match multiple records for the user : 5. If for a user more records are matched (e.g. if he knows both french and : german) then he is more relevant and should come top in UI. This is the : reason I wanted to add lucene scores assuming the greater score means more : relevance.
if your goal is to get back users from each search, then you should probably change your indexing strategy so that each user has a single document -- fields like language can be multivalued, etc... then a search for language:en language:fr will return users who speak english or french, and the ones that speak both will score higher. if you really can't change the index structure, then essentially what you are looking for is a field collapsing solution on the userId field, where you want each collapsed group to get a cumulative score. i don't know if the existing field collapsing patches support this -- if you are already willing/capable of doing it in the client then that may be the simplest thing to support moving forward. Adding the scores is certainly one metric you could use -- it's generally suspicious to try and imply too much meaning to scores in lucene/solr, but that's because people typically try to imply broader absolute meaning. in the case of a single query the scores are relative to each other, and adding up all the scores for a given userId is approximately what would happen in my example above -- except that there is also a coord factor that would penalize documents that only match one clause ... it's complicated, but as an approximation adding the scores might give you what you are looking for -- only you can know for sure based on your specific data. -Hoss
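Hoss's "add up the scores per userId in the client" approximation is simple to prototype. A hedged Python sketch (the userId/score field names are assumptions about your result shape):

```python
from collections import defaultdict

def collapse_by_user(hits):
    """Client-side collapse: cumulative score per userId, best user first.

    `hits` is the flat result list, one dict per matching record. The sum is
    only an approximation of a true grouped score (coord factors etc. differ),
    as noted above.
    """
    totals = defaultdict(float)
    for hit in hits:
        totals[hit["userId"]] += hit["score"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```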
anti-words - exact match
Hi, We have a requirement to NOT display search results if the user query contains terms that are in our anti-words field. For example, if the user query is "I have swollen foot" and some records in our index have "swollen foot" in the anti-words field, we don't want to display those records. How do I go about implementing this? NOTE 1: the anti-words field can contain multiple values. Each value can be one or multiple words (e.g. "swollen foot", "headache", etc.) NOTE 2: the match must be exact. If the anti-words field contains "swollen foot" and the user query is "I have swollen foot", the record must be excluded. If the user query is "My foot is swollen", the record should not be excluded. Any pointers are greatly appreciated! Thanks, Satish
Re: Index compatibility 1.4 Vs 3.1 Trunk
Hello Mr. Hostetter, I again tried the code from trunk 'https://svn.apache.org/repos/asf/lucene/dev/trunk' on a Solr 1.4 index, and it gave me the following IndexFormatTooOldException, which in the first place prompted me to think the indexes are incompatible. Any ideas?

java.lang.RuntimeException: org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported in file '_1d60.fdx': 1 (needs to be between 2 and 2). This version of Lucene only supports indexes created with release 3.0 and later.
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1067)
at org.apache.solr.core.SolrCore.init(SolrCore.java:582)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:453)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:308)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:198)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:123)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:273)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:385)
at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:119)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4529)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:5348)
at com.sun.enterprise.web.WebModule.start(WebModule.java:353)
at com.sun.enterprise.web.LifecycleStarter.doRun(LifecycleStarter.java:58)
at com.sun.appserv.management.util.misc.RunnableBase.runSync(RunnableBase.java:304)
at com.sun.appserv.management.util.misc.RunnableBase.run(RunnableBase.java:341)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported in file '_1d60.fdx': 1 (needs to be between 2 and 2). This version of Lucene only supports indexes created with release 3.0 and later.
at org.apache.lucene.index.FieldsReader.init(FieldsReader.java:109)
at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:242)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:523)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:494)
at org.apache.lucene.index.DirectoryReader.init(DirectoryReader.java:133)
at org.apache.lucene.index.ReadOnlyDirectoryReader.init(ReadOnlyDirectoryReader.java:28)
at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:98)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:92)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:415)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:294)
at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1056)
... 21 more

Ravi Kiran Bhaskar

On Tue, Aug 3, 2010 at 11:15 AM, Ravi Kiran ravi.bhas...@gmail.com wrote: Hello Mr. Hostetter, Thank you very much for the clarification. I do remember that when I first deployed the Solr code from trunk on a test server I couldn't open the index (created via 1.4) even via the Solr admin page; it kept giving me a corrupted-index EOF kind of exception, so I was curious. Let me try it out again and report to you with the exact error.
On Mon, Aug 2, 2010 at 4:28 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I am trying to use the solr code from ' : https://svn.apache.org/repos/asf/lucene/dev/trunk' as my design warrants use : of PolyType fields. My understanding is that the indexes are incompatible, : am I right ?. I have about a million docs in my index (indexed via solr : 1.4). Is re-indexing my only option or is there a tool of some sort to : convert the 1.4 index to 3.1 format ? a) the trunk is what will ultimately be Solr 4.x, not 3.x ... for the 3.x line there is a 3x branch... http://wiki.apache.org/solr/Solr3.1 http://wiki.apache.org/solr/Solr4.0 b) The 3x branch can read indexes created by Solr 1.4 -- the first time you add a doc and commit, the new segments will automatically be converted to the new format. I am fairly certain that, as of this moment, the 4x trunk can also read indexes created by Solr 1.4, with the same automatic conversion taking place. c) If/When the trunk can no longer read Solr 1.4 indexes, there will be a tool provided
RE: get-colt
Got it working - had to manually copy the jar files under the contrib directories. -Original Message- From: sai.thumul...@verizonwireless.com [mailto:sai.thumul...@verizonwireless.com] Sent: Thursday, August 05, 2010 2:00 PM To: solr-user@lucene.apache.org Subject: RE: get-colt This is the message I am getting Error getting http://repo1.maven.org/maven2/colt/colt/1.2.0/colt-1.2.0.jar
Re: get-colt
(10/08/06 2:14), sai.thumul...@verizonwireless.com wrote: Hi - I am trying to compile Solr source and during ant dist step, the build times out on get-colt: [get] Getting: http://repo1.maven.org/maven2/colt/colt/1.2.0/colt-1.2.0.jar [get] To: /opt/solr/apache-solr-1.4.0/contrib/clustering/lib/downloads/colt-1.2.0. jar After a while - the steps fails giving the following message BUILD FAILED /opt/solr/apache-solr-1.4.0/common-build.xml:356: The following error occurred while executing this line: /opt/solr/apache-solr-1.4.0/common-build.xml:219: The following error occurred while executing this line: /opt/solr/apache-solr-1.4.0/contrib/clustering/build.xml:79: java.net.ConnectException: Connection timed out Any help is greatly appreciated? Sai Thumuluri Sai, If there is a proxy in your environment, specify the proxy host and port (and optionally user and password): $ ant dist -Dproxy.home=HOST -Dproxy.port=PORT -Dproxy.user=USER -Dproxy.password=PASSWORD Koji -- http://www.rondhuit.com/en/
Re: No group by? looking for an alternative.
If I understand correctly: 1. products have different product variants (in the case of shoes, a combination of color and size + some other fields). 2. Each product is shown once in the result set (so no multiple product variants of the same product are shown). This would solve that, IMO:
1. create 1 document per product (so not a document per product variant)
2. create a multivalued field on which to facet, containing all combinations of: size - color - any other field - yet another field
3. make sure to include combinations in which the user is indifferent to a particular filter, i.e.: don't care about size (dc) + red -- dc-red
4. filtering on that combination would give you all the products that satisfy the product-variant constraints (size, color, etc.) + the extra product constraints ('converse')
5. on the detail page show all available product variants, not filtered by the constraints specified. This would likely be something outside of Solr (a simple SQL select on a single product).
hope that helps, Geert-Jan

2010/8/5 Mickael Magniez mickaelmagn...@gmail.com I've got only one document per shoes, whatever its size or color. My first try was to create one document per model/size/color, but when i searche for 'converse' for example, the same shoe is retrieved several times, and i want to show only one record for each model. But I don't succeed in grouping results by shoe model. If you look at http://www.amazon.com/s/ref=nb_sb_noss?url=node%3D679255011field-keywords=Converse+All+Star+Leather+Hi+Chuck+Taylor+x=0y=0ih=1_0_0_0_0_0_0_0_0_0.4136_1fsc=-1 amazon for Converse All Star Leather Hi Chuck Taylor . They show the shoe only one time, but if you go on the product details, its exists in several colors and sizes. Now if you filter or color, there is less sizes available.
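Step 3's "don't care" combinations can be generated at index time. A Python sketch for two axes (the axis names and the "dc" token are illustrative; extend the axis list to match the real schema):

```python
from itertools import product

DONT_CARE = "dc"  # stands in for "user is indifferent about this axis"

def variant_combos(variant, axes=("size", "color")):
    """All facet values for one product variant, including the dc wildcard on
    each axis, so a filter like 'dc-red' matches red shoes in any size."""
    options = [(str(variant[axis]), DONT_CARE) for axis in axes]
    return {"-".join(values) for values in product(*options)}
```

Each product document gets the union of these sets over all its variants in the multivalued facet field.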
Re: No group by? looking for an alternative.
Mickael Magniez wrote: Thanks for your response. Unfortunately, I don't think it'll be enough. In fact, I have many other products than shoes in my index, with many other facets fields. I simplified my schema : in reality facets are dynamic fields.

You could change the way you do indexing, so every product-color-size combo is its own document.

Document1:
  product: running shoe
  size: 12
  color: red

Document2:
  product: running shoe
  size: 13
  color: red

That would let you do the kind of faceting drill-down you want to do. It would of course make other things more complicated. But it's the only way I can think of to let you do the kind of facet drill-down you want, if I understand what you want correctly, which I may not. Jonathan
Re: anti-words - exact match
This is tricky. You could try doing something with the ShingleFilter (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory) at _query time_ to turn the user's query "i have a swollen foot" into: "i", "i have", "i have a", "i have a swollen", "have", "have a", "have a swollen"... etc. I _think_ you can get the ShingleFilter factory to do that. But now you only want to exclude if one of those shingles matches the ENTIRE anti-word. So maybe index as non-tokenized, so each of those shingles will only match on the complete thing. You'd want to normalize spacing and punctuation. But then you need to turn that into a _negated_ element of your query. Perhaps by using an fq with a NOT/- in it? And a query which 'matches' (causing 'not' behavior) if _any_ of the shingles match. I have no idea if it's actually possible to put these things together in that way. A non-tokenized field? Which still has its queries shingle-ized at query time? And then works as a negated query, matching for negation if any of the shingles match? Not really sure how to put that together in your solrconfig.xml and/or application logic if needed. You could try. Another option would be doing the query-time 'shingling' in your app, and then it's a somewhat more normal Solr query: fq= -"shingle one" -"shingle two" -"shingle three" etc. Or put them in separate fq's, depending on how you want to use your filter cache. Still searching on a non-tokenized field, and still normalizing whitespace and punctuation at both index time and (using the same normalization logic, but in your application logic this time) query time. I think that might work. So I'm not really sure, but maybe that gives you some ideas. Jonathan
For example, if the user query is I have swollen foot and if some records in our index have swollen foot in the anti-words field, we don't want to display those records. How do I go about implementing this? NOTE 1: the anti-words field can contain multiple values. Each value can be one or multiple words (e.g. swollen foot, headache, etc.) NOTE 2: the match must be exact. If the anti-words field contains swollen foot and the user query is I have swollen foot, the record must be excluded. If the user query is My foot is swollen, the record should not be excluded. Any pointers are greatly appreciated! Thanks, Satish
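Jonathan's second option — doing the query-time shingling in the application and emitting negated fq clauses — can be sketched roughly as below. This is a guess at one workable shape, not a tested recipe; the field name anti_words and the normalization rules are assumptions:

```python
import re

def normalize(text):
    """Lower-case and collapse punctuation/whitespace, mirroring whatever
    normalization is applied to the non-tokenized anti-words field at
    index time."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())

def shingles(text, max_len=3):
    """All word shingles of the query up to max_len words long."""
    words = normalize(text).split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(words) - n + 1)]

def exclusion_fq(query):
    """One negated exact-match clause per shingle: a record is excluded
    if ANY shingle equals the entire anti-word value."""
    return " ".join('-anti_words:"%s"' % s for s in shingles(query))
```

Because the anti-words field is non-tokenized, each quoted clause only matches when a shingle equals the complete stored value, which gives the exact-match behavior Satish asked for.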
Re: Solr searching performance issues, using large documents
I've read through the DataImportHandler page a few times, and still can't figure out how to separate a large document into smaller documents. Any hints? :-) Thanks! -Peter On Aug 2, 2010, at 9:01 PM, Lance Norskog wrote: Spanning won't work- you would have to make overlapping mini-documents if you want to support this. I don't know how big the chunks should be- you'll have to experiment. Lance On Mon, Aug 2, 2010 at 10:01 AM, Peter Spam ps...@mac.com wrote: What would happen if the search query phrase spanned separate document chunks? Also, what would the optimal size of chunks be? Thanks! -Peter On Aug 1, 2010, at 7:21 PM, Lance Norskog wrote: Not that I know of. The DataImportHandler has the ability to create multiple documents from one input stream. It is possible to create a DIH file that reads large log files and splits each one into N documents, with the file name as a common field. The DIH wiki page tells you in general how to make a DIH file. http://wiki.apache.org/solr/DataImportHandler From this, you should be able to make a DIH file that puts log files in as separate documents. As to splitting files up into mini-documents, you might have to write a bit of Javascript to achieve this. There is no data structure or software that implements structured documents. On Sun, Aug 1, 2010 at 2:06 PM, Peter Spam ps...@mac.com wrote: Thanks for the pointer, Lance! Is there an example of this somewhere? -Peter On Jul 31, 2010, at 3:13 PM, Lance Norskog wrote: Ah! You're not just highlighting, you're snippetizing. This makes it easier. Highlighting does not stream- it pulls the entire stored contents into one string and then pulls out the snippet. If you want this to be fast, you have to split up the text into small pieces and only snippetize from the most relevant text. So, separate documents with a common group id for the document it came from. You might have to do 2 queries to achieve what you want, but the second query for the same query will be blindingly fast. 
Often 1ms. Good luck! Lance On Sat, Jul 31, 2010 at 1:12 PM, Peter Spam ps...@mac.com wrote: However, I do need to search the entire document, or else the highlighting will sometimes be blank :-( Thanks! - Peter ps. sorry for the many responses - I'm rushing around trying to get this working. On Jul 31, 2010, at 1:11 PM, Peter Spam wrote: Correction - it went from 17 seconds to 10 seconds - I was changing the hl.regex.maxAnalyzedChars the first time. Thanks! -Peter On Jul 31, 2010, at 1:06 PM, Peter Spam wrote: On Jul 30, 2010, at 1:16 PM, Peter Karich wrote: did you already try other values for hl.maxAnalyzedChars=2147483647 Yes, I tried dropping it down to 21, but it didn't have much of an impact (one search I just tried went from 17 seconds to 15.8 seconds, and this is an 8-core Mac Pro with 6GB RAM - 4GB for java). ? Also regular expression highlighting is more expensive, I think. What does the 'fuzzy' variable mean? If you use this to query via ~someTerm instead someTerm then you should try the trunk of solr which is a lot faster for fuzzy or other wildcard search. fuzzy could be set to * but isn't right now. Thanks for the tips, Peter - this has been very frustrating! - Peter Regards, Peter. Data set: About 4,000 log files (will eventually grow to millions). Average log file is 850k. Largest log file (so far) is about 70MB. Problem: When I search for common terms, the query time goes from under 2-3 seconds to about 60 seconds. TermVectors etc are enabled. When I disable highlighting, performance improves a lot, but is still slow for some queries (7 seconds). Thanks in advance for any ideas! 
-Peter

- 4GB RAM server
% java -Xms2048M -Xmx3072M -jar start.jar

- schema.xml changes:

<fieldType name="text_pl" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
  </analyzer>
</fieldType>
...
<field name="body" type="text_pl" indexed="true" stored="true" multiValued="false" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
<field name="version" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="filename" type="string" indexed="true" stored="true" multiValued="false"/>
<field
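Lance's suggestion of overlapping mini-documents with a common group id could be prototyped outside DIH with a small splitter like the one below. This is an illustrative sketch, not code from the thread; the chunk and overlap sizes are placeholders to experiment with, as Lance says:

```python
def chunk_log(text, filename, size=10000, overlap=500):
    """Split one large log file into overlapping chunks.

    Each chunk becomes its own document carrying the file name as a
    common group_id, so snippets can be traced back to the source file.
    The overlap means a phrase spanning a chunk boundary still appears
    whole in at least one chunk.
    """
    docs, start, part = [], 0, 0
    while start < len(text):
        docs.append({
            "id": "%s-%d" % (filename, part),
            "group_id": filename,
            "body": text[start:start + size],
        })
        start += size - overlap
        part += 1
    return docs
```

Highlighting then only ever snippetizes a small body field, and a second query on group_id recovers all chunks of the original file.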
Re: Process entire result set
Eloi Rocha wrote: Hi everybody, I would like to know if it makes sense to use Solr in the following scenario: - search for large amounts of data (like 1000, 1, 10 registers) - each register contains four or five fields (strings and integers) - every time will request the entire result set (I can paginate the results). It would be much better to get all results at once [...] Depends on what kinds of searching you're doing. Are you doing searching that needs an indexer like Solr? Then Solr is a good tool for your job. Are you not, and you can do what you want just as easily in an rdbms or non-sql store like MongoDB? Then I wouldn't use Solr. Assuming you really do need Solr, I think this should work, but I would not store the actual stored fields in Solr; I'd store those fields in an external store (key-value store, rdbms, whatever). You store only what you need to index in Solr, you do your search, you get IDs back. You ask for the entire result set back, why not. If you give Solr enough RAM, and set your cache settings appropriately (really big document and related caches), then I _think_ it should perform okay. One way to find out. What you'd get back is just IDs, then you'd look up each ID in your external store to get the actual fields you want to operate on. _May_ not be necessary, maybe you could do it with Solr stored fields, but making Solr do only exactly what you really need from it (an index) will maximize its ability to do what you need in available RAM. If you don't need Solr/Lucene indexing/faceting behavior, and you can do just fine with an rdbms or non-sql store, use that. Jonathan
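The IDs-only-then-hydrate pattern Jonathan describes looks roughly like this. A plain dict stands in for the external store, and solr_search is a placeholder for a real client call (a real one would query /select with fl=id and a large rows value):

```python
def solr_search(query, rows):
    # Placeholder for a real Solr request returning only document IDs.
    return ["doc1", "doc2", "doc3"]

def fetch_all(query, store, rows=100000):
    """Search Solr for IDs only, then hydrate full records from an
    external store (rdbms, key-value store, whatever)."""
    ids = solr_search(query, rows)
    return [store[i] for i in ids if i in store]

store = {"doc1": {"title": "a"}, "doc2": {"title": "b"}, "doc3": {"title": "c"}}
results = fetch_all("field:value", store)
```

Keeping stored fields out of Solr keeps the index small, which is exactly the "make Solr do only what you need from it" point above.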
Re: Sharing index files between multiple JVMs and replication
Oh yes, replication will not work for shared files. It is about making your own copy from another machine. There is no read-only option but there should be. The files and directory can be read-only, I've done it. You could use the OS permission system to enforce read-only. Then you can just do a commit against the read-only instances, and this will reload the index without changing it. Lance On Wed, Aug 4, 2010 at 10:42 AM, Kelly Taylor wired...@yahoo.com wrote: Is anybody else encountering these same issues; IF having a similar setup? And is there a way to configure certain Solr web-apps as read-only (basically dummy instances) so that index changes are not allowed? - Original Message From: Kelly Taylor wired...@yahoo.com To: solr-user@lucene.apache.org Sent: Tue, August 3, 2010 5:48:11 PM Subject: Re: Sharing index files between multiple JVMs and replication Yes, they are on a common file server, and I've been sharing the same index directory between the Solr JVMs. But I seem to be hitting a wall when attempting to use just one instance for changing the index. With Solr replication disabled, I stream updates to the one instance, and this process hangs whenever there are additional Solr JVMs started up with the same configuration in solrconfig.xml - So I then tried, to no avail, using a different configuration, solrconfig-readonly.xml where the updateHandler was commented out, all /update* requestHandlers removed, mainIndex locktype of none, etc. And with Solr replication enabled, the Slave seems to hang, or at least report unusually long time estimates for the current running replication process to complete. -Kelly - Original Message From: Lance Norskog goks...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, August 3, 2010 4:56:58 PM Subject: Re: Sharing index files between multiple JVMs and replication Are these files on a common file server?
If you want to share them that way, it actually does work just to give them all the same index directory, as long as only one of them changes it. On Tue, Aug 3, 2010 at 4:38 PM, Kelly Taylor wired...@yahoo.com wrote: Is there a way to share index files amongst my multiple Solr web-apps, by configuring only one of the JVMs as an indexer, and the remaining, as read-only searchers? I'd like to configure in such a way that on startup of the read-only searchers, missing cores/indexes are not created, and updates are not handled. If I can get around the files being locked by the read-only instances, I should be able to scale wider in a given environment, as well as have less replicated copies of my master index (Solr 1.4 Java Replication). Then once the commit is issued to the slave, I can fire off a RELOAD script for each of my read-only cores. -Kelly -- Lance Norskog goks...@gmail.com -- Lance Norskog goks...@gmail.com
Re: Support loading queries from external files in QuerySenderListener
You can use an XInclude in solrconfig.xml. Your external query file has to be in the XML format. Lance On Wed, Aug 4, 2010 at 7:57 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Aug 4, 2010 at 3:27 PM, Stanislaw solrgeschic...@googlemail.com wrote: Hi all! I can't load my custom queries from the external file, as written here: https://issues.apache.org/jira/browse/SOLR-784 This option seems to not be implemented in the current version 1.4.1 of Solr. Was it deleted, or does it come first with a new version? That patch was never committed so it is not available in any release. -- Regards, Shalin Shekhar Mangar. -- Lance Norskog goks...@gmail.com
Re: Index compatibility 1.4 Vs 3.1 Trunk
: Hello Mr. Horsetter, Please, call me Hoss. Mr. Horsetter is ... well frankly i have no idea who that is. : I again tried the code from trunk 'https://svn.apache.org/repos/asf/lucene/dev/trunk' on a solr 1.4 index and it Please note my previous comments... : a) the trunk is what will ultimately be Solr 4.x, not 3.x ... for the : 3.x line there is a 3x branch... : : http://wiki.apache.org/solr/Solr3.1 : http://wiki.apache.org/solr/Solr4.0 : : b) The 3x branch can read indexes created by Solr 1.4 -- the first time : you add a doc and commit, the new segments will automatically be converted to : the new format. I am fairly certain that as of this moment, the 4x trunk : can also read indexes created by Solr 1.4, with the same automatic : conversion taking place. ...apparently i was mistaken about trunk, which has already had the code for reading Lucene 2.9 indexes (what's used in Solr 1.4) removed (hence the IndexFormatTooOldException). But that doesn't change the fact that 3.1 will be able to read Solr 1.4 indexes. And 4.0 will be able to read 3.1 indexes. You should, in fact, be able to use the 3x branch code today to open your Solr 1.4 index, add one document to have it convert to a 3x index, then use the trunk code to open that index, add one document, and have it convert to a trunk index. Of course: there is no guarantee that the index format in the official 4.0 release will be the same as what's on trunk right now -- it hasn't been officially released. : c) If/When the trunk can no longer read Solr 1.4 indexes, there will be : a tool provided for upgrading index versions. That should still be true in the official 4.0 release (i really should have said When 4.0 can no longer read Solr 1.4 indexes) ... i haven't been following the details closely, but i suspect that tool hasn't been written yet because there isn't much point until the full details of the trunk index format are nailed down. -Hoss
Re: Indexing fieldvalues with dashes and spaces
This confuses lots of people. When you index a field, it's Analyzed 10 ways from Sunday. Consider The World is an unknown Entity. When you INDEX it, many things happen, depending upon the analyzer. Stopwords may be removed. Each token may be lower-cased. Each token may be stemmed. It all depends on what's in your analyzer chain. Assume a simple chain consisting of breaking up tokens on whitespace, lowercasing, and removing stopwords. The actual tokens INDEXED would be world, unknown, and entity. That is what is searched against. However, the string, unchanged, would be STORED if you specified it so. So when you asked for the field to be returned in a search result, you would get The World is an unknown Entity if you asked for the field to be returned as part of a search result that matched on, say, world. HTH Erick On Thu, Aug 5, 2010 at 4:31 AM, PeterKerk vettepa...@hotmail.com wrote: @Michael, @Erick, You both mention interesting things that triggered me. @Erick: Your referenced page is very useful. It seems the whitespace tokenizer under text_ws is causing issues. You do mention another interesting thing: And do be aware that fields you get back from a request (i.e. a search) are the stored fields, NOT what's indexed. On the page you provided I see this under the Analyzers section: Analyzers are components that pre-process input text at index time and/or at search time. So I don't completely understand how that sentence is in line with your comment. @Michael: You say: use the tokenized field to return results, but have a duplicate field of fieldtype=string to show the untokenized results. E.g. facet on that field. I think your comment applies to my requirement: a city field is something that I want users to search on via text input, so let's say New Yo would give the results for New York. But also a facet Cities is available in which New York is just one of the cities that is clickable.
The other facet is theme, which in my example holds values like Gemeentehuis and Strand Zee, that would not be a thing on which can be searched via manual input but IS clickable. Could you please indicate (just for the above fields) what needs to be changed in my schema.xml and if so how that affects the way my request is built up? Thanks so much ahead in getting me started! This is my schema.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="db" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter
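Erick's indexed-vs-stored distinction above can be made concrete with a toy model. This is not Solr's actual analysis code; the stopword list and the chain (whitespace tokenize, lowercase, remove stopwords) are the simplified assumptions from his example:

```python
STOPWORDS = {"the", "is", "an", "a"}

def index_tokens(text):
    """Toy analyzer chain: whitespace tokenize -> lowercase -> stopwords.

    The result is what gets INDEXED, i.e. what searches match against.
    """
    return [t for t in (w.lower() for w in text.split())
            if t not in STOPWORDS]

original = "The World is an unknown Entity"
indexed = index_tokens(original)  # what queries are matched against
stored = original                 # what a search result returns, unchanged
```

A query for "world" matches one of the indexed tokens, yet the result hands back the stored value with its original casing and stopwords intact, which is exactly the gap that confused PeterKerk.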
Re: Index compatibility 1.4 Vs 3.1 Trunk
On Thu, Aug 5, 2010 at 9:07 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: That should still be true in the the official 4.0 release (i really should have said When 4.0 can no longer read SOlr 1.4 indexes), ... i havne't been following the detials closely, but i suspect that tool hasn't been writen yet because there isn't much point until the full details of the trunk index format are nailed down. This is news to me? File formats are back-compatible between major versions. Version X.N should be able to read indexes generated by any version after and including version X-1.0, but may-or-may-not be able to read indexes generated by version X-2.N. (And personally I think there is stuff in 2.x like modified-utf8 that i would object to adding support for with terms now as byte[]) -- Robert Muir rcm...@gmail.com
Re: XML Format
can somebody help me please -- View this message in context: http://lucene.472066.n3.nabble.com/XML-Format-tp1024608p1028456.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr searching performance issues, using large documents
You may have to write your own javascript to read in the giant field and split it up. On Thu, Aug 5, 2010 at 5:27 PM, Peter Spam ps...@mac.com wrote: I've read through the DataImportHandler page a few times, and still can't figure out how to separate a large document into smaller documents. Any hints? :-) Thanks! -Peter [...]
Re: No group by? looking for an alternative.
I can see how one document per model blows up when you have many options. But how many models of the shoe do they actually make? They can't possibly make 5000, one for every metadata combination. If you go with one document per model, you have to do a second search on that product ID to get all of the models. Field Collapsing is exactly for the 'many shoes for one product' problem, but it is not released, so the second search is what you want. On Thu, Aug 5, 2010 at 4:54 PM, Jonathan Rochkind rochk...@jhu.edu wrote: [...] -- Lance Norskog goks...@gmail.com
Query Result is not updated based on the new XML files
hi everyone, I run the query from the browser: http://172.16.17.126:8983/search/select/?q=AUC_CAT:978 the query is based on cat_978.xml which was produced by my PHP script and I got the correct result like this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
    <lst name="params">
      <str name="q.op">AND</str>
      <str name="fl">AUC_ID,AUC_CAT,AUC_DESCR_SHORT</str>
      <str name="start">0</str>
      <str name="q">AUC_CAT:978</str>
      <str name="rows">1000</str>
    </lst>
  </lst>
  <result name="response" numFound="1575" start="0">
    <doc>
      <int name="AUC_CAT">978</int>
      <str name="AUC_DESCR_SHORT">HP Compaq Presario V3700 Core 2 duo webcam wifi lan HD 160Gb DDR2 1Gb Tas original windows 7 ultimate</str>
      <int name="AUC_ID">618436123</int>
    </doc>
    <doc>
      <int name="AUC_CAT">978</int>
      <str name="AUC_DESCR_SHORT">HP Compaq Presario V3700 Core 2 duo webcam wifi lan HD 160Gb DDR2 1Gb Tas original windows 7 ultimate</str>
      <int name="AUC_ID">618436</int>
    </doc>
  </result>
</response>

now, I edit the AUC_ID field in cat_978.xml, I change 618436123 to 618436 (look at the bold letters above) and I refresh the browser but it doesn't update or reflect the changes I made how to make the query result updated exactly based on cat_978.xml changes? really need your help thanks before -- View this message in context: http://lucene.472066.n3.nabble.com/Query-Result-is-not-updated-based-on-the-new-XML-files-tp1028575p1028575.html Sent from the Solr - User mailing list archive at Nabble.com.