Re: Stopwords
Hi,

Not really, as the words don't exist in the corpus field. The way we have got around it in the past is to have another non-stopped field that is also searched on (in addition to the stopped field), with a boost to the score for matches. As a slight alternative you could do the above but choose the stopped or non-stopped field, depending on whether quotes are present, when your application builds the query.

Regards,

David Stuart
Axis12 Ltd | www.axistwelve.com

On 26 Jun 2014, at 10:33, Geert Van Huychem ge...@iframeworx.be wrote:

Hello,

We have the default Dutch stopwords implemented in our Solr instance, so words like 'de', 'het', 'ben' are filtered at index time. Is there a way to trick Solr into ignoring those stopwords at query time, when a user puts the search terms between quotes?

Best,

Geert Van Huychem
IT Services Applications Manager
ge...@iframeworx.be
Media ID CVBA | www.media-id.be
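A minimal sketch of the two-field approach described above; all field and type names here (body, body_exact, text_stopped, text_unstopped) are invented for illustration:

    <!-- schema.xml: index the same text twice, once without stopword removal -->
    <field name="body" type="text_stopped" indexed="true" stored="true"/>
    <field name="body_exact" type="text_unstopped" indexed="true" stored="false"/>
    <copyField source="body" dest="body_exact"/>

A dismax/edismax query can then search both fields and boost the non-stopped matches:

    q="de fiets"&defType=edismax&qf=body body_exact^2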
Solr CoreAdmin RELOAD + Properties
Hey,

In the Solr CoreAdmin CREATE action you have the ability to define arbitrary properties by defining property.[name]=value; this works well in both Solr 3.x and Solr 4.x. To change a property value on a core in Solr 3.x you could run the CREATE command again and this would overwrite the value. In Solr 4.x you get an error saying the core exists (which makes sense), but I can't see a way to update the property values via a URL without unloading and re-creating the core (which is not great, as this could cause an outage on the live system). I tried adding property.[name]=value as part of the RELOAD action but that was ignored.

Any ideas? If not I will create a patch for RELOAD to support this functionality.

Regards,

David Stuart
Axis12 Ltd | www.axistwelve.com
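A sketch of the calls described above; the core name and property are made up:

    http://localhost:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=core1&property.env=staging

    http://localhost:8983/solr/admin/cores?action=RELOAD&core=core1&property.env=production

The first works against an existing core in 3.x (overwriting the value) but errors in 4.x; the second is the attempt that RELOAD silently ignores.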
Re: Elevation and core create
Hi Erick,

Thanks for the response. On the wiki it states:

    config-file: Path to the file that defines query elevation. This file must
    exist in $instanceDir/conf/config-file or $dataDir/config-file. If the file
    exists in the /conf/ directory it will be loaded once at startup. If it
    exists in the data directory, it will be reloaded for each IndexReader.

which is the elevate.xml. So it looks like I will go down the custom coding route.

Regards,

David Stuart
Axis12 Ltd | www.axistwelve.com

On 2 Mar 2014, at 18:07, Erick Erickson erickerick...@gmail.com wrote:

Hmmm, you _ought_ to be able to specify a relative path in

    <str name="confFiles">solrconfig_slave.xml:solrconfig.xml,x.xml,y.xml</str>

But there's certainly the chance that this is hard-coded in the query elevation component, so I can't say that this'll work with assurance.

Best,
Erick

On Sun, Mar 2, 2014 at 6:14 AM, David Stuart d...@axistwelve.com wrote:

Hi, sorry for the cross-post, but I got no response in the dev group so assumed I posted in the wrong place.

I am using Solr 3.6 and am trying to automate the deployment of cores with a custom elevate file. It is proving to be difficult: while most of the files (schema, stopwords, etc.) support absolute paths, elevate seems to need to be in either a conf directory as a sibling to data, or in the data directory itself. I am able to achieve my goal by having a secondary process that places the file, but thought I would ask the group just in case I have missed the obvious. Should I move to Solr 4? Is it fixed there? I could also go down the route of extending the SolrCore create function to accept additional params and move the file into the defined data directory. Ideas?

Thanks for your help,

David Stuart
Axis12 Ltd | www.axistwelve.com
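For context, the component that reads config-file is declared in solrconfig.xml along these lines (a stock example, not taken from this thread):

    <searchComponent name="elevator" class="solr.QueryElevationComponent">
      <str name="queryFieldType">string</str>
      <str name="config-file">elevate.xml</str>
    </searchComponent>

Per the wiki text quoted above, that config-file value is resolved only against $instanceDir/conf or $dataDir, which is what blocks the absolute-path approach the other config files allow.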
Elevation and core create
Hi, sorry for the cross-post, but I got no response in the dev group so assumed I posted in the wrong place.

I am using Solr 3.6 and am trying to automate the deployment of cores with a custom elevate file. It is proving to be difficult: while most of the files (schema, stopwords, etc.) support absolute paths, elevate seems to need to be in either a conf directory as a sibling to data, or in the data directory itself. I am able to achieve my goal by having a secondary process that places the file, but thought I would ask the group just in case I have missed the obvious. Should I move to Solr 4? Is it fixed there? I could also go down the route of extending the SolrCore create function to accept additional params and move the file into the defined data directory. Ideas?

Thanks for your help,

David Stuart
Axis12 Ltd | www.axistwelve.com
Re: Solr DataImportHandler (DIH) and Cassandra
This is good timing. I am about to embark on a spike, if anyone is keen to help out.

On 30 Nov 2010, at 00:37, Mark wrote:

The DataSource subclass route is what I will probably be interested in. Are there any working examples of this already out there?

On 11/29/10 12:32 PM, Aaron Morton wrote:

AFAIK there is nothing pre-written to pull the data out for you. You should be able to create your DataSource subclass
http://lucene.apache.org/solr/api/org/apache/solr/handler/dataimport/DataSource.html
using the Hector Java library to pull data from Cassandra.

I'm guessing you will need to consider how to perform delta imports. Perhaps using the secondary indexes in 0.7*, or maintaining your own queues or indexes to know what has changed.

There is also the Lucandra project, not exactly what you're after but may be of interest anyway: https://github.com/tjake/Lucandra

Hope that helps.
Aaron

On 30 Nov, 2010, at 05:04 AM, Mark static.void@gmail.com wrote:

Is there any way to use DIH to import from Cassandra? Thanks
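Since no working example seems to exist, here is an untested sketch of what such a DataSource subclass might look like. The Solr side follows the DataSource API linked above; the Hector calls and the host/keyspace attribute names are assumptions to verify against your Hector version:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.beans.Row;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.RangeSlicesQuery;

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.DataSource;

    public class CassandraDataSource
        extends DataSource<Iterator<Map<String, Object>>> {

      private Keyspace keyspace;

      @Override
      public void init(Context context, Properties initProps) {
        // host/keyspace come from the <dataSource> element in
        // data-config.xml; the attribute names are invented here.
        Cluster cluster = HFactory.getOrCreateCluster("dih-cluster",
            initProps.getProperty("host", "localhost:9160"));
        keyspace = HFactory.createKeyspace(
            initProps.getProperty("keyspace"), cluster);
      }

      @Override
      public Iterator<Map<String, Object>> getData(String query) {
        // Treat the entity's query attribute as a column family name and
        // read rows back: fine for a spike, not for millions of rows.
        RangeSlicesQuery<String, String, String> rsq =
            HFactory.createRangeSlicesQuery(keyspace,
                StringSerializer.get(), StringSerializer.get(),
                StringSerializer.get());
        rsq.setColumnFamily(query)
           .setKeys("", "")
           .setRange("", "", false, 1000);
        List<Map<String, Object>> docs = new ArrayList<Map<String, Object>>();
        for (Row<String, String, String> row : rsq.execute().get()) {
          Map<String, Object> doc = new HashMap<String, Object>();
          doc.put("id", row.getKey());
          for (HColumn<String, String> col :
               row.getColumnSlice().getColumns()) {
            doc.put(col.getName(), col.getValue());
          }
          docs.add(doc);
        }
        return docs.iterator();
      }

      @Override
      public void close() {
        // Hector pools connections per cluster; nothing to release here.
      }
    }

Delta imports would need more on top of this, as Aaron notes above.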
Re: Does Solr reload schema.xml dynamically?
If you are using Solr Multicore (http://wiki.apache.org/solr/CoreAdmin) you can issue a RELOAD command:

http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0

On 26 Oct 2010, at 11:09, Swapnonil Mukherjee wrote:

Hi Everybody,

If I change my schema.xml, do I have to restart Solr? Is there some way I can apply the changes to schema.xml without restarting Solr?

Swapnonil Mukherjee
Re: DataImportHandler dynamic fields clarification
Two things: one, are your DB columns uppercase? This would affect the output. Second, what does your db-data-config.xml look like?

Regards,

Dave

On 30 Sep 2010, at 03:01, harrysmith wrote:

Looking for some clarification on DIH to make sure I am interpreting this correctly.

I have a wide DB table, 100 columns. I'd rather not have to add 100 values in schema.xml and data-config.xml. I was under the impression that if the column name matched a dynamicField name, it would be added. I am not finding this is the case; it only works when the column name is explicitly listed as a static field.

Example: 100 column table, columns named 'COLUMN_1, COLUMN_2 ... COLUMN_100'.

If I add something like:

    <field name="column_60" type="string" indexed="true" stored="true"/>

to schema.xml, and don't reference the column in a data-config entity/field tag, it gets imported, as expected. However, if I use:

    <dynamicField name="column_*" type="string" indexed="true" stored="true"/>

it does not get imported into Solr. I would expect it would. Is this the expected behavior?
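On the second point, an explicit mapping in db-data-config.xml is one way around a case mismatch between DB column labels and schema field names; a sketch using the column naming from the question (the table name is invented):

    <entity name="item" query="SELECT * FROM wide_table">
      <field column="COLUMN_60" name="column_60"/>
    </entity>

DIH matches returned column labels against field and dynamicField names, so an uppercase COLUMN_60 coming back from the DB may never land in a lowercase column_* pattern; an explicit mapping like this side-steps the question.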
Re: How to Combine Drupal solrconfig.xml with Nutch solrconfig.xml?
I would use the string version, as Drupal will probably populate it with a URL-like thing, something that may not validate as type url.

On 27 Jul 2010, at 04:00, Savannah Beckett wrote:

I am trying to merge the schema.xml from the solr/nutch setup with the one from the Drupal apachesolr module. I encounter a field that is not mergeable.

From drupal module:

    <field name="url" type="string" indexed="true" stored="true"/>

From solr/nutch setup:

    <field name="url" type="url" stored="true" indexed="true" required="true"/>

I am not sure if there is any more stuff like this that is not mergeable. Is there an easy way to deal with schema.xml? Thanks.

From: David Stuart david.stu...@progressivealliance.co.uk
To: solr-user@lucene.apache.org
Sent: Mon, July 26, 2010 1:46:58 PM
Subject: Re: How to Combine Drupal solrconfig.xml with Nutch solrconfig.xml?

Hi Savannah,

I have just answered this question over on drupal.org: http://drupal.org/node/811062 (responses number 5 and 11 will help you). On the solrconfig.xml side of things you will only really need Drupal's version. Although still in alpha, my Nutch module will help you out with integration: http://drupal.org/project/nutch

Regards,

David Stuart

On 26 Jul 2010, at 21:37, Savannah Beckett wrote:

I am using the Drupal ApacheSolr module to integrate Solr with Drupal. I already integrated Solr with Nutch: I moved Nutch's solrconfig.xml and schema.xml to Solr's example directory, and it works. I tried to append the Drupal ApacheSolr module's own solrconfig.xml and schema.xml into the same XML files, but I got the following error when I run java -jar start.jar:

Jul 26, 2010 1:18:31 PM org.apache.solr.common.SolrException log
SEVERE: Exception during parsing file: solrconfig.xml:org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:249)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
at org.apache.solr.core.Config.init(Config.java:110)
at org.apache.solr.core.SolrConfig.init(SolrConfig.java:130)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:134)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)

Why? Does solrconfig.xml allow two config sections? Does schema.xml allow two schema sections?

Thanks.
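The SAXParseException in the quoted trace answers the last question: an XML file may have only one root element, so the two files' contents have to be merged inside a single root rather than appended one after the other. A sketch of the shape (contents elided):

    <schema name="merged" version="1.2">
      <types> <!-- union of both files' fieldTypes --> </types>
      <fields>
        <!-- Drupal's fields, plus Nutch's fields,
             keeping url as type="string" per the advice above -->
      </fields>
    </schema>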
Re: How to Combine Drupal solrconfig.xml with Nutch solrconfig.xml?
Hi Savannah,

I have just answered this question over on drupal.org: http://drupal.org/node/811062 (responses number 5 and 11 will help you). On the solrconfig.xml side of things you will only really need Drupal's version. Although still in alpha, my Nutch module will help you out with integration: http://drupal.org/project/nutch

Regards,

David Stuart

On 26 Jul 2010, at 21:37, Savannah Beckett wrote:

I am using the Drupal ApacheSolr module to integrate Solr with Drupal. I already integrated Solr with Nutch: I moved Nutch's solrconfig.xml and schema.xml to Solr's example directory, and it works. I tried to append the Drupal ApacheSolr module's own solrconfig.xml and schema.xml into the same XML files, but I got the following error when I run java -jar start.jar:

Jul 26, 2010 1:18:31 PM org.apache.solr.common.SolrException log
SEVERE: Exception during parsing file: solrconfig.xml:org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:249)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
at org.apache.solr.core.Config.init(Config.java:110)
at org.apache.solr.core.SolrConfig.init(SolrConfig.java:130)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:134)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)

Why? Does solrconfig.xml allow two config sections? Does schema.xml allow two schema sections?

Thanks.
Re: Importing large datasets
On 3 Jun 2010, at 02:58, Dennis Gearon gear...@sbcglobal.net wrote:

When adding data continuously, that data is available after committing and is indexed, right?

Yes.

If so, how often does reindexing do some good?

You should only need to reindex if the data changes or you change your schema. The DIH in Solr 1.4 supports delta imports, so you should only really be adding or updating (which is actually deleting and adding) items when necessary.

Dennis Gearon

--- On Wed, 6/2/10, Andrzej Bialecki a...@getopt.org wrote:

From: Andrzej Bialecki a...@getopt.org
Subject: Re: Importing large datasets
To: solr-user@lucene.apache.org
Date: Wednesday, June 2, 2010, 4:52 AM

On 2010-06-02 13:12, Grant Ingersoll wrote:

On Jun 2, 2010, at 6:53 AM, Andrzej Bialecki wrote:

On 2010-06-02 12:42, Grant Ingersoll wrote:

On Jun 1, 2010, at 9:54 PM, Blargy wrote:

We have around 5 million items in our index and each item has a description located on a separate physical database. These item descriptions vary in size and for the most part are quite large. Currently we are only indexing items and not their corresponding descriptions, and a full import takes around 4 hours. Ideally we want to index both our items and their descriptions, but after some quick profiling I determined that a full import would take in excess of 24 hours.

- How would I profile the indexing process to determine if the bottleneck is Solr or our database?

As a data point, I routinely see clients index 5M items on normal hardware in approx. 1 hour (give or take 30 minutes). When you say quite large, what do you mean? Are we talking books here, or maybe a couple pages of text, or just a couple KB of data? How long does it take you to get that data out (and, from the sounds of it, merge it with your item) w/o going to Solr?

- In either case, how would one speed up this process? Is there a way to run parallel import processes and then merge them together at the end? Possibly use some sort of distributed computing?

DataImportHandler now supports multiple threads. The absolute fastest way that I know of to index is via multiple threads sending batches of documents at a time (at least 100). Often, from DBs one can split up the table via SQL statements that can then be fetched separately. You may want to write your own multithreaded client to index.

SOLR-1301 is also an option if you are familiar with Hadoop ...

If the bottleneck is the DB, will that do much?

Nope. But the workflow could be set up so that during night hours a DB export takes place that results in a CSV or SolrXML file (there you could measure the time it takes to do this export), and then indexing can work from this file.

--
Best regards,
Andrzej Bialecki
http://www.sigram.com
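On the delta-import point, a sketch of what that looks like in a DIH data-config.xml; the table and column names are invented, while the ${dataimporter...} variables are standard DIH:

    <entity name="item" pk="id"
            query="SELECT * FROM items"
            deltaQuery="SELECT id FROM items
                        WHERE updated_at &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT * FROM items
                              WHERE id='${dataimporter.delta.id}'"/>

Running /dataimport?command=delta-import then touches only the rows changed since the last run.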
Re: Importing large datasets
On 3 Jun 2010, at 02:51, Dennis Gearon gear...@sbcglobal.net wrote:

Well, I hope to have around 5 million datasets/documents within 1 year, so this is good info. BUT if I DO have that many, then the market I am aiming at will end up giving me 100 times more than that within 2 years. Are there good references/books on using Solr/Lucene/(Linux/nginx) for 500 million plus documents?

As far as I'm aware there aren't any books yet that cover this for Solr. The wiki, this mailing list and Nabble are your best sources, and there have been some quite in-depth conversations on the matter in this list in the past.

The data is easily shardable geographically, as one given.

Dennis Gearon

--- On Wed, 6/2/10, Grant Ingersoll gsing...@apache.org wrote:

From: Grant Ingersoll gsing...@apache.org
Subject: Re: Importing large datasets
To: solr-user@lucene.apache.org
Date: Wednesday, June 2, 2010, 3:42 AM

On Jun 1, 2010, at 9:54 PM, Blargy wrote:

We have around 5 million items in our index and each item has a description located on a separate physical database. These item descriptions vary in size and for the most part are quite large. Currently we are only indexing items and not their corresponding descriptions, and a full import takes around 4 hours. Ideally we want to index both our items and their descriptions, but after some quick profiling I determined that a full import would take in excess of 24 hours.

- How would I profile the indexing process to determine if the bottleneck is Solr or our database?

As a data point, I routinely see clients index 5M items on normal hardware in approx. 1 hour (give or take 30 minutes). When you say quite large, what do you mean? Are we talking books here, or maybe a couple pages of text, or just a couple KB of data? How long does it take you to get that data out (and, from the sounds of it, merge it with your item) w/o going to Solr?

- In either case, how would one speed up this process? Is there a way to run parallel import processes and then merge them together at the end? Possibly use some sort of distributed computing?

DataImportHandler now supports multiple threads. The absolute fastest way that I know of to index is via multiple threads sending batches of documents at a time (at least 100). Often, from DBs one can split up the table via SQL statements that can then be fetched separately. You may want to write your own multithreaded client to index.

--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
Re: Importing large datasets
On 3 Jun 2010, at 03:51, Blargy zman...@hotmail.com wrote:

Would dumping the databases to a local file help at all?

I would suspect not, especially with the size of your data. But it would be good to know how long that takes, i.e. creating a SQL script that just pulls that data out, how long does that take? Also, how many fields are you indexing per document: 10, 50, 100?
Re: Importing large datasets
How long does it take to do a grab of all the data via SQL? I found that by denormalizing the data into a lookup table I was able to index about 300k rows of similar data size, with DIH regex splitting on some fields, in about 8 minutes. I know it's not quite the same scale, but with batching...

David Stuart

On 2 Jun 2010, at 17:58, Blargy zman...@hotmail.com wrote:

One thing that might help indexing speed - create a *single* SQL query to grab all the data you need without using DIH's sub-entities, at least the non-cached ones.

Not sure how much that would help. As I mentioned, without the item description import the full process takes 4 hours, which is bearable. However, once I started to import the item descriptions, which are located on a separate machine/database, the import process exploded to over 24 hours.
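A sketch of the denormalize-then-split approach mentioned above (table and field names invented): pack one-to-many values into a single delimited column in the lookup table, then let DIH's RegexTransformer split them back into a multi-valued field:

    <entity name="item" transformer="RegexTransformer"
            query="SELECT id, title, tags_csv FROM item_lookup">
      <field column="tags_csv" name="tags" splitBy=","/>
    </entity>

This keeps the import to one flat query, avoiding a sub-entity query per row.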
Re: Storing different entities in Solr
Hi,

So for your use case, are you wanting to search for a consultant and then look at all of his or her requests, or pull both at the same time? In both cases one index should suffice. If you define a primary key field and use it for both doc types, it shouldn't be an issue. Unless your dataset is very large, it would reduce the overhead of running a multicore solution, especially in indexing etc.

David Stuart

On 28 May 2010, at 18:12, Moazzam Khan moazz...@gmail.com wrote:

Thanks for all your answers guys. Requests and consultants have a many-to-many relationship, so I can't store request info in a document with advisorID as the primary key. Bill's solution and the multicore solution might be what I am looking for. Bill, will I be able to have 2 primary keys (so I can update and delete documents)? If yes, can you please give me a link or something where I can get more info on this?

Thanks,
Moazzam

On Fri, May 28, 2010 at 11:50 AM, Bill Au bill.w...@gmail.com wrote:

You can keep different types of documents in the same index if each document has a type field. You can restrict your searches to specific type(s) of document by using a filter query, which is very fast and efficient.

Bill

On Fri, May 28, 2010 at 12:28 PM, Nagelberg, Kallin knagelb...@globeandmail.com wrote:

Multi-core is an option, but keep in mind if you go that route you will need to do two searches to correlate data between the two.

-Kallin Nagelberg

-----Original Message-----
From: Robert Zotter [mailto:robertzot...@gmail.com]
Sent: Friday, May 28, 2010 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Storing different entities in Solr

Sounds like you'll want to use a multiple core setup. One core for each type of document: http://wiki.apache.org/solr/CoreAdmin
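A sketch of the single-index pattern Bill describes, with invented values: give every document a type field and a composite unique key so the two entity types cannot collide, then filter at query time:

    id:   consultant-42        id:   request-7
    type: consultant           type: request

    http://localhost:8983/solr/select?q=smith&fq=type:consultant

The fq filter is cached separately from the main query, which is why it stays cheap, and the composite id keeps updates and deletes unambiguous across both doc types.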
Re: how to patch solr-236 in mac os
Hey,

In OS X you should be able to patch in the same way as on Linux: patch -p[level] < name_of_patch.patch. You can do this from the shell, including on the Mac.

David Stuart

On 11 May 2010, at 17:15, Jonty Rhods jonty.rh...@gmail.com wrote:

hi all,

I am very new to Solr. Now I am required to patch Solr (patch no 236). I downloaded the latest src code and patch, but am unable to find a suitable way to patch. I have Eclipse installed. Please guide me.
Re: how to patch solr-236 in mac os
Hi Jonty,

In the root directory of the src, run: patch -p0 < name_of_patch.patch

David Stuart

On 11 May 2010, at 17:50, Jonty Rhods jonty.rh...@gmail.com wrote:

hi David,

Thanks for the quick reply. Please give me the full command so I can patch. What is the meaning of [level]? As I wrote, I have downloaded the latest src from trunk. So please also tell me what the command in the terminal will be and from where I should run it. Should I try: patch -p[level] < name_of_patch.patch

thanks

On Tue, May 11, 2010 at 10:02 PM, David Stuart david.stu...@progressivealliance.co.uk wrote:

Hey,

In OS X you should be able to patch in the same way as on Linux: patch -p[level] < name_of_patch.patch. You can do this from the shell, including on the Mac.

David Stuart

On 11 May 2010, at 17:15, Jonty Rhods jonty.rh...@gmail.com wrote:

hi all,

I am very new to Solr. Now I am required to patch Solr (patch no 236). I downloaded the latest src code and patch, but am unable to find a suitable way to patch. I have Eclipse installed. Please guide me.
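Spelled out (SOLR-236.patch stands in for whatever the downloaded patch file is actually called):

    cd solr-trunk
    patch -p0 --dry-run < SOLR-236.patch   # check it applies cleanly first
    patch -p0 < SOLR-236.patch

The -p0 level means strip no leading path components from the file names inside the patch, which is right when the patch was made from the source root.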
Re: Switching cores dynamically
Using a multicore setup should do the trick; see http://wiki.apache.org/solr/CoreAdmin, specifically the SWAP option.

Cheers,

David Stuart

On 19 Mar 2010, at 10:18, muneeb muneeba...@hotmail.com wrote:

Hi,

I have indexed almost 7 million articles on two separate cores, each with their own conf/ and data/ folder, i.e. they have their individual index. What I normally do is use core0 for querying and core1 for any updates, and once updates are finished I copy the index of core1 to core0's data folder.

I know this isn't an efficient way of doing this, since it brings downtime on my search service for a couple of minutes. I was wondering if it is possible to switch between cores dynamically (keeping my current setup in mind) in such a way that there is no downtime at all during switching.

Thanks very much in advance.
-M
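A sketch of the swap call using the core names from the question:

    http://localhost:8983/solr/admin/cores?action=SWAP&core=core0&other=core1

After this, requests addressed to core0 are served by what was core1's index (and vice versa), so the updated index goes live without copying files or restarting.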
Re: Question on Solr Scalability
Hi,

I think your needs would be met better with Distributed Search: http://wiki.apache.org/solr/DistributedSearch

This allows shards to live on different servers, and it will search across all of those shards when a query comes in. There are a few patches which will hopefully be available in the Solr 1.5 release that will improve this, including distributed TF-IDF across shards.

Regards,

David

On 11 Feb 2010, at 07:12, abhishes abhis...@gmail.com wrote:

Suppose I am indexing very large data (5 billion rows in a database). Now I want to use the Solr core feature to split the index into manageable chunks. However, I have two questions:

1. Can cores reside on different physical servers?

2. When a query comes, will the query be answered by the index in one core, or will the query be sent to all the cores?

My desire is to have a system which from outside appears as a single large index... but inside it is multiple small indexes running on different hardware machines.
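A sketch of what that looks like on the query side (host names invented); the shards parameter lists every shard the query should fan out to, and the results come back merged as if from one index:

    http://server1:8983/solr/select?q=foo&shards=server1:8983/solr,server2:8983/solr,server3:8983/solr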
Default value attribute in RSS DIH
Hey All,

Can anyone tell me what the attribute name is for defining a default value in the field tag of the RSS data import handler? Basically I want to do something like:

    <field column="type" value="external_source" commonField="true"/>

Any ideas?

Regards,

Dave
Re: MoreLikeThis - How to pass in external text?
The MoreLikeThisHandler allows external text to be streamed to it; see http://wiki.apache.org/solr/MoreLikeThisHandler#Using_ContentStreams. The stream.url feature is quite good if you have a lot of text and start hitting the character limit of the URL.

Regards,

Dave

On 22 Jan 2010, at 05:24, Otis Gospodnetic wrote:

Hi,

Try what I suggested, please. Or, if you want, go to that (or any other) web page, copy a large chunk of its content, and paste it into Google/Yahoo/Bing. I just did that. Google said my query was too long, but Yahoo took it. Guess what hit #1 was? The page I copied the text from! Very much more-like-this-like.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

- Original Message -
From: ldung dung@gmail.com
To: solr-user@lucene.apache.org
Sent: Fri, January 22, 2010 12:08:26 AM
Subject: Re: MoreLikeThis - How to pass in external text?

I want to use MoreLikeThis since I want to find text in the Solr data that is similar to the input text. I want to see how well this works against just a standard keyword search. I want to do something similar to the article below:

http://www.bbc.co.uk/blogs/radiolabs/2008/06/wikipedia_plus_lucene_morelikethis.shtml

In the article the author uses MoreLikeThis to classify text into pre-existing categories.

Otis Gospodnetic wrote:

Hi,

If you have text to pass in, why do you need MoreLikeThis? The text you speak of can be used as a normal query, so pass it in as a regular multi-word query.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

- Original Message -
From: ldung
To: solr-user@lucene.apache.org
Sent: Thu, January 21, 2010 8:08:41 PM
Subject: MoreLikeThis - How to pass in external text?

How can I have the MoreLikeThis query process a piece of text that is passed into the query? Currently I can only get MoreLikeThis to work for pieces of text that are already indexed by Solr. For example, here is a query that works for using MoreLikeThis with the document id:134847893:

http://localhost:8983/solr/select?mlt=true&q=id:134847893&mlt.fl=desc&mlt.mindf=1&mlt.mintf=1&debugQuery=on

How can I pass in some external text like 'Solr Rocks'? Below is an example of how it would look:

http://localhost:8983/solr/select?mlt=true&external.text=Solr Rocks&mlt.fl=desc&mlt.mindf=1&mlt.mintf=1&debugQuery=on
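Following the ContentStreams section of the wiki page above, a sketch of passing external text to the handler (URL-encoding of the spaces is left out for readability, and the handler is assumed to be mapped to /mlt in solrconfig.xml):

    http://localhost:8983/solr/mlt?stream.body=Solr Rocks&mlt.fl=desc&mlt.mindf=1&mlt.mintf=1

For larger texts, POSTing the text as the request body, or pointing stream.url at a document, avoids the URL length limit mentioned above.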
Re: Solr Drupal module problem
Hi,

The Drupal Solr module will work with both Solr 1.3 and 1.4. I currently have client installations using both these versions with Drupal (versions 5 and 6).

Regards,

Dave

On 14 Jan 2010, at 23:08, Otis Gospodnetic wrote:

You may want to ask on Drupal's mailing lists. I hear about Drupal and Solr constantly; I can't imagine them not having Solr 1.4 support, esp. if you say their configs contain references to things that are in Solr 1.4.0.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

- Original Message -
From: reallove thereall...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thu, January 14, 2010 5:54:55 PM
Subject: Re: Solr Drupal module problem

Hello,

Thanks for the answer. Unfortunately, in the Debian repositories, even in testing, the latest Solr version is 1.3.0. Can I use that for the Drupal module to work? I highly prefer to use the Debian repositories instead of the source code. Thank you.

Otis Gospodnetic wrote:

Hi,

Solr 1.2.0 didn't have TrieIntField. Use the latest Solr - Solr 1.4.0

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

- Original Message -
From: reallove
To: solr-user@lucene.apache.org
Sent: Thu, January 14, 2010 5:43:23 PM
Subject: Solr Drupal module problem

Hello,

System: Debian 5.0. Java, Tomcat and Solr installed from the repositories: Java version 1.6_12, Tomcat 5.5 and Solr 1.2.0.

I am trying to use the schema.xml and the solrconfig.xml from the Drupal module, but they fail to work. The error I am getting is: Error loading class 'solr.TrieIntField'. How can I fix this?

Thank you!
Re: I cant get it to work
Hi,

The answer is it depends ;)

If your 10 tables represent one entity, e.g. a person, their address, etc., then the one-document approach works. But if your 10 tables each represent a series of entities that you want to surface in your search results separately, then make a document for each (i.e. it depends on your data).

What is your use case? Are you wanting a search index that is able to search on every field in your 10 tables, or just a few? Think of it this way: if you were writing SQL to pull the data out of the DB using joins etc., what fields would you grab? Do you get multiple rows back because some of your tables have a one-to-many relationship? Once you have formed that query, that is your document, minus the duplicate information caused by the extra rows.

Cheers,

David

On 15 Dec 2009, at 08:05, Faire Mii faire@gmail.com wrote:

I just can't get it.

If I have 10 tables in MySQL and they are all related to each other with foreign keys, should I have 10 documents in Solr? Or just one document with rows from all tables in it?

I have tried in vain for 2 days now... please help.

regards

fayer
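To make the "form the SQL first" idea concrete, a sketch with invented table names: one result row per person becomes one Solr document, with the one-to-many tag rows collapsed into a single multi-valued field (GROUP_CONCAT is MySQL-specific):

    SELECT p.id, p.name, a.city, GROUP_CONCAT(t.tag) AS tags
    FROM person p
    JOIN address a ON a.person_id = p.id
    LEFT JOIN tag t ON t.person_id = p.id
    GROUP BY p.id, p.name, a.city;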
Re: Log of zero result searches
The returned XML result tag has a numFound attribute that will report 0 if nothing matches your search criteria.

David

On 15 Dec 2009, at 08:16, Roland Villemoes r...@alpha-solutions.dk wrote:

Hi,

Question: How do you log zero-result searches? It's quite important from a business perspective to know which searches return zero/empty results. Does anybody know a way to get this information?

Roland Villemoes
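For example, a query with no hits comes back with an empty result element:

    <result name="response" numFound="0" start="0"/>

so one approach is to have the client log the query string whenever the response carries numFound="0".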
Re: is it possible to use Xinclude in schema.xml?
Yeah, I tried it as well; it doesn't seem to implement xpointer properly, so you can't add multiple fields or field types.

David

On 28 Nov 2009, at 18:49, Peter Wolanin peter.wola...@acquia.com wrote:

Follow-up: it seems the schema parser doesn't barf if you use xinclude with a single analyzer element, but so far it seems impossible for a field type. So this seems to work:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <xi:include href="solr/core2/conf/text-analyzer.xml">
        <xi:fallback>
          <analyzer type="index"> ... </analyzer>
        </xi:fallback>
      </xi:include>
      <analyzer type="query"> ... </analyzer>
    </fieldType>

On Sat, Nov 28, 2009 at 1:40 PM, Peter Wolanin peter.wola...@acquia.com wrote:

I'm trying to determine if it's possible to use XInclude to (for example) have a base schema file and then substitute various pieces. It seems that the schema fieldTypes throw exceptions if there is an unexpected attribute?

SEVERE: java.lang.RuntimeException: schema fieldtype text(org.apache.solr.schema.TextField) invalid arguments:{xml:base=solr/core2/conf/text-analyzer.xml}

This is what I'm trying to do (details of the analyzer chain omitted - nothing unusual); the error occurs when the external XML file is actually included:

    <xi:include href="solr/core2/conf/text-analyzer.xml"
                xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:fallback>
        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
          <analyzer type="index"> ... </analyzer>
          <analyzer type="query"> ... </analyzer>
        </fieldType>
      </xi:fallback>
    </xi:include>

Where (for testing) the text-analyzer.xml file just looks like the fallback:

    <?xml version="1.0" encoding="UTF-8" ?>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index"> ... </analyzer>
      <analyzer type="query"> ... </analyzer>
    </fieldType>

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia. Inc.
peter.wola...@acquia.com
Re: memory size
Hi,

This is a PHP problem: you need to increase your per-thread memory limit in your php.ini. The setting name is memory_limit.

Regards,

David

On 11 Nov 2009, at 07:56, Jörg Agatz joerg.ag...@googlemail.com wrote:

Hello,

I have a problem with the memory size, but I don't know how I can fix it. Maybe it is a PHP problem, but I don't know.

My error:

Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate 16515072 bytes)

I hope you can help me.

KinGArtus
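For example, in php.ini (128M is an arbitrary choice; the error above shows the current limit is 16M):

    memory_limit = 128M

The same can be set per-script with ini_set('memory_limit', '128M'); before the Solr call.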
Re: apply a patch on solr
You should be OK with the revision option below. Look for the highest revision number in the list of files in the patch: Subversion increments the revision number on a repo basis, not a file basis, so the highest number will represent the current state of all the files when the patch was made, if that makes sense.

Regards,

Dave

On 4 Nov 2009, at 03:40, michael8 mich...@saracatech.com wrote:

Perfect. This is what I need to know instead of patching 'in the dark'. Good thing SVN revision cuts across all files like a tag. Thanks Mike!

Michael

cambridgemike wrote:

You can see what revision the patch was written for at the top of the patch; it will look like this:

    Index: org/apache/solr/handler/MoreLikeThisHandler.java
    ===================================================================
    --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
    +++ org/apache/solr/handler/MoreLikeThisHandler.java (working copy)

Now check out revision 772437 using the --revision switch in svn, patch away, and then svn up to make sure everything merges cleanly. This is a good guide to follow as well: http://www.mail-archive.com/solr-user@lucene.apache.org/msg10189.html

cheers,
-mike

On Mon, Nov 2, 2009 at 3:55 PM, michael8 mich...@saracatech.com wrote:

Hi,

First I'd like to pardon my novice question on patching Solr (1.4). What I'd like to know is, given a patch, like the one for collapse field, how would one go about knowing what Solr source that patch is meant for, since this is a source-level patch? Wouldn't the exact versions of the set of Java files to be patched be critical for the patch to work properly?

So far what I have done is to pull the latest collapse field patch down from http://issues.apache.org/jira/browse/SOLR-236 (field-collapse-5.patch), and then svn up the latest trunk from http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and build. Intuitively I was thinking I should be doing svn up to a specific revision/tag instead of just latest. So far everything seems fine, but I just want to make sure I'm doing the right thing and not just being lucky.

Thanks,
Michael
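Put together, the workflow looks like this (using the revision from the example header above and the patch file name from the question):

    svn checkout -r 772437 http://svn.apache.org/repos/asf/lucene/solr/trunk solr-trunk
    cd solr-trunk
    patch -p0 < field-collapse-5.patch
    svn up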
Conditional copyField
Hi,

I am pushing data to Solr from two different sources: Nutch and a CMS. I have a data clash, in that for Nutch a copyField is required to push the url field to the id field, as it is used as the primary lookup in the Nutch-Solr integration update. The other CMS also uses the url field but populates the id field with a different value. Now I can't really change either source definition, so is there a way in solrconfig or the schema to check if id is empty and only copy if true, or is there a better way via the update processor?

Thanks for your help in advance.

Regards,

David
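On the update-processor route: an untested sketch of a custom processor that copies url into id only when id is missing. It follows Solr's UpdateRequestProcessor API, but it would also need a corresponding UpdateRequestProcessorFactory and an updateRequestProcessorChain entry in solrconfig.xml to be wired in:

    import java.io.IOException;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    public class ConditionalCopyProcessor extends UpdateRequestProcessor {

      public ConditionalCopyProcessor(UpdateRequestProcessor next) {
        super(next);
      }

      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // Only fill id from url when the CMS hasn't already set it.
        if (doc.getFieldValue("id") == null && doc.getFieldValue("url") != null) {
          doc.addField("id", doc.getFieldValue("url"));
        }
        super.processAdd(cmd);
      }
    }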
xincludes schema help
Hi,

I am trying to get XIncludes with xpointer working in schema.xml, as per this closed issue request: https://issues.apache.org/jira/browse/SOLR-1167

To make our upgrade path easier I want to be able to include extra custom fields in the schema, and am including an extra set of fields inside the fields tags, but I keep getting an "XPointer resolution unsuccessful" error. Files below.

    <schema>
      <types>...</types>
      <fields>
        <field name="site" type="string" indexed="true" stored="true"/>
        <field name="hash" type="string" indexed="true" stored="true"/>
        <field name="url" type="string" indexed="true" stored="true"/>
        <xi:include href="/usr/local/solr_home/solr/db/conf/nutch_schema.xml"
                    parse="xml" xpointer="./nutch/*"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>
      </fields>
    </schema>

-- Include file --

    <nutch>
      <!-- fields for index-basic plugin -->
      <field name="host" type="url" stored="false" indexed="true"/>
      <field name="content" type="text" stored="true" indexed="true"/>
      <copyField source="content" dest="body"/>
      <copyField source="content" dest="teaser"/>
    </nutch>

I have also tried this to add multiple extra fieldType definitions:

    <xi:include href="/usr/local/solr_home/solr/db/conf/nutch_schema.xml"
                parse="xml" xpointer="./extraFieldTypes/*"
                xmlns:xi="http://www.w3.org/2001/XInclude"/>

    <extraFieldTypes>
      <fieldType name="url" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
        </analyzer>
      </fieldType>
      <fieldType>...</fieldType>
    </extraFieldTypes>

Any thoughts?

Thanks for your help.

Regards,

David