RE: Faceting with null dates
yes, I see that my question was a bit confusing. But thanks for your answers. I will try to clarify a bit. I query on a date field, validToDate. The value for this field is not present for 99% of the documents. What I would like to get is:
1) the number of documents for a given date range R1 that do not have a value for the validToDate, i.e. the 99% of the documents
2) the number of documents for a given date range R2 that do have a value for the validToDate
My question is really: is it possible to have just one query, or do I need to have two queries; one for 1) and one for 2)? Will facet.range.other=all help me in any way here?

/k

Date: Thu, 15 Dec 2011 12:25:12 -0800
From: hossman_luc...@fucit.org
To: solr-user@lucene.apache.org
Subject: Re: Faceting with null dates

First of all, we need to clarify some terminology here: there is no such thing as a "null date" in Solr -- or for that matter, there is no such thing as a null value in any field. Documents either have some value(s) for a field, or they do not have any values.

If you want to constrain your query to only documents that have a value in a field, you can use something like fq=field_name:[* TO *] ... if you want to constrain your query to only documents that do *NOT* have a value in a field, you can use fq=-field_name:[* TO *]

Now, having said that, like Erick, I'm a little confused by your question -- it's not clear if what you really want to do is:
a) change the set of documents returned in the main result list
b) change the set of documents considered when generating facet counts (w/o changing the main result list)
c) return an additional count of documents that are in the main result list, but are not in the facet counts because they do not have the field being faceted on.

My best guess is that you are asking about c based on your last sentence...

: get is 3 results and 7 non-null validToDate facets.
: And as I write this,
: I start to wonder if this is possible at all as the facets are dependent
: on the result set and that this might be better to handle in the
: application layer by just extracting 10-7=3...

...subtracting the sum of all constraint counts from your range facet from the total number of documents found won't necessarily tell you the number of documents that have no value in the field you are faceting on -- because documents may have values outside the range of your start/end.

Depending on what exactly it is you are looking for, you might find the facet.range.other=all param useful, as it will return things like the "between" counts (summing up all the docs between start-end) as well as the "before" and "after" counts.

But if you really just want to know "how many docs have no value for my validToDate field?" you can get that very explicitly and easily using facet.query=-validToDate:[* TO *]

: <code><str name="facet">true</str>
: <str name="f.validToDate.facet.range.start">NOW/DAYS-4MONTHS</str>
: <str name="facet.mincount">1</str>
: <str name="q">(*:*)</str>
: <arr name="facet.range"><str>validToDate</str></arr>
: <str name="facet.range.end">NOW/DAY+1DAY</str>
: <str name="facet.range.gap">+1MONTH</str></code>
:
: <result name="response" numFound="10" start="0"><lst name="facet_counts"><lst name="facet_ranges"><lst name="validToDate"><lst name="counts"><int name="2011-11-14T00:00:00Z">7</int>

-Hoss
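Putting Hoss's pieces together, both counts can come back from a single request by combining the range facet with a facet.query for the documents missing the field. A sketch of such a request, using the field and ranges from this thread (the host, port, and core path are illustrative, and the bracketed range query must be URL-encoded when sent for real):

```
http://localhost:8983/solr/select?q=*:*&rows=0
  &facet=true
  &facet.range=validToDate
  &f.validToDate.facet.range.start=NOW/DAY-4MONTHS
  &f.validToDate.facet.range.end=NOW/DAY%2B1DAY
  &f.validToDate.facet.range.gap=%2B1MONTH
  &facet.range.other=all
  &facet.query=-validToDate:[*%20TO%20*]
```

The facet.query count (documents with no validToDate at all) comes back in its own facet_queries section, independent of the range buckets.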
Re: Replication not working
Yeah, the drop index via the URL command doesn't help anyway - when rebuilding the index, the timestamp is obviously ahead of master (as the slave is being created now), so the replication will still not happen.

On 21 Dec 2011, at 16:37, Dean Pullen wrote:

I can't see a way, if the slave is on another server. We're going to upgrade solr - as you can delete the index after unloading a core in this way: cores?action=UNLOAD&core=liveCore&deleteIndex=true From v3.3 (I think)

On 21 Dec 2011, at 16:11, Dean Pullen wrote:

Thought as much, thanks for the reply. Is there an easy way of dropping the index on the slave, or do I have to manually delete the index files?

Regards,
Dean.

On 21 Dec 2011, at 15:54, Erick Erickson wrote:

You've probably hit it on the head. The slave version is greater than the master version, so replication isn't necessary. BTW, the version starts life as a timestamp, but then is simply incremented on successive commits, which accounts for what you are seeing. You should be able to blow the index away on the slave, wait for replication, and go from there.

Another possibility: How much faith do you have in your slave index? If it's all good, you could simply copy *that* to the master manually and go from there. If you're rebuilding your entire index, just blow the master index away, re-index from scratch and that should work too (be sure to disable replication during the rebuild unless you want a partial index on the slave). Although copying the files *then* deciding not to use them doesn't seem like a good thing. Not sure if 3.x has the same behavior or not...

Best
Erick

On Wed, Dec 21, 2011 at 10:46 AM, Dean Pullen dean.pul...@semantico.com wrote:

E.g.
I see this in the slave logs:

2011-12-21 15:45:27,635 INFO handler.SnapPuller:265 - Master's version: 1271406570655, generation: 376
2011-12-21 15:45:27,635 INFO handler.SnapPuller:266 - Slave's version: 1271406571565, generation: 1286
2011-12-21 15:45:27,636 INFO handler.SnapPuller:267 - Starting replication process
2011-12-21 15:45:27,639 INFO handler.SnapPuller:270 - Number of files in latest index in master: 9
…
2011-12-21 15:45:50,997 INFO handler.SnapPuller:286 - Total time taken for download : 23 secs
2011-12-21 15:45:51,050 INFO handler.SnapPuller:586 - New index installed. Updating index properties…

Yet the index doesn't change!

On 21 Dec 2011, at 15:37, Dean Pullen wrote:

Hi all, I have an odd problem locally when attempting replication with solr 1.4. The problem is, though the master files get copied to a temp directory in the slave data directory (I see this happen at runtime), they are then not copied over the actual slave index data. We were wondering if it was due to the index version of the restored master data being behind the slave index version after a restore? Any other ideas would be appreciated.

Thanks,
Dean Pullen
Re: Replication not working
We're simply restoring the master via a backed-up snapshot (created using the ReplicationHandler) and then trying to get the slave to replicate it.

On 21 Dec 2011, at 18:09, Erick Erickson wrote:

You can't. But index restoration should be a very rare thing, or you have some lurking problem in your process. Or this is an XY problem, what problem are you trying to solve? see: http://people.apache.org/~hossman/#xyproblem

Best
Erick

On Wed, Dec 21, 2011 at 12:21 PM, Dean Pullen dean.pul...@semantico.com wrote:

I can't understand, then, how we could ever restore and get replication to work without manual intervention!

Dean

On 21 Dec 2011, at 16:37, Dean Pullen wrote:

I can't see a way, if the slave is on another server. We're going to upgrade solr - as you can delete the index after unloading a core in this way: cores?action=UNLOAD&core=liveCore&deleteIndex=true From v3.3 (I think)

On 21 Dec 2011, at 16:11, Dean Pullen wrote:

Thought as much, thanks for the reply. Is there an easy way of dropping the index on the slave, or do I have to manually delete the index files?

Regards,
Dean.

On 21 Dec 2011, at 15:54, Erick Erickson wrote:

You've probably hit it on the head. The slave version is greater than the master version, so replication isn't necessary. BTW, the version starts life as a timestamp, but then is simply incremented on successive commits, which accounts for what you are seeing. You should be able to blow the index away on the slave, wait for replication, and go from there.

Another possibility: How much faith do you have in your slave index? If it's all good, you could simply copy *that* to the master manually and go from there. If you're rebuilding your entire index, just blow the master index away, re-index from scratch and that should work too (be sure to disable replication during the rebuild unless you want a partial index on the slave). Although copying the files *then* deciding not to use them doesn't seem like a good thing.
Not sure if 3.x has the same behavior or not...

Best
Erick

On Wed, Dec 21, 2011 at 10:46 AM, Dean Pullen dean.pul...@semantico.com wrote:

E.g. I see this in the slave logs:

2011-12-21 15:45:27,635 INFO handler.SnapPuller:265 - Master's version: 1271406570655, generation: 376
2011-12-21 15:45:27,635 INFO handler.SnapPuller:266 - Slave's version: 1271406571565, generation: 1286
2011-12-21 15:45:27,636 INFO handler.SnapPuller:267 - Starting replication process
2011-12-21 15:45:27,639 INFO handler.SnapPuller:270 - Number of files in latest index in master: 9
…
2011-12-21 15:45:50,997 INFO handler.SnapPuller:286 - Total time taken for download : 23 secs
2011-12-21 15:45:51,050 INFO handler.SnapPuller:586 - New index installed. Updating index properties…

Yet the index doesn't change!

On 21 Dec 2011, at 15:37, Dean Pullen wrote:

Hi all, I have an odd problem locally when attempting replication with solr 1.4. The problem is, though the master files get copied to a temp directory in the slave data directory (I see this happen at runtime), they are then not copied over the actual slave index data. We were wondering if it was due to the index version of the restored master data being behind the slave index version after a restore? Any other ideas would be appreciated.

Thanks,
Dean Pullen
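The version/generation mismatch the SnapPuller logs above can also be inspected directly on each box via the ReplicationHandler's HTTP API, which is handy when diagnosing why a poll decides not to replicate (host names and core path here are illustrative):

```
# index version + generation only
http://master:8983/solr/replication?command=indexversion
http://slave:8983/solr/replication?command=indexversion

# fuller status: versions, replicable files, last replication attempt
http://slave:8983/solr/replication?command=details
```

If the slave's reported version is higher than the master's, the slave will conclude it is up to date and skip the install step, exactly as described in this thread.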
Re: Update schema.xml using solrj APIs
Hi Ahmed,

if you have a multi core setup, you could change the file programmatically (e.g. via an XML parser), copy the new file over the existing one (programmatically, of course), then reload the core. I haven't reloaded the core programmatically yet, but that should be doable via SolrJ. Or - if you are not using Java - call the specific core admin URL from your programme. You will have to re-index after changing the schema.xml.

Chantal

On Thu, 2011-12-22 at 04:34 +0100, Otis Gospodnetic wrote:

Ahmed,

At this point in time - no. You need to edit it manually and restart Solr to see the changes. This will change in the future.

Otis

Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html

From: Ahmed Abdeen Hamed ahmed.elma...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, December 21, 2011 4:12 PM
Subject: Update schema.xml using solrj APIs

Hello friend,

I am new to Solrj and I am wondering if there is a way you can update the schema.xml file via the APIs. I would appreciate any help.

Thanks very much,
-Ahmed
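The "specific core admin URL" Chantal mentions is the CoreAdmin RELOAD action; a sketch of the call (host, port, and core name are illustrative):

```
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
```

This reloads the named core with its (edited) schema.xml and solrconfig.xml without restarting the servlet container; already-indexed documents still need re-indexing if analysis-affecting schema changes were made.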
Re: delta-import of rich documents like word and pdf files!
Hi Guys, I probably found a way to mimic the delta import for the FileListEntityProcessor (I have used it for xml files...). Adding this configuration in the xml-data-config:

<entity name="personeImpreseList" rootEntity="false" dataSource="null"
        processor="FileListEntityProcessor" fileName="^.*\.xml$"
        recursive="false" baseDir="/data/listPersoneImprese"
        newerThan="${dataimporter.last_index_time}"/>

And using the command: command=full-import&clean=false

Solr adds to the index only the files that were changed since the last indexing session. Probably this was an obvious way, but I want to know your opinion about it.

Cheers

2011/11/12 neuron005 neuron...@gmail.com

I want to perform delta import of my rich documents like pdf and word files. I added pk=something in my data-config.xml file. But now I don't know my next step. How will delta-import come to know which fields got updated? I am not connecting to a database. Is there any query like the database queries deltaImportQuery and deltaQuery? Does anyone have a solution? Thanks in advance

-- View this message in context: http://lucene.472066.n3.nabble.com/delta-import-of-rich-documents-like-word-and-pdf-files-tp3502039p3502039.html Sent from the Solr - User mailing list archive at Nabble.com.

--
Benedetti Alessandro
Personal Page: http://tigerbolt.altervista.org

Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience - 1794 England
Re: Faceting with null dates
: 1) the number of documents for a given date range R1 that do not have a
: value for the validToDate, i.e. the 99% of the documents

Makes no sense either: "for a given date range R1" that "do not have a value". You can't specify a range for a document that doesn't have a value! I think you're asking for the documents that *satisfy my query* that don't have a date value. In which case Chris' suggestion to use a pure-negative query will give you what you want. You can specify arbitrary facet.query clauses along with your facet.range stuff; they're just treated as separate facets. Just tack it on your query and it'll come back in a separate section of the response.

Best
Erick

On Thu, Dec 22, 2011 at 3:45 AM, kenneth hansen kenh...@hotmail.co.uk wrote:

yes, I see that my question was a bit confusing. But thanks for your answers. I will try to clarify a bit. I query on a date field, validToDate. The value for this field is not present for 99% of the documents. What I would like to get is:
1) the number of documents for a given date range R1 that do not have a value for the validToDate, i.e. the 99% of the documents
2) the number of documents for a given date range R2 that do have a value for the validToDate
My question is really: is it possible to have just one query, or do I need to have two queries; one for 1) and one for 2)? Will facet.range.other=all help me in any way here?

/k

Date: Thu, 15 Dec 2011 12:25:12 -0800
From: hossman_luc...@fucit.org
To: solr-user@lucene.apache.org
Subject: Re: Faceting with null dates

First of all, we need to clarify some terminology here: there is no such thing as a "null date" in Solr -- or for that matter, there is no such thing as a null value in any field. Documents either have some value(s) for a field, or they do not have any values. If you want to constrain your query to only documents that have a value in a field, you can use something like fq=field_name:[* TO *] ...
if you want to constrain your query to only documents that do *NOT* have a value in a field, you can use fq=-field_name:[* TO *]

Now, having said that, like Erick, I'm a little confused by your question -- it's not clear if what you really want to do is:
a) change the set of documents returned in the main result list
b) change the set of documents considered when generating facet counts (w/o changing the main result list)
c) return an additional count of documents that are in the main result list, but are not in the facet counts because they do not have the field being faceted on.

My best guess is that you are asking about c based on your last sentence...

: get is 3 results and 7 non-null validToDate facets. And as I write this,
: I start to wonder if this is possible at all as the facets are dependent
: on the result set and that this might be better to handle in the
: application layer by just extracting 10-7=3...

...subtracting the sum of all constraint counts from your range facet from the total number of documents found won't necessarily tell you the number of documents that have no value in the field you are faceting on -- because documents may have values outside the range of your start/end.

Depending on what exactly it is you are looking for, you might find the facet.range.other=all param useful, as it will return things like the "between" counts (summing up all the docs between start-end) as well as the "before" and "after" counts.

But if you really just want to know "how many docs have no value for my validToDate field?"
you can get that very explicitly and easily using facet.query=-validToDate:[* TO *]

: <code><str name="facet">true</str>
: <str name="f.validToDate.facet.range.start">NOW/DAYS-4MONTHS</str>
: <str name="facet.mincount">1</str>
: <str name="q">(*:*)</str>
: <arr name="facet.range"><str>validToDate</str></arr>
: <str name="facet.range.end">NOW/DAY+1DAY</str>
: <str name="facet.range.gap">+1MONTH</str></code>
:
: <result name="response" numFound="10" start="0"><lst name="facet_counts"><lst name="facet_ranges"><lst name="validToDate"><lst name="counts"><int name="2011-11-14T00:00:00Z">7</int>

-Hoss
Re: Solr - Mutivalue field search on different elements
positionIncrementGap is only really relevant for phrase searches. For non-phrase searches you can effectively ignore it. The problem here is what you mean by "consecutive element". In your original example, if you mean that searching for "michael singer" should NOT match, then you want to use phrase searches with a positionIncrementGap as Tanguy says, and no slop.

Maybe this will help. The purpose of setting positionIncrementGap to non-zero is to do the opposite of what you want. Say each entry in a multiValued field is a sentence and you do NOT want a search that contains words in two different sentences to match. Say further that your sentences will never be longer than 100 words. Setting positionIncrementGap to 100 and using phrases for all your searches, like this: "word search"~100, would guarantee that no match would occur for a document in which one sentence contained "word" and another sentence contained "search", but documents *would* match where a single sentence contained both words.

Best
Erick

On Thu, Dec 22, 2011 at 1:17 AM, meghana meghana.rav...@amultek.com wrote:

Hi Tanguy, Thanks for your reply.. this is really useful. but i have one question on that. my multivalued field is not just simple text. it has values like below

<str>1s:[This is very nice day.]</str>
<str>3s:[Christmas is about come and christmas]</str>
<str>4s:[preparation is just on ]</str>

now if i search with "christmas preparation", then this should match. if i set positionIncrementGap to 0, will it match? Or how does the value of positionIncrementGap behave on my search?

Meghana

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Mutivalue-field-search-on-different-elements-tp3604213p3605938.html Sent from the Solr - User mailing list archive at Nabble.com.
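For reference, the gap Erick and Tanguy are discussing is declared on the field type in schema.xml. A minimal sketch (the type name and analyzer chain are illustrative, not from the thread):

```xml
<!-- each new value in a multiValued field of this type starts 100
     positions after the last token of the previous value, so phrase
     queries with slop < 100 cannot match across value boundaries -->
<fieldType name="text_gap" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With positionIncrementGap="0" instead, the last token of one value and the first token of the next are adjacent, so a phrase like "christmas preparation" can match across two consecutive values.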
Re: Solr 3.5 | Highlighting
On 21/12/2011 23:49, Koji Sekiguchi wrote:

(11/12/21 22:28), Tanguy Moal wrote:

Dear all, [...] I tried using both the legacy highlighter and FVH, but the same issue occurs. The issue only triggers when relying on hl.q. Thank you very much for any help,

-- Tanguy

Tanguy,

Thank you for reporting this! "The issue only triggers when relying on hl.q." That is not good. Can you reproduce the problem on the Solr example environment? If we can share the same environment (solrconfig.xml and schema.xml), request params to reproduce, and data, I'd like to look into it.

koji

Koji,

First, thank you for your quick reply. Indeed, isolating the issue was the key to resolving it! Once isolated in the distribution's example directory, I couldn't reproduce the issue (and achieved the expected behaviour). I then started to look at my setup a little closer and realized that I wasn't using the same solr distribution on my master server (solr 3.4) and on my slave server (solr 3.5 with the brand new hl.q parameter). Since it isn't a recommended setup, I'll simply assume that the error was on my side. Sorry for the false alert :-D

The new highlighter is great!

-- Tanguy
Re: Solr - Mutivalue field search on different elements
Thanks Erick. i have seen how it works with slop after trying a few operations :) . so i am happy with this now. but i still have one issue: when i do a search i also need to show highlighting on that field. with positionIncrementGap set to 0, when i make a phrase search it does not return highlighting on the words of the phrase. can i handle this by making some configuration changes?

Thanks
Meghana

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Mutivalue-field-search-on-different-elements-tp3604213p3606597.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Update schema.xml using solrj APIs
Thanks everyone! That was very helpful.

-Ahmed

On Thu, Dec 22, 2011 at 5:15 AM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote:

Hi Ahmed,

if you have a multi core setup, you could change the file programmatically (e.g. via an XML parser), copy the new file over the existing one (programmatically, of course), then reload the core. I haven't reloaded the core programmatically yet, but that should be doable via SolrJ. Or - if you are not using Java - call the specific core admin URL from your programme. You will have to re-index after changing the schema.xml.

Chantal

On Thu, 2011-12-22 at 04:34 +0100, Otis Gospodnetic wrote:

Ahmed,

At this point in time - no. You need to edit it manually and restart Solr to see the changes. This will change in the future.

Otis

Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html

From: Ahmed Abdeen Hamed ahmed.elma...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, December 21, 2011 4:12 PM
Subject: Update schema.xml using solrj APIs

Hello friend,

I am new to Solrj and I am wondering if there is a way you can update the schema.xml file via the APIs. I would appreciate any help.

Thanks very much,
-Ahmed
Re: Hardware resource indication
On Thu, Dec 22, 2011 at 7:02 AM, Zoran | Bax-shop.nl zoran.bi...@bax-shop.nl wrote:

Hello, what are (ballpark figure) the hardware requirements (diskspace, memory) SOLR will use in this case:

* Heavy Dutch traffic webshop, 30.000 - 50.000 visitors a day

Unique users don't much matter.

* Visitors relying heavily on the search engine of the site
  o 3.000.000 - 5.000.000 searches a day

This is what matters. Assume 20,000 seconds per day (less than the real number by 4x, but allows for peak rates). That gives about 250 queries / second. Is this rate growing?

* Around 20.000 products to be indexed. In an XML this is around 22 MB in size
  o Around 100-200 products that will need reindexing every day because of copywriters

This is small enough to not much matter.

* About 20 fields to be indexed per document (product)
* Using many features of SOLR
  o Boosting queries
  o Faceted search (price ranges, categories, in stock, etc.)
  o Spellchecker
  o Suggester (completion)
  o Phonetic search

Just make sure that you are serving search results from memory, not disk.

The current index directory is around 20 MB, but that's my testing environment. On my testing server, indexing the 20K documents took under 10 seconds.

Nice.

I tried to be as comprehensive as possible with these specs. Hopefully it's enough to make an estimation.

So the next step is to build a test rig and see how many queries per second each server will handle. Since your index is small, this should be pretty easy. The required rate of 250 queries/s should be pretty easy to achieve. Nothing will substitute for a real test here. You should make sure you have staging / spare hardware and room to grow if necessary.
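The back-of-the-envelope peak-rate estimate used above can be written out explicitly:

```python
# Peak-rate estimate from daily search volume, following the reply above:
# compress the whole day's traffic into ~20,000 "effective" seconds
# (roughly 4x fewer than the 86,400 real seconds) to allow for peak hours.
searches_per_day = 5_000_000
effective_seconds = 20_000

peak_qps = searches_per_day / effective_seconds
print(peak_qps)  # 250.0
```

The 4x compression factor is a rule of thumb for bursty daytime traffic, not a measured number; a real load test (as suggested above) is what settles it.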
Re: Exception using SolrJ
On 12/21/2011 9:43 AM, Chantal Ackermann wrote:

Hi Shawn, maybe the requests that fail have a certain pattern - for example that they are longer than all the others.

The query for the exception I sent is shown in the pastebin. Here is the query and, for reference, the pastebin URL:

did:(286861384 OR 286861312 OR 286861313 OR 284220972)
http://pastebin.com/XnB83Jay

This is a typical query for the failures. This field (did) is a tlong with a precisionStep of 16. There are about 11 million documents in the index referenced, total size about 20GB. Most often it has been a search query like this that has failed, though sometimes it is the actual deleteByQuery that immediately follows this, or it is an attempt to add documents which comes after that.

Something to add: Solr's log, running at INFO, does not show these requests that fail, and does not log an exception. These requests do not pass through a load balancer. I do use haproxy on port 8983 for queries made by our website, but the SolrJ application talks to Solr directly on port 8981. I can't say whether the request shows up in the jetty log, because everything is using POST.

Thanks,
Shawn
Re: solr.home
On 12/21/2011 4:13 AM, Thomas Fischer wrote:

I'm trying to move forward with my solr system from 1.4 to 3.5 and ran into some problems with solr home. Is this a known problem? My solr 1.4 gives me the following messages (amongst many many others…) in catalina.out:

INFO: No /solr/home in JNDI
INFO: using system property solr.solr.home: '/srv/solr'
INFO: looking for solr.xml: /'/srv/solr'/solr.xml

then finds the solr.xml and proceeds from there (this is multicore). With solr 3.5 I get:

INFO: No /solr/home in JNDI
INFO: using system property solr.solr.home: '/srv/solr'
INFO: Solr home set to ''/srv/solr'/'
INFO: Solr home set to ''/srv/solr'/./'
SCHWERWIEGEND: java.lang.RuntimeException: Can't find resource '' in classpath or ''/srv/solr'/./conf/', cwd=/

After that, solr is somehow started but not aware of the cores present. This can be solved by putting a solr.xml file into $CATALINA_HOME/conf/Catalina/localhost/ with

<Environment name="solr/home" type="java.lang.String" value="/srv/solr" override="true" />

which results in

INFO: Using JNDI solr.home: /srv/solr

and everything seems to run smoothly afterwards, although solr.xml is never mentioned. I would like to know when this changed and why, and why solr 3.5 is looking for solrconfig.xml instead of solr.xml in solr.home. (Am I the only one who finds it confusing to have the three names solr.solr.home (system property), solr.home (JNDI), and solr/home (Environment name) for the same object?)

Here's what I have as a commandline option when starting Jetty: -Dsolr.solr.home=/index/solr

This is what my log from Solr 3.5.0 says at the very beginning.
Dec 14, 2011 8:42:28 AM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
Dec 14, 2011 8:42:28 AM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: using system property solr.solr.home: /index/solr
Dec 14, 2011 8:42:28 AM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/index/solr/'

Note that in my log it shows the system property without any kind of quotes, but in yours it is surrounded - '/srv/solr'. I am guessing that wherever you are defining solr.solr.home, you have included those quotes, and that removing them would probably fix the problem.

If this is indeed the problem, the newer version is probably interpreting input values much more literally. The old version probably ran the final path value through a parser that took care of removing the quotes for you, but that parser also removed certain characters that some users actually needed. Notice that the quotes are interspersed in the full solr.xml path in your 1.4 log.

Thanks,
Shawn
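Concretely, the diagnosis above amounts to this difference in how the system property is set. A sketch of the startup options (JAVA_OPTS and the path are illustrative; the exact file where they are set depends on the installation):

```
# wrong - the literal single quotes become part of the path,
# so Solr resolves ''/srv/solr'/conf/' and finds nothing
JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home='/srv/solr'"

# right - no quote characters inside the property value
JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/srv/solr"
```

This matches the broken log output above, where the quotes appear embedded in the resolved paths.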
Re: Hardware resource indication
Hi Zoran,

These numbers are all pretty small, so you will be fine even with a pair of average servers - it looks like everything will fit in RAM even if you have only 2 GB of it. 245 QPS is not trivial, but with everything in RAM I believe even on modest hardware you will be just fine.

Otis

Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html

From: Zoran | Bax-shop.nl zoran.bi...@bax-shop.nl
To: solr-user@lucene.apache.org
Sent: Thursday, December 22, 2011 10:02 AM
Subject: Hardware resource indication

Hello,

What are (ballpark figure) the hardware requirements (diskspace, memory) SOLR will use in this case:

* Heavy Dutch traffic webshop, 30.000 - 50.000 visitors a day
* Visitors relying heavily on the search engine of the site
  o 3.000.000 - 5.000.000 searches a day
* Around 20.000 products to be indexed. In an XML this is around 22 MB in size
  o Around 100-200 products that will need reindexing every day because of copywriters
* About 20 fields to be indexed per document (product)
* Using many features of SOLR
  o Boosting queries
  o Faceted search (price ranges, categories, in stock, etc.)
  o Spellchecker
  o Suggester (completion)
  o Phonetic search
  o ...

The current index directory is around 20 MB, but that's my testing environment. On my testing server, indexing the 20K documents took under 10 seconds. I tried to be as comprehensive as possible with these specs. Hopefully it's enough to make an estimation.

Thanks,
ZB
Re: How to apply relevant Stemmer to each document
Not really. And it's hard to make sense of how this would work in practice, because stemming the document (even if you could) is only half the battle. How would querying work then? No matter what language you used for stemming the query, it would be wrong for all the documents that used a different stemmer (or a stemmer based on a different language). So I wouldn't hold out too much hope here.

Best
Erick

On Wed, Dec 21, 2011 at 4:09 PM, alx...@aim.com wrote:

Hello, I would like to know if in the latest version of solr it is possible to apply the relevant stemmer to each doc depending on its lang field. I searched the solr-user mailing lists and found this thread http://lucene.472066.n3.nabble.com/Multiplexing-TokenFilter-for-multi-language-td3235341.html but am not sure if it was developed into a jira ticket.

Thanks.
Alex.
Re: solr-xslt question
You're probably hitting the default limit on a field. This is set in solrconfig.xml, the maxFieldLength element. The first thing I'd try is upping that to, say, 1000, reindexing, and seeing if that fixes your problem. This is the number of *tokens*, not characters - roughly the number of words...

Searching for the common word is probably a complete red herring.

Best
Erick

On Wed, Dec 21, 2011 at 4:36 PM, Bent Jensen bentjen...@yahoo.com wrote:

Being new to xml/xslt/solr, I am hoping someone can explain/help me with the following: Using Apache-Solr 3.4.0. I have a php page for submitting the search, and display the result in html. I indexed a 1.5MB pdf document (400 pages). Using the admin interface with a *:* query, everything is returned. I then tried using 'highlighting' in the query, and modified the xsl file to return the highlighting. It works fine for text in the beginning of the document. I can also query with a quoted phrase and it returns the exact match.

When searching content roughly beyond the first 100 pages, I see this behavior: I must include common words in a phrase to get a result returned. For example, if I search using the word 'handymen', which only appears in one place towards the end of the document, nothing is returned. But if I add a common word that appears in the sentence where 'handymen' is, e.g. 'handymen that', then both are returned in the highlighting, including many other occurrences of 'that'. If I query with the phrase 'handymen that', nothing is returned.

thanks
Ben
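For reference, the element Erick means sits in solrconfig.xml; in Solr 3.x its default is 10000 tokens, so anything past roughly the first 10,000 words of the PDF is silently dropped at index time. A sketch (the value below is only an example; pick one larger than your longest document's token count):

```xml
<!-- maximum number of tokens indexed per field; tokens beyond this
     limit are silently discarded when the document is indexed -->
<maxFieldLength>1000000</maxFieldLength>
```

A re-index is required after changing it, since the truncation happened when the documents were originally indexed.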
Re: How to apply relevant Stemmer to each document
Hi Erick,

Why would querying be wrong? It is my understanding that if I have, let's say, 3 docs and each of them has been indexed with its own language's stemmer, then sending a query will search all docs and return matching results. Let's say the query is "driving" and one of the docs has "drive" and was stemmed by the English stemmer; then it would return 1 result, whereas if I had applied the Russian stemmer to all docs the result would be 0 docs. Am I missing something?

Thanks.
Alex.

-----Original Message-----
From: Erick Erickson erickerick...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Dec 22, 2011 11:06 am
Subject: Re: How to apply relevant Stemmer to each document

Not really. And it's hard to make sense of how this would work in practice, because stemming the document (even if you could) is only half the battle. How would querying work then? No matter what language you used for stemming the query, it would be wrong for all the documents that used a different stemmer (or a stemmer based on a different language). So I wouldn't hold out too much hope here.

Best
Erick

On Wed, Dec 21, 2011 at 4:09 PM, alx...@aim.com wrote:

Hello, I would like to know if in the latest version of solr it is possible to apply the relevant stemmer to each doc depending on its lang field. I searched the solr-user mailing lists and found this thread http://lucene.472066.n3.nabble.com/Multiplexing-TokenFilter-for-multi-language-td3235341.html but am not sure if it was developed into a jira ticket.

Thanks.
Alex.
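A workaround commonly used for this situation - not proposed in the thread itself, so treat it as an illustration - is to sidestep per-document stemmers entirely: route each document's text into a language-specific field at indexing time (based on the lang value), give each field its own stemming analyzer, and query both fields. A schema.xml sketch using two Snowball stemmers:

```xml
<fieldType name="text_en" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
<fieldType name="text_ru" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
  </analyzer>
</fieldType>
```

Because each field's query analyzer matches its index analyzer, a query spread over both fields (e.g. text_en:driving OR text_ru:driving) stems each clause correctly for its own language, which avoids the mismatch Erick describes.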
Exception in Solr server on more like this
I've been trying to get More Like This running under Solr 3.5. I get the exception below; the HTTP request is also highlighted below. I've looked at the FieldType code and I don't understand what's going on there. So, while I know what a null pointer exception means, it isn't telling me what I did or didn't do. FYI - the Body field has termVectors set to true, which I thought was sufficient for MLT. What I'm trying to do is submit the phrase country now is the time country to MLT to determine the interesting words (which I want returned) and then return the most relevant documents. Any help on what might be wrong would be appreciated.

Scott

6975 [main] INFO com.mainstreamdata.MediasIndexer.mediasBrowser.SearchFactory - SearchFactory:SearchFactory: Search Factory initialized
SolrQuery:: (country now is the time country) Filter:: (Language:en)
15274 [main] ERROR com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch - SolrSearch:getDocTier: Unable to do search:
org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:266)
    at com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch.getTier(SolrSearch.java:309)
    at com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch.getFirstTier(SolrSearch.java:93)
    at com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch.getNextOlderTier(SolrSearch.java:175)
    at com.mainstreamdata.MediasIndexer.SolrMgrTest.testMoreLikeThis(SolrMgrTest.java:209)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at junit.framework.TestCase.runTest(TestCase.java:164)
    at junit.framework.TestCase.runBare(TestCase.java:130)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:120)
    at junit.framework.TestSuite.runTest(TestSuite.java:230)
    at junit.framework.TestSuite.run(TestSuite.java:225)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: org.apache.solr.common.SolrException: null java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.storedToIndexed(FieldType.java:374)
    at org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:320)
    at org.apache.solr.handler.component.MoreLikeThisComponent.getMoreLikeThese(MoreLikeThisComponent.java:82)
    at org.apache.solr.handler.component.MoreLikeThisComponent.process(MoreLikeThisComponent.java:57)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:208)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
Re: Replication not working
On 12/22/2011 4:39 AM, Dean Pullen wrote: Yeah, the drop-index-via-URL command doesn't help anyway -- when rebuilding the index, its timestamp is obviously ahead of the master's (as the slave index is being created now), so replication will still not happen.

If you delete the index and create the core anew, the index version will be 0 and replication will work.
Sort facets by defined custom Collator
Hi,

Is it possible to sort fields or facets using a custom Collator? I found only one solution for fields -- the solr.CollationKeyFilterFactory filter -- but there are some problems with it. First, it doesn't work with facet sorting. Second, it requires an additional field, so the index size is increased. My solution for facets is to develop my own facet component with a Collator parameter; for other, non-facet fields I would need to implement my own field type with a Collator. The problem is that Solr's facet component isn't easily extensible because it has private embedded classes.

Regards,
Mariusz Dubielecki
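For the field-sorting half, the solr.CollationKeyFilterFactory setup Mariusz mentions typically looks like the sketch below (the locale and all field/type names are illustrative assumptions). It carries exactly the cost he describes -- one extra field per sortable field:

```xml
<!-- schema.xml: locale-aware sort field built from binary collation keys -->
<fieldType name="text_sort_pl" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- keep the whole value as one token, then replace it with its collation key -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="pl" strength="primary"/>
  </analyzer>
</fieldType>

<field name="title_sort" type="text_sort_pl" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>
```

Sorting then uses sort=title_sort asc. As Mariusz notes, facet ordering still follows the raw indexed terms of the faceted field, so this trick does not carry over to facet sorting.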
RE: Exception in Solr server on more like this
This turned out to be SOLR-2986.

-----Original Message-----
From: Scott Smith [mailto:ssm...@mainstreamdata.com]
Sent: Thursday, December 22, 2011 1:24 PM
To: solr-user@lucene.apache.org
Subject: Exception in Solr server on more like this

I've been trying to get More Like This running under Solr 3.5. I get the exception below; the HTTP request is also highlighted below. I've looked at the FieldType code and I don't understand what's going on there. So, while I know what a null pointer exception means, it isn't telling me what I did or didn't do. FYI - the Body field has termVectors set to true, which I thought was sufficient for MLT. What I'm trying to do is submit the phrase country now is the time country to MLT to determine the interesting words (which I want returned) and then return the most relevant documents. Any help on what might be wrong would be appreciated.

Scott
Re: How to apply relevant Stemmer to each document
Sure, but what about inappropriate stemming in one language that happens to match something in another? In general, putting multiple languages into a single field usually only makes sense when the overwhelming number of documents are in one language...

Best
Erick

On Thu, Dec 22, 2011 at 2:41 PM, alx...@aim.com wrote:

Hi Erick, Why would querying be wrong? It is my understanding that if I have, let's say, 3 docs and each of them has been indexed with its own language's stemmer, then sending a query will search all docs and return matching results. Let's say the query is driving and one of the docs has drive and was stemmed by the English stemmer; then it would return 1 result, as opposed to 0 docs if I had applied the Russian stemmer to all docs. Am I missing something? Thanks. Alex.

-----Original Message-----
From: Erick Erickson erickerick...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Dec 22, 2011 11:06 am
Subject: Re: How to apply relevant Stemmer to each document

Not really. And it's hard to make sense of how this would work in practice, because stemming the document (even if you could) is only half the battle. How would querying work then? No matter what language you used for your stemming, it would be wrong for all the documents that used a different stemmer (or a stemmer based on a different language). So I wouldn't hold out too much hope here.

Best
Erick

On Wed, Dec 21, 2011 at 4:09 PM, alx...@aim.com wrote:

Hello, I would like to know if, in the latest version of solr, it is possible to apply the relevant stemmer to each doc depending on its lang field. I searched the solr-user mailing lists and found this thread http://lucene.472066.n3.nabble.com/Multiplexing-TokenFilter-for-multi-language-td3235341.html but am not sure if it was developed into a jira ticket. Thanks. Alex.
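A common alternative to a multiplexing token filter -- assuming the lang field is reliably known at index time, and with all field and type names below being illustrative -- is to define one field per language, each with its own stemmer, and route documents and queries to the matching field:

```xml
<!-- schema.xml sketch: a separate analysis chain per language -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
<fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
  </analyzer>
</fieldType>

<!-- at index time, populate only the field matching the doc's lang value -->
<field name="body_en" type="text_en" indexed="true" stored="true"/>
<field name="body_ru" type="text_ru" indexed="true" stored="true"/>
```

A query for English input then targets the English field (e.g. q=body_en:driving), which sidesteps the cross-language stemming collisions Erick warns about, at the cost of either knowing the query language up front or searching all per-language fields.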
Re: SolrCloud Cores
thanks, that's what I had thought. Wasn't sure if there was a benefit either way. On Fri, Dec 16, 2011 at 3:29 PM, Mark Miller markrmil...@gmail.com wrote: On Fri, Dec 16, 2011 at 8:14 AM, Jamie Johnson jej2...@gmail.com wrote: What is the most appropriate way to configure Solr when deploying in a cloud environment? Should the core name on all instances be the collection name or is it more appropriate that each shard be a separate core, or should each solr instance be a separate core (i.e. master1, master1-replica are 2 separate cores)? At this point, its probably best/easiest to name them after the collection. -- - Mark http://www.lucidimagination.com
solr-xslt question
Being new to xml/xslt/solr, I am hoping someone can explain/help me with the following. I am using Apache-Solr 3.4.0. I have a php page for submitting the search and displaying the result in html. I indexed a 1.5MB pdf document (400 pages). Using the admin interface with a *:* query, everything is returned. I then tried using 'highlighting' in the query and modified the xsl file to return the highlighting. It works fine for text in the beginning of the document. I can also query with a phrase and it returns the exact match. When searching content beyond approximately the first 100 pages, I see this behavior: I must include common words in a phrase to get a result returned. For example, if I search using the word handymen, which appears in only one place towards the end of the document, nothing is returned; but if I add a common word that appears in the sentence where handymen is, e.g. 'handymen that', then both are returned in the highlighting, including many other occurrences of 'that'. If I query with handymen that, nothing is returned.

thanks
Ben

No virus found in this message. Checked by AVG - www.avg.com. Version: 2012.0.1890 / Virus Database: 2109/4694 - Release Date: 12/21/11
Re: Profiling Solr
Hi Jean,

I am also looking into profiling Solr and wanted to check whether you were able to use YourKit successfully for Solr profiling and whether you were able to find the bottleneck in your situation. Can you share how you found the performance bottleneck and fixed the issue, as I am facing a similar issue with Solr performance?

-Shyam

--
View this message in context: http://lucene.472066.n3.nabble.com/Profiling-Solr-tp473500p3607890.html
Sent from the Solr - User mailing list archive at Nabble.com.
about partial index update
hi all,

I have an object like this:

public class Link {
    private long id;
    private String url;
    // about 20 other properties
    private int vote; // vote total, used for sorting
}

When I index a document, my Lucene document contains all the fields from my Link object, e.g.

doc_id = 1
url = solr.org
vote = 23

Because the vote changes more frequently than the other properties, every time the vote increases I need to reindex the whole document so I can use the vote field for sorting. Is there any way to index only part of the Lucene document (in this case, only the vote field) instead of indexing the whole document again? And are there any performance concerns with reindexing the whole doc? (I don't think that looks like a good solution.) Any ideas?

kiwi
RE: Profiling Solr
Hi,

Can someone advise me on profiling Solr? We are seeing performance issues, and from the debug flag it seems the highlighting component is causing the overhead. I want to find out what is causing the highlighting overhead for certain queries; I assume it is IO- or CPU-bound and would like to confirm that by profiling Solr. Let me know of any suggestions.

-Shyam

-----Original Message-----
From: shyam bhaskaran [mailto:shyam.bhaska...@gmail.com]
Sent: Friday, December 23, 2011 6:26 AM
To: solr-user@lucene.apache.org
Subject: Re: Profiling Solr

Hi Jean, I am also looking into profiling Solr and wanted to check whether you were able to use YourKit successfully for Solr profiling and whether you were able to find the bottleneck in your situation. Can you share how you found the performance bottleneck and fixed the issue, as I am facing a similar issue with Solr performance? -Shyam

--
View this message in context: http://lucene.472066.n3.nabble.com/Profiling-Solr-tp473500p3607890.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Profiling Solr
On Fri, Dec 23, 2011 at 11:36 AM, Shyam Bhaskaran shyam.bhaska...@synopsys.com wrote: Hi, Can someone suggest me on performing Solr Profiling. Have you looked at JMX: http://wiki.apache.org/solr/SolrJmx ? Regards, Gora
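For reference, the SolrJmx wiki page linked above describes enabling MBean reporting with a one-line addition to solrconfig.xml; Solr's per-handler and cache statistics then become visible to jconsole or any profiler attached to the JVM:

```xml
<!-- solrconfig.xml: register Solr's MBeans with the JVM's existing
     MBean server (start the JVM with -Dcom.sun.management.jmxremote
     or similar so an MBean server is available) -->
<jmx/>
```

This is often a lighter-weight first step than full profiling, since it exposes query and highlighting timings without attaching an agent.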