Re: Solr is NoSQL database or not?
On 01/03/2014 23:53, Jack Krupansky wrote: NoSQL? To me it's just a marketing term, like Big Data. +1 Depends very much who you talk to. Marketing folks like to ride the current wave, so if NoSQL is current, they'll jump on that one, likewise Big Data. Technical types like to be correct in their definitions :) C -- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk
Solr Shard Query From Inside Search Component Sometimes Gives Wrong Results
Hi, I am using Solr 4.6 and running a Solr query against shards from inside a Solr search component, trying to use the obtained results for my custom logic. I have used SolrJ for the distributed search, but the results coming from this distributed search sometimes vary. So my questions are: 1. Can we do a distributed search from a Solr search component? 2. Do we need to handle concurrency between Solr servers by using synchronization or some other technique? Is there a way to make a distributed search in the Solr search component and get the matched documents from all the shards? If anyone has an idea, please help me. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Shard-Query-From-Inside-Search-Component-Sometimes-Gives-Wrong-Results-tp4120840.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR cloud disaster recovery
On Fri, Feb 28, 2014 at 7:50 PM, Per Steffensen st...@designware.dk wrote: I might be able to find something for you. Which version are you using - I have some scripts that work on 4.0 and some other scripts that work for 4.4 (and maybe later). This sounds useful. I am using 4.6.1. Kind regards Jan
Re: Slow query time on stemmed fields
Sorry for the delay, I did not have access to the server and could not query anything. This is my query:

http://server:port/solr/core/select?q=keyword1+keyword2&wt=xml&indent=true&hl.fragsize=120&f.file_URI_tokenized.hl.fragsize=1000&spellcheck=true&f.file_content.hl.alternateField=spell&hl.simple.pre=%3Cb%3E&hl.fl=file_URI_tokenized,xmp_title,file_content&hl=true&rows=10&fl=file_URI,file_URI_tokenized,file_name,file_lastModification,file_lastModification_raw,xmp_creation_date,xmp_title,xmp_content_type,score,file_URI,host,xmp_manual_summary&hl.snippets=1&hl.useFastVectorHighlighter=true&hl.maxAlternateFieldLength=120&start=0&q=itdz+berlin&hl.simple.post=%3C/b%3E&fq=file_readright:%22wiki-access%22&debugQuery=true&defType=edismax&qf=file_URI_tokenized^10.0+file_content^10.0+xmp_title^5.0+spell^0.001&pf=file_URI_tokenized~2^1.0+file_content~100^2.0+xmp_title~2^1.0

Newly extended testing showed that the normal QTime without a search on the spell field is about 713, while it turns out to be 70503 with the stemming parameter included as in the URL above. It is therefore about 100x slower at the moment. Here comes the debug:

<lst name="debug">
  <str name="rawquerystring">keyword1 keyword2</str>
  <str name="querystring">keyword1 keyword2</str>
  <str name="parsedquery">(+((DisjunctionMaxQuery((file_URI_tokenized:keyword1^10.0 | xmp_title:keyword1^5.0 | spell:keyword1^0.0010 | file_content:keyword1^10.0)) DisjunctionMaxQuery((file_URI_tokenized:keyword2^10.0 | xmp_title:keyword2^5.0 | spell:keyword2^0.0010 | file_content:keyword2^10.0)))~2) DisjunctionMaxQuery((file_URI_tokenized:"keyword1 keyword2"~2)) DisjunctionMaxQuery((file_content:"keyword1 keyword2"~100^2.0)) DisjunctionMaxQuery((xmp_title:"keyword1 keyword2"~2)))/no_coord</str>
  <str name="parsedquery_toString">+(((file_URI_tokenized:keyword1^10.0 | xmp_title:keyword1^5.0 | spell:keyword1^0.0010 | file_content:keyword1^10.0) (file_URI_tokenized:keyword2^10.0 | xmp_title:keyword2^5.0 | spell:keyword2^0.0010 | file_content:keyword2^10.0))~2) (file_URI_tokenized:"keyword1 keyword2"~2) (file_content:"keyword1 keyword2"~100^2.0) (xmp_title:"keyword1 keyword2"~2)</str>
  <lst name="explain">
    <str name="...">...</str>
    <str name="...">...</str>
    <str name="...">...</str>
    <str name="...">...</str>
    <str name="...">...</str>
    <str name="...">...</str>
    <str name="...">...</str>
    <str name="...">...</str>
    <str name="...">...</str>
    <str name="...">
0.035045296 = (MATCH) sum of:
  0.035045296 = (MATCH) sum of:
    0.0318122 = (MATCH) max of:
      8.29798E-4 = (MATCH) weight(spell:keyword1^0.0010 in 71660) [DefaultSimilarity], result of:
        8.29798E-4 = score(doc=71660,freq=2.0 = termFreq=2.0), product of:
          6.7839865E-5 = queryWeight, product of:
            0.0010 = boost
            8.64913 = idf(docFreq=618, maxDocs=1299169)
            0.0078435475 = queryNorm
          12.231716 = fieldWeight in 71660, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            8.64913 = idf(docFreq=618, maxDocs=1299169)
            1.0 = fieldNorm(doc=71660)
      0.0318122 = (MATCH) weight(file_content:keyword1^10.0 in 71660) [DefaultSimilarity], result of:
        0.0318122 = score(doc=71660,freq=2.0 = termFreq=2.0), product of:
          0.6720717 = queryWeight, product of:
            10.0 = boost
            8.568466 = idf(docFreq=670, maxDocs=1299169)
            0.0078435475 = queryNorm
          0.047334533 = fieldWeight in 71660, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            8.568466 = idf(docFreq=670, maxDocs=1299169)
            0.00390625 = fieldNorm(doc=71660)
    0.003233097 = (MATCH) max of:
      0.003233097 = (MATCH) weight(file_content:keyword2^10.0 in 71660) [DefaultSimilarity], result of:
        0.003233097 = score(doc=71660,freq=1.0 = termFreq=1.0), product of:
          0.25479192 = queryWeight, product of:
            10.0 = boost
            3.2484267 = idf(docFreq=137146, maxDocs=1299169)
            0.0078435475 = queryNorm
          0.012689167 = fieldWeight in 71660, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            3.2484267 = idf(docFreq=137146, maxDocs=1299169)
            0.00390625 = fieldNorm(doc=71660)
    </str>
  </lst>
  <str name="QParser">ExtendedDismaxQParser</str>
  <null name="altquerystring"/>
  <null name="boost_queries"/>
  <arr name="parsed_boost_queries"/>
  <null name="boostfuncs"/>
  <arr name="filter_queries"><str>file_readright:"wiki-access"</str></arr>
  <arr name="parsed_filter_queries"><str>file_readright:"wiki-access"</str></arr>
  <lst name="timing">
    <double name="time">66359.0</double>
    <lst name="prepare"/>
    <lst name="process">
      <double name="time">66357.0</double>
      <lst name="query"><double name="time">80.0</double></lst>
      <lst name="facet"><double name="time">0.0</double></lst>
      <lst name="mlt"><double name="time">0.0</double></lst>
      <lst name="highlight"><double name="time">65981.0</double></lst>
      <lst name="stats"><double name="time">0.0</double></lst>
      <lst name="spellcheck"><double name="time">38.0</double></lst>
      <lst name="debug"><double name="time">258.0</double></lst>
    </lst>
  </lst>
</lst>

Why does the highlighting take up this much time? Is it a problem
Re: Solr Shard Query From Inside Search Component Sometimes Gives Wrong Results
What is the query you are making? What is the sort order for the query? Are you sure you are not indexing data in between making these requests? Are you able to reproduce this outside of your search component? It is hard to answer questions about custom code without actually looking at the code. On Mon, Mar 3, 2014 at 3:37 PM, Vishnu Mishra vdil...@gmail.com wrote: Hi, I am using Solr 4.6 and running a Solr query against shards from inside a Solr search component, trying to use the obtained results for my custom logic. I have used SolrJ for the distributed search, but the results coming from this distributed search sometimes vary. So my questions are: 1. Can we do a distributed search from a Solr search component? 2. Do we need to handle concurrency between Solr servers by using synchronization or some other technique? Is there a way to make a distributed search in the Solr search component and get the matched documents from all the shards? If anyone has an idea, please help me. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Shard-Query-From-Inside-Search-Component-Sometimes-Gives-Wrong-Results-tp4120840.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
Re: Solr Heap, MMaps and Garbage Collection
On 3/3/2014 1:54 AM, KNitin wrote: 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings) As others have pointed out, this is really unusual for Solr. We often see high permgen in our app servers due to dynamic class loading that the framework performs; maybe you are somehow loading lots of new Solr plugins, or otherwise creating lots of classes? Of course if you have a plugin or something that does a lot of string interning, that could also be an explanation. -Mike
Solution for reverse order of year facets?
If I understand the docs right, it is only possible to sort facets by count or by value in ascending order. Neither variant is very helpful for year facets if I want the most recent years at the top (or to appear at all if I restrict the number of facet entries). It looks like a requirement that has been articulated repeatedly, and the recommended solution seems to be to do some math like 1 - year and index that. So far so good. The only problem is that I have many data sources and I would like to avoid changing every connector to include the new field. I think a better solution would be a custom TokenFilterFactory that does it. Since it seems a common request, did someone already build such a TokenFilterFactory? If not, do you think I could build one myself? I do some (script-)programming but have no experience with Java, so I think I could adapt an example. Are there any guides out there? Or even better, is there a built-in solution I haven't heard of? -Michael
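[For illustration, a minimal sketch of such a filter against the Lucene 4.x analysis API -- untested, with made-up class names, and assuming the field contains bare four-digit years:]

import java.io.IOException;
import java.util.Map;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.TokenFilterFactory;

public class ReverseYearFilterFactory extends TokenFilterFactory {
  public ReverseYearFilterFactory(Map<String, String> args) {
    super(args);
  }

  @Override
  public TokenStream create(TokenStream input) {
    return new TokenFilter(input) {
      private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

      @Override
      public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) return false;
        // Invert the year so lexicographic index order puts recent years first,
        // e.g. 2014 -> 7985, 1815 -> 8184.
        int year = Integer.parseInt(termAtt.toString());
        termAtt.setEmpty().append(String.format("%04d", 9999 - year));
        return true;
      }
    };
  }
}

The usual drawback of this trick still applies: the facet labels come back inverted, so the display layer has to map each term back to 9999 minus its value.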
Re: Solr Permgen Exceptions when creating/removing cores
Josh, You've mentioned a couple of times that you've got PermGen set to 512M, but then you say you're running with -XX:MaxPermSize=64M. These two statements are contradictory, so are you *sure* that you're running with 512M of PermGen? Assuming you're on a *nix box, can you provide `ps` output proving this? Thanks, Greg On Feb 28, 2014, at 5:22 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; You can also check here: http://stackoverflow.com/questions/3717937/cmspermgensweepingenabled-vs-cmsclassunloadingenabled Thanks; Furkan KAMACI 2014-02-26 22:35 GMT+02:00 Josh jwda...@gmail.com: Thanks Timothy, I gave these a try and -XX:+CMSPermGenSweepingEnabled seemed to cause the error to happen more quickly. With this option on it didn't seem to do the intermittent garbage collecting that delayed the issue with it off. I was already using a max of 512MB, and I can reproduce it with it set this high or even higher. Right now, because of how we have this implemented, increasing it to something high just delays the problem :/ Anything else you could suggest I would really appreciate. On Wed, Feb 26, 2014 at 3:19 PM, Tim Potter tim.pot...@lucidworks.com wrote: Hi Josh, Try adding: -XX:+CMSPermGenSweepingEnabled as I think for some VM versions, permgen collection was disabled by default. Also, I use: -XX:MaxPermSize=512m -XX:PermSize=256m with Solr, so 64M may be too small. Timothy Potter Sr. Software Engineer, LucidWorks www.lucidworks.com From: Josh jwda...@gmail.com Sent: Wednesday, February 26, 2014 12:27 PM To: solr-user@lucene.apache.org Subject: Solr Permgen Exceptions when creating/removing cores We are using the Bitnami version of Solr 4.6.0-1 on a 64-bit Windows installation with 64-bit Java 1.7u51, and we are seeing consistent issues with PermGen exceptions. We have the permgen configured to be 512MB. Bitnami ships with a 32-bit version of Java for Windows and we are replacing it with a 64-bit version. Passed-in Java options: -XX:MaxPermSize=64M -Xms3072M -Xmx6144M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSClassUnloadingEnabled -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 This is our use case: We have what we call a database core which remains fairly static and contains the imported contents of a table from SQL Server. We then have user cores which contain the record ids of results from a text search outside of Solr. We then query for the data we want from the database core and limit the results to the content of the user core. This allows us to combine facet data from Solr with the search results from another engine. We are creating the user cores on demand and removing them when the user logs out. Our issue is that the constant creation and removal of user cores, combined with the constant importing, seems to push us over our PermGen limit. The user cores are removed at the end of every session, and as a test I made an application that would loop creating the user core, import a set of data to it, query the database core using it as a limiter, and then remove the user core. My expectation was that in this scenario all the permgen associated with that user core would be freed upon its unload, allowing permgen to reclaim that memory during a garbage collection. This was not the case; it would constantly go up until the application exhausted the memory.
I also investigated whether there was a connection between the two cores left behind because I was joining them together in a query, but even unloading the database core after unloading all the user cores won't prevent the limit from being hit or any memory from being garbage collected from Solr. Is this a known issue with creating and unloading a large number of cores? Could it be configuration based for the core? Is there something other than unloading that needs to happen to free the references? Thanks Notes: I've tried using tools such as Plumbr to determine if it's a leak within Solr, and my activities turned up nothing.
Re: Solr 4.5.0 replication numDocs larger in slave
I just ran into an issue similar to this that affected document scores on distributed searches. You might try doing an optimize and purging your deleted documents while no indexing is being done, then checking your counts. Once I optimized all my indexes, the document counts on all of my cores matched up and scoring was consistent. Thanks, Greg On Feb 28, 2014, at 8:22 PM, Erick Erickson erickerick...@gmail.com wrote: That really shouldn't be happening IF indexing is shut off. Otherwise the slave is taking a snapshot of the master index and synching. bq: The slave has about 33 more documents and one fewer segments (according to Overview in solr admin) Sounds like the master is still indexing and you've deleted documents on the master. Best, Erick On Fri, Feb 28, 2014 at 11:08 AM, Geary, Frank frank.ge...@zoominfo.com wrote: Hi, I'm using Solr 4.5.0. I have a single master replicating to a single slave. Only the master is being indexed to - never the slave. The master is committed once each night. After the first commit and replication the numDoc counts are identical. After the next nightly commit and after the second replication a few minutes later, numDocs has increased in both the master and the slave as expected, but numDocs is not the same in the master as it is in the slave. The slave has about 33 more documents and one fewer segments (according to Overview in solr admin). I suspect the numDocs may be in sync again after tonight, but can anyone explain what is going on here? Is it possible a few deletions got committed to the master but not replicated to the slave? Thanks Frank
Re: Solr Permgen Exceptions when creating/removing cores
It's a Windows installation using a Bitnami Solr installer. I incorrectly put 64M into the configuration for this, as I had copied the test configuration I was using to recreate the permgen issue we were seeing on our production system (which is configured to 512M), since it takes a while to recreate the issue with larger permgen values. In the test scenario there was a small 180-document data core that's static, with 8 dynamic user cores that are used to index the unique document ids in the user's view, which are then merged into a single user core. The final user core contains the same number of document ids as the data core, and the data core is queried against with the ids in the final merged user core as the limiter. The user cores are then unloaded and deleted from the drive, and then the test is rerun with the user cores re-created. We are also using the core discovery mode to store/find our cores, and the database data core is using dynamic fields with a mix of single-value and multi-value fields. The user cores use a static configuration. The data is indexed from SQL Server using jTDS for both the user and data cores. As a note, we also reversed the test case I mention above, where we keep the user cores static and dynamically create the database core, and this created the same issue, only it leaked faster. We assumed this was because the configuration was larger/loaded more classes than the simpler user core. When I get the time I'm going to put together a SolrJ test app to recreate the issue outside of our environment (roughly the loop sketched after the quoted thread below), to see if others see the same issue we're seeing and to rule out any kind of configuration problem. Right now we're interacting with Solr with POCO via the RESTful interface, and it's not very easy for us to spin this off into something someone else could use. In the meantime we've made changes to make the user cores more static; this has slowed down the build-up of permgen to something that can be managed by a weekly reset. Sorry about the confusion in my initial email, and I appreciate the response. Anything about my configuration that you think might be useful, just let me know and I can provide it. We have a workaround, but it really hampers what our long-term goals were for our Solr implementation. Thanks Josh On Mon, Mar 3, 2014 at 9:57 AM, Greg Walters greg.walt...@answers.com wrote: Josh, You've mentioned a couple of times that you've got PermGen set to 512M but then you say you're running with -XX:MaxPermSize=64M. These two statements are contradictory, so are you *sure* that you're running with 512M of PermGen? Assuming you're on a *nix box can you provide `ps` output proving this? Thanks, Greg On Feb 28, 2014, at 5:22 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; You can also check here: http://stackoverflow.com/questions/3717937/cmspermgensweepingenabled-vs-cmsclassunloadingenabled Thanks; Furkan KAMACI 2014-02-26 22:35 GMT+02:00 Josh jwda...@gmail.com: Thanks Timothy, I gave these a try and -XX:+CMSPermGenSweepingEnabled seemed to cause the error to happen more quickly. With this option on it didn't seem to do the intermittent garbage collecting that delayed the issue with it off. I was already using a max of 512MB, and I can reproduce it with it set this high or even higher. Right now, because of how we have this implemented, increasing it to something high just delays the problem :/ Anything else you could suggest I would really appreciate.
On Wed, Feb 26, 2014 at 3:19 PM, Tim Potter tim.pot...@lucidworks.com wrote: Hi Josh, Try adding: -XX:+CMSPermGenSweepingEnabled as I think for some VM versions, permgen collection was disabled by default. Also, I use: -XX:MaxPermSize=512m -XX:PermSize=256m with Solr, so 64M may be too small. Timothy Potter Sr. Software Engineer, LucidWorks www.lucidworks.com From: Josh jwda...@gmail.com Sent: Wednesday, February 26, 2014 12:27 PM To: solr-user@lucene.apache.org Subject: Solr Permgen Exceptions when creating/removing cores We are using the Bitnami version of Solr 4.6.0-1 on a 64-bit Windows installation with 64-bit Java 1.7u51, and we are seeing consistent issues with PermGen exceptions. We have the permgen configured to be 512MB. Bitnami ships with a 32-bit version of Java for Windows and we are replacing it with a 64-bit version. Passed-in Java options: -XX:MaxPermSize=64M -Xms3072M -Xmx6144M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSClassUnloadingEnabled -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 This is our use case: We have what we call a database core which remains fairly static and contains the imported contents of a table from SQL Server. We then have user cores which contain the record ids of results from a text search outside of Solr. We then query for the data we want from
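[For anyone wanting to reproduce this, a minimal SolrJ 4.x sketch of the create/index/query/unload loop described above -- untested, with a hypothetical core-name pattern and template path:]

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CoreChurnTest {
  public static void main(String[] args) throws Exception {
    HttpSolrServer admin = new HttpSolrServer("http://localhost:8983/solr");
    for (int run = 0; run < 1000; run++) {
      String core = "user_core_" + run;
      // Create a transient user core from a pre-existing instance dir.
      CoreAdminRequest.createCore(core, "/path/to/user_core_template", admin);
      // ... index the user's document ids here, then query the database
      // core with the user core as the limiter, as in the test above ...
      // Unload the core and delete its index and instance dir, which
      // should (in theory) release everything the core pinned in PermGen.
      CoreAdminRequest.unloadCore(core, true, true, admin);
    }
  }
}

Watching the PermGen graph in VisualVM while this loop runs should show whether unloads actually release class metadata.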
Re: Solution for reverse order of year facets?
Hi, Currently there are two sorting criteria available. However, sort by index - to return the constraints sorted in their index order (lexicographic by indexed term) - should return the most recent year at the top, no? Ahmet On Monday, March 3, 2014 4:36 PM, Michael Lackhoff mich...@lackhoff.de wrote: If I understand the docs right, it is only possible to sort facets by count or by value in ascending order. Neither variant is very helpful for year facets if I want the most recent years at the top (or to appear at all if I restrict the number of facet entries). It looks like a requirement that has been articulated repeatedly, and the recommended solution seems to be to do some math like 1 - year and index that. So far so good. The only problem is that I have many data sources and I would like to avoid changing every connector to include the new field. I think a better solution would be a custom TokenFilterFactory that does it. Since it seems a common request, did someone already build such a TokenFilterFactory? If not, do you think I could build one myself? I do some (script-)programming but have no experience with Java, so I think I could adapt an example. Are there any guides out there? Or even better, is there a built-in solution I haven't heard of? -Michael
Multiple partial match
Hi Guys, I am faced with a problem: I make the query *name:co*^5* to Solr. It returns me two docs with equal score: {id: 1, name: 'Coca-Cola Company'}, {id: 2, name: 'Microsoft Corporation'}. How can I boost 'Coca-Cola Company', since it contains more partial matches? P.S. All normalization used by the TF-IDF engine is disabled. -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-partial-match-tp4120886.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solution for reverse order of year facets?
On 03.03.2014 16:33 Ahmet Arslan wrote: Currently there are two storing criteria available. However sort by index - to return the constraints sorted in their index order (lexicographic by indexed term) - should return most recent year at top, no? No, it returns them -- as you say -- in lexicographic order and that means oldest first, like: 1815 1820 ... 2012 2013 (might well stop before we get here) 2014 -Michael
Re: Solr is NoSQL database or not?
Hi; I said: What are the main differences between ElasticSearch and Solr that make ElasticSearch a NoSQL store but not Solr? because it is just a marketing term, as Jack indicated after me. I also said: the first link you provided, http://en.wikipedia.org/wiki/NoSQL, includes ElasticSearch as a Document Store. I mean you can add Solr to the Wikipedia page, but it is not a reference, because these are all marketing terms, like Big Data. You should remember the definition of Big Data: data that is much more than you can process with traditional methods. So it is not an exactly defined term; one person may call something Big Data while another may not. It is similar with NoSQL. Thanks; Furkan KAMACI 2014-03-03 11:28 GMT+02:00 Charlie Hull char...@flax.co.uk: On 01/03/2014 23:53, Jack Krupansky wrote: NoSQL? To me it's just a marketing term, like Big Data. +1 Depends very much who you talk to. Marketing folks like to ride the current wave, so if NoSQL is current, they'll jump on that one, likewise Big Data. Technical types like to be correct in their definitions :) C -- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk
RE: Solr 4.5.0 replication numDocs larger in slave
Thanks Erick. Indexing is not happening on the slave since it has never been set up there - there aren't even any commits happening on the slave (which we normally do via cron job). But indexing is definitely happening on the master at the time replication happens. Sounds like the master is still indexing and you've deleted documents on the master: Yes, that's exactly what I suspect is happening. But if that's true, I'd like to understand how those deletes could find their way into being replicated to the slave when the only commit happening on the master was presumably completed before the replication. Do deletes get committed in some special way outside of an explicit commit? Or do they get copied over to the slave as part of the replication and therefore effectively get committed to the slave before they are committed to the master? My replication is configured to replicate after commit and after startup. The slave polls the master every 10 minutes. The master commits only once a day. Presumably the only time the number of documents changes is at the end of the commit. Then once the commit is done I'd expect replication to begin. So in order to end up with a different numDocs in the slave, there would need to be some sort of commit happening during the replication, right? Frank -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, February 28, 2014 9:22 PM To: solr-user@lucene.apache.org Subject: Re: Solr 4.5.0 replication numDocs larger in slave That really shouldn't be happening IF indexing is shut off. Otherwise the slave is taking a snapshot of the master index and synching. bq: The slave has about 33 more documents and one fewer segments (according to Overview in solr admin) Sounds like the master is still indexing and you've deleted documents on the master. Best, Erick On Fri, Feb 28, 2014 at 11:08 AM, Geary, Frank frank.ge...@zoominfo.com wrote: Hi, I'm using Solr 4.5.0. I have a single master replicating to a single slave. Only the master is being indexed to - never the slave. The master is committed once each night. After the first commit and replication the numDoc counts are identical. After the next nightly commit and after the second replication a few minutes later, numDocs has increased in both the master and the slave as expected, but numDocs is not the same in the master as it is in the slave. The slave has about 33 more documents and one fewer segments (according to Overview in solr admin). I suspect the numDocs may be in sync again after tonight, but can anyone explain what is going on here? Is it possible a few deletions got committed to the master but not replicated to the slave? Thanks Frank
Re: SolrCloud: heartbeat succeeding while node has failing SSD?
Thanks, Mark! The supervised process sounds very promising but complicated to get right, e.g. where does the supervisor run, where do nodes report their status to, are the checks active or passive, etc. Having each node perform a regular background self-check and remove itself from the cluster if that health check doesn't pass seems like a great first step, though. The most common failure we've seen has been disk failure, and a self-check should usually detect that. (JIRA: https://issues.apache.org/jira/browse/SOLR-5805) It would also be nice, as a cluster operator, to have an easy way to remove a failing node from the cluster. Ideally, right from the Solr UI, but even from a command-line script would be great. In the cases of disk failure, we can often not SSH into a node to shut down the VM that's still connected to ZooKeeper. We have to physically power it down. Having something quicker would be great. (JIRA: https://issues.apache.org/jira/browse/SOLR-5806) On Sun, Mar 2, 2014 at 9:36 PM, Mark Miller markrmil...@gmail.com wrote: The heartbeat that keeps the node alive is the connection it maintains with ZooKeeper. We don't currently have anything built in that will actively make sure each node can serve queries and remove it from clusterstate.json if it cannot. If a replica is maintaining its connection with ZooKeeper, and in most cases if it is accepting updates, it will appear up. Load balancing should handle the failures, but I guess it depends on how sticky the request failures are. In the past, I've seen this handled on a different search engine by having a variety of external agent scripts that would occasionally attempt to do a query, and if things did not go right, they killed the process to cause it to try and start up again (a supervised process). I'm not sure what the right long-term feature for Solr is here, but feel free to start a JIRA issue around it. One simple improvement might even be a background thread that periodically checks some local readings and, depending on the results, pulls itself out of the mix as best it can (removes itself from clusterstate.json or simply closes its ZK connection). - Mark http://about.me/markrmiller On Mar 2, 2014, at 3:42 PM, Gregg Donovan gregg...@gmail.com wrote: We had a brief SolrCloud outage this weekend when a node's SSD began to fail but the node still appeared to be up to the rest of the SolrCloud cluster (i.e. still green in clusterstate.json). Distributed queries that reached this node would fail, but whatever heartbeat keeps the node in clusterstate.json must have continued to succeed. We eventually had to power the node down to get it to be removed from clusterstate.json. This is our first foray into SolrCloud, so I'm still somewhat fuzzy on what the default heartbeat mechanism is and how we may augment it to be sure that the disk is checked as part of the heartbeat and/or we verify that it can serve queries. Any pointers would be appreciated. Thanks! --Gregg
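[To make the background self-check idea concrete, a rough standalone sketch -- this is not existing Solr code; the removeSelf hook stands in for whatever pulls the node out of rotation, e.g. closing the ZooKeeper connection:]

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class LocalHealthCheck implements Runnable {
  private final Path dataDir;
  private final Runnable removeSelf; // e.g. closes the ZooKeeper connection

  public LocalHealthCheck(Path dataDir, Runnable removeSelf) {
    this.dataDir = dataDir;
    this.removeSelf = removeSelf;
  }

  @Override
  public void run() {
    try {
      // Probe the disk: write and delete a small marker file in the data dir.
      Path probe = Files.createTempFile(dataDir, "healthcheck", ".tmp");
      Files.delete(probe);
    } catch (IOException e) {
      // Local I/O failed: pull this node out of the cluster.
      removeSelf.run();
    }
  }

  public static ScheduledFuture<?> schedule(Path dataDir, Runnable removeSelf) {
    return Executors.newSingleThreadScheduledExecutor()
        .scheduleAtFixedRate(new LocalHealthCheck(dataDir, removeSelf), 1, 1, TimeUnit.MINUTES);
  }
}

A write-and-delete probe would have caught the failing-SSD case above, since reads from the page cache can keep succeeding long after the device stops accepting writes.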
Configuration problem
Hello, for some reason I have problems getting my local Solr system to run (MacBook, Tomcat 6.0.35). The setting is: Solr directories (I use different Solr versions at the same time): /srv/solr/solr4.6.1 is the Solr home; in Solr home is a file solr.xml of the new discovery type (no cores), and inside the core directories are empty core.properties files and symbolic links to the universal conf directory. Solr webapps (I use different webapps simultaneously): /srv/www/webapps/solr/solr4.6.1 is the Solr webapp. I tried to convey this information to the Tomcat server by putting a file solr4.6.1.xml into the catalina/localhost folder with the contents:

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/srv/www/webapps/solr/solr4.6.1" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/srv/solr/solr4.6.1" override="true"/>
</Context>

The Tomcat Manager shows solr4.6.1 as started, but following the given link gives an error with the message: SolrCore 'collection1' is not available due to init failure: Could not load config file /srv/solr4.6.1/collection1/solrconfig.xml which is plausible, since 1. there is no folder /srv/solr4.6.1/collection1 and 2. for the actual cores solrconfig.xml is inside /srv/solr4.6.1/cores/geo/conf/. But why does Tomcat try to find a solrconfig.xml there? The problem persists if I start Tomcat with -Dsolr.solr.home=/srv/solr/solr4.6.1; it seems that the system just ignores the Solr home setting. Can somebody give me a hint what I'm doing wrong? Best regards Thomas P.S.: Is there a way to stop Tomcat from throwing these errors into my face threefold: once as heading (h1!), once as message and once as description?
Re: Solr is NoSQL database or not?
For the record, I am +1 for somebody to add Solr to the NoSQL wikipedia page, in much the same way that Elasticsearch is already there. From a LucidWorks webinar blurb: The long awaited Solr 4 release brings a large amount of new functionality that blurs the line between search engines and NoSQL databases. Now you can have your cake and search it too with Atomic updates, Versioning and Optimistic Concurrency, Durability, and Real-time Get! Learn about new Solr NoSQL features and implementation details of how the distributed indexing of Solr Cloud was designed from the ground up to accommodate them. -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Monday, March 3, 2014 10:58 AM To: solr-user@lucene.apache.org Subject: Re: Solr is NoSQL database or not? Hi; I said: What are the main differences between ElasticSearch and Solr that make ElasticSearch a NoSQL store but not Solr? because it is just a marketing term, as Jack indicated after me. I also said: the first link you provided, http://en.wikipedia.org/wiki/NoSQL, includes ElasticSearch as a Document Store. I mean you can add Solr to the Wikipedia page, but it is not a reference, because these are all marketing terms, like Big Data. You should remember the definition of Big Data: data that is much more than you can process with traditional methods. So it is not an exactly defined term; one person may call something Big Data while another may not. It is similar with NoSQL. Thanks; Furkan KAMACI 2014-03-03 11:28 GMT+02:00 Charlie Hull char...@flax.co.uk: On 01/03/2014 23:53, Jack Krupansky wrote: NoSQL? To me it's just a marketing term, like Big Data. +1 Depends very much who you talk to. Marketing folks like to ride the current wave, so if NoSQL is current, they'll jump on that one, likewise Big Data. Technical types like to be correct in their definitions :) C -- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk
Re: Fetching uniqueKey and other int quickly from documentCache?
Yonik, That's a very clever idea. Unfortunately, I think that will skip the distributed query optimization we were hoping to take advantage of in SOLR-1880 [1], but it should work with the proposed distrib.singlePass optimization in SOLR-5768 [2]. Does that sound right? --Gregg [1] https://issues.apache.org/jira/browse/SOLR-1880 [2] https://issues.apache.org/jira/browse/SOLR-5768 On Wed, Feb 26, 2014 at 8:53 PM, Yonik Seeley yo...@heliosearch.com wrote: You could try forcing things to go through function queries (via pseudo-fields): fl=field(id), field(myfield) If you're not requesting any stored fields, that *might* currently skip that step. -Yonik http://heliosearch.org - native off-heap filters and fieldcache for solr On Mon, Feb 24, 2014 at 9:58 PM, Gregg Donovan gregg...@gmail.com wrote: We fetch a large number of documents -- 1000+ -- for each search. Each request fetches only the uniqueKey or the uniqueKey plus one secondary integer key. Despite this, we find that we spent a sizable amount of time in SolrIndexSearcher#doc(int docId, SetString fields). Time is spent fetching the two stored fields, LZ4 decoding, etc. I would love to be able to tell Solr to always fetch these two fields from memory. We have them both in the fieldCache so we're already spending the RAM. I've seen this asked previously [1], so it seems like a fairly common need, especially for distributed search. Any ideas? A few possible ideas I had: --Check FieldCache.html#getCacheEntries() before going to stored fields. --Give the documentCache config a list of fields it should load from the fieldCache Having an in-memory mapping from docId-uniqueKey has come up for us before. We've used a custom SolrCache maintaining that mapping to quickly filter over personalized collections. Maybe the uniqueKey should be more optimized out of the box? Perhaps a custom uniqueKey codec that also maintained the docId-uniqueKey mapping in memory? --Gregg [1] http://search-lucene.com/m/oCUKJ1heHUU1
Solr Filter Cache Size
How can we calculate how much heap memory the filter cache will consume? We understand that in order to determine a good size we also need to evaluate how many filter queries would be used over a certain time period. Here's our setting:

<filterCache class="solr.FastLRUCache" size="30" initialSize="30" autowarmCount="5"/>

According to the post below, 53 GB of RAM would be needed just by the filter cache alone with 1.4 million docs. Not sure if this is true and how this would work. Reference: http://stackoverflow.com/questions/2004/solr-filter-cache-fastlrucache-takes-too-much-memory-and-results-in-out-of-mem We filled the filter query cache with SolrMeter and had a JVM heap size of far less than 53 GB. Can anyone chime in and enlighten us? Thank you! Ben Wiens Benjamin Mosior
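[A back-of-the-envelope sketch of where a figure like 53 GB can come from. The worst case treats every cached filter as a full bitset of one bit per document; the 300,000-entry cache size is an assumption chosen to reproduce the 53 GB figure, not the size="30" shown above:]

public class FilterCacheEstimate {
  public static void main(String[] args) {
    long maxDoc = 1_400_000L;         // ~1.4 million docs, as above
    long bytesPerEntry = maxDoc / 8;  // ~175 KB per cached filter (one bit per doc)
    long entries = 300_000L;          // assumed cache size
    double gb = bytesPerEntry * entries / 1e9;
    System.out.printf("worst case: ~%.1f GB%n", gb); // ~52.5 GB, i.e. the ~53 GB figure
  }
}

In practice Solr stores sparse filter results as sorted int arrays (SortedIntDocSet) rather than full bitsets, so entries matching few documents are much smaller -- which would explain why filling the cache with SolrMeter consumed far less heap than the worst case.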
Re: Solr Permgen Exceptions when creating/removing cores
Hey Josh, I am not an expert in Java performance, but I would start with dumping the heap and investigating with VisualVM (the free tool that comes with the JDK). In my experience, the most common cause for PermGen exceptions is the app creating too many interned strings. Solr (actually Lucene) interns the field names, so if you have too many fields, it might be the cause. How many fields in total across cores did you create before the exception? Can you reproduce the problem with the standard Solr? Is the Bitnami distribution just Solr or do they have some other libraries? Hope this helps, Tri On Mar 03, 2014, at 07:28 AM, Josh jwda...@gmail.com wrote: It's a Windows installation using a Bitnami Solr installer. I incorrectly put 64M into the configuration for this, as I had copied the test configuration I was using to recreate the permgen issue we were seeing on our production system (which is configured to 512M), since it takes a while to recreate the issue with larger permgen values. In the test scenario there was a small 180-document data core that's static, with 8 dynamic user cores that are used to index the unique document ids in the user's view, which are then merged into a single user core. The final user core contains the same number of document ids as the data core, and the data core is queried against with the ids in the final merged user core as the limiter. The user cores are then unloaded and deleted from the drive, and then the test is rerun with the user cores re-created. We are also using the core discovery mode to store/find our cores, and the database data core is using dynamic fields with a mix of single-value and multi-value fields. The user cores use a static configuration. The data is indexed from SQL Server using jTDS for both the user and data cores. As a note, we also reversed the test case I mention above, where we keep the user cores static and dynamically create the database core, and this created the same issue, only it leaked faster. We assumed this was because the configuration was larger/loaded more classes than the simpler user core. When I get the time I'm going to put together a SolrJ test app to recreate the issue outside of our environment to see if others see the same issue we're seeing, to rule out any kind of configuration problem. Right now we're interacting with Solr with POCO via the RESTful interface and it's not very easy for us to spin this off into something someone else could use. In the meantime we've made changes to make the user cores more static; this has slowed down the build-up of permgen to something that can be managed by a weekly reset. Sorry about the confusion in my initial email and I appreciate the response. Anything about my configuration that you can think might be useful just let me know and I can provide it. We have a workaround, but it really hampers what our long-term goals were for our Solr implementation. Thanks Josh On Mon, Mar 3, 2014 at 9:57 AM, Greg Walters greg.walt...@answers.com wrote: Josh, You've mentioned a couple of times that you've got PermGen set to 512M but then you say you're running with -XX:MaxPermSize=64M. These two statements are contradictory, so are you *sure* that you're running with 512M of PermGen? Assuming you're on a *nix box can you provide `ps` output proving this? Thanks, Greg On Feb 28, 2014, at 5:22 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; You can also check here: http://stackoverflow.com/questions/3717937/cmspermgensweepingenabled-vs-cmsclassunloadingenabled Thanks; Furkan KAMACI 2014-02-26 22:35 GMT+02:00 Josh jwda...@gmail.com: Thanks Timothy, I gave these a try and -XX:+CMSPermGenSweepingEnabled seemed to cause the error to happen more quickly. With this option on it didn't seem to do the intermittent garbage collecting that delayed the issue with it off. I was already using a max of 512MB, and I can reproduce it with it set this high or even higher. Right now, because of how we have this implemented, increasing it to something high just delays the problem :/ Anything else you could suggest I would really appreciate. On Wed, Feb 26, 2014 at 3:19 PM, Tim Potter tim.pot...@lucidworks.com wrote: Hi Josh, Try adding: -XX:+CMSPermGenSweepingEnabled as I think for some VM versions, permgen collection was disabled by default. Also, I use: -XX:MaxPermSize=512m -XX:PermSize=256m with Solr, so 64M may be too small. Timothy Potter Sr. Software Engineer, LucidWorks www.lucidworks.com From: Josh jwda...@gmail.com Sent: Wednesday, February 26, 2014 12:27 PM To: solr-user@lucene.apache.org Subject: Solr Permgen Exceptions when creating/removing cores We are using the Bitnami version of Solr 4.6.0-1 on a 64-bit Windows installation with 64-bit Java 1.7u51 and we are seeing consistent issues with PermGen exceptions. We have the permgen configured to be 512MB. Bitnami ships with a 32-bit version of Java for Windows and we are replacing it with a 64-bit version. Passed in Java Options: -XX:MaxPermSize=64M
Re: Multiple partial match
Add a function query boost that uses the term frequency, tf: bf=tf(name,'co') -- additive boost boost=tf(name,'co') -- multiplicative boost That does of course require that term frequency is not disabled for that field in the schema. You can multiply the term frequency as well in the function query. boost=product(tf(name,'co'),10) -- Jack Krupansky -Original Message- From: Zwer Sent: Monday, March 3, 2014 10:34 AM To: solr-user@lucene.apache.org Subject: Multiple partial match Hi Guys, Faced with a problem: make query to SOLR *name:co*^5* It returns me two docs with equal score: {id: 1, name: 'Coca-Cola Company'}, {id: 2, name: Microsoft Corporation}. How can I boost Coca-Cola Company because it contains more partial matches ? P.S. All normalization used by TF-IDF engine disabled. -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-partial-match-tp4120886.html Sent from the Solr - User mailing list archive at Nabble.com.
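[For illustration, a hypothetical edismax request putting the multiplicative form in place -- host and collection name are made up: http://localhost:8983/solr/collection1/select?q=name:co*&defType=edismax&boost=product(tf(name,'co'),10) ]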
Re: Fetching uniqueKey and other int quickly from documentCache?
On Mon, Mar 3, 2014 at 11:14 AM, Gregg Donovan gregg...@gmail.com wrote: Yonik, That's a very clever idea. Unfortunately, I think that will skip the distributed query optimization we were hoping to take advantage of in SOLR-1880 [1], but it should work with the proposed distrib.singlePass optimization in SOLR-5768 [2]. Does that sound right? Yep, the two together should do the trick. -Yonik http://heliosearch.org - native off-heap filters and fieldcache for solr --Gregg [1] https://issues.apache.org/jira/browse/SOLR-1880 [2] https://issues.apache.org/jira/browse/SOLR-5768 On Wed, Feb 26, 2014 at 8:53 PM, Yonik Seeley yo...@heliosearch.com wrote: You could try forcing things to go through function queries (via pseudo-fields): fl=field(id), field(myfield) If you're not requesting any stored fields, that *might* currently skip that step. -Yonik http://heliosearch.org - native off-heap filters and fieldcache for solr On Mon, Feb 24, 2014 at 9:58 PM, Gregg Donovan gregg...@gmail.com wrote: We fetch a large number of documents -- 1000+ -- for each search. Each request fetches only the uniqueKey or the uniqueKey plus one secondary integer key. Despite this, we find that we spent a sizable amount of time in SolrIndexSearcher#doc(int docId, SetString fields). Time is spent fetching the two stored fields, LZ4 decoding, etc. I would love to be able to tell Solr to always fetch these two fields from memory. We have them both in the fieldCache so we're already spending the RAM. I've seen this asked previously [1], so it seems like a fairly common need, especially for distributed search. Any ideas? A few possible ideas I had: --Check FieldCache.html#getCacheEntries() before going to stored fields. --Give the documentCache config a list of fields it should load from the fieldCache Having an in-memory mapping from docId-uniqueKey has come up for us before. We've used a custom SolrCache maintaining that mapping to quickly filter over personalized collections. Maybe the uniqueKey should be more optimized out of the box? Perhaps a custom uniqueKey codec that also maintained the docId-uniqueKey mapping in memory? --Gregg [1] http://search-lucene.com/m/oCUKJ1heHUU1
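[For concreteness, a hypothetical request combining the two -- note that distrib.singlePass is only the parameter proposed in SOLR-5768, not something available in a released version at this point: http://host:8983/solr/collection1/select?q=*:*&rows=1000&fl=field(id),field(myfield)&distrib.singlePass=true ]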
Re: Multiple partial match
AFAICS tf(name, 'co') returns 0 on the {id:1, name:'Coca-Cola Company'} because it does not support partial match. tf(name, 'company') will return 1 -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-partial-match-tp4120886p4120919.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Permgen Exceptions when creating/removing cores
In the user core there are two fields; in the database core in question it was 40, but in production environments the database core is dynamic. My time has been pretty crazy trying to get this out the door and we haven't tried a standard Solr install yet, but it's on my plate for the test app, and I don't know enough about Solr/Bitnami to know if they've done any serious modifications to it. I had tried doing a dump from VisualVM previously, but it didn't seem to give me anything useful - then again, I didn't know how to look for interned strings. This is something I can take another look at in the coming weeks when I do my test case against a standard Solr install with SolrJ. The exception with user cores happens after 80-ish runs, so 640-ish user cores with the PermGen set to 64MB. The database core test was far lower; it was in the 10-15 range. As a note, once the permgen limit is hit, if we simply restart the service with the same number of cores loaded, the permgen usage is minimal, even with the number of user cores being high in our production environment (500-600). If this does end up being the interning of strings, is there any way it can be mitigated? Our production environment for our heavier users would see in the range of 3200+ user cores created a day. Thanks for the help. Josh On Mon, Mar 3, 2014 at 11:24 AM, Tri Cao tm...@me.com wrote: Hey Josh, I am not an expert in Java performance, but I would start with dumping the heap and investigating with VisualVM (the free tool that comes with the JDK). In my experience, the most common cause for PermGen exceptions is the app creating too many interned strings. Solr (actually Lucene) interns the field names, so if you have too many fields, it might be the cause. How many fields in total across cores did you create before the exception? Can you reproduce the problem with the standard Solr? Is the Bitnami distribution just Solr or do they have some other libraries? Hope this helps, Tri On Mar 03, 2014, at 07:28 AM, Josh jwda...@gmail.com wrote: It's a Windows installation using a Bitnami Solr installer. I incorrectly put 64M into the configuration for this, as I had copied the test configuration I was using to recreate the permgen issue we were seeing on our production system (which is configured to 512M), since it takes a while to recreate the issue with larger permgen values. In the test scenario there was a small 180-document data core that's static, with 8 dynamic user cores that are used to index the unique document ids in the user's view, which are then merged into a single user core. The final user core contains the same number of document ids as the data core, and the data core is queried against with the ids in the final merged user core as the limiter. The user cores are then unloaded and deleted from the drive, and then the test is rerun with the user cores re-created. We are also using the core discovery mode to store/find our cores, and the database data core is using dynamic fields with a mix of single-value and multi-value fields. The user cores use a static configuration. The data is indexed from SQL Server using jTDS for both the user and data cores. As a note, we also reversed the test case I mention above, where we keep the user cores static and dynamically create the database core, and this created the same issue, only it leaked faster. We assumed this was because the configuration was larger/loaded more classes than the simpler user core.
When I get the time I'm going to put together a SolrJ test app to recreate the issue outside of our environment to see if others see the same issue we're seeing, to rule out any kind of configuration problem. Right now we're interacting with Solr with POCO via the RESTful interface and it's not very easy for us to spin this off into something someone else could use. In the meantime we've made changes to make the user cores more static; this has slowed down the build-up of permgen to something that can be managed by a weekly reset. Sorry about the confusion in my initial email and I appreciate the response. Anything about my configuration that you can think might be useful just let me know and I can provide it. We have a workaround, but it really hampers what our long-term goals were for our Solr implementation. Thanks Josh On Mon, Mar 3, 2014 at 9:57 AM, Greg Walters greg.walt...@answers.com wrote: Josh, You've mentioned a couple of times that you've got PermGen set to 512M but then you say you're running with -XX:MaxPermSize=64M. These two statements are contradictory, so are you *sure* that you're running with 512M of PermGen? Assuming you're on a *nix box can you provide `ps` output proving this? Thanks, Greg On Feb 28, 2014, at 5:22 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; You can also check here: http://stackoverflow.com/questions/3717937/cmspermgensweepingenabled-vs-cmsclassunloadingenabled Thanks;
RE: Solr 4.5.0 replication numDocs larger in slave
Thanks Greg. We optimize the master once a week (early in the day Sunday) and we do not do a commit Sunday evening (the only evening of the week when we do not commit). So now, after optimization/replication, the master/slave pair that were out of sync on Friday have the same numDocs (and every other value on the Overview page agrees, except size under Replication, where it shows the slave is smaller). Unfortunately, a different master/slave pair now have different numDocs after the optimize and replication done yesterday. For the newly out-of-sync master/slave pair, the Version (under Statistics on the Overview page) is 4 revisions earlier on the slave than on the master, and there are two fewer segments on the slave than there are on the master. Under Replication on the Overview page, the Versions and Gens are all the same, but the size of the slave is smaller than the master. The slave has 51 fewer documents than the master. But indexing is continuing on the master (though no commit has happened since the optimization early Sunday). I wonder if this is related to the NRT functionality in some way. I see Impl: org.apache.solr.core.NRTCachingDirectoryFactory on the Overview page. I've been trying to rely on default behavior whenever possible, but perhaps I need to turn something off? Frank -Original Message- From: Greg Walters [mailto:greg.walt...@answers.com] Sent: Monday, March 03, 2014 10:00 AM To: solr-user@lucene.apache.org Subject: Re: Solr 4.5.0 replication numDocs larger in slave I just ran into an issue similar to this that affected document scores on distributed searches. You might try doing an optimize and purging your deleted documents while no indexing is being done, then checking your counts. Once I optimized all my indexes, the document counts on all of my cores matched up and scoring was consistent. Thanks, Greg On Feb 28, 2014, at 8:22 PM, Erick Erickson erickerick...@gmail.com wrote: That really shouldn't be happening IF indexing is shut off. Otherwise the slave is taking a snapshot of the master index and synching. bq: The slave has about 33 more documents and one fewer segments (according to Overview in solr admin) Sounds like the master is still indexing and you've deleted documents on the master. Best, Erick On Fri, Feb 28, 2014 at 11:08 AM, Geary, Frank frank.ge...@zoominfo.com wrote: Hi, I'm using Solr 4.5.0. I have a single master replicating to a single slave. Only the master is being indexed to - never the slave. The master is committed once each night. After the first commit and replication the numDoc counts are identical. After the next nightly commit and after the second replication a few minutes later, numDocs has increased in both the master and the slave as expected, but numDocs is not the same in the master as it is in the slave. The slave has about 33 more documents and one fewer segments (according to Overview in solr admin). I suspect the numDocs may be in sync again after tonight, but can anyone explain what is going on here? Is it possible a few deletions got committed to the master but not replicated to the slave? Thanks Frank
Re: Solr Permgen Exceptions when creating/removing cores
If it's really the interned strings, you could try upgrading the JDK, as the newer HotSpot JVM puts interned strings in the regular heap: http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html (search for String.intern() in that release). I haven't got a chance to look into the new core auto-discovery code, so I don't know if it's implemented with reflection or not. Reflection and dynamic class loading is another source of PermGen exceptions, in my experience. I don't see anything wrong with your JVM config, which is very much standard. Hope this helps, Tri On Mar 03, 2014, at 08:52 AM, Josh jwda...@gmail.com wrote: In the user core there are two fields; in the database core in question it was 40, but in production environments the database core is dynamic. My time has been pretty crazy trying to get this out the door and we haven't tried a standard Solr install yet, but it's on my plate for the test app, and I don't know enough about Solr/Bitnami to know if they've done any serious modifications to it. I had tried doing a dump from VisualVM previously, but it didn't seem to give me anything useful - then again, I didn't know how to look for interned strings. This is something I can take another look at in the coming weeks when I do my test case against a standard Solr install with SolrJ. The exception with user cores happens after 80-ish runs, so 640-ish user cores with the PermGen set to 64MB. The database core test was far lower; it was in the 10-15 range. As a note, once the permgen limit is hit, if we simply restart the service with the same number of cores loaded, the permgen usage is minimal, even with the number of user cores being high in our production environment (500-600). If this does end up being the interning of strings, is there any way it can be mitigated? Our production environment for our heavier users would see in the range of 3200+ user cores created a day. Thanks for the help. Josh On Mon, Mar 3, 2014 at 11:24 AM, Tri Cao tm...@me.com wrote: Hey Josh, I am not an expert in Java performance, but I would start with dumping the heap and investigating with VisualVM (the free tool that comes with the JDK). In my experience, the most common cause for PermGen exceptions is the app creating too many interned strings. Solr (actually Lucene) interns the field names, so if you have too many fields, it might be the cause. How many fields in total across cores did you create before the exception? Can you reproduce the problem with the standard Solr? Is the Bitnami distribution just Solr or do they have some other libraries? Hope this helps, Tri On Mar 03, 2014, at 07:28 AM, Josh jwda...@gmail.com wrote: It's a Windows installation using a Bitnami Solr installer. I incorrectly put 64M into the configuration for this, as I had copied the test configuration I was using to recreate the permgen issue we were seeing on our production system (which is configured to 512M), since it takes a while to recreate the issue with larger permgen values. In the test scenario there was a small 180-document data core that's static, with 8 dynamic user cores that are used to index the unique document ids in the user's view, which are then merged into a single user core. The final user core contains the same number of document ids as the data core, and the data core is queried against with the ids in the final merged user core as the limiter. The user cores are then unloaded and deleted from the drive, and then the test is rerun with the user cores re-created. We are also using the core discovery mode to store/find our cores, and the database data core is using dynamic fields with a mix of single-value and multi-value fields. The user cores use a static configuration. The data is indexed from SQL Server using jTDS for both the user and data cores. As a note, we also reversed the test case I mention above, where we keep the user cores static and dynamically create the database core, and this created the same issue, only it leaked faster. We assumed this was because the configuration was larger/loaded more classes than the simpler user core. When I get the time I'm going to put together a SolrJ test app to recreate the issue outside of our environment to see if others see the same issue we're seeing, to rule out any kind of configuration problem. Right now we're interacting with Solr with POCO via the RESTful interface and it's not very easy for us to spin this off into something someone else could use. In the meantime we've made changes to make the user cores more static; this has slowed down the build-up of permgen to something that can be managed by a weekly reset. Sorry about the confusion in my initial email and I appreciate the response. Anything about my configuration that you can think might be useful just let me know and I can provide it. We have a workaround, but it really hampers what our long-term goals were for our Solr implementation. Thanks Josh On Mon, Mar 3, 2014 at 9:57 AM, Greg Walters greg.walt...@answers.com wrote: Josh, You've mentioned a couple of times that you've got PermGen
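[A throwaway sketch of the failure mode Tri describes -- hypothetical field names; run with a small -XX:MaxPermSize on a Java 6 HotSpot and it eventually dies with OutOfMemoryError: PermGen space:]

public class InternChurn {
  public static void main(String[] args) {
    // Each distinct interned string is retained for the life of the JVM.
    // On Java 6 HotSpot these live in PermGen; since JDK 7 they live in the
    // regular heap, so upgrading moves (rather than removes) the pressure.
    for (int i = 0; ; i++) {
      ("field_" + i).intern();
    }
  }
}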
RE: How to best handle search like Dave David
Thanks, Arun, for sharing the idea on EdgeNGramFilter. In our case we are doing searches using an automated process, so EdgeNGramFilter may not work. We have used NGramFilterFactory in the past but will look into it again. For cases like Dave/David and other English names, does anyone have an idea which stemmer (currently using PorterStemFilterFactory) can work better? -Original Message- From: Arun Rangarajan [mailto:arunrangara...@gmail.com] Sent: Sunday, March 02, 2014 1:47 PM To: solr-user@lucene.apache.org Subject: Re: How to best handle search like Dave David If you are trying to serve results as users are typing, then you can use EdgeNGramFilter (see https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory ). Let's say you configure your field like this, as shown in the Solr wiki:

<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>

Then this is what happens at index time for your tokens: David --- | LowerCaseTokenizerFactory | --- david --- | EdgeNGramFilterFactory | --- da dav davi david Dave --- | LowerCaseTokenizerFactory | --- dave --- | EdgeNGramFilterFactory | --- da dav dave And at query time, when your user enters 'Dav' it will match both those tokens. Note that the moment your user starts typing more, say 'davi', it won't match 'Dave', since you are doing edge N gramming only at index time and not at query time. You can also do edge N gramming at query time if you want 'Dave' to match 'David', probably keeping a larger minGramSize (in this case 3) to avoid noise (like, say, 'Dave' matching 'Dana', though with a lower score), but it will be expensive to do n-gramming at query time. On Fri, Feb 28, 2014 at 3:22 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Hi, We have name searches on Solr for millions of documents. One user may search like Morrison Dave while another may search like Morrison David. What's the best way to handle it so that both bring similar results? Adding synonyms is the option we are using right now. But we may need to add around 50,000+ such synonyms for different names; for each specific name there can be a couple of synonyms, like for Richard it can be Rich, Rick, Richie etc. Any experience adding so many synonyms, or any other thoughts? Stemming may help in a few situations, but not for cases like Dave and David. Thanks, Susheel
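For the synonym route discussed above, a minimal sketch of the wiring; the sample entries and the analyzer placement are assumptions, not a tested setup. One line per name group in synonyms.txt, expanded at index time:

# synonyms.txt (hypothetical entries)
david,dave,davey
richard,rich,rick,richie

<!-- in the index-time analyzer of the name field -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

With expand="true" every variant in a group is indexed for every matching document, so a query for David matches a record that only contains Dave. The synonym map is loaded once per core when the analyzer is built, so a 50,000-line file by itself should not be a problem.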
RegexTransformer and xpath in DataImportHandler
Good afternoon, I have this DIH:

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="blogFeed" pk="id" url="https://redacted/"
            processor="XPathEntityProcessor" forEach="/rss/channel/item"
            transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">
      <field column="id" xpath="/rss/channel/item/id" />
      <field column="link" xpath="/rss/channel/item/link" />
      <field column="blogtitle" xpath="/rss/channel/item/title" />
      <field column="short_blogtitle" xpath="/rss/channel/item/title" />
      <field column="short_blogtitle" regex="^(.{250})([^\.]*\.)(.*)$" replaceWith="$1" sourceColName="blogtitle" />
      <field column="pubdateiso" xpath="/rss/channel/item/pubDate" dateTimeFormat="yyyy-MM-dd" />
      <field column="category" xpath="/rss/channel/item/category" />
      <field column="author" xpath="/rss/channel/item/author" />
      <field column="authoremail" xpath="/rss/channel/item/authoremail" />
      <field column="content" xpath="/rss/channel/item/content" />
      <field column="summary" xpath="/rss/channel/item/summary" />
      <field column="index_category" template="ConnectionsBlogs"/>
    </entity>
  </document>
</dataConfig>

I can't seem to populate BOTH blogtitle and short_blogtitle with the same xpath. I can only do one or the other; why can't I put the same xpath in 2 different fields? I removed the short_blogtitle (with the xpath statement) and left in the regex statement, and blogtitle gets populated and short_blogtitle goes to my update.chain (to the auto complete index), but the field itself is blank in this index. If I leave the DIH as above, then blogtitle doesn't get populated but short_blogtitle does. What am I doing wrong here? Is there a way to populate both? And I CANNOT use copyField here, because then the update.chain won't work. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/RegexTransformer-and-xpath-in-DataImportHandler-tp4120946.html Sent from the Solr - User mailing list archive at Nabble.com.
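One possible workaround, sketched but untested: RegexTransformer can take its input from another column via sourceColName, so short_blogtitle may not need an xpath at all. Let blogtitle own the xpath and derive short_blogtitle from it:

<field column="blogtitle" xpath="/rss/channel/item/title" />
<field column="short_blogtitle" sourceColName="blogtitle" regex="^(.{250})([^\.]*\.)(.*)$" replaceWith="$1" />

Note also that this regex only matches titles longer than 250 characters; shorter titles would leave short_blogtitle empty unless a catch-all alternative is added to the pattern.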
Re: Facets, termvectors, relevancy and Multi word tokenizing
Hi guys, I'm on my way to solving it properly. This is how my field looks now:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(#)|(%23)" replacement="79f20724d6985c5b857d2fa06a3ff8c6"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(((?i)((european parliament)|(parlament europeenne)))|(EP)|(PE))" replacement="0ee062d61f44ae0a2aee145076ca6a69european_parliament"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" words="blacklist.txt" ignoreCase="true"/>
    <filter class="solr.StopFilterFactory" words="en" ignoreCase="true"/>
    <filter class="solr.HunspellStemFilterFactory" dictionary="en_GB.dic" affix="en_GB.aff" ignoreCase="true" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="0ee062d61f44ae0a2aee145076ca6a69european_parliament" replacement="european parliament" replace="all" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="79f20724d6985c5b857d2fa06a3ff8c6" replacement="#" replace="all" />
  </analyzer>
</fieldType>

I still have one case where I'm facing issues, because in fact I want to preserve the #: - #European Parliament is translated into one token instead of two: #European and Parliament... anyway, I have some ideas on how to do it. I'll let you know what the final solution is. -- View this message in context: http://lucene.472066.n3.nabble.com/Facets-termvectors-relevancy-and-Multi-word-tokenizing-tp4120101p4120948.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Elevation and core create
Hi Erick, Thanks for the response. On the wiki it states: "config-file: Path to the file that defines query elevation. This file must exist in $instanceDir/conf/config-file or $dataDir/config-file. If the file exists in the /conf/ directory it will be loaded once at startup. If it exists in the data directory, it will be reloaded for each IndexReader." Which is the elevate.xml. So it looks like I will go down the custom coding route. Regards, David Stuart M +44(0) 778 854 2157 T +44(0) 845 519 5465 www.axistwelve.com Axis12 Ltd | The Ivories | 6/18 Northampton Street, London | N1 2HY | UK AXIS12 - Enterprise Web Solutions On 2 Mar 2014, at 18:07, Erick Erickson erickerick...@gmail.com wrote: Hmmm, you _ought_ to be able to specify a relative path in <str name="confFiles">solrconfig_slave.xml:solrconfig.xml,x.xml,y.xml</str> But there's certainly the chance that this is hard-coded in the query elevation component, so I can't say that this'll work with assurance. Best, Erick On Sun, Mar 2, 2014 at 6:14 AM, David Stuart d...@axistwelve.com wrote: Hi, sorry for the cross post, but I got no response in the dev group so assumed I posted in the wrong place. I am using Solr 3.6 and am trying to automate the deployment of cores with a custom elevate file. It is proving to be difficult, as while most of the files (schema, stop words etc) support absolute paths, elevate seems to need to be in either a conf directory as a sibling to data or in the data directory itself. I am able to achieve my goal by having a secondary process that places the file, but thought I would ask the group just in case I have missed the obvious. Should I move to Solr 4? Is it fixed there? I could also go down the route of extending the SolrCore create function to accept additional params and move the file into the defined data directory. Ideas? Thanks for your help David Stuart M +44(0) 778 854 2157 T +44(0) 845 519 5465 www.axistwelve.com Axis12 Ltd | The Ivories | 6/18 Northampton Street, London | N1 2HY | UK AXIS12 - Enterprise Web Solutions
Re: range types in SOLR
The main reference for this approach is here: http://wiki.apache.org/solr/SpatialForTimeDurations The illustrations Hoss developed for the meetup presentation are great. However, there are bugs in the instructions: specifically, it's important to slightly buffer the query and choose an appropriate maxDistErr. Also, it's preferable to use the rectangle range style of spatial query (e.g. field:["minX minY" TO "maxX maxY"]) as opposed to using "Intersects(minX minY maxX maxY)". There's no technical difference, but the latter is deprecated and will eventually be removed from Solr 5 / trunk. All this said, recognize this is a bit of a hack (one that works well). There is a good chance a more ideal implementation approach is going to be developed this year. ~ David On 3/1/14, 2:54 PM, Shawn Heisey s...@elyograg.org wrote: On 3/1/2014 11:41 AM, Thomas Scheffler wrote: On 01.03.14 18:24, Erick Erickson wrote: I'm not clear what you're really after here. Solr certainly supports ranges, things like time:[* TO date_spec] or date_field:[date_spec TO date_spec] etc. There's also a really creative use of spatial (of all things) to, say, answer questions involving multiple dates per record. Imagine, for instance, employees with different hours on different days. You can use spatial to answer questions like which employees are available on Wednesday between 4PM and 8PM. And if none of this is relevant, how about you give us some use-cases? This could well be an XY problem. Hi, let's try this example to show the problem. You have some old text that was written in two periods of time: 1.) 2nd half of 13th century: - 1250-1299 2.) Beginning of 18th century: - 1700-1715 If you are searching for texts that were written between 1300-1699, then the document described above should not be hit. If you make start date and end date multiple, this results in: start: [1250, 1700] end: [1299, 1715] A search for documents written between 1300-1699 would be: (+start:[1300 TO 1699] +end:[1300 TO 1699]) (+start:[* TO 1300] +end:[1300 TO *]) (+start:[* TO 1699] +end:[1700 TO *]) You see that the document above would obviously be hit by (+start:[* TO 1300] +end:[1300 TO *]) This sounds exactly like the spatial use case that Erick just described. http://wiki.apache.org/solr/SpatialForTimeDurations https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/ I am not sure whether the following presentation covers time series with spatial, but it does say deep dive. It's over an hour long, and done by David Smiley, who wrote most of the Spatial code in Solr: http://www.lucenerevolution.org/2013/Lucene-Solr4-Spatial-Deep-Dive Hopefully someone who has actually used this can hop in and give you some additional pointers. Thanks, Shawn
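To make the hack concrete, a sketch under assumed settings (the field name "duration" and the world bounds are made up here, not taken from the wiki page): index each duration as a point with x = start year and y = end year, e.g. "1250 1299", on a spatial field whose worldBounds span -9999 to 9999 on both axes. A duration [s, e] then intersects the queried range [1300, 1699] exactly when s <= 1699 and e >= 1300, which as a rectangle range query is:

fq=duration:["-9999 1300" TO "1699 9999"]

The buffering caveat above still applies at the rectangle's edges.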
Re: Solution for reverse order of year facets?
Hi Michael, Yes, you are correct, oldest comes first. There is no built-in solution for this. Two workarounds: 1) use facet.limit=-1 and invert the list (faceting response) at client side 2) use multiple facet.query a) facet.query=year:[2012 TO 2014]&facet.query=year:[2010 TO 2012] b) facet.query=year:2014&facet.query=year:2013 ... On Monday, March 3, 2014 5:45 PM, Michael Lackhoff mich...@lackhoff.de wrote: On 03.03.2014 16:33 Ahmet Arslan wrote: Currently there are two sorting criteria available. However, sort by index - to return the constraints sorted in their index order (lexicographic by indexed term) - should return the most recent year at top, no? No, it returns them -- as you say -- in lexicographic order, and that means oldest first, like: 1815 1820 ... 2012 2013 (might well stop before we get here) 2014 -Michael
Re: Solr Permgen Exceptions when creating/removing cores
Thanks Tri, I really appreciate the response. When I get some free time shortly I'll start giving some of these a try and report back. On Mon, Mar 3, 2014 at 12:42 PM, Tri Cao tm...@me.com wrote: If it's really the interned strings, you could try upgrade JDK, as the newer HotSpot JVM puts interned strings in regular heap: http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html(search for String.intern() in that release) I haven't got a chance to look into the new core auto discovery code, so I don't know if it's implemented with reflection or not. Reflection and dynamic class loading is another source of PermGen exception, in my experience. I don't see anything wrong with your JVM config, which is very much standard. Hope this helps, Tri On Mar 03, 2014, at 08:52 AM, Josh jwda...@gmail.com wrote: In the user core there are two fields, the database core in question was 40, but in production environments the database core is dynamic. My time has been pretty crazy trying to get this out the door and we haven't tried a standard solr install yet but it's on my plate for the test app and I don't know enough about Solr/Bitnami to know if they've done any serious modifications to it. I had tried doing a dump from VisualVM previously but it didn't seem to give me anything useful but then again I didn't know how to look for interned strings. This is something I can take another look at in the coming weeks when I do my test case against a standard solr install with SolrJ. The exception with user cores happens after 80'ish runs, so 640'ish user cores with the PermGen set to 64MB. The database core test was far lower, it was in the 10-15 range. As a note once the permgen limit is hit, if we simply restart the service with the same number of cores loaded the permgen usage is minimal even with the amount of user cores being high in our production environment (500-600). If this does end up being the interning of strings, is there anyway it can be mitigated? Our production environment for our heavier users would see in the range of 3200+ user cores created a day. Thanks for the help. Josh On Mon, Mar 3, 2014 at 11:24 AM, Tri Cao tm...@me.com wrote: Hey Josh, I am not an expert in Java performance, but I would start with dumping a the heap and investigate with visualvm (the free tool that comes with JDK). In my experience, the most common cause for PermGen exception is the app creates too many interned strings. Solr (actually Lucene) interns the field names so if you have too many fields, it might be the cause. How many fields in total across cores did you create before the exception? Can you reproduce the problem with the standard Solr? Is the bitnami distribution just Solr or do they have some other libraries? Hope this helps, Tri On Mar 03, 2014, at 07:28 AM, Josh jwda...@gmail.com wrote: It's a windows installation using a bitnami solr installer. I incorrectly put 64M into the configuration for this, as I had copied the test configuration I was using to recreate the permgen issue we were seeing on our production system (that is configured to 512M) as it takes awhile with to recreate the issue with larger permgen values. In the test scenario there was a small 180 document data core that's static with 8 dynamic user cores that are used to index the unique document ids in the users view, which is then merged into a single user core. 
The final user core contains the same number of document ids as the data core and the data core is queried against with the ids in the final merged user core as the limiter. The user cores are then unloaded, and deleted from the drive and then the test is reran again with the user cores re-created We are also using the core discovery mode to store/find our cores and the database data core is using dynamic fields with a mix of single value and multi value fields. The user cores use a static configuration. The data is indexed from SQL Server using jtDS for both the user and data cores. As a note we also reversed the test case I mention above where we keep the user cores static and dynamically create the database core and this created the same issue only it leaked faster. We assumed this because the configuration was larger/loaded more classes then the simpler user core. When I get the time I'm going to put together a SolrJ test app to recreate the issue outside of our environment to see if others see the same issue we're seeing to rule out any kind of configuration problem. Right now we're interacting with solr with POCO via the restful interface and it's not very easy for us to spin this off into something someone else could use. In the mean time we've made changes to make the user cores more static, this has slowed down the build up of permgen to something that can
Re: Solution for reverse order of year facets?
On 3/3/2014 7:35 AM, Michael Lackhoff wrote: If I understand the docs right, it is only possible to sort facets by count or value in ascending order. Both variants are not very helpful for year facets if I want the most recent years at the top (or appear at all if I restrict the number of facet entries). There's already an issue in Jira. https://issues.apache.org/jira/browse/SOLR-1672 I can't take a look now, but I will later if someone else hasn't taken it up. Thanks, Shawn
Re: Solution for reverse order of year facets?
Hi Ahmet, There is no built-in solution for this. Yes, I know, that's why I would like the TokenFilterFactory. Two workarounds: 1) use facet.limit=-1 and invert the list (faceting response) at client side 2) use multiple facet.query a) facet.query=year:[2012 TO 2014]&facet.query=year:[2010 TO 2012] b) facet.query=year:2014&facet.query=year:2013 ... I thought about these but they have the disadvantage that 1) could return hundreds of facet entries. 2b) is better but would need about 30 facet queries, which makes quite a long URL, and it wouldn't always work as expected. There are subjects that were very popular in the past but with no (or very few) recent publications. For these I would get empty results for my 2014-1985 facet queries but miss all the stuff from the 1960s. From all these thoughts I came to the conclusion that a custom TokenFilterFactory could do exactly what I want. In effect it would give me a reverse sort: 10000 - 2014 = 7986 10000 - 2013 = 7987 ... The client code can easily regain the original year values for display. And I think it shouldn't be too difficult to write such a beast; the only problem is I am not a Java programmer. That is why I asked if someone has done it already or if there is a guide I could use. After all it is just a simple subtraction... -Michael
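Such a filter really is only a few lines; here is a minimal, untested sketch against the Lucene/Solr 4.x analysis API, using the 10000 - year mapping from above (the class name is made up):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Rewrites four-digit year tokens as (10000 - year), zero-padded, so that
// ascending lexicographic term order corresponds to descending year order.
public final class ReverseYearFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public ReverseYearFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    String term = termAtt.toString();
    if (term.matches("\\d{4}")) { // only touch plain year tokens
      int reversed = 10000 - Integer.parseInt(term);
      termAtt.setEmpty().append(String.format("%04d", reversed));
    }
    return true;
  }
}

A one-method TokenFilterFactory subclass whose create(TokenStream input) returns new ReverseYearFilter(input) is all that is needed to reference it from schema.xml, and the client undoes the mapping with the same subtraction.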
Re: Solr Heap, MMaps and Garbage Collection
Regarding PermGen: Yes we have a bunch of custom jars loaded in solrcloud (containing custom parsing, analyzers). But I haven't specifically enabled any string interning. Does solr intern all strings in a collection by default? I agree with doc and Filter Query Cache. Query Result cache hits are practically 0 for the large collection since our queries are tail by nature Thanks Nitin On Mon, Mar 3, 2014 at 5:01 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: On 3/3/2014 1:54 AM, KNitin wrote: 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings) As others have pointed out, this is really unusual for Solr. We often see high permgen in our app servers due to dynamic class loading that the framework performs; maybe you are somehow loading lots of new Solr plugins, or otherwise creating lots of classes? Of course if you have a plugin or something that does a lot of string interning, that could also be an explanation. -Mike
Re: Solution for reverse order of year facets?
On 03.03.2014 19:58 Shawn Heisey wrote: There's already an issue in Jira. https://issues.apache.org/jira/browse/SOLR-1672 Thanks, this is of course the best solution. The only problem is that I use a custom version from a vendor (based on version 4.3) that I want to enhance. But perhaps they will apply the patch. In the meantime I still think the custom filter could be a workaround. I can't take a look now, but I will later if someone else hasn't taken it up. That would be great! Thanks -Michael
Re: Solr Heap, MMaps and Garbage Collection
Is there a way to dump the contents of permgen and look at which classes are occupying the most memory in that? - Nitin On Mon, Mar 3, 2014 at 11:19 AM, KNitin nitin.t...@gmail.com wrote: Regarding PermGen: Yes we have a bunch of custom jars loaded in solrcloud (containing custom parsing, analyzers). But I haven't specifically enabled any string interning. Does solr intern all strings in a collection by default? I agree with doc and Filter Query Cache. Query Result cache hits are practically 0 for the large collection since our queries are tail by nature Thanks Nitin On Mon, Mar 3, 2014 at 5:01 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: On 3/3/2014 1:54 AM, KNitin wrote: 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings) As others have pointed out, this is really unusual for Solr. We often see high permgen in our app servers due to dynamic class loading that the framework performs; maybe you are somehow loading lots of new Solr plugins, or otherwise creating lots of classes? Of course if you have a plugin or something that does a lot of string interning, that could also be an explanation. -Mike
SOLR and Kerberos enabled HDFS
Hello, I am trying to connect SOLR (tried 4.4 and 4.7) to kerberos enabled HDFS - I am using Cloudera CDH 4.2.1 http://maven-repository.com/artifact/com.cloudera.cdh/cdh-root/4.2.1/pom_effective the keytab and principal is valid (I tested it with flume as well as simple hdfs cli) did anobody successfully connect SOLR 4.x to CDH 4.2.1? str name=solr.hdfs.security.kerberos.enabled${solr.hdfs.security.kerberos.enabled:true}/str str name=solr.hdfs.security.kerberos.keytabfile${solr.hdfs.security.kerberos.keytabfile:/my.keytab}/str str name=solr.hdfs.security.kerberos.principal${ solr.hdfs.security.kerberos.principal:m...@mydomain.com}/str I am getting follow error HTTP Status 500 - {msg=SolrCore 'collection1' is not available due to init failure: java.io.IOException: Login failure for m...@mydomain.com from keytab /my.keytab, trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: java.io.IOException: Login failure for m...@mydomain.com from keytab /my.keytab at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:251) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.RuntimeException: java.io.IOException: Login failure for me@MYDOMAIN.COMfrom keytab /my.keytab at org.apache.solr.core.HdfsDirectoryFactory.initKerberos(HdfsDirectoryFactory.java:282) at org.apache.solr.core.HdfsDirectoryFactory.init(HdfsDirectoryFactory.java:90) at org.apache.solr.core.SolrCore.initDirectoryFactory(SolrCore.java:443) at org.apache.solr.core.SolrCore.init(SolrCore.java:672) at org.apache.solr.core.SolrCore.init(SolrCore.java:629) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:622) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:657) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at 
java.util.concurrent.FutureTask.run(FutureTask.java:138) ... ... 3 more Caused by: java.io.IOException: Login failure for m...@mydomain.com from keytab /my.keytab at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:825) at org.apache.solr.core.HdfsDirectoryFactory.initKerberos(HdfsDirectoryFactory.java:280) ... 16 more Caused by: javax.security.auth.login.LoginException: java.lang.IllegalArgumentException: Illegal principal name m...@mydomain.com at org.apache.hadoop.security.User.init(User.java:50) at org.apache.hadoop.security.User.init(User.java:43) at org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule.commit(UserGroupInformation.java:159) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:769) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:186) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:706) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:703) at
Wildcard searches and tokenization
I'm working on a user name autocomplete feature, and am having some issues with the way we are tokenizing user names. We're using the StandardTokenizerFactory to tokenize user names, so foo-bar gets split into two tokens. We take input from the user and use it as a prefix to search on the user name. This means wildcard searches of fo* and ba* both return foo-bar, which is what we want. We have a problem when someone types in foo-b as a prefix. I would like to split this into foo and b, then use each as a prefix in a wildcard search. Is there an easy way to tell Solr, "Tokenize this, then do a prefix search"? I've written at least one QParserPlugin, so that's an option. Hopefully there's an easier way I'm unaware of. - Hayden
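One low-tech option, assuming the per-token prefix search itself already behaves (the field name "username" and the client-side splitting rule are assumptions): split the user's input on the client roughly the way StandardTokenizer would, and AND together one prefix clause per token:

import org.apache.solr.client.solrj.util.ClientUtils;

// Turns "foo-b" into: username:foo* AND username:b*
String input = "foo-b";
StringBuilder q = new StringBuilder();
for (String tok : input.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
  if (tok.isEmpty()) continue; // split() can yield a leading empty token
  if (q.length() > 0) q.append(" AND ");
  q.append("username:").append(ClientUtils.escapeQueryChars(tok)).append('*');
}

This only approximates StandardTokenizer's rules; if exact agreement matters, the input could instead be run through the field's analyzer first (for example via an analysis request from SolrJ) before building the prefix clauses.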
What types is supported by Solrj addBean() in the fields of POJO objects?
What are the supported types of the POJO objects that are sent to SolrServer.addBean(obj)? A quick glance at DocumentObjectBinder seems to suggest that an arbitrary combination of Collection, List, ArrayList, array ([]), Map, HashMap, of primitive types, String and Date is supported, but I'm not too sure. I would also like to know what Solr field types are allowed for each object's (Java) field type. Is there documentation explaining this? Kuro
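Not an authoritative list, but the pattern that commonly works is primitives, String, and Date, plus arrays/collections of those for multiValued fields, each mapped with the @Field annotation. A sketch with made-up field names, which must exist in schema.xml with compatible types:

import java.util.Date;
import java.util.List;
import org.apache.solr.client.solrj.beans.Field;

public class Book {
  @Field("id") public String id;                 // string field
  @Field("title_t") public String title;         // text field
  @Field("pages_i") public int pages;            // int field
  @Field("published_dt") public Date published;  // date field
  @Field("tags_ss") public List<String> tags;    // multiValued string field
}

// server.addBean(new Book(...)) then maps each annotated member onto its Solr field.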
Re: Solution for reverse order of year facets?
Hi, Regarding just a simple subtraction, you can do it in indexer code or in an update processor too. You can either modify the original field or you can create an additional one. JavaScript could be used: http://wiki.apache.org/solr/ScriptUpdateProcessor Ahmet On Monday, March 3, 2014 9:11 PM, Michael Lackhoff mich...@lackhoff.de wrote: Hi Ahmet, There is no built-in solution for this. Yes, I know, that's why I would like the TokenFilterFactory. Two workarounds: 1) use facet.limit=-1 and invert the list (faceting response) at client side 2) use multiple facet.query a) facet.query=year:[2012 TO 2014]&facet.query=year:[2010 TO 2012] b) facet.query=year:2014&facet.query=year:2013 ... I thought about these but they have the disadvantage that 1) could return hundreds of facet entries. 2b) is better but would need about 30 facet queries, which makes quite a long URL, and it wouldn't always work as expected. There are subjects that were very popular in the past but with no (or very few) recent publications. For these I would get empty results for my 2014-1985 facet queries but miss all the stuff from the 1960s. From all these thoughts I came to the conclusion that a custom TokenFilterFactory could do exactly what I want. In effect it would give me a reverse sort: 10000 - 2014 = 7986 10000 - 2013 = 7987 ... The client code can easily regain the original year values for display. And I think it shouldn't be too difficult to write such a beast; the only problem is I am not a Java programmer. That is why I asked if someone has done it already or if there is a guide I could use. After all it is just a simple subtraction... -Michael
Re: Solr Heap, MMaps and Garbage Collection
If you just want to see which classes are occupying the most memory in a live JVM, you can do: jmap -permstat pid I don't think you can dump the contents of PERM space. Hope this helps, Tri On Mar 03, 2014, at 11:41 AM, KNitin nitin.t...@gmail.com wrote: Is there a way to dump the contents of permgen and look at which classes are occupying the most memory in that? - Nitin On Mon, Mar 3, 2014 at 11:19 AM, KNitin nitin.t...@gmail.com wrote: Regarding PermGen: Yes, we have a bunch of custom jars loaded in solrcloud (containing custom parsing, analyzers). But I haven't specifically enabled any string interning. Does solr intern all strings in a collection by default? I agree with doc and Filter Query Cache. Query Result cache hits are practically 0 for the large collection since our queries are tail by nature Thanks Nitin On Mon, Mar 3, 2014 at 5:01 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: On 3/3/2014 1:54 AM, KNitin wrote: 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings) As others have pointed out, this is really unusual for Solr. We often see high permgen in our app servers due to dynamic class loading that the framework performs; maybe you are somehow loading lots of new Solr plugins, or otherwise creating lots of classes? Of course if you have a plugin or something that does a lot of string interning, that could also be an explanation. -Mike
Re: Solution for reverse order of year facets?
Hi Michael, I forgot to include what I did for one customer : 1) Using StatsComponent I get min and max values of the field (year) 2) Calculate smart gap/range values according to minimum and maximum. 3) Re-issue the same query (for thee second time) that includes a set of facet.query. Ahmet On Monday, March 3, 2014 10:30 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Regarding just a simple subtraction you do it in indexer code or in a update prcessor too. You can either modify original field or you can create an additional one. Java-script could be used : http://wiki.apache.org/solr/ScriptUpdateProcessor Ahmet On Monday, March 3, 2014 9:11 PM, Michael Lackhoff mich...@lackhoff.de wrote: Hi Ahmet, There is no built in solution for this. Yes, I know, that's why I would like the TokenFilterFactory Two workaround : 1) use facet.limit=-1 and invert the list (faceting response) at client side 2) use multiples facet.query a)facet.query=year:[2012 TO 2014]facet.query=year:[2010 TO 2012] b)facet.query=year:2014facet.query=year:2013 ... I thought about these but they have the disadvantage that 1) could return hundreds of facet entries. 2b) is better but would need about 30 facet-queries which makes quite a long URL and it wouldn't always work as expected. There are subjects that were very popular in the past but with no (or very few) recent publications. For these I would get empty results for my 2014-1985 facet-queries but miss all the stuff from the 1960s. From all these thoughts I came to the conclusion that a custom TokenFilterFactory could do exactly what I want. In effect it would give me a reverse sort: 1 - 2014 = 7986 1 - 2013 = 7987 ... The client code can easily regain the original year values for display. And I think it shouldn't be too difficult to write such a beast, only problem is I am not a Java programmer. That is why I asked if someone has done it already or if there is a guide I could use. After all it is just a simple subtraction... -Michael
Re: network slows when solr is running - help
How frequently are you committing? Frequent commits can slow everything down. -- View this message in context: http://lucene.472066.n3.nabble.com/network-slows-when-solr-is-running-help-tp4120523p4120992.html Sent from the Solr - User mailing list archive at Nabble.com.
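If explicit commits from the indexing client turn out to be the culprit, one common mitigation (the values below are placeholders, not a recommendation) is to stop committing per batch and let solrconfig.xml bound the commit frequency instead:

<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit at most once a minute -->
  <openSearcher>false</openSearcher>  <!-- don't reopen searchers on hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>             <!-- new documents become searchable within ~5s -->
</autoSoftCommit>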
Re: Boost query syntax error
: But this query does not work: : : q={!boost : b=if(exists(query({!v='user_type:ADMIN'})),10,1)}id:1&rows=1&fl=*,score : It gives an error like this: The problem is the way you are trying to nest queries inside of each other w/o any sort of quoting -- the parser has no indication that the b param is if(exists(query({!v='user_type:ADMIN'})),10,1) it thinks it's if(exists(query({!v='user_type:ADMIN' and the rest is confusing it. If you quote the b param to the boost parser, then it should work... http://localhost:8983/solr/select?q={!boost%20b=%22if%28exists%28query%28{!v=%27foo_s:ADMIN%27}%29%29,10,1%29%22}id:1 ...or if you use variable dereferencing, either of these should work... http://localhost:8983/solr/select?q={!boost%20b=$b}id:1&b=if%28exists%28query%28{!v=%27foo_s:ADMIN%27}%29%29,10,1%29 http://localhost:8983/solr/select?q={!boost%20b=if(exists(query($nestedq)),10,1)}id:1&nestedq=foo_s:ADMIN -Hoss http://www.lucidworks.com/
Re[2]: query parameters
ok i like the logic, you can do much more. i think this should do it for me: (-organisations:[ TO *] -roles:[ TO *]) (+organisations:(150 42) +roles:(174 72)) i want to use this in fq and i need to set the operator to OR. My q.op is AND but I need OR in fq. I have read about ofq but that is for putting OR between multiple fq. Can I set the operator for fq? The statement should find all docs without organisations and roles or those that have at least one roles and organisations entry. these fields are multivalued. -Original-Nachricht- Von: Erick Erickson erickerick...@gmail.com An: solr-user@lucene.apache.org Datum: 19/02/2014 04:09 Betreff: Re: query parameters Solr/Lucene query language is NOT strictly boolean, see Chris's excellent blog here: http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/ Best, Erick On Tue, Feb 18, 2014 at 11:54 AM, Andreas Owen a...@conx.ch wrote: I tried it in solr admin query and it showed me all the docs without a value in ogranisations and roles. It didn't matter if i used a base term, isn't that give through the q-parameter? -Original Message- From: Raymond Wiker [mailto:rwi...@gmail.com] Sent: Dienstag, 18. Februar 2014 13:19 To: solr-user@lucene.apache.org Subject: Re: query parameters That could be because the second condition does not do what you think it does... have you tried running the second condition separately? You may have to add a base term to the second condition, like what you have for the bq parameter in your config file; i.e, something like (*:* -organisations:[ TO *] -roles:[ TO *]) On Tue, Feb 18, 2014 at 12:16 PM, Andreas Owen a...@conx.ch wrote: It seams that fq doesn't except OR because: (organisations:(150 OR 41) AND roles:(174)) OR (-organisations:[ TO *] AND -roles:[ TO *]) only returns docs that match the first conditions. it doesn't return any docs with the empty fields organisations and roles. -Original Message- From: Andreas Owen [mailto:a...@conx.ch] Sent: Montag, 17. Februar 2014 05:08 To: solr-user@lucene.apache.org Subject: query parameters in solrconfig of my solr 4.3 i have a userdefined requestHandler. i would like to use fq to force the following conditions: 1: organisations is empty and roles is empty 2: organisations contains one of the commadelimited list in variable $org 3: roles contains one of the commadelimited list in variable $r 4: rule 2 and 3 snipet of what i got (havent checked out if the is a in operator like in sql for the list value) lst name=defaults str name=echoParamsexplicit/str int name=rows10/int str name=defTypeedismax/str str name=synonymstrue/str str name=qfplain_text^10 editorschoice^200 title^20 h_*^14 tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10 contentmanager^5 links^5 last_modified^5 url^5 /str str name=fq(organisations='' roles='') or (organisations=$org roles=$r) or (organisations='' roles=$r) or (organisations=$org roles='')/str str name=bq(expiration:[NOW TO *] OR (*:* -expiration:*))^6/str !-- tested: now or newer or empty gets small boost -- str name=bfdiv(clicks,max(displays,1))^8/str !-- tested --
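One possibility, untested here: local params let a single fq carry its own operator regardless of the global q.op, e.g. (assuming the empty-looking ranges above are meant to be [* TO *]):

fq={!q.op=OR}(-organisations:[* TO *] -roles:[* TO *]) (+organisations:(150 42) +roles:(174 72))

As noted earlier in the thread, the purely negative clause may still need a *:* base term to match anything: (*:* -organisations:[* TO *] -roles:[* TO *]).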
Re: Configuration problem
On 3/3/2014 9:02 AM, Thomas Fischer wrote: The setting is solr directories (I use different solr versions at the same time): /srv/solr/solr4.6.1 is the solr home, in solr home is a file solr.xml of the new discovery type (no cores), and inside the core directories are empty files core.properties and symbolic links to the universal conf directory. solr webapps (I use very different webapps simultaneously): /srv/www/webapps/solr/solr4.6.1 is the solr webapp I tried to convey this information to the tomcat server by putting a file solr4.6.1.xml into the catalina/localhost folder with the contents <?xml version="1.0" encoding="utf-8"?> <Context docBase="/srv/www/webapps/solr/solr4.6.1" debug="0" crossContext="true"> <Environment name="solr/home" type="java.lang.String" value="/srv/solr/solr4.6.1" override="true"/> </Context> Your message is buried deep in another message thread about NoSQL, because you replied to an existing message rather than starting a new message to solr-user@lucene.apache.org. On list-mirroring forums like Nabble, nobody will even see your message (or this reply) unless they actually open that other thread. This is what it looks like on a threading mail reader (Thunderbird): https://www.dropbox.com/s/87ilv7jls7y5gym/solr-reply-thread.png I don't use Tomcat, so I can't even begin to comment on that. I can talk about your solr home setting and what Solr is going to do with that. You probably do not have /srv/solr/solr4.6.1/solr.xml on your system. Solr will look for solr.xml in your solr home, and if it cannot find it, it assumes that you are not running multicore, so it looks for things like collection1/conf/solrconfig.xml instead. There is a solr.xml in the example. Use that, changing as necessary, or create a solr.xml file with just the following line in it. It will probably start working: <solr/> You *might* need the following instead, but since Solr uses standard XML parsing libraries, I would guess that the above line will work. <solr> </solr> Thanks, Shawn
is it possible to consolidate filterquery cache strings
lets say I have a largish set of data (120M docs) and that I am partitioning my data by groups of states (using the state codes). Someone suggested that I could use the following format in my solrconfig.xml when defining the filter queries:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">State:AL</str>
      <str name="fq">State:AK</str>
      ...
      <str name="fq">State:WY</str>
    </lst>
  </arr>
</listener>

Would that work, and if so how would I know that the cache is being hit? Or do I need to use the following traditional syntax instead:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">State:AL</str>
    </lst>
    <lst>
      <str name="q">*:*</str>
      <str name="fq">State:AK</str>
    </lst>
    ...
    <lst>
      <str name="q">*:*</str>
      <str name="fq">State:WY</str>
    </lst>
  </arr>
</listener>

any help appreciated -- View this message in context: http://lucene.472066.n3.nabble.com/is-it-possible-to-consolidate-filterquery-cache-strings-tp4121005.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud Startup
A quick ping on this. To give more stats, I have 100's of collections on every node. The time it takes for one collection to boot up (loadOnStartup) is around 10-20 seconds (and sometimes even 1 minute). I do not have any query auto warming etc. On a per collection basis I load a bunch of libraries (for custom analyzer plugins) to compute the classpath. That might be a reason for the high boot up time. My solrconfig.xml entry is as follows: <lib dir="/mnt/solr/lib/" regex=".*\.jar" /> Every core that boots up seems to be loading all jars over and over again. Is there a way to ask solr to load all jars only once? Thanks - Nitin On Wed, Feb 26, 2014 at 3:06 PM, KNitin nitin.t...@gmail.com wrote: Thanks, Shawn. I will try to upgrade solr soon. Reg firstSearcher: I think it does nothing now. I have configured it to use ExternalFileLoader, but the external file has no contents. Most of the queries hitting the collection are expensive and tail queries. What would be your recommendation to warm the firstSearcher/newSearcher? Thanks Nitin On Tue, Feb 25, 2014 at 4:12 PM, Shawn Heisey s...@elyograg.org wrote: On 2/25/2014 4:30 PM, KNitin wrote: Jeff: Thanks. I have tried reload before but it is not reliable (at least in 4.3.1). A few cores get initialized and a few don't (show as just recovering or down), and hence I had to move away from it. Is it a known issue in 4.3.1? With Solr 4.3.1, you are running into this bug with reloads under SolrCloud: https://issues.apache.org/jira/browse/SOLR-4805 The only way to recover from this bug is to restart Solr. The bug is fixed in 4.4.0 and later. Shawn, Otis, Erick: Yes, I have reviewed the page before and have given 1/4 of my mem to the JVM and the rest to the RAM/OS cache (15 GB heap and 45 GB to the rest; 60 GB machine in total). I have also reviewed the tlog files and they are in the order of KB (4-10 or 30). I have SSDs and the reads are hardly noticeable (in the order of 100 KB during that time frame). I have also disabled swap on all machines. Regarding firstSearcher: it is currently set to externalFileLoader. What is the use of first searcher? I haven't played around with it. I don't think it's a good idea to have extensive warming queries. I do exactly one query in firstSearcher and newSearcher: a query for all documents with zero rows, sorted on our most common sort field. This is designed purely to preload the sort data into the FieldCache. Thanks, Shawn
Re: SolrCloud Startup
On 3/3/2014 3:30 PM, KNitin wrote: A quick ping on this. To give more stats, I have 100's of collections on every node. The time it takes for one collection to boot up /loadonStartup is around 10-20 seconds (and sometimes even 1 minute). I do not have any query auto warming etc. On a per collection basis I load a bunch of libraries (for custom analyzer plugins) to compute the classpath. That might be a reason for the high boot up time My solrconfig.xml entry is as follows lib dir=/mnt/solr/lib/ regex=.*\.jar / Every core that boots up seems to be loading all jars over and over again. Is there a way to ask solr to load all jars only once? Three steps: 1) Get rid of all your lib directives in solrconfig.xml entirely. 2) Copy all the extra jars that you need into ${solr.solr.home}/lib. 3) Remove any sharedLib parameter from your solr.xml file. Step 3 is required because you are on 4.3.1 (or later if you have already upgraded). The final comment on the following issue summarizes issues that I ran into while migrating this approach from 4.2.1 to later releases: https://issues.apache.org/jira/browse/SOLR-4852 Thanks, Shawn
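With those steps the layout looks roughly like this (core names are hypothetical); the jars in the shared lib directory are then loaded a single time instead of once per core:

${solr.solr.home}/
    lib/                          all custom jars go here
    solr.xml                      with no sharedLib attribute
    collection_a/core.properties
    collection_b/core.properties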
Re: Configuration problem
On 03.03.2014 at 22:43, Shawn Heisey wrote: On 3/3/2014 9:02 AM, Thomas Fischer wrote: The setting is solr directories (I use different solr versions at the same time): /srv/solr/solr4.6.1 is the solr home, in solr home is a file solr.xml of the new discovery type (no cores), and inside the core directories are empty files core.properties and symbolic links to the universal conf directory. solr webapps (I use very different webapps simultaneously): /srv/www/webapps/solr/solr4.6.1 is the solr webapp I tried to convey this information to the tomcat server by putting a file solr4.6.1.xml into the catalina/localhost folder with the contents <?xml version="1.0" encoding="utf-8"?> <Context docBase="/srv/www/webapps/solr/solr4.6.1" debug="0" crossContext="true"> <Environment name="solr/home" type="java.lang.String" value="/srv/solr/solr4.6.1" override="true"/> </Context> Your message is buried deep in another message thread about NoSQL, because you replied to an existing message rather than starting a new message to solr-user@lucene.apache.org. On list-mirroring forums like Nabble, nobody will even see your message (or this reply) unless they actually open that other thread. This is what it looks like on a threading mail reader (Thunderbird): https://www.dropbox.com/s/87ilv7jls7y5gym/solr-reply-thread.png Yes, I'm sorry, I only realized afterwards that my question inherited the thread from the e-mail I was reading and using as a template for the answer. Meanwhile I figured out that I had overlooked the third place to define solr home for Tomcat (after JAVA_OPTS and JNDI): web.xml in WEB-INF of the given webapp. This overrides the other definitions and created the impression that I couldn't set solr home. But now I get the message Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml for the core geo. In the solr wiki I read (http://wiki.apache.org/solr/ConfiguringSolr): In each core, Solr will look for a conf/solrconfig.xml file and expected solr to look for /srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml (which exists), but obviously it doesn't. Why? My misunderstanding? Best Thomas
Re: is it possible to consolidate filterquery cache strings
note: by partitioning I mean that I have sharded the 120M docs into 9 Solr partitions (each on a separate server) -- View this message in context: http://lucene.472066.n3.nabble.com/is-it-possible-to-consolidate-filterquery-cache-strings-tp4121005p4121012.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud Startup
Thanks, Shawn. Right now my solr.solr.home is not being passed from the java runtime Lets say /mnt/solr/ is my solr root. I can add all jars to /mnt/solr/lib/ and use -Dsolr.solr.home=/mnt/solr/ , that should do it right? Thanks Nitin On Mon, Mar 3, 2014 at 2:44 PM, Shawn Heisey s...@elyograg.org wrote: On 3/3/2014 3:30 PM, KNitin wrote: A quick ping on this. To give more stats, I have 100's of collections on every node. The time it takes for one collection to boot up /loadonStartup is around 10-20 seconds (and sometimes even 1 minute). I do not have any query auto warming etc. On a per collection basis I load a bunch of libraries (for custom analyzer plugins) to compute the classpath. That might be a reason for the high boot up time My solrconfig.xml entry is as follows lib dir=/mnt/solr/lib/ regex=.*\.jar / Every core that boots up seems to be loading all jars over and over again. Is there a way to ask solr to load all jars only once? Three steps: 1) Get rid of all your lib directives in solrconfig.xml entirely. 2) Copy all the extra jars that you need into ${solr.solr.home}/lib. 3) Remove any sharedLib parameter from your solr.xml file. Step 3 is required because you are on 4.3.1 (or later if you have already upgraded). The final comment on the following issue summarizes issues that I ran into while migrating this approach from 4.2.1 to later releases: https://issues.apache.org/jira/browse/SOLR-4852 Thanks, Shawn
solrconfig.xml
Hello, I'm sorry to repeat myself but I didn't manage to get out of the thread I inadvertently slipped into. My problem now is this: I have a core geo (with an empty file core.properties inside) and solrconfig.xml at /srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml following the hint from the solr wiki (http://wiki.apache.org/solr/ConfiguringSolr): In each core, Solr will look for a conf/solrconfig.xml file But I get the error message: Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml Why? My misunderstanding? Best Thomas
Re: is it possible to consolidate filterquery cache strings
: Would that work, and if so how would I know that the cache is being hit? It should work -- filters are evaluated independently, so the fact that you are using all of them in one query (vs all of them in individual queries) won't change anything as far as the filterCache goes. You can prove that it works by looking at the cache stats (available from the Admin UI) after opening a new searcher and verifying that they are all in the new caches. You can also then do a query for something like q=foo&fq=State:AK and reload the cache stats and see a hit on your filterCache. : Or do I need to use the following traditional syntax instead: The only reason to break them all out like that is if you, in addition to populating the *filterCache*, also want to populate the *queryResultCache* with ~50 queries for *:*, each with a different fq applied. -Hoss http://www.lucidworks.com/
Re: Boost query syntax error
All of them work like a charm! Thanks, Chris. On Mon, Mar 3, 2014 at 1:28 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : But this query does not work: : : q={!boost : b=if(exists(query({!v='user_type:ADMIN'})),10,1)}id:1rows=1fl=*,score : It gives an error like this: The problem is the way you are trying to nest queries inside of each other w/o any sort of quoting -- the parser has no indication that the b param is if(exists(query({!v='user_type:ADMIN'})),10,1) it thinks it' if(exists(query({!v='user_type:ADMIN' and the rest is confusing it. If you quote the b param to the boost parser, then it should work... http://localhost:8983/solr/select?q={!boost%20b=%22if%28exists%28query%28{!v=%27foo_s:ADMIN%27}%29%29,10,1%29%22}id:1 ...or if you could use variable derefrencing, either of these should work... http://localhost:8983/solr/select?q={!boost%20b=$b}id:1b=if%28exists%28query%28{!v=%27foo_s:ADMIN%27}%29%29,10,1%29 http://localhost:8983/solr/select?q={!boost%20b=if(exists(query($nestedq)),10,1)}id:1nestedq=foo_s:ADMIN -Hoss http://www.lucidworks.com/
Re: solrconfig.xml
File permissions? Malformed XML? Are there any other exceptions earlier in the log? If you substitute that file with one from example distribution, does it work? Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Mar 4, 2014 at 6:07 AM, Thomas Fischer fischer...@aon.at wrote: Hello, I'm sorry to repeat myself but I didn't manage to get out of the thread I inadvertently slipped into. My problem now is this: I have a core geo (with an empty file core.properties inside) and solrconfig.xml at /srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml following the hint from the solr wiki (http://wiki.apache.org/solr/ConfiguringSolr): In each core, Solr will look for a conf/solrconfig.xml file But I get the error message: Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml Why? My misunderstanding? Best Thomas
Re: is it possible to consolidate filterquery cache strings
would not breaking the FQs out by state be faster for warming up the fq caches? -- View this message in context: http://lucene.472066.n3.nabble.com/is-it-possible-to-consolidate-filterquery-cache-strings-tp4121005p4121030.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solrconfig.xml
: I have a core geo (with an empty file core.properties inside) and solrconfig.xml at : /srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml ... : But I get the error message: : Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml 1) what does your solr.xml file look like? 2) what does cores/geo/core.properties look like? 3) do you get any other errors before this one in your log? 4) what kind of file permissions are set on cores, cores/geo, cores/geo/conf, etc...? It's possible that this is just a mistake in the error message after some real error with your actual geo/conf/solrconfig.xml has already been logged. Or it's possible that solr couldn't read geo/conf/solrconfig.xml (permissions) and tried to fall back by looking for geo/solrconfig.xml (we used to do that, look in the instanceDir as a last resort -- not sure if the code is still in there) and you're just looking at the last error. -Hoss http://www.lucidworks.com/
Re: java.lang.Exception: Conflict with StreamingUpdateSolrServer
: Subject: java.lang.Exception: Conflict with StreamingUpdateSolrServer the fact that you are using StreamingUpdateSolrServer isn't really a factor here -- what matters is the data you are sending to solr in the updates... : location=StreamingUpdateSolrServer line=162 Status for: null is 409 ... : Conflict A 409 HTTP status is a Conflict. It means that optimistic concurrency failed: your update indicated a document version, but the version of the document on the server didn't meet the version requirements... https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency -Hoss http://www.lucidworks.com/
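For reference, a minimal SolrJ sketch of how the conflict arises; the field values and version number are made up:

import org.apache.solr.common.SolrInputDocument;

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
doc.addField("price_i", 42);
// Optimistic concurrency: a _version_ greater than 1 must match the stored
// version exactly; exactly 1 requires that the document exist; a negative
// value requires that it not exist. Any violation comes back as HTTP 409.
doc.addField("_version_", 1461734983402979328L); // hypothetical stored version
server.add(doc); // fails with 409 Conflict if the versions differ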
Re: Searching with special chars
Since there was no quick workaround to this issue, we simply changed the HTTP method from GET to POST, to avoid further problems which could be triggered by user input too. Though this violates RESTful conventions... at least we have something running properly. - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-with-special-chars-tp4120047p4121043.html Sent from the Solr - User mailing list archive at Nabble.com.
Please add me to wiki contributors
Hi, Can you please add me to wiki contributors. I wanted to add some stats on Linux vs Windows we came across recently, CSV update handler examples, and also wanted to add company name to public server page. Thanks, Susheel
Automate search results filtering based on scoring
Hi, We are looking to automate searches (name searches) and filter out results based on some scoring confidence. Any suggestions on what different approaches we can use to pick only the top closest matches and filter out the rest of the results? Thanks, Susheel
Re: java.lang.Exception: Conflict with StreamingUpdateSolrServer
Thanks Chris, I found in our application code that it was related to an optimistic concurrency failure. On Mon, Mar 3, 2014 at 6:13 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Subject: java.lang.Exception: Conflict with StreamingUpdateSolrServer the fact that you are using StreamingUpdateSolrServer isn't really a factor here -- what matters is the data you are sending to solr in the updates... : location=StreamingUpdateSolrServer line=162 Status for: null is 409 ... : Conflict A 409 HTTP status is a Conflict. It means that optimistic concurrency failed: your update indicated a document version, but the version of the document on the server didn't meet the version requirements... https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency -Hoss http://www.lucidworks.com/
RE: Please add me to wiki contributors
My user name is SusheelKumar for solr wiki. -Original Message- From: Susheel Kumar [mailto:susheel.ku...@thedigitalgroup.net] Sent: Monday, March 03, 2014 9:36 PM To: solr-user@lucene.apache.org Subject: Please add me to wiki contributors Hi, Can you please add me to wiki contributors. I wanted to add some stats on Linux vs Windows we came across recently, CSV update handler examples, and also wanted to add company name to public server page. Thanks, Susheel
Re: range types in SOLR
On 03.03.2014 19:12, Smiley, David W. wrote: The main reference for this approach is here: http://wiki.apache.org/solr/SpatialForTimeDurations Hoss's illustrations he developed for the meetup presentation are great. However, there are bugs in the instructions; specifically, it's important to slightly buffer the query and to choose an appropriate maxDistErr. Also, it's preferable to use the rectangle range-query style of spatial query (e.g. field:["minX minY" TO "maxX maxY"]) as opposed to using "Intersects(minX minY maxX maxY)". There's no technical difference, but the latter is deprecated and will eventually be removed from Solr 5 / trunk. All this said, recognize this is a bit of a hack (one that works well). There is a good chance a more ideal implementation approach will be developed this year. Thank you. Having a working example is great, but having a practically working example that hides this implementation detail would be even better. I would like to store 2014-03-04T07:05:12,345Z, 2014-03-04, 2014-03 and 2014 in one field and run queries on that field. Currently I have to normalize them all to the first format (inventing information), which is a poor approximation at best. Normalizing them to ranges would be best, in my opinion. Then a query like date:2014 would hit all of them, and so would date:[2014-01 TO 2014-03]. kind regards, Thomas
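To make the rectangle style concrete, here is a small sketch of such a query under the SpatialForTimeDurations scheme; the field name dateRange, the days-since-epoch encoding, and the domain bounds are assumptions for illustration, not from this thread:

    import org.apache.solr.client.solrj.SolrQuery;

    public class DurationQuery {
        public static void main(String[] args) {
            // Each indexed duration is a point: x = start, y = end (hypothetically, days since epoch).
            // A duration [s, e] intersects the query window [16071, 16160] iff s <= 16160 and e >= 16071,
            // which as a rectangle is x in [domainMin, 16160] and y in [16071, domainMax]:
            SolrQuery q = new SolrQuery("dateRange:[\"0 16071\" TO \"16160 99999\"]");
            // deprecated equivalent: dateRange:"Intersects(0 16071 16160 99999)"
            System.out.println(q.getQuery());
        }
    }

Remember Smiley's caveats: buffer the query window slightly and pick an appropriate maxDistErr, or durations touching the window boundary may be missed.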
Re: SOLRJ and SOLR compatibility
On 27.02.2014 09:15, Shawn Heisey wrote: On 2/27/2014 12:49 AM, Thomas Scheffler wrote: What problems have you seen with mixing 4.6.0 and 4.6.1? It's possible that I'm completely ignorant here, but I have not heard of any. Actually, bug reports reach me that sound like Unknown type 19. Aha! I found it! It was caused by the change applied for SOLR-5658, fixed in 4.7.0 (just released) by SOLR-5762. Just my luck that there's a bug bad enough to contradict what I told you. https://issues.apache.org/jira/browse/SOLR-5658 https://issues.apache.org/jira/browse/SOLR-5762 I've added a comment that will help users find SOLR-5762 with a search for Unknown type 19. If you use SolrJ 4.7.0, compatibility should be better. Hi, I am sorry to inform you that SolrJ 4.7.0 faces the same issue with SOLR 4.5.1. I received a client stack trace this morning and am still waiting for log output from the server:
--
ERROR unable to submit tasks
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Unknown type 19
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
--
There is not much information in that stack trace, I know. I'll send further information when I receive more. In the meantime I have asked our customer not to resolve the issue by upgrading the SOLR server, so we can dig deeper. kind regards, Thomas
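One pragmatic mitigation while this is investigated is to keep the client's SolrJ at the server's version; a sketch using the standard Maven coordinates, with the version pinned to the 4.5.1 server from this thread:

    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-solrj</artifactId>
      <version>4.5.1</version>
    </dependency>

Since the failure comes from a javabin serialization change on the client side (SOLR-5658, per Shawn above), a matching client never sends the type tag the older server cannot decode.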
Re: SOLRJ and SOLR compatibility
On 04.03.2014 07:21, Thomas Scheffler wrote: On 27.02.2014 09:15, Shawn Heisey wrote: On 2/27/2014 12:49 AM, Thomas Scheffler wrote: What problems have you seen with mixing 4.6.0 and 4.6.1? It's possible that I'm completely ignorant here, but I have not heard of any. Actually, bug reports reach me that sound like Unknown type 19. Aha! I found it! It was caused by the change applied for SOLR-5658, fixed in 4.7.0 (just released) by SOLR-5762. Just my luck that there's a bug bad enough to contradict what I told you. https://issues.apache.org/jira/browse/SOLR-5658 https://issues.apache.org/jira/browse/SOLR-5762 I've added a comment that will help users find SOLR-5762 with a search for Unknown type 19. If you use SolrJ 4.7.0, compatibility should be better. Hi, I am sorry to inform you that SolrJ 4.7.0 faces the same issue with SOLR 4.5.1. I received a client stack trace this morning and am still waiting for log output from the server: Here we go for the server side (4.5.1):
Mar 03, 2014 2:39:26 PM org.apache.solr.core.SolrCore execute
INFO: [clausthal_test] webapp=/solr path=/select params={fl=*,score&sort=mods.dateIssued+desc&q=%2BobjectType:mods+%2Bcategory:clausthal_status\:published&wt=javabin&version=2&rows=3} hits=186 status=0 QTime=2
Mar 03, 2014 2:39:38 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [clausthal_test] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 0
Mar 03, 2014 2:39:38 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: Unknown type 19
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:228)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Mar 03, 2014 2:39:38 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: Unknown type 19
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:228)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
    at