In-memory collections?
Hi, Is there a way I can configure Solr so that it handles its shards completely in memory? If yes, how? No writing to disk - neither transaction log nor Lucene indices. Of course I accept that data is lost if Solr crashes or is shut down. Regards, Per Steffensen
Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
On Thu, 2013-08-01 at 15:24 +0200, Grzegorz Sobczyk wrote:
> Today I found in solr logs exception: java.lang.OutOfMemoryError: Requested array size exceeds VM limit. At that time memory usage was ~200MB / Xmx3g [...]
> Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>   at org.apache.lucene.util.PriorityQueue.<init>(PriorityQueue.java:64)
>   at org.apache.lucene.util.PriorityQueue.<init>(PriorityQueue.java:37)
>   at org.apache.solr.handler.component.ShardFieldSortedHitQueue.<init>(ShardDoc.java:113)

Are you requesting a very large number of results? Integer.MAX_VALUE perhaps? If so, you need to change that to a more manageable number.

- Toke Eskildsen, State and University Library, Denmark
Re: In-memory collections?
On 8/7/2013 12:13 AM, Per Steffensen wrote:
> Is there a way I can configure Solr so that it handles its shards completely in memory? If yes, how? No writing to disk - neither transaction log nor Lucene indices. Of course I accept that data is lost if Solr crashes or is shut down.

The Lucene index part can be done using RAMDirectoryFactory. It's generally not a good idea, though. If you have enough RAM for that, then you have enough RAM to fit your entire index into the OS disk cache. I don't think you can do anything about the transaction log being on disk, but I could be incorrect about that.

Relying on the OS disk cache and the default directory implementation will usually give you equivalent or better query performance compared to putting your index into JVM memory. You won't need a massive Java heap and the garbage collection problems that it creates. A side bonus: you don't lose your index when Solr shuts down.

If you have extremely heavy indexing, then RAMDirectoryFactory might work better -- assuming you've got your GC heavily tuned. A potentially critical problem with RAMDirectoryFactory is that merging/optimizing will require at least twice as much RAM as your total index size. Here's a complete discussion about this: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

NB: That article was written for 3.x, when NRTCachingDirectoryFactory (the default in 4.x) wasn't available. The NRT factory *uses* MMapDirectory.

Thanks, Shawn
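For readers who want to try it anyway, the RAM-only index is a one-line change in solrconfig.xml. A minimal sketch against a 4.x-era config; all of Shawn's caveats apply:

```xml
<!-- solrconfig.xml: hold the Lucene index entirely in JVM heap.
     The index is lost on shutdown, merges need roughly 2x the index
     size in heap, and MMap/NRTCaching is usually faster anyway. -->
<directoryFactory name="DirectoryFactory"
                  class="solr.RAMDirectoryFactory"/>
```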
RE: entity classification solr
Yes, you can copyField the source's contents to another field and use the KeepWordTokenFilter to keep only those words you really care about. Using (e)dismax you can then apply a heavy boost on that field. All special words in that field will show up higher if queried for.

-Original message- From: smanad sma...@gmail.com Sent: Wednesday 7th August 2013 3:23 To: solr-user@lucene.apache.org Subject: entity classification solr

I have the following situation when using Solr 4.3. My document contains entities, for example "peanut butter". I have a list of such entities. These are items that go together and are not to be treated as two individual words. During indexing, I want Solr to realize this and treat "peanut butter" as an entity. For example, if someone searches for "peanut" then documents that have the word "peanut" should rank higher than documents that have "peanut butter". However, if someone searches for "peanut butter" then documents that have "peanut butter" should show up higher than ones that have just "peanut". Is there a config setting somewhere such that the entity list can be specified in a file and Solr would do the needful? Should I be using KeepWordFilterFactory for this? Any pointers will be much appreciated. Thanks, -Manasi

-- View this message in context: http://lucene.472066.n3.nabble.com/entity-classification-solr-tp4082923.html Sent from the Solr - User mailing list archive at Nabble.com.
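One way to wire up the copyField + KeepWordFilterFactory suggestion. Field names and the entities file are illustrative, and note that KeepWordFilterFactory operates on single tokens, so two-word entities like "peanut butter" would additionally need something like a ShingleFilterFactory in front of it:

```xml
<!-- schema.xml sketch: copy the body text into a keywords-only field,
     then boost that field at query time, e.g. defType=edismax with
     qf="content entities^10" -->
<fieldType name="text_entities" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeepWordFilterFactory"
            words="entities.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<field name="entities" type="text_entities" indexed="true" stored="false"/>
<copyField source="content" dest="entities"/>
```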
RE: Large config files in SolrCloud
With SOLR-5115 there's support for forcing ZkResourceLoader to fall back to SolrResourceLoader by using a file:/// prefix in your schema. This forces Solr to load the files from the FS as usual. https://issues.apache.org/jira/browse/SOLR-5115

-Original message- From: Markus Jelsma markus.jel...@openindex.io Sent: Friday 2nd August 2013 15:27 To: solr-user@lucene.apache.org Subject: RE: Large config files in SolrCloud

Ok, I managed to load a config file from the node's sharedLib directory by pointing it to `../../lib/file`. This works fine in normal mode; in cloud mode all it does is attempt to find it in ZooKeeper by trying /configs/COLLECTION_NAME/path. Properties such as ${solr.home} are not recognized in the schema, it seems. Any idea on how to force Solr in cloud mode to grab the file from the local FS? Thanks, Markus

-Original message- From: Markus Jelsma markus.jel...@openindex.io Sent: Friday 2nd August 2013 14:34 To: solr-user@lucene.apache.org Subject: RE: Large config files in SolrCloud

Yes, all the usual config files are well under 1MB and work as expected. This file is under 2MB and the limit I set is 5MB. Setting jute.maxbuffer (all lowercase) did work during a test a long time ago, but we'd like to put the new features in production and we're stuck at this trivial issue :) Thanks

-Original message- From: Erick Erickson erickerick...@gmail.com Sent: Friday 2nd August 2013 14:28 To: solr-user@lucene.apache.org Subject: Re: Large config files in SolrCloud

Hmmm, does it work with smaller config files? There's been a limit of 1M for ZK files, and I'm wondering if your setup would work with, say, 2M configs as a check that it's something else rather than just the 1M limit. FWIW, Erick

On Fri, Aug 2, 2013 at 8:18 AM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi, I have a few very large configuration files, but it doesn't work in cloud mode due to the KeeperException$ConnectionLossException. All 10 Solr nodes run trunk and have jute.maxbuffer set to 5242880 (5MB). I can confirm it is set properly by looking at the args in the Solr GUI. All ZooKeepers have exactly the same key/value set; I can confirm this by looking at the process with ps - it is really there, the first parameter. But it doesn't work! I have had it working once, but it doesn't like me anymore. Putting the config files in the node's sharedLib or the core's lib directory doesn't work either: the files are clearly loaded according to the logs, but the TokenFilters cannot access them and complain about a config file not being found. I'm out of ideas, any to share? Thanks, Markus
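For reference, jute.maxbuffer is a plain Java system property and, as the thread notes, has to carry the same value on every ZooKeeper server and every Solr JVM. A sketch of both invocations (paths and the zkHost value are placeholders):

```
# ZooKeeper side, e.g. via the JVMFLAGS environment variable read by zkServer.sh
export JVMFLAGS="-Djute.maxbuffer=5242880"

# Solr side, as a JVM argument (5242880 bytes = 5MB, matching the thread)
java -Djute.maxbuffer=5242880 -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar
```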
Re: In-memory collections?
On 8/7/13 9:04 AM, Shawn Heisey wrote:
> The lucene index part can be done using RAMDirectoryFactory. It's generally not a good idea, though. [... rest of Shawn's reply, quoted in full above ...]

Thanks, Shawn.

The thing is that this will be used for a small, ever-changing collection. In our system we load a lot of documents into a SolrCloud cluster. A lot of processes across numerous machines work in parallel on loading those documents. Those processes need to coordinate (hold each other back) from time to time, and they do so by taking distributed locks. Until now we have used the ZooKeeper cluster at hand for taking those distributed locks, but the need for locks is so heavy that it causes congestion in ZooKeeper, and ZooKeeper really cannot scale in that area. We could use several ZooKeeper clusters, but we have decided to use a "locking collection" in Solr instead - that will scale. You can implement locking in Solr using versioning and optimistic locking.

So this collection will at any time contain just the few locks (at most a few hundred) that are current right now. Lots of locks will be taken, but each of them will only exist for a few ms before being deleted again. Therefore it will not take up a lot of memory, I guess? I guess we will try RAMDirectoryFactory, and I will look into how we can avoid the Solr transaction log being written (to disk at least). Regards, Per Steffensen
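The versioning-based locking Per mentions maps onto Solr's optimistic concurrency: an add that carries _version_=-1 succeeds only if the document does not already exist, so the first writer wins and everyone else gets a version conflict and backs off. A minimal in-memory sketch of that pattern (the store below only simulates the semantics - no Solr involved, and all names are illustrative):

```python
# Simulation of Solr-style optimistic locking for a "locking collection".
# Solr semantics mirrored here: _version_ == -1 means "document must not exist";
# a positive _version_ means "must match the stored version exactly".

class VersionConflict(Exception):
    pass

class LockStore:
    def __init__(self):
        self._docs = {}          # lock_id -> (version, owner)
        self._next_version = 1   # monotonically increasing, like _version_

    def put(self, lock_id, owner, expected_version):
        current = self._docs.get(lock_id)
        if expected_version == -1 and current is not None:
            raise VersionConflict(lock_id)      # someone already holds it
        if expected_version > 0 and (current is None or current[0] != expected_version):
            raise VersionConflict(lock_id)      # stale version: lost the race
        self._next_version += 1
        self._docs[lock_id] = (self._next_version, owner)
        return self._next_version

    def delete(self, lock_id):
        self._docs.pop(lock_id, None)           # releasing the lock

def try_acquire(store, lock_id, owner):
    """Attempt to create the lock document; only the first caller succeeds."""
    try:
        store.put(lock_id, owner, expected_version=-1)
        return True
    except VersionConflict:
        return False

store = LockStore()
assert try_acquire(store, "lock-42", "worker-a")      # first taker wins
assert not try_acquire(store, "lock-42", "worker-b")  # second taker is refused
store.delete("lock-42")                               # release
assert try_acquire(store, "lock-42", "worker-b")      # free again
```

In real Solr the put is an add with the _version_ field set and the conflict shows up as an HTTP 409; the retry/back-off loop around try_acquire is up to the client.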
Solr 4.4. creating an index that 4.3 can't read (but in LUCENE_43 mode)
I had been running a Solr 4.3.0 index, which I upgraded to 4.4.0 (but hadn't changed LuceneVersion, so it was still using the LUCENE_43 codec). I then had to back out and return to a 4.3 system, and got an error when it tried to read the index. Now, it was only a dev system, so not a problem, and normally I would restore a backup anyway, but shouldn't this work? If I haven't changed the codec, then Solr 4.4 should be using the same code as 4.3, so the data should be compatible, no? I noticed it's in DocValues, but I thought they were supposed to be compatible using the default format, which we do use?

Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource: NIOFSIndexInput(path=/bb/news/search/solr/main/data/index/_3bs_Lucene42_0.dvm)): 1 (needs to be between 0 and 0)
  at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:148)
  at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:130)
  at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.<init>(Lucene42DocValuesProducer.java:84)
  at org.apache.lucene.codecs.lucene42.Lucene42DocValuesFormat.fieldsProducer(Lucene42DocValuesFormat.java:133)
  at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.<init>(PerFieldDocValuesFormat.java:213)
  at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:282)
  at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:134)
  at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
  at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62)
  at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
  at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
  at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88)
  at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
  at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:169)
  ... 18 more

Cheers, Daniel
Re: Solr Split Shard - Document loss and down time
Hi Erick, I have a question. Suppose an error occurs during a shard split - is there any way to revert the split? This is seriously breaking my head. For me, documents are getting lost when any of the nodes for that shard dies while the shard split is in progress. Thanks, Ranjith -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Split-Shard-Document-loss-and-down-time-tp4082002p4082973.html Sent from the Solr - User mailing list archive at Nabble.com.
Data Import from MYSQL and POSTGRESQL
For the data import handler I have moved the mysql and postgresql jar files to the solr lib directory (/opt/solr/lib). My issue is in the data-config.xml: I have put two datasources, but I am stuck on what to put for the driver values and the urls.

<dataSource name="mysql" driver="?" url="url" user="user" password="pass"/>
<dataSource name="postgresql" driver="?" url="url" user="user" password="pass"/>

Is anyone able to tell me what I should be putting for these values, please? -- View this message in context: http://lucene.472066.n3.nabble.com/Data-Import-from-MYSQL-and-POSTGRESQL-tp4082974.html Sent from the Solr - User mailing list archive at Nabble.com.
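For what it's worth, the standard JDBC driver classes of that era are com.mysql.jdbc.Driver (MySQL Connector/J) and org.postgresql.Driver (the PostgreSQL JDBC driver); the jars in /opt/solr/lib must provide those classes. A data-config.xml sketch with placeholder host, database, and credentials:

```xml
<dataConfig>
  <dataSource name="mysql"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="user" password="pass"/>
  <dataSource name="postgresql"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="user" password="pass"/>
  <!-- each <entity> then selects its source via dataSource="mysql" etc. -->
</dataConfig>
```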
Re: Solr Split Shard - Document loss and down time
Hi Ranjith, Here are a few things to note about shard split:

1. The command auto-retries. Also, if something went wrong during a split, you should wait for it to complete.
2. In case of a failure, the parent shard is supposed to stay intact, and the new sub-shards wouldn't replace it.
3. If you tried using 4.3.*, commit isn't called, so the documents wouldn't be visible on the sub-shards unless you issue an explicit commit.

Having said that, I'd highly recommend not using 4.3 for shard splitting. Can you explain further what you mean by documents getting lost? AFAIR, the code is supposed to handle failure midway through the shard split call, including a dead leader/overseer.

On Wed, Aug 7, 2013 at 3:07 PM, Ranjith Venkatesan ranjit...@zohocorp.com wrote: [original question quoted in full above]

-- Anshum Gupta http://www.anshumgupta.net
Data Import Handler Help
Hi, I'm looking for a bit of guidance in implementing a data import handler for MongoDB. I am using https://github.com/sucode/solrMongoDBImporter/blob/master/README.md as a starting point, and I can get full imports working properly with a few adjustments to the source. The problem comes when I try delta imports. After adding code to support delta queries and looking at how the SQL import handler works, I get delta reads, but the counts grow out of control. It's as if DocBuilder does not know when to stop processing. Example: I have one doc to be read but I get 2 docs added/updated. Has anyone seen this before? Using 4.2.0. Thanks
Solr doesn't make indexes for all the entries
Hello, I am a newbie to Solr. I have installed and configured it with my Django project. I am using the following versions: django-haystack 2.0.0, ApacheSolr 3.5.0, Django 1.4, mysql 5.5.32-0.

Here is the model whose data I want to index: http://tny.cz/422c5fb7
Here is search_indexes.py: http://tny.cz/8de95043

I have created the file templates/search/indexes/myapp/userprofile_text.txt and a template to show the results after querying the database. I have built the schema using the command $ ./manage.py build_solr_schema and replaced the contents of example/solr/conf/schema.xml with the output. Here it is: http://tny.cz/49fe8e1d

When I use the command $ ./manage.py rebuild_index to create indexes, it shows:

WARNING: This will irreparably remove EVERYTHING from your search index in connection 'default'. Your choices after this are to restore from backups or rebuild via the `rebuild_index` command. Are you sure you wish to continue? [y/N] y
Removing all documents from your index because you said so.
All documents removed.
Indexing 18 user profiles

But indexes are shown only for 10 user profiles. I am seeing the indexes here: http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on Only those 10 user profiles are found when searched for. Why does this happen, and how do I solve this issue? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-doesn-t-make-indexes-for-all-the-enteries-tp4082977.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: poor facet search performance
On Tue, 2013-07-30 at 21:48 +0200, Robert Stewart wrote:
> [Custom facet structure] Then we sorted those sets of facet fields by total document frequency so we enumerate the more frequent facet fields first, and we stop looking when we find a facet field which has less total document matches than the top N facet counts we are looking for.

So the structure was Facet->docIDs? A bit like enum in Solr? Your top-N cut-off is an interesting optimization for that.

> [...] The slaves just pre-load that binary structure directly into ram in one shot in the background when opening a new snapshot for search.

We used a similar pre-calculation some years ago but abandoned it, as the cost of pre-generate_structure + #duplicates * (distribute_structure + open_structure) was just as high as, and less flexible than, #duplicates * generate_structure for us.

> We have 200 million docs, 10 shards, about 20 facet fields, some of which contain about 20,000 unique values. We show top 10 facets for about 10 different fields in the results page. We provide search results with lots of facets and date counts in around 200-300ms using this technique. Currently, we are porting this entire system to SOLR. For a single core index of 8 million docs, using similar documents and facet fields from our production indexes, I can't get faceted search to perform anywhere close to 300ms for general searches. More like 1.5-3 seconds.

Solr fc faceting treats each facet independently and in a docID->facet manner, so what happens is

foreach facet {
  foreach docIDinResultSet {
    foreach tagIDinDocument {
      facet.counter[tagID]++
    }
  }
}

With 10 facets, 8M documents and 1 tag/doc/facet, the total loop count is 80M. That does not normally take 1.5-3 seconds, so something seems off. Do you have a lot of facet tags (aka terms) for each document?

> Is there anything else that I should look into for getting better facet performance?

Could you list the part of the Solr log with the facet structures? Just grep for UnInverted. They look something like this:

UnInverted multi-valued field {field=lma_long,memSize=42711405,tindexSize=42,time=979,phase1=964,nTerms=23,bigTerms=6,termInstances=1958468,uses=0}

> Given these metrics (200m docs, 20 facet fields, some fields with 20,000 unique values), what kind of facet search performance should I expect?

Due to the independent faceting handling in Solr, the facet time will scale a bit worse than linearly with the number of documents, relative to your test setup. With a loop count of 200M*10 (or 20? I am a bit confused about how many facets you show at a time) = 2G, this will take multiple seconds. Unless you go experimental (SOLR-2412, to bang my own drum), your facet count needs to go down or you need to shard with Solr.

> Also we need to issue frequent commits since we are constantly streaming new content into the system.

You could use a setup with a smaller live shard and multiple stale ones, but depending on corpus your ranking might suffer.

- Toke Eskildsen, State and University Library, Denmark
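The per-facet counting loop described above can be rendered literally; each facet is counted independently over the whole result set, which is where the 80M loop count comes from. A sketch with synthetic data (all names illustrative):

```python
# Literal rendering of the fc-faceting triple loop: for every facet,
# walk every doc in the result set and bump a counter per tag.
from collections import Counter

def facet_counts(result_docids, doc_tags_per_facet):
    """doc_tags_per_facet: facet_name -> {docid: [tagIDs]}"""
    counters = {}
    for facet, doc_tags in doc_tags_per_facet.items():      # foreach facet
        counter = Counter()
        for docid in result_docids:                         # foreach docID in result set
            for tag in doc_tags.get(docid, ()):             # foreach tagID in document
                counter[tag] += 1
        counters[facet] = counter
    return counters

docs = [1, 2, 3]
facets = {
    "colour": {1: ["red"], 2: ["red"], 3: ["blue"]},
    "size":   {1: ["L"],   2: ["M"],   3: ["M"]},
}
counts = facet_counts(docs, facets)
assert counts["colour"]["red"] == 2
assert counts["size"]["M"] == 2
# With 10 facets, 8M docs and 1 tag/doc/facet, the inner body runs 80M times,
# which is the loop count the message refers to.
```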
Solr 4.4 ShingleFilterFactory exception
Hi, I have set up solr 4.4 with cloud. When I start solr, I get an exception as below:

*ERROR [CoreContainer] Unable to create core: mycore_sh1: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType text_shingle: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.shingle.ShingleFilterFactory'*
  at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:164) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:268) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:655) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [:1.6.0_43]
  at java.util.concurrent.FutureTask.run(FutureTask.java:138) [:1.6.0_43]
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) [:1.6.0_43]
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [:1.6.0_43]
  at java.util.concurrent.FutureTask.run(FutureTask.java:138) [:1.6.0_43]
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [:1.6.0_43]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [:1.6.0_43]
  at java.lang.Thread.run(Thread.java:662) [:1.6.0_43]
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.shingle.ShingleFilterFactory'

The same file works well with solr 4.2. Pls help. Thanks, Prasi
Re: Multiple sorting does not work as expected
Well, at least it's not throwing an error <G>. Sorting on a tokenized field is not supported, or rather the behavior is undefined. Your Name field is tokenized if it's the stock text_en field. Best, Erick

On Tue, Aug 6, 2013 at 11:03 AM, Mysurf Mail stammail...@gmail.com wrote:

I don't see how it is sorted. This is the order as displayed above: 1. "BOM Total test2", 2. "BOM Total test - Copy", 3. "BOM Total test2", all with the same 2.2388418 score.

On Tue, Aug 6, 2013 at 5:28 PM, Jack Krupansky j...@basetechnology.com wrote:

The Name field is sorted as you have requested - desc. I suspect that you wanted name to be sorted asc (natural order). -- Jack Krupansky

-Original Message- From: Mysurf Mail Sent: Tuesday, August 06, 2013 10:22 AM To: solr-user@lucene.apache.org Subject: Re: Multiple sorting does not work as expected

My schema:

<field name="Name" type="text_en" indexed="true" stored="true" required="true"/>
<field name="Version" type="int" indexed="true" stored="true" required="true"/>

On Tue, Aug 6, 2013 at 5:06 PM, Mysurf Mail stammail...@gmail.com wrote:

My documents have 2 indexed attributes - name (string) and version (number). I want documents with the same score to be displayed in the following order: score (desc), name (desc), version (desc). Therefore I query using:

http://localhost:8983/solr/vault/select?q=BOM&fl=*,score&sort=score+desc,Name+desc,Version+desc

And I get the following inside the result:

<doc>
  <str name="Name">BOM Total test2</str> ...
  <int name="Version">2</int> ...
  <float name="score">2.2388418</float>
</doc>
<doc>
  <str name="Name">BOM Total test - Copy</str> ...
  <int name="Version">2</int> ...
  <float name="score">2.2388418</float>
</doc>
<doc>
  <str name="Name">BOM Total test2</str> ...
  <int name="Version">1</int> ...
  <float name="score">2.2388418</float>
</doc>

The scoring is equal, but the name is not sorted. What am I doing wrong here?
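The usual fix for sorting on a tokenized field is to sort on an untokenized copy of it. A schema sketch (the Name_sort field name is illustrative):

```xml
<!-- schema.xml: keep Name searchable, add an untokenized copy for sorting -->
<field name="Name"      type="text_en" indexed="true" stored="true" required="true"/>
<field name="Name_sort" type="string"  indexed="true" stored="false"/>
<copyField source="Name" dest="Name_sort"/>
```

The query would then use sort=score+desc,Name_sort+desc,Version+desc while still searching against Name.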
Re: Solr 4.4 ShingleFilterFactory exception
Any suggestions pls? On Wed, Aug 7, 2013 at 5:17 PM, Prasi S prasi1...@gmail.com wrote: [full message with stack trace quoted above] Thanks, Prasi
Re: 'Optimizing' Solr Index Size
The general advice is to not merge (optimize) unless your index is relatively static. You're quite correct: optimizing simply recovers the space from deleted documents; otherwise it won't change much (except leaving fewer segments). Here's a _great_ video that Mike McCandless put together: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

But in general, _whenever_ segments are merged, the resulting segment will have all the data from deleted docs removed, and segments are merged continually while data is being added to the index. Quick-n-dirty way to estimate the space savings optimize will give you: look at the admin page for the core; the ratio of deleted docs to numDocs is about the unused space that would be regained by an optimize. From there it's your call <G>... Best, Erick

On Tue, Aug 6, 2013 at 12:02 PM, Brendan Grainger brendan.grain...@gmail.com wrote:

To maybe answer another one of my questions, about the 50Gb recovered when running:

curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'

It looks to me like it was from deleted docs being completely removed from the index. Thanks

On Tue, Aug 6, 2013 at 11:45 AM, Brendan Grainger brendan.grain...@gmail.com wrote:

Well, I guess I can answer one of my questions, which I didn't exactly explicitly state: how do I force solr to merge segments down to a given maximum? I forgot about doing this:

curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'

which reduced the number of segments in my index from 12 to 10. Amazingly, it also reduced the space used by almost 50Gb. Is that even possible? Thanks again, Brendan

On Tue, Aug 6, 2013 at 10:55 AM, Brendan Grainger brendan.grain...@gmail.com wrote:

Hi All, First of all, what I was actually trying to do is get a little space back. So if there is a better way to do this by adjusting the MergePolicy or something else, please let me know. My index is currently 200Gb. In the past (Solr 1.4) we've found that optimizing the index will double the size of the index temporarily, then usually when it's done we end up with a smaller index and slightly faster search query times. Should I even bother optimizing? My impression was that with the TieredMergePolicy this would be less necessary. Would merging segments into larger ones save any space, and if so, is there a way to tell solr to do that? Thanks, Brendan -- Brendan Grainger www.kuripai.com
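Erick's quick-n-dirty estimate above can be turned into arithmetic: the fraction of the index's doc slots occupied by deletions approximates the fraction of bytes an optimize would reclaim. A sketch with made-up numbers (in the Solr admin UI, deleted docs = maxDoc - numDocs):

```python
# Rough estimate of the space an optimize would reclaim, per the
# deleted-docs-ratio rule of thumb in the message above.
def estimated_reclaim_bytes(index_size_bytes, num_docs, deleted_docs):
    # deleted_docs / maxDoc, where maxDoc = numDocs + deletedDocs
    ratio = deleted_docs / (num_docs + deleted_docs)
    return int(index_size_bytes * ratio)

# e.g. a 200 GB index where a third of the maxDoc count is deletions
size = 200 * 1024**3
assert estimated_reclaim_bytes(size, 20_000_000, 10_000_000) == size // 3
```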
Re: external zookeeper with SolrCloud
Hmmm, shouldn't be happening. How sure are you that the upgrade to 4.4 was carried out on all machines? Erick

On Tue, Aug 6, 2013 at 5:23 PM, Joshi, Shital shital.jo...@gs.com wrote:

Machines are definitely up. Each Solr4 node and zookeeper instance share a machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know about the zk instances.

-Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, August 06, 2013 5:03 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud

First off, even 6 ZK instances are overkill, vast overkill. 3 should be more than enough. That aside, however, how are you letting your Solr nodes know about the zk machines? Is it possible you've pointed some of your Solr nodes at specific ZK machines that aren't up when you have this problem? I.e. -zkHost=zk1,zk2,zk3 Best, Erick

On Tue, Aug 6, 2013 at 4:56 PM, Joshi, Shital shital.jo...@gs.com wrote:

Hi, We have a SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes. We have 6 zookeeper instances. We are planning to change to an odd number of zookeeper instances. With Solr 4.3.0, if all zookeeper instances are not up, a solr4 node never connects to zookeeper (we can't see the admin page) until all zookeeper instances are up and we restart all solr nodes. It was suggested that it could be due to this bug https://issues.apache.org/jira/browse/SOLR-4899 and that this bug is solved in Solr 4.4. We upgraded to Solr 4.4 but still see this issue. We brought up 4 out of 6 zookeeper instances and then brought up all ten Solr4 nodes. We kept seeing this exception in Solr logs: 751395 [main-SendThread] WARN org.apache.zookeeper.ClientCnxn -
Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
  at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

And after a while saw this exception:

INFO - 2013-08-05 22:24:07.582; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@5140709 name:ZooKeeperConnection Watcher: qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com,qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
INFO - 2013-08-05 22:24:07.662; org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status change trigger but we are already closed
754311 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager - Client->ZooKeeper status change trigger but we are already closed

We brought up all zookeeper instances, but the cloud never came up until all solr nodes were restarted. Do we need to change any settings? After the weekend reboot, all zookeeper instances come up one by one. While zookeeper instances are coming up, solr nodes are also getting started. With this issue, we have to put checks in place to make sure all zookeeper instances are up before we bring up any solr node. Thanks!!

-Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, June 11, 2013 10:42 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud

On Jun 11, 2013, at 10:15 AM, Joshi, Shital shital.jo...@gs.com wrote:

Thanks Mark. Looks like this bug is fixed in Solr 4.4. Do you have any date for the official release of 4.4?

Looks like it might come out in a couple of weeks.
Is there any instruction available on how to build Solr 4.4 from SVN repository? It's java, so it's pretty easy - you might find some help here: http://wiki.apache.org/solr/HowToContribute - Mark -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Monday, June 10, 2013 8:05 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud This might be https://issues.apache.org/jira/browse/SOLR-4899 - Mark On Jun 10, 2013, at 5:59 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We're setting up 5 shard SolrCloud with external zoo keeper. When we bring up Solr nodes while the zookeeper instance is not up and running, we see this error in Solr logs. java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at
Re: Transform data at index time: country - continent
Walter: Oooh, nice! One could even use a copyField if one wanted to keep them separate... Erick On Tue, Aug 6, 2013 at 12:38 PM, Walter Underwood wun...@wunderwood.org wrote: Would synonyms help? If you generate the query terms for the continents, you could do something like this: usa => continent-na canada => continent-na germany => continent-europe and so on. wunder On Aug 6, 2013, at 2:18 AM, Christian Köhler - ZFMK wrote: On 05.08.2013 15:52, Jack Krupansky wrote: You can write a brute force JavaScript script using the StatelessScript update processor that hard-codes the mapping. I'll probably do something like this. Unfortunately I have no influence on the original db itself, so I have to fix this in Solr. Cheers Chris -- Zoologisches Forschungsmuseum Alexander Koenig - Leibniz-Institut für Biodiversität der Tiere - Adenauerallee 160, 53113 Bonn, Germany www.zfmk.de
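Walter's mapping can be wired up as a query-time synonym filter. A minimal sketch of what that could look like; the file name (country_continent.txt) and field type name (text_continent) are made up for illustration:

```xml
<!-- Sketch only: names are illustrative, not from the thread.
     country_continent.txt would hold lines such as:
       usa => continent-na
       canada => continent-na
       germany => continent-europe
-->
<fieldType name="text_continent" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="country_continent.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
```

With expand="false", a query for "usa" is rewritten to the continent token, which is what makes Erick's copyField idea work for keeping country and continent searchable separately.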
Re: Solr design. Choose Cores or Shards?
Shards are special cores, usually hosted on separate machines, that comprise one single large (logical) index. Shards usually need to have the same schema, config, etc. So unless you have a corpus that's too large to fit on a single piece of your hardware, you'll always be using cores. And since your cores have different types of data (and presumably use different schemas), you're talking cores. Best Erick On Tue, Aug 6, 2013 at 10:49 PM, manju16832003 manju16832...@gmail.com wrote: Hi, I have a confusion over choosing Cores or Shards for the project scenario. My scenario is as follows: I have three entities 1. Customers 2. Product Info 3. Listings [Contains all the listings posted by customer based on product] I'm planning to design the Solr structure for the above scenario like this 1. Customers Core 2. Product Info Core 3. Listings Core 4. Searchable Listing Core [Indexing searchable parameters selected from Listings, Product Info and Customer entities]. Having in mind that there wouldn't be many updates to Customers and Product Info. There will be regular updates to Listings, which in turn means I need to update Searchable Listings, which I could manage. My confusion: is it feasible to choose many cores, or to use shards? I do not have much experience with how shards work and what they are used for. I would like to know the suggestions :-) for a design like this. What are the implications if I were to choose to use many cores and handle stuff at the application level calling different cores. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-design-Choose-Cores-or-Shards-tp4082930.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Split Shard - Document loss and down time
I have explained in the above post with screenshots. Indexing fails when any node is down while shard splitting is in progress. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Split-Shard-Document-loss-and-down-time-tp4082002p4082994.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: external zookeeper with SolrCloud
You said earlier that you had 6 zookeeper instances, but the zkHost param only shows 5 instances... is that correct? On Tue, Aug 6, 2013 at 11:23 PM, Joshi, Shital shital.jo...@gs.com wrote: Machines are definitely up. Solr4 node and zookeeper instance share the machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know about the zk instances.
Re: Solr doesn't make indexes for all the enteries
You're explicitly asking for only 10 search results - that's what the rows=10 parameter does. If you want to see all results, you can either increase rows, or run multiple queries, increasing the offset each time. On Wed, Aug 7, 2013 at 12:21 PM, Kamaljeet Kaur kamal.kaur...@gmail.com wrote: Hello, I am a newbie to solr. I have installed and configured it with my django project. I am using the following versions: django-haystack - 2.0.0 ApacheSolr - 3.5.0 Django - 1.4 mysql - 5.5.32-0 Here is the model, whose data I want to index: http://tny.cz/422c5fb7 Here is search_indexes.py: http://tny.cz/8de95043 Have created the file templates/search/indexes/myapp/userprofile_text.txt Have created a template to show the results after querying from database. I have built the schema using the command $ ./manage.py build_solr_schema and replaced the contents of example/solr/conf/schema.xml with the output. Here it is: http://tny.cz/49fe8e1d When I use the command $ ./manage.py rebuild_index to create indexes, it shows: WARNING: This will irreparably remove EVERYTHING from your search index in connection 'default'. Your choices after this are to restore from backups or rebuild via the `rebuild_index` command. Are you sure you wish to continue? [y/N] y Removing all documents from your index because you said so. All documents removed. Indexing 18 user profiles But indexes are shown only for 10 user profiles. I am seeing the indexes here: http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on Only those userprofiles are shown, when searched for. Why does it happen? And how to solve this issue? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-doesn-t-make-indexes-for-all-the-enteries-tp4082977.html Sent from the Solr - User mailing list archive at Nabble.com.
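The advice above (increase rows, or page with start) can be sketched in a few lines. The fetch function below is a stand-in for a real HTTP call to /solr/select?q=...&start=...&rows=... - it is not part of any Solr client API:

```python
# Sketch: collect every result by paging, instead of relying on the
# default rows=10 that made only 10 of the 18 profiles visible.
def page_params(total, rows=10):
    """Yield (start, rows) pairs covering `total` documents."""
    for start in range(0, total, rows):
        yield start, rows

def fetch_all(fetch_page, total, rows=10):
    """Run one query per page; `fetch_page(start, rows)` stands in
    for a real Solr request returning that window of documents."""
    docs = []
    for start, page_rows in page_params(total, rows):
        docs.extend(fetch_page(start, page_rows))
    return docs

# Fake index of 18 "user profiles", mirroring the thread:
index = ["profile-%d" % i for i in range(18)]
result = fetch_all(lambda s, r: index[s:s + r], total=18, rows=10)
print(len(result))  # 18 - all profiles, not just the first page of 10
```

In practice you would read numFound from the first response to learn `total` before paging through the rest.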
RE: external zookeeper with SolrCloud
I went through Admin page - Dashboard of all 10 nodes and verified that each one is using solr-spec 4.4.0. solr-spec 4.4.0 solr-impl 4.4.0 1504776 - sarowe - 2013-07-19 02:58:35 lucene-spec 4.4.0 lucene-impl 4.4.0 1504776 - sarowe - 2013-07-19 02:53:42 Is there anything else I can check to verify that we upgraded to Solr 4.4.0? -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, August 07, 2013 8:10 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud Hmmm, shouldn't be happening. How sure are you that the upgrade to 4.4 was carried out on all machines? Erick
[solr 4.3.1 admin ui] bug in Plugins / Stats Refresh Values option?
On the first click the values are refreshed. On the second click the page gets redirected: from: http://localhost:8983/solr/#/statements/plugins/cache to: http://localhost:8983/solr/#/ Is this intentional? Regards, Dmitry
RE: external zookeeper with SolrCloud
We have all 6 instances in the zkHost parameter. -Original Message- From: Raymond Wiker [mailto:rwi...@gmail.com] Sent: Wednesday, August 07, 2013 8:29 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud You said earlier that you had 6 zookeeper instances, but the zkHost param only shows 5 instances... is that correct?
Re: Measuring SOLR performance
Hi Roman, Finally, this has worked! Thanks for quick support. The graphs look awesome. At least on the index sample :) It is quite easy to setup and run + possible to run directly on the shard server in background mode. my test run was: python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q ./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs 60 -R foo -t /solr/statements -e statements Thanks! Dmitry On Wed, Aug 7, 2013 at 6:54 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, I've modified the solrjmeter to retrieve data from under the core (the -t parameter) and the rest from the /solr/admin - I could test it only against 4.0, but it is there the same as 4.3 - it seems...so you can try the fresh checkout my test was: python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -t /solr/collection1 -R foo -q ./queries/demo/* -p 9002 -s adsate Thanks! roman On Tue, Aug 6, 2013 at 9:46 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi, Thanks for the clarification, Shawn! So with this in mind, the following work: http://localhost:8983/solr/statements/admin/system?wt=json http://localhost:8983/solr/statements/admin/mbeans?wt=json not copying their output to save space. Roman: is this something that should be set via -t parameter as well? Dmitry On Tue, Aug 6, 2013 at 4:34 PM, Shawn Heisey s...@elyograg.org wrote: On 8/6/2013 6:17 AM, Dmitry Kan wrote: Of three URLs you asked for, only the 3rd one gave response: snip The rest report 404. On Mon, Aug 5, 2013 at 8:38 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, So I think the admin pages are different on your version of solr, what do you see when you request... ? http://localhost:8983/solr/admin/system?wt=json http://localhost:8983/solr/admin/mbeans?wt=json http://localhost:8983/solr/admin/cores?wt=json Unless you have a valid defaultCoreName set in your (old-style) solr.xml, the first two URLs won't work, as you've discovered. 
Without that valid defaultCoreName (or if you wanted info from a different core), you'd need to add a core name to the URL for them to work. The third one, which works for you, is a global handler for manipulating cores, so naturally it doesn't need a core name to function. The URL path for this handler is defined by solr.xml. Thanks, Shawn
Re: softCommit doesn't work - ?
(a bit late, I know) On 07/23/2013 02:09 PM, Erick Erickson wrote: First a minor nit. The server.add(doc, time) is a hard commit, not a soft one. By default, no, commitWithin is indeed a soft commit. As per http://lucene.472066.n3.nabble.com/near-realtime-search-and-dih-td494.html#a4000133 commitWithin is a soft commit on Solr 4. I just verified in the 4.4 code: SolrConfig has getBool("updateHandler/commitWithin/softCommit", true) -- André Bois-Crettez Software Architect Search Developer http://www.kelkoo.com/ Kelkoo SAS
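The getBool xpath André quotes corresponds to a solrconfig.xml setting. A sketch of what flipping the default would look like, based on that xpath (the surrounding updateHandler element is the standard 4.x layout):

```xml
<!-- Solr 4.x: commitWithin defaults to a soft commit (true).
     Set softCommit to false if you want commitWithin to issue
     hard commits instead. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <commitWithin>
    <softCommit>false</softCommit>
  </commitWithin>
</updateHandler>
```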
Re: [solr 4.3.1 admin ui] bug in Plugins / Stats Refresh Values option?
It shouldn't .. but from your description it sounds as if the JavaScript onclick handler doesn't work on the second click (which would then do a full page reload). if you use chrome, firefox or safari .. can you open the developer tools and check if they report any javascript error? which would explain why .. BTW: You don't have to use that button in the meantime .. just refresh the page (that is exactly what the button does). sure, it should work, but that shouldn't stop you from refreshing the page :) - Stefan On Wednesday, August 7, 2013 at 3:00 PM, Dmitry Kan wrote: On the first click the values are refreshed. On the second click the page gets redirected: from: http://localhost:8983/solr/#/statements/plugins/cache to: http://localhost:8983/solr/#/ Is this intentional? Regards, Dmitry
Error loading class 'solr.ISOLatin1AccentFilterFactory'
Hi, I am trying to use solr.ISOLatin1AccentFilterFactory in Solr 4.3.1, but it's giving the error Error loading class 'solr.ISOLatin1AccentFilterFactory'. However, it works fine in Solr 3.6... Can anybody suggest how to fix this error? Or is there a new FilterFactory I have to use? -- View this message in context: http://lucene.472066.n3.nabble.com/Error-loading-class-solr-ISOLatin1AccentFilterFactory-tp4083012.html Sent from the Solr - User mailing list archive at Nabble.com.
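ISOLatin1AccentFilterFactory was deprecated in the 3.x line and removed in 4.x; its replacement is ASCIIFoldingFilterFactory, which folds a superset of the same accented characters. A sketch of the substitution in schema.xml (the field type name and analyzer chain here are illustrative, not from the thread):

```xml
<!-- Replace <filter class="solr.ISOLatin1AccentFilterFactory"/>
     with ASCIIFoldingFilterFactory when moving from 3.x to 4.x. -->
<fieldType name="text_folded" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Fields using the type need to be reindexed after the change, since folding happens at index time as well as query time.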
Re: Measuring SOLR performance
Hi Roman, One more question. I tried to compare different runs (g1 vs cms) using the command below, but get an error. Should I attach some other param(s)? python solrjmeter.py -C g1,foo -c hour -x ./jmx/SolrQueryTest.jmx **ERROR** File "solrjmeter.py", line 1427, in <module> main(sys.argv) File "solrjmeter.py", line 1303, in main check_options(options, args) File "solrjmeter.py", line 185, in check_options error("The folder '%s' does not exist" % rf) File "solrjmeter.py", line 66, in error traceback.print_stack() The folder '0' does not exist Dmitry
RE: poor facet search performance
A data structure like: fieldId -> BitArray (for fields with docFreq > 1/9 of total docs) fieldId -> VIntList (variable byte encoded array of ints, for fields with docFreq < 1/9 of total docs) And the list is sorted top to bottom with the most frequent fields at the top (highest doc freqs at the top). We enumerate top to bottom getting the intersection against the set of docs matching the search: * If BitArray, then do special intersection using bit counting of the internal uint[] structure of the BitArray vs. the BitArray of the doc result set * If VIntList, then do enumeration of the VIntList against the BitArray of the doc result set We push the counts into a priority queue with size of facet.limit and when we fill that up, if the next fieldId in the structure has docFreq < the min count in the priority queue, it breaks out of the loop. Since a lot of our facet fields have a power curve distribution (a long tail of less frequent values), breaking out early helps a lot. Also, we do facet counts in parallel using Parallel.ForEach (in C# not Java), so each field in the list of facet.field is done on its own thread (I believe SOLR is doing something similar now). We have a lot of cores on our servers so it works well. From: Toke Eskildsen [t...@statsbiblioteket.dk] Sent: Wednesday, August 07, 2013 7:45 AM To: solr-user@lucene.apache.org Subject: Re: poor facet search performance On Tue, 2013-07-30 at 21:48 +0200, Robert Stewart wrote: [Custom facet structure] Then we sorted those sets of facet fields by total document frequency so we enumerate the more frequent facet fields first, and we stop looking when we find a facet field which has less total document matches than the top N facet counts we are looking for. So the structure was Facet->docIDs? A bit like enum in Solr? Your top-N cut-off is an interesting optimization for that. [...] The slaves just pre-load that binary structure directly into ram in one shot in the background when opening a new snapshot for search.
We used a similar pre-calculation some years ago but abandoned it as the cost of Pre-generate_structure + #duplicates * (distribute_structure + open_structure) was just as costly and less flexible than #duplicates * generate_structure for us. We have 200 million docs, 10 shards, about 20 facet fields, some of which contain about 20,000 unique values. We show the top 10 facets for about 10 different fields in the results page. We provide search results with lots of facets and date counts in around 200-300ms using this technique. Currently, we are porting this entire system to SOLR. For a single core index of 8 million docs, using similar documents and facet fields from our production indexes, I can't get faceted search to perform anywhere close to 300ms for general searches. More like 1.5-3 seconds. Solr fc faceting treats each facet independently and in a docID->facet manner, so what happens is foreach facet { foreach docIDinResultSet { foreach tagIDinDocument { facet.counter[tagID]++ } } } With 10 facets, 8M documents and 1 tag/doc/facet, the total loop count is 80M. That does not normally take 1.5-3 seconds, so something seems off. Do you have a lot of facet tags (aka terms) for each document? Is there anything else that I should look into for getting better facet performance? Could you list the part of the Solr log with the facet structures? Just grep for UnInverted. They look something like this: UnInverted multi-valued field {field=lma_long,memSize=42711405,tindexSize=42,time=979,phase1=964,nTerms=23,bigTerms=6,termInstances=1958468,uses=0}
Unless you go experimental (SOLR-2412 to bang my own drum), your facet count needs to go down or you need to shard with Solr. Also we need to issue frequent commits since we are constantly streaming new content into the system. You could use a setup with a smaller live shard and multiple stale ones, but depending on corpus your ranking might suffer. - Toke Eskildsen, State and University Library, Denmark
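The early-exit trick Robert describes (values pre-sorted by descending docFreq, stop once no remaining value can beat the smallest count in the top-N heap) can be sketched as follows. This is a simplified Python sketch, not the production C# code: plain sets stand in for the BitArray/VIntList structures, and the break condition uses <= as a conservative reading of the description:

```python
import heapq

def top_facets(values_by_docfreq, matching_docs, limit):
    """values_by_docfreq: [(value, doc_freq, set_of_doc_ids)] sorted by
    doc_freq descending. Returns the top `limit` as [(count, value)]."""
    heap = []  # min-heap of (count, value); heap[0][0] is the current minimum
    for value, doc_freq, doc_ids in values_by_docfreq:
        # A value's count is at most its doc_freq, so once the heap is
        # full, no later (less frequent) value can enter the top N.
        if len(heap) == limit and doc_freq <= heap[0][0]:
            break
        count = len(doc_ids & matching_docs)  # intersect with result set
        if len(heap) < limit:
            heapq.heappush(heap, (count, value))
        elif count > heap[0][0]:
            heapq.heapreplace(heap, (count, value))
    return sorted(heap, reverse=True)

facets = [  # sorted by doc_freq descending (long-tail distribution)
    ("news", 5, {1, 2, 3, 4, 5}),
    ("sports", 3, {2, 3, 6}),
    ("opera", 1, {9}),  # never counted: doc_freq 1 can't beat the heap min
]
print(top_facets(facets, matching_docs={2, 3, 4}, limit=2))
# [(3, 'news'), (2, 'sports')]
```

With a long tail of rare values, most of the loop is skipped, which is exactly why the power-curve distribution Robert mentions makes the cut-off pay off.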
RE: poor facet search performance
FYI, I am now using docValues for facet fields, with somewhat better performance, or at least more consistent performance (especially with frequent commits). Also, I see my main bottleneck seems to be the EC2 servers; I am now running on m3.xlarge with provisioned EBS (4000 IOPS) and it is looking much better.

From: Toke Eskildsen [t...@statsbiblioteket.dk]
Sent: Wednesday, August 07, 2013 7:45 AM
To: solr-user@lucene.apache.org
Subject: Re: poor facet search performance

On Tue, 2013-07-30 at 21:48 +0200, Robert Stewart wrote:

[Custom facet structure] Then we sorted those sets of facet fields by total document frequency, so we enumerate the more frequent facet fields first, and we stop looking when we find a facet field which has fewer total document matches than the top-N facet counts we are looking for.

So the structure was Facet->docIDs? A bit like enum in Solr? Your top-N cut-off is an interesting optimization for that.

[...] The slaves just pre-load that binary structure directly into RAM in one shot, in the background, when opening a new snapshot for search.

We used a similar pre-calculation some years ago but abandoned it, as the cost of pre-generate_structure + #duplicates * (distribute_structure + open_structure) was just as high as, and less flexible than, #duplicates * generate_structure for us.

We have 200 million docs, 10 shards, about 20 facet fields, some of which contain about 20,000 unique values. We show the top 10 facets for about 10 different fields on the results page. We provide search results with lots of facets and date counts in around 200-300ms using this technique. Currently, we are porting this entire system to Solr. For a single-core index of 8 million docs, using similar documents and facet fields from our production indexes, I can't get faceted search to perform anywhere close to 300ms for general searches. More like 1.5-3 seconds.
Solr fc faceting treats each facet independently, and in a docID->facet manner, so what happens is

foreach facet {
  foreach docIDinResultSet {
    foreach tagIDinDocument {
      facet.counter[tagID]++
    }
  }
}

With 10 facets, 8M documents and 1 tag/doc/facet, the total loop count is 80M. That does not normally take 1.5-3 seconds, so something seems off. Do you have a lot of facet tags (aka terms) for each document?

Is there anything else that I should look into for getting better facet performance?

Could you list the part of the Solr log with the facet structures? Just grep for UnInverted. They look something like this:

UnInverted multi-valued field {field=lma_long,memSize=42711405,tindexSize=42,time=979,phase1=964,nTerms=23,bigTerms=6,termInstances=1958468,uses=0}

Given these metrics (200M docs, 20 facet fields, some fields with 20,000 unique values), what kind of facet search performance should I expect?

Due to the independent faceting handling in Solr, the facet time will scale a bit worse than linearly with the number of documents, relative to your test setup. With a loop count of 200M*10 (or 20? I am a bit confused about how many facets you show at a time) = 2G, this will take multiple seconds.

Unless you go experimental (SOLR-2412, to bang my own drum), your facet count needs to go down or you need to shard with Solr.

Also we need to issue frequent commits since we are constantly streaming new content into the system.

You could use a setup with a smaller live shard and multiple stale ones, but depending on your corpus your ranking might suffer.

- Toke Eskildsen, State and University Library, Denmark
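A minimal sketch of the per-facet counting loop Toke describes above (fc-style faceting: one pass over the result set per facet field, incrementing one counter per term of each matching document). The names and data here are illustrative, not Solr's actual implementation:

```python
def fc_facet_counts(facet_fields, result_set, doc_terms):
    """doc_terms[field][doc_id] -> term ids for that doc in that field."""
    counts = {field: {} for field in facet_fields}
    iterations = 0
    for field in facet_fields:                # foreach facet
        counters = counts[field]
        for doc_id in result_set:             # foreach docID in result set
            for term_id in doc_terms[field].get(doc_id, ()):  # foreach tagID
                counters[term_id] = counters.get(term_id, 0) + 1
                iterations += 1
    return counts, iterations

# Scaled-down version of the thread's estimate: with 1 term per doc per
# field, iterations == #fields * #docs (80M for 10 fields and 8M docs).
fields = ["f%d" % i for i in range(10)]
docs = range(1000)
terms = {f: {d: [d % 5] for d in docs} for f in fields}
counts, iters = fc_facet_counts(fields, docs, terms)
print(iters)  # 10 fields * 1000 docs * 1 term = 10000
```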
RE: SolrCloud Indexing question
Thank you so much for the suggestion. Is the same recommended for querying too? I found it very slow when I query using CloudSolrServer.

Kalyan

Date: Tue, 6 Aug 2013 13:25:37 -0600
From: s...@elyograg.org
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud Indexing question

On 8/6/2013 12:55 PM, Kalyan Kuram wrote:

Hi All, I need a suggestion on how to send indexing commands to 2 different Solr servers. Basically I want to mirror my index. Here is the scenario: I have 2 clusters; each cluster has one master and 2 slaves, with an external ZooKeeper in front. I need a suggestion on which Solr API class I should use to send indexing commands to the 2 masters. Will LBHttpSolrServer do the indexing, or is it only used for querying? If there is a better approach, please suggest. Kalyan

If you're using ZooKeeper, then your index is SolrCloud, and you don't have masters and slaves. The traditional master/slave replication model does not apply to SolrCloud. With SolrCloud, there is no need to have two independent clusters. If a server dies, the other servers in the cloud will keep the cluster operational. When you bring the dead server back with the proper config, it will automatically be synchronized with the cluster.

For a Java program with SolrJ, use a CloudSolrServer object for each cluster. The constructor for CloudSolrServer accepts the same zkHost parameter that you give to each Solr server when starting in SolrCloud mode. You cannot index to independent clusters at the same time through one object - if they truly are independent SolrCloud installs, you have to manage updates to both of them independently.

Thanks,
Shawn
Re: [solr 4.3.1 admin ui] bug in Plugins / Stats Refresh Values option?
Hi Stefan,

I was able to debug the second-click scenario (it was tricky to catch, since on click a redirect happens and the log statements of the previous page are gone; it worked via setting break-points in plugins.js) and got these errors (Firefox 23.0, Ubuntu):

[17:20:00.731] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/logging.js?_=4.3.1:294
[17:20:00.743] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/plugins.js?_=4.3.1:371
[17:20:00.769] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/replication.js?_=4.3.1:35
[17:20:00.771] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/schema-browser.js?_=4.3.1:68
[17:20:00.772] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/schema-browser.js?_=4.3.1:1185

Dmitry

On Wed, Aug 7, 2013 at 4:35 PM, Stefan Matheis matheis.ste...@gmail.com wrote:

It shouldn't .. but from your description it sounds as if the javascript onclick handler doesn't work on the second click (which would do a page reload). If you use Chrome, Firefox or Safari .. can you open the developer tools and check if they report any javascript error? Which would explain why ..

BTW: You don't have to use that button in the meantime .. just refresh the page (that is exactly what the button does). Sure, it should work, but that shouldn't stop you from refreshing the page :)

- Stefan

On Wednesday, August 7, 2013 at 3:00 PM, Dmitry Kan wrote:

On the first click the values are refreshed. On the second click the page gets redirected: from http://localhost:8983/solr/#/statements/plugins/cache to http://localhost:8983/solr/#/. Is this intentional?

Regards,
Dmitry
RE: problems running solr 4.4 with HDFS HA
Hi Mark,

Setting <str name="solr.hdfs.confdir"> properly in my solrconfig.xml did it. Thanks!

Greg Walters | Operations Team
530 Maryville Center Drive, Suite 250
St. Louis, Missouri 63141
t. 314.225.2745 | c. 314.225.2797
gwalt...@sherpaanalytics.com
www.sherpaanalytics.com
Re: 'Optimizing' Solr Index Size
Thanks Erick, our index is relatively static. I think the deletes must be coming from 'reindexing' the same documents, so it is definitely handy to recover the space. I've seen that video before. Definitely very interesting.

Brendan

On Wed, Aug 7, 2013 at 8:04 AM, Erick Erickson erickerick...@gmail.com wrote:

The general advice is to not merge (optimize) unless your index is relatively static. You're quite correct: optimizing simply recovers the space from deleted documents; otherwise it won't change much (except having fewer segments). Here's a _great_ video that Mike McCandless put together:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

But in general, _whenever_ segments are merged, the resulting segment will have all the data from deleted docs removed, and segments are merged continually when data is being added to the index.

Quick-n-dirty way to estimate the space savings an optimize will give you: look at the admin page for the core; the ratio of deleted docs to numDocs is roughly the fraction of unused space that would be regained by an optimize. From there it's your call G...

Best
Erick

On Tue, Aug 6, 2013 at 12:02 PM, Brendan Grainger brendan.grain...@gmail.com wrote:

To maybe answer another one of my questions, about the 50Gb recovered when running:

curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'

It looks to me that it was from deleted docs being completely removed from the index.

Thanks

On Tue, Aug 6, 2013 at 11:45 AM, Brendan Grainger brendan.grain...@gmail.com wrote:

Well, I guess I can answer one of my questions, which I didn't exactly explicitly state: how do I force Solr to merge segments down to a given maximum? I forgot about doing this:

curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'

which reduced the number of segments in my index from 12 to 10. Amazingly, it also reduced the space used by almost 50Gb. Is that even possible?
Thanks again
Brendan

On Tue, Aug 6, 2013 at 10:55 AM, Brendan Grainger brendan.grain...@gmail.com wrote:

Hi All,

First of all, what I was actually trying to do is get a little space back. So if there is a better way to do this by adjusting the MergePolicy or something else, please let me know. My index is currently 200Gb. In the past (Solr 1.4) we've found that optimizing the index will double the size of the index temporarily, then usually when it's done we end up with a smaller index and slightly faster search query times.

Should I even bother optimizing? My impression was that with the TieredMergePolicy this would be less necessary. Would merging segments into larger ones save any space, and if so, is there a way to tell Solr to do that?

Thanks
Brendan

--
Brendan Grainger
www.kuripai.com
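One way to read Erick's rule of thumb in the thread above as a quick back-of-the-envelope calculation (the function is illustrative; numDocs and maxDoc come from the core's admin statistics, where maxDoc counts live plus deleted documents):

```python
def optimize_savings_estimate(num_docs, max_doc, index_size_bytes):
    """Estimate bytes an optimize might reclaim from deleted docs."""
    deleted = max_doc - num_docs
    fraction_deleted = deleted / float(max_doc)
    return deleted, fraction_deleted, index_size_bytes * fraction_deleted

# Hypothetical numbers: a 200 GB index where a quarter of maxDoc is
# deleted documents, roughly matching the ~50 GB reclaimed in the thread.
deleted, frac, reclaim = optimize_savings_estimate(
    num_docs=150_000_000, max_doc=200_000_000,
    index_size_bytes=200 * 1024**3)
print(deleted, frac, reclaim / 1024**3)  # 50000000 0.25 50.0
```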
Re: Transform data at index time: country - continent
Good point. Copying to a separate field that applied synonyms could help. Filtering out the original countries could be tricky. The Javadoc mentions a keepOrig flag, but the Solr docs do not. If you could set keepOrig=false, that would do the trick.

wunder

On Aug 7, 2013, at 5:13 AM, Erick Erickson wrote:

Walter: Oooh, nice! One could even use a copyField if one wanted to keep them separate...

Erick

On Tue, Aug 6, 2013 at 12:38 PM, Walter Underwood wun...@wunderwood.org wrote:

Would synonyms help? If you generate the query terms for the continents, you could do something like this:

usa => continent-na
canada => continent-na
germany => continent-europe

and so on.

wunder

On Aug 6, 2013, at 2:18 AM, Christian Köhler - ZFMK wrote:

On 05.08.2013 15:52, Jack Krupansky wrote:

You can write a brute force JavaScript script using the StatelessScript update processor that hard-codes the mapping.

I'll probably do something like this. Unfortunately I have no influence on the original db itself, so I have to fix this in Solr.

Cheers
Chris

--
Zoologisches Forschungsmuseum Alexander Koenig
- Leibniz-Institut für Biodiversität der Tiere -
Adenauerallee 160, 53113 Bonn, Germany
www.zfmk.de
Stiftung des öffentlichen Rechts; Direktor: Prof. J. Wolfgang Wägele
Sitz: Bonn

--
Walter Underwood
wun...@wunderwood.org
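For reference, Walter's mapping can be expressed as an explicit synonym file; with the "=>" form the original country token is replaced by the continent token, which gives the effect of keepOrig=false without needing a flag. A sketch (the file name, field setup, and country lists are made up for illustration):

```
# synonyms-continents.txt, referenced from the field type's index analyzer,
# e.g. <filter class="solr.SynonymFilterFactory"
#              synonyms="synonyms-continents.txt" ignoreCase="true"/>
usa, united states => continent-na
canada => continent-na
germany, france => continent-europe
```

Combined with a copyField as Erick suggests, the original country field can stay searchable while this field carries only the continent terms.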
Re: [solr 4.3.1 admin ui] bug in Plugins / Stats Refresh Values option?
Hey Dmitry

That sounds a bit odd .. those are more like notices than real errors .. are you sure those are stopping the UI from working? If so .. we should see more reports like those. Can you verify the problem by using another browser? I mean .. that is a really basic javascript handler .. written directly in the DOM, no chance that it doesn't get loaded. And that normally only stops working if something really bad happens ;o

- Stefan

On Wednesday, August 7, 2013 at 4:23 PM, Dmitry Kan wrote:

Hi Stefan,

I was able to debug the second-click scenario (it was tricky to catch, since on click a redirect happens and the log statements of the previous page are gone; it worked via setting break-points in plugins.js) and got these errors (Firefox 23.0, Ubuntu):

[17:20:00.731] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/logging.js?_=4.3.1:294
[17:20:00.743] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/plugins.js?_=4.3.1:371
[17:20:00.769] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/replication.js?_=4.3.1:35
[17:20:00.771] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/schema-browser.js?_=4.3.1:68
[17:20:00.772] TypeError: anonymous function does not always return a value @ http://localhost:8983/solr/js/scripts/schema-browser.js?_=4.3.1:1185

Dmitry

On Wed, Aug 7, 2013 at 4:35 PM, Stefan Matheis matheis.ste...@gmail.com (mailto:matheis.ste...@gmail.com) wrote:

It shouldn't .. but from your description it sounds as if the javascript onclick handler doesn't work on the second click (which would do a page reload). If you use Chrome, Firefox or Safari .. can you open the developer tools and check if they report any javascript error? Which would explain why ..

BTW: You don't have to use that button in the meantime .. just refresh the page (that is exactly what the button does).
sure, it should work, but shouldn't stop you from refreshing the page :) - Stefan On Wednesday, August 7, 2013 at 3:00 PM, Dmitry Kan wrote: On the first click the values are refreshed. On the second click the page gets redirected: from: http://localhost:8983/solr/#/statements/plugins/cache to: http://localhost:8983/solr/#/ Is this intentional? Regards, Dmitry
DIH Problem: create multiple docs from a single entity
Hi

I have 2 tables with the following data:

table 1:
id   treatment_list
1    a,b
2    b,c

table 2:
treatment_id   name
a    name1
b    name2
c    name3

Using DIH, can I create an index of the form:

id   treatment_id   name
1    a    name1
1    b    name2
2    b    name2
2    c    name3

In short, can I split the comma-separated field and process each value as an entity? From the docs and the wiki I can't see anything obvious. I feel I'm missing something easy here. (Note: it's not my data, so I can't do anything about the dodgy CSV field.)
Re: Solr 4.4 ShingleFilterFactory exception
The answer will be further down in the stack trace. It will relate to an error that occurred when initializing the filter. One possibility is that you have a garbage attribute name in your token filter XML - 4.4 checks for that kind of thing now.

-- Jack Krupansky

-----Original Message-----
From: Prasi S
Sent: Wednesday, August 07, 2013 7:47 AM
To: solr-user@lucene.apache.org
Subject: Solr 4.4 ShingleFilterFactory exception

Hi,
I have set up Solr 4.4 with cloud. When I start Solr, I get an exception as below:

ERROR [CoreContainer] Unable to create core: mycore_sh1: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType text_shingle: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.shingle.ShingleFilterFactory'
  at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:164) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:268) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:655) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356) [:4.4.0 1504776 - sarowe - 2013-07-19 02:58:35]
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [:1.6.0_43]
  at java.util.concurrent.FutureTask.run(FutureTask.java:138) [:1.6.0_43]
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) [:1.6.0_43]
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [:1.6.0_43]
  at java.util.concurrent.FutureTask.run(FutureTask.java:138) [:1.6.0_43]
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [:1.6.0_43]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [:1.6.0_43]
  at java.lang.Thread.run(Thread.java:662) [:1.6.0_43]
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.shingle.ShingleFilterFactory'

The same file works well with Solr 4.2. Please help.

Thanks,
Prasi
Re: Solr 4.4. creating an index that 4.3 can't read (but in LUCENE_43 mode)
On 8/7/2013 3:33 AM, Daniel Collins wrote: I had been running a Solr 4.3.0 index, which I upgraded to 4.4.0 (but hadn't changed LuceneVersion, so it was still using the LUCENE_43 codec). I then had to back-out and return to a 4.3 system, and got an error when it tried to read the index. Now, it was only a dev system, so not a problem, and normally I would use restore a backup anyway, but shouldn't this work? If I haven't changed the codec, then Solr 4.4 should be using the same code as 4.3, so the data should be compatible, no? Using an index from a newer version is never guaranteed, and usually will NOT work. The luceneMatchVersion setting doesn't typically affect index format, it usually affects how analysis and query parser components work - so you can tell Solr to use buggy behavior from an earlier release. Unless you actually change aspects of the codec (postings format, docvalues format, etc), Solr uses the Lucene codec defaults, which can (and usually does) change from release to release. Looking through the Lucene 4.4 CHANGES.txt file (not the Solr file), LUCENE-4936 looks like a change to the DocValues format. I can't tell from the description whether LUCENE-5035 is a format change or a change in how Lucene handles sorting in memory. The evidence I can find suggests that the format is still called Lucene42DocValuesFormat, but apparently it doesn't work the same. Thanks, Shawn
Re: Data Import from MYSQL and POSTGRESQL
On 8/7/2013 3:50 AM, Spadez wrote:

My issue is in the data-config.xml. I have put in two datasources; however, I am stuck on what to put for the driver values and the urls.

dataSource name=quot;mysqlquot; driver=quot;lt;driver url=url user=user password=pass /
dataSource name=quot;postgresqlquot; driver=quot;lt;driver url=url user=user password=pass/

Is anyone able to tell me what I should be putting for these values, please?

Here's my datasource, username and password redacted. Note that you should not be using quot; ... you should use actual quotes.

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" encoding="UTF-8"
            url="jdbc:mysql://${dih.request.dbHost}:3306/${dih.request.dbSchema}?zeroDateTimeBehavior=convertToNull"
            batchSize="-1" user="REDACTED" password="REDACTED"/>

I pass the hostname and DB name (schema) in as parameters on the URL when I call for a full-import.

Thanks,
Shawn
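As a sketch of what the two entries in the question might look like with the real JDBC driver class names (host, port, database name, and credentials below are placeholders, not values from the thread):

```xml
<dataSource name="mysql" type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://dbhost:3306/mydb"
            user="user" password="pass"/>

<dataSource name="postgresql" type="JdbcDataSource"
            driver="org.postgresql.Driver"
            url="jdbc:postgresql://dbhost:5432/mydb"
            user="user" password="pass"/>
```

Each DIH entity then selects its source with dataSource="mysql" or dataSource="postgresql".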
Re: Field append
Thanks for the inquiry about "append two fields"; as a result I have added it as an example in Early Access Release #5 of my Solr 4.x Deep Dive book, in the chapter on update processors. Actually, there are several examples:

- Append One Field to Another with Comma and Space as Delimiter:

<updateRequestProcessorChain name="append-a-onto-b-delim">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">alpha_s</str>
    <str name="dest">beta_s</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">beta_s</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

- Append One Field to Another with Space as Delimiter:

<updateRequestProcessorChain name="append-a-onto-b-space">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">alpha_s</str>
    <str name="dest">beta_s</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">beta_s</str>
    <str name="delimiter"> </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

- Append One Field to Another with No Delimiter:

<updateRequestProcessorChain name="append-a-onto-b">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">alpha_s</str>
    <str name="dest">beta_s</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">beta_s</str>
    <str name="delimiter"></str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

- Append One Field to Another with Space as Delimiter and Remove the Source Field:

<updateRequestProcessorChain name="append-a-onto-b-space-delete">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">alpha_s</str>
    <str name="dest">beta_s</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">beta_s</str>
    <str name="delimiter"> </str>
  </processor>
  <processor class="solr.IgnoreFieldUpdateProcessorFactory">
    <str name="fieldName">alpha_s</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Let me know if even those examples do not cover your use case.

-- Jack Krupansky

From: Luís Portela Afonso
Sent: Monday, August 05, 2013 7:39 AM
To: solr-user@lucene.apache.org
Subject: Field append

Hi there,

Is it possible to append two fields in Solr? I would like to append two fields with a custom delimiter. Is that possible?

I saw something like a CloneFieldUpdateProcessor, but when I try to use it, Solr says that it cannot find the class. I saw that on the following site: https://issues.apache.org/jira/browse/SOLR-2599

In the comments I saw:

<processor class="solr.FieldCopyProcessorFactory">
  <str name="source">category</str>
  <str name="dest">category_s</str>
</processor>

But I'm not able to use that either. Once again, Solr says that it cannot find the class. Hope you can help in any way. Thanks
Re: DIH Problem: create multiple docs from a single entity
On Aug 7, 2013, at 18:10, Lee Carroll lee.a.carr...@googlemail.com wrote:

Hi

I have 2 tables with the following data:

table 1:
id   treatment_list
1    a,b
2    b,c

table 2:
treatment_id   name
a    name1
b    name2
c    name3

Using DIH, can I create an index of the form:

id   treatment_id   name
1    a    name1
1    b    name2
2    b    name2
2    c    name3

In short, can I split the comma-separated field and process each value as an entity? From the docs and the wiki I can't see anything obvious. I feel I'm missing something easy here. (Note: it's not my data, so I can't do anything about the dodgy CSV field.)

I think this is an SQL problem, rather than a DIH one. A quick Google shows several hits for splitting a string in SQL; I expect that it should be possible to come up with something that fits your purpose.
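If the database happens to be MySQL, one way to do that split in SQL, as the reply above suggests, is a join on FIND_IN_SET, so each (id, treatment) pair comes back as its own row and thus its own document. A hedged sketch of a DIH entity (table and column names follow the example in the question; the entity and field names are made up):

```xml
<entity name="treatment"
        query="SELECT t1.id, t2.id AS treatment_id, t2.name
               FROM table1 t1
               JOIN table2 t2 ON FIND_IN_SET(t2.id, t1.treatment_list) &gt; 0">
  <field column="id" name="id"/>
  <field column="treatment_id" name="treatment_id"/>
  <field column="name" name="name"/>
</entity>
```

Note that id alone is no longer unique per document here; if the schema's uniqueKey is id, you would need something like CONCAT(t1.id, t2.id) as the key instead.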
Re: Solr round ratings to nearest integer value
Thanks for this inquiry; as a result I have added a "round" JavaScript script for the StatelessScriptUpdate processor to Early Access Release #5 of my Solr 4.x Deep Dive book, in the chapter on update processors. The script takes a field name, a number of decimal digits to round to (default is 0), an output field (defaults to replacing the input field), and an option to convert the type of the rounded number to integer.

One thing I just noticed: your message indicates that 0.5 should round to 0, but that is not the standard definition of rounding. What is your true intention there?

The script can replace the original value with the rounded value, or preserve the original value and place a rounded copy in another field. What is your preference? (Well, the script supports both, anyway.) And I did give the script the integer option to convert the type, so that 1.0 would become 1 for Solr int fields.

-- Jack Krupansky

-----Original Message-----
From: Thyagaraj
Sent: Thursday, August 01, 2013 2:37 AM
To: solr-user@lucene.apache.org
Subject: Solr round ratings to nearest integer value

I'm using Solr 4.0 with the DIH JDBC connector, and I use the Solr Admin web interface for testing. I have a field called *ratings* which varies like 0, 0.3, 0.5, 0.75, 1, 1.5, 1.6... and so on, as per user input.

I found the link http://lucene.472066.n3.nabble.com/How-to-round-solr-score-td495198.html which is beyond my understanding, and I am unable to make use of it in my case. I just want to round these rating values to the nearest integer value through Solr, like:

0.3, 0.5 to 0
0.75, 1.5 to 1
1.6 to 2

Can anybody help by guiding me, please? Thank you!

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-round-ratings-to-nearest-integer-value-tp4081833.html
Sent from the Solr - User mailing list archive at Nabble.com.
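Jack's question about 0.5 is worth making concrete, since "round to nearest" has several boundary conventions. A small illustration, independent of Solr: the mapping requested in the original message (0.5 -> 0, 1.5 -> 1, 1.6 -> 2) corresponds to round-half-down, while Python's built-in round() sends halves to the nearest even integer:

```python
import math

def round_half_up(x):
    # 0.5 -> 1, 1.5 -> 2
    return int(math.floor(x + 0.5))

def round_half_down(x):
    # 0.5 -> 0, 1.5 -> 1, but 1.6 still -> 2
    return int(math.ceil(x - 0.5))

for v in (0.3, 0.5, 0.75, 1.5, 1.6):
    print(v, round_half_up(v), round_half_down(v), round(v))
```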
Re: Error loading class 'solr.ISOLatin1AccentFilterFactory'
On 8/7/2013 7:44 AM, Parul Gupta(Knimbus) wrote: I am trying to use solr.ISOLatin1AccentFilterFactory in solr4.3.1,But its giving error Error loading class 'solr.ISOLatin1AccentFilterFactory'. However its working fine in Solr3.6 This filter is deprecated. Here's the actual javadoc for this class in the latest 3.x version, which mentions what to use instead. http://lucene.apache.org/solr/3_6_2/org/apache/solr/analysis/ISOLatin1AccentFilterFactory.html You've now moved to a new major release, where all APIs that were deprecated in the previous major release are completely eliminated. The Solr wiki also contains this information. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ISOLatin1AccentFilterFactory Thanks, Shawn
Question about soft commit and updateRequestProcessorChain
If one allows for a soft commit (rather than a hard commit on each request), when does the updateRequestProcessorChain fire? Does it fire after the commit? Many thanks Jack
Internal shard communication - performance?
Hi, I use a system with solr 3 and 20 shards (3 million docs per shard). At a testsystem with one shard (60 million docs) I get 750 requests per second. At my live system (20 shards) I get 200 requests per second. Is the internal communication between the 20 shards a performance killer? Another question. Is a solr 4 system with solrcloud and Zookeeper a high availability system? Regards, Torsten
Data duplication using Cloud+HDFS+Mirroring
While testing Solr's new ability to store data and transaction directories in HDFS I added an additional core to one of my testing servers that was configured as a backup (active but not leader) core for a shard elsewhere. It looks like this extra core copies the data into its own directory rather than just using the existing directory with the data that's already available to it. Since HDFS likely already has redundancy of the data covered via the replicationFactor is there a reason for non-leader cores to create their own data directory rather than doing reads on the existing master copy? I searched Jira for anything that suggests this behavior might change and didn't find any issues; is there any intent to address this? Thanks, Greg
Some highlighted snippets aren't being returned
Hi Everyone,

I'm facing an issue in which my Solr query is returning highlighted snippets for some, but not all, results. For reference, I'm searching through an index that contains web crawls of human-rights-related websites. I'm running Solr as a webapp under Tomcat, and I've included the query's Solr params from the Tomcat log:

... webapp=/solr-4.2 path=/select params={facet=true&sort=score+desc&group.limit=10&spellcheck.q=Unangan&f.mimetype_code.facet.limit=7&hl.simple.pre=<code>&q.alt=*:*&f.organization_type__facet.facet.limit=6&f.language__facet.facet.limit=6&hl=true&f.date_of_capture_.facet.limit=6&group.field=original_url&hl.simple.post=</code>&facet.field=domain&facet.field=date_of_capture_&facet.field=mimetype_code&facet.field=geographic_focus__facet&facet.field=organization_based_in__facet&facet.field=organization_type__facet&facet.field=language__facet&facet.field=creator_name__facet&hl.fragsize=600&f.creator_name__facet.facet.limit=6&facet.mincount=1&qf=text^1&hl.fl=contents&hl.fl=title&hl.fl=original_url&wt=ruby&f.geographic_focus__facet.facet.limit=6&defType=edismax&rows=10&f.domain.facet.limit=6&q=Unangan&f.organization_based_in__facet.facet.limit=6&q.op=AND&group=true&hl.usePhraseHighlighter=true} hits=8 status=0 QTime=108 ...

For the query above (which can be simplified to say: find all documents that contain the word "unangan" and return facets, highlights, etc.), I get five search results. Only three of these are returning highlighted snippets.
Here's the highlighting portion of the Solr response (note: printed in Ruby notation because I'm receiving this response in a Rails app):

highlighting = {
  "20100602195444/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf" => {},
  "20100902203939/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf" => {},
  "20111202233029/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf" => {},
  "20100618201646/http://www.komnasham.go.id/portal/files/39-99.pdf" => {contents => [...actual snippet is returned here...]},
  "20100902235358/http://www.komnasham.go.id/portal/files/39-99.pdf" => {contents => [...actual snippet is returned here...]},
  "20110302213056/http://www.komnasham.go.id/publikasi/doc_download/2-uu-no-39-tahun-1999" => {contents => [...actual snippet is returned here...]},
  "20110302213102/http://www.komnasham.go.id/publikasi/doc_view/2-uu-no-39-tahun-1999?tmpl=component&format=raw" => {contents => [...actual snippet is returned here...]},
  "20120303113654/http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf" => {}
}

I have eight (as opposed to five) results above because I'm also doing a grouped query, grouping by a field called original_url, and this leads to five grouped results.

I've confirmed that my highlight-lacking results DO contain the word "unangan", as expected, and this term appears in a text field that's indexed and stored, and is searched for all text searches. For example, one of the search results is for a crawl of this document: http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf And if you view that document on the web, you'll see that it does contain "unangan".

Has anyone seen this before? And does anyone have any good suggestions for troubleshooting/fixing the problem?

Thanks!
- Eric
Re: Internal shard communication - performance?
Three zookeepers give you bare minimum high availability - one can go down. But... I would personally assert that running embedded zookeeper is inherently not high availability, just by definition (okay, by MY definition.) You didn't say whether you were running embedded zookeeper or not. But if you were, to be HA, your cluster should be able to have all but one node per shard go down and your cluster should still service both queries and updates. But with embedded zookeeper on a four-node cluster, taking down two of the nodes running embedded zookeeper would make zookeeper no longer usable, and hence your cluster would not be HA. -- Jack Krupansky -Original Message- From: Torsten Albrecht Sent: Wednesday, August 07, 2013 1:15 PM To: solr-user Subject: Internal shard communication - performance? Hi, I use a system with solr 3 and 20 shards (3 million docs per shard). At a testsystem with one shard (60 million docs) I get 750 requests per second. At my live system (20 shards) I get 200 requests per second. Is the internal communication between the 20 shards a performance killer? Another question. Is a solr 4 system with solrcloud and Zookeeper a high availability system? Regards, Torsten
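The availability arithmetic behind Jack's point: ZooKeeper needs a strict majority of the ensemble to be up, so an ensemble of n nodes tolerates floor((n - 1) / 2) failures. A quick sketch:

```python
def zk_failure_tolerance(ensemble_size):
    """ZooKeeper stays available while a strict majority of nodes is up."""
    return (ensemble_size - 1) // 2

for n in (1, 2, 3, 4, 5, 6):
    print(n, zk_failure_tolerance(n))
# A 3-node ensemble tolerates 1 failure; note that 4 nodes also tolerate
# only 1, which is why odd-sized ensembles are the usual recommendation.
```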
RE: external zookeeper with SolrCloud
I started looking into what I might have missed while upgrading to Solr 4.4, and I noticed that solr.xml in Solr 4.4 has this:

<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>

While our solr.xml has this:

<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collection1" instanceDir="collection1" shard="${shard:}" dataDir="${solr.data.dir}"/>
  </cores>
</solr>

Do you think not having shardHandlerFactory is causing this bug to appear on our end? -Original Message- From: Raymond Wiker [mailto:rwi...@gmail.com] Sent: Wednesday, August 07, 2013 8:29 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud You said earlier that you had 6 zookeeper instances, but the zkHost param only shows 5 instances... is that correct? On Tue, Aug 6, 2013 at 11:23 PM, Joshi, Shital shital.jo...@gs.com wrote: Machines are definitely up. Solr4 node and zookeeper instance share the machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know about the zk instances. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, August 06, 2013 5:03 PM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud First off, even 6 ZK instances are overkill, vast overkill. 3 should be more than enough. That aside, however, how are you letting your Solr nodes know about the zk machines? 
Is it possible you've pointed some of your Solr nodes at specific ZK machines that aren't up when you have this problem? I.e. -zkHost=zk1,zk2,zk3 Best Erick On Tue, Aug 6, 2013 at 4:56 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We have SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes. We have 6 zookeeper instances. We are planning to change to odd number of zookeeper instances. With Solr 4.3.0, if all zookeeper instances are not up, the solr4 node never connects to zookeeper (can't see the admin page) until all zookeeper instances are up and we restart all solr nodes. It was suggested that it could be due to this bug https://issues.apache.org/jira/browse/SOLR-4899 and this bug is solved in Solr 4.4. We upgraded to Solr 4.4 but still see this issue. We brought up 4 out of 6 zookeeper instances and then brought up all ten Solr4 nodes. We kept seeing this exception in Solr logs:

751395 [main-SendThread] WARN org.apache.zookeeper.ClientCnxn ? Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
  at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

And after a while saw this exception. 
INFO - 2013-08-05 22:24:07.582; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@5140709name:ZooKeeperConnection Watcher: qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com, qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com got event WatchedEvent state:SyncConnected type:None path:null path:null type:None INFO - 2013-08-05 22:24:07.662; org.apache.solr.common.cloud.ConnectionManager; Client-ZooKeeper status change trigger but we are already closed 754311 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Client-ZooKeeper status change trigger but we are already closed We brought up all zookeeper instances but the cloud never came up until all solr nodes were restarted. Do we need to change any settings? After weekend reboot, all zookeeper instances come up one by one. While zookeeper instances are coming up solr nodes are also getting started. With this issue, we have to put checks to make sure all zookeeper instances are up before we bring up any solr node. Thanks!! -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, June 11, 2013 10:42 AM To: solr-user@lucene.apache.org Subject: Re: external zookeeper with SolrCloud On Jun 11, 2013, at 10:15 AM, Joshi, Shital shital.jo...@gs.com wrote: Thanks Mark.
Re: DIH Problem: create multiple docs from a single entity
I suppose you can use Substring and Charindex to perform your task at the SQL level, then use the value in another entity in DIH. -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-Problem-create-multiple-docs-from-a-single-entity-tp4083050p4083106.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.4. creating an index that 4.3 can't read (but in LUCENE_43 mode)
It does seem that the Lucene42DocValuesProducer has changed its internal version, and that is what it's complaining about. Cheers Shawn. Ok, my misunderstanding on the codec stuff then; as I said, probably not a common occurrence, but good to know. On 7 August 2013 17:32, Shawn Heisey s...@elyograg.org wrote: On 8/7/2013 3:33 AM, Daniel Collins wrote: I had been running a Solr 4.3.0 index, which I upgraded to 4.4.0 (but hadn't changed LuceneVersion, so it was still using the LUCENE_43 codec). I then had to back-out and return to a 4.3 system, and got an error when it tried to read the index. Now, it was only a dev system, so not a problem, and normally I would restore a backup anyway, but shouldn't this work? If I haven't changed the codec, then Solr 4.4 should be using the same code as 4.3, so the data should be compatible, no? Using an index from a newer version is never guaranteed, and usually will NOT work. The luceneMatchVersion setting doesn't typically affect index format, it usually affects how analysis and query parser components work - so you can tell Solr to use buggy behavior from an earlier release. Unless you actually change aspects of the codec (postings format, docvalues format, etc), Solr uses the Lucene codec defaults, which can (and usually does) change from release to release. Looking through the Lucene 4.4 CHANGES.txt file (not the Solr file), LUCENE-4936 looks like a change to the DocValues format. I can't tell from the description whether LUCENE-5035 is a format change or a change in how Lucene handles sorting in memory. The evidence I can find suggests that the format is still called Lucene42DocValuesFormat, but apparently it doesn't work the same. Thanks, Shawn
Re: DIH Problem: create multiple docs from a single entity
Hello Lee, Unfortunately no. It's possible to read a CSV field via http://wiki.apache.org/solr/DataImportHandler#FieldReaderDataSource but there is no CSV-like EntityProcessor that can break a line into entities. Transformers cannot emit new entities. On Wed, Aug 7, 2013 at 8:10 PM, Lee Carroll lee.a.carr...@googlemail.com wrote: Hi I've 2 tables with the following data

table 1
id  treatment_list
1   a,b
2   b,c

table 2
treatment_id  name
a             name1
b             name2
c             name3

Using DIH can you create an index of the form

id  treatment_id  name
1   a             name1
1   b             name2
2   b             name2
2   c             name3

In short, can I split the comma-separated field and process each value as an entity? From the docs and the wiki I can't see anything obvious. I feel I'm missing something easy here. (Note it's not my data, so I can't do anything with the dodgy CSV field.) -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
How to parse multivalued data into single valued fields?
Hi, I'm currently using solr 4.0 final with Manifoldcf v1.3 dev. I have multivalued titles (the names are all the same so far) that must go into a single valued field. Can a transformer do this? Can anyone show me how to do it? And this has to fire off before an update chain takes place. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-parse-multivalued-data-into-single-valued-fields-tp4083108.html Sent from the Solr - User mailing list archive at Nabble.com.
TermFrequency in a multi-valued field
This might end up being more of a Lucene question, but anyway... For a multivalued field, it appears that term frequency is calculated as something a little like: sum(tf(value1), ..., tf(valueN)) I'd rather my score not give preference based on how *many* of the values in the multivalued field matched, I want it to give preference based on the value that matched *best*. In other words, something more like: max(tf(value1), ..., tf(valueN)) Put another way, I want a search like q=mvf:foo against a document with a multivalued field: mvf: [ foo ] to get scored the exact same as a document with a multivalued field: mvf: [ foo, foo ] but worse than a document with a multivalued field: mvf: [ foo foo ] I'm guessing this'd require a custom Similarity implementation, but I'm beginning to wonder if even that is low enough level. Other thoughts? This seems like a pretty obvious desire. Thanks.
Re: How to parse multivalued data into single valued fields?
before an update chain Really? Why? And if so, then you will definitely have to deal with it before handing the data to Solr since the update chain is where preprocessing of input data normally happens for updates in Solr. Be specific as to what processing you want to occur. Provide an example if you can. -- Jack Krupansky -Original Message- From: eShard Sent: Wednesday, August 07, 2013 3:10 PM To: solr-user@lucene.apache.org Subject: How to parse multivalued data into single valued fields? Hi, I'm currently using solr 4.0 final with Manifoldcf v1.3 dev. I have multivalued titles (the names are all the same so far) that must go into a single valued field. Can a transformer do this? Can anyone show me how to do it? And this has to fire off before an update chain takes place. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-parse-multivalued-data-into-single-valued-fields-tp4083108.html Sent from the Solr - User mailing list archive at Nabble.com.
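As a sketch of the kind of preprocessing Jack describes — collapsing a multivalued input down to a single value inside the update chain — a solrconfig.xml chain along these lines could work. The chain and field names here are illustrative assumptions, and the chain still has to be selected via the update.chain request parameter (or made the handler's default) to take effect:

```xml
<!-- sketch: keep only the first of several incoming title values so the
     document fits a single-valued schema field -->
<updateRequestProcessorChain name="single-title">
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">title</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```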
Re: TermFrequency in a multi-valued field
A multivalued text field is directly equivalent to concatenating the values, with a possible position gap between the last and first terms of adjacent values. Term frequency is driven by the terms from the query, not the terms from the field (tf(query-term), not tf(field-term)). Your max formula doesn't quite make sense in that sense. Why do you have two foo in the same field if you don't mean them to be... two foo?? You can use the Uniq update processor to eliminate duplicate values in multivalued fields (where the whole value matches, not individual terms within values.) You need to clarify your use case. -- Jack Krupansky -Original Message- From: Jeff Wartes Sent: Wednesday, August 07, 2013 4:05 PM To: solr-user@lucene.apache.org Subject: TermFrequency in a multi-valued field This might end up being more of a Lucene question, but anyway... For a multivalued field, it appears that term frequency is calculated as something a little like: sum(tf(value1), ..., tf(valueN)) I'd rather my score not give preference based on how *many* of the values in the multivalued field matched, I want it to give preference based on the value that matched *best*. In other words, something more like: max(tf(value1), ..., tf(valueN)) Put another way, I want a search like q=mvf:foo against a document with a multivalued field: mvf: [ foo ] to get scored the exact same as a document with a multivalued field: mvf: [ foo, foo ] but worse than a document with a multivalued field: mvf: [ foo foo ] I'm guessing this'd require a custom Similarity implementation, but I'm beginning to wonder if even that is low enough level. Other thoughts? This seems like a pretty obvious desire. Thanks.
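For reference, the Uniq processor mentioned above is configured roughly like this in a 4.x-era solrconfig.xml. The mvf field name follows the example in the question, and the exact parameter form has varied between releases, so check the javadoc for your version:

```xml
<updateRequestProcessorChain name="uniq-values">
  <!-- drops duplicate whole values within each listed multivalued field -->
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <lst name="fields">
      <str>mvf</str>
    </lst>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```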
Re: Internal shard communication - performance?
Hi Jack, I would like to run zookeeper external at my old master server. So I have two zookeeper to control my cloud. The third and fourth zookeeper will be a virtual machine. Torsten Von: Jack Krupansky Gesendet: ?Mittwoch?, ?7?. ?August? ?2013 ?20?:?05 An: solr-user@lucene.apache.org Three zookeepers give you bare minimum high availability - one can go down. But... I would personally assert that running embedded zookeeper is inherently not high availability, just by definition (okay, by MY definition.) You didn't say whether you were running embedded zookeeper or not. But if you were, to be HA, your cluster should be able to have all but one node per shard go down and your cluster should still service both queries and updates. But with embedded zookeeper on a four-node cluster, taking down two of the nodes running embedded zookeeper would make zookeeper no longer usable, and hence your cluster would not be HA. -- Jack Krupansky -Original Message- From: Torsten Albrecht Sent: Wednesday, August 07, 2013 1:15 PM To: solr-user Subject: Internal shard communication - performance? Hi, I use a system with solr 3 and 20 shards (3 million docs per shard). At a testsystem with one shard (60 million docs) I get 750 requests per second. At my live system (20 shards) I get 200 requests per second. Is the internal communication between the 20 shards a performance killer? Another question. Is a solr 4 system with solrcloud and Zookeeper a high availability system? Regards, Torsten
Filtering suggestion results
Hi, I have a question regarding the suggester component. Can we filter suggestion results depending on a particular value of a field, e.g. fq=column1:value1? -- View this message in context: http://lucene.472066.n3.nabble.com/Filtering-suggestion-results-tp4083121.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: TermFrequency in a multi-valued field
A multivalued text field is directly equivalent to concatenating the values, with a possible position gap between the last and first terms of adjacent values. That, in a nutshell, would be the problem. Maybe the discussion is over at this point. It could be I dumbed down the problem a bit too much for illustration purposes. I'm actually doing phrase query matches with slop. As such, the search phrase I'm interested in could easily be in more than one of the (unique) values, and the score for each value-match could be very different when considered alone. For document scoring purposes, I don't care that (for example) I got a sloppy match on one value if I got a nice phrase out of another value in the same document. In fact, I explicitly want to ignore the fact that there was also a sloppy match. I also don't care if the exact phrase occurred in more than one value, and I don't want the case where it does match more than one influencing that document's score.
logging UI stops working when additional handlers defined
I run Solr on Tomcat with JUL configured to log Solr to a separate file:

org.apache.solr.level = INFO
org.apache.solr.handlers = 4solrerr.org.apache.juli.FileHandler

I've noticed that the logging UI stops working. Is this normal behavior or is it a bug? (When cores are initialized, JulWatcher is registered only for the root logger.) -- Grzegorz Sobczyk
Is it possible to use phrase query in range queries?
I am trying to use range queries to take advantage of having constant scores in a multivalued field, but I am not sure if range queries support phrase queries. Ex: The below range query works fine:

<str name="q">_query_:address:([Charlotte TO Charlotte])^5.5</str>

The below query doesn't work:

<str name="q">_query_:address:([Charlotte NC TO Charlotte NC])^5.5</str>

-- View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-use-phrase-query-in-range-queries-tp4083132.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [POLL] Who how does use admin-extra ?
Hmmm .. Didn't get at least one answer (except from Shawn in #solr, telling me he's using a 0 byte file to avoid errors :p) - does that mean, that really no one is using it? Don't be afraid .. tell me, one way or another :) - Stefan On Wednesday, July 17, 2013 at 8:50 AM, Stefan Matheis wrote: Hey List I would be interested to hear who is using admin-extra Functionality in the 4.x UI and especially _how_ that is used: for displaying graphs, providing links for $other_tool, adding other menu items … ? The main reason i'm asking is .. i don't use it myself and i'm always curious while i have to touch it. I can test the example we provide, but that is very basic and doesn't necessarily reflect real-world scenarios. So .. tell me - I'm happy to hear everything .. reports on usage, suggestions for improvements, … :) - Stefan
Obtain shard routing key during document insert
Is it possible to obtain the shard routing key from within an UpdateRequestProcessor when a document is being inserted? Many thanks, Terry
Re: [POLL] Who how does use admin-extra ?
Didn't somebody once say this is used for customization of admin pages? Otis -- SOLR Performance Monitoring -- http://sematext.com/spm Solr ElasticSearch Support -- http://sematext.com/ On Thu, Aug 8, 2013 at 12:24 AM, Stefan Matheis matheis.ste...@gmail.com wrote: Hmmm .. Didn't get at least one answer (except from Shawn in #solr, telling me he's using a 0 byte file to avoid errors :p) - does that mean, that really no one is using it? Don't be afraid .. tell me, one way or another :) - Stefan On Wednesday, July 17, 2013 at 8:50 AM, Stefan Matheis wrote: Hey List I would be interested to hear who is using admin-extra Functionality in the 4.x UI and especially _how_ that is used: for displaying graphs, providing links for $other_tool, adding other menu items … ? The main reason i'm asking is .. i don't use it myself and i'm always curious while i have to touch it. I can test the example we provide, but that is very basic and doesn't necessarily reflect real-world scenarios. So .. tell me - I'm happy to hear everything .. reports on usage, suggestions for improvements, … :) - Stefan
Re: Is it possible to use phrase query in range queries?
The ends of a range query are indeed single terms - they are not queries or any term that would analyze into multiple terms. In some cases you might want composite values as strings so that you can do a range on terms. For example, city + , + state as a string. -- Jack Krupansky -Original Message- From: SolrLover Sent: Wednesday, August 07, 2013 5:53 PM To: solr-user@lucene.apache.org Subject: Is it possible to use phrase query in range queries? I am trying to use range queries to take advantage of having constant scores in a multivalued field, but I am not sure if range queries support phrase queries. Ex: The below range query works fine:

<str name="q">_query_:address:([Charlotte TO Charlotte])^5.5</str>

The below query doesn't work:

<str name="q">_query_:address:([Charlotte NC TO Charlotte NC])^5.5</str>

-- View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-use-phrase-query-in-range-queries-tp4083132.html Sent from the Solr - User mailing list archive at Nabble.com.
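Jack's composite-value idea could be sketched like this in schema.xml — the citystate field name is an assumption, and the "city, state" concatenation would have to be done by the indexing client or an update processor:

```xml
<!-- sketch: a single-token string field holding composite values such as
     "Charlotte, NC", so a range query compares whole composite terms -->
<field name="citystate" type="string" indexed="true" stored="true" multiValued="true"/>
```

A query such as citystate:["Charlotte, NC" TO "Charlotte, NC"] then ranges over whole composite terms instead of individual words.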
How to pass ranges / values to form query in search component?
Hi, I am currently passing the query by passing values to my search component. For ex: http://localhost:8983/solr/select?firstname=charles&lastname=dawson&qt=person The person search component is configured to accept the values and form the query:

<str name="q">( _query_:{!wp_dismax qf=fname^8.3 v=$firstname} OR _query_:{!wp_dismax qf=lname^8.6 v=$lastname} )</str>

Now I am trying to figure out a way to pass values / ranges like below, but I am getting syntax errors. Ex:

<str name="q">( _query_:{!v=fname:$firstname} OR _query_:{!v=fname:([$firstname to $firstname])^8.3} )</str>

Can someone let me know if there's a way to overcome this issue? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-pass-ranges-values-to-form-query-in-search-component-tp4083141.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Internal shard communication - performance?
On 8/7/2013 2:45 PM, Torsten Albrecht wrote: I would like to run zookeeper external at my old master server. So I have two zookeeper to control my cloud. The third and fourth zookeeper will be a virtual machine. For true HA with ZooKeeper, you need at least three instances on separate physical hardware. If you want to use VMs, that would be fine, but you must ensure that you aren't running more than one instance on the same physical server. For best results, use an odd number of ZK instances. With three ZK instances, one can go down and everything still works. With five, two can go down and everything still works. If you've got a fully switched network that's at least gigabit speed, then the network latency involved in internal communication shouldn't really matter. Thanks, Shawn
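A minimal three-instance ensemble along the lines Shawn describes might use a zoo.cfg like the following on each machine (hostnames and paths are placeholders; each server additionally needs a myid file whose number matches its server.N line):

```properties
# sketch of zoo.cfg for a three-node external ZooKeeper ensemble
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```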
Re: How to pass ranges / values to form query in search component?
Something smells fishy here... why do you think you need to do this using nested queries and parameter names? Sounds like you're engaging in premature complication. Try simpler approaches first. -- Jack Krupansky -Original Message- From: Noob Sent: Wednesday, August 07, 2013 6:45 PM To: solr-user@lucene.apache.org Subject: How to pass ranges / values to form query in search component? Hi, I am currently passing the query by passing values to my search component. For ex: http://localhost:8983/solr/select?firstname=charles&lastname=dawson&qt=person The person search component is configured to accept the values and form the query:

<str name="q">( _query_:{!wp_dismax qf=fname^8.3 v=$firstname} OR _query_:{!wp_dismax qf=lname^8.6 v=$lastname} )</str>

Now I am trying to figure out a way to pass values / ranges like below, but I am getting syntax errors. Ex:

<str name="q">( _query_:{!v=fname:$firstname} OR _query_:{!v=fname:([$firstname to $firstname])^8.3} )</str>

Can someone let me know if there's a way to overcome this issue? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-pass-ranges-values-to-form-query-in-search-component-tp4083141.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question about soft commit and updateRequestProcessorChain
How are you allowing for a soft commit? IOW how are you triggering it? And what do you speculate the updateRequestProcessorChain has to do with soft commit? Best Erick On Wed, Aug 7, 2013 at 1:04 PM, Jack Park jackp...@topicquests.org wrote: If one allows for a soft commit (rather than a hard commit on each request), when does the updateRequestProcessorChain fire? Does it fire after the commit? Many thanks Jack
Re: Question about soft commit and updateRequestProcessorChain
Most update processor chains will be configured with the Run Update processor as the last processor of the chain. That's where the Lucene index update and optional commit would be done. -- Jack Krupansky -Original Message- From: Jack Park Sent: Wednesday, August 07, 2013 1:04 PM To: solr-user@lucene.apache.org Subject: Question about soft commit and updateRequestProcessorChain If one allows for a soft commit (rather than a hard commit on each request), when does the updateRequestProcessorChain fire? Does it fire after the commit? Many thanks Jack
Re: Solr design. Choose Cores or Shards?
On 8/6/2013 8:49 PM, manju16832003 wrote: My confusion: is it feasible to choose many cores, or to use shards? I do not have much experience with how shards work and what they are used for. I would like to know the suggestions :-) for a design like this. What are the implications if I were to choose to use many cores and handle stuff at the application level, calling different cores?

Although shards and cores refer to slightly different things, when it comes right down to it, it's difficult to separate the two concepts. Short version: Shards are implemented using cores. The long version follows below.

A core is a functionally complete Solr index. You can have more than one core per Solr instance. Multiple cores are discussed in the CoreAdmin wiki page: http://wiki.apache.org/solr/CoreAdmin

Shards refer to a concept in distributed search. The index is divided into pieces. The request comes in to Solr. Solr forwards the request to each shard. It then combines each shard's results into a single result, pulls the requested fields out of each shard, and sends the response to the requester.

If you are planning a new deployment of a sharded index, you probably will want to use SolrCloud. It's possible to use shards without SolrCloud, but SolrCloud automates everything and makes it MUCH easier.

In SolrCloud, a collection is a logical index. A collection is composed of one or more shards. It is perfectly acceptable to have only one shard in a collection, in which case it won't be using distributed search, but the following still applies: Each shard is composed of replicas. If your replicationFactor is 2, then when your cloud is operating normally, you'll have two replicas of each shard. If the replicationFactor is 5, then you'll have five replicas. One of those replicas will be elected as leader for that shard. You can have a replicationFactor of 1, in which case there will only be one copy, but it will not be a fault-tolerant setup. 
Now for the relationship between shards and cores: Each replica of a shard *IS* a core. All of the cores in a single collection will typically have the same configuration and schema. More info about SolrCloud: http://wiki.apache.org/solr/SolrCloud Thanks, Shawn
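To make the collection/shard/replica terminology above concrete: in SolrCloud a collection is normally created through the Collections API, for example (the host, collection name, and config name below are placeholders):

```text
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=2&collection.configName=myconf
```

That call results in 3 shards times 2 replicas, i.e. 6 cores distributed across the nodes of the cluster.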
SOLR Copy field if no value on destination
Hi, Is it possible to copy the value of a field to another field if the destination doesn't have a value? An example: indexing an RSS feed. The feed has the fields link and guid, but sometimes guid is not present in the feed. I have a field named finalLink that I will copy values into. Now I want to copy guid to finalLink, but if guid has no value I want to copy link. My question is: is this possible just with the schema, processors, solrconfig.xml, and the data-config? Thanks a lot
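One possible approach, sketched here purely as an assumption about how this could be wired together (the chain name and processor ordering are illustrative): clone both guid and link into finalLink, then keep only the first value. Because guid is cloned first, it wins when present; otherwise link is the first (and only) value:

```xml
<updateRequestProcessorChain name="final-link">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">guid</str>
    <str name="dest">finalLink</str>
  </processor>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">link</str>
    <str name="dest">finalLink</str>
  </processor>
  <!-- finalLink now holds [guid?, link]; keep the first value only -->
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">finalLink</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```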
Re: Measuring SOLR performance
Hi Dmitry, The command seems good. Are you sure your shell is not doing something funny with the params? You could try: python solrjmeter.py -C g1,foo -c hour -x ./jmx/SolrQueryTest.jmx -a where g1 and foo are results of the individual runs, ie. something that was started and saved with '-R g1' and '-R foo' respectively so, for example, i have these comparisons inside '/var/lib/montysolr/different-java-settings/solrjmeter', so I am generating the comparison by: export SOLRJMETER_HOME=/var/lib/montysolr/different-java-settings/solrjmeter python solrjmeter.py -C g1,foo -c hour -x ./jmx/SolrQueryTest.jmx -a roman On Wed, Aug 7, 2013 at 10:03 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, One more question. I tried to compare different runs (g1 vs cms) using the command below, but get an error. Should I attach some other param(s)? python solrjmeter.py -C g1,foo -c hour -x ./jmx/SolrQueryTest.jmx **ERROR** File solrjmeter.py, line 1427, in module main(sys.argv) File solrjmeter.py, line 1303, in main check_options(options, args) File solrjmeter.py, line 185, in check_options error(The folder '%s' does not exist % rf) File solrjmeter.py, line 66, in error traceback.print_stack() The folder '0' does not exist Dmitry On Wed, Aug 7, 2013 at 4:13 PM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, Finally, this has worked! Thanks for quick support. The graphs look awesome. At least on the index sample :) It is quite easy to setup and run + possible to run directly on the shard server in background mode. my test run was: python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q ./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs 60 -R foo -t /solr/statements -e statements Thanks! 
Dmitry On Wed, Aug 7, 2013 at 6:54 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, I've modified the solrjmeter to retrieve data from under the core (the -t parameter) and the rest from the /solr/admin - I could test it only against 4.0, but it is there the same as 4.3 - it seems...so you can try the fresh checkout my test was: python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -t /solr/collection1 -R foo -q ./queries/demo/* -p 9002 -s adsate Thanks! roman On Tue, Aug 6, 2013 at 9:46 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi, Thanks for the clarification, Shawn! So with this in mind, the following work: http://localhost:8983/solr/statements/admin/system?wt=json http://localhost:8983/solr/statements/admin/mbeans?wt=json not copying their output to save space. Roman: is this something that should be set via -t parameter as well? Dmitry On Tue, Aug 6, 2013 at 4:34 PM, Shawn Heisey s...@elyograg.org wrote: On 8/6/2013 6:17 AM, Dmitry Kan wrote: Of three URLs you asked for, only the 3rd one gave response: snip The rest report 404. On Mon, Aug 5, 2013 at 8:38 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, So I think the admin pages are different on your version of solr, what do you see when you request... ? http://localhost:8983/solr/admin/system?wt=json http://localhost:8983/solr/admin/mbeans?wt=json http://localhost:8983/solr/admin/cores?wt=json Unless you have a valid defaultCoreName set in your (old-style) solr.xml, the first two URLs won't work, as you've discovered. Without that valid defaultCoreName (or if you wanted info from a different core), you'd need to add a core name to the URL for them to work. The third one, which works for you, is a global handler for manipulating cores, so naturally it doesn't need a core name to function. The URL path for this handler is defined by solr.xml. Thanks, Shawn
Re: Document Similarity Algorithm at Solr/Lucene
Block-quoting and plagiarism are two different questions. Block-quoting is simple: break the text apart into sentences or even paragraphs and make them separate documents. Make facets of the post-analysis text. Now just pull counts of facets and block quotes will be clear. Mahout has a scalable implementation of n-gram based document similarity. It calculates distances between all documents and identifies clusters of similar documents. This is a much more general technique and may help you find obfuscated plagiarism. Lance On 07/23/2013 02:33 AM, Furkan KAMACI wrote: Hi; Sometimes a huge part of a document may exist in another document. As like in student plagiarism or quotation of a blog post at another blog post. Does Solr/Lucene or its libraries (UIMA, OpenNLP, etc.) has any class to detect it?
Re: Question about soft commit and updateRequestProcessorChain
Ok. So, running the update processor chain *is* the commit process? In answer to Erick's question: my habit, an old and apparently bad one, has been to call a hard commit at the end of each update. My question had to do with allowing soft commits to be controlled by settings in solrconfig.xml, say every 30 seconds or something like that (I really haven't studied such options yet). I ask this question because I add an additional call to the update processor, which, after running Lucene, the document is then sent outside to an agent network for further processing. I needed to know if the document was already committed by that time. I am inferring from here that the document has been committed after the first step in the update processor chain, even if that's based on a soft commit. Thanks! JackP On Wed, Aug 7, 2013 at 4:20 PM, Jack Krupansky j...@basetechnology.com wrote: Most update processor chains will be configured with the Run Update processor as the last processor of the chain. That's were the Lucene index update and optional commit would be done. -- Jack Krupansky -Original Message- From: Jack Park Sent: Wednesday, August 07, 2013 1:04 PM To: solr-user@lucene.apache.org Subject: Question about soft commit and updateRequestProcessorChain If one allows for a soft commit (rather than a hard commit on each request), when does the updateRequestProcessorChain fire? Does it fire after the commit? Many thanks Jack
Re: Question about soft commit and updateRequestProcessorChain
No and No... Commit has a life of its own. Autocommit can occur based on time and number of documents, independent of the update processor chain. For example, you can send a few updates with commitWithin and sit there idle issuing no commands, and then suddenly, after the commitWithin interval, the commit magically happens. CommitWithin is a recommended approach - just pick the desired time interval. Unless you have an explicit commit in your update command, there is no guarantee of Run Update doing a commit. No, the document is not committed after the first step in the update processor chain - Run Update is usually the last or next-to-last processor in the chain (next to last if you use the Log Update processor). IFF you requested commit, soft or hard, on your update command, the commit will occur at the Run Update processor step of the chain. -- Jack Krupansky -Original Message- From: Jack Park Sent: Wednesday, August 07, 2013 7:41 PM To: solr-user@lucene.apache.org Subject: Re: Question about soft commit and updateRequestProcessorChain Ok. So, running the update processor chain *is* the commit process? In answer to Erick's question: my habit, an old and apparently bad one, has been to call a hard commit at the end of each update. My question had to do with allowing soft commits to be controlled by settings in solrconfig.xml, say every 30 seconds or something like that (I really haven't studied such options yet). I ask this question because I add an additional step to the update processor chain which, after Lucene runs, sends the document outside to an agent network for further processing. I needed to know if the document was already committed by that time. I am inferring from here that the document has been committed after the first step in the update processor chain, even if that's based on a soft commit. Thanks!
JackP On Wed, Aug 7, 2013 at 4:20 PM, Jack Krupansky j...@basetechnology.com wrote: Most update processor chains will be configured with the Run Update processor as the last processor of the chain. That's where the Lucene index update and optional commit would be done. -- Jack Krupansky -Original Message- From: Jack Park Sent: Wednesday, August 07, 2013 1:04 PM To: solr-user@lucene.apache.org Subject: Question about soft commit and updateRequestProcessorChain If one allows for a soft commit (rather than a hard commit on each request), when does the updateRequestProcessorChain fire? Does it fire after the commit? Many thanks Jack
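The commitWithin behavior Jack describes is requested directly on the update message itself. A minimal sketch; the document shown is made up for illustration:

```xml
<!-- Sketch: an XML update asking Solr to commit within 30 seconds.
     The id/field values are illustrative. -->
<add commitWithin="30000">
  <doc>
    <field name="id">doc-1</field>
    <field name="title_s">example title</field>
  </doc>
</add>
```

Nothing in the update processor chain fires again when the deferred commit later happens; the chain runs once, when the add is processed.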
Re: SOLR Copy field if no value on destination
Yes, it is possible to copy from a field to another field that has no value. In fact, that is the only kind of copy you should be doing unless the field is multivalued. IOW, copy field is not “replace field”. -- Jack Krupansky From: Luís Portela Afonso Sent: Wednesday, August 07, 2013 7:22 PM To: solr-user@lucene.apache.org Subject: SOLR Copy field if no value on destination Hi, Is it possible to copy the value of a field to another if the destination doesn't have a value? An example:
- Indexing an RSS feed
- The feed has the fields link and guid, but sometimes guid may not be present in the feed
- I have a field named finalLink that I will copy values into
Now I want to copy guid to finalLink, but if guid has no value I want to copy link. My question is: is that possible just with the schema, processors, solrconfig.xml, and the data-config? Thanks a lot
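The copyField mechanics Jack is describing look like this in schema.xml. A hedged sketch using the field names from the question; note that because copy field appends rather than replaces, copying both sources would leave two values in finalLink (so it would need to be multivalued, or filtered by an update processor as discussed later in the thread):

```xml
<!-- Sketch: copyField appends to the destination, it never replaces it. -->
<copyField source="guid" dest="finalLink"/>
<copyField source="link" dest="finalLink"/>
```

On its own this does not implement "use link only when guid is missing"; it just shows why plain copyField is not sufficient for the poster's fallback logic.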
Re: SOLR Copy field if no value on destination
Sorry, I am unable to untangle the logic you are expressing, but I can assure you that JavaScript and the StatelessScriptUpdate processor have full support for implementing spaghetti-code logic as tangled as desired! Simpler forms of logic can be implemented directly using non-script update processor sequences, but once you start adding conditionals, there is a 50% chance that you will need a script. There is a Default Value update processor, but it takes a literal value. Hmmm... maybe I’ll come up with a “default-value” script that takes a field name for the default value. IOW, it would copy a specified field to the destination IFF the destination had no value. Ahhh... wait... maybe... you could do this with the First Value Update processor: 1. Copy guid to finalLink (Clone Field Update processor). 2. Copy link to finalLink (Clone Field Update processor). 3. First Value Update processor. So, step 3 would leave link if guid was not there, or keep guid if it is there and discard link. Yes, that should do it. This is worth an example in the book! Thanks for the inspiration! -- Jack Krupansky From: Luís Portela Afonso Sent: Wednesday, August 07, 2013 7:22 PM To: solr-user@lucene.apache.org Subject: SOLR Copy field if no value on destination Hi, Is it possible to copy the value of a field to another if the destination doesn't have a value? An example:
- Indexing an RSS feed
- The feed has the fields link and guid, but sometimes guid may not be present in the feed
- I have a field named finalLink that I will copy values into
Now I want to copy guid to finalLink, but if guid has no value I want to copy link. My question is: is that possible just with the schema, processors, solrconfig.xml, and the data-config? Thanks a lot
Re: SOLR Copy field if no value on destination
Here's the actual update processor chain I used (and tested):

<updateRequestProcessorChain name="first-default-field">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">main_s</str>
    <str name="dest">final_s</str>
  </processor>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">backup_s</str>
    <str name="dest">final_s</str>
  </processor>
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">final_s</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

-- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Wednesday, August 07, 2013 8:20 PM To: solr-user@lucene.apache.org Subject: Re: SOLR Copy field if no value on destination Sorry, I am unable to untangle the logic you are expressing, but I can assure you that JavaScript and the StatelessScriptUpdate processor have full support for implementing spaghetti-code logic as tangled as desired! Simpler forms of logic can be implemented directly using non-script update processor sequences, but once you start adding conditionals, there is a 50% chance that you will need a script. There is a Default Value update processor, but it takes a literal value. Hmmm... maybe I’ll come up with a “default-value” script that takes a field name for the default value. IOW, it would copy a specified field to the destination IFF the destination had no value. Ahhh... wait... maybe... you could do this with the First Value Update processor: 1. Copy guid to finalLink (Clone Field Update processor). 2. Copy link to finalLink (Clone Field Update processor). 3. First Value Update processor. So, step 3 would leave link if guid was not there, or keep guid if it is there and discard link. Yes, that should do it. This is worth an example in the book! Thanks for the inspiration!
-- Jack Krupansky From: Luís Portela Afonso Sent: Wednesday, August 07, 2013 7:22 PM To: solr-user@lucene.apache.org Subject: SOLR Copy field if no value on destination Hi, Is it possible to copy the value of a field to another if the destination doesn't have a value? An example:
- Indexing an RSS feed
- The feed has the fields link and guid, but sometimes guid may not be present in the feed
- I have a field named finalLink that I will copy values into
Now I want to copy guid to finalLink, but if guid has no value I want to copy link. My question is: is that possible just with the schema, processors, solrconfig.xml, and the data-config? Thanks a lot
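One detail the thread leaves implicit: a custom chain only runs when an update handler actually references it. A hedged sketch of the wiring in solrconfig.xml; the handler configuration is illustrative, not from the thread:

```xml
<!-- Sketch: selecting the chain by default for the /update handler. -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">first-default-field</str>
  </lst>
</requestHandler>
```

Alternatively, a client can pick the chain per request with the `update.chain` request parameter instead of baking it into the handler defaults.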
Re: SOLR Copy field if no value on destination
Oh yeah. I have seen that processor in the book but was not able to remember it. Thanks a lot. And thanks a lot for your solution. It works :) On Aug 8, 2013, at 1:52 AM, Jack Krupansky j...@basetechnology.com wrote: Here's the actual update processor chain I used (and tested):

<updateRequestProcessorChain name="first-default-field">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">main_s</str>
    <str name="dest">final_s</str>
  </processor>
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">backup_s</str>
    <str name="dest">final_s</str>
  </processor>
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">final_s</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

-- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Wednesday, August 07, 2013 8:20 PM To: solr-user@lucene.apache.org Subject: Re: SOLR Copy field if no value on destination Sorry, I am unable to untangle the logic you are expressing, but I can assure you that JavaScript and the StatelessScriptUpdate processor have full support for implementing spaghetti-code logic as tangled as desired! Simpler forms of logic can be implemented directly using non-script update processor sequences, but once you start adding conditionals, there is a 50% chance that you will need a script. There is a Default Value update processor, but it takes a literal value. Hmmm... maybe I’ll come up with a “default-value” script that takes a field name for the default value. IOW, it would copy a specified field to the destination IFF the destination had no value. Ahhh... wait... maybe... you could do this with the First Value Update processor: 1. Copy guid to finalLink (Clone Field Update processor). 2. Copy link to finalLink (Clone Field Update processor). 3. First Value Update processor. So, step 3 would leave link if guid was not there, or keep guid if it is there and discard link. Yes, that should do it.
This is worth an example in the book! Thanks for the inspiration! -- Jack Krupansky From: Luís Portela Afonso Sent: Wednesday, August 07, 2013 7:22 PM To: solr-user@lucene.apache.org Subject: SOLR Copy field if no value on destination Hi, Is it possible to copy the value of a field to another if the destination doesn't have a value? An example:
- Indexing an RSS feed
- The feed has the fields link and guid, but sometimes guid may not be present in the feed
- I have a field named finalLink that I will copy values into
Now I want to copy guid to finalLink, but if guid has no value I want to copy link. My question is: is that possible just with the schema, processors, solrconfig.xml, and the data-config? Thanks a lot
Re: [POLL] Who how does use admin-extra ?
: Didn't somebody once say this is used for customization of admin pages? It can be, yes - that's why it originally existed. Stefan's question was whether anyone was actually using it for that. I used it quite a bit back in the day at CNET as a way to self-document what an instance was for, how to find internal documentation about the instance, what queries to use it for, etc., but I haven't personally taken advantage of it in the new UI (where there are more customization options instead of just a single header file). -Hoss
Re: [POLL] Who how does use admin-extra ?
I was thinking of using it to provide example queries in collections I give as examples. I also tested using it to inject a pop-out page that pulled bootstrap/angular from a CDN to provide a fancy interface to the local instance. It could have been useful if - say - the example distribution also had a couple of example queries in there, not just empty files (as of the last time I looked). On the other hand, I am not sure how to use the pre-/post- menu entries at all. I thought maybe they were injecting something into the main pane, but they seem to be straight links to somewhere else. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Aug 7, 2013 at 6:24 PM, Stefan Matheis matheis.ste...@gmail.com wrote: Hmmm .. Didn't get at least one answer (except from Shawn in #solr, telling me he's using a 0 byte file to avoid errors :p) - does that mean that really no one is using it? Don't be afraid .. tell me, one way or another :) - Stefan On Wednesday, July 17, 2013 at 8:50 AM, Stefan Matheis wrote: Hey List I would be interested to hear who is using the admin-extra functionality in the 4.x UI and especially _how_ it is used: for displaying graphs, providing links for $other_tool, adding other menu items … ? The main reason I'm asking is .. I don't use it myself and I'm always curious when I have to touch it. I can test the example we provide, but that is very basic and doesn't necessarily reflect real-world scenarios. So .. tell me - I'm happy to hear everything .. reports on usage, suggestions for improvements, … :) - Stefan
Re: Solr design. Choose Cores or Shards?
Hi Eric, Thanks for your reply. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-design-Choose-Cores-or-Shards-tp4082930p4083178.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [POLL] Who how does use admin-extra ?
The problem I saw was that the styles were all reset in a funny way, so it was hard to just use H1/H2/div/ul and have reasonable content show up. It was all small undifferentiated text. So, one had to inject a whole bootstrap/CSS reset to do something useful. And, of course, even that was non-trivial. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Aug 7, 2013 at 10:56 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Didn't somebody once say this is used for customization of admin pages? It can be, yes - that's why it originally existed. Stefan's question was whether anyone was actually using it for that. I used it quite a bit back in the day at CNET as a way to self-document what an instance was for, how to find internal documentation about the instance, what queries to use it for, etc., but I haven't personally taken advantage of it in the new UI (where there are more customization options instead of just a single header file). -Hoss
Re: Question about soft commit and updateRequestProcessorChain
I noticed the example solrconfig.xml has event listeners for commit. I wonder if they could be useful here:

<listener event="postCommit" class="solr.RunExecutableListener">

I am not sure how they work with hard/soft commits though. Regards, Alex. P.s. Just to make things complicated, UpdateRequestProcessors have a processCommit() method. But these seem to handle a commit 'request', not commit 'execution'. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Aug 7, 2013 at 7:51 PM, Jack Krupansky j...@basetechnology.com wrote: No and No... Commit has a life of its own. Autocommit can occur based on time and number of documents, independent of the update processor chain. For example, you can send a few updates with commitWithin and sit there idle issuing no commands, and then suddenly, after the commitWithin interval, the commit magically happens. CommitWithin is a recommended approach - just pick the desired time interval. Unless you have an explicit commit in your update command, there is no guarantee of Run Update doing a commit. No, the document is not committed after the first step in the update processor chain - Run Update is usually the last or next-to-last processor in the chain (next to last if you use the Log Update processor). IFF you requested commit, soft or hard, on your update command, the commit will occur at the Run Update processor step of the chain. -- Jack Krupansky -Original Message- From: Jack Park Sent: Wednesday, August 07, 2013 7:41 PM To: solr-user@lucene.apache.org Subject: Re: Question about soft commit and updateRequestProcessorChain Ok. So, running the update processor chain *is* the commit process? In answer to Erick's question: my habit, an old and apparently bad one, has been to call a hard commit at the end of each update.
My question had to do with allowing soft commits to be controlled by settings in solrconfig.xml, say every 30 seconds or something like that (I really haven't studied such options yet). I ask this question because I add an additional step to the update processor chain which, after Lucene runs, sends the document outside to an agent network for further processing. I needed to know if the document was already committed by that time. I am inferring from here that the document has been committed after the first step in the update processor chain, even if that's based on a soft commit. Thanks! JackP On Wed, Aug 7, 2013 at 4:20 PM, Jack Krupansky j...@basetechnology.com wrote: Most update processor chains will be configured with the Run Update processor as the last processor of the chain. That's where the Lucene index update and optional commit would be done. -- Jack Krupansky -Original Message- From: Jack Park Sent: Wednesday, August 07, 2013 1:04 PM To: solr-user@lucene.apache.org Subject: Question about soft commit and updateRequestProcessorChain If one allows for a soft commit (rather than a hard commit on each request), when does the updateRequestProcessorChain fire? Does it fire after the commit? Many thanks Jack
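The postCommit listener Alex mentions is configured in solrconfig.xml roughly as in the classic stock examples. A hedged sketch; the executable, directory, and arguments here are placeholders, not from the thread:

```xml
<!-- Sketch of a postCommit event listener; exe/dir values are placeholders. -->
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">solr/bin/snapshooter</str>  <!-- external program to run -->
  <str name="dir">.</str>                     <!-- working directory -->
  <bool name="wait">true</bool>               <!-- block until it finishes -->
</listener>
```

For Jack Park's use case (notifying an agent network after commit), this fires on hard commits; whether it also suits soft commits would need testing, as Alex notes.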
Category and Subcategory handling in 4.4 version
Hi All, Our web application (e-commerce) requires primary and secondary categories on items. Based on this requirement I have the following queries: 1) How are category and subcategory handled in Solr version 4.4? I have used apache-solr-1.3.0 previously, but facets have undergone many big changes since then, so I just wanted to know how this can be achieved efficiently now. 2) Should category and subcategories be saved just in the database and referred to as fields in documents only for navigation? We will require categories for inventory counts and as search criteria. Let me know about this. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Category-and-Subcategory-handling-in-4-4-version-tp4083188.html Sent from the Solr - User mailing list archive at Nabble.com.
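Since Solr 4.0, the usual answer for category/subcategory counts is pivot faceting. A hedged sketch of a search handler configured for it in solrconfig.xml; the handler name and field names are illustrative, and they assume indexed `category` and `subcategory` fields:

```xml
<!-- Sketch: facet defaults for category navigation; names are illustrative.
     facet.pivot returns subcategory counts nested under each category. -->
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="facet">true</str>
    <str name="facet.field">category</str>
    <str name="facet.pivot">category,subcategory</str>
    <str name="facet.mincount">1</str>
  </lst>
</requestHandler>
```

The same parameters can also be passed per request instead of being set as handler defaults, which keeps the category fields in the index usable for both navigation counts and search criteria.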