Solarium Extension
Hi all, I have installed the Solarium Search extension on Magento 1.7, and Solr 4.9 is running in the background. My base OS is Ubuntu 13.10, which is where Solr 4.9 runs. When I check the extension in the Magento admin it only shows "Test Connection"; please find the attached image. Note: when installing the extension in Magento through the Content Manager it reports that installation is done, but it then gives an error; please find the attached image for that as well.
Data Import handler and join select
Hi, I have a problem when indexing with the data import handler while doing a join select. I have two tables, one with products and another with the descriptions of each product in several languages:

Products: ID, NAME, BRAND, PRICE, ...
Descriptions: ID, LANGUAGE, DESCRIPTION

I would like every product indexed as a document with a multivalued field "languages", containing every language that has an associated description, plus several dynamic fields "description_*", one per language. For example:

Id: 1
Name: Product
Brand: Brand
Price: 10
Languages: [es, en]
Description_es: Descripción en español
Description_en: English description

Our first approach used sub-entities in the data import handler, and after implementing some transformers we had everything indexed as we wanted: the sub-entity step added the descriptions for each language to the Solr document, which was then indexed. The problem was performance. I've read that using sub-entities hurts performance greatly, so we changed our process to use a join instead. Performance improved considerably, but now we have a problem: each time a row is processed, a Solr document is generated and indexed, and its data is not added to any previously indexed data; it replaces it. With the previous example, the query resulting from the join would return:

Id - Name - Brand - Price - Language - Description
1 - Product - Brand - 10 - es - Descripción en español
1 - Product - Brand - 10 - en - English description

Since both rows have the same id, after indexing the only information I keep is from the second row. Is there any way for the data import handler to handle this and index the documents so that previous data is updated rather than replaced?

Thanks in advance

--
Alejandro Marqués Rodríguez
Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42
Re: Solarium Extension
On 8/7/2014 12:34 AM, pushkar sawant wrote: I have installed the Solarium Search extension on Magento 1.7, and Solr 4.9 is running in the background. My base OS is Ubuntu 13.10, which is where Solr 4.9 runs. When I check the extension in the Magento admin it only shows "Test Connection". Please find the attached image. Note: when installing the extension in Magento through the Content Manager it reports that installation is done, but it then gives an error; please find the attached image for that as well.

The list will eat most attachments. Yours did not make it. Chances are that you won't be able to get much help with Solarium or Magento here. You'll need to find a mailing list or another support venue for those programs. They were not created by the Apache Solr project. Thanks, Shawn
Cannot finish recovery: ReplicationHandler SnapPull always fails with "Unable to download xxx.fdt completely"
I have 2 Solr nodes (solr1 and solr2) in a SolrCloud. After some issue happened, solr2 is in a recovering state. The PeerSync cannot finish within about 15 minutes, so it falls back to snap pull (replication). But while doing the snap pull it always hits the issue below. Meanwhile, update requests are still being sent to the recovering node (solr2) and the good node (solr1), and the index on the recovering node is deleted and rebuilt again and again, so it takes a long time to finish. Is this a bug, or is it by design? And could anyone help me speed up the recovery? Thanks!

Jul 17, 2014 5:12:50 PM ERROR ReplicationHandler SnapPull failed: org.apache.solr.common.SolrException: Unable to download _vdq.fdt completely. Downloaded 0!=182945
SnapPull failed: org.apache.solr.common.SolrException: Unable to download _vdq.fdt completely. Downloaded 0!=182945
  at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1305)
  at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1185)
  at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:771)
  at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:421)
  at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:322)
  at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:155)
  at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
  at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:247)

We have the following settings in solrconfig.xml:

<autoCommit>
  <maxDocs>1000</maxDocs>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

and <maxIndexingThreads>8</maxIndexingThreads> is left at its default. My solrconfig.xml is attached: solrconfig.xml http://lucene.472066.n3.nabble.com/file/n4151611/solrconfig.xml
Re: Cannot finish recovery: ReplicationHandler SnapPull always fails with "Unable to download xxx.fdt completely"
Why does PeerSync take so much time? Are these two nodes in different data centers or are they connected by a slow link?

On Thu, Aug 7, 2014 at 12:41 PM, forest_soup tanglin0...@gmail.com wrote: I have 2 Solr nodes (solr1 and solr2) in a SolrCloud. After some issue happened, solr2 is in a recovering state. ...

--
Regards,
Shalin Shekhar Mangar.
Re: Cannot finish recovery: ReplicationHandler SnapPull always fails with "Unable to download xxx.fdt completely"
Thanks. My environment is 2 VMs with a good network connection, so I'm not sure why it happened. We are trying to reproduce it. The PeerSync failure log is:

Jul 25, 2014 6:30:48 AM WARN SnapPuller Error in fetching packets java.io.EOFException
  at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:154)
  at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:146)
  at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1211)
  at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1174)
  at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:771)
  at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:421)
  at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:322)
  at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:155)
  at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
  at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:247)
Re: Cannot finish recovery: ReplicationHandler SnapPull always fails with "Unable to download xxx.fdt completely"
I have opened one JIRA for it: https://issues.apache.org/jira/browse/SOLR-6333
org.apache.solr.common.SolrException: no servers hosting shard
I have 2 Solr nodes (solr1 and solr2) in a SolrCloud. After this issue happened, solr2 is in a recovering state. And after it takes a long time to finish recovery, the issue occurs again and it goes back into recovery. This happens again and again.

ERROR - 2014-08-04 21:12:27.917; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: no servers hosting shard:
  at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:148)
  at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:118)
  at java.util.concurrent.FutureTask.run(FutureTask.java:273)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482)
  at java.util.concurrent.FutureTask.run(FutureTask.java:273)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)
  at java.lang.Thread.run(Thread.java:804)

These settings in our solrconfig.xml differ from the defaults:

<maxIndexingThreads>24</maxIndexingThreads>
<ramBufferSizeMB>200</ramBufferSizeMB>
<maxBufferedDocs>1</maxBufferedDocs>
<autoCommit>
  <maxDocs>1000</maxDocs>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>
<filterCache class="solr.FastLRUCache" size="16384" initialSize="16384" autowarmCount="4096"/>
<queryResultCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="4096"/>
<documentCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="4096"/>
<fieldValueCache class="solr.FastLRUCache" size="16384" autowarmCount="1024" showItems="32"/>
<queryResultWindowSize>50</queryResultWindowSize>

The full solrconfig.xml is attached: solrconfig_perf0804.xml http://lucene.472066.n3.nabble.com/file/n4151637/solrconfig_perf0804.xml
Re: solr over hdfs for accessing/ changing indexes outside solr
Thank you very much. But why should we go for Solr distributed with Hadoop? There is already SolrCloud, which is quite applicable in the case of a big index. Is there any advantage to building indexes over MapReduce that SolrCloud cannot provide? Regards.

On Wed, Aug 6, 2014 at 9:09 PM, Erick Erickson erickerick...@gmail.com wrote:

bq: Are you aware of Cloudera search? I know they provide an integrated Hadoop ecosystem.

What Cloudera Search does via the MapReduceIndexerTool (MRIT) is create N sub-indexes for each shard in the M/R paradigm via EmbeddedSolrServer. Eventually, these sub-indexes for each shard are merged (perhaps through some number of levels) in the reduce phase and maybe merged into a live Solr instance (--go-live). You'll note that this tool requires the address of the ZK ensemble from which it can get the network topology, configuration files, all that rot. If you don't use the --go-live option, the output is still a Solr index, it's just that the index for each shard is left in a specific directory on HDFS. Being on HDFS allows this kind of M/R paradigm for massively parallel indexing operations, and perhaps massively complex analysis. Nowhere is there any low-level non-Solr manipulation of the indexes. The Flume fork just writes directly to the Solr nodes. It knows about the ZooKeeper ensemble and the collection too, and communicates via SolrJ, I'm pretty sure. As far as integrating with HDFS, you're right, HA is part of the package. As far as using the Solr indexes for analysis, well, you can write anything you want to use the Solr indexes from anywhere in the M/R world and have them available from anywhere in the cluster. There's no real need to even have Solr running; you could use the output from MRIT and access the sub-shards with the EmbeddedSolrServer if you wanted, leaving out all the pesky servlet container stuff.

bq: So why we go for HDFS in the case of analysis if we want to use SolrJ for this purpose? What is the point?

Scale and data access, in a nutshell. In the HDFS world, you can scale pretty linearly with the number of nodes you can rack together. Frankly though, if your data set is small enough to fit on a single machine _and_ you can get through your analysis in a reasonable time (reasonable here is up to you), then HDFS is probably not worth the hassle. But in the big-data world where we're talking petabyte scale, having HDFS as the underpinning opens up possibilities for working on data that were difficult/impossible with Solr previously.

Best, Erick

On Tue, Aug 5, 2014 at 9:37 PM, Ali Nazemian alinazem...@gmail.com wrote: Dear Erick, I remember that some time ago somebody asked what the point is of modifying Solr to use HDFS for storing indexes. As far as I remember, somebody told him integrating Solr with HDFS has two advantages: 1) having Hadoop replication and HA, and 2) using the indexes and Solr documents for other purposes such as analysis. So why would we go for HDFS in the case of analysis if we want to use SolrJ for that purpose? What is the point? Regards.

On Wed, Aug 6, 2014 at 8:59 AM, Ali Nazemian alinazem...@gmail.com wrote: Dear Erick, Hi, Thank you for your reply. Yeah, I am aware that SolrJ is my last option. I was thinking about raw I/O operations, so according to your reply that is probably not applicable. What about the Lily project that Michael mentioned? Does that count as SolrJ too? Are you aware of Cloudera Search? I know they provide an integrated Hadoop ecosystem. Do you know what their suggestion is? Best regards.
On Wed, Aug 6, 2014 at 12:28 AM, Erick Erickson erickerick...@gmail.com wrote: What you haven't told us is what you mean by modify the index outside Solr. SolrJ? Using raw Lucene? Trying to modify things by writing your own codec? Standard Java I/O operations? Other? You could use SolrJ to connect to an existing Solr server and both read and modify at will from your M/R jobs. But if you're thinking of trying to write/modify the segment files by raw I/O operations, good luck! I'm 99.99% certain that's going to cause you endless grief. Best, Erick

On Tue, Aug 5, 2014 at 9:55 AM, Ali Nazemian alinazem...@gmail.com wrote: Actually I am going to do some analysis on the Solr data using MapReduce. For this purpose I might need to change some parts of the data or add new fields from outside Solr.

On Tue, Aug 5, 2014 at 5:51 PM, Shawn Heisey s...@elyograg.org wrote: On 8/5/2014 7:04 AM, Ali Nazemian wrote: I changed Solr 4.9 to write its index and data on HDFS. Now I am going to access that data from outside Solr to change some of the values. Could somebody please tell me how that is possible? Suppose I am using HBase over HDFS to make these changes.
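Since accessing MRIT output through EmbeddedSolrServer comes up twice in this thread, here is a rough, untested sketch of what that can look like with the Solr 4.x SolrJ API. The solr home path and core name are placeholders, the core's conf/ directory must match the schema the index was built with, and the exact CoreContainer constructors vary slightly across 4.x releases.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.core.CoreContainer;

public class ReadMritOutput {
  public static void main(String[] args) throws Exception {
    // Placeholder: a solr home containing solr.xml plus the core whose data
    // directory points at the index produced by MRIT (copied out of HDFS or
    // reachable on a local mount).
    String solrHome = "/path/to/solr/home";
    CoreContainer container = new CoreContainer(solrHome);
    container.load();

    // No servlet container needed; the core is opened in-process.
    EmbeddedSolrServer server = new EmbeddedSolrServer(container, "collection1");
    try {
      QueryResponse rsp = server.query(new SolrQuery("*:*").setRows(5));
      System.out.println("numFound=" + rsp.getResults().getNumFound());
    } finally {
      server.shutdown();   // also shuts down the CoreContainer
    }
  }
}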
solr in classic asp project
I am working on a classic ASP 3.0 application and would like to implement Solr for it. My database is SQL Server, and the application also connects to an AS/400 using batch processing. Can someone suggest a starting point?

Regards,
Sandeep
Re: solr in classic asp project
Can you elaborate on how you plan to use Solr in your project?

Parnab
CSE, IIT Kharagpur

On Thu, Aug 7, 2014 at 12:51 PM, Sandeep Bohra sandeep.bo...@3pillarglobal.com wrote: I am working on a classic ASP 3.0 application and would like to implement Solr for it. My database is SQL Server, and the application also connects to an AS/400 using batch processing. Can someone suggest a starting point? Regards, Sandeep
Re: solr over hdfs for accessing/ changing indexes outside solr
If SolrCloud meets your needs, without Hadoop, then there's no real reason to introduce the added complexity. There are a bunch of problems that do _not_ work well with SolrCloud over non-Hadoop file systems. For those problems, the combination of SolrCloud and Hadoop makes tackling them possible.

Best, Erick

On Thu, Aug 7, 2014 at 3:55 AM, Ali Nazemian alinazem...@gmail.com wrote: Thank you very much. But why should we go for Solr distributed with Hadoop? There is already SolrCloud, which is quite applicable in the case of a big index. ...
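Erick's suggestion of using SolrJ to read and modify documents from M/R jobs is mentioned several times in this thread, so here is a rough sketch of what that could look like against SolrCloud with the 4.x SolrJ API. The ZooKeeper address, collection name, and the "analysis_result" field are placeholders, and atomic ("set") updates generally require the document's other fields to be stored.

import java.util.Collections;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class EnrichFromMapReduce {
  public static void main(String[] args) throws Exception {
    // Placeholders: ZooKeeper ensemble and collection name.
    CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    solr.setDefaultCollection("collection1");

    // Read a batch of documents to analyze.
    QueryResponse rsp = solr.query(new SolrQuery("*:*").setRows(100));
    for (SolrDocument doc : rsp.getResults()) {
      // Run whatever analysis you like here, then write the result back
      // as an atomic update on a hypothetical "analysis_result" field.
      SolrInputDocument update = new SolrInputDocument();
      update.addField("id", doc.getFieldValue("id"));
      update.addField("analysis_result",
          Collections.singletonMap("set", "computed-value"));
      solr.add(update);
    }
    solr.commit();
    solr.shutdown();
  }
}

In an actual MapReduce job the read/update loop would live in the mapper or reducer, but the SolrJ calls are the same.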
RE: Data Import handler and join select
Alejandro,

You can use a sub-entity with a cache using DIH. This will solve the n+1-select problem and make it run quickly. Unfortunately, the only built-in cache implementation is in-memory, so it doesn't scale. There is a fast, disk-backed cache using bdb-je, which I use in production. See https://issues.apache.org/jira/browse/SOLR-2613 . You will need to build this yourself and include it on the classpath, and obtain a copy of bdb-je from Oracle. While bdb-je is open source, its license is incompatible with the ASL, so this will never officially be part of Solr. Once you have a disk-backed cache, you can specify it on the child entity like this:

<entity name="parent" query="select id, ... from parent_table">
  <entity name="child" query="select foreignKey, ... from child_table"
          cacheKey="foreignKey" cacheLookup="parent.id"
          processor="SqlEntityProcessor" transformer="..."
          cacheImpl="BerkleyBackedCache" />
</entity>

If you don't want to go down this path, you can achieve it all with one query if you include an ORDER BY to sort by whatever field is used as Solr's uniqueKey, and add a dummy row at the end with a UNION:

SELECT p.uniqueKey, ..., 'A' as lastInd from PRODUCTS p
INNER JOIN DESCRIPTIONS d ON p.uniqueKey = d.productKey
UNION
SELECT 0 as uniqueKey, ..., 'B' as lastInd from dual
ORDER BY uniqueKey, lastInd

Then your transformer would need to keep the last uniqueKey in an instance variable and keep a running map of everything it has seen for that key. When the key changes, or on the last row, it sends that map as the document; otherwise, the transformer returns null. This collects the data from each row seen onto one document.

Keep in mind also that in a lot of cases like this, it might just be easiest to write a program that uses SolrJ to send your documents rather than trying to make DIH's features fit your use case.

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Alejandro Marqués Rodríguez [mailto:amarq...@paradigmatecnologico.com]
Sent: Thursday, August 07, 2014 1:43 AM
To: solr-user@lucene.apache.org
Subject: Data Import handler and join select

Hi, I have a problem when indexing with the data import handler while doing a join select. I have two tables, one with products and another with the descriptions of each product in several languages. ...
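For anyone wanting a starting point, below is a minimal, untested sketch (not James's actual code) of the row-collapsing transformer described above. The class name and the column names ("id", "language", "description") are placeholders; it assumes the query is ordered by the uniqueKey so that all rows for one product arrive consecutively, and that a trailing dummy row flushes the last real document.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class RowCollapsingTransformer extends Transformer {

  private Object lastKey;                      // uniqueKey of the group currently being collected
  private Map<String, Object> pending;         // document accumulated so far for lastKey

  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object key = row.get("id");

    // When the key changes, the previously accumulated document is complete.
    Map<String, Object> completed = null;
    if (lastKey != null && !lastKey.equals(key)) {
      completed = pending;
      pending = null;
    }
    if (pending == null) {
      pending = new HashMap<String, Object>(row);   // copy the shared product columns
      pending.remove("language");
      pending.remove("description");
      pending.remove("lastInd");
    }
    lastKey = key;

    // Fold this row's language-specific values into the pending document.
    Object lang = row.get("language");
    if (lang != null) {
      @SuppressWarnings("unchecked")
      List<Object> languages = (List<Object>) pending.get("languages");
      if (languages == null) {
        languages = new ArrayList<Object>();
        pending.put("languages", languages);        // multivalued field
      }
      languages.add(lang);
      pending.put("description_" + lang, row.get("description"));
    }

    // Returning null tells DIH to skip this row; a finished document is only
    // emitted when the key changes (including on the dummy last row).
    return completed;
  }
}

The dummy UNION row exists purely to trigger that final key change; the map it starts accumulating is never returned, so it never reaches the index.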
Disabling transaction logs
Hello, I am using Solr 4.6.1 with over 1000 collections and 8 nodes. Restarting nodes takes a long time (especially if we have indexing running against them). I want to see whether disabling transaction logs can make restarts more robust. However, I can't find any docs on disabling transaction logs in SolrCloud. Can anyone help with info on how to disable them? Thanks, Nitin
Re: Character encoding problems
It's not clear to me from any of the comments you've made in this thread whether you've ever confirmed *exactly* what you are getting back from Solr, ignoring the PHP completely. (ie: you refer to UTF-8 for all of the web pages, suggesting you are only looking at some web application which is consuming data from Solr.) What do you see when you use something like curl to talk to Solr directly and inspect the raw bytes (in both directions)? For example...

$ echo '[{"id":"HOSS","fr_s":"téléphone"}]' > french.json
$ # sanity check that my shell didn't bork the utf8
$ cat french.json | uniname -ap
character  byte  UTF-32  encoded as  glyph  name
23         23    E9      C3 A9       é      LATIN SMALL LETTER E WITH ACUTE
25         26    E9      C3 A9       é      LATIN SMALL LETTER E WITH ACUTE
$ curl -sS -X POST 'http://localhost:8983/solr/collection1/update?commit=true' -H 'Content-Type: application/json' -d @french.json
{"responseHeader":{"status":0,"QTime":445}}
$ curl -sS 'http://localhost:8983/solr/collection1/select?q=id:HOSS&wt=json&omitHeader=true&indent=true'
{
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"HOSS",
        "fr_s":"téléphone",
        "_version_":1475795659384684544}]
  }}
$ curl -sS 'http://localhost:8983/solr/collection1/select?q=id:HOSS&wt=json&omitHeader=true&indent=true' | uniname -ap
character  byte  UTF-32  encoded as  glyph  name
94         94    E9      C3 A9       é      LATIN SMALL LETTER E WITH ACUTE
96         97    E9      C3 A9       é      LATIN SMALL LETTER E WITH ACUTE

One other cool diagnostic trick you can use, if the data coming back over the wire is definitely no longer utf8, is to leverage the python response writer, because it generates \uXXXX escape sequences for non-ASCII strings at the Solr level -- if those are correct, that helps you clearly identify that it's the HTTP layer where your values are getting corrupted...

$ curl -sS 'http://localhost:8983/solr/collection1/select?q=id:HOSS&wt=python&omitHeader=true&indent=true'
{
  'response':{'numFound':1,'start':0,'docs':[
      {
        'id':'HOSS',
        'fr_s':u't\u00e9l\u00e9phone',
        '_version_':1475795807492898816}]
  }}

-Hoss
http://www.lucidworks.com/
Re: Disabling transaction logs
Hi Nitin,

To answer your question first: yes, you can disable the transaction log by commenting out or removing the <updateLog> section of solrconfig.xml. At the same time, I'd highly recommend not disabling transaction logs. They are needed for NRT, peer sync, and the high availability/disaster recovery parts of SolrCloud, i.e. a lot of what makes SolrCloud work depends on these logs. When you say you want a robust restart, I think that is what you're getting right now. If you mean to make the entire process faster, read the post below and you should be in a much better position. Here's a write-up by Erick Erickson on soft/hard commits and transaction logs in Solr that will help you understand this better: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

On Thu, Aug 7, 2014 at 9:12 AM, KNitin nitin.t...@gmail.com wrote: Hello, I am using Solr 4.6.1 with over 1000 collections and 8 nodes. Restarting nodes takes a long time (especially if we have indexing running against them). I want to see whether disabling transaction logs can make restarts more robust. However, I can't find any docs on disabling transaction logs in SolrCloud. Can anyone help with info on how to disable them? Thanks, Nitin

--
Anshum Gupta
http://www.anshumgupta.net
Change order of spell checker suggestions issue
Solr Rev: 4.6
Lucidworks: 2.6.3

This is sort of a repeat question, sorry. In the solrconfig.xml, will changing the value for the comparatorClass affect the sort of suggestions returned? This is my spellcheck component:

<searchComponent class="com.lucid.spellchecking.LucidSpellCheckComponent" name="spellcheck">
  <lst name="defaults">
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.DirectSolrSpellChecker</str>
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <str name="comparatorClass">score</str>
    <float name="thresholdTokenFrequency">1</float>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>

Searching for "unie" produces the following suggestions. But the suggestions appear to me to be by frequency (I've indicated Levenshtein distance in []):

<lst><str name="word">unity</str>  [ 3 ]  <int name="freq">1200</int></lst>
<lst><str name="word">unger</str>  [ 3 ]  <int name="freq">119</int></lst>
<lst><str name="word">unick</str>  [ 3 ]  <int name="freq">16</int></lst>
<lst><str name="word">united</str> [ 4 ]  <int name="freq">16</int></lst>
<lst><str name="word">unique</str> [ 4 ]  <int name="freq">10</int></lst>
<lst><str name="word">unity</str>  [ 3 ]  <int name="freq">7</int></lst>
<lst><str name="word">unser</str>  [ 3 ]  <int name="freq">7</int></lst>
<lst><str name="word">unyi</str>   [ 2 ]  <int name="freq">7</int></lst>

Is something configured incorrectly or am I just needing more coffee?
Re: solr over hdfs for accessing/ changing indexes outside solr
Dear Erick, could you please name those problems that SolrCloud cannot tackle alone? Maybe I need SolrCloud + Hadoop and am just not aware of it yet. Regards.

On Thu, Aug 7, 2014 at 7:37 PM, Erick Erickson erickerick...@gmail.com wrote: If SolrCloud meets your needs, without Hadoop, then there's no real reason to introduce the added complexity. There are a bunch of problems that do _not_ work well with SolrCloud over non-Hadoop file systems. For those problems, the combination of SolrCloud and Hadoop makes tackling them possible. Best, Erick ...
RE: Change order of spell checker suggestions issue
Corey,

Looking more carefully at your responses than I did the last time I answered this question, it looks like every correction is 2 edits in this example:

unie -> unity  (e->t, insert y)
unie -> unger  (i->g, insert r)
unie -> unick  (e->c, insert k)
unie -> united (insert t, insert d)
unie -> unique (insert q, insert u)
unie -> unity  (e->t, insert y)
unie -> unser  (i->s, insert r)
unie -> unyi   (i->y, e->i)

So both score and freq will give it to you by frequency. Usually, when I'm in doubt about something like this working as it should, I try to come up with more than one clear-cut example.

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Corey Gerhardt [mailto:corey.gerha...@directwest.com]
Sent: Thursday, August 07, 2014 11:31 AM
To: Solr User List
Subject: Change order of spell checker suggestions issue

Solr Rev: 4.6 Lucidworks: 2.6.3 This is sort of a repeat question, sorry. In the solrconfig.xml, will changing the value for the comparatorClass affect the sort of suggestions returned? ...
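A quick way to double-check those edit counts is a plain dynamic-programming Levenshtein distance; the standalone Java sketch below confirms that each suggested word is exactly two edits from "unie" (it is an illustrative check, not code from the thread).

public class EditDistanceCheck {

  static int levenshtein(String a, String b) {
    int[][] d = new int[a.length() + 1][b.length() + 1];
    for (int i = 0; i <= a.length(); i++) d[i][0] = i;        // deletions
    for (int j = 0; j <= b.length(); j++) d[0][j] = j;        // insertions
    for (int i = 1; i <= a.length(); i++) {
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1; // substitution
        d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                           d[i - 1][j - 1] + cost);
      }
    }
    return d[a.length()][b.length()];
  }

  public static void main(String[] args) {
    String query = "unie";
    for (String s : new String[] {"unity", "unger", "unick", "united",
                                  "unique", "unser", "unyi"}) {
      System.out.println(query + " -> " + s + " : " + levenshtein(query, s));
    }
  }
}

Every line prints 2, so with all candidates tied on distance, ordering by score and ordering by frequency look identical here.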
Wrong XSLT used in translation
Solr 4.1, in SolrCloud mode. 3 nodes configured, running in Tomcat 7 with Java 7. I have a few cores set up; let's just call them A, B, C and D. They have some uniquely named XSLT files, but they all have an rss.xsl file. Sometimes, on just one of the nodes, if I do a query for something in A and translate it with the rss.xsl, it will do the query just fine and give the right number of results (Solr logged the query and had it going to the correct core), but it uses B's or C's rss.xsl. Since the schemas are different, the XML is mostly empty. A refresh will have it go back to using the correct rss.xsl. Has anyone run into a problem like this? Is it a problem with Solr 4.1? Will upgrading fix it? Is it a better practice to uniquely name the XSLT files for each core (having a-rss.xsl, b-rss.xsl, etc.)? Any help/thoughts would be appreciated. -- Chris
Re: Wrong XSLT used in translation
On 8/7/2014 1:46 PM, Christopher Gross wrote: Solr 4.1, in SolrCloud mode. 3 nodes configured, Running in Tomcat 7 w/ Java 7. I have a few cores set up, let's just call them A, B, C and D. They have some uniquely named xslt files, but they all have a rss.xsl file. Sometimes, on just 1 of the nodes, if I do a query for something in A and translate it with the rss.xsl, it will do the query just fine and give the right number of results (solr logged the query and had it going to the correct core), but it uses B or C's rss.xsl. Since the schemas are different, the xml is mostly empty. A refresh will have it go back to using the correct rss.xsl. Has anyone run into a problem like this? Is it a problem with the 4.1 Solr? Will upgrading fix it? Is it a better practice to uniquely name the xslt files for each core (having a-rss.xsl, b-rss.xsl, etc)? I wonder if Solr might have a bug with XSLT caching, where the cache is global and simply looks at the base filename, not the full path. If it works when you use xsl files with different names, then that is the most likely problem. If you determine that the bug I mentioned is what's happening, before filing a bug in Jira, we need to determine whether it's still a problem in the latest version. Version 4.1 came out in January 2013. Upgrading is definitely advised, if you can do it. Thanks, Shawn
Re: Anybody uses Solr JMX?
Hi Paul,

There are lots of people/companies using SPM for Solr/SolrCloud, and I don't recall anyone saying that the SPM agent collecting metrics via JMX had a negative impact on Solr performance. That said, some people really dislike JMX, and some open-source projects choose to expose metrics via custom stats APIs or even files.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Wed, Aug 6, 2014 at 11:18 PM, Paul Libbrecht p...@hoplahup.net wrote: Hello Otis, this looks like an excellent idea! I'm in need of that, erm… last week and probably this one too. Is there not a risk that reading certain JMX properties actually hogs the process? (Or is it by design that MBeans are supposed to be readable without any lock effect?) Thanks for the hint. Paul

On 6 May 2014, at 04:43, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Alexandre, you could use something like http://blog.sematext.com/2012/09/25/new-tool-jmxc-jmx-console/ to quickly dump everything out of JMX and see if there is anything there the Solr Admin UI doesn't expose. I think you'll find there is more in JMX than the Solr Admin UI shows. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/

On Mon, May 5, 2014 at 1:56 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Thank you everybody for the links and explanations. I am still curious whether JMX exposes more details than the Admin UI. I am thinking of a troubleshooting context, rather than a long-term monitoring one. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Mon, May 5, 2014 at 12:21 PM, Gora Mohanty g...@mimirtech.com wrote: On May 5, 2014 7:09 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: I have religiously kept the jmx statement in my solrconfig.xml, thinking it was what enabled the web interface statistics output. But looking at the server logs really closely, I can see that JMX is actually disabled without an MBean server present. And the Admin UI does not actually seem to care, after a quick test. Does anybody have real experience with Solr JMX? Does it expose more information than the Admin UI's Plugins/Stats page? Is it good for

Have not been using JMX lately, but we were using it in the past. It does allow monitoring many useful details. As others have commented, it also integrates well with other monitoring tools, as JMX is a standard. Regards, Gora
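For anyone who wants to see what Solr exposes over JMX without installing a separate tool, a bare-bones dump is possible with the standard javax.management remote API. This is only a sketch: the port in the service URL and the "solr*" domain pattern are assumptions that depend on how the JVM's JMX remote agent is configured and on the Solr version, so adjust both to match your setup.

import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DumpSolrMBeans {
  public static void main(String[] args) throws Exception {
    // Placeholder host/port: they depend on the JVM flags used to start Solr
    // (e.g. -Dcom.sun.management.jmxremote.port=18983).
    JMXServiceURL url =
        new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:18983/jmxrmi");
    JMXConnector connector = JMXConnectorFactory.connect(url);
    try {
      MBeanServerConnection conn = connector.getMBeanServerConnection();
      // With <jmx/> enabled, Solr registers its plugin/stats beans under a
      // "solr"-prefixed domain; change the pattern if yours differs.
      for (ObjectName name : conn.queryNames(new ObjectName("solr*:*"), null)) {
        System.out.println(name);
        for (MBeanAttributeInfo attr : conn.getMBeanInfo(name).getAttributes()) {
          try {
            System.out.println("  " + attr.getName() + " = "
                + conn.getAttribute(name, attr.getName()));
          } catch (Exception e) {
            // some attributes cannot be read remotely; skip them
          }
        }
      }
    } finally {
      connector.close();
    }
  }
}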
Re: Anybody uses Solr JMX?
useful.
Re: How to sync lib directory in SolrCloud?
mark.
Re: why solr commit with several docs
code error by my colleague.