Re: Test harness cannot load existing index data in Solr 4.2
I think the problem is that EmbeddedSolrServer can't load existing index data. Could any committer help confirm whether it's a bug or not? Thank you. Kane

On Mon, Apr 15, 2013 at 7:28 PM, zhu kane kane...@gmail.com wrote: I'm extending Solr's *AbstractSolrTestCase* for unit testing. I have existing 'schema.xml', 'solrconfig.xml' and index data. I want to start an embedded Solr server to load an existing collection and its data, then test searching for docs in Solr. This approach works well in Solr 3.6; however, it no longer works after adapting to Solr 4.2.1. After some investigation, it looks like the index data is not loaded by the SolrCore created by the test harness. This can also be reproduced using the index built from Solr's example docs; I posted the detailed test class in my Stack Overflow question [1]. Is it a bug in the test harness? Or is there a better way to load existing index data in a unit test? Thanks. [1] http://stackoverflow.com/questions/15947116/solr-4-2-test-harness-can-not-load-existing-index-data Mengxin Zhu
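(Editor's note for readers hitting the same wall: a minimal sketch of loading an existing index with an embedded server outside the test harness, assuming the Solr 4.2-era CoreContainer API; the solr home path and core name are illustrative, not from the original post.)

    import java.io.File;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.core.CoreContainer;

    public class ExistingIndexSmokeTest {
        public static void main(String[] args) throws Exception {
            // Solr home containing solr.xml and a core dir with conf/ plus the existing data/ index
            String solrHome = "/path/to/solr/home"; // illustrative path
            CoreContainer container = new CoreContainer();
            container.load(solrHome, new File(solrHome, "solr.xml"));
            EmbeddedSolrServer server = new EmbeddedSolrServer(container, "collection1");
            // If the existing index was picked up, numFound reflects the pre-existing documents
            System.out.println(server.query(new SolrQuery("*:*")).getResults().getNumFound());
            container.shutdown();
        }
    }

If this prints 0 against a non-empty data dir, how the core resolves its dataDir is a good first place to look.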
Re: Solr metrics in Codahale metrics and Graphite?
Hello Walter, Have you had a chance to get something working with Graphite, Codahale and Solr? Has anyone else tried these tools with the Solr 3.x family? How much work is it to set things up? We have tried Zabbix in the past. Even though it required a lot of up-front investment in configuration, it looks like a compelling option. In the meantime, we are looking into something more Solr-tailored yet simple, even without metrics persistence. Tried: jconsole and viewing stats via JMX. The main point for us now is to gather the RAM usage. Dmitry

On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood wun...@wunderwood.org wrote: If it isn't obvious, I'm glad to help test a patch for this. We can run a simulated production load in dev and report to our metrics server. wunder

On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote: That approach sounds great. --wunder

On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote: I've been thinking about how to improve this reporting, especially now that metrics-3 (which removes all of the funky thread issues we ran into last time I tried to add it to Solr) is close to release. I think we could go about it as follows: * refactor the existing JMX reporting to use metrics-3. This would mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and adding a JmxReporter, keeping the existing config logic to determine which JMX server to use. PluginInfoHandler and SolrMBeanInfoHandler translate the metrics-3 data back into SolrMBean format to keep the reporting backwards-compatible. This seems like a lot of work for no visible benefit, but... * we can then add the ability to define other metrics reporters in solrconfig.xml. There are already reporters for Ganglia and Graphite - you just add them to the Solr lib/ directory, configure them in solrconfig, and voila - Solr can be monitored using the same devops tools you use to monitor everything else. Does this sound sane? Alan Woodward www.flax.co.uk

On 6 Apr 2013, at 20:49, Walter Underwood wrote: Wow, that really doesn't help at all, since these seem to only be reported on the stats page. I don't need another non-standard, app-specific set of metrics, especially one that needs polling. I need metrics delivered to the common system that we use for all our servers. This is also why SPM is not useful for us, sorry Otis. Also, there is no time period on these stats. How do you graph the 95th percentile? I know there was a lot of work on these, but they seem really useless to me. I'm picky about metrics; working at Netflix does that to you. wunder

On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote: In the Jira, but not in the docs. It would be nice to have VM stats like GC, too, so we can have common monitoring and alerting on all our services. wunder

On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote: It's there! :) http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue Otis -- Solr & ElasticSearch Support http://sematext.com/

On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood wun...@wunderwood.org wrote: That sounds great. I'll check out the bug; I didn't see anything in the docs about this. And if I can't find it with a search engine, it probably isn't there. --wunder

On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote: On 3/29/2013 12:07 PM, Walter Underwood wrote: What are folks using for this? I don't know that this really answers your question, but Solr 4.1 and later includes a big chunk of Codahale metrics internally for request handler statistics - see SOLR-1972.
First we tried including the jar and using the API, but that created thread-leak problems, so the source code was added instead. Thanks, Shawn -- Walter Underwood wun...@wunderwood.org
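(Editor's note: to make Alan's metrics-3 + Graphite idea concrete, here is a minimal standalone sketch of the Codahale metrics-3 reporter wiring; the Graphite host and metric names are illustrative, and this is not Solr integration code.)

    import java.net.InetSocketAddress;
    import java.util.concurrent.TimeUnit;
    import com.codahale.metrics.MetricRegistry;
    import com.codahale.metrics.Timer;
    import com.codahale.metrics.graphite.Graphite;
    import com.codahale.metrics.graphite.GraphiteReporter;

    public class GraphiteReportingSketch {
        public static void main(String[] args) {
            MetricRegistry registry = new MetricRegistry();
            Timer requests = registry.timer("requests"); // e.g. per-handler request timings

            // Graphite's plaintext protocol usually listens on port 2003
            Graphite graphite = new Graphite(new InetSocketAddress("graphite.example.com", 2003));
            GraphiteReporter reporter = GraphiteReporter.forRegistry(registry)
                    .prefixedWith("solr.core1") // illustrative metric prefix
                    .convertRatesTo(TimeUnit.SECONDS)
                    .convertDurationsTo(TimeUnit.MILLISECONDS)
                    .build(graphite);
            reporter.start(1, TimeUnit.MINUTES); // push all registered metrics every minute

            Timer.Context ctx = requests.time();
            // ... do the work being measured ...
            ctx.stop();
        }
    }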
Re: Solr Cloud 4.2 - Distributed Requests failing with NPE
Just tried the same queries with the 'example' in solr 4.2 build and getting same issue:

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true&shards=localhost:7574/solr/collection1

trace: java.lang.NullPointerException
	at org.apache.solr.handler.component.HttpShardHandler.checkDistributed(HttpShardHandler.java:340)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:182)
	at ...

These are working fine:

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true&shards=shard1
http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true&shards=shard2

Here is the cluster state:

{"collection1":{
    "shards":{
      "shard1":{
        "range":"8000-",
        "state":"active",
        "replicas":{"10.88.160.145:8983_solr_collection1":{
            "shard":"shard1",
            "state":"active",
            "core":"collection1",
            "collection":"collection1",
            "node_name":"10.88.160.145:8983_solr",
            "base_url":"http://10.88.160.145:8983/solr",
            "leader":"true"}}},
      "shard2":{
        "range":"0-7fff",
        "state":"active",
        "replicas":{"10.88.160.145:7574_solr_collection1":{
            "shard":"shard2",
            "state":"active",
            "core":"collection1",
            "collection":"collection1",
            "node_name":"10.88.160.145:7574_solr",
            "base_url":"http://10.88.160.145:7574/solr",
            "leader":"true"}}}},
    "router":"compositeId"}}

Thanks, Sudhakar.

On Mon, Apr 22, 2013 at 12:52 PM, Sudhakar Maddineni maddineni...@gmail.com wrote: Hi, We recently upgraded our solr version from 4.1 to 4.2 and started seeing below exceptions when running distributed queries: Any idea what we are missing here -

http://solr-host-1:8080/solr/core1/select?q=*%3A*&wt=json&indent=true&shards=solr-host-1:8080/solr/core1
http://solr-host-1:8080/solr/core1/select?q=*%3A*&wt=json&indent=true&shards=solr-host-2:8080/solr/core1
http://solr-host-1:8080/solr/core1/select?q=*%3A*&wt=json&indent=true&shards=solr-host-3:8080/solr/core1

error:{ trace:
java.lang.NullPointerException
	at org.apache.solr.handler.component.HttpShardHandler.checkDistributed(HttpShardHandler.java:340)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:182)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:470)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
	at java.lang.Thread.run(Unknown Source)
, code:500}}

Thanks, Sudhakar.
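(Editor's note: for reference, the logical-shard form that works above also works from SolrJ; a minimal sketch against the example setup, with the host/core taken from the thread. This only sidesteps the NPE, it is not a fix for it.)

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ShardsParamSketch {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrQuery q = new SolrQuery("*:*");
            // Logical shard ids work; explicit host:port/solr/collection1 values hit the NPE above
            q.set("shards", "shard1,shard2");
            QueryResponse rsp = solr.query(q);
            System.out.println(rsp.getResults().getNumFound());
            solr.shutdown();
        }
    }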
Re: Average Solr Server Spec.
Walter, Can you share the document count / index size for this shard? Even though these are not decisive parameters, they suit the data-points comparison :)

On Tue, Apr 9, 2013 at 9:00 PM, Walter Underwood wun...@wunderwood.org wrote: We mostly run m1.xlarge with an 8GB heap. --wunder

On Apr 9, 2013, at 10:57 AM, Otis Gospodnetic wrote: Hi, You are right, there is no average. I saw a Solr cluster with a few EC2 micro instances yesterday, and regularly see Solr running on 16 or 32 GB RAM and sometimes well over 100 GB RAM. Sometimes they have just 2 CPU cores, sometimes 32 or more. Some use SSDs, some HDDs, some local storage, some SAN, some EBS on AWS, etc. Otis -- Solr & ElasticSearch Support http://sematext.com/

On Tue, Apr 9, 2013 at 7:04 AM, Furkan KAMACI furkankam...@gmail.com wrote: This question may not have a general answer and may be open-ended, but is there any commodity server spec for a usual machine running Solr? I mean, what is the average server specification for a Solr machine? (e.g. for a system running Hadoop it is not recommended to have computers with very big storage capacity.) I will use Solr for indexing web-crawled data.
Re: Where to use replicationFactor and maxShardsPerNode at SolrCloud?
Thanks for the answers. 2013/4/23 Erick Erickson erickerick...@gmail.com bq: However what will happen to those 10 nodes when I specify a replication factor? I think they just sit around doing nothing. Best, Erick

On Mon, Apr 22, 2013 at 7:24 AM, Furkan KAMACI furkankam...@gmail.com wrote: Sorry, but if I have 10 shards and a collection with a replication factor of 1, and I start up 30 nodes, what happens to the last 10 nodes? I mean: 10 nodes as leaders, 10 nodes as replicas. If I don't specify a replication factor, a round-robin system would assign the other 10 machines as: + 10 nodes as replicas. However, what will happen to those 10 nodes when I specify a replication factor?

2013/4/22 Erick Erickson erickerick...@gmail.com 1) Imagine you have lots and lots and lots of different Solr indexes and a 50-node cluster. Further imagine that one of those indexes has 2 shards, and a leader + one replica is adequate to handle the load. You need some way to limit the number of nodes your index gets distributed to; that's what replicationFactor is for. So in this case replicationFactor=2 will stop assigning nodes to that particular collection after there's a leader + 1 replica. 2) In the system you described, there won't be more than one shard per node. But one strategy for growth is to overshard. That is, in the early days you put (numbers from thin air) 10 shards/node and they are all quite small. As your index grows, you move to two nodes with 5 shards each, and later to 5 nodes with 2 shards, and so on. There are cases where you want some way to make the most of your hardware yet plan for expansion. Best, Erick

On Sun, Apr 21, 2013 at 3:51 PM, Furkan KAMACI furkankam...@gmail.com wrote: I know that when using SolrCloud we define the number of shards in the system. When we start up new Solr instances, each one will be a leader for a shard, and if I continue to start up new Solr instances (exceeding the number of shards) each one will become a replica for a leader in a round-robin process. However, when I read the wiki there are two parameters: replicationFactor and maxShardsPerNode. 1) Can you give details about what they are? If all newly added Solr instances become replicas, what is the replication factor for? 2) If what I wrote is true about that round-robin process, what is maxShardsPerNode? How can there be more than one shard per node in the system I described?
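(Editor's note: for reference, both parameters are normally passed when the collection is created via the Collections API; a sketch with illustrative names and numbers:

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=10&replicationFactor=2&maxShardsPerNode=2

With 30 nodes, numShards=10 and replicationFactor=2 as in the scenario above, 20 nodes receive a core - 10 leaders plus 10 replicas - and the remaining 10 sit idle for this collection, as Erick describes.)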
Re: Too many close, count -1
Hoss, I use Solr as a SolrCloud cluster; the main features I use are faceting, to do some analytics, and normal queries to do free-text search and retrieve data using filters. I don't use any custom or contrib plugins. At the moment I'm importing my data from MySQL to Solr; I don't use DIH, I use a custom mechanism instead. In this import I don't do hard or soft commits; I leave that responsibility to Solr. I don't know if this info is useful, but I get a lot of:

WARNING: [XXX] PERFORMANCE WARNING: Overlapping onDeckSearchers=2

The cluster is formed by about a thousand collections; I have a collection for each client. My solrconfig:

<config>
  <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
  <indexConfig>
    <ramBufferSizeMB>256</ramBufferSizeMB>
    <mergeFactor>20</mergeFactor>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
    <lockType>native</lockType>
    <!-- Commit Deletion Policy: Custom deletion policies can be specified here.
         The class must implement org.apache.lucene.index.IndexDeletionPolicy.
         http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/index/IndexDeletionPolicy.html
         The default Solr IndexDeletionPolicy implementation supports deleting
         index commit points on number of commits, age of commit point and
         optimized status. The latest commit point should always be preserved
         regardless of the criteria. -->
    <!-- deletionPolicy class="solr.SolrDeletionPolicy" -->
    <!-- The number of commit points to be kept -->
    <!-- <str name="maxCommitsToKeep">1</str> -->
    <!-- The number of optimized commit points to be kept -->
    <!-- <str name="maxOptimizedCommitsToKeep">0</str> -->
    <!-- Delete all commit points once they have reached the given age.
         Supports DateMathParser syntax e.g. -->
    <!-- <str name="maxCommitAge">30MINUTES</str> -->
    <str name="maxCommitAge">60MINUTES</str>
    <!-- /deletionPolicy -->
    <!-- Lucene Infostream: To aid in advanced debugging, Lucene provides an
         InfoStream of detailed information when indexing. Setting the value to
         true will instruct the underlying Lucene IndexWriter to write its
         debugging info to the specified file -->
    <!-- <infoStream file="INFOSTREAM.txt">false</infoStream> -->
  </indexConfig>
  <query>
    <!-- If true, stored fields that are not requested will be loaded lazily.
         This can result in a significant speed improvement if the usual case
         is to not load all stored fields, especially if the skipped fields are
         large compressed text fields. -->
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
    <queryResultWindowSize>1000</queryResultWindowSize>
    <queryResultMaxDocsCached>3000</queryResultMaxDocsCached>
    <maxWarmingSearchers>2</maxWarmingSearchers>
    <useFilterForSortedQuery>true</useFilterForSortedQuery>
    <filterCache class="solr.FastLRUCache" size="2000" initialSize="1500" autowarmCount="750" cleanupThread="true"/>
    <queryResultCache class="solr.FastLRUCache" size="2000" initialSize="1500" autowarmCount="750" cleanupThread="true"/>
    <documentCache class="solr.FastLRUCache" size="2" initialSize="1" autowarmCount="0" cleanupThread="true"/>
  </query>
  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.data.dir:}</str>
    </updateLog>
    <!-- Commit documents definitions -->
    <autoCommit>
      <maxDocs>5000</maxDocs>
      <maxTime>1</maxTime>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>2500</maxTime>
    </autoSoftCommit>
    <maxPendingDeletes>2</maxPendingDeletes>
  </updateHandler>
  <requestDispatcher handleSelect="false">
    <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="10485760"/>
  </requestDispatcher>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <!-- request handler that returns indented JSON by default -->
  <requestHandler name="/query" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="wt">json</str>
      <str name="indent">true</str>
      <str name="df">text</str>
    </lst>
  </requestHandler>
  <!-- realtime get handler, guaranteed to return the latest stored fields of
       any document, without the need to commit or open a new searcher. The
       current implementation relies on the updateLog feature being enabled. -->
  <requestHandler name="/get" class="solr.RealTimeGetHandler">
    <lst name="defaults">
      <str name="omitHeader">true</str>
      <str name="wt">json</str>
      <str name="indent">false</str>
    </lst>
  </requestHandler>
  <requestHandler name="/admin/" class="solr.admin.AdminHandlers"/>
  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true"/>
  <requestHandler name="/update" class="solr.UpdateRequestHandler"/>
  <requestHandler
Re: Error creating collection
The solr version is 4.2.1. Here the stack trace:

SEVERE: org.apache.solr.common.SolrException: Error CREATEing SolrCore 'XXX': Could not get shard_id for core: XXX coreNodeName:192.168.20.47:8983_solr_XXX
	at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:483)
	at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:140)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:591)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Could not get shard_id for core: XXX coreNodeName:192.168.20.47:8983_solr_XXX
	at org.apache.solr.cloud.ZkController.doGetShardIdProcess(ZkController.java:1221)
	at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1294)
	at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:861)
	at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
	at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:479)
	... 20 more

- Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Error-creating-collection-tp4057859p4058231.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DocValues with docValuesFormat=Disk
Answering myself - adding this line in solrconfig.xml made it work:

<codecFactory name="CodecFactory" class="solr.SchemaCodecFactory"/>

On 4/23/13 3:42 PM, Abhishek Sanoujam wrote: Hi all, I am trying to experiment with DocValues (http://wiki.apache.org/solr/DocValues) and use the Disk docValuesFormat. Here's what my field type declaration looks like:

<fieldType name="stringDv" class="solr.StrField" sortMissingLast="true" omitNorms="true" docValuesFormat="Disk"/>

I don't even have any fields using that type. Also, I've updated solrconfig.xml with:

<luceneMatchVersion>LUCENE_42</luceneMatchVersion>

I am running solr-4.2.1. My Solr core is totally empty, and there is nothing in the data dir. I am getting this weird error while starting up the Solr core:

org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:822)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
	at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
	at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
	at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
	at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:870)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:735)
	... 13 more

Apr 23, 2013 3:34:06 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: p5-upsShard-1
	at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672)
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057)
	at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
	at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:822)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
	at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
	... 10 more
Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
	at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:870)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:735)
	... 13 more

Is there any other config change that I need to do? I've read http://wiki.apache.org/solr/DocValues multiple times, but am unable to see any light to solve the problem. -- Cheers, Abhishek
DIH Abort does not close input file
Hi All, I'm using DIH with FileListEntityProcessor in order to index from XML files. If I perform a DIH request with command=abort, it seems that the XML file being processed by dataimport is not closed. When I try to delete it, I get an error message saying the file is opened by Apache Tomcat. Is this a known problem? I'm using Solr version 1.4.1.2010. Regards, Jean-Michel -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-Abort-does-not-close-input-file-tp4058214.html Sent from the Solr - User mailing list archive at Nabble.com.
Complex Join Query
Is there any other enterprise search engine besides Solr that supports complex join queries, since Solr does not support them? As per my requirement, I need to run complex join queries that search over the document index or in main memory, as that is much faster than any disk-based database. Any help is appreciated. Regards, Ashim -- View this message in context: http://lucene.472066.n3.nabble.com/Complex-Join-Query-tp4058233.html Sent from the Solr - User mailing list archive at Nabble.com.
Memory Impact of a New Core
Hi all, We've got quite a lot of (mostly small) Solr cores in our Solr instance. They all share the same solrconfig.xml and schema.xml (only the data differs). I'm wondering how far I can go in terms of the number of cores. CPU is not an issue, but memory could be. Any idea/guideline about the impact of a new Solr core on a Solr instance? Thanks! Jerome. -- Jerome Eteve +44(0)7738864546 http://www.eteve.net/
dataimporthandler does not distribute documents on solr cloud
Hi, we run SolrCloud with 4 shards, and when we try to import the data using DataImportHandler, it does not distribute documents across all 4 shards. Thanks & Regards Montu v Boda -- View this message in context: http://lucene.472066.n3.nabble.com/dataimporthandler-does-not-distribute-documents-on-solr-cloud-tp4058248.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Memory Impact of a New Core
I'm not an expert, but to some extent I think it will come down to a few factors: * How much data is being cached per core. * If memory is an issue and you still want performance, I/O with a small cache could be an issue (SSDs?) * Soft commits, which imply open searchers per soft commit (and per core), will depend on caches. I do believe in the end it will be a direct result of your caching and I/O. If all you care about is performance, caching (memory) could be replaced with faster I/O, though soft commits will be fragile with respect to memory due to their nature (they depend on caching/memory and low I/O usage). Hope I made sense; I probably tried too many points of view in a single idea. Guido. On 23/04/13 11:50, Jérôme Étévé wrote: Hi all, We've got quite a lot of (mostly small) Solr cores in our Solr instance. They all share the same solrconfig.xml and schema.xml (only the data differs). I'm wondering how far I can go in terms of the number of cores. CPU is not an issue, but memory could be. Any idea/guideline about the impact of a new Solr core on a Solr instance? Thanks! Jerome.
Re: Memory Impact of a New Core
Thanks! Yeah, I know about the caching/commit things. My question is more about the impact of the pure creation of a Solr core, independently of its usage memory requirements (like caches and stuff). From the experiments I did using JMX, it's not measurable, but I might be wrong. On 23 April 2013 12:25, Guido Medina guido.med...@temetra.com wrote: I'm not an expert, but to some extent I think it will come down to a few factors: * How much data is being cached per core. * If memory is an issue and you still want performance, I/O with a small cache could be an issue (SSDs?) * Soft commits, which imply open searchers per soft commit (and per core), will depend on caches. I do believe in the end it will be a direct result of your caching and I/O. If all you care about is performance, caching (memory) could be replaced with faster I/O, though soft commits will be fragile with respect to memory due to their nature (they depend on caching/memory and low I/O usage). Hope I made sense; I probably tried too many points of view in a single idea. Guido. On 23/04/13 11:50, Jérôme Étévé wrote: Hi all, We've got quite a lot of (mostly small) Solr cores in our Solr instance. They all share the same solrconfig.xml and schema.xml (only the data differs). I'm wondering how far I can go in terms of the number of cores. CPU is not an issue, but memory could be. Any idea/guideline about the impact of a new Solr core on a Solr instance? Thanks! Jerome. -- Jerome Eteve +44(0)7738864546 http://www.eteve.net/
what is the maximum XML file size to import?
Hello, What is the maximum size of an XML document file that can be imported into Solr for indexing via java -Durl? As I am testing importing an XML file of 5 GB, it throws an error like:

SimplePostTool: WARNING: Solr returned an error #400 Bad Request
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://10.0.1.140:8080/solr/solr1/update

-- View this message in context: http://lucene.472066.n3.nabble.com/what-is-the-maximum-XML-file-size-to-import-tp4058263.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: terms starting with multilingual character don't list on solr auto-suggestion list
Hi Jack, Sorry for the late response. I have used the following settings for auto-suggestion:

<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>

and used the following fieldType:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$1$2"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$1$2"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

-- View this message in context: http://lucene.472066.n3.nabble.com/terms-starting-with-multilingual-character-don-t-list-on-solr-auto-suggestion-list-tp4056288p4058264.html Sent from the Solr - User mailing list archive at Nabble.com.
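(Editor's note: with that handler, suggestions come from TermsComponent parameters; an illustrative request, with an assumed field name, would be:

http://localhost:8983/solr/terms?terms.fl=mytextfield&terms.prefix=%C3%BC&terms.limit=10

Here %C3%BC is the UTF-8 percent-encoding of ü. If the prefix reaches Solr in any other encoding, terms starting with multilingual characters will not be listed, which is one plausible cause of the symptom in this thread.)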
Re: Memory Impact of a New Core
The overhead of just opening a core is insignificant relative to using it, so unless you are worried about hitting the max-open-files limit, it seems unimportant. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Apr 23, 2013 7:46 AM, Jérôme Étévé jerome.et...@gmail.com wrote: Thanks! Yeah, I know about the caching/commit things. My question is more about the impact of the pure creation of a Solr core, independently of its usage memory requirements (like caches and stuff). From the experiments I did using JMX, it's not measurable, but I might be wrong. On 23 April 2013 12:25, Guido Medina guido.med...@temetra.com wrote: I'm not an expert, but to some extent I think it will come down to a few factors: * How much data is being cached per core. * If memory is an issue and you still want performance, I/O with a small cache could be an issue (SSDs?) * Soft commits, which imply open searchers per soft commit (and per core), will depend on caches. I do believe in the end it will be a direct result of your caching and I/O. If all you care about is performance, caching (memory) could be replaced with faster I/O, though soft commits will be fragile with respect to memory due to their nature (they depend on caching/memory and low I/O usage). Hope I made sense; I probably tried too many points of view in a single idea. Guido. On 23/04/13 11:50, Jérôme Étévé wrote: Hi all, We've got quite a lot of (mostly small) Solr cores in our Solr instance. They all share the same solrconfig.xml and schema.xml (only the data differs). I'm wondering how far I can go in terms of the number of cores. CPU is not an issue, but memory could be. Any idea/guideline about the impact of a new Solr core on a Solr instance? Thanks! Jerome. -- Jerome Eteve +44(0)7738864546 http://www.eteve.net/
Re: Complex Join Query
Have a look at ElasticSearch, maybe it's a better fit. Otis Solr ElasticSearch Support http://sematext.com/ On Apr 23, 2013 6:38 AM, ashimbose ashimb...@gmail.com wrote: Is there any other enterprise search other than SOLR which supports Complex Join Query,as Solr does not support the same. As per my requirement I need to search Complex Join Query which will search from document Indexing or in main memory. As it is very faster than any disk based database. Any help is appreciable. Regards, Ashim -- View this message in context: http://lucene.472066.n3.nabble.com/Complex-Join-Query-tp4058233.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: what is the maximum XML file size to import?
This does not seem to be related to the XML size. Check the exact error message on the server side. Looks to me like the URL may not be correct. I think in some cases, post.jar automatically adds /update handler, so maybe you are doubling it up. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Apr 23, 2013 at 8:02 AM, Sharmila Thapa shar...@gmail.com wrote: Hello, What is the maximum size limit of the XML document file that is allowed to import into solr to index from java -Durl. As I am testing to import XMLfile of 5 GB and it throws an error like SimplePostTool: WARNING: Solr returned an error #400 Bad Request SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://10.0.1.140:8080/solr/solr1/update -- View this message in context: http://lucene.472066.n3.nabble.com/what-is-the-maximum-XML-file-size-to-import-tp4058263.html Sent from the Solr - User mailing list archive at Nabble.com.
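(Editor's note: to illustrate the doubling-up point above, with -Durl set, SimplePostTool normally posts to exactly the URL given, so an invocation like the following - host/core from the original post, file name illustrative - should be checked against the handler path Solr actually logs:

java -Durl=http://10.0.1.140:8080/solr/solr1/update -jar post.jar data.xml

If the 400 persists, the server-side log entry for that request will show the real cause.)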
Re: What to test, calculate, measure for a pre-production version of SolrCloud?
Hi, Let me get my crystal ball... OK, now let's try inlining. On Tue, Apr 23, 2013 at 5:48 AM, Furkan KAMACI furkankam...@gmail.com wrote: * I want to measure how much RAM I should define for my Solr instances, * I will try to make some predictions about how much disk space I will need at the production step. This one is easy if your index is static or grows slowly. If not, you'll want to set alert thresholds on disk space free/used for capacity planning/expansion purposes. You probably saw the threads from about a week ago about needing about 3x the disk space (3x the size of your index). * Maybe I will check my answer for the question: which RAID to use (or not use), etc. For those questions I got answers from the mailing list and I have some approximations about them. Also I know that it is not easy to answer such questions and I should test them to get more accurate answers. My question is: what do you suggest at the pre-production and test step? * i.e. give much more heap size to Solr instances to calculate RAM Impossible to tell precisely, but you can launch Solr, hammer it (next bullet), look at your monitoring tool or just JConsole, ask the JVM to run GC (you can do that from JConsole), and observe the heap once everything has been fully loaded (for sorting, faceting, etc.). That will give you an idea of the bare minimum heap. Increase from there. Don't expect to find one magic number that will be good forever, because that won't be the case (this is where keeping an eye on things with monitoring and alerting comes into play) unless your system is completely static (static index; same type, volume, and distribution of queries; etc.). * use solrmeter to test qps for your cluster Sure. JMeter or SolrMeter will do. The latter is written by one of the Solr guys and gives you more Solr-specific data, so +1 for that one. :) * use sematext or anything else for performance monitoring etc. I'm completely unbiased here, of course ;) Yes, you need some sort of monitoring (+ alerting) if you are serious about your search in production. If you already have something, hook that up. If you don't have anything or don't want to bother with maintaining a monitoring system, get some SaaS, like SPM for Solr. I need some advice on what to test, calculate, measure, etc. Also there was a question about Codahale metrics and Graphite; you can advise something about that too. One of the main decision factors is whether you want the responsibility of maintaining something like Graphite in house, or to give that up and focus on your service/product. The tendency seems to be the latter, but there are still organizations who choose the former. PS: I use Solr 4.2.1 for tests but if Solr 4.3 becomes ready (if it is tagged in the repository) I will use it. If you are in pre-production and asking questions about memory and disk, my feeling is you should wait for 4.3. :) HTH Otis -- Solr & ElasticSearch Support http://sematext.com/
Problem with solr, HTTP/Request.php and tomcat.
Hi! I'm using Solr with Tomcat and I need to add a record using HTTP/Request.php (PEAR). So, I created a test file with the following code:

<?php
require_once "HTTP/Request.php";

$req = new HTTP_Request("http://localhost:8080/solr/stats/update");
$req->setMethod(HTTP_REQUEST_METHOD_POST);
$xml = '<add><doc><field name="type">AllFields</field><field name="id">412263fc396ab4.19731404</field><field name="datestamp">2013-02-18T14:25:16Z</field><field name="browser">Firefox</field><field name="browserVersion">18.0</field><field name="ipaddress">192.168.2.22</field><field name="referrer">http://ijsn627.ijsn.es.gov.br/vufind/Biblioteca/</field><field name="url">/vufind/Biblioteca/Search/Results?lookfor=&amp;type=AllFields&amp;submit=Pesquisar</field></doc></add>';
$req->addHeader('Content-Type', 'text/xml; charset=utf-8');
$req->addHeader('Content-Length', strlen($xml));
$req->setBody($xml);
if (!PEAR::isError($req->sendRequest())) {
    $response1 = $req->getResponseBody();
    echo $req->getResponseCode();
} else {
    $response1 = "";
}
$req->clearPostData();
echo $response1;
echo $response2;
?>

That should work, right? But I'm getting the error code 400. The same error appears when I enable the statistics module in VuFind. With curl the update is OK:

curl http://localhost:8080/solr/stats/update/?commit=true -H "Content-Type: text/xml; charset=utf-8" --data-binary '<add><doc><field name="type">AllFields</field><field name="id">412263fc396ab4.19731404</field><field name="datestamp">2013-02-18T14:25:16Z</field><field name="browser">Firefox</field><field name="browserVersion">18.0</field><field name="ipaddress">192.168.2.22</field><field name="referrer">http://ijsn627.ijsn.es.gov.br/vufind/Biblioteca/</field><field name="url">/vufind/Biblioteca/Search/Results?lookfor=&amp;type=AllFields&amp;submit=Pesquisar</field></doc></add>'

I have no idea what's happening... Any ideas? Apache Tomcat: 7.0.27 Solr: 3.5
Querying only for + character causes org.apache.lucene.queryParser.ParseException
Hi! Currently I'm working on a basic search engine; the main problem is that during some tests an issue was detected: in the application, if a user searches for only the '+' or '-' term, or the '++' string, it causes an exception in my application. The problem is caused by an org.apache.lucene.queryParser.ParseException in Solr. I get the same response if, from the Solr admin interface, I search for the '+' term. From what I've seen, the '+' character gets encoded into %2B, which causes the exception. Is there any way of escaping this character so it behaves like any other character? Or at least getting no response for these cases? I'm using Solr 3.6.2, deployed in Tomcat 7. Greetings! http://www.uci.cu
Re: fuzzy search issue with PatternTokenizer Factory
Fuzzy search is supposed to be independent of all the analyzers, but it seems that it's not independent of the tokenizer. If I just change my tokenizer to *Solr.StandardTokenizerFactory*, fuzzy search starts working fine; if it were independent of the tokenizer, this should not occur. Also, I have analyzed my terms in the Admin UI Analysis page, and the terms come out perfectly fine as expected, so this is the only issue I am facing. But I can't analyze the fuzzy term in the Admin UI Analysis page, so I am not able to catch the issue.

Jack Krupansky-2 wrote: Once again, fuzzy search is completely independent of your analyzer or pattern tokenizer. Please use the Solr Admin UI Analysis page to debug whether the terms are what you expect. And realize that fuzzy search has a maximum editing distance of 2, and that includes case changes. -- Jack Krupansky

-Original Message- From: meghana Sent: Monday, April 22, 2013 3:25 AM To: solr-user@.apache Subject: Re: fuzzy search issue with PatternTokenizer Factory

Jack, the regex will split tokens on anything except alphabets, numbers, '&', '-' and ns: (where n is a number, e.g. 4323s:). Let's say for example my text is like below: *this is nice day sun 53s: is risen.* Then the pattern tokenizer should create the tokens *this is nice day sun is risen*. The pattern seems to be working fine with different text. Also, for the fuzzy search *worde~1*, I have checked the results returned with PatternTokenizerFactory, having punctuation marks like '*WORDS,*', '*WORDED*', etc... One more weird thing: all the results are in uppercase letters; no lowercase results come back, although it does not return all uppercase results either. I am not sure why, but after changing to this, fuzzy search is not working properly.

Jack Krupansky-2 wrote: Give us some examples of tokens that you are expecting that pattern to tokenize. And express the pattern in simple English as well. Show some actual input data. I suspect that Solr is working fine - but you may not have precisely specified your pattern. But we don't know what your pattern is supposed to recognize. Maybe some of your previous hits had punctuation adjacent to the terms that your pattern doesn't recognize. And use the Solr Admin UI Analysis page to see how your sample input data is analyzed. One other thing... without a group, the pattern specifies what delimiter sequence will split the rest of the input into tokens. I suspect you didn't mean this.
-- Jack Krupansky

-Original Message- From: meghana Sent: Friday, April 19, 2013 9:01 AM To: solr-user@.apache Subject: fuzzy search issue with PatternTokenizer Factory

I'm using Solr 4.2. I have changed my text field definition to use Solr.PatternTokenizerFactory instead of Solr.StandardTokenizerFactory, and changed my schema definition as below:

<fieldType name="text_token" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9&amp;\-']|\d{0,4}s: "/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9&amp;\-']|\d{0,4}s: "/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_extra_query.txt" enablePositionIncrements="false"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

After doing so, fuzzy search does not seem to work properly as it did before. I am searching with the search term worde~1; before, it returned around 300 records, but now it returns only 5 records. Not sure what the issue can be. Can anybody help me make it work!

-- View this message in context: http://lucene.472066.n3.nabble.com/fuzzy-search-issue-with-PatternTokenizer-Factory-tp4057275.html Sent from the Solr - User mailing list archive at Nabble.com.

-- View this message in context: http://lucene.472066.n3.nabble.com/fuzzy-search-issue-with-PatternTokenizer-Factory-tp4057275p4057831.html Sent from the Solr - User mailing list archive at Nabble.com.

-- View this message in context: http://lucene.472066.n3.nabble.com/fuzzy-search-issue-with-PatternTokenizer-Factory-tp4057275p4058267.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: what is the maximum XML file size to import?
On 4/23/2013 6:02 AM, Sharmila Thapa wrote: What is the maximum size limit of the XML document file that is allowed to import into solr to index from java -Durl. As I am testing to import XMLfile of 5 GB and it throws an error like SimplePostTool: WARNING: Solr returned an error #400 Bad Request SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://10.0.1.140:8080/solr/solr1/update Unless the simple post tool is capable of breaking the input XML into many pieces, you'll run into the POST size limit of your servlet container. I don't know if it has this capability, but I would be somewhat surprised if it did. Solr is packaged so the example uses jetty (start.jar), but you may be running under tomcat or one of a few other choices. The history of the POST limit in Solr is a little complex. The example jetty config in Solr 3.x (and possibly earlier) used a 1MiB POST buffer. You could change that value with no problem. If you used another container, you could change it using that container's configuration method. When 4.0 was released, jetty 8.x had a bug and the 1MiB configuration in the example wasn't working, so the limit became 200KB, jetty's default. Just like earlier versions, if you were using another container, you could change the limit using that container's configuration. The bug in jetty has now been fixed. https://bugs.eclipse.org/bugs/show_bug.cgi?id=397130 Solr 4.1 changed things, with SOLR-4265. Now Solr controls the max POST size itself, defaulting formdataUploadLimitInKB in solrconfig.xml to 2048. https://issues.apache.org/jira/browse/SOLR-4265 Thanks, Shawn
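(Editor's note: for reference, a sketch of where that Solr 4.1+ limit lives in solrconfig.xml, with illustrative values:

<requestDispatcher handleSelect="false">
  <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000" formdataUploadLimitInKB="2048"/>
</requestDispatcher>

Raising formdataUploadLimitInKB - and multipartUploadLimitInKB for multipart posts - lifts the Solr-side cap; on older setups the container's own POST limit still applies.)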
Re: Export Index and Re-Index XML
Hi, I have done this many times. First use a curl job or something to download the complete index as CSV: q=*:*&rows=999&wt=csv. Then use post.jar to push that CSV into the new node. Alternatively you can query with XML and use the XSLT update request handler with the param tr=updateXml, which is a stylesheet for indexing response XML directly. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com

On 23 Apr 2013, at 02:11, Kalyan Kuram kalyan.ku...@live.com wrote: Thank you all very much for your help. I do have fields configured as stored and indexed. I did read the FAQ from the wiki; I think SolrEntityProcessor is what I need. I am trying to index the data from Adobe CQ; it's push-based indexing and a pain to index data from a very large repository. I think I can manage this with SolrEntityProcessor for now and will think of modelling data for re-indexing purposes. Kalyan

From: j...@basetechnology.com To: solr-user@lucene.apache.org Subject: Re: Export Index and Re-Index XML Date: Mon, 22 Apr 2013 19:54:26 -0400 Any fields which have stored values can be read and output, but indexed-only, non-stored fields cannot be read or exported. Even if they could be, their values are post-analysis, which means that there is a good chance that they cannot be run through term analysis again. It is always best to keep a copy of your raw source data separate from the data you add to Solr. Or, at least make sure any important data is stored. In short, you need to model your data for reindexing, which is a fact of life in Solr land. -- Jack Krupansky

-Original Message- From: Kalyan Kuram Sent: Monday, April 22, 2013 7:07 PM To: solr-user@lucene.apache.org Subject: Export Index and Re-Index XML Hi All, I am new to Solr and I wanted to know if I can export the index as XML and then re-index it back into Solr. The reason I need to do this is that I misconfigured a fieldtype, and to make it work I need to re-index the content. Kalyan
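(Editor's note: a sketch of Jan's round trip, with illustrative host/core names and a rows value assumed large enough to cover the whole index:

curl "http://oldhost:8983/solr/core1/select?q=*:*&rows=1000000&wt=csv" > dump.csv
java -Durl=http://newhost:8983/solr/core1/update -Dtype=text/csv -jar post.jar dump.csv

As Jack notes above, this only round-trips stored fields; indexed-only fields cannot be exported this way.)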
Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException
Hi, you need to escape that char in search terms. Special chars are + - && || ! ( ) { } [ ] ^ " ~ * ? : \ / at the moment. The %2B is just the URL encoding, but it will still be a + for Solr, so just put a \ in front of the chars I mentioned. Cheers, Kai

On 23.04.2013 at 15:41, Jorge Luis Betancourt Gonzalez wrote: Hi! Currently I'm working on a basic search engine; the main problem is that during some tests an issue was detected: in the application, if a user searches for only the '+' or '-' term, or the '++' string, it causes an exception in my application. The problem is caused by an org.apache.lucene.queryParser.ParseException in Solr. I get the same response if, from the Solr admin interface, I search for the '+' term. From what I've seen, the '+' character gets encoded into %2B, which causes the exception. Is there any way of escaping this character so it behaves like any other character? Or at least getting no response for these cases? I'm using Solr 3.6.2, deployed in Tomcat 7. Greetings! http://www.uci.cu
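(Editor's note: if the client is SolrJ, there is a ready-made helper for exactly this escaping; a minimal sketch:

    import org.apache.solr.client.solrj.util.ClientUtils;

    public class EscapeSketch {
        public static void main(String[] args) {
            // Backslash-escapes Lucene/Solr query syntax characters, so "+" becomes "\+"
            String safe = ClientUtils.escapeQueryChars("+");
            System.out.println(safe);
        }
    }

For a PHP or other non-Java client, the equivalent is prefixing each special character with a backslash before building the query string.)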
Re: what is the maximum XML file size to import?
DataImportHandler might be a better way to import very large XML files if it can be loaded from Solr-local file system. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Apr 23, 2013 at 9:46 AM, Shawn Heisey s...@elyograg.org wrote: On 4/23/2013 6:02 AM, Sharmila Thapa wrote: What is the maximum size limit of the XML document file that is allowed to import into solr to index from java -Durl. As I am testing to import XMLfile of 5 GB and it throws an error like SimplePostTool: WARNING: Solr returned an error #400 Bad Request SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://10.0.1.140:8080/solr/solr1/update Unless the simple post tool is capable of breaking the input XML into many pieces, you'll run into the POST size limit of your servlet container. I don't know if it has this capability, but I would be somewhat surprised if it did. Solr is packaged so the example uses jetty (start.jar), but you may be running under tomcat or one of a few other choices. The history of the POST limit in Solr is a little complex. The example jetty config in Solr 3.x (and possibly earlier) used a 1MiB POST buffer. You could change that value with no problem. If you used another container, you could change it using that container's configuration method. When 4.0 was released, jetty 8.x had a bug and the 1MiB configuration in the example wasn't working, so the limit became 200KB, jetty's default. Just like earlier versions, if you were using another container, you could change the limit using that container's configuration. The bug in jetty has now been fixed. https://bugs.eclipse.org/bugs/show_bug.cgi?id=397130 Solr 4.1 changed things, with SOLR-4265. Now Solr controls the max POST size itself, defaulting formdataUploadLimitInKB in solrconfig.xml to 2048. https://issues.apache.org/jira/browse/SOLR-4265 Thanks, Shawn
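(Editor's note: a sketch of such a DIH data-config for a Solr-local file, assuming the big file uses the usual <add><doc> update format; the file path and field names are illustrative, and stream="true" keeps XPathEntityProcessor from buffering the whole file:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="bigfile" processor="XPathEntityProcessor"
            url="/data/import/big.xml" stream="true" forEach="/add/doc">
      <field column="id" xpath="/add/doc/field[@name='id']"/>
      <field column="title" xpath="/add/doc/field[@name='title']"/>
    </entity>
  </document>
</dataConfig>
)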
Re: What to test, calculate, measure for a pre-production version of SolrCloud?
To be clear, there are no solid and reliable prediction rules for Solr - for the simple reason that there are too many non-linear variables. You need to stand up a proof-of-concept system, load it with representative data, execute representative queries, and then measure that system. You can then use those numbers to size your production system. I don't want to give you the impression that this notion of predicting or calculating the size of a production Solr system is a viable option. Sure, you can try, and maybe you will get lucky and maybe you won't. Flip a coin. But what sane manager would want to plan production based on flipping a coin? -- Jack Krupansky

-Original Message- From: Furkan KAMACI Sent: Tuesday, April 23, 2013 5:48 AM To: solr-user@lucene.apache.org Subject: What to test, calculate, measure for a pre-production version of SolrCloud?

Hi Folks; This week we will make a pre-production version of our system. I've been asking some questions for a while and I got really good responses from the mailing list. At the pre-production and test step: * I want to measure how much RAM I should define for my Solr instances, * I will try to make some predictions about how much disk space I will need at the production step. * Maybe I will check my answer for the question: which RAID to use (or not use), etc. For those questions I got answers from the mailing list and I have some approximations about them. Also I know that it is not easy to answer such questions and I should test them to get more accurate answers. My question is: what do you suggest at the pre-production and test step? * i.e. give much more heap size to Solr instances to calculate RAM * use solrmeter to test qps for your cluster * use sematext or anything else for performance monitoring etc. I need some advice on what to test, calculate, measure, etc. Also there was a question about Codahale metrics and Graphite; you can advise something about that too. PS: I use Solr 4.2.1 for tests but if Solr 4.3 becomes ready (if it is tagged in the repository) I will use it.
Re: Problem with solr, HTTP/Request.php and tomcat.
On 4/23/2013 7:30 AM, Viviane Ventura wrote: I'm using solr with tomcat and i need to add a record using HTTP/Request.php (PEAR). So, i created a test file with the following code: ?php require_once HTTP/Request.php; At a quick glance (and not having much experience with PHP) your code looks like it SHOULD work, but something is obviously wrong. The Solr server's log should have something useful for you, if you are logging at INFO or higher. The exact location of the log will depend on your servlet container. For tomcat, that is generally in the catalina logfile. The log will include the request parameters, but it won't include the body. The error message may give you a clue, though. You would be better off using a PHP programming API specifically made for Solr, rather than using HTTP directly and sending XML. If you are using Solr 4.x, I believe that all of them may have bugs because Solr 4.0 finished removing options that were deprecated a long time ago, and the PHP programming APIs include those options. There are at least three API choices available: http://wiki.apache.org/solr/SolPHP The PECL plugin for Solr has a filed bug, to which I attached a patch. As it says in the bug notes, I probably didn't fix it right, but I have confirmed with a PHP user that it does fix the problem: https://bugs.php.net/bug.php?id=62332 Thanks, Shawn
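(Editor's note: as one concrete alternative to hand-rolled HTTP, a minimal sketch with the PECL Solr extension Shawn mentions; the connection options mirror the original post, and the snippet is untested against Solr 3.5:

<?php
$client = new SolrClient(array(
    'hostname' => 'localhost',
    'port'     => 8080,
    'path'     => '/solr/stats', // core path, not the full /update URL
));

$doc = new SolrInputDocument();
$doc->addField('id', '412263fc396ab4.19731404');
$doc->addField('type', 'AllFields');
$doc->addField('datestamp', '2013-02-18T14:25:16Z');

$client->addDocument($doc);
$client->commit(); // make the document searchable
?>
)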
Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException
Hi Kai: Thanks for your reply. From what I've understood, this logic must be included in my application. Would it be possible, for instance, to use some regular expression at query time in my schema to avoid a query that contains only these characters? For instance '+' and '++' would be good catches to avoid. Thanks in advance!

- Original message - From: Kai Becker m...@kai-becker.com To: solr-user@lucene.apache.org Sent: Tuesday, April 23, 2013 9:48:26 Subject: Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException

Hi, you need to escape that char in search terms. Special chars are + - && || ! ( ) { } [ ] ^ " ~ * ? : \ / at the moment. The %2B is just the URL encoding, but it will still be a + for Solr, so just put a \ in front of the chars I mentioned. Cheers, Kai

On 23.04.2013 at 15:41, Jorge Luis Betancourt Gonzalez wrote: Hi! Currently I'm working on a basic search engine; the main problem is that during some tests an issue was detected: in the application, if a user searches for only the '+' or '-' term, or the '++' string, it causes an exception in my application. The problem is caused by an org.apache.lucene.queryParser.ParseException in Solr. I get the same response if, from the Solr admin interface, I search for the '+' term. From what I've seen, the '+' character gets encoded into %2B, which causes the exception. Is there any way of escaping this character so it behaves like any other character? Or at least getting no response for these cases? I'm using Solr 3.6.2, deployed in Tomcat 7. Greetings!

http://www.uci.cu
Re: What to test, calculate, measure for a pre-production version of SolrCloud?
Another aspect I neglected to mention: think about distinguishing between development, test, and production systems - all separate. Your development system is where you try out ideas and experiment - your proof of concept. Your test or pre-production system is where you verify that your ideas are really ready to go - the test system should parallel the production system and approximate real load. And finally, your production system is where you don't have the liberty to just try stuff out. For real cloud systems, it's all about scaling commodity boxes. Pick a reasonably sized box and then put a reasonable amount of data on that box; then you can calculate how many boxes you will need for scaling (shards). And your HA (High Availability) and query load requirements will drive how many replicas you will need for each shard. -- Jack Krupansky

-Original Message- From: Jack Krupansky Sent: Tuesday, April 23, 2013 9:54 AM To: solr-user@lucene.apache.org Subject: Re: What to test, calculate, measure for a pre-production version of SolrCloud?

To be clear, there are no solid and reliable prediction rules for Solr - for the simple reason that there are too many non-linear variables. You need to stand up a proof-of-concept system, load it with representative data, execute representative queries, and then measure that system. You can then use those numbers to size your production system. I don't want to give you the impression that this notion of predicting or calculating the size of a production Solr system is a viable option. Sure, you can try, and maybe you will get lucky and maybe you won't. Flip a coin. But what sane manager would want to plan production based on flipping a coin? -- Jack Krupansky

-Original Message- From: Furkan KAMACI Sent: Tuesday, April 23, 2013 5:48 AM To: solr-user@lucene.apache.org Subject: What to test, calculate, measure for a pre-production version of SolrCloud?

Hi Folks; This week we will make a pre-production version of our system. I've been asking some questions for a while and I got really good responses from the mailing list. At the pre-production and test step: * I want to measure how much RAM I should define for my Solr instances, * I will try to make some predictions about how much disk space I will need at the production step. * Maybe I will check my answer for the question: which RAID to use (or not use), etc. For those questions I got answers from the mailing list and I have some approximations about them. Also I know that it is not easy to answer such questions and I should test them to get more accurate answers. My question is: what do you suggest at the pre-production and test step? * i.e. give much more heap size to Solr instances to calculate RAM * use solrmeter to test qps for your cluster * use sematext or anything else for performance monitoring etc. I need some advice on what to test, calculate, measure, etc. Also there was a question about Codahale metrics and Graphite; you can advise something about that too. PS: I use Solr 4.2.1 for tests but if Solr 4.3 becomes ready (if it is tagged in the repository) I will use it.
Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException
If you want to allow your users to search for '+', you can also define your '+' as being a regular ALPHA character. In config, delimiter_types.txt:

#
# We let +, # and * be part of normal words.
# This is to let c++, c#, c* and R&D count as words.
#
+ => ALPHA
# => ALPHA
* => ALPHA
& => ALPHA
@ => ALPHA

Then in your solr.WordDelimiterFilterFactory, use types="delimiter_types.txt". You'll then be able to let your users search for + as part of a word. If you want to allow them to search for just '+', a little hacking is necessary in your client code. Personally, I just double-quote the query if it's only one char in length. It can't be harmful, and as it will turn your single + into "+", it will be considered a token (rather than part of the query syntax) by the parser. Provided you're using the edismax parser, it should be just fine for any other queries, like '+ foo', 'foo +', '++' ... J. On 23 April 2013 15:09, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Hi Kai: Thanks for your reply. From what I've understood, this logic must be included in my application. Would it be possible to, for instance, use some regular expression at query time in my schema to avoid a query that contains only these characters? For instance, + and ++ would be a good catch to avoid. Thanks in advance! - Original Message - From: Kai Becker m...@kai-becker.com To: solr-user@lucene.apache.org Sent: Tuesday, April 23, 2013 9:48:26 Subject: Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException Hi, you need to escape that char in search terms. Special chars are + - ! ( ) { } [ ] ^ ~ * ? : \ / at the moment. The %2B is just the URL encoding, but it will still be a + for Solr, so just put a \ in front of the chars I mentioned. Cheers, Kai On 23.04.2013 at 15:41, Jorge Luis Betancourt Gonzalez wrote: Hi! Currently I'm working on a basic search engine; the main problem is that during some tests a problem was detected: in the application, if a user searches for the + or - term only, or the ++ string, it causes an exception in my application. The problem is caused by an org.apache.lucene.queryParser.ParseException in Solr. I get the same response if, from the Solr admin interface, I search for the + term. From what I've seen, the + character gets encoded into %2B, which causes the exception. Is there any way of escaping these characters so they behave like any other character? Or at least get no response in these cases? I'm using Solr 3.6.2, deployed in Tomcat 7. Greetings! http://www.uci.cu -- Jerome Eteve +44(0)7738864546 http://www.eteve.net/
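For readers who want to wire this up, a minimal sketch of the analyzer side might look like the following in schema.xml (the field type name and filter parameters are invented for illustration, so treat this as a starting point rather than a drop-in config):

<fieldType name="text_plus" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- characters mapped to ALPHA in delimiter_types.txt stay part of the word -->
    <filter class="solr.WordDelimiterFilterFactory" types="delimiter_types.txt"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The delimiter_types.txt file would sit next to schema.xml in the core's conf/ directory.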
Update on shards
Hi, Is it correct that when inserting or updating a document into Solr you have to talk to a Solr host where at least one shard of that collection is stored? And that for select you can talk to any host within the collection.configName? BR, Arkadi
Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException
Hi Jérôme: Thanks for your suggestion, Jérôme. I'll do as you told me to allow searching for these specific tokens. I've also taken into account the option of adding the quotes if the length is 1 at the application level, but I would like to keep this logic inside Solr (if possible). This is why I was thinking of some kind of replace regular expression at query time, so if this changes in the future it won't also require changing the application level. Can you advise me on this? Greetings! - Original Message - From: Jérôme Étévé jerome.et...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, April 23, 2013 10:44:39 Subject: Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException If you want to allow your users to search for '+', you can also define your '+' as being a regular ALPHA character. In config, delimiter_types.txt:

#
# We let +, # and * be part of normal words.
# This is to let c++, c#, c* and R&D count as words.
#
+ => ALPHA
# => ALPHA
* => ALPHA
& => ALPHA
@ => ALPHA

Then in your solr.WordDelimiterFilterFactory, use types="delimiter_types.txt". You'll then be able to let your users search for + as part of a word. If you want to allow them to search for just '+', a little hacking is necessary in your client code. Personally, I just double-quote the query if it's only one char in length. It can't be harmful, and as it will turn your single + into "+", it will be considered a token (rather than part of the query syntax) by the parser. Provided you're using the edismax parser, it should be just fine for any other queries, like '+ foo', 'foo +', '++' ... J. On 23 April 2013 15:09, Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: Hi Kai: Thanks for your reply. From what I've understood, this logic must be included in my application. Would it be possible to, for instance, use some regular expression at query time in my schema to avoid a query that contains only these characters? For instance, + and ++ would be a good catch to avoid. Thanks in advance! - Original Message - From: Kai Becker m...@kai-becker.com To: solr-user@lucene.apache.org Sent: Tuesday, April 23, 2013 9:48:26 Subject: Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException Hi, you need to escape that char in search terms. Special chars are + - ! ( ) { } [ ] ^ ~ * ? : \ / at the moment. The %2B is just the URL encoding, but it will still be a + for Solr, so just put a \ in front of the chars I mentioned. Cheers, Kai On 23.04.2013 at 15:41, Jorge Luis Betancourt Gonzalez wrote: Hi! Currently I'm working on a basic search engine; the main problem is that during some tests a problem was detected: in the application, if a user searches for the + or - term only, or the ++ string, it causes an exception in my application. The problem is caused by an org.apache.lucene.queryParser.ParseException in Solr. I get the same response if, from the Solr admin interface, I search for the + term. From what I've seen, the + character gets encoded into %2B, which causes the exception. Is there any way of escaping these characters so they behave like any other character? Or at least get no response in these cases? I'm using Solr 3.6.2, deployed in Tomcat 7. Greetings! -- Jerome Eteve +44(0)7738864546 http://www.eteve.net/ http://www.uci.cu
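If the escaping does end up living at the application level after all, SolrJ ships a small helper for exactly this. A minimal sketch (the class and variable names are just for illustration):

import org.apache.solr.client.solrj.util.ClientUtils;

public class EscapeDemo {
    public static void main(String[] args) {
        // Backslash-escapes the query-syntax characters Kai listed
        // (+ - ! ( ) { } [ ] ^ ~ * ? : \ and friends).
        String safe = ClientUtils.escapeQueryChars("+");
        System.out.println(safe); // prints \+
    }
}

A lone "+" then reaches the parser as the literal term \+ instead of as query syntax.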
Re: Update on shards
I believe as of 4.2 you can talk to any host in the cloud. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote: Hi Is it correct that when inserting or updating document into solr you have to talk to a solr host where at least one shard of that collection is stored? For select you can talk to any host within the collection.configName? BR, Arkadi
Re: What to test, calculate, measure for a pre-production version of SolrCloud?
The other thing to keep in the back of your mind as you go through this process is that search is addictive to most organizations. Meaning your Solr solution may quickly become a victim of its own success. The queries we tested before going to production 5+ months ago and the queries we handle today are very different beasts. We're now dealing with much more complexity because when we started out, the business side didn't have a full appreciation for what was possible. Now that they've seen Solr in action (pun intended), my team can't keep up with all the great ideas our PMs have for how to leverage Solr in many places that were unforeseen during initial planning. We're entering our third phase of adoption and are having to increase node count and RAM significantly. Bottom line is to do all the important things Otis and Jack have suggested, but also realize that what you design today may only be valid for 6 months or so. Of course I can't speak to your business situation but we've just accepted that we need to revisit our infrastructure decisions frequently. Admittedly, this is much easier in a cloud like Amazon than if you're buying your own hardware. Cheers, Tim On Tue, Apr 23, 2013 at 8:10 AM, Jack Krupansky j...@basetechnology.com wrote: Another aspect I neglected to mention: Think about distinguishing between development, test, and production systems - all separately. Your development system is where you try out ideas and experiment - your proof of concept. Your test or pre-production system is where you verify that your ideas are really ready to go - the test system should parallel the production system and approximate real load. And finally your production system is where you don't have the liberty to just try stuff out. For real cloud systems, it's all about scaling of commodity boxes. Pick a reasonable size box and then put a reasonable amount of data on that box, then you can calculate how many boxes you will need for scaling (shards). And your HA (High Availability) and query load requirements will drive how many replicas you will need for each shard. -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Tuesday, April 23, 2013 9:54 AM To: solr-user@lucene.apache.org Subject: Re: What to test, calculate, measure for a pre-production version of SolrCloud? To be clear, there are no solid and reliable prediction rules for Solr - for the simple reason that there are too many non-linear variables - you need to stand up a proof of concept system, load it with representative data and execute representative queries and then measure that system. You can then use those numbers to size your production system. I don't want to give you the impression that this notion of predicting or calculating the size of a production Solr system is a viable option. Sure, you can try, and maybe you will get lucky and maybe you won't. Flip a coin. But what sane manager would want to plan production based on flipping a coin? -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Tuesday, April 23, 2013 5:48 AM To: solr-user@lucene.apache.org Subject: What to test, calculate, measure for a pre-production version of SolrCloud? Hi Folks; This week we will make a pre-production version of our system. I've been asking some questions for a while and I got really good responses from the mailing list.
At the pre-production and test step:
* I want to measure how much RAM I should define for my Solr instances,
* I will try to make some predictions about how much disk space I will need at the production step.
* Maybe I will check my answer to that question: which RAID to use (or not use), etc.
For those questions I got answers from the mailing list and I have some approximations about them. Also I know that it is not easy to answer such questions and I should test them to get more accurate answers. My question is: What do you suggest at the pre-production and test step?
* i.e. give much more heap size to Solr instances to calculate RAM
* use solrmeter to test qps for your cluster
* use sematext or anything else for performance monitoring, etc.
I need some advice on what to test, calculate, measure, etc. Also there was a question about Codahale metrics and Graphite. You can advise something about that too. PS: I use Solr 4.2.1 for tests but if Solr 4.3 becomes ready (if it is tagged in the repository) I will use it.
Solr index searcher to lucene index searcher
Hi, Can anyone please point out where a Solr search originates and how it passes to the Lucene index searcher and back to Solr? I actually want to know which class in Solr directly calls the Lucene IndexSearcher. Thanks. Pom
Re: Solr index searcher to lucene index searcher
org.apache.solr.search.SolrIndexSearcher On Tue, Apr 23, 2013 at 9:51 AM, parnab kumar parnab.2...@gmail.com wrote: Hi , Can anyone please point out from where a solr search originates and how it passes to the lucene index searcher and back to solr . I actually what to know which class in solr directly calls the lucene Index Searcher . Thanks. Pom
EdgeGram filter
Hi, I want to edgeNgram, let's say, this document that has 'difficult contents', so that if I query (using dismax) q=dif it shows me this result. This is working fine. But now if I search for q=con it gives me this document as well. Is there any way to only show this document when I search for 'dif' or 'di'? Basically I want to edge-gram 'difficult contents' as a whole, not 'difficult' and 'contents' separately. Any help? Thanks.
Re: Update on shards
If you use jetty - which you should :) It's what we test with. Tomcat only gets user testing. If you use tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 (we are voting on 4.3 now). No clue on other containers. - Mark On Apr 23, 2013, at 10:59 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I believe as of 4.2 you can talk to any host in the cloud. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote: Hi Is it correct that when inserting or updating document into solr you have to talk to a solr host where at least one shard of that collection is stored? For select you can talk to any host within the collection.configName? BR, Arkadi
Re: Solr index searcher to lucene index searcher
Hi Timothy, Thanks for pointing that out. But I have a specific requirement. Any query passes through the search handler, and Solr finally directs it to the Lucene IndexSearcher. As results are matched and collected as TopDocs in Lucene, I want to inspect the top K docs, reorder them by some logic, and pass the final TopDocs back to Solr, which Solr may send as a response. I need to know the point where this interaction between Solr and Lucene actually takes place. Can anyone please help with where to look for this purpose? Thanks. Pom On Tue, Apr 23, 2013 at 9:25 PM, Timothy Potter thelabd...@gmail.com wrote: org.apache.solr.search.SolrIndexSearcher On Tue, Apr 23, 2013 at 9:51 AM, parnab kumar parnab.2...@gmail.com wrote: Hi, Can anyone please point out where a Solr search originates and how it passes to the Lucene index searcher and back to Solr? I actually want to know which class in Solr directly calls the Lucene IndexSearcher. Thanks. Pom
Re: DocValues with docValuesFormat=Disk
Hi, If you use a codec which is not the default, you need to download/build the Lucene codec jars, put them in the solr_home/lib directory, and add the codecFactory in the Solr config file. Look here for detailed instructions: http://wiki.apache.org/solr/SimpleTextCodecExample Best, Mou
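For reference, the two pieces Mou mentions usually look something like this (a sketch only; the field and type names are invented, and the codec jar still has to be on the classpath as described above). In solrconfig.xml:

<codecFactory class="solr.SchemaCodecFactory"/>

and in schema.xml, a per-field docValues format on the field type:

<fieldType name="string_dvdisk" class="solr.StrField" docValuesFormat="Disk"/>
<field name="manu_dv" type="string_dvdisk" indexed="true" stored="true" docValues="true"/>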
Re: Solr index searcher to lucene index searcher
Perhaps http://search-lucene.com/?q=custom+hits+collector ? Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 23, 2013 at 12:32 PM, parnab kumar parnab.2...@gmail.com wrote: Hi , Timothy,Thanks for pointing out . But i have a specific requirement . For any query it passes through the search handler and solr finally directs it to lucene Index Searcher. As results are matched and collected as TopDocs in lucene i want to inspect the top K Docs , reorder them by some logic and pass the final TopDocs to solr which solr may send as a response . I need to know the point where actually these interaction between solr and lucene takes place . Can anyone please help where to look into for this purpose . Thanks.. Pom On Tue, Apr 23, 2013 at 9:25 PM, Timothy Potter thelabd...@gmail.comwrote: org.apache.solr.search.SolrIndexSearcher On Tue, Apr 23, 2013 at 9:51 AM, parnab kumar parnab.2...@gmail.com wrote: Hi , Can anyone please point out from where a solr search originates and how it passes to the lucene index searcher and back to solr . I actually what to know which class in solr directly calls the lucene Index Searcher . Thanks. Pom
Re: Solr index searcher to lucene index searcher
Take a look at Solr's DelegatingCollector - this article might be of interest too: http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect.html On Tue, Apr 23, 2013 at 10:32 AM, parnab kumar parnab.2...@gmail.com wrote: Hi , Timothy,Thanks for pointing out . But i have a specific requirement . For any query it passes through the search handler and solr finally directs it to lucene Index Searcher. As results are matched and collected as TopDocs in lucene i want to inspect the top K Docs , reorder them by some logic and pass the final TopDocs to solr which solr may send as a response . I need to know the point where actually these interaction between solr and lucene takes place . Can anyone please help where to look into for this purpose . Thanks.. Pom On Tue, Apr 23, 2013 at 9:25 PM, Timothy Potter thelabd...@gmail.comwrote: org.apache.solr.search.SolrIndexSearcher On Tue, Apr 23, 2013 at 9:51 AM, parnab kumar parnab.2...@gmail.com wrote: Hi , Can anyone please point out from where a solr search originates and how it passes to the lucene index searcher and back to solr . I actually what to know which class in solr directly calls the lucene Index Searcher . Thanks. Pom
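To make the shape of that hook concrete, here is a skeleton of a post filter against the Solr 4.x API (the class name is invented; note that this mechanism lets you inspect or drop matching documents, not re-rank them, and the wiring through a QParserPlugin is omitted):

import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class InspectingPostFilter extends ExtendedQueryBase implements PostFilter {

    @Override
    public boolean getCache() {
        return false; // post filters must not be cached
    }

    @Override
    public int getCost() {
        return Math.max(super.getCost(), 100); // a cost of 100+ marks this as a post filter
    }

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
            @Override
            public void collect(int doc) throws IOException {
                // Inspect the (segment-local) doc id here; forwarding it keeps the
                // document in the result set, skipping the call drops it.
                super.collect(doc);
            }
        };
    }
}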
Re: Is there a way to load multiple schema when using zookeeper?
Yes, you can effectively chroot all the configs for a collection (to support multiple collections in the same ensemble) - see the wiki: http://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot On Tue, Apr 23, 2013 at 11:23 AM, bbarani bbar...@gmail.com wrote: I have used multiple schema files by using multiple cores, but I'm not sure if I will be able to use multiple schema configurations when integrating Solr with ZooKeeper. Can someone please let me know if it's possible and if so, how?
Is there a way to load multiple schema when using zookeeper?
I have used multiple schema files by using multiple cores, but I'm not sure if I will be able to use multiple schema configurations when integrating Solr with ZooKeeper. Can someone please let me know if it's possible and if so, how?
Re: Is there a way to load multiple schema when using zookeeper?
: Yes, you can effectively chroot all the configs for a collection (to : support multiple collections in same ensemble) - see wiki: : http://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot I don't think chroot is suitable for what's being asked about here ... that would completely isolate two cloud clusters from each other. I believe the intent of the question is asking about having a cloud cluster in which multiple collections exist, and some collections use different schema.xml files than other collections. The short answer is: absolutely, each collection can use completely different sets of configs (just like in non-cloud mode each core can use distinct configs), but to help make management easier when you have many collections re-using the same set of configs there is the concept of a config set ... you can push a set of configs into ZK with a specific configName, and then refer to that config set name when creating collections... https://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API -Hoss
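For concreteness, uploading a named config set typically looks roughly like this (the host name, path and config name are placeholders; the zkcli script ships under example/cloud-scripts/ in recent 4.x releases):

example/cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd upconfig \
    -confdir /path/to/myconf -confname myconf

Collections created afterwards can then reference the set by name via the collection.configName parameter of the Collections API.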
Autocommit and replication have been slowing down
Hi, We migrated recently from Solr 1.4 to 3.6.1. In the new version we have noticed that after some hours (around 8) the autocommit is taking more time to be executed. We configured autocommit with maxDocs=50 and maxTime=1ms, but we've seen it take a few (3-5) minutes to index documents (I got this time by watching docsPending on the Update Stats page and refreshing. Is there another way to verify that information?). A similar problem has been happening with the replication. We configured the pollInterval with 60s, but the replication takes some minutes to be executed. You can see the timeElapsed value (around 6 minutes) in the Replication Stats. After a server restart the indexing works as we expect for some hours. Our solrconfig.xml file is almost the default. We just increased some params on filterCache, queryResultCache and queryResultWindowSize. Has anyone ever had the same problem? Could someone give a hint or direction on where to start?

*** Update Handlers
name: updateHandler
class: org.apache.solr.update.DirectUpdateHandler2
version: 1.0
description: Update handler that efficiently directly updates the on-disk main lucene index
stats:
commits : 1085
autocommit maxDocs : 50
autocommit maxTime : 1ms
autocommits : 1085
optimizes : 0
rollbacks : 0
expungeDeletes : 0
docsPending : 18
adds : 18
deletesById : 5
deletesByQuery : 0
errors : 0
cumulative_adds : 6294
cumulative_deletesById : 5397
cumulative_deletesByQuery : 0
cumulative_errors : 0

*** Replication Stats
stats:
handlerStart : 1366654495647
requests : 0
errors : 0
timeouts : 0
totalTime : 0
avgTimePerRequest : NaN
avgRequestsPerSecond : 0.0
indexSize : 2.29 GB
indexVersion : 1354902172888
generation : 121266
indexPath : /opt/solr/data/index.20130418170401
isMaster : false
isSlave : true
masterUrl : http://master:9090/solr/replication
pollInterval : 00:00:60
isPollingDisabled : false
isReplicating : true
timeElapsed : 376
bytesDownloaded : 35835
downloadSpeed : 95
previousCycleTimeInSeconds : 0
indexReplicatedAt : Tue Apr 23 13:44:52 BRT 2013
confFilesReplicatedAt : Mon Mar 18 10:27:00 BRT 2013
replicationFailedAt : Mon Apr 22 08:05:00 BRT 2013
timesFailed : 6
timesIndexReplicated : 45318
lastCycleBytesDownloaded : 35835
timesConfigReplicated : 3
confFilesReplicated : [schema.xml]

Thanks, Gustavo Nasu
SolrEntityProcessor doesn't grok responseHeader tag in Ancient Solr 1.2 source
Hi, I'd like to use the SolrEntityProcessor to partially migrate an old index to Solr 4.1. The source is pretty old (dated 2006-06-10 16:05:12Z)... maybe Solr 1.2? My data-config.xml is based on the SolrEntityProcessor example http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor and wt=xml. I'm getting an error from SolrJ complaining about <responseHeader><status>0</status><QTime>1</QTime></responseHeader> in the response. Does anyone know of a work-around? Thanks, Tricia

1734 T12 C0 oasc.SolrException.log SEVERE Exception while processing: sep document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: org.apache.solr.common.SolrException: parsing error
Caused by: org.apache.solr.common.SolrException: parsing error
Caused by: java.lang.RuntimeException: this must be known type! not: responseHeader
at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:222)
at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:128)
... 43 more
Re: SolrEntityProcessor doesn't grok responseHeader tag in Ancient Solr 1.2 source
You might be out of luck with the SolrEntityProcessor. I'd recommend writing a simple little script that pages through /select?q=*:* from the source Solr and writes to the destination Solr. Back in the day there was this fun little beast https://github.com/erikhatcher/solr-ruby-flare/blob/master/solr-ruby/lib/solr/importer/solr_source.rb where you could do something like this: Solr::Indexer.new(SolrSource.new(...), mapping).index Erik On Apr 23, 2013, at 13:41, P Williams wrote: Hi, I'd like to use the SolrEntityProcessor to partially migrate an old index to Solr 4.1. The source is pretty old (dated 2006-06-10 16:05:12Z)... maybe Solr 1.2? My data-config.xml is based on the SolrEntityProcessor example http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor and wt=xml. I'm getting an error from SolrJ complaining about <responseHeader><status>0</status><QTime>1</QTime></responseHeader> in the response. Does anyone know of a work-around? Thanks, Tricia 1734 T12 C0 oasc.SolrException.log SEVERE Exception while processing: sep document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: org.apache.solr.common.SolrException: parsing error Caused by: org.apache.solr.common.SolrException: parsing error Caused by: java.lang.RuntimeException: this must be known type! not: responseHeader at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:222) at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:128) ... 43 more
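In the spirit of Erik's suggestion, here is a rough paging sketch in Java. It uses SolrJ only on the destination side, since the ancient XML is what breaks a modern SolrJ; host names are placeholders, only single-valued <str> fields are copied, and error handling is omitted:

import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class OldIndexMigrator {
    public static void main(String[] args) throws Exception {
        HttpSolrServer dest = new HttpSolrServer("http://newhost:8983/solr/collection1");
        int rows = 100;
        for (int start = 0; ; start += rows) {
            // Page through the old index as plain XML.
            URL page = new URL("http://oldhost:8983/solr/select?q=*:*&wt=xml"
                    + "&start=" + start + "&rows=" + rows);
            Document xml = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(page.openStream());
            NodeList docs = xml.getElementsByTagName("doc");
            if (docs.getLength() == 0) break; // past the last page
            for (int i = 0; i < docs.getLength(); i++) {
                SolrInputDocument out = new SolrInputDocument();
                // Copy each <str name="..."> field of this <doc> verbatim.
                NodeList fields = ((Element) docs.item(i)).getElementsByTagName("str");
                for (int f = 0; f < fields.getLength(); f++) {
                    Element field = (Element) fields.item(f);
                    out.addField(field.getAttribute("name"), field.getTextContent());
                }
                dest.add(out);
            }
        }
        dest.commit();
    }
}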
Re: Update on shards
Oops, Mark, you said: "If you use tomcat, this won't work in 4.2 or 4.2.1". Can you explain more about what won't work on Tomcat and what will change in 4.3? 2013/4/23 Mark Miller markrmil...@gmail.com If you use jetty - which you should :) It's what we test with. Tomcat only gets user testing. If you use tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 (we are voting on 4.3 now). No clue on other containers. - Mark On Apr 23, 2013, at 10:59 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I believe as of 4.2 you can talk to any host in the cloud. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn't a Game On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote: Hi Is it correct that when inserting or updating document into solr you have to talk to a solr host where at least one shard of that collection is stored? For select you can talk to any host within the collection.configName? BR, Arkadi
Re: Update on shards
The request proxying does not work with tomcat without calling an explicit flush in the code - jetty (which the unit tests are written against) worked without this flush. The flush is added to 4.3. - Mark On Apr 23, 2013, at 2:02 PM, Furkan KAMACI furkankam...@gmail.com wrote: Oopps, Mark you said: If you use tomcat, this won't work in 4.2 or 4.2.1 Can you explain more what won't be at Tomcat and what will change at 4.3? 2013/4/23 Mark Miller markrmil...@gmail.com If you use jetty - which you should :) It's what we test with. Tomcat only gets user testing. If you use tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 (we are voting on 4.3 now). No clue on other containers. - Mark On Apr 23, 2013, at 10:59 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I believe as of 4.2 you can talk to any host in the cloud. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote: Hi Is it correct that when inserting or updating document into solr you have to talk to a solr host where at least one shard of that collection is stored? For select you can talk to any host within the collection.configName? BR, Arkadi
Re: Update on shards
Sorry, but I want to make things clear in my mind. Is there any documentation that explains Solr proxying? Is it the same thing as this: when I use SolrCloud, if I send a document to any of the nodes in my cluster, the document will be routed to the leader of the appropriate shard. So you mean I cannot do that if I use Tomcat? 2013/4/23 Mark Miller markrmil...@gmail.com The request proxying does not work with tomcat without calling an explicit flush in the code - jetty (which the unit tests are written against) worked without this flush. The flush is added to 4.3. - Mark On Apr 23, 2013, at 2:02 PM, Furkan KAMACI furkankam...@gmail.com wrote: Oops, Mark, you said: "If you use tomcat, this won't work in 4.2 or 4.2.1". Can you explain more about what won't work on Tomcat and what will change in 4.3? 2013/4/23 Mark Miller markrmil...@gmail.com If you use jetty - which you should :) It's what we test with. Tomcat only gets user testing. If you use tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 (we are voting on 4.3 now). No clue on other containers. - Mark On Apr 23, 2013, at 10:59 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I believe as of 4.2 you can talk to any host in the cloud. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn't a Game On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote: Hi Is it correct that when inserting or updating document into solr you have to talk to a solr host where at least one shard of that collection is stored? For select you can talk to any host within the collection.configName? BR, Arkadi
Re: EdgeGram filter
Well, you could copy to another field (using copyField) and then have an analyzer with a LimitTokenCountFilterFactory that accepts only 1 token, and then apply the EdgeNGramFilter to that one token. But you would have to query explicitly against that other field. Since you are using dismax, you should be able to add that second field to the qf parameter. And then remove the EdgeNGramFilter from your main field. -- Jack Krupansky -Original Message- From: hassancrowdc Sent: Tuesday, April 23, 2013 12:09 PM To: solr-user@lucene.apache.org Subject: EdgeGram filter Hi, I want to edgeNgram, let's say, this document that has 'difficult contents', so that if I query (using dismax) q=dif it shows me this result. This is working fine. But now if I search for q=con it gives me this document as well. Is there any way to only show this document when I search for 'dif' or 'di'? Basically I want to edge-gram 'difficult contents' as a whole, not 'difficult' and 'contents' separately. Any help? Thanks.
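A sketch of that setup in schema.xml (the field and type names are invented, and the gram sizes are arbitrary examples, so adjust to taste):

<copyField source="content" dest="content_prefix"/>

<field name="content_prefix" type="text_prefix" indexed="true" stored="false"/>

<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- keep only the first token, then edge-gram it -->
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With dismax, content_prefix would then be added to the qf parameter as Jack describes.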
Re: Update on shards
Yeah, I'm confused now too. Do all Solr nodes in a distributed cloud really have to run in the same container type?? Why isn't it just raw HTTP for one cloud node to talk to another? I mean each node could/should be on another machine, right? -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Tuesday, April 23, 2013 2:33 PM To: solr-user@lucene.apache.org Subject: Re: Update on shards Sorry, but I want to make things clear in my mind. Is there any documentation that explains Solr proxying? Is it the same thing as this: when I use SolrCloud, if I send a document to any of the nodes in my cluster, the document will be routed to the leader of the appropriate shard. So you mean I cannot do that if I use Tomcat? 2013/4/23 Mark Miller markrmil...@gmail.com The request proxying does not work with tomcat without calling an explicit flush in the code - jetty (which the unit tests are written against) worked without this flush. The flush is added to 4.3. - Mark On Apr 23, 2013, at 2:02 PM, Furkan KAMACI furkankam...@gmail.com wrote: Oops, Mark, you said: "If you use tomcat, this won't work in 4.2 or 4.2.1". Can you explain more about what won't work on Tomcat and what will change in 4.3? 2013/4/23 Mark Miller markrmil...@gmail.com If you use jetty - which you should :) It's what we test with. Tomcat only gets user testing. If you use tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 (we are voting on 4.3 now). No clue on other containers. - Mark On Apr 23, 2013, at 10:59 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I believe as of 4.2 you can talk to any host in the cloud. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn't a Game On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote: Hi Is it correct that when inserting or updating document into solr you have to talk to a solr host where at least one shard of that collection is stored? For select you can talk to any host within the collection.configName? BR, Arkadi
Re: Update on shards
This request proxying only applies to the read side. The write side forwards updates around, it doesn't proxy requests. - Mark On Apr 23, 2013, at 2:33 PM, Furkan KAMACI furkankam...@gmail.com wrote: Sorry but I want to make clears the things in my mind. Is there any documentation that explains Solr proxying? Is it same thing with that: when I use SolrCloud and if I send document any of the nodes at my cluster the document will be routed into the leader of appropriate shard. So you mean I can not do that if I use Tomcat? 2013/4/23 Mark Miller markrmil...@gmail.com The request proxying does not work with tomcat without calling an explicit flush in the code - jetty (which the unit tests are written against) worked without this flush. The flush is added to 4.3. - Mark On Apr 23, 2013, at 2:02 PM, Furkan KAMACI furkankam...@gmail.com wrote: Oopps, Mark you said: If you use tomcat, this won't work in 4.2 or 4.2.1 Can you explain more what won't be at Tomcat and what will change at 4.3? 2013/4/23 Mark Miller markrmil...@gmail.com If you use jetty - which you should :) It's what we test with. Tomcat only gets user testing. If you use tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 (we are voting on 4.3 now). No clue on other containers. - Mark On Apr 23, 2013, at 10:59 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I believe as of 4.2 you can talk to any host in the cloud. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote: Hi Is it correct that when inserting or updating document into solr you have to talk to a solr host where at least one shard of that collection is stored? For select you can talk to any host within the collection.configName? BR, Arkadi
Re: Autocommit and replication have been slowing down
On 4/23/2013 11:27 AM, gustavonasu wrote: We migrated recently from Solr 1.4 to 3.6.1. In the new version we have noticed that after some hours (around 8) the autocommit is taking more time to be executed. In the new version we have noticed that after some hours the autocommit is taking more time to be executed. We configured autocommit with maxDocs=50 and maxTime=1ms but we've gotten few (3-5) minutes to index documents (I got this time seeing the docsPending on the Update Stats and refresh page. Is there another way to verify that information?). Your question is a bit jumbled so I don't know exactly what you are saying for all of this, but I'll attempt to answer what I can. Usually if your commits are taking a really long time, it means you're running into one of two problems: 1) It is taking a really long time to autowarm your Solr caches. In most cases, it is the filterCache that takes the time, but not always. You can see how long it takes to warm the entire searcher as well as each individual cache in the Statistics page of the admin UI. To fix this, you have to reduce the autowarmCount on your caches, reduce the complexity of your queries and filters or both. 2) Your Java heap is getting exhausted and Java is spending too much time doing full garbage collections so it can keep working. Eventually this problem will result in OOM (Out of Memory) errors in your Solr log. To fix this, raise your max heap, which is the -Xmx java option when starting your servlet container. Raising the java heap might also require that you add physical RAM to your server. On version 3.6, I believe that an index update/commit that results in segment merging will wait for that merging to complete. If you do a lot of indexing, eventually you will run into a very large merge, and that can take a lot of time. This would not explain why every autoCommit is taking a long time, though - it would only explain one out of dozens or hundreds. A similar problem has been happening with the replication. We configured the pollInterval with 60s but the replication takes some minutes to be executed. You could see the timeElapsed value (around 6 minutes) on the Replication Stats. If you optimize your index, or do enough index updates so that a large merge takes place, then a very large portion of your index will be comprised of brand new files, and if your index is large, that can take a long time to replicate. It is also possible for the java heap problem (mentioned above) to cause this. Thanks, Shawn
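For reference, the autowarmCount knob Shawn mentions lives on the cache definitions in solrconfig.xml; a deliberately conservative sketch (the sizes are placeholders, not recommendations):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

Lowering autowarmCount trades slower first queries after a commit for faster commits.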
Re: Update on shards
On 4/23/2013 10:14 AM, Mark Miller wrote: If you use jetty - which you should :) It's what we test with. Tomcat only gets user testing. If you use tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 (we are voting on 4.3 now). No clue on other containers. - Mark On Apr 23, 2013, at 10:59 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I believe as of 4.2 you can talk to any host in the cloud. What exactly is the 'request proxying' thing that doesn't work on tomcat? Is this something different from basic SolrCloud operation where you send any kind of request to any server and they get directed where they need to go? I haven't heard of that not working on tomcat before. If there's a Jira issue that explains this in detail, you can just send me there. Thanks, Shawn
Re: dataimporthandler does not distribute documents on solr cloud
What version of Solr are you using? In Solr 4.2+ if you don't specify numShards when creating the collection, the implicit document router will be used. DIH running under the implicit document router most likely would not distribute documents. If this is the case you'll need to recreate the collection specifying numShards. On Tue, Apr 23, 2013 at 7:15 AM, Montu v Boda montu.b...@highqsolutions.com wrote: Hi, we have a Solr cloud with 4 shards, and when we try to import the data using dataimporthandler, it does not distribute documents across all 4 shards. Thanks & Regards Montu v Boda -- Joel Bernstein Professional Services LucidWorks
What is cluster overseer at SolrCloud?
When I read the SolrCloud wiki, it says something about a cluster overseer. What is its role in the read and write processes? How can I see which node is the overseer in my cluster?
Re: EdgeGram filter
Hi, I was unable to find more info about LimitTokenCountFilterFactory in the Solr wiki. Is there any other place to get a thorough description of what it does? Thanks. Alex. -Original Message- From: Jack Krupansky j...@basetechnology.com To: solr-user solr-user@lucene.apache.org Sent: Tue, Apr 23, 2013 11:36 am Subject: Re: EdgeGram filter Well, you could copy to another field (using copyField) and then have an analyzer with a LimitTokenCountFilterFactory that accepts only 1 token, and then apply the EdgeNGramFilter to that one token. But you would have to query explicitly against that other field. Since you are using dismax, you should be able to add that second field to the qf parameter. And then remove the EdgeNGramFilter from your main field. -- Jack Krupansky -Original Message- From: hassancrowdc Sent: Tuesday, April 23, 2013 12:09 PM To: solr-user@lucene.apache.org Subject: EdgeGram filter Hi, I want to edgeNgram, let's say, this document that has 'difficult contents', so that if I query (using dismax) q=dif it shows me this result. This is working fine. But now if I search for q=con it gives me this document as well. Is there any way to only show this document when I search for 'dif' or 'di'? Basically I want to edge-gram 'difficult contents' as a whole, not 'difficult' and 'contents' separately. Any help? Thanks.
Re: What is cluster overseer at SolrCloud?
On Apr 23, 2013, at 2:53 PM, Furkan KAMACI furkankam...@gmail.com wrote: When I read the SolrCloud wiki, it says something about a cluster overseer. What is its role in the read and write processes? How can I see which node is the overseer in my cluster? The Overseer's main responsibility is to write the clusterstate.json file based on what individual nodes publish to ZooKeeper. It also does other things, like assign shard and node names. If the Overseer dies, another Overseer is elected and it starts processing the work queue where the dead Overseer left off. You can see which node is the Overseer by going to the Cloud view in the admin UI. Click the Tree tab. Under /overseer_elect, click on the leader node. Part of its id should tell you which node is acting as the overseer. - Mark
RE: EdgeGram filter
Always check the javadocs. There's a lot of info to be found there: http://lucene.apache.org/core/4_0_0-BETA/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilterFactory.html -Original message- From:alx...@aim.com alx...@aim.com Sent: Tue 23-Apr-2013 21:06 To: solr-user@lucene.apache.org Subject: Re: EdgeGram filter Hi, I was unable to find more info about LimitTokenCountFilterFactory in solr wiki. Is there any other place to get thorough description of what it does? Thanks. Alex. -Original Message- From: Jack Krupansky j...@basetechnology.com To: solr-user solr-user@lucene.apache.org Sent: Tue, Apr 23, 2013 11:36 am Subject: Re: EdgeGram filter Well, you could copy to another field (using copyField) and then have an analyzer with a LimitTokenCountFilterFactory that accepts only 1 token, and then apply the EdgeNGramFilter to that one token. But you would have to query explicitly against that other field. Since you are using dismax, you should be able to add that second field to the qf parameter. And then remove the EdgeNGramFilter from your main field. -- Jack Krupansky -Original Message- From: hassancrowdc Sent: Tuesday, April 23, 2013 12:09 PM To: solr-user@lucene.apache.org Subject: EdgeGram filter Hi, I want to edgeNgram let's say this document that has 'difficult contents' so that if i query (using disman) q=dif it shows me this result. This is working fine. But now if i search for q=con it gives me this document as well. is there any way to only show this document when i search for 'dif' or 'di'. basically i want to edgegram 'difficultcontent' not 'difficult' and 'content'. Any help? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/EdgeGram-filter-tp4058337.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: What is cluster overseer at SolrCloud?
Thanks for the explanation. 2013/4/23 Mark Miller markrmil...@gmail.com On Apr 23, 2013, at 2:53 PM, Furkan KAMACI furkankam...@gmail.com wrote: When I read about SolrCloud wiki there writes something about cluster overseer. What is the role of that at read and write processes? How can I see which node is overseer at my cluster? The Overseer's main responsibility is to write the clusterstate.json file based on what individual nodes publish to ZooKeeper. It also does other things, like assign shard and node names. If the Overseer dies, another Overseer is elected and it starts processing the work queue where the dead Oveseer left off. You can see which node is the Overseer by going to the Cloud view in the admin UI. Click the Tree tab. Under /overseer_elect, click on the leader node. Part of it's id should tell you which node is acting as the overseer. - Mark
Re: Update on shards
On Apr 23, 2013, at 2:49 PM, Shawn Heisey s...@elyograg.org wrote: What exactly is the 'request proxying' thing that doesn't work on tomcat? Is this something different from basic SolrCloud operation where you send any kind of request to any server and they get directed where they need to go? I haven't heard of that not working on tomcat before. Before 4.2, if you made a read request to a node that didn't contain part of the collection you were searching, it would return 404. Write requests would be forwarded to where they belong no matter what node you sent them to, but read requests required that the node have a part of the collection you were accessing. In 4.2 we added request proxying for this read side case. If a piece of the collection you are querying is not found on the node you hit, a simple proxy of the request is done to a node that does contain a piece of the collection. - Mark
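For SolrJ users, one way to sidestep the container-specific proxying on the read side entirely is the ZooKeeper-aware client, which reads cluster state itself and sends requests straight to nodes hosting the collection. A minimal sketch (the addresses and collection name are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudClientDemo {
    public static void main(String[] args) throws Exception {
        // Watches ZooKeeper for cluster state, so no server-side proxying is needed.
        CloudSolrServer client = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        client.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        client.add(doc);
        client.commit();

        System.out.println(client.query(new SolrQuery("*:*")).getResults().getNumFound());
        client.shutdown();
    }
}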
Re: dataimporthandler does not distribute documents on solr cloud
Actually, it is Solr 4.1+ where the implicit router will be used if numShards is not specified. On Tue, Apr 23, 2013 at 2:52 PM, Joel Bernstein joels...@gmail.com wrote: What version of Solr are you using? In Solr 4.2+ if you don't specify numShards when creating the collection, the implicit document router will be used. DIH running under the implicit document router most likely would not distribute documents. If this is the case you'll need to recreate the collection specifying numShards. On Tue, Apr 23, 2013 at 7:15 AM, Montu v Boda montu.b...@highqsolutions.com wrote: Hi, we have a Solr cloud with 4 shards, and when we try to import the data using dataimporthandler, it does not distribute documents across all 4 shards. Thanks & Regards Montu v Boda -- Joel Bernstein Professional Services LucidWorks -- Joel Bernstein Professional Services LucidWorks
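For reference, a collection created with an explicit shard count (so documents get hashed across the shards rather than falling back to the implicit router) might be set up along these lines - the host, names and counts are placeholders:

curl 'http://host:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=4&replicationFactor=1&collection.configName=myconf'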
Re: Update on shards
Hi Mark; All in all, you are saying that when 4.3 is tagged in the repository (I mean, when it is ready), this feature will work on Tomcat too in a stable version? 2013/4/23 Mark Miller markrmil...@gmail.com On Apr 23, 2013, at 2:49 PM, Shawn Heisey s...@elyograg.org wrote: What exactly is the 'request proxying' thing that doesn't work on tomcat? Is this something different from basic SolrCloud operation where you send any kind of request to any server and they get directed where they need to go? I haven't heard of that not working on tomcat before. Before 4.2, if you made a read request to a node that didn't contain part of the collection you were searching, it would return 404. Write requests would be forwarded to where they belong no matter what node you sent them to, but read requests required that the node have a part of the collection you were accessing. In 4.2 we added request proxying for this read side case. If a piece of the collection you are querying is not found on the node you hit, a simple proxy of the request is done to a node that does contain a piece of the collection. - Mark
Re: SolrEntityProcessor doesn't grok responseHeader tag in Ancient Solr 1.2 source
Thanks Erik. I remember Solr Flare :) On Tue, Apr 23, 2013 at 11:56 AM, Erik Hatcher erik.hatc...@gmail.com wrote: You might be out of luck with the SolrEntityProcessor. I'd recommend writing a simple little script that pages through /select?q=*:* from the source Solr and writes to the destination Solr. Back in the day there was this fun little beast https://github.com/erikhatcher/solr-ruby-flare/blob/master/solr-ruby/lib/solr/importer/solr_source.rb where you could do something like this: Solr::Indexer.new(SolrSource.new(...), mapping).index Erik On Apr 23, 2013, at 13:41, P Williams wrote: Hi, I'd like to use the SolrEntityProcessor to partially migrate an old index to Solr 4.1. The source is pretty old (dated 2006-06-10 16:05:12Z)... maybe Solr 1.2? My data-config.xml is based on the SolrEntityProcessor example http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor and wt=xml. I'm getting an error from SolrJ complaining about <responseHeader><status>0</status><QTime>1</QTime></responseHeader> in the response. Does anyone know of a work-around? Thanks, Tricia 1734 T12 C0 oasc.SolrException.log SEVERE Exception while processing: sep document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: org.apache.solr.common.SolrException: parsing error Caused by: org.apache.solr.common.SolrException: parsing error Caused by: java.lang.RuntimeException: this must be known type! not: responseHeader at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:222) at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:128) ... 43 more
Re: Using Solr For a Real Search Engine
At first I will work on 100 Solr nodes and I want to use Tomcat as container and deploy Solr as a war. I just wonder what folks are using for large systems and what kind of problems or benefits they have with their choices. 2013/3/26 Otis Gospodnetic otis.gospodne...@gmail.com Hi, This question is too open-ended for anyone to give you a good answer. Maybe you want to ask more specific questions? As for embedding vs. war, start with a simpler war and think about the alternatives if that doesn't work for you. Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Mar 22, 2013 at 8:07 AM, Furkan KAMACI furkankam...@gmail.com wrote: If I want to use Solr in a web search engine what kind of strategies should I follow about how to run Solr. I mean I can run it via embedded jetty or use war and deploy to a container? You should consider that I will have heavy work load on my Solr.
Re: Is there a way to load multiple schema when using zookeeper?
Ah cool, thanks for clarifying Chris - some of that multi-config management stuff gets confusing but much clearer from your description. Cheers, Tim On Tue, Apr 23, 2013 at 11:36 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Yes, you can effectively chroot all the configs for a collection (to : support multiple collections in same ensemble) - see wiki: : http://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot I don't think chroot is suitable for what's being asked about here ... that would completely isolate two cloud clusters from eachother. i believe the intent of the question is asking about having a cloud cluster in which multiple collections exist, and some collections use differnet schema.xml files then other collections the short answer is: absolutely, each collecion can use completley differnet sets of configs (just like in non-cloud mode each core can use distinct configs) but to help make management easier when you have many collections re-using the same set of configs there is the concept of a config set ... you can push a set of configs into ZK with a specific configName, nad then refer to that config set name when creating collections... https://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API -Hoss
Re: spellcheck: change in behavior and QTime
I apologize for the length of the previous message. I do see a problem with spellcheck becoming faster (notice QTime). I also see an increase in the number of cache hits if spellcheck=false is run one time followed by the original spellcheck query. Seems like spellcheck=false alters the behavior of spellcheck. http://host/solr/select?spellcheck=true&spellcheck.q=cucoo's+nest&df=spell http://host/solr/select?spellcheck=false&spellcheck.q=cucoo's+nest&df=spell http://host/solr/select?spellcheck=true&spellcheck.q=cucoo's+nest&df=spell --- see a faster response and an increase in the number of query cache hits. Thanks. -- Sandeep
Re: Update on shards
We have a 3rd release candidate for 4.3 being voted on now. I have never tested this feature with Tomcat - only Jetty. Users have reported it does not work with Tomcat. That leads one to think it may have a problem in other containers as well. A previous contributor donated a patch that explicitly flushes a stream in our proxy code - he says this allows the feature to work with Tomcat. I committed the fix - the flush can't hurt, and given the previous contributions of this individual, I'm fairly confident the fix makes things work in Tomcat. I have no first-hand knowledge that it does work though. You might take the RC for a spin and test it out yourself: http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC3-rev1470846/ - Mark On Apr 23, 2013, at 3:20 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Mark; All in all, you are saying that when 4.3 is tagged in the repository (I mean, when it is ready), this feature will work on Tomcat too in a stable version? 2013/4/23 Mark Miller markrmil...@gmail.com On Apr 23, 2013, at 2:49 PM, Shawn Heisey s...@elyograg.org wrote: What exactly is the 'request proxying' thing that doesn't work on tomcat? Is this something different from basic SolrCloud operation where you send any kind of request to any server and they get directed where they need to go? I haven't heard of that not working on tomcat before. Before 4.2, if you made a read request to a node that didn't contain part of the collection you were searching, it would return 404. Write requests would be forwarded to where they belong no matter what node you sent them to, but read requests required that the node have a part of the collection you were accessing. In 4.2 we added request proxying for this read side case. If a piece of the collection you are querying is not found on the node you hit, a simple proxy of the request is done to a node that does contain a piece of the collection. - Mark
Re: Using Solr For a Real Search Engine
Tomcat should work just fine in most cases. The downside to Tomcat is that all of the devs generally run Jetty since it's the default. Also, all of our unit tests run against Jetty - in fact, a specific version of Jetty. Usually, Solr will run fine in other containers. Many, many users run Solr in other containers. All of our tests run against a specific version of Jetty though. In some (generally rare) cases, that means something might work with Jetty and not another container until/unless the issue is reported by a user and fixed. - Mark On Apr 23, 2013, at 3:25 PM, Furkan KAMACI furkankam...@gmail.com wrote: At first I will work on 100 Solr nodes and I want to use Tomcat as the container and deploy Solr as a war. I just wonder what folks are using for large systems and what kind of problems or benefits they have with their choices. 2013/3/26 Otis Gospodnetic otis.gospodne...@gmail.com Hi, This question is too open-ended for anyone to give you a good answer. Maybe you want to ask more specific questions? As for embedding vs. war, start with a simpler war and think about the alternatives if that doesn't work for you. Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Mar 22, 2013 at 8:07 AM, Furkan KAMACI furkankam...@gmail.com wrote: If I want to use Solr in a web search engine what kind of strategies should I follow about how to run Solr. I mean I can run it via embedded jetty or use war and deploy to a container? You should consider that I will have heavy work load on my Solr.
Re: Is there a way to load multiple schema when using zookeeper?
If I already have a ZooKeeper cluster for my HBase cluster, can I use the same ZooKeeper cluster for my SolrCloud too? 2013/4/23 Timothy Potter thelabd...@gmail.com Ah cool, thanks for clarifying Chris - some of that multi-config management stuff gets confusing but much clearer from your description. Cheers, Tim On Tue, Apr 23, 2013 at 11:36 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Yes, you can effectively chroot all the configs for a collection (to : support multiple collections in same ensemble) - see wiki: : http://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot I don't think chroot is suitable for what's being asked about here ... that would completely isolate two cloud clusters from each other. I believe the intent of the question is asking about having a cloud cluster in which multiple collections exist, and some collections use different schema.xml files than other collections. The short answer is: absolutely, each collection can use completely different sets of configs (just like in non-cloud mode each core can use distinct configs), but to help make management easier when you have many collections re-using the same set of configs there is the concept of a config set ... you can push a set of configs into ZK with a specific configName, and then refer to that config set name when creating collections... https://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API -Hoss
Re: Solr index searcher to lucene index searcher
As Timothy mentioned, Solr has the PostFilter mechanism, but it's not really suited for ranking/sorting changes. To affect the ranking you'd need to work with the TopScoreDocCollector, which Solr does not give you access to. If you're doing distributed search, you'd need to account for the ranking algorithm at the aggregation step as well. There is a pluggable-collectors jira that builds under Solr 4.1 (SOLR-4465), but it is a proof of concept at this time. You may want to chime in on that ticket if you find it useful. On Tue, Apr 23, 2013 at 1:21 PM, Timothy Potter thelabd...@gmail.com wrote: Take a look at Solr's DelegatingCollector - this article might be of interest too: http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect.html On Tue, Apr 23, 2013 at 10:32 AM, parnab kumar parnab.2...@gmail.com wrote: Hi Timothy, thanks for pointing that out, but I have a specific requirement. Any query passes through the search handler, and Solr finally directs it to the Lucene IndexSearcher. As results are matched and collected as TopDocs in Lucene, I want to inspect the top K docs, reorder them by some logic, and pass the final TopDocs to Solr, which Solr may send as a response. I need to know the point where this interaction between Solr and Lucene actually takes place. Can anyone please help with where to look for this purpose? Thanks.. Pom On Tue, Apr 23, 2013 at 9:25 PM, Timothy Potter thelabd...@gmail.com wrote: org.apache.solr.search.SolrIndexSearcher On Tue, Apr 23, 2013 at 9:51 AM, parnab kumar parnab.2...@gmail.com wrote: Hi, Can anyone please point out where a Solr search originates and how it passes to the Lucene index searcher and back to Solr? I actually want to know which class in Solr directly calls the Lucene IndexSearcher. Thanks. Pom -- Joel Bernstein Professional Services LucidWorks
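For the record, the DelegatingCollector hook Timothy mentions looks roughly like the sketch below. It lets a PostFilter see each matching doc before it reaches the next collector in the chain - good for filtering, but, as Joel notes, not for re-sorting, since the scoring collector sits upstream. The filter logic here is invented purely for illustration:

    import java.io.IOException;
    import org.apache.solr.search.DelegatingCollector;

    // A minimal DelegatingCollector: see every hit and decide whether to
    // pass it down the chain. This can drop documents, but it cannot
    // reorder them - ordering is fixed by the upstream collector.
    public class EvenDocsCollector extends DelegatingCollector {
        @Override
        public void collect(int doc) throws IOException {
            if (doc % 2 == 0) {     // stand-in for real business logic
                super.collect(doc); // forward the hit to the delegate
            }
        }
    }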
Reordered DBQ.
Hi, Recently I noticed a lot of "Reordered DBQs detected" messages in the logs. As far as I can tell from the logs it could be related to deleting documents, but I'm not sure. Do you know what causes these messages?
Apr 23, 2013 1:20:14 AM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@68c8e122 realtime
Apr 23, 2013 1:20:15 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection] webapp=/solr path=/update params={update.distrib=FROMLEADER&_version_=-1433067860561756160&update.from=http://host:8983/solr/collection/&wt=javabin&version=2} {deleteByQuery=cmpy:1160027 (-1433067860561756160)} 0 1478
Apr 23, 2013 1:20:15 AM org.apache.solr.update.DirectUpdateHandler2 addDoc
INFO: Reordered DBQs detected. Update=add{_version_=1433067860472627200,id=17183780} DBQs=[DBQ{version=1433067860561756160,q=cmpy:1160027}]
Apr 23, 2013 1:20:15 AM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@123289e6 realtime
Apr 23, 2013 1:20:16 AM org.apache.solr.update.DirectUpdateHandler2 addDoc
INFO: Reordered DBQs detected. Update=add{_version_=1433067860476821504,id=20102172} DBQs=[DBQ{version=1433067860561756160,q=cmpy:1160027}]
Apr 23, 2013 1:20:16 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
Re: Using Solr For a Real Search Engine
Thanks for the answer. It would be nice if I could find something that explains using embedded Jetty, standalone Jetty, or Tomcat. 2013/4/23 Mark Miller markrmil...@gmail.com Tomcat should work just fine in most cases. The downside to Tomcat is that all of the devs generally run Jetty, since it's the default. Also, all of our unit tests run against Jetty - in fact, a specific version of Jetty. Usually, Solr will run fine in other webapps; many, many users run Solr in other webapps. All of our tests run against a specific version of Jetty though. In some (generally rare) cases, that means something might work with Jetty and not another container until/unless the issue is reported by a user and fixed. - Mark On Apr 23, 2013, at 3:25 PM, Furkan KAMACI furkankam...@gmail.com wrote: At first I will work on 100 Solr nodes and I want to use Tomcat as the container and deploy Solr as a war. I just wonder what folks are using for large systems and what kind of problems or benefits they have had with their choices. 2013/3/26 Otis Gospodnetic otis.gospodne...@gmail.com Hi, This question is too open-ended for anyone to give you a good answer. Maybe you want to ask more specific questions? As for embedding vs. war, start with the simpler war and think about the alternatives if that doesn't work for you. Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Mar 22, 2013 at 8:07 AM, Furkan KAMACI furkankam...@gmail.com wrote: If I want to use Solr in a web search engine, what kind of strategy should I follow for how to run Solr? I mean, I could run it via embedded Jetty, or use the war and deploy it to a container. You should consider that I will have a heavy workload on my Solr.
Too many unique terms
Hi there, Looking at one of my shards (about 1M docs) I see a lot of unique terms - more than 8M, which is a significant part of my total term count. These are very likely useless terms: binaries or other meaningless numbers that come with a few of my docs. I am totally fine with deleting them so that these terms become unsearchable. Thinking about it, I see that: 1. It is impossible to know a priori whether a term is unique, so I cannot add them to my stop words. 2. I get a performance decrease because my cached chunks contain useless data, and I'm short on memory. Assuming a constant index, is there a way of deleting all unique terms from at least the dictionary (tim and tip) files? Would I get a significant query-time performance increase? Does anybody know a class of regexes that identifies meaningless terms, which I could add to my update processor? Thanks Manu
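One way to act on that last idea is a custom update processor that strips junk tokens before they are ever indexed. The sketch below is not an answer from the thread, just an illustration of the UpdateRequestProcessor API - the "content" field name and the deliberately naive "looks like garbage" regex are both invented:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;

    public class JunkTermFilterProcessor extends UpdateRequestProcessor {
        // Naive heuristic: long digit runs and base64-ish blobs are noise.
        private static final String JUNK = ".*[0-9]{6,}.*|[A-Za-z0-9+/=]{20,}";

        public JunkTermFilterProcessor(UpdateRequestProcessor next) {
            super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();
            Object content = doc.getFieldValue("content"); // invented field
            if (content instanceof String) {
                List<String> kept = new ArrayList<String>();
                for (String tok : ((String) content).split("\\s+")) {
                    if (!tok.matches(JUNK)) {
                        kept.add(tok);
                    }
                }
                doc.setField("content", join(kept));
            }
            super.processAdd(cmd); // hand the cleaned doc down the chain
        }

        private static String join(List<String> toks) {
            StringBuilder sb = new StringBuilder();
            for (String t : toks) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(t);
            }
            return sb.toString();
        }
    }

Note this only prevents new junk terms from being indexed; terms already in the .tim/.tip files would only go away on a full reindex.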
RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.
James, Is there a way to determine how many times the collations were tried? Is there a parameter that can be issued to return this in the debug information? This would be very helpful. Appreciate your help with this. Thanks. -- Sandeep
Re: Is there a way to load multiple schema when using zookeeper?
Yes - better use of existing resources. In this case, the chroot would be helpful to keep the Solr znodes separate from HBase's. Solr in steady state doesn't put a lot of stress on Zookeeper; for the most part, my zk nodes are snoozing. On Tue, Apr 23, 2013 at 1:46 PM, Furkan KAMACI furkankam...@gmail.com wrote: If I have a Zookeeper cluster for my HBase cluster already, can I use the same Zookeeper cluster for my SolrCloud too? 2013/4/23 Timothy Potter thelabd...@gmail.com Ah cool, thanks for clarifying Chris - some of that multi-config management stuff gets confusing, but it's much clearer from your description. Cheers, Tim On Tue, Apr 23, 2013 at 11:36 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Yes, you can effectively chroot all the configs for a collection (to : support multiple collections in same ensemble) - see wiki: : http://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot I don't think chroot is suitable for what's being asked about here ... that would completely isolate two cloud clusters from each other. I believe the intent of the question is having a cloud cluster in which multiple collections exist, where some collections use different schema.xml files than other collections. The short answer is: absolutely, each collection can use a completely different set of configs (just like in non-cloud mode each core can use distinct configs), but to help make management easier when you have many collections re-using the same set of configs there is the concept of a config set ... you can push a set of configs into ZK with a specific configName, and then refer to that config set name when creating collections... https://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API -Hoss
Re: Is there a way to load multiple schema when using zookeeper?
On 4/23/2013 1:46 PM, Furkan KAMACI wrote: If I have a Zookeeper cluster for my HBase cluster already, can I use the same Zookeeper cluster for my SolrCloud too? Yes, you can. It is strongly recommended that you use a chroot with the zkHost parameter if you are sharing zookeeper. It's a really good idea to use a chroot even if you're not sharing. Here's an example zkHost parameter with a chroot of /mysolr: zoo1.example.com:2181,zoo2.example.com:2181,zoo3.example.com:2181/mysolr You only specify the chroot once at the end, not on every host entry. This information is also in the zookeeper documentation. Thanks, Shawn
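On the client side, a chrooted zkHost string is passed straight to SolrJ. A small sketch reusing Shawn's hypothetical hosts and /mysolr chroot; the collection name is assumed:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ChrootClient {
        public static void main(String[] args) throws Exception {
            // The /mysolr chroot appears once, after the last host:port pair.
            CloudSolrServer server = new CloudSolrServer(
                "zoo1.example.com:2181,zoo2.example.com:2181,zoo3.example.com:2181/mysolr");
            server.setDefaultCollection("collection1"); // assumed name
            QueryResponse rsp = server.query(new SolrQuery("*:*"));
            System.out.println("numFound: " + rsp.getResults().getNumFound());
            server.shutdown();
        }
    }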
Re: Is there a way to load multiple schema when using zookeeper?
I will use Nutch with MapReduce to crawl huge amounts of data, and SolrCloud to serve many users with fast response times. Actually I wonder about the performance implications of separating the Zookeeper clusters versus using one cluster for both HBase and Solr. 2013/4/23 Shawn Heisey s...@elyograg.org On 4/23/2013 1:46 PM, Furkan KAMACI wrote: If I have a Zookeeper cluster for my HBase cluster already, can I use the same Zookeeper cluster for my SolrCloud too? Yes, you can. It is strongly recommended that you use a chroot with the zkHost parameter if you are sharing zookeeper. It's a really good idea to use a chroot even if you're not sharing. Here's an example zkHost parameter with a chroot of /mysolr: zoo1.example.com:2181,zoo2.example.com:2181,zoo3.example.com:2181/mysolr You only specify the chroot once at the end, not on every host entry. This information is also in the zookeeper documentation. Thanks, Shawn
Re: Using Solr For a Real Search Engine
My 2 cents on this: if you have a choice, just stick with Jetty. This article has some pretty convincing information: http://www.openlogic.com/wazi/bid/257366/Power-Java-based-web-apps-with-Jetty-application-server The folks over at OpenLogic definitely know their stuff when it comes to supporting open source Java app servers. I was impressed by the fact that Google migrated from Tomcat to Jetty for App Engine, which is pretty compelling evidence that Jetty works well in a very large cluster. Lastly, the bulk of the processing in Solr happens in Solr/Lucene code, and Jetty (or whatever engine you choose) is a very small part of any request. On Tue, Apr 23, 2013 at 1:52 PM, Furkan KAMACI furkankam...@gmail.com wrote: Thanks for the answer. It would be nice if I could find something that explains using embedded Jetty, standalone Jetty, or Tomcat. 2013/4/23 Mark Miller markrmil...@gmail.com Tomcat should work just fine in most cases. The downside to Tomcat is that all of the devs generally run Jetty, since it's the default. Also, all of our unit tests run against Jetty - in fact, a specific version of Jetty. Usually, Solr will run fine in other webapps; many, many users run Solr in other webapps. All of our tests run against a specific version of Jetty though. In some (generally rare) cases, that means something might work with Jetty and not another container until/unless the issue is reported by a user and fixed. - Mark On Apr 23, 2013, at 3:25 PM, Furkan KAMACI furkankam...@gmail.com wrote: At first I will work on 100 Solr nodes and I want to use Tomcat as the container and deploy Solr as a war. I just wonder what folks are using for large systems and what kind of problems or benefits they have had with their choices. 2013/3/26 Otis Gospodnetic otis.gospodne...@gmail.com Hi, This question is too open-ended for anyone to give you a good answer. Maybe you want to ask more specific questions? As for embedding vs. war, start with the simpler war and think about the alternatives if that doesn't work for you. Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Mar 22, 2013 at 8:07 AM, Furkan KAMACI furkankam...@gmail.com wrote: If I want to use Solr in a web search engine, what kind of strategy should I follow for how to run Solr? I mean, I could run it via embedded Jetty, or use the war and deploy it to a container. You should consider that I will have a heavy workload on my Solr.
Re: Using Solr For a Real Search Engine
Is there any documentation that explains using Jetty embedded or standalone? I currently deploy Solr in Tomcat, but after your message I will consider Jetty. If we think about other issues, e.g. when I want to update my Solr jars/wars etc. (this is just a contrived example), do Tomcat or Jetty have any particular pros and cons? 2013/4/23 Timothy Potter thelabd...@gmail.com My 2 cents on this: if you have a choice, just stick with Jetty. This article has some pretty convincing information: http://www.openlogic.com/wazi/bid/257366/Power-Java-based-web-apps-with-Jetty-application-server The folks over at OpenLogic definitely know their stuff when it comes to supporting open source Java app servers. I was impressed by the fact that Google migrated from Tomcat to Jetty for App Engine, which is pretty compelling evidence that Jetty works well in a very large cluster. Lastly, the bulk of the processing in Solr happens in Solr/Lucene code, and Jetty (or whatever engine you choose) is a very small part of any request. On Tue, Apr 23, 2013 at 1:52 PM, Furkan KAMACI furkankam...@gmail.com wrote: Thanks for the answer. It would be nice if I could find something that explains using embedded Jetty, standalone Jetty, or Tomcat. 2013/4/23 Mark Miller markrmil...@gmail.com Tomcat should work just fine in most cases. The downside to Tomcat is that all of the devs generally run Jetty, since it's the default. Also, all of our unit tests run against Jetty - in fact, a specific version of Jetty. Usually, Solr will run fine in other webapps; many, many users run Solr in other webapps. All of our tests run against a specific version of Jetty though. In some (generally rare) cases, that means something might work with Jetty and not another container until/unless the issue is reported by a user and fixed. - Mark On Apr 23, 2013, at 3:25 PM, Furkan KAMACI furkankam...@gmail.com wrote: At first I will work on 100 Solr nodes and I want to use Tomcat as the container and deploy Solr as a war. I just wonder what folks are using for large systems and what kind of problems or benefits they have had with their choices. 2013/3/26 Otis Gospodnetic otis.gospodne...@gmail.com Hi, This question is too open-ended for anyone to give you a good answer. Maybe you want to ask more specific questions? As for embedding vs. war, start with the simpler war and think about the alternatives if that doesn't work for you. Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Mar 22, 2013 at 8:07 AM, Furkan KAMACI furkankam...@gmail.com wrote: If I want to use Solr in a web search engine, what kind of strategy should I follow for how to run Solr? I mean, I could run it via embedded Jetty, or use the war and deploy it to a container. You should consider that I will have a heavy workload on my Solr.
Re: Using Solr For a Real Search Engine
On 4/23/2013 1:52 PM, Furkan KAMACI wrote: Thanks for the answer. It would be nice if I could find something that explains using embedded Jetty, standalone Jetty, or Tomcat. 2013/4/23 Mark Miller markrmil...@gmail.com Tomcat should work just fine in most cases. The downside to Tomcat is that all of the devs generally run Jetty, since it's the default. Also, all of our unit tests run against Jetty - in fact, a specific version of Jetty. Usually, Solr will run fine in other webapps; many, many users run Solr in other webapps. All of our tests run against a specific version of Jetty though. In some (generally rare) cases, that means something might work with Jetty and not another container until/unless the issue is reported by a user and fixed. Mark outlines a really good reason to use Jetty - it's extremely well tested. New tests are being added all the time, and most of those will start Jetty to run. If you don't already have a good reason to use a container other than the Jetty included in Solr, then go and copy the example setup and modify it until it does what you need. The one thing that's really missing is an init script to manage Solr startup and shutdown. I plan to do something about that, but I've got a lot of cleanup to do on it. I've only come across one truly compelling reason to use something else: if your system admins are already familiar with Tomcat, Glassfish, or something else, then you probably want to stick with that. For instance, you may have automation in place for deploying and managing farms of Tomcat servers. Switching would likely be too painful. There could be features useful for Solr in other containers that I don't know about. If there are, and someone has a good reason for needing those features, let us know about them. Update the wiki. Jetty is a low-overhead servlet container without a lot of fancy features. The Jetty instance that is included in the Solr example is a bare-bones setup. It does not include all of the jars or config found in a full Jetty download, because those features are not needed for Solr. Thanks, Shawn
Re: Using Solr For a Real Search Engine
Based on the answers here, I will try Jetty for a SolrCloud system doing huge crawls and serving searches that need fast response times. If anyone has a good reason to use something else, they can explain it here, as you say. By the way, Shawn, when I read your answer I understood that I should choose the embedded Jetty - is that right? 2013/4/23 Shawn Heisey s...@elyograg.org On 4/23/2013 1:52 PM, Furkan KAMACI wrote: Thanks for the answer. It would be nice if I could find something that explains using embedded Jetty, standalone Jetty, or Tomcat. 2013/4/23 Mark Miller markrmil...@gmail.com Tomcat should work just fine in most cases. The downside to Tomcat is that all of the devs generally run Jetty, since it's the default. Also, all of our unit tests run against Jetty - in fact, a specific version of Jetty. Usually, Solr will run fine in other webapps; many, many users run Solr in other webapps. All of our tests run against a specific version of Jetty though. In some (generally rare) cases, that means something might work with Jetty and not another container until/unless the issue is reported by a user and fixed. Mark outlines a really good reason to use Jetty - it's extremely well tested. New tests are being added all the time, and most of those will start Jetty to run. If you don't already have a good reason to use a container other than the Jetty included in Solr, then go and copy the example setup and modify it until it does what you need. The one thing that's really missing is an init script to manage Solr startup and shutdown. I plan to do something about that, but I've got a lot of cleanup to do on it. I've only come across one truly compelling reason to use something else: if your system admins are already familiar with Tomcat, Glassfish, or something else, then you probably want to stick with that. For instance, you may have automation in place for deploying and managing farms of Tomcat servers. Switching would likely be too painful. There could be features useful for Solr in other containers that I don't know about. If there are, and someone has a good reason for needing those features, let us know about them. Update the wiki. Jetty is a low-overhead servlet container without a lot of fancy features. The Jetty instance that is included in the Solr example is a bare-bones setup. It does not include all of the jars or config found in a full Jetty download, because those features are not needed for Solr. Thanks, Shawn
RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.
If you enable debug-level logging for the class org.apache.solr.spelling.SpellCheckCollator, you should get a log message for every collation it tries, like this: Collation: will return zzz hits. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: SandeepM [mailto:skmi...@hotmail.com] Sent: Tuesday, April 23, 2013 2:13 PM To: solr-user@lucene.apache.org Subject: RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times. James, Is there a way to determine how many times the collations were tried? Is there a parameter that can be issued to return this in the debug information? This would be very helpful. Appreciate your help with this. Thanks. -- Sandeep
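How you flip that logger to debug depends on which logging backend your Solr install is wired to - an assumption either way. A hedged sketch for java.util.logging (the pre-4.3 example default), with the log4j equivalent noted in a comment:

    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class EnableCollationLogging {
        public static void main(String[] args) {
            // With java.util.logging, FINE corresponds to debug level.
            Logger.getLogger("org.apache.solr.spelling.SpellCheckCollator")
                  .setLevel(Level.FINE);

            // With log4j, the equivalent is a line in log4j.properties:
            // log4j.logger.org.apache.solr.spelling.SpellCheckCollator=DEBUG
        }
    }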
Re: Support of field variants in solr
Ok, thanks for this hint. I have two further questions to understand it completely. Setting up a custom request handler makes it easier to avoid all the mapping parameters in the query, but it would also be possible with one request handler and all the mapping in the request arguments, right? What about indexing - is there also a mechanism like this, or should the application decide which target field to use? Sent: Tuesday, April 23, 2013 at 02:32 From: Alexandre Rafalovitch arafa...@gmail.com To: solr-user@lucene.apache.org Subject: Re: Support of field variants in solr To route different languages, you could use different request handlers and do different alias mapping. There are two alias mappings: On the way in, for eDisMax: https://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming On the way out: https://wiki.apache.org/solr/CommonQueryParameters#Field_alias Between the two, you can make sure that all searches to /searchES map the 'content' field to 'content_es', and for /searchDE map 'content' to 'content_de'. Hope this helps, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Mon, Apr 22, 2013 at 2:31 PM, Timo Schmidt timo-schm...@gmx.net wrote: Hi together, I am Timo and work for a Solr implementation company. During recent projects we realized that we need to be able to generate different variants of a document. Example 1 (Language): To handle all documents in one Solr core, we need a field variant for each language:
<!-- content for Spanish -->
<field name="content" type="text_es" indexed="true" stored="true" variant="es" />
<!-- content for German -->
<field name="content" type="text_de" indexed="true" stored="true" variant="de" />
Each of these fields can be configured in the Solr schema to act optimally for the specific target language. Example 2 (Stores): We have customers who want to sell the same product in different stores for different prices:
<!-- price in Frankfurt -->
<field name="price" type="sfloat" indexed="true" stored="true" variant="fr" />
<!-- price in Paris -->
<field name="price" type="sfloat" indexed="true" stored="true" variant="pr" />
To solve this in an optimal way, it would be nice if this worked completely transparently inside Solr by defining a "variantQuery". A select query could look like this: select?variantQuery=fr&qf=price,content Additionally, the following should be possible: if no variant is present, the behaviour should be as before, so the field is relevant for all queries. The setting variant="*" would mean that several wildcard variants can be defined in a committed document. This makes sense when the data type is the same for all variants and you have many variants (like in the price example). The same as at query time should be possible at indexing time. I know that we can do something like this with dynamic fields too, but then we need to resolve the concrete fields during index and query time at the application level. That is possible, but it would be nicer to have a concept like this in Solr; working with facets is also easier with this approach, since the concrete field name does not need to be propagated to the application. So my questions are: What do you think about this approach? Is it better to work with dynamic fields? Is it reasonable when you have 200 or more variants of a document? What needs to be done in Solr to have something like this variant attribute for fields? Do you have other approaches?
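As a concrete illustration of the aliasing Alexandre describes, eDisMax lets a request map a virtual field to a real per-language field via the f.<alias>.qf parameter. A SolrJ sketch; the host, collection, field names, and query term are assumed from the thread rather than taken from a real setup:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class AliasQueryExample {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");

            SolrQuery q = new SolrQuery("mesa");
            q.set("defType", "edismax");
            q.set("qf", "content");              // search the virtual field
            q.set("f.content.qf", "content_es"); // alias it to the Spanish field
            QueryResponse rsp = server.query(q);
            System.out.println(rsp.getResults());
            server.shutdown();
        }
    }

In a dedicated /searchES request handler, those same two parameters would sit in the handler's defaults in solrconfig.xml, so clients never see the mapping.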
Re: Using Solr For a Real Search Engine
On 4/23/2013 2:25 PM, Furkan KAMACI wrote: Is there any documentation that explains using Jetty as embedded or not? I use Solr deployed at Tomcat but after you message I will consider about Jetty. If we think about other issues i.e. when I want to update my Solr jars/wars etc.(this is just an foo example) does any pros and cons Tomcat or Jetty has? The Jetty in the example is only 'embedded' in the sense that you don't have to install it separately. It is not special -- the Jetty components are not changed at all, a subset of them is just included in the Solr download with a tuned configuration file. If you go to www.eclipse.org/jetty and download the latest stable-8 version, you'll see some familiar things - start.jar, an etc directory, a lib directory, and a contexts directory. They have more in them than the example does -- extra functionality Solr doesn't need. If you want to start the downloaded version, you can use 'java -jar start.jar' just like you do with Solr. Thanks, Shawn
Re: Using Solr For a Real Search Engine
Thanks for the answers. I will go with the embedded Jetty for my SolrCloud. If I run into anything important, I will share my experiences with you. 2013/4/23 Shawn Heisey s...@elyograg.org On 4/23/2013 2:25 PM, Furkan KAMACI wrote: Is there any documentation that explains using Jetty embedded or standalone? I currently deploy Solr in Tomcat, but after your message I will consider Jetty. If we think about other issues, e.g. when I want to update my Solr jars/wars etc. (this is just a contrived example), do Tomcat or Jetty have any particular pros and cons? The Jetty in the example is only 'embedded' in the sense that you don't have to install it separately. It is not special -- the Jetty components are not changed at all, a subset of them is just included in the Solr download with a tuned configuration file. If you go to www.eclipse.org/jetty and download the latest stable-8 version, you'll see some familiar things - start.jar, an etc directory, a lib directory, and a contexts directory. They have more in them than the example does -- extra functionality Solr doesn't need. If you want to start the downloaded version, you can use 'java -jar start.jar' just like you do with Solr. Thanks, Shawn
minGramSize
Hi, I want the minGramSize in my ngram filter to be the size of the word passed in the query. How can I do that? Because if I set minGramSize to 2 and type in "abc", it gives me results for "ab" and "bc". I just want "abc" - whatever the length of my word is, I want that to be the minGramSize. How can I do that? Thanks.
Book text with chapter line number
Hello. I'm trying to figure out if Solr is going to work for a new project that I want to build. At its heart it's a book text searching application. Each book is broken into chapters, and each chapter is broken into lines. I want to be able to search these books, return relevant sections, and display the results with chapter and line number. I'm not sure how I should structure my data so that it's efficient and functional. I could simply treat each line of text as a document, which would provide some of the functionality, but what if the search query spanned two lines? Then it seems the passage the user was searching for wouldn't be returned. I could treat each book as a document and use highlighting to find the context, but that seems to limit weighting/results for best matches, as well as making it difficult to find chapter/line numbers. What is the best way to do this with Solr? Is there a better tool to use to solve my problem?
Re: Reordered DBQ.
On Tue, Apr 23, 2013 at 3:51 PM, Marcin Rzewucki mrzewu...@gmail.com wrote: Recently I noticed a lot of "Reordered DBQs detected" messages in the logs. As far as I can tell from the logs it could be related to deleting documents, but I'm not sure. Do you know what causes these messages? For high-throughput indexing, we version updates on the leader and forward them to the other replicas without strict serialization. If an add happened before a DBQ (delete-by-query) on the leader, but the DBQ is serviced before the add on a replica, Solr detects this reordering and fixes it. It's not an error or an indication that anything is wrong (hence the INFO-level log message). -Yonik http://lucidworks.com
Re: Too many close, count -1
: Subject: Re: Too many close, count -1 Thanks for the details, nothing jumps out at me, but we're now tracking this in SOLR-4753... https://issues.apache.org/jira/browse/SOLR-4753 -Hoss
Re: Solr index searcher to lucene index searcher
: For any query, it passes through the search handler and Solr finally : directs it to the Lucene IndexSearcher. As results are matched and collected : as TopDocs in Lucene, I want to inspect the top K docs, reorder them by : some logic, and pass the final TopDocs to Solr, which Solr may send as a : response. Can you elaborate on what exactly your "some logic" involves? Instead of writing a custom collector, using a function query may be the best solution. https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: Autocommit and replication have been slowing down
Hi Shawn, Thanks for the answer. If I understand correctly, the autoWarmCount is the number of elements from the old cache used to warm new searchers. I guess that this isn't the problem, because after the commit count increases under UPDATE HANDLERS (admin UI) I can see the new docs in the search results. Unfortunately I can't increase the Java heap on the servers right now, so I was thinking of changing some configuration to release some memory. For example, we could decrease the maxBufferedDocs value. Do you know if that would be effective? Best Regards
Re: minGramSize
Why are you bothering to use an Edge/NGram filter if you are setting the minGramSize to the token size?!! I mean, why bother - just skip the Edge/NGram filter and it would give the same result: setting minGramSize to the token size means that there would be only a single gram, and it would be identical to the token text. Now... tell us what you are really trying to accomplish with this diversion. -- Jack Krupansky -Original Message- From: hassancrowdc Sent: Tuesday, April 23, 2013 4:56 PM To: solr-user@lucene.apache.org Subject: minGramSize Hi, I want the minGramSize in my ngram filter to be the size of the word passed in the query. How can I do that? Because if I set minGramSize to 2 and type in "abc", it gives me results for "ab" and "bc". I just want "abc" - whatever the length of my word is, I want that to be the minGramSize. How can I do that? Thanks.
Re: minGramSize
Perhaps he needs different analyzer chains for index and query: create the edge ngrams when indexing, but not when querying. wunder On Apr 23, 2013, at 2:44 PM, Jack Krupansky wrote: Why are you bothering to use an Edge/NGram filter if you are setting the minGramSize to the token size?!! I mean, why bother - just skip the Edge/NGram filter and it would give the same result: setting minGramSize to the token size means that there would be only a single gram, and it would be identical to the token text. Now... tell us what you are really trying to accomplish with this diversion. -- Jack Krupansky -Original Message- From: hassancrowdc Sent: Tuesday, April 23, 2013 4:56 PM To: solr-user@lucene.apache.org Subject: minGramSize Hi, I want the minGramSize in my ngram filter to be the size of the word passed in the query. How can I do that? Because if I set minGramSize to 2 and type in "abc", it gives me results for "ab" and "bc". I just want "abc" - whatever the length of my word is, I want that to be the minGramSize. How can I do that? Thanks. -- Walter Underwood wun...@wunderwood.org
Does SolrCloud support QueryElevationComponent?
When I read Lucidworks' Solr Guide I saw this: "Distributed searching does not support the QueryElevationComponent, which configures the top results for a given query regardless of Lucene's scoring." Is that still true for SolrCloud?
Re: Book text with chapter line number
There is no simple, obvious, and direct approach, right out of the box. Sure, you can highlight passages of raw text out of the box, but that won't give you chapters, pages, and line numbers. To do all of that, you would have to either: 1. Add chapter, page, and line number as part of the payload for each word, and add some custom document transformers to access the information; or 2. Index each line as a separate Solr document, with fields for book, chapter, page, and line number. -- Jack Krupansky -Original Message- From: Jason Funk Sent: Tuesday, April 23, 2013 5:02 PM To: solr-user@lucene.apache.org Subject: Book text with chapter line number Hello. I'm trying to figure out if Solr is going to work for a new project that I want to build. At its heart it's a book text searching application. Each book is broken into chapters, and each chapter is broken into lines. I want to be able to search these books, return relevant sections, and display the results with chapter and line number. I'm not sure how I should structure my data so that it's efficient and functional. I could simply treat each line of text as a document, which would provide some of the functionality, but what if the search query spanned two lines? Then it seems the passage the user was searching for wouldn't be returned. I could treat each book as a document and use highlighting to find the context, but that seems to limit weighting/results for best matches, as well as making it difficult to find chapter/line numbers. What is the best way to do this with Solr? Is there a better tool to use to solve my problem?
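To make option 2 concrete, here is a minimal SolrJ sketch that indexes each line as its own document. The URL, field names, and sample values are invented; the fields would need matching entries in schema.xml:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BookLineIndexer {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/books");

            List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "moby-dick_ch1_line1"); // unique per line
            doc.addField("book", "Moby-Dick");
            doc.addField("chapter", 1);
            doc.addField("line", 1);
            doc.addField("text", "Call me Ishmael.");
            docs.add(doc);
            // ... one document per line of each book ...

            server.add(docs);
            server.commit();
            server.shutdown();
        }
    }

One hedge against the "query spans two lines" problem raised above is to index overlapping two-line windows instead of single lines, at the cost of some duplicated text and de-duplication at display time.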