Re: Update field properties via Schema Rest API ?
Updating a field isn't straightforward. Changing size from string to int would, if you don't re-index, break your index. The schema tells Solr how to interpret the binary bits it finds in the index. If there are no bits in the index for that field name, there's no issue. If there already are bits in the index, changing the schema will cause Solr to get confused when those bits are returned from the index in the results of a query. It seems to me that you could get away with using dynamic fields. Start with size_s, which is a string field. Then start adding a size_i field to your index: a new field containing the integer version. Because of the dynamic field definitions in the schema, these fields would not require schema editing or core reloads. If, however, you want every document in the index to have the integer size value, you will need to update those documents, adding that new field (and skipping/removing the string one if no longer needed). Hope this helps. Upayavira

On Sat, Sep 28, 2013, at 04:38 PM, bengates wrote: Haha, thanks for your reply, that's what I'll do then. Unfortunately I can speak Java as well as I can speak ancient Chinese in sign language... ^^

-- View this message in context: http://lucene.472066.n3.nabble.com/Update-field-properties-via-Schema-Rest-API-tp4087907p4092507.html Sent from the Solr - User mailing list archive at Nabble.com.
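The migration Upayavira describes can be sketched with Solr atomic updates — a minimal, hedged example assuming the *_s/*_i dynamic-field suffixes from the thread; the document id and endpoint below are illustrative assumptions:

```python
import json

def migration_update(doc_id, size_str):
    """Atomic-update command: add the integer size_i, remove the old size_s."""
    return {
        "id": doc_id,
        "size_i": {"set": int(size_str)},  # new integer version of the field
        "size_s": {"set": None},           # setting a field to null removes it
    }

# Hypothetical document id; POST the payload to
# http://<host>/solr/<core>/update?commit=true
payload = json.dumps([migration_update("doc-42", "128")])
```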
Re: Hello and help :)
If your app and solr aren't far apart, you shouldn't be afraid of multiple queries to solr per user request (I once discovered an app that did 36 hits to solr per user request, and despite such awfulness of design, no user ever complained about speed). You could do a query to solr for q=+user_id:X +date:[dateX TO dateY] to find out how many docs, then take the numFound value; if it is above Y, do a subsequent query to retrieve the docs, either all docs, or those in the relevant date range. Don't know if that helps. Upayavira

On Sun, Sep 29, 2013, at 05:15 PM, Matheus Salvia wrote: Thanks for the answer. Yes, you understood it correctly. The method you proposed should work perfectly, except I do have one more requirement that I forgot to mention earlier, and I apologize for that. The true problem we are facing is: * find all documents for userID=x, where userID=x has more than y documents in the index between dateA and dateB And since dateA and dateB can be any dates, it's impossible to save the count, since we cannot foresee what date and what count will be requested.

2013/9/28 Upayavira u...@odoko.co.uk To phrase your need more generically: * find all documents for userID=x, where userID=x has more than y documents in the index Is that correct? If it is, I'd probably do some work at index time. First guess, I'd keep a separate core, which has a very small document per user, storing just: * userID * docCount Then, when you add/delete a document, you use atomic updates to either increase or decrease the docCount on that user doc. Then you can use a pseudo join between these two cores relatively easily. q=user_id:x {!join fromIndex=user from=user_id to=user_id}+user_id:x +doc_count:[y TO *] Worst case, if you don't want to mess with your indexing code, I wonder if you could use a ScriptUpdateProcessor to do this work - not sure if you can have one add an entirely new, additional, document to the list, but it may be possible.
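The two-step count-then-fetch approach above can be sketched as follows. The field names user_id and date come from the thread; the host, dates and threshold are assumptions for illustration:

```python
from urllib.parse import urlencode

def count_params(user_id, date_from, date_to):
    # rows=0 makes Solr return only numFound, no documents
    return urlencode({
        "q": "+user_id:%s +date:[%s TO %s]" % (user_id, date_from, date_to),
        "rows": 0,
    })

def fetch_params(user_id, date_from, date_to, rows):
    return urlencode({
        "q": "+user_id:%s +date:[%s TO %s]" % (user_id, date_from, date_to),
        "rows": rows,
    })

p = count_params("x", "2013-01-01T00:00:00Z", "2013-02-01T00:00:00Z")
# GET http://<solr>/select?<p>, read response["response"]["numFound"];
# only if numFound > y, issue a second request built with fetch_params(...).
```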
Upayavira

On Fri, Sep 27, 2013, at 09:50 PM, Matheus Salvia wrote: Sure, sorry for the inconvenience. I'm having a little trouble trying to make a query in Solr. The problem is: I must be able to retrieve documents that have the same value for a specified field, but they should only be retrieved if this value appeared more than X times for a specified user. In pseudo-SQL it would be something like: select user_id from documents where my_field=my_value and (select count(*) from documents where my_field=my_value and user_id=super.user_id) > X I know that Solr returns a 'numFound' for each query you make, but I don't know how to retrieve this value in a subquery. My Solr is organized in a way that a user is a document, and the properties of the user (such as name, age, etc) are grouped in another document with a 'root_id' field. So let's suppose the following query that gets all the root documents whose children have the prefix some_prefix: is_root:true AND _query_:"{!join from=root_id to=id}requests_prefix:\"some_prefix\"" Now, how can I get the root documents (users in some sense) that have more than X children matching 'requests_prefix:some_prefix' or any other condition? Is it possible? P.S. It must be done in a single query; fields can be added at will, but the root/children structure should be preserved (preferentially).

2013/9/27 Upayavira u...@odoko.co.uk Matheus, Given these mails form part of an archive that is itself self-contained, can you please post your actual question here? You're more likely to get answers that way. Thanks, Upayavira

On Fri, Sep 27, 2013, at 04:36 PM, Matheus Salvia wrote: Hello everyone, I'm having a problem regarding how to make a solr query; I've posted it on stackoverflow. Can someone help me? http://stackoverflow.com/questions/19039099/apache-solr-count-of-subquery-as-a-superquery-parameter Thanks in advance!
-- // Matheus Salvia Desenvolvedor Mobile Celular: +55 11 9-6446-2332 Skype: meta.faraday
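The side-core idea from the thread — a tiny "user" core holding user_id and doc_count, adjusted with atomic updates on every add/delete — can be sketched like this. Core and field names follow the thread; everything else (threshold, payloads) is an assumption:

```python
import json

def adjust_doc_count(user_id, delta):
    """Atomic-update command for the user core: doc_count += delta."""
    return {"user_id": user_id, "doc_count": {"inc": delta}}

on_add = json.dumps([adjust_doc_count("x", 1)])      # when a document is added
on_delete = json.dumps([adjust_doc_count("x", -1)])  # when a document is deleted

# Query-time pseudo join, as in the thread (y = 10 here for illustration):
join_q = "{!join fromIndex=user from=user_id to=user_id}+user_id:x +doc_count:[10 TO *]"
```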
Re: Maximum solr processes per machine
Bram Van Dam wrote On 09/29/2013 04:03 PM, adfel70 wrote: If you're doing real time on a 5TB index then you'll probably want to throw your money at the fastest storage you can afford (SSDs vs spinning rust made a huge difference in our benchmarks) and the fastest CPUs you can get your hands on. Memory is important too, but in our benchmarks it didn't have as much impact as the other factors. Keeping a 5TB index in memory is going to be tricky, so in my opinion you'd be better off investing in faster disks instead.

Can you please elaborate on your benchmarks? What was the cluster size, and which hardware (CPUs, RAM size, disk type...) and so on? This info might really help us. Also, if I understand you correctly, beyond a certain index size the impact of RAM size is less important than disk performance?

-- View this message in context: http://lucene.472066.n3.nabble.com/Maximum-solr-processes-per-machine-tp4092568p4092651.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: autocomplete_edge type split words
In fact, I've removed the autoGeneratePhraseQueries=true, and it doesn't change anything; the behaviour is the same with or without (i.e. the request with debugQuery=on is the same). Thanks for your comments. Best, Elisabeth

2013/9/28 Erick Erickson erickerick...@gmail.com You've probably been doing this right along, but adding debug=query will show the parsed query. I really question, though, your apparent combination of autoGeneratePhraseQueries with what looks like an ngram field. I'm not at all sure how those would interact... Best, Erick

On Fri, Sep 27, 2013 at 10:12 AM, elisabeth benoit elisaelisael...@gmail.com wrote: Yes! What I've done is set autoGeneratePhraseQueries to true for my field, then give it a boost (bq=myAutocompleteEdgeNGramField:my query with spaces^50). This only worked with autoGeneratePhraseQueries=true, for a reason I didn't understand, since when I did q=myAutocompleteEdgeNGramField:my query with spaces, I didn't need autoGeneratePhraseQueries set to true. And, another thing: when I tried q=myAutocompleteNGramField:(my query with spaces) OR myAutocompleteEdgeNGramField:my query with spaces (with a request handler with edismax and default operator field = AND), the request on myAutocompleteNGramField would OR the grams, so I had to put an AND (myAutocompleteNGramField:(my AND query AND with AND spaces)), which was pretty ugly. I don't always understand what is exactly going on. If you have a pointer to some text I could read to get more insights about this, please let me know. Thanks again, Best regards, Elisabeth

2013/9/27 Erick Erickson erickerick...@gmail.com Have you looked at autoGeneratePhraseQueries? That might help. If that doesn't work, you can always do something like add an OR clause like OR "original query" and optionally boost it high. But I'd start with the autoGenerate bits. Best, Erick

On Fri, Sep 27, 2013 at 7:37 AM, elisabeth benoit elisaelisael...@gmail.com wrote: Thanks for your answer.
So I guess if someone wants to search on two fields, one with a phrase query and one with a normal query (split into words), one has to find a way to send the query twice: once with quotes and once without... Best regards, Elisabeth

2013/9/27 Erick Erickson erickerick...@gmail.com This is a classic issue where there's confusion between the query parser and field analysis. Early in the process the query parser has to take the input and break it up. That's how, for instance, a query like text:term1 term2 gets parsed as text:term1 defaultfield:term2. This happens long before the terms get to the analysis chain for the field. So your only options are to either quote the string or escape the spaces. Best, Erick

On Wed, Sep 25, 2013 at 9:24 AM, elisabeth benoit elisaelisael...@gmail.com wrote: Hello, I am using solr 4.2.1 and I have an autocomplete_edge type defined in schema.xml:

<fieldType name="autocomplete_edge" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement=" " replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="30" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement=" " replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{30})(.*)?" replacement="$1" replace="all"/>
  </analyzer>
</fieldType>

When I have a request with more than one word, for instance rue de la, my request doesn't match my autocomplete_edge field unless I use quotes around the query. In other words, q=rue de la doesn't work and q="rue de la" works.
I've checked the request with debugQuery=on, and I can see that in the first case the query is split into words, and I don't understand why, since my field type uses KeywordTokenizerFactory. Does anyone have a clue on how I can request my field without using quotes? Thanks, Elisabeth
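Erick's two alternatives for keeping a multi-word query as a single token against a KeywordTokenizer field — quote the whole string, or escape the spaces — can be sketched like this (the field name is taken from the thread):

```python
def quoted(field, text):
    """Wrap the whole query text in quotes so the parser keeps it as one phrase."""
    return '%s:"%s"' % (field, text)

def escaped(field, text):
    """Backslash-escape each space so the query parser doesn't split on it."""
    return field + ":" + text.replace(" ", "\\ ")

q1 = quoted("autocomplete_edge", "rue de la")   # autocomplete_edge:"rue de la"
q2 = escaped("autocomplete_edge", "rue de la")  # autocomplete_edge:rue\ de\ la
```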
documents are not committed distributively in solr cloud tomcat with core discovery, range is null for shards in clusterstate.json
Hi all, I'm trying out the tutorial about SolrCloud, and I managed to write my own plugin to import data from our set of databases. I use SolrWriter from the DataImporter package, and the docs get distributed and committed to the shards. Everything works fine using jetty from the solr example, but when I move to tomcat, SolrCloud seems not to be configured right, as the documents are just committed to the shard the update request goes to. The cause is probably that the range is null for the shards in clusterstate.json; the router is implicit instead of compositeId as well. Is anything missing or configured wrong in the following steps? How can I fix it? Your help will be much appreciated. PS: the solr cloud tomcat wiki page isn't up to date for 4.4 with core discovery; I'm trying this out after reading the SolrCloud, SolrCloudJboss, and CoreAdmin wiki pages. Here's what I've done and some useful logs:

1. Start three zookeeper servers.
2. Upload configuration files to zookeeper; the collection name is content_collection.
3. Start three tomcat instances on three servers with core discovery.
a) core file:

name=content
loadOnStartup=true
transient=false
shard=shard1 (different on each server)
collection=content_collection

b) solr.xml:

<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="hostPort">8080</int>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <str name="zkHost">10.199.46.176:2181,10.199.46.165:2181,10.199.46.158:2181</str>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>

4. In the solr.log, I see the three shards are recognized, and the solrcloud can see that content_collection has three shards as well.
5.
write documents to content_collection using my update request; the documents only commit to the shard the request goes to. In the log I can see the DistributedUpdateProcessorFactory is in the processor chain and the distributed commit is triggered:

INFO - 2013-09-30 16:31:43.205; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; updata request processor factories:
INFO - 2013-09-30 16:31:43.206; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; org.apache.solr.update.processor.LogUpdateProcessorFactory@4ae7b77
INFO - 2013-09-30 16:31:43.207; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; org.apache.solr.update.processor.*DistributedUpdateProcessorFactory*@5b2bc407
INFO - 2013-09-30 16:31:43.207; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; org.apache.solr.update.processor.RunUpdateProcessorFactory@1652d654
INFO - 2013-09-30 16:31:43.283; org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onInit: commits: num=1 commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1}
INFO - 2013-09-30 16:31:43.284; org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 1
INFO - 2013-09-30 16:31:43.440; *org.apache.solr.update.SolrCmdDistributor; Distrib commit to*: [StdNode: http://10.199.46.176:8080/solr/content/, StdNode: http://10.199.46.165:8080/solr/content/] params: commit_end_point=true&commit=true&softCommit=false&waitSearcher=true&expungeDeletes=false

but the documents won't go to the other shards; the other shards only receive a commit request with no documents:

INFO - 2013-09-30 16:31:43.841; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onInit: commits: num=1
commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1}
INFO - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 1
INFO - 2013-09-30 16:31:43.856; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
INFO - 2013-09-30 16:31:43.865; org.apache.solr.search.SolrIndexSearcher; Opening Searcher@3c74c144 main
INFO - 2013-09-30 16:31:43.869; org.apache.solr.core.QuerySenderListener; QuerySenderListener sending requests to Searcher@3c74c144main{StandardDirectoryReader(segments_1:1:nrt)}
INFO - 2013-09-30 16:31:43.869; org.apache.solr.core.QuerySenderListener; QuerySenderListener done.
INFO - 2013-09-30 16:31:43.869; org.apache.solr.core.SolrCore; [content] Registered new searcher Searcher@3c74c144main{StandardDirectoryReader(segments_1:1:nrt)}
INFO - 2013-09-30 16:31:43.870; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO - 2013-09-30
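A null range and an implicit router in clusterstate.json usually mean the collection was assembled from pre-defined cores rather than created through the Collections API. A hedged sketch of the CREATE call that assigns hash ranges and the compositeId router — the host is taken from the logs above, the config name is an assumption:

```python
from urllib.parse import urlencode

# Build the Collections API CREATE request; numShards makes Solr assign
# a hash range per shard and record router=compositeId in ZooKeeper.
params = urlencode({
    "action": "CREATE",
    "name": "content_collection",
    "numShards": 3,
    "collection.configName": "content_conf",  # assumed config set name
})
create_url = "http://10.199.46.176:8080/solr/admin/collections?" + params
# Issue a GET on create_url with any HTTP client.
```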
solr 4.4 config trouble
Hi, I'm running solr in tomcat. I am trying to upgrade to solr 4.4 but I can't get it to work. Could someone point me at what I'm doing wrong?

tomcat context:

<Context docBase="/opt/solr4.4/dist/solr-4.4.0.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/opt/solr4.4/solr_address" override="true" />
</Context>

core.properties:

name=address
collection=address
coreNodeName=address
dataDir=/opt/indexes4.1/address

solr.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">8080</int>
    <str name="hostContext">solr_address</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <bool name="genericCoreNodeNames">false</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>

In solrconfig.xml I have:

<luceneMatchVersion>4.1</luceneMatchVersion>
<dataDir>/opt/indexes4.1/address</dataDir>

And the log4j logs in catalina.out: ...
INFO: Deploying configuration descriptor solr_address.xml 0 [main] INFO org.apache.solr.servlet.SolrDispatchFilter – SolrDispatchFilter.init() 24 [main] INFO org.apache.solr.core.SolrResourceLoader – Using JNDI solr.home: /opt/solr4.4/solr_address 26 [main] INFO org.apache.solr.core.SolrResourceLoader – new SolrResourceLoader for directory: '/opt/solr4.4/solr_address/' 176 [main] INFO org.apache.solr.core.ConfigSolr – Loading container configuration from /opt/solr4.4/solr_address/solr.xml 272 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores in /opt/solr4.4/solr_address 276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores in /opt/solr4.4/solr_address/conf 276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores in /opt/solr4.4/solr_address/conf/xslt 277 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores in /opt/solr4.4/solr_address/conf/lang 278 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores in /opt/solr4.4/solr_address/conf/velocity 283 [main] INFO org.apache.solr.core.CoreContainer – New CoreContainer 991552899 284 [main] INFO org.apache.solr.core.CoreContainer – Loading cores into CoreContainer [instanceDir=/opt/solr4.4/solr_address/] 301 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting socketTimeout to: 0 301 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting urlScheme to: http:// 301 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting connTimeout to: 0 302 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting maxConnectionsPerHost to: 20 302 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting corePoolSize to: 0 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting maximumPoolSize to: 2147483647 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
maxThreadIdleTime to: 5 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting sizeOfQueue to: -1 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting fairnessPolicy to: false 320 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil – Creating new http client, config:maxConnectionsPerHost=20maxConnections=1socketTimeout=0connTimeout=0retry=false 420 [main] INFO org.apache.solr.logging.LogWatcher – Registering Log Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)] 422 [main] INFO org.apache.solr.core.ZkContainer – Zookeeper client=192.168.10.206:2181 429 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil – Creating new http client, config:maxConnections=500maxConnectionsPerHost=16socketTimeout=0connTimeout=0 487 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Waiting for client to connect to ZooKeeper 540 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager – Watcher org.apache.solr.common.cloud.ConnectionManager@7dc21ece name:ZooKeeperConnection Watcher:192.168.10.206:2181 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None 541 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Client is connected to ZooKeeper 562 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: /overseer/queue 578 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: /overseer/collection-queue-work 591 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: /live_nodes 595 [main] INFO org.apache.solr.cloud.ZkController – Register node as live in ZooKeeper:/live_nodes/192.168.10.206:8080_solr_address 600 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: /live_nodes/192.168.10.206:8080_solr_address 606 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: /collections 613 [main] INFO
Solr takes too long to start up
Hi all and thanks in advance for any help with this issue I am having... Loading halts here: Sep 30, 2013 9:38:04 AM org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener sending requests to Searcher@687de17d main{StandardDirectoryReader(segments_1k:268 _2r(4.2):C12590)} Once I flush the index and repopulate, it loads up normally. I suspect somehow the index is getting corrupt. I also get the following errors on startup (these are related to the tomcat admin page which I do not use and solr has run fine in the past with them): INFO: QuerySenderListener sending requests to Searcher@252ac42e main{StandardDirectoryReader(segments_1k:268 _2r(4.2):C12590)} Sep 30, 2013 9:52:13 AM org.apache.coyote.AbstractProtocol init INFO: Initializing ProtocolHandler [http-bio-8080] Sep 30, 2013 9:52:13 AM org.apache.catalina.startup.Catalina load INFO: Initialization processed in 1018 ms Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardService startInternal INFO: Starting service Catalina Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardEngine startInternal INFO: Starting Servlet Engine: Apache Tomcat/7.0.30 Sep 30, 2013 9:52:13 AM org.apache.catalina.startup.HostConfig deployDescriptor INFO: Deploying configuration descriptor /etc/tomcat7/Catalina/localhost/host-manager.xml Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext resourcesStart SEVERE: Error starting static Resources java.lang.IllegalArgumentException: Document base /usr/share/tomcat7-admin/host-manager does not exist or is not a readable directory at org.apache.naming.resources.FileDirContext.setDocBase(FileDirContext.java:140) at org.apache.catalina.core.StandardContext.resourcesStart(StandardContext.java:4906) at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5086) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901) at
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618) at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:650) at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1582) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext startInternal SEVERE: Error in resourceStart() Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext startInternal SEVERE: Error getConfigured Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext startInternal SEVERE: Context [/host-manager] startup failed due to previous errors Sep 30, 2013 9:52:13 AM org.apache.catalina.startup.HostConfig deployDescriptor INFO: Deploying configuration descriptor /etc/tomcat7/Catalina/localhost/manager.xml Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext resourcesStart SEVERE: Error starting static Resources java.lang.IllegalArgumentException: Document base /usr/share/tomcat7-admin/manager does not exist or is not a readable directory at org.apache.naming.resources.FileDirContext.setDocBase(FileDirContext.java:140) at org.apache.catalina.core.StandardContext.resourcesStart(StandardContext.java:4906) at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5086) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877) at 
org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618) at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:650) at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1582) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext startInternal SEVERE: Error in
cookies sent by solrj to SOLR
Hello! We have recorded the tcp stream between the client using solrj to send requests to SOLR and arrived at the following header (body omitted):

POST /solr/core0/select HTTP/1.1
Content-Charset: UTF-8
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0
Content-Length: 6163
Host: host:port
Connection: Keep-Alive
Cookie: visited=yes
Cookie2: $Version=1

Can someone please explain what effect both cookies have on the frontend solr that has shards underneath it? Solr 4.3.1. Thanks, Dmitry
Re: Solr takes too long to start up
As a follow-up, looks like it is related to this thread: http://lucene.472066.n3.nabble.com/spellcheck-causing-Core-Reload-to-hang-td4089866.html Disabling spellcheck gave a normal restart. On Sep 30, 2013, at 12:54 PM, Zenith wrote: Hi all and thanks in advance for any help with this issue I am having... Loading halts here: Sep 30, 2013 9:38:04 AM org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener sending requests to Searcher@687de17d main{StandardDirectoryReader(segments_1k:268 _2r(4.2):C12590)} Once I flush the index and repopulate, it loads up normally. I suspect somehow the index is getting corrupt. I also get the following errors on startup (these are related to the tomcat admin page which I do not use and solr has run fine in the past with them): INFO: QuerySenderListener sending requests to Searcher@252ac42e main{StandardDirectoryReader(segments_1k:268 _2r(4.2):C12590)} Sep 30, 2013 9:52:13 AM org.apache.coyote.AbstractProtocol init INFO: Initializing ProtocolHandler [http-bio-8080] Sep 30, 2013 9:52:13 AM org.apache.catalina.startup.Catalina load INFO: Initialization processed in 1018 ms Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardService startInternal INFO: Starting service Catalina Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardEngine startInternal INFO: Starting Servlet Engine: Apache Tomcat/7.0.30 Sep 30, 2013 9:52:13 AM org.apache.catalina.startup.HostConfig deployDescriptor INFO: Deploying configuration descriptor /etc/tomcat7/Catalina/localhost/host-manager.xml Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext resourcesStart SEVERE: Error starting static Resources java.lang.IllegalArgumentException: Document base /usr/share/tomcat7-admin/host-manager does not exist or is not a readable directory at org.apache.naming.resources.FileDirContext.setDocBase(FileDirContext.java:140) at org.apache.catalina.core.StandardContext.resourcesStart(StandardContext.java:4906) at
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5086) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618) at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:650) at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1582) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext startInternal SEVERE: Error in resourceStart() Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext startInternal SEVERE: Error getConfigured Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext startInternal SEVERE: Context [/host-manager] startup failed due to previous errors Sep 30, 2013 9:52:13 AM org.apache.catalina.startup.HostConfig deployDescriptor INFO: Deploying configuration descriptor /etc/tomcat7/Catalina/localhost/manager.xml Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext resourcesStart SEVERE: Error starting static Resources java.lang.IllegalArgumentException: Document base /usr/share/tomcat7-admin/manager does not exist or is not a readable directory at org.apache.naming.resources.FileDirContext.setDocBase(FileDirContext.java:140) at org.apache.catalina.core.StandardContext.resourcesStart(StandardContext.java:4906) at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5086) at 
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618) at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:650) at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1582) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at
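For anyone hitting the same startup hang: a hedged sketch of turning the spellcheck component off in the search handler's defaults (the handler name and the stock spellcheck parameter are assumed to match the example solrconfig.xml; spellcheck can also be disabled per request with spellcheck=false on the query string):

```xml
<!-- Sketch only: disable spellcheck by default for /select.
     Remove or set to true once the underlying index issue is fixed. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">false</str>
  </lst>
</requestHandler>
```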
OpenJDK or OracleJDK
Hi guyz, I am trying to setup a server. Could someone tell me if OpenJDK or OracleJDK will be best for Apache Solr over CentOS? Thanks a lot. -- Regards, Raheel Hasan
Re: solr 4.4 config trouble
Hi Marc, what exactly is not working? No obvious problems in the logs as far as I can see. Cheers, Siegfried Goeschl

On 30.09.2013 at 11:44, Marc des Garets m...@ttux.net wrote: Hi, I'm running solr in tomcat. I am trying to upgrade to solr 4.4 but I can't get it to work. Could someone point me at what I'm doing wrong?

tomcat context:

<Context docBase="/opt/solr4.4/dist/solr-4.4.0.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/opt/solr4.4/solr_address" override="true" />
</Context>

core.properties:

name=address
collection=address
coreNodeName=address
dataDir=/opt/indexes4.1/address

solr.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">8080</int>
    <str name="hostContext">solr_address</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <bool name="genericCoreNodeNames">false</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>

In solrconfig.xml I have:

<luceneMatchVersion>4.1</luceneMatchVersion>
<dataDir>/opt/indexes4.1/address</dataDir>

And the log4j logs in catalina.out: ...
INFO: Deploying configuration descriptor solr_address.xml 0 [main] INFO org.apache.solr.servlet.SolrDispatchFilter – SolrDispatchFilter.init() 24 [main] INFO org.apache.solr.core.SolrResourceLoader – Using JNDI solr.home: /opt/solr4.4/solr_address 26 [main] INFO org.apache.solr.core.SolrResourceLoader – new SolrResourceLoader for directory: '/opt/solr4.4/solr_address/' 176 [main] INFO org.apache.solr.core.ConfigSolr – Loading container configuration from /opt/solr4.4/solr_address/solr.xml 272 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores in /opt/solr4.4/solr_address 276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores in /opt/solr4.4/solr_address/conf 276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores in /opt/solr4.4/solr_address/conf/xslt 277 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores in /opt/solr4.4/solr_address/conf/lang 278 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores in /opt/solr4.4/solr_address/conf/velocity 283 [main] INFO org.apache.solr.core.CoreContainer – New CoreContainer 991552899 284 [main] INFO org.apache.solr.core.CoreContainer – Loading cores into CoreContainer [instanceDir=/opt/solr4.4/solr_address/] 301 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting socketTimeout to: 0 301 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting urlScheme to: http:// 301 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting connTimeout to: 0 302 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting maxConnectionsPerHost to: 20 302 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting corePoolSize to: 0 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting maximumPoolSize to: 2147483647 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
maxThreadIdleTime to: 5 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting sizeOfQueue to: -1 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting fairnessPolicy to: false 320 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil – Creating new http client, config:maxConnectionsPerHost=20maxConnections=1socketTimeout=0connTimeout=0retry=false 420 [main] INFO org.apache.solr.logging.LogWatcher – Registering Log Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)] 422 [main] INFO org.apache.solr.core.ZkContainer – Zookeeper client=192.168.10.206:2181 429 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil – Creating new http client, config:maxConnections=500maxConnectionsPerHost=16socketTimeout=0connTimeout=0 487 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Waiting for client to connect to ZooKeeper 540 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager – Watcher org.apache.solr.common.cloud.ConnectionManager@7dc21ece name:ZooKeeperConnection Watcher:192.168.10.206:2181 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None 541 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Client is connected to ZooKeeper 562 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: /overseer/queue 578 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: /overseer/collection-queue-work 591 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: /live_nodes 595 [main] INFO org.apache.solr.cloud.ZkController – Register node as live in
Re: solr 4.4 config trouble
http://www.coretechnologies.com/products/AlwaysUp/Apps/RunApacheSolrAsAService.html Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !!
AW: Re: solr 4.4 config trouble
Not sure if you are doing your company a favour ;-) Cheers Siegfried Goeschl Sent from Samsung Mobile -------- Original message -------- From: Kishan Parmar kishan@gmail.com Date: To: solr-user@lucene.apache.org Subject: Re: solr 4.4 config trouble http://www.coretechnologies.com/products/AlwaysUp/Apps/RunApacheSolrAsAService.html
Re: OpenJDK or OracleJDK
On 09/30/2013 01:11 PM, Raheel Hasan wrote: Could someone tell me if OpenJDK or OracleJDK will be best for Apache Solr over CentOS? If you're using Java 7 (or 8) then it doesn't matter. If you're using Java 6, stick with the Oracle version.
Re: Solr sorting situation!
Anyone with any ideas? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-sorting-situation-tp4091966p4092688.html Sent from the Solr - User mailing list archive at Nabble.com.
filterCache stats reported wrongly in solr admin?
Hi! Can it really be so that filterCache size is 63, inserts 103 and zero evictions? Is this a bug or am I misinterpreting the stats? http://pasteboard.co/9Dmkc4H.png Thanks, Dmitry
Re: Atomic updates with solr cloud in solr 4.4
The field variant_count is stored and is not the target of a copyField.

<field name="variant_count" type="int" indexed="true" stored="true" required="true" multiValued="false"/>

However I did notice that we were setting the same coreNodeName on both the shards in core.properties. Removing this property fixed the issue and updates succeed. What role does this play in handling updates, and why were other queries using the select handler not failing? Thanks Sesha On Sat, Sep 21, 2013 at 7:59 PM, Yonik Seeley yo...@lucidworks.com wrote: I can't reproduce this. I tried starting up a 2 shard cluster and then followed the example here: http://yonik.com/solr/atomic-updates/ book1 was on shard2 (port 7574) and everything still worked fine. missing required field: variant_count Perhaps the problem is document specific... What can you say about this variant_count field? Is it stored? Is it the target of a copyField? -Yonik http://lucidworks.com On Tue, Sep 17, 2013 at 12:56 PM, Sesha Sendhil Subramanian seshasend...@indix.com wrote:

curl http://localhost:8983/solr/search/update -H 'Content-type:application/json' -d '
[ { "id": "c8cce27c1d8129d733a3df3de68dd675!c8cce27c1d8129d733a3df3de68dd675", "link_id_45454": {"set":"abcdegff"} } ]'

I have two collections, search and meta. I want to do an update in the search collection. If I pick a document in the same shard (localhost:8983), the update succeeds: 15350327 [qtp386373885-19] INFO org.apache.solr.update.processor.LogUpdateProcessor ? [search] webapp=/solr path=/update params={} {add=[6cfcb56ca52b56ccb1377a7f0842e74d!6cfcb56ca52b56ccb1377a7f0842e74d (1446444025873694720)]} 0 5 If I pick a document on a different shard (localhost:7574), the update fails: 15438547 [qtp386373885-75] INFO org.apache.solr.update.processor.LogUpdateProcessor ? [search] webapp=/solr path=/update params={} {} 0 1 15438548 [qtp386373885-75] ERROR org.apache.solr.core.SolrCore ? 
org.apache.solr.common.SolrException: [doc=c8cce27c1d8129d733a3df3de68dd675!c8cce27c1d8129d733a3df3de68dd675] missing required field: variant_count at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:189) at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:73) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:556) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:692) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:392) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:117) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:101) at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:65) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) 
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at
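The fix Sesha describes — dropping the hand-set coreNodeName so each core gets a distinct one — can be illustrated with core.properties files along these lines (all names and values here are made up for illustration; they are not taken from the thread):

```properties
# core.properties on the first node (illustrative values)
name=search
collection=search
shard=shard1
coreNodeName=search_shard1_replica1

# core.properties on the second node: if coreNodeName is set at all,
# it must be unique per core -- the thread above reports that
# duplicating it across cores broke distributed updates
name=search
collection=search
shard=shard1
coreNodeName=search_shard1_replica2
```

Leaving coreNodeName out entirely, as Sesha did, lets Solr assign its own unique names.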
Re: filterCache stats reported wrongly in solr admin?
Looking at the code reveals that the put = insert operation increases the counter regardless of the duplicates. The size returns unique values only. Thanks to ehatcher for the hint. Dmitry On 30 Sep 2013 16:23, Dmitry Kan solrexp...@gmail.com wrote: Hi! Can it really be so that filterCache size is 63, inserts 103 and zero evictions? Is this a bug or am I misinterpreting the stats? http://pasteboard.co/9Dmkc4H.png Thanks, Dmitry
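The accounting Dmitry describes can be shown with a minimal sketch: the inserts counter increments on every put, including overwrites of a key that is already cached, while size reflects distinct keys only. This is an illustration of the behavior, not Solr's actual cache code:

```python
class CountingCache:
    """Toy cache: 'inserts' counts every put; 'size' counts unique keys."""

    def __init__(self):
        self._data = {}
        self.inserts = 0

    def put(self, key, value):
        self.inserts += 1          # incremented even if key is already present
        self._data[key] = value

    @property
    def size(self):
        return len(self._data)     # unique keys only


cache = CountingCache()
for fq in ["type:shirt", "size:large", "type:shirt"]:  # one duplicate filter
    cache.put(fq, object())

print(cache.inserts)  # 3 -- duplicate puts are counted
print(cache.size)     # 2 -- unique filters only
```

So inserts can legitimately exceed size with zero evictions, exactly as in the screenshot.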
Re: Solr Autocomplete with did you means functionality handle misspell word like google
It's really simple indeed. Solr provides the SpellCheck[1] feature that allows you to do this. You only have to configure the RequestHandler and the Search Component, and of course develop a simple UI (you can find an example in the Velocity response handler, Solritas[2]). Cheers [1] https://cwiki.apache.org/confluence/display/solr/Spell+Checking [2] https://cwiki.apache.org/confluence/display/solr/Velocity+Search+UI 2013/9/27 Otis Gospodnetic otis.gospodne...@gmail.com Hi, Not sure if Solr suggester can do this (can it, anyone?), but... shameless plug... I know http://sematext.com/products/autocomplete/index.html can do that. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Thu, Sep 26, 2013 at 8:26 AM, Suneel Pandey pandey.sun...@gmail.com wrote: http://lucene.472066.n3.nabble.com/file/n4092127/autocomplete.png Hi, I have implemented autocomplete and it's working fine, but I want to implement autosuggestion like Google (see above screen). When someone types a misspelled word, a suggestion should be shown, e.g.: cmputer = computer. Please help me. - Regards, Suneel Pandey Sr. Software Developer -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Autocomplete-with-did-you-means-functionality-handle-misspell-word-like-google-tp4092127.html Sent from the Solr - User mailing list archive at Nabble.com. -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
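A minimal solrconfig.xml sketch of the two pieces Alessandro mentions. The component and handler shapes follow the common examples in the Solr spell-checking docs; the field name, handler path, and dictionary settings are assumptions to adapt to your schema:

```xml
<!-- Search component: builds suggestions from an indexed field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">text</str>  <!-- assumed field; use your main search field -->
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>

<!-- Request handler wiring the component in, with collation for "did you mean" -->
<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

A query like /suggest?q=cmputer would then return "computer" in the spellcheck section of the response, which the UI can render as a suggestion.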
Re: Solr and jvm Garbage Collection tuning
I think this could help : http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning Cheers 2013/9/27 ewinclub7 ewincl...@hotmail.com -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
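For reference, the wiki page linked above discusses CMS-style collector tuning for Solr's Java 6/7 era; a starting point looks roughly like the flags below. All values are illustrative and must be sized to your own heap and workload, not copied as-is:

```shell
# Illustrative JVM options for a Solr start script (assumed 4 GB heap)
java -Xms4g -Xmx4g \
     -XX:+UseConcMarkSweepGC \
     -XX:+UseParNewGC \
     -XX:CMSInitiatingOccupancyFraction=70 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -jar start.jar
```

Fixing -Xms equal to -Xmx avoids heap resizing pauses; the occupancy settings make CMS start collecting before the old generation fills, trading throughput for shorter pauses.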
Re: OpenJDK or OracleJDK
Hmm, why is that so? Isn't Oracle's version a bit slow? On Mon, Sep 30, 2013 at 5:56 PM, Bram Van Dam bram.van...@intix.eu wrote: On 09/30/2013 01:11 PM, Raheel Hasan wrote: Could someone tell me if OpenJDK or OracleJDK will be best for Apache Solr over CentOS? If you're using Java 7 (or 8) then it doesn't matter. If you're using Java 6, stick with the Oracle version. -- Regards, Raheel Hasan
Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception
Hi Andreas, When using XPathEntityProcessor (http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor) your DataSource must be of type DataSource<Reader>. You shouldn't be using BinURLDataSource; it's giving you the cast exception. Use URLDataSource (https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-dataimporthandler/org/apache/solr/handler/dataimport/URLDataSource.html) or FileDataSource (https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-dataimporthandler/org/apache/solr/handler/dataimport/FileDataSource.html) instead. I don't think you need to specify namespaces, at least you didn't used to. The other thing that I've noticed is that the anywhere xpath expression // doesn't always work in DIH. You might have to be more specific. Cheers, Tricia On Sun, Sep 29, 2013 at 9:47 AM, Andreas Owen a...@conx.ch wrote: how dumb can you get. obviously quite dumb... i would have to analyze the html-pages with a nested instance like this:

<entity name="rec" processor="XPathEntityProcessor" url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportUrl.xml" forEach="/docs/doc" dataSource="main">
  <entity name="htm" processor="XPathEntityProcessor" url="${rec.urlParse}" forEach="/xhtml:html" dataSource="dataUrl">
    <field column="text" xpath="//content" />
    <field column="h_2" xpath="//body" />
    <field column="text_nohtml" xpath="//text" />
    <field column="h_1" xpath="//h:h1" />
  </entity>
</entity>

but i'm pretty sure the forEach is wrong and the xpath expressions. at the moment i'm getting the following error: Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: sun.net.www.protocol.http.HttpURLConnection$HttpInputStream cannot be cast to java.io.Reader On 28. Sep 2013, at 1:39 AM, Andreas Owen wrote: ok i see what you're getting at but why doesn't the following work: <field xpath="//h:h1" column="h_1" /> <field column="text" xpath="/xhtml:html/xhtml:body" /> i removed the tika-processor. what am i missing, i haven't found anything in the wiki?
On 28. Sep 2013, at 12:28 AM, P Williams wrote: I spent some more time thinking about this. Do you really need to use the TikaEntityProcessor? It doesn't offer anything new to the document you are building that couldn't be accomplished by the XPathEntityProcessor alone from what I can tell. I also tried to get the Advanced Parsing example (http://wiki.apache.org/solr/TikaEntityProcessor) to work without success. There are some obvious typos (<document> instead of </document>) and an odd order to the pieces (dataSources is enclosed by document). It also looks like FieldStreamDataSource (http://lucene.apache.org/solr/4_3_1/solr-dataimporthandler/org/apache/solr/handler/dataimport/FieldStreamDataSource.html) is the one that is meant to work in this context. If Koji is still around maybe he could offer some help? Otherwise this bit of erroneous instruction should probably be removed from the wiki. Cheers, Tricia $ svn diff Index: solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java === --- solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java (revision 1526990) +++ solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java (working copy) @@ -99,13 +99,13 @@ runFullImport(getConfigHTML(identity)); assertQ(req(*:*), testsHTMLIdentity); } - + private String getConfigHTML(String htmlMapper) { return dataConfig + dataSource type='BinFileDataSource'/ + document + -entity name='Tika' format='xml' processor='TikaEntityProcessor' + +entity name='Tika' format='html' processor='TikaEntityProcessor' + url=' + getFile(dihextras/structured.html).getAbsolutePath() + ' + ((htmlMapper == null) ? 
: ( htmlMapper=' + htmlMapper + ')) + + field column='text'/ + @@ -114,4 +114,36 @@ /dataConfig; } + private String[] testsHTMLH1 = { + //*[@numFound='1'] + , //str[@name='h1'][contains(.,'H1 Header')] + }; + + @Test + public void testTikaHTMLMapperSubEntity() throws Exception { +runFullImport(getConfigSubEntity(identity)); +assertQ(req(*:*), testsHTMLH1); + } + + private String getConfigSubEntity(String htmlMapper) { +return +dataConfig + +dataSource type='BinFileDataSource' name='bin'/ + +dataSource type='FieldStreamDataSource' name='fld'/ + +document + +entity name='tika'
Re: Cross index join query performance
Ah, got it now - thanks for the explanation. On Sat, Sep 28, 2013 at 3:33 AM, Upayavira u...@odoko.co.uk wrote: The thing here is to understand how a join works. Effectively, it does the inner query first, which results in a list of terms. It then effectively does a multi-term query with those values. q=size:large {!join fromIndex=other from=someid to=someotherid}type:shirt Imagine the inner join returned values A,B,C. Your inner query is, on core 'other', q=type:shirt&fl=someid. Then your outer query becomes size:large someotherid:(A B C) Your inner query returns 25k values. You're having to do a multi-term query for 25k terms. That is *bound* to be slow. The pseudo-joins in Solr 4.x are intended for a small to medium number of values returned by the inner query, otherwise performance degrades as you are seeing. Is there a way you can reduce the number of values returned by the inner query? As Joel mentions, those other joins are attempts to find other ways to work with this limitation. Upayavira On Fri, Sep 27, 2013, at 09:44 PM, Peter Keegan wrote: Hi Joel, I tried this patch and it is quite a bit faster. Using the same query on a larger index (500K docs), the 'join' QTime was 1500 msec, and the 'hjoin' QTime was 100 msec! This was true for large and small result sets. A few notes: the patch didn't compile with 4.3 because of the SolrCore.getLatestSchema call (which I worked around), and the package name should be: <queryParser name="hjoin" class="org.apache.solr.search.joins.HashSetJoinQParserPlugin"/> Unfortunately, I just learned that our uniqueKey may have to be an alphanumeric string instead of an int, so I'm not out of the woods yet. Good stuff - thanks. Peter On Thu, Sep 26, 2013 at 6:49 PM, Joel Bernstein joels...@gmail.com wrote: It looks like you are using int join keys so you may want to check out SOLR-4787, specifically the hjoin and bjoin. These perform well when you have a large number of results from the fromIndex. 
If you have a small number of results in the fromIndex the standard join will be faster. On Wed, Sep 25, 2013 at 3:39 PM, Peter Keegan peterlkee...@gmail.com wrote: I forgot to mention - this is Solr 4.3 Peter On Wed, Sep 25, 2013 at 3:38 PM, Peter Keegan peterlkee...@gmail.com wrote: I'm doing a cross-core join query and the join query is 30X slower than each of the 2 individual queries. Here are the queries:

Main query: http://localhost:8983/solr/mainindex/select?q=title:java QTime: 5 msec hit count: 1000
Sub query: http://localhost:8983/solr/subindex/select?q=+fld1:[0.1 TO 0.3] QTime: 4 msec hit count: 25K
Join query: http://localhost:8983/solr/mainindex/select?q=title:java&fq={!join fromIndex=mainindex toIndex=subindex from=docid to=docid}fld1:[0.1 TO 0.3] QTime: 160 msec hit count: 205

Here are the index spec's:

mainindex size: 117K docs, 1 segment
mainindex schema:
<field name="docid" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="title" type="text_en_splitting" indexed="true" stored="true" multiValued="false" />
<uniqueKey>docid</uniqueKey>

subindex size: 117K docs, 1 segment
subindex schema:
<field name="docid" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="fld1" type="float" indexed="true" stored="true" required="false" multiValued="false" />
<uniqueKey>docid</uniqueKey>

With debugQuery=true I see: debug:{ join:{ {!join from=docid to=docid fromIndex=subindex}fld1:[0.1 TO 0.3]:{ time:155, fromSetSize:24742, toSetSize:24742, fromTermCount:117810, fromTermTotalDf:117810, fromTermDirectCount:117810, fromTermHits:24742, fromTermHitsTotalDf:24742, toTermHits:24742, toTermHitsTotalDf:24742, toTermDirectCount:24627, smallSetsDeferred:115, toSetDocsAdded:24742}}, Via profiler and debugger, I see 150 msec spent in the outer 'while(term!=null)' loop in JoinQueryWeight.getDocSet(). This seems like a lot of time to join the bitsets. Does this seem right? Peter -- Joel Bernstein Professional Services LucidWorks
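Upayavira's description of the join — run the inner query, collect its "from" values, then treat them as a multi-term match against the outer index — can be sketched in plain Python over in-memory lists, purely to illustrate why a 25k-term inner result is slow (the data here is invented; this is not Solr's implementation):

```python
# Toy documents for two "cores" (illustrative data)
other_core = [{"someid": i, "type": "shirt" if i % 2 == 0 else "hat"}
              for i in range(10)]
main_core = [{"someotherid": i, "size": "large" if i % 3 == 0 else "small"}
             for i in range(10)]

def join(outer_docs, inner_docs, from_field, to_field, inner_pred, outer_pred):
    # Step 1: inner query -> set of join-key terms
    terms = {d[from_field] for d in inner_docs if inner_pred(d)}
    # Step 2: outer query becomes a multi-term match over those keys;
    # cost grows with len(terms), which is the problem at 25k terms
    return [d for d in outer_docs if outer_pred(d) and d[to_field] in terms]

hits = join(main_core, other_core, "someid", "someotherid",
            lambda d: d["type"] == "shirt",     # inner: type:shirt
            lambda d: d["size"] == "large")     # outer: size:large
print([d["someotherid"] for d in hits])  # [0, 6]
```

In a real index, step 2 is not a hash lookup but a term-by-term walk of the outer index, which is why shrinking the inner result set (or using the hjoin/bjoin patch) helps so much.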
Considerations about setting maxMergedSegmentMB
Hi, Trying to solve a query performance issue, we suspect the number of index segments, which might slow the query (due to I/O seeks, which happen for each term in the query, multiplied by the number of segments). We are on Solr 4.3 (TieredMergePolicy with mergeFactor of 4). We can reduce the number of segments by enlarging maxMergedSegmentMB, from the default 5GB to something bigger (10GB, 15GB?). What are the side effects which should be considered when doing this? Has anyone changed this setting in PROD for a while?
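For context, the setting under discussion lives in the indexConfig section of solrconfig.xml; a sketch raising the cap to 10GB might look like this (the maxMergeAtOnce/segmentsPerTier values of 4 mirror the mergeFactor mentioned above; adjust to your own setup):

```xml
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">4</int>
    <int name="segmentsPerTier">4</int>
    <!-- raise the per-segment cap from the 5120 MB default -->
    <double name="maxMergedSegmentMB">10240</double>
  </mergePolicy>
</indexConfig>
```

The obvious trade-off to weigh: fewer, larger segments mean fewer per-term seeks at query time, but the merges that produce those segments are bigger, so expect longer individual merge operations and larger transient disk usage during merging.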
Searching on (hyphenated/capitalized) word issue
I have a search term multi-CAD being issued against tokenized text. The problem is that you cannot get any search results when you type multicad unless you add a hyphen (multi-cad) or type multiCAD (omitting the hyphen, but correctly adding the caps into the spelling). However, for the similar but unhyphenated word AutoCAD, you can type autocad and get hits for AutoCAD, as you would expect. You can type auto-cad and get the same results. The query seems to get parsed as separate words (resulting in hits) for multi-CAD, multiCAD, autocad, auto-cad and AUTOCAD, but not for multicad. In other words, the search terms become multi cad and auto cad for all cases except when the term is multicad. I'm guessing this may be due in part to auto being a more common word prefix, but I may be wrong. Can anyone provide some clarity (and maybe point me towards a potential solution)? Thanks in advance! Kristian Van Tassell Siemens Industry Sector Siemens Product Lifecycle Management Software Inc. 5939 Rice Creek Parkway Shoreview, MN 55126 United States Tel. :+1 (651) 855-6194 Fax :+1 (651) 855-6280 kristian.vantass...@siemens.com www.siemens.com/plm
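One possible explanation — an assumption, since the message doesn't show the actual analysis chain — is a WordDelimiterFilter that splits on hyphens and case changes and catenates adjacent parts, but never indexes the fully joined form. A field type along these lines would index "multi", "cad" and "multicad" for both multi-CAD and multiCAD, so the bare query multicad would start matching:

```xml
<!-- Hypothetical field type; factory names and parameters are Solr 4.x,
     but the specific settings are a sketch, not the poster's config -->
<fieldType name="text_delim" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" splitOnCaseChange="1"
            catenateWords="1" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The Analysis screen in the Solr admin UI is the quickest way to confirm which tokens multi-CAD, multiCAD and multicad actually produce at index and query time with the real schema.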
Re: Data duplication using Cloud+HDFS+Mirroring
Hi Greg, Did you get an answer? I'm interested in the same question. More generally, what are the benefits of HdfsDirectoryFactory, besides the transparent restore of the shard contents in case of a disk failure, and the ability to rebuild the index using MapReduce? Is the following statement accurate? Blocks of a particular shard, which are replicated to another node, will never be queried, since there is no Solr core configured to read them. On Wed, Aug 7, 2013 at 8:46 PM, Greg Walters gwalt...@sherpaanalytics.com wrote: While testing Solr's new ability to store data and transaction directories in HDFS, I added an additional core to one of my testing servers that was configured as a backup (active but not leader) core for a shard elsewhere. It looks like this extra core copies the data into its own directory rather than just using the existing directory with the data that's already available to it. Since HDFS likely already has redundancy of the data covered via the replicationFactor, is there a reason for non-leader cores to create their own data directory rather than doing reads on the existing master copy? I searched Jira for anything that suggests this behavior might change and didn't find any issues; is there any intent to address this? Thanks, Greg
Re: Doing time sensitive search in solr
Thanks for the quick answers. I have gone through the presentation, and that's what I was tilting towards: using dynamic fields. I just want to run through an example so that it's clear how to approach this issue.

<entry start-date="1-sept-2013">Sept content: Honda is releasing the car this month</entry>
<entry start-date="1-dec-2013">Dec content: Toyota is releasing the car this month</entry>

After adding dynamic fields like *_entryDate and *_entryText, my Solr doc will look something like this:

<date name="2013-09-01T00:00:00Z_entryDate">2013-09-01T00:00:00Z</date>
<str name="2013-09-01T00:00:00Z_entryText">Sept content: Honda is releasing the car this month</str>
<date name="2013-12-01T00:00:00Z_entryDate">2013-12-01T00:00:00Z</date>
<str name="2013-12-01T00:00:00Z_entryText">Dec content: Toyota is releasing the car this month</str>

If someone searches with a query something like *_entryDate:[* TO NOW] AND *_entryText:Toyota, the results won't show the Toyota entry. The only disadvantage we have with this approach is that we might end up with a lot of runtime fields, since we have thousands of entries which might be time bound in our CMS. I might also do some more investigation to see if we can handle this at index time, indexing the data as its time comes via a scheduler or something, because the above approach might solve the issue but may make the queries very slow. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273p4092763.html Sent from the Solr - User mailing list archive at Nabble.com.
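For reference, the dynamic fields described above would be declared in schema.xml along these lines; the field type names (date, text_general) are assumptions about the schema:

```xml
<!-- any field name ending in _entryDate / _entryText matches these rules -->
<dynamicField name="*_entryDate" type="date" indexed="true" stored="true"/>
<dynamicField name="*_entryText" type="text_general" indexed="true" stored="true"/>
```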
Re: Searching on (hyphenated/capitalized) word issue
You need to look at your analysis chain. The stuff you're talking about there is all configurable. There are different tokenisers available to split your fields differently; then you might use the WordDelimiterFilterFactory to split existing tokens further (e.g. WiFi might become wi, fi and WiFi). So really, you need to craft your own analysis chain to fit the kind of data you are working with. Upayavira On Mon, Sep 30, 2013, at 06:50 PM, Van Tassell, Kristian wrote: I have a search term multi-CAD being issued against tokenized text. The problem is that you cannot get any search results when you type multicad unless you add a hyphen (multi-cad) or type multiCAD (omitting the hyphen, but correctly adding the CAPS into the spelling). However, for the similar but unhyphenated word AutoCAD, you can type autocad and get hits for AutoCAD, as you would expect. You can type auto-cad and get the same results. The query seems to get parsed as separate words (resulting in hits) for multi-CAD, multiCAD, autocad, auto-cad and AUTOCAD, but not for multicad. In other words, the search terms become multi cad and auto cad for all cases except when the term is multicad. I'm guessing this may be in part due to auto being a more common word prefix, but I may be wrong. Can anyone provide some clarity (and maybe point me towards a potential solution)? Thanks in advance! Kristian Van Tassell Siemens Industry Sector Siemens Product Lifecycle Management Software Inc. 5939 Rice Creek Parkway Shoreview, MN 55126 United States Tel.: +1 (651) 855-6194 Fax: +1 (651) 855-6280 kristian.vantass...@siemens.com www.siemens.com/plm
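To make the advice concrete, here is a hedged sketch of a schema.xml field type using WordDelimiterFilterFactory; the type name and the exact attribute mix are assumptions, and you would tune them against your own data with the analysis screen. With catenateWords="1", a term like multi-CAD also produces the joined token multicad at index time, which is what the original question needs:

```xml
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- split on hyphens and case changes, keep both parts and the joined form -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1"
            splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```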
Issue with Group By / Field Collapsing
Hi, I'm trying to use the group-by option to remove duplicates from my search results. I'm applying the group-by option on a field called TopicId, simply appending this at the end of my query: group=true&group.field=TopicId. Initially, the result looked great, as I was able to see the duplicates getting removed and only the document with the highest score among the duplicates being returned. But then when I started comparing the result without the group-by option, something didn't look right. For e.g., the search without a group-by option returned results from Sources A, B and C. Documents from Source A have the TopicId field, while it's not present in B or C. When I add the group-by option, the documents from B and C are completely ignored, though some of them have scores higher than A's. I'm a little confused as to whether this is the intended behavior. Does group-by mean that it'll only return results where the group-by field is present? Do I need to use additional group-by parameters to address this? Any pointers will be highly appreciated. Thanks, Shamik
Re: Hello and help :)
Upayavira, First of all, thanks for the answers. We have considered the possibility of doing several queries; however, in our case we want a count to show to the user (it should take less than 2 seconds), and we could have millions of rows (meaning millions of queries) to get this count. Isn't there any way to filter by the count? Something like: get all users where the number of corresponding documents in a join is less than X. Or all the users grouped by field F where the count of records for field F is less than X... Or anything like that regarding counts... Best regards, Marcelo Valle. 2013/9/30 Upayavira u...@odoko.co.uk If your app and solr aren't far apart, you shouldn't be afraid of multiple queries to solr per user request (I once discovered an app that did 36 hits to solr per user request, and despite such awfulness of design, no user ever complained about speed). You could do a query to solr for q=+user_id:X +date:[dateX TO dateY] to find out how many docs, then take the numFound value; if it is above Y, do a subsequent query to retrieve the docs, either all docs, or those in the relevant date range. Don't know if that helps. Upayavira On Sun, Sep 29, 2013, at 05:15 PM, Matheus Salvia wrote: Thanks for the answer. Yes, you understood it correctly. The method you proposed should work perfectly, except I do have one more requirement that I forgot to mention earlier, and I apologize for that. The true problem we are facing is: * find all documents for userID=x, where userID=x has more than y documents in the index between dateA and dateB And since dateA and dateB can be any dates, it's impossible to save the count, since we cannot foresee what date and what count will be requested. 2013/9/28 Upayavira u...@odoko.co.uk To phrase your need more generically: * find all documents for userID=x, where userID=x has more than y documents in the index Is that correct? If it is, I'd probably do some work at index time.
First guess, I'd keep a separate core, which has a very small document per user, storing just: * userID * docCount Then, when you add/delete a document, you use atomic updates to either increase or decrease the docCount on that user doc. Then you can use a pseudo join between these two cores relatively easily. q=user_id:x {!join fromIndex=user from=user_id to=user_id}+user_id:x +doc_count:[y TO *] Worst case, if you don't want to mess with your indexing code, I wonder if you could use a ScriptUpdateProcessor to do this work - not sure if you can have one add an entirely new, additional, document to the list, but it may be possible. Upayavira On Fri, Sep 27, 2013, at 09:50 PM, Matheus Salvia wrote: Sure, sorry for the inconvenience. I'm having a little trouble trying to make a query in Solr. The problem is: I must be able to retrieve documents that have the same value for a specified field, but they should only be retrieved if this value appeared more than X times for a specified user. In pseudo-SQL it would be something like: select user_id from documents where my_field=my_value and (select count(*) from documents where my_field=my_value and user_id=super.user_id) > X I know that Solr returns a 'numFound' for each query you make, but I don't know how to retrieve this value in a subquery. My Solr is organized in a way that a user is a document, and the properties of the user (such as name, age, etc.) are grouped in another document with a 'root_id' field. So let's suppose the following query that gets all the root documents whose children have the prefix some_prefix: is_root:true AND _query_:"{!join from=root_id to=id}requests_prefix:\"some_prefix\"" Now, how can I get the root documents (users, in some sense) that have more than X children matching 'requests_prefix:some_prefix' or any other condition? Is it possible? P.S. It must be done in a single query; fields can be added at will, but the root/children structure should be preserved (preferentially).
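The atomic-update bookkeeping described above can be sketched as follows. The field names (user_id, doc_count) and the payload shape are assumptions based on Solr's atomic-update JSON syntax; you would POST the payload to the user core's /update handler yourself:

```python
import json

# Sketch: build an atomic-update payload that increments (or, with a
# negative delta, decrements) the per-user doc_count in the "user" core.
# "inc" is Solr's atomic-update modifier for numeric fields.
def doc_count_update(user_id, delta):
    return [{"user_id": user_id, "doc_count": {"inc": delta}}]

# One +1 update per added document, one -1 update per deleted document.
payload = json.dumps(doc_count_update("x", 1))
print(payload)  # → [{"user_id": "x", "doc_count": {"inc": 1}}]
```

With doc_count maintained this way, the pseudo-join query above (doc_count:[y TO *]) can filter users by their document count without counting at query time.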
2013/9/27 Upayavira u...@odoko.co.uk Matheus, Given that these mails form part of an archive and should be self-contained, can you please post your actual question here? You're more likely to get answers that way. Thanks, Upayavira On Fri, Sep 27, 2013, at 04:36 PM, Matheus Salvia wrote: Hello everyone, I'm having a problem regarding how to make a Solr query; I've posted it on Stack Overflow. Can someone help me? http://stackoverflow.com/questions/19039099/apache-solr-count-of-subquery-as-a-superquery-parameter Thanks in advance! -- -- // Matheus Salvia Desenvolvedor Mobile Celular: +55 11 9-6446-2332
Re: Hello and help :)
Socratees, You wrote: Or, What if you can facet by the field, and group by the field count, then *apply facet filtering to exclude all filters with count less than 5?* That's exactly what I want, I just couldn't figure out how to do it! Any idea how I could write this query? Best regards, Marcelo. 2013/9/27 Socratees Samipillai ss...@outlook.com Hi Marcelo, I haven't faced this exact situation before so I can only try posting my thoughts. Since Solr allows Result Grouping and Faceting at the same time, and since you can apply filters on these facets, can you take advantage of that? Or, What if you can facet by the field, and group by the field count, then apply facet filtering to exclude all filters with count less than 5? These links might be helpful. http://architects.dzone.com/articles/facet-over-same-field-multiple https://issues.apache.org/jira/browse/SOLR-2898 Thanks, — Socratees. Date: Fri, 27 Sep 2013 20:32:22 -0300 Subject: Re: Hello and help :) From: marc...@s1mbi0se.com.br To: solr-user@lucene.apache.org Ssami, I work with Matheus and I am helping him to take a look at this problem. We took a look at result grouping, thinking it could help us, but it has two drawbacks: - We cannot have multivalued fields, if I understood it correctly. But ok, we could manage that... - Suppose a query like this: - select count(*) NUMBER group by FIELD where CONDITION AND NUMBER > 5 - In this case, we are not just taking the count for each group as a result. The count is actually part of the where clause. - AFAIK, result grouping doesn't allow that, although I would really love to be proven wrong :D We really need this, so I am trying to figure out what I could change in solr to make this work... Any hint on that? Would we need to write a custom facet / search handler / search component? Of course we prefer a solution that works with current solr features, but we could consider writing some custom code to do that. Thanks in advance! Best regards, Marcelo Valle.
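If plain facet counts are enough (you need the field values and their counts, not the matching documents themselves), Solr's facet.mincount parameter already does the "count >= N" cut. A hedged sketch, assuming a field named F; note it filters the facet value list, not the result set:

```text
q=*:*&rows=0&facet=true&facet.field=F&facet.mincount=5
```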
2013/9/27 ssami ss...@outlook.com If I understand your question right, Result Grouping in Solr might help you. Refer here https://cwiki.apache.org/confluence/display/solr/Result+Grouping . -- View this message in context: http://lucene.472066.n3.nabble.com/Hello-and-help-tp4092371p4092439.html Sent from the Solr - User mailing list archive at Nabble.com.
multi core join and simple indexed join
Comparing indexed joins on multiple cores or on the same core... Which one would be faster? I am guessing doing it on multiple cores would be faster, as the index on each core would be smaller... Any thoughts on that? []s
[JOB] Solr / Elasticsearch Engineer @ Sematext
Hello, If you are looking to work with Solr and Elasticsearch, among other things, this may be for you: http://blog.sematext.com/2013/09/26/solr-elasticsearch-job-engineering/ This role offers a healthy mix of Solr/ES consulting, support, and product development. Everything that might be of interest should be there, but I'll be happy to answer any questions anyone may have off-list. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm
Re: Considerations about setting maxMergedSegmentMB
Before going there, you can do a really simple test. Turn off indexing and then issue an optimize/force-merge. After it completes (and it may take quite some time), measure your performance again to see if this is on the right track. Best, Erick On Mon, Sep 30, 2013 at 1:31 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi, Trying to solve a query performance issue, we suspect the number of index segments, which might slow queries (due to I/O seeks, which happen for each term in the query, multiplied by the number of segments). We are on Solr 4.3 (TieredMergePolicy with a mergeFactor of 4). We can reduce the number of segments by enlarging maxMergedSegmentMB from the default 5GB to something bigger (10GB, 15GB?). What are the side effects that should be considered when doing this? Has anyone changed this setting in production for a while?
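For reference, the optimize/force-merge can be triggered over HTTP against the update handler; a sketch, with the core name as a placeholder:

```text
curl 'http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=1'
```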
No longer allowed to store html in a 'string' type
We have been using Solr for a while now, went from 1.4 to 3.6. While running some tests in 4.4, we are no longer allowed to store raw HTML in a document's field with a type of 'string', which we used to be able to do. Has something changed here? Now we get the following error: Undeclared general entity "nbsp" at [row,col {unknown-source}]: [11,53] I understand what it's saying and can change the way we store and extract it if it's a must, but I would like to understand what changed. It sounds like something just became stricter about adhering to the rules.

<doc>
  <str name="rawcontent">
    <p>Testing <a href="/sample_group/b/sample_weblog/archive/tags/bananas/default.aspx" class="tag hash-tag" data-tags="bananas">#bananas</a>&nbsp;tag</p>
    <p/>
    <p>document document document document document document</p><div style="clear:both;"></div>
  </str>
  <str name="type">blog</str>
</doc>
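One way to keep raw HTML in a string field is to escape it, or wrap it in a CDATA section, in the update message: &nbsp; is an HTML entity that plain XML does not declare, so a strict XML parser rejects it. A hedged sketch of the CDATA form:

```xml
<doc>
  <!-- inside CDATA, entities and tags are treated as literal text -->
  <str name="rawcontent"><![CDATA[<p>Testing a&nbsp;tag</p>]]></str>
  <str name="type">blog</str>
</doc>
```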
Re: Doing time sensitive search in solr
Hello, I just wanted to make sure: can we query dynamic fields using a wildcard? If not, then I don't think this solution will work, since I don't know the exact concrete name of the field. -- View this message in context: http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273p4092830.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [JOB] Solr / Elasticsearch Engineer @ Sematext
Hi, I would like to apply for the SEARCH CONSULTING / SEARCH SOLUTIONS ARCHITECT position. PFA my resume. You can reach me at 2019934403. Thanks, Ashwin cell - 2019934403 On Mon, Sep 30, 2013 at 4:17 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hello, If you are looking to work with Solr and Elasticsearch, among other things, this may be for you: http://blog.sematext.com/2013/09/26/solr-elasticsearch-job-engineering/ This role offers a healthy mix of Solr/ES consulting, support, and product development. Everything that might be of interest should be there, but I'll be happy to answer any questions anyone may have off-list. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm
Re: OpenJDK or OracleJDK
On 9/30/2013 9:28 AM, Raheel Hasan wrote: hmm why is that so? Isn't Oracle's version a bit slow? For Java 6, the Sun JDK is the reference implementation. For Java 7, OpenJDK is the reference implementation. http://en.wikipedia.org/wiki/Reference_implementation I don't think Oracle's version could really be called slow. Sun invented Java. Sun open sourced Java. Oracle bought Sun. The Oracle implementation is likely more conservative than some of the other implementations, like the one by IBM. The IBM implementation is pretty aggressive with optimization, so aggressive that Solr and Lucene have a history of revealing bugs that only exist in that implementation. Thanks, Shawn
Re: OpenJDK or OracleJDK
Hi, A while back I remember we noticed some SPM users were having issues with OpenJDK. Since then we've been recommending Oracle's implementation to our Solr and SPM users. At the same time, we haven't seen any issues with OpenJDK in the last ~6 months. Oracle JDK is not slow. :) Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Mon, Sep 30, 2013 at 11:02 PM, Shawn Heisey s...@elyograg.org wrote: On 9/30/2013 9:28 AM, Raheel Hasan wrote: hmm why is that so? Isn't Oracle's version a bit slow? For Java 6, the Sun JDK is the reference implementation. For Java 7, OpenJDK is the reference implementation. http://en.wikipedia.org/wiki/Reference_implementation I don't think Oracle's version could really be called slow. Sun invented Java. Sun open sourced Java. Oracle bought Sun. The Oracle implementation is likely more conservative than some of the other implementations, like the one by IBM. The IBM implementation is pretty aggressive with optimization, so aggressive that Solr and Lucene have a history of revealing bugs that only exist in that implementation. Thanks, Shawn
Re: Percolate feature?
Just came across this ancient thread. Charlie, did this end up happening? I suspect Wolfgang may be interested, but that's just a wild guess. I was curious about your feeling that what you were open-sourcing might be a lot faster and more flexible than ES's percolator - can you share more about why you have that feeling and whether you've confirmed this? Thanks, Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Mon, Aug 5, 2013 at 6:34 AM, Charlie Hull char...@flax.co.uk wrote: On 03/08/2013 00:50, Mark wrote: We have a set number of known terms we want to match against. In Index: term one term two term three I know how to match all terms of a user query against the index, but we would like to know how/if we can match a user's query against all the terms in the index? Search Queries: my search term = 0 matches my term search one = 1 match (term one) some prefix term two = 1 match (term two) one two three = 0 matches I can only explain this as almost a reverse search??? I came across the following from ElasticSearch (http://www.elasticsearch.org/guide/reference/api/percolate/) and it sounds like this may accomplish the above, but I haven't tested it. I was wondering if Solr had something similar or an alternative way of accomplishing this? Thanks Hi Mark, We've built something that implements this kind of reverse search for our clients in the media monitoring sector - we're working on releasing the core of this as open source very soon, hopefully in a month or two. It's based on Lucene. Just for reference, it's able to apply tens of thousands of stored queries to a document per second (our clients often have very large and complex Boolean strings representing their clients' interests and may monitor hundreds of thousands of news stories every day). It also records the positions of every match. We suspect it's a lot faster and more flexible than Elasticsearch's Percolate feature.
Cheers Charlie -- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk
Problem regarding queries enclosed in double quotes in Solr 3.4
We have a Solr 3.4 setup. When we try to run queries with double quotes, like "semantic web", the query takes a long time to execute. One solution we are thinking about is to make the same query without the quotes and set the phrase slop (ps) parameter to 0. That is quite a bit quicker than the query with the quotes and gives similar results. Is there a way to fix this by modifying the schema.xml file? Any suggestions would be appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-regarding-queries-enclosed-in-double-quotes-in-Solr-3-4-tp4092856.html Sent from the Solr - User mailing list archive at Nabble.com.
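The slop-based workaround can be wired into the request handler so clients don't have to pass ps on every request; a sketch assuming the edismax query parser (available since Solr 3.1) and a phrase-boost field named text, both of which are assumptions about the setup:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="pf">text</str>  <!-- phrase-boost field; name assumed -->
    <str name="ps">0</str>     <!-- phrase slop of 0, as described above -->
  </lst>
</requestHandler>
```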
Problem regarding queries with numbers with a decimal point
We have a Solr 3.4 setup. When we try to run queries with numbers with a decimal point, like "web 2.0", the query takes a long time to execute. One fix we did was to set generateNumberParts="0" in the solr.WordDelimiterFilterFactory. This reduced the query time greatly, but we want to reduce it further. Is there a way to fix this by modifying the schema.xml file? Any suggestions would be appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-regarding-queries-with-numbers-with-a-decimal-point-tp4092857.html Sent from the Solr - User mailing list archive at Nabble.com.
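For reference, a sketch of where that setting sits in the schema.xml analysis chain; the surrounding attributes are illustrative, not a statement of the poster's full configuration:

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="0"  <!-- the fix described above: don't split 2.0 into 2 and 0 -->
        catenateNumbers="1"
        preserveOriginal="1"/>
```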