Re: Addreplica throwing error when authentication is enabled
It appears the issue is with the encrypted file. Are these files encrypted? If yes, you need to decrypt them first.

Caused by: javax.crypto.BadPaddingException: RSA private key operation failed

Best,
Ben

On Tue, Sep 1, 2020, 10:51 PM yaswanth kumar wrote:
> Can someone please help me with the below error?
>
> Solr 8.2; Zookeeper 3.4
>
> Enabled authentication and authorization and made sure that the role gets all access.
>
> Now just add a collection with a single replica; once done, try to add another replica with the ADDREPLICA Solr API, and that throws the error below. Note: this happens only when security.json is enabled with authentication.
>
> Below is the error:
> Collection: test operation: restore failed:org.apache.solr.common.SolrException: ADDREPLICA failed to create replica
>     at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler$ShardRequestTracker.processResponses(OverseerCollectionMessageHandler.java:1030)
>     at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler$ShardRequestTracker.processResponses(OverseerCollectionMessageHandler.java:1013)
>     at org.apache.solr.cloud.api.collections.AddReplicaCmd.lambda$addReplica$1(AddReplicaCmd.java:177)
>     at org.apache.solr.cloud.api.collections.AddReplicaCmd$$Lambda$798/.run(Unknown Source)
>     at org.apache.solr.cloud.api.collections.AddReplicaCmd.addReplica(AddReplicaCmd.java:199)
>     at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.addReplica(OverseerCollectionMessageHandler.java:708)
>     at org.apache.solr.cloud.api.collections.RestoreCmd.call(RestoreCmd.java:286)
>     at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
>     at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$142/.run(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.solr.common.SolrException: javax.crypto.BadPaddingException: RSA private key operation failed
>     at org.apache.solr.util.CryptoKeys$RSAKeyPair.encrypt(CryptoKeys.java:325)
>     at org.apache.solr.security.PKIAuthenticationPlugin.generateToken(PKIAuthenticationPlugin.java:305)
>     at org.apache.solr.security.PKIAuthenticationPlugin.access$200(PKIAuthenticationPlugin.java:61)
>     at org.apache.solr.security.PKIAuthenticationPlugin$2.onQueued(PKIAuthenticationPlugin.java:239)
>     at org.apache.solr.client.solrj.impl.Http2SolrClient.decorateRequest(Http2SolrClient.java:468)
>     at org.apache.solr.client.solrj.impl.Http2SolrClient.makeRequest(Http2SolrClient.java:455)
>     at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:364)
>     at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:746)
>     at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1274)
>     at org.apache.solr.handler.component.HttpShardHandler.request(HttpShardHandler.java:238)
>     at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:199)
>     at org.apache.solr.handler.component.HttpShardHandler$$Lambda$512/.call(Unknown Source)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
>     ... 5 more
> Caused by: javax.crypto.BadPaddingException: RSA private key operation failed
>     at java.base/sun.security.rsa.NativeRSACore.crtCrypt_Native(NativeRSACore.java:149)
>     at java.base/sun.security.rsa.NativeRSACore.rsa(NativeRSACore.java:91)
>     at java.base/sun.security.rsa.RSACore.rsa(RSACore.java:149)
>     at java.base/com.sun.crypto.provider.RSACipher.doFinal(RSACipher.java:355)
>     at java.base/com.sun.crypto.provider.RSACipher.engineDoFinal(RSACipher.java:392)
>     at java.base/javax.crypto.Cipher.doFinal(Cipher.java:2260)
>     at org.apache.solr.util.CryptoKeys$RSAKeyPair.encrypt(CryptoKeys.java:323)
>     ... 20 more
>
> That's the error stack trace I
Re: Solr Down Issue
Can you send the Solr logs? Best, Ben

On Sun, Aug 9, 2020, 9:55 AM Rashmi Jain wrote:
> Hello Team,
>
> I am Rashmi Jain; we implemented Solr on one of our sites, bookswagon.com <https://www.bookswagon.com/>. For the last 2-3 months we have been facing a strange issue: Solr goes down suddenly without warning. We checked the Solr logs and also the application logs but found no clue there regarding this.
> We have implemented Solr 7.4 on Java SE 10 and have indexed data for around 28 million books.
> Also, we are running Solr on Windows Server 2012 Standard with 32 GB RAM.
> Please help us with this.
>
> Regards,
> Rashmi
No Client EndPointIdentificationAlgorithm configured for SslContextFactory
Hello Everyone, I just downloaded Sitecore 9.3.0 and installed Solr using the JSON file that Sitecore provided. The installation was seamless and Solr was working as expected. But when I checked the logs I found this warning. I am attaching the Solr logs as well for your reference.

o.e.j.u.s.S.config No Client EndPointIdentificationAlgorithm configured for SslContextFactory@1a2e2935 [provider=null,keyStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks,trustStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks]

This appears to be an issue on Solr 8.0, 8.1 and even 8.2. Can you please confirm? As a workaround I have updated the entry in the jetty-ssl.xml file (steps below). Is there a fix or a patch for this issue?

1. Stop the Solr service.
2. Go to D:\Solr\solr-8.1.1\server\etc\jetty-ssl.xml.
3. Open the jetty-ssl.xml file.
4. Add an entry to the SslContextFactory element that sets the endpoint identification algorithm to null.

Hope to hear back from you soon.

Best,
Ben

2020-07-21 13:21:02.786 INFO (main) [ ] o.e.j.u.log Logging initialized @4907ms to org.eclipse.jetty.util.log.Slf4jLog
2020-07-21 13:21:03.004 WARN (main) [ ] o.e.j.s.AbstractConnector Ignoring deprecated socket close linger time
2020-07-21 13:21:03.004 INFO (main) [ ] o.e.j.s.Server jetty-9.4.14.v20181114; built: 2018-11-14T21:20:31.478Z; git: c4550056e785fb5665914545889f21dc136ad9e6; jvm 1.8.0_222-b10
2020-07-21 13:21:03.036 INFO (main) [ ] o.e.j.d.p.ScanningAppProvider Deployment monitor [file:///D:/Solr/solr-8.1.1/server/contexts/] at interval 0
2020-07-21 13:21:03.473 INFO (main) [ ] o.e.j.w.StandardDescriptorProcessor NO JSP Support for /solr, did not find org.apache.jasper.servlet.JspServlet
2020-07-21 13:21:03.473 INFO (main) [ ] o.e.j.s.session DefaultSessionIdManager workerName=node0
2020-07-21 13:21:03.473 INFO (main) [ ] o.e.j.s.session No SessionScavenger set, using defaults
2020-07-21 13:21:03.489 INFO (main) [ ] o.e.j.s.session node0 Scavenging every 60ms
2020-07-21 13:21:03.536 INFO (main) [ ] o.a.s.u.c.SSLConfigurations Setting javax.net.ssl.keyStorePassword
2020-07-21 13:21:03.536 INFO (main) [ ] o.a.s.u.c.SSLConfigurations Setting javax.net.ssl.trustStorePassword
2020-07-21 13:21:03.567 INFO (main) [ ] o.a.s.s.SolrDispatchFilter Using logger factory org.apache.logging.slf4j.Log4jLoggerFactory
2020-07-21 13:21:03.567 INFO (main) [ ] o.a.s.s.SolrDispatchFilter Welcome to Apache Solr™ version 8.1.1
2020-07-21 13:21:03.567 INFO (main) [ ] o.a.s.s.SolrDispatchFilter Starting in standalone mode on port 8983
2020-07-21 13:21:03.567 INFO (main) [ ] o.a.s.s.SolrDispatchFilter Install dir: D:\Solr\solr-8.1.1
2020-07-21 13:21:03.567 INFO (main) [ ] o.a.s.s.SolrDispatchFilter Start time: 2020-07-21T13:21:03.567Z
2020-07-21 13:21:03.583 INFO (main) [ ] o.a.s.c.SolrResourceLoader Using system property solr.solr.home: D:\Solr\solr-8.1.1\server\solr
2020-07-21 13:21:03.598 INFO (main) [ ] o.a.s.c.SolrXmlConfig Loading container configuration from D:\Solr\solr-8.1.1\server\solr\solr.xml
2020-07-21 13:21:03.692 INFO (main) [ ] o.a.s.c.SolrXmlConfig MBean server found: com.sun.jmx.mbeanserver.JmxMBeanServer@1677d1, but no JMX reporters were configured - adding default JMX reporter.
2020-07-21 13:21:03.973 INFO (main) [ ] o.a.s.h.c.HttpShardHandlerFactory Host whitelist initialized: WhitelistHostChecker [whitelistHosts=null, whitelistHostCheckingEnabled=true]
2020-07-21 13:21:04.004 WARN (main) [ ] o.a.s.c.s.i.Http2SolrClient Create Http2SolrClient with HTTP/1.1 transport since Java 8 or lower versions does not support SSL + HTTP/2
2020-07-21 13:21:04.083 INFO (main) [ ] o.e.j.u.s.SslContextFactory x509=X509@15dcfae7(solr_ssl_trust_store,h=[companydomain],w=[companydomain]) for SslContextFactory@3da05287[provider=null,keyStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks,trustStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks]
2020-07-21 13:21:04.129 WARN (main) [ ] o.e.j.u.s.S.config No Client EndPointIdentificationAlgorithm configured for SslContextFactory@3da05287[provider=null,keyStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks,trustStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks]
2020-07-21 13:21:04.239 WARN (main) [ ] o.a.s.c.s.i.Http2SolrClient Create Http2SolrClient with HTTP/1.1 transport since Java 8 or lower versions does not support SSL + HTTP/2
2020-07-21 13:21:04.254 INFO (main) [ ] o.e.j.u.s.SslContextFactory x509=X509@bff34c6(solr_ssl_trust_store,h=[companydomain],w=[companydomain]) for SslContextFactory@1522d8a0[provider=null,keyStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks,trustStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks]
2020-07-21 13:21:04.254 WARN (main) [ ] o.e.j.u.s.S.config No Client EndPointIdentificationAlgorithm configured for SslContextFactory@1522d8a0[provider=null
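For reference, the jetty-ssl.xml entry itself did not survive in the archived message, so the following is only a hypothetical sketch of what such a setting looks like in Jetty's XML dialect. The property name matches Jetty's SslContextFactory setter; "HTTPS" turns hostname verification on, while the poster describes setting the value to null to disable the check and silence the warning. Verify against your Solr version's jetty-ssl.xml before relying on it:

    <!-- inside the SslContextFactory configuration in server/etc/jetty-ssl.xml -->
    <Set name="EndpointIdentificationAlgorithm">HTTPS</Set>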
SolrClient.ping() in 8.2, using SolrJ
Before I submit a new bug, I should ask you folks if this is my error. I started a local SolrCloud instance with two nodes and two replicas per node. I created one empty collection on each node. I tried to use the ping method in SolrJ to verify my connected client. When I try to use it, it throws:

Caused by: org.apache.solr.common.SolrException: No collection param specified on request and no default collection has been set: []
    at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1071) ~[solr-solrj-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:11:07]

I cannot pass a collection name to the ping request, and the CloudSolrClient.Builder does not allow me to declare a default collection. I'm not sure why a collection would be required for a ping, and I'm not sure why it does not automatically use the only collection created. Have any suggestions for me? Thank you.
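For what it's worth, a minimal SolrJ 8.x sketch that works around this (the collection name "techproducts" and the ZooKeeper address are placeholders): the default collection can be set on the built client even though the Builder has no option for it, and a SolrPing request can also be routed at an explicit collection.

    import java.util.Arrays;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.SolrPing;

    public class PingExample {
      public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
            Arrays.asList("localhost:2181"), Optional.empty()).build()) {
          // Option 1: set a default collection on the client, then ping() works.
          client.setDefaultCollection("techproducts");
          System.out.println("ping status: " + client.ping().getStatus());
          // Option 2: route an explicit SolrPing request at a named collection.
          System.out.println(new SolrPing().process(client, "techproducts").getStatus());
        }
      }
    }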
Re: Inconsistent leader between ZK and Solr and a lot of downtime
Daniel Carrasco wrote
> Hello,
>
> I'm investigating an 8-node Solr 7.2.1 cluster because we have a lot of problems. For example, when a node fails to import from a DB (maybe it freezes), the entire cluster goes down; the leader won't change even when it is down (all nodes detect that it is down but no leader election is triggered); and similar problems. Every few days we have to recover the cluster because it becomes unstable and goes down.
>
> The last problem that I've got is three collections that have nodes in "recovery" state for many hours, and the log shows an error saying that the "leader node is not the leader", so I'm trying to change the leader.

Make sure that the clocks on your servers are in sync. Otherwise inter-node authentication tokens could time out, which could lead to the problems you described. You should find hints to the cause of the communication problem in your Solr logs.
Re: Sort Facet Values by "Interestingness"?
Hi Joel, thank you, this sounds great! As to your first proposal: I am a bit out of my depth here, as I have not worked with streaming expressions so far, but I will try out your example using the facet() expression on a simple use case as soon as you publish it. Using the TermsComponent directly, would that imply that I have to retrieve all possible candidates and then send them back as a terms.list to get their df? However, I assume that this would still be faster than two repeated facet calls. Or did you suggest using the component in a customized RequestHandler? Regards, Ben

Am 03.08.2016 um 14:57 schrieb Joel Bernstein:

Also, the TermsComponent can now export the docFreq for a list of terms and the numDocs for the index. This can be used as a general-purpose mechanism for scoring facets with a callback. https://issues.apache.org/jira/browse/SOLR-9243
Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Aug 3, 2016 at 8:52 AM, Joel Bernstein <joels...@gmail.com> wrote:

What you're describing is implemented with Graph aggregations in this ticket, using tf-idf. Other scoring methods can be implemented as well. https://issues.apache.org/jira/browse/SOLR-9193 I'll update this thread with a description of how this can be used with the facet() streaming expression as well as with graph queries later today.
Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Aug 3, 2016 at 8:18 AM, <heuw...@uni-hildesheim.de> wrote:

Dear everybody, as the JSON API now makes configuration of facets and sub-facets easier, there appears to be a lot of potential to enable instant calculation of facet recommendations for a query, that is, to sort facets by their relative importance/interestingness/significance for the current query relative to the complete collection, or relative to a result set defined by a different query. An example would be to show the most typical terms used in descriptions of horror movies, in contrast to the most popular ones for this query, as those may include terms that occur just as often in other genres. This feature has been discussed earlier in the context of Solr:

* http://stackoverflow.duapp.com/questions/26399264/how-can-i-sort-facets-by-their-tf-idf-score-rather-than-popularity
* http://lucene.472066.n3.nabble.com/Facets-with-an-IDF-concept-td504070.html

In Elasticsearch, the specific feature that I am looking for is called Significant Terms Aggregation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html#search-aggregations-bucket-significantterms-aggregation

As of now, I have two questions:

a) Are there workarounds in the current Solr implementation, or known patches, that implement such a sort option for fields with a large number of possible values, e.g. text fields? (For smaller vocabularies it is easy to do this client-side with two queries.)

b) Are there plans to implement this in facet.pivot or in the facet.json API? The first step could be to define "interestingness" as a sort option for facets, and to define interestingness as the facet count in the result set as compared to the complete collection:

documentfrequency_termX(bucket) * inverse_documentfrequency_termX(collection)

As an extension, the JSON API could be used to change the domain used as the base for the comparison. Another interesting option would be to compare facet counts against a current parent facet for nested facets, e.g.
the 5 most interesting terms by genre for a query on 70s movies, returning the terms specific to horror, comedy, action etc. compared to all terminology at the time (i.e. in the parent query). A callback function could be used to define other measures of interestingness, such as the log-likelihood ratio (http://tdunning.blogspot.de/2008/03/surprise-and-coincidence.html). Most measures need at least the following 4 values: document frequency for a term in the result set, document frequency of the result set, document frequency for a term in the index (or base domain), and document frequency in the index (or base domain). I guess this feature might be of interest to those who want to do some small-scale term analysis in addition to search, e.g., as in my case, in digital humanities projects. But it might also be an interesting navigation device, e.g. when searching job offers, to show the skills that are most distinctive for a category. It would be great to know if others are interested in this feature. If there are any implementations out there, or if anybody else is working on this, a pointer would be a great start. In the absence of existing solutions: perhaps somebody has some idea on where and how to start implementing this?

Best regards,
Ben

--
Ben Heuwing, Dr. phil.
Wissenschaftlicher Mitarbeiter
Institut für Informationswissenschaft und Sprachtechnologie
Universität Hildesheim
Postanschrift: Universitätsplatz 1, D-31141 Hildesheim
Büro: Lübeckerstraße 3 Rau
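For concreteness, a minimal Java sketch (class and parameter names hypothetical) of the first-step measure proposed above, assuming the four counts have already been fetched, e.g. via two facet calls or the TermsComponent:

    public final class Interestingness {
      /**
       * First-step measure proposed above: document frequency of the term in the
       * bucket, weighted by the term's inverse document frequency in the collection.
       */
      static double score(long dfTermBucket, long dfTermCollection, long numDocsCollection) {
        double idf = Math.log((double) numDocsCollection / (1 + dfTermCollection));
        return dfTermBucket * idf;
      }
    }

The +1 guards against division by zero; an alternative measure such as the log-likelihood ratio would slot into the same four inputs.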
check If I am Still Leader
Hi, I am using Solr 4.10.0 with Tomcat and embedded Zookeeper. I use SolrCloud in my system. Each shard machine tries to reach/connect to the other cluster machines in order to index the document; it just checks if it is still the leader. I don't use replication, so why does it have to check who is the leader? How can I bypass this constraint and make my SolrCloud not use ClusterStateUpdater.checkIfIamStillLeader when I am indexing? Thanks, Adir.
RE: check If I am Still Leader
I did not mention before that the indexed documents are always routed to a specific machine. Is there a way to avoid connectivity from the node to all other nodes?

From: adi...@hotmail.com
To: solr-user@lucene.apache.org
Subject: check If I am Still Leader
Date: Thu, 16 Apr 2015 16:08:15 +0300

Hi, I am using Solr 4.10.0 with Tomcat and embedded Zookeeper. I use SolrCloud in my system. Each shard machine tries to reach/connect to the other cluster machines in order to index the document; it just checks if it is still the leader. I don't use replication, so why does it have to check who is the leader? How can I bypass this constraint and make my SolrCloud not use ClusterStateUpdater.checkIfIamStillLeader when I am indexing? Thanks, Adir.
newbie questions regarding solr cloud
Hello, I am playing with Solr 5 right now, to see if its cloud features can replace what we have with Solr 3.6, and I have some questions, some newbie and some not so newbie.

Background: the documents we are putting in Solr have a date field. The majority of our searches are restricted to documents created within the last week, but searches do go back 60 days. Documents older than 60 days are removed from the repo. We also want high availability in case a machine becomes unavailable.

Our current method, using Solr 3.6, is to split the data into 1-day chunks; within each day the data is split into several shards, and each shard has 2 replicas. Our code generates the list of cores to be queried based on the time range in the query. Cores that fall off the 60-day range are deleted through Solr's RESTful API. This all sounds a lot like what SolrCloud provides, so I started looking at SolrCloud's features.

My newbie questions:
- it looks like the way to write a document is to pick a node (possibly using a LB), send it to that node, and let Solr figure out which nodes that document is supposed to go to. Is this the recommended way?
- similarly, can I just randomly pick a core (using the demo example: http://localhost:7575/solr/#/gettingstarted_shard1_replica2/query ), query it, and let it scatter out the queries to the appropriate cores and send me the results back? Will it give me back results from all the shards?
- is there a recommended Python library?

My hopefully less newbie questions:
- does Solr auto-detect when nodes become unavailable, and stop sending queries to them?
- when the master node dies and the cluster elects a new master, what happens to writes?
- what happens when a node is unavailable?
- what is the procedure when a shard becomes too big for one machine and needs to be split?
- what is the procedure when we lose a machine and the node needs replacing?
- how would we quickly bulk delete data within a date range? (a hedged sketch follows below)
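On the last question, a sketch of a date-range bulk delete with SolrJ, under assumptions: the collection name "docs", the field name "created", and the ZooKeeper address are placeholders, and this uses a current-era CloudSolrClient builder (Solr 5-era constructors differ slightly):

    import java.util.Arrays;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    public class PurgeOldDocs {
      public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
            Arrays.asList("localhost:2181"), Optional.empty()).build()) {
          // Delete everything older than 60 days; SolrCloud fans the
          // delete-by-query out to every shard of the collection.
          client.deleteByQuery("docs", "created:[* TO NOW-60DAYS]");
          client.commit("docs");
        }
      }
    }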
Re: ReplicationHandler - SnapPull failed to download a file completely.
Shawn, thank you for your answer. For the purpose of testing it we have a test environment where we are not indexing anymore. We also disabled the DIH delta import, so as I understand there shouldn't be any commits on the master. I also tried with <str name="commitReserveDuration">50:50:50</str> and get the same failure. I tried changing and increasing various parameters on the master and slave, but no luck yet. The master is functioning OK; we do have search results, so I assume there is no index corruption on the master side. Just to mention, we have done this many times before in the past few years; this started just now when we upgraded our Solr from version 3.6 to version 4.3 and reindexed all documents. If we have no solution soon - and this is holding up an upgrade to our production site and various customers - do you think we can copy the index directory from the master to the slave and hope that future replication will work? Thank you again. Shalom

On Wed, Oct 30, 2013 at 10:00 PM, Shawn Heisey s...@elyograg.org wrote:

On 10/30/2013 1:49 PM, Shalom Ben-Zvi Kazaz wrote:
> we are continuously getting this exception during replication from master to slave. our index size is 9.27 G and we are trying to replicate a slave from scratch. Its a different file each time, sometimes we get to 60% replication before it fails and sometimes only 10%, we never managed a successful replication.
<snip>
> this is the master setup:
>
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">startup</str>
>     <str name="confFiles">stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml</str>
>     <str name="commitReserveDuration">00:00:50</str>
>   </lst>
> </requestHandler>

I assume that you're probably doing commits fairly often, resulting in a lot of merge activity that frequently deletes segments. That commitReserveDuration parameter needs to be made larger. I would imagine that it takes a lot more than 50 seconds to do the replication - even if you've got an extremely fast network, replicating 9.27 GB probably takes several minutes. From the wiki page on replication:

"If your commits are very frequent and network is particularly slow, you can tweak an extra attribute <str name="commitReserveDuration">00:00:10</str>. This is roughly the time taken to download 5MB from master to slave. Default is 10 secs."

http://wiki.apache.org/solr/SolrReplication#Master

You've said that your network is not slow, but with that much data, all networks are slow.

Thanks,
Shawn
Re: [SOLVED] ReplicationHandler - SnapPull failed to download a file completely.
[explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Removing from cache: CachedDir<<refCount=0;path=/opt/watchdox/solr-slave/data/index.20131031180837277;done=true>>
31 Oct 2013 18:10:40,878 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Releasing directory: /opt/watchdox/solr-slave/data/index 1 false
31 Oct 2013 18:10:40,879 [explicit-fetchindex-cmd] ERROR ReplicationHandler - SnapPull failed :org.apache.solr.common.SolrException: Unable to download _aa7_Lucene41_0.pos completely. Downloaded 0!=1081710
    at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1212)
    at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1092)
    at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:719)
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:397)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
    at org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:218)
31 Oct 2013 18:10:40,910 [http-bio-8080-exec-8] DEBUG CachingDirectoryFactory - Reusing cached directory: CachedDir<<refCount=2;path=/opt/watchdox/solr-slave/data/index;done=false>>

So I upgraded the httpcomponents jars to their latest 4.3.x version and the problem disappeared. The httpcomponents jars, which are dependencies of SolrJ, were at version 4.2.x; I upgraded to httpclient-4.3.1, httpcore-4.3 and httpmime-4.3.1. I ran the replication a few times now with no problem at all; it is now working as expected. It seems that the upgrade is necessary only on the slave side, but I'm going to upgrade the master too. Thank you so much for your help. Shalom

On Thu, Oct 31, 2013 at 6:46 PM, Shawn Heisey s...@elyograg.org wrote:

On 10/31/2013 7:26 AM, Shalom Ben-Zvi Kazaz wrote:
> Shawn, thank you for your answer. For the purpose of testing it we have a test environment where we are not indexing anymore. We also disabled the DIH delta import, so as I understand there shouldn't be any commits on the master. I also tried with <str name="commitReserveDuration">50:50:50</str> and get the same failure.

If it's in an environment where there are no commits, that's really odd. I would suspect underlying filesystem or network issues. There's one problem that's not well known, but is very common - problems with NIC firmware, most commonly Broadcom NICs. These problems result in things working correctly almost all the time, but when there is a high network load, things break in strange ways, and the resulting errors rarely look like they are network-related. Most embedded NICs are either Broadcom or Realtek, both of which are famous for their firmware problems. Broadcom NICs are very common on Dell and HP servers. Upgrading the firmware (which is not usually the same thing as upgrading the driver) is the only fix. NICs from other manufacturers also have upgradable firmware, but don't usually have the same kind of high-profile problems as Broadcom. The NIC firmware might not have anything to do with this problem, but it's the only thing left that I can think of. I personally haven't used replication since Solr 1.4.1, but a lot of people do. I can't say that there are no bugs, but so far I'm not seeing the kind of problem reports that appear when a bug in a critical piece of the software exists.

Thanks,
Shawn
ReplicationHandler - SnapPull failed to download a file completely.
we are continuously getting this exception during replication from master to slave. Our index size is 9.27 G and we are trying to replicate a slave from scratch. It's a different file each time; sometimes we get to 60% replication before it fails and sometimes only 10%. We have never managed a successful replication.

30 Oct 2013 18:38:52,884 [explicit-fetchindex-cmd] ERROR ReplicationHandler - SnapPull failed :org.apache.solr.common.SolrException: Unable to download _aa7_Lucene41_0.tim completely. Downloaded 0!=1054090
    at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1244)
    at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1124)
    at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:719)
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:397)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
    at org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:218)

I read in some thread that there was a related bug in Solr 4.1, but we are using Solr 4.3 and tried with 4.5.1 also. It seems that DirectoryFileFetcher cannot download a file sometimes; the file is downloaded to the slave with size zero. We are running in a test environment where bandwidth is high.

this is the master setup:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml</str>
    <str name="commitReserveDuration">00:00:50</str>
  </lst>
</requestHandler>

and the slave setup:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://solr-master.saltdev.sealdoc.com:8081/solr-master</str>
    <str name="httpConnTimeout">15</str>
    <str name="httpReadTimeout">30</str>
  </lst>
</requestHandler>
edismax behaviour with japanese
Hello, I have text and text_ja fields, where text uses the English analyzer and text_ja the Japanese one; I index both with copyField from other fields. I'm trying to search both fields using edismax and the qf parameter, but I see strange behaviour from edismax. I wonder if someone can give me a hint as to what's going on and what I am doing wrong?

When I run this query I can see that Solr is searching both fields, but the text_ja: query contains only partial text while text: contains the complete text.

http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&defType=edismax&qf=text+text_ja&q=このたびは

<lst name="debug">
  <str name="rawquerystring">このたびは</str>
  <str name="querystring">このたびは</str>
  <str name="parsedquery">(+DisjunctionMaxQuery((text_ja:たび | text:このたびは)))/no_coord</str>
  <str name="parsedquery_toString">+(text_ja:たび | text:このたびは)</str>
  <str name="QParser">ExtendedDismaxQParser</str>
</lst>

Now, if I remove the last two characters from the query string, Solr will not search text_ja at all, at least that's what I understand from the debug output:

http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&defType=edismax&qf=text+text_ja&q=このた

<lst name="debug">
  <str name="rawquerystring">このた</str>
  <str name="querystring">このた</str>
  <str name="parsedquery">(+DisjunctionMaxQuery((text:このた)))/no_coord</str>
  <str name="parsedquery_toString">+(text:このた)</str>
  <str name="QParser">ExtendedDismaxQParser</str>
</lst>

With another string of Japanese text, Solr now splits the query into multiple text_ja queries:

http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&defType=edismax&qf=text+text_ja&q=システムをお買い求めいただき

<lst name="debug">
  <str name="rawquerystring">システムをお買い求めいただき</str>
  <str name="querystring">システムをお買い求めいただき</str>
  <str name="parsedquery">(+DisjunctionMaxQuery((((text_ja:システム text_ja:買い求める text_ja:いただく)~3) | text:システムをお買い求めいただき)))/no_coord</str>
  <str name="parsedquery_toString">+(((text_ja:システム text_ja:買い求める text_ja:いただく)~3) | text:システムをお買い求めいただき)</str>
  <str name="QParser">ExtendedDismaxQParser</str>
</lst>

Thank you.
searching both english and japanese
Hi, we have a customer that needs support for both English and Japanese; a document can be in either of the two, and we have no indication of the language of a document. I know I can construct a schema with both English and Japanese fields and index them with copyField. I also know I can detect the language and index only the relevant fields, but I want to support mixed-language documents, so I think I need to index into both the English and Japanese fields. We are using the standard request handler, not dismax, and we want to keep using it, as our queries should be on certain fields with no errors. Queries are user-entered and can be any valid query, like q=lexmark or q=docname:lexmark AND content:printer. Now what I think I want is to add the Japanese fields to this query and end up with q=docname:lexmark OR docname_ja:lexmark, or q=(docname:lexmark AND content:printer) OR (docname_ja:lexmark AND content_ja:printer). Of course I cannot ask the user to do that. And also we have only one default field, and it must be Japanese or English but not both. I think the default-field problem could be solved by using dismax and specifying multiple default fields with qf, but we don't use dismax. We use SolrJ as our client, and it would be better if I could do something on the client side and not on the Solr side. Any help/idea is appreciated.
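A hedged client-side sketch of the expansion described above (the field mapping and class names are hypothetical, and a real version would need proper query parsing rather than this simple clause-by-clause wrapping):

    import java.util.Map;

    public class BilingualQuery {
      // Hypothetical mapping from English-analyzed fields to their Japanese copies.
      private static final Map<String, String> JA_FIELD = Map.of(
          "docname", "docname_ja",
          "content", "content_ja");

      /** Wraps a single field:value clause so it matches either language's field. */
      static String expand(String field, String value) {
        String ja = JA_FIELD.get(field);
        return ja == null ? field + ":" + value
            : "(" + field + ":" + value + " OR " + ja + ":" + value + ")";
      }

      public static void main(String[] args) {
        // q=docname:lexmark AND content:printer becomes:
        System.out.println(expand("docname", "lexmark") + " AND " + expand("content", "printer"));
        // -> (docname:lexmark OR docname_ja:lexmark) AND (content:printer OR content_ja:printer)
      }
    }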
filter result by numFound in Result Grouping
Hello list, in one of our searches that uses Result Grouping we need to filter the results to only groups that have more than one document in the group - or, more specifically, to groups that have two documents. Is this possible in some way? Thank you
RE: How to deal with cache for facet search when index is always increment?
Hi, you can give soft commit a try. More details are available here: http://wiki.apache.org/solr/NearRealtimeSearch

-Original Message-
From: 李威 [mailto:li...@antvision.cn]
Sent: Thursday, 2 May 2013 12:02 PM
To: solr-user
Cc: 李景泽; 罗佳
Subject: How to deal with cache for facet search when index is always increment?

Hi folks, for facet search, Solr creates a cache that is based on the whole set of docs. If I import a new doc into the index, the cache becomes invalid and needs to be created again. For real-time search, docs may be imported into the index at any time. In this case, the cache nearly always needs to be rebuilt, which makes facet search very slow. Do you have any idea how to deal with this problem? Thanks, Wei Li
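For reference, the near-real-time setup the reply points to typically pairs a frequent soft commit (cheap, opens a new searcher) with a less frequent hard commit in solrconfig.xml; the intervals below are illustrative only:

    <!-- in solrconfig.xml, inside <updateHandler> -->
    <autoSoftCommit>
      <maxTime>1000</maxTime>            <!-- make new docs visible about once a second -->
    </autoSoftCommit>
    <autoCommit>
      <maxTime>60000</maxTime>           <!-- flush to stable storage once a minute -->
      <openSearcher>false</openSearcher> <!-- don't pay the searcher-rebuild cost here -->
    </autoCommit>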
RE: Sorting on Score Problem
Hi Hoss, thanks for the reply. Unfortunately we have other customized similarity classes, and I don't know how to disable them and still make the query work. I will try to attach more information once I work out how to simplify the issue. Thanks, Ben

From: Chris Hostetter [hossman_luc...@fucit.org]
Sent: Thursday, January 24, 2013 12:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Sorting on Score Problem

: We met a weird problem in our project when sorting by score in Solr 4.0,
: the biggest score document is not at the top

the debug explanation from solr are like this, that's weird ... can you post the full debugQuery output of an example query showing the problem, using echoParams=all and fl=id,score (or whatever unique key field you have)?

also: can you elaborate on whether you are using a single-node setup or a distributed (i.e. SolrCloud) query?

: Then we thought it could be a float rounding problem then we implement
: our own similarity class to increase queryNorm by 10,000 and it changes
: the score scale but the rank is still wrong.

when you post the details requested above, please don't use your custom similarity (just the out-of-the-box Solr code) so there's one less variable in the equation. -Hoss
Sorting on Score Problem
Hi, we met a weird problem in our project when sorting by score in Solr 4.0: the document with the biggest score is not at the top. The debug explanations from Solr look like this:

First document:
1.8412635 = (MATCH) sum of:
  2675.7964 = (MATCH) sum of:
    0.0 = (MATCH) sum of:
      0.0 = (MATCH) max of:
        0.0 = (MATCH) btq, product of:
          0.0 = weight(nameComplexNoTfNoIdf:plumber^0.0 in 0) [], result of:
            0.0 = score(doc=0,freq=1.0 = phraseFreq=1.0
..

Second document:
1.8412637 = (MATCH) sum of:
  0.26757964 = (MATCH) sum of:
    0.0 = (MATCH) sum of:
      0.0 = (MATCH) max of:
        0.0 = (MATCH) btq, product of:
          0.0 = weight(nameComplexNoTfNoIdf:plumber^0.0 in 0) [], result of:
            0.0 = score(doc=0,freq=1.0 = phraseFreq=1.0
.

Third document:
1.841253 = (MATCH) sum of:
  2675.7964 = (MATCH) sum of:
    0.0 = (MATCH) sum of:
      0.0 = (MATCH) max of:
        0.0 = (MATCH) btq, product of:
          0.0 = weight(nameComplexNoTfNoIdf:plumber^0.0 in 0) [], result of:
            0.0 = score(doc=0,freq=1.0 = phraseFreq=1.0
...

Then we thought it could be a float rounding problem, so we implemented our own similarity class to increase queryNorm by 10,000; it changes the score scale but the rank is still wrong. Does anyone have a similar issue? I can debug with the Solr source code - please shed some light on the sorting part. Thanks
RE: sort by function error
Hi Yonik, I will give the latest 4.0 release a try. Thanks anyway. Cheers, Ben

From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley [yo...@lucidworks.com]
Sent: Tuesday, November 13, 2012 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: sort by function error

I can't reproduce this with the example data. Here's an example of what I tried:

http://localhost:8983/solr/query?q=*:*&sort=geodist(store,-32.123323,108.123323)+asc&group.field=inStock&group=true

Perhaps this is an issue that's since been fixed.
-Yonik
http://lucidworks.com

On Mon, Nov 12, 2012 at 11:19 PM, Kuai, Ben ben.k...@sensis.com.au wrote:

Hi Yonik, thanks for the reply. My sample query:

q=cafe&sort=geodist(geoLocation,-32.123323,108.123323)+asc&group.field=familyId

<field name="geoLocation" type="latLon" indexed="true" stored="false" />
<field name="familyId" type="string" indexed="true" stored="false" />

As long as I remove the group field the query works. BTW, I just found out that the version of Solr we are using is an old copy of a 4.0 snapshot from before the alpha release. Could that be the problem? We have some customized parsers, so it will take quite some time to upgrade. Ben

From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley [yo...@lucidworks.com]
Sent: Tuesday, November 13, 2012 6:46 AM
To: solr-user@lucene.apache.org
Subject: Re: sort by function error

On Mon, Nov 12, 2012 at 5:24 AM, Kuai, Ben ben.k...@sensis.com.au wrote:
> more information, problem only happens when I have both sort by function and grouping in the query.

I haven't been able to duplicate this with a few ad-hoc queries. Could you give your complete request (or at least all of the relevant grouping and sorting parameters), as well as the field type you are grouping on?
-Yonik
http://lucidworks.com
RE: sort by function error
Hi Yonik, thanks for the reply. My sample query:

q=cafe&sort=geodist(geoLocation,-32.123323,108.123323)+asc&group.field=familyId

<field name="geoLocation" type="latLon" indexed="true" stored="false" />
<field name="familyId" type="string" indexed="true" stored="false" />

As long as I remove the group field the query works. BTW, I just found out that the version of Solr we are using is an old copy of a 4.0 snapshot from before the alpha release. Could that be the problem? We have some customized parsers, so it will take quite some time to upgrade. Ben

From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley [yo...@lucidworks.com]
Sent: Tuesday, November 13, 2012 6:46 AM
To: solr-user@lucene.apache.org
Subject: Re: sort by function error

On Mon, Nov 12, 2012 at 5:24 AM, Kuai, Ben ben.k...@sensis.com.au wrote:
> more information, problem only happens when I have both sort by function and grouping in the query.

I haven't been able to duplicate this with a few ad-hoc queries. Could you give your complete request (or at least all of the relevant grouping and sorting parameters), as well as the field type you are grouping on?
-Yonik
http://lucidworks.com
RE: sort by function error
more information: the problem only happens when I have both sort-by-function and grouping in the query.

From: Kuai, Ben [ben.k...@sensis.com.au]
Sent: Monday, November 12, 2012 2:12 PM
To: solr-user@lucene.apache.org
Subject: sort by function error

Hi, I am trying to use sort by function, something like sort=sum(field1, field2) asc, but it is not working and I get the error "SortField needs to be rewritten through Sort.rewrite(..) and SortField.rewrite(..)". Please shed some light on this. Thanks, Ben

Full exception stack trace:

SEVERE: java.lang.IllegalStateException: SortField needs to be rewritten through Sort.rewrite(..) and SortField.rewrite(..)
    at org.apache.lucene.search.SortField.getComparator(SortField.java:484)
    at org.apache.lucene.search.grouping.AbstractFirstPassGroupingCollector.<init>(AbstractFirstPassGroupingCollector.java:82)
    at org.apache.lucene.search.grouping.TermFirstPassGroupingCollector.<init>(TermFirstPassGroupingCollector.java:58)
    at org.apache.solr.search.Grouping$TermFirstPassGroupingCollectorJava6.<init>(Grouping.java:1009)
    at org.apache.solr.search.Grouping$CommandField.createFirstPassCollector(Grouping.java:632)
    at org.apache.solr.search.Grouping.execute(Grouping.java:301)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:373)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:201)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:585)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
RE: Help! Confused about using Jquery for the Search query - Want to ditch it
I'm new to Solr... but this is more of a web programming question... so I can get in on this :). Your only option to get the data from Solr sans JavaScript is to use Python to pull the results BEFORE the client loads the page. So, if you are asking whether you can get AJAX-like results (an already-loaded page pulling info from your Solr server) but without using JavaScript... no, you cannot do that. You might be able to hack something ugly together using iframes, but trust me, you don't want to. It will look bad, it won't work well, and interacting with data in an iframe is nightmarish. So, basically, if you don't want to use JavaScript, your only option is a total page reload every time you need to query Solr (which you then query on the Python side).

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 11:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Help! Confused about using Jquery for the Search query - Want to ditch it

Thank you for the reply, but I'm afraid I don't understand :( This is how things are set up. On my Python website, I have a keyword and a location box. When the form is submitted, it queries the server via a JavaScript GET request, and the server then sends back the data via JSON. I'm saying that I don't want to be reliant on JavaScript. So I'm confused about the best way to not only send the request to the Solr server, but also to receive the data. My guess is that a GET request without JavaScript is the right way to send the request to the Solr server, but then what should Solr be spitting out the other end, just an XML file? Then is the idea that my Python site would receive this XML data and display it on the site?
RE: Help! Confused about using Jquery for the Search query - Want to ditch it
Yes (or, at least, I think I understand what you are saying, haha). Let me clarify:

1. Client sends GET request to web server
2. Web server (via Python, in your case, if I remember correctly) queries Solr server
3. Solr server sends response to web server
4. You take that data and put it into the page you are creating server-side
5. Server returns static page to client

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 12:53 PM
To: solr-user@lucene.apache.org
Subject: RE: Help! Confused about using Jquery for the Search query - Want to ditch it

Hi Ben, thank you for the reply. So, if I don't want to use JavaScript and I want the entire page to reload each time, is it being done like this?

1. User submits form via GET
2. Solr server queried via GET
3. Solr server completes query
4. Solr server returns XML output
5. XML data put into results page
6. User shown new results page

Is this basically how it would work if we wanted JavaScript out of the equation?

Regards,
James
RE: Help! Confused about using Jquery for the Search query - Want to ditch it
As far as I know, it is the only way to do this. Look around a bit; Python (or PHP, or C, etc., etc.) is able to act as an HTTP client... in fact, that is the most common way that web services are consumed. But we are definitely beyond the scope of the Solr list at this point.

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 2:09 PM
To: solr-user@lucene.apache.org
Subject: RE: Help! Confused about using Jquery for the Search query - Want to ditch it

Thank you, that helps. The bit I am still confused about is how the Solr server sends the response to the web server, though. I get the impression that there are different ways this could be done, but is sending an XML response back to the Python server the best way to do it?
RE: Help! Confused about using Jquery for the Search query - Want to ditch it
But, check out things like httplib2 and urllib2.

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 2:09 PM
To: solr-user@lucene.apache.org
Subject: RE: Help! Confused about using Jquery for the Search query - Want to ditch it
Faceting and Variable Buckets
Hello, just wondering if the following is possible: we need to produce facets on ranges, but they do not follow a steady increment, which is all I can see Solr can produce. I'm looking for a way to produce facets on a price field:

0-1000
1000-5000
5000-10000
10000-20000

Any suggestions without waiting for https://issues.apache.org/jira/browse/SOLR-2366?

Thanks
Ben
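A possible workaround until SOLR-2366 lands is one facet.query per bucket, since facet queries accept arbitrary ranges. This is a hedged sketch: the field name "price" and the bucket bounds follow the message above.

    facet=true
    facet.query=price:[0 TO 1000]
    facet.query=price:[1000 TO 5000]
    facet.query=price:[5000 TO 10000]
    facet.query=price:[10000 TO 20000]

Each facet.query comes back as its own count in the facet_queries section of the response. Note the inclusive bounds mean a document priced exactly on a boundary is counted in two buckets; adjust the bounds if that matters.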
RE: need help to integrate SolrJ with my web application.
Hello, when I have seen this it usually means the Solr you are trying to connect to is not available. Do you have it installed at http://localhost:8080/solr ? Try opening that address in your browser. If you're running the example Solr using the embedded Jetty, you won't be on 8080 :D Hope that helps

-Original Message-
From: Vijaya Kumar Tadavarthy [mailto:vijaya.tadavar...@ness.com]
Sent: 16 April 2012 12:15
To: 'solr-user@lucene.apache.org'
Subject: need help to integrate SolrJ with my web application.

Hi All, I am trying to integrate Solr with my Spring application. I have performed the following steps:

1) Added the below list of jars to my webapp lib folder:
apache-solr-cell-3.5.0.jar
apache-solr-core-3.5.0.jar
apache-solr-solrj-3.5.0.jar
commons-codec-1.5.jar
commons-httpclient-3.1.jar
lucene-analyzers-3.5.0.jar
lucene-core-3.5.0.jar

2) I have added Tika jar files for processing binary files:
tika-core-0.10.jar
tika-parsers-0.10.jar
pdfbox-1.6.0.jar
poi-3.8-beta4.jar
poi-ooxml-3.8-beta4.jar
poi-ooxml-schemas-3.8-beta4.jar
poi-scratchpad-3.8-beta4.jar

3) I have modified web.xml, adding the setup below:

<filter>
  <filter-name>SolrRequestFilter</filter-name>
  <filter-class>org.apache.solr.servlet.SolrDispatchFilter</filter-class>
</filter>
<filter-mapping>
  <filter-name>SolrRequestFilter</filter-name>
  <url-pattern>/dataimport</url-pattern>
</filter-mapping>
<servlet>
  <servlet-name>SolrServer</servlet-name>
  <servlet-class>org.apache.solr.servlet.SolrServlet</servlet-class>
  <load-on-startup>1</load-on-startup>
</servlet>
<servlet>
  <servlet-name>SolrUpdate</servlet-name>
  <servlet-class>org.apache.solr.servlet.SolrUpdateServlet</servlet-class>
  <load-on-startup>2</load-on-startup>
</servlet>
<servlet>
  <servlet-name>Logging</servlet-name>
  <servlet-class>org.apache.solr.servlet.LogLevelSelection</servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>SolrUpdate</servlet-name>
  <url-pattern>/update/*</url-pattern>
</servlet-mapping>
<servlet-mapping>
  <servlet-name>Logging</servlet-name>
  <url-pattern>/admin/logging</url-pattern>
</servlet-mapping>

I am trying to test this setup by running a simple Java program which extracts the content of an MS Excel file, as below:

public SolrServer createNewSolrServer() {
    try {
        // setup the server...
        String url = "http://localhost:8080/solr";
        CommonsHttpSolrServer s = new CommonsHttpSolrServer(url);
        s.setConnectionTimeout(100); // 1/10th sec
        s.setDefaultMaxConnectionsPerHost(100);
        s.setMaxTotalConnections(100);
        // where the magic happens
        s.setParser(new BinaryResponseParser());
        s.setRequestWriter(new BinaryRequestWriter());
        return s;
    } catch (Exception ex) {
        throw new RuntimeException(ex);
    }
}

public static void main(String[] args) throws IOException, SolrServerException {
    IndexFilesSolrCell infil = new IndexFilesSolrCell();
    System.setProperty("solr.solr.home", "/WebApp/PCS-DMI/WebContent/resources/solr");
    SolrServer serverone = infil.createNewSolrServer();
    ContentStreamUpdateRequest reqext = new ContentStreamUpdateRequest("/update/extract");
    reqext.addFile(new File("Open Search Approach.xlsx"));
    reqext.setParam(ExtractingParams.EXTRACT_ONLY, "true");
    System.out.println("Content Stream Data path: " + serverone.toString());
    NamedList<Object> result = serverone.request(reqext);
    System.out.println("Result: " + result);
}

I am getting the below exception:

Exception in thread "main" org.apache.solr.common.SolrException: Not Found

Not Found

request: http://localhost:8080/solr/update/extract?extractOnly=true&wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:432)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:246)

Please direct me how to extract the content... I have tried the extraction example in the Solr distribution with an MS Excel file. The file extraction was successful and I could check the metadata using the admin of the example app.

Thanks,
Vijaya Kumar T
PACIFIC COAST STEEL (Pinnacle) Project
Ness Technologies India Pvt. Ltd
1st & 2nd Floor, 2A Maximus Building, Raheja Mindspace IT Park, Madhapur, Hyderabad, 500081, India. | Tel: +91 40 41962079 | Mobile: +91 9963001551
vijaya.tadavar...@ness.com | www.ness.com
RE: Solr data export to CSV File
A combination of the CSV response writer and SolrJ to page through all of the results, sending each line to something like Apache Commons FileUtils:

FileUtils.writeStringToFile(new File("output.csv"), outputLine + System.getProperty("line.separator"), true);

would be quite quick to knock up in Java.

Thanks
Ben

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 13 April 2012 13:28
To: solr-user@lucene.apache.org
Subject: Re: Solr data export to CSV File

Does this help? http://wiki.apache.org/solr/CSVResponseWriter

Best
Erick

On Fri, Apr 13, 2012 at 7:59 AM, Pavnesh pavnesh.ku...@altruistindia.com wrote:
> Hi Team,
> A very big thank you to the people who have developed such a nice product. I have one query regarding Solr: I have approx. 36 million documents in my Solr and I want to export all the data to a CSV file, but I have found nothing on this, so please help me on this topic.
> Regards
> Pavnesh
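A hedged sketch of that paging approach with a current SolrJ API (the core URL and exported field are placeholders; for 36 million documents, cursor-based paging would be kinder to the server than plain start/rows):

    import java.io.File;
    import org.apache.commons.io.FileUtils;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    public class CsvExport {
      public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
          File out = new File("output.csv");
          String sep = System.getProperty("line.separator");
          final int rows = 1000;
          for (int start = 0; ; start += rows) {
            SolrQuery q = new SolrQuery("*:*").setStart(start).setRows(rows);
            SolrDocumentList docs = client.query(q).getResults();
            if (docs.isEmpty()) break;
            for (SolrDocument d : docs) {
              // One line per document; a real export would escape commas and quotes.
              FileUtils.writeStringToFile(out, d.getFieldValue("id") + sep, true);
            }
          }
        }
      }
    }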
Errors during indexing
Hello, we have just switched to Solr 4 as we needed the ability to return geodist() along with our results. I use a simple multithreaded Java app and SolrJ to ingest the data. We keep seeing the following:

13-Apr-2012 15:50:10 org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Error handling 'status' action
    at org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:546)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:156)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:359)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:175)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /usr/solr4/data/index/_2jb.fnm (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:219)
    at org.apache.lucene.codecs.lucene40.Lucene40FieldInfosReader.read(Lucene40FieldInfosReader.java:47)
    at org.apache.lucene.index.SegmentInfo.loadFieldInfos(SegmentInfo.java:201)
    at org.apache.lucene.index.SegmentInfo.getFieldInfos(SegmentInfo.java:227)
    at org.apache.lucene.index.SegmentInfo.files(SegmentInfo.java:415)
    at org.apache.lucene.index.SegmentInfos.files(SegmentInfos.java:756)
    at org.apache.lucene.index.StandardDirectoryReader$ReaderCommit.<init>(StandardDirectoryReader.java:369)
    at org.apache.lucene.index.StandardDirectoryReader.getIndexCommit(StandardDirectoryReader.java:354)
    at org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:558)
    at org.apache.solr.handler.admin.CoreAdminHandler.getCoreStatus(CoreAdminHandler.java:816)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:537)
    ... 16 more

This seems to happen when we're using the new admin tool. I'm checking on the autocommit handler. Has anyone seen anything similar?

Thanks
Ben
RE: Simple Slave Replication Question
Hello, Had to leave the office so didn't get a chance to reply. Nothing in the logs. Just ran one through from the ingest tool. Same result: a full copy of the index. Is it something to do with: server.commit(); server.optimize(); I call this at the end of the ingestion. Would optimize then work across the whole index? Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 15:10 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Also, what happens if, instead of adding the 40K docs, you add just one and commit? 2012/3/23 Tomás Fernández Löbbe tomasflo...@gmail.com Have you changed the mergeFactor or are you using 10 as in the example solrconfig? What do you see in the slave's log during replication? Do you see any line like Skipping download for...? On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: I just have an index directory. I push the documents through with a change to a field. I'm using SolrJ to do this. I'm using the guide from the wiki to set up the replication. When the feed of updates to the master finishes I call a commit, again using SolrJ. I then have a poll period of 5 minutes on the slave. When it kicks in I see a new version of the index and then it copies the full 5GB index. Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 14:29 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happens with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, is the index directory on the slaves called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simply address this with big NICs and network pipes? -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, I'm looking at the replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5GB in size? If this is standard, what do people do who have massive 200GB indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
RE: Simple Slave Replication Question
That's great information. Thanks for all the help and guidance, it's been invaluable. Thanks Ben -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 26 March 2012 12:21 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question It's the optimize step. Optimize essentially forces all the segments to be copied into a single new segment, which means that your entire index will be replicated to the slaves. In recent Solrs there's usually no need to optimize, so unless and until you can demonstrate a noticeable change, I'd just leave the optimize step off. In fact, trunk renames it to forceMerge or something, just because it's so common for people to think of course I want to optimize my index! and get the unintended consequences you're seeing, even though the optimize doesn't actually do that much good in most cases. Some people just do the optimize once a day (or week or whatever) during off-peak hours as a compromise. Best Erick On Mon, Mar 26, 2012 at 5:02 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, Had to leave the office so didn't get a chance to reply. Nothing in the logs. Just ran one through from the ingest tool. Same result: a full copy of the index. Is it something to do with: server.commit(); server.optimize(); I call this at the end of the ingestion. Would optimize then work across the whole index? Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 15:10 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Also, what happens if, instead of adding the 40K docs, you add just one and commit? 2012/3/23 Tomás Fernández Löbbe tomasflo...@gmail.com Have you changed the mergeFactor or are you using 10 as in the example solrconfig? What do you see in the slave's log during replication? Do you see any line like Skipping download for...? On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: I just have an index directory. I push the documents through with a change to a field. I'm using SolrJ to do this. I'm using the guide from the wiki to set up the replication. When the feed of updates to the master finishes I call a commit, again using SolrJ. I then have a poll period of 5 minutes on the slave. When it kicks in I see a new version of the index and then it copies the full 5GB index. Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 14:29 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happens with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, is the index directory on the slaves called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simply address this with big NICs and network pipes? -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master.
/Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, I'm looking at the replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5GB in size? If this is standard, what do people do who have massive 200GB indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
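To make Erick's point above concrete, here is a minimal SolrJ sketch of the pattern he recommends: commit at the end of the ingest and skip the optimize that forces full-index replication. The server URL is illustrative, and the off-peak optimize is only one way to implement the compromise he mentions.

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class IngestFinish {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer master =
            new CommonsHttpSolrServer("http://master:8080/solr"); // illustrative URL

        // ... feed documents here ...

        // Commit so slaves pick up only the new segments on their next poll.
        master.commit();

        // Deliberately NOT calling master.optimize(): optimize rewrites every
        // segment into one, so the slaves would re-download the entire index.
        // If merged segments are ever wanted, run optimize from a scheduled
        // job during off-peak hours instead.
    }
}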
Simple Slave Replication Question
Hello, I'm looking at the replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5GB in size? If this is standard, what do people do who have massive 200GB indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
RE: Simple Slave Replication Question
So do you just simply address this with big NICs and network pipes? -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, I'm looking at the replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5GB in size? If this is standard, what do people do who have massive 200GB indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
RE: Simple Slave Replication Question
I just have an index directory. I push the documents through with a change to a field. I'm using SolrJ to do this. I'm using the guide from the wiki to set up the replication. When the feed of updates to the master finishes I call a commit, again using SolrJ. I then have a poll period of 5 minutes on the slave. When it kicks in I see a new version of the index and then it copies the full 5GB index. Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 14:29 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happens with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, is the index directory on the slaves called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simply address this with big NICs and network pipes? -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, I'm looking at the replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5GB in size? If this is standard, what do people do who have massive 200GB indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
Data Import Handler Delta Import and Debug Mode Help
Good afternoon, I'm looking at deltas via the DataImportHandler. I was running Solr 1.4.1 but just upgraded to 3.5. Previously I was able to run debug and verbose from: http://localhost:8080/solr/admin/dataimport.jsp?handler=/advert But since upgrading, when choosing these options the right panel does not populate with anything. Am I missing something from the upgrade? I copied all relevant jars to my classpath. This is proving a problem, as I'm trying to debug why my delta import is not picking up any records:

<entity name="Stock" pk="ID"
    query="select * from stock_item s join advert_detail a on a.stock_item_id=s.id where a.Destination='ConsumerWebsite'"
    deltaImportQuery="select * from stock_item s join advert_detail a on a.stock_item_id=s.id where a.Destination='foo' and s.id='${dataimporter.delta.ID}'"
    deltaQuery="select s.ID from stock_item s where s.last_updated > to_date('${dataimporter.last_index_time}','YYYY-MM-DD hh24:mm:ss')"
    dataSource="pos_ds"/>

The entity does have two nested entities within it. When I run the delta query on the DB I get back the expected 100 stock IDs. Any help would be appreciated. Thanks Ben
RE: changing the root directory where solrCloud stores info inside zookeeper File system
Thanks a lot, Mark. Since my SolrCloud code was old, I tried downloading and building the newest code from https://svn.apache.org/repos/asf/lucene/dev/trunk/ I am using Tomcat 6. I manually created the sc sub-directory in my ZooKeeper ensemble file system and used this connection string to my ZK ensemble: zook1:2181/sc,zook2:2181/sc,zook3:2181/sc but I still get the same problem. Here is the entire catalina.out log with the exception:

Using CATALINA_BASE: /opt/tomcat6
Using CATALINA_HOME: /opt/tomcat6
Using CATALINA_TMPDIR: /opt/tomcat6/temp
Using JRE_HOME: /usr/java/default/
Using CLASSPATH: /opt/tomcat6/bin/bootstrap.jar
Java HotSpot(TM) 64-Bit Server VM warning: Failed to reserve shared memory (errno = 12).
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /usr/java/jdk1.6.0_21/jre/lib/amd64/server:/usr/java/jdk1.6.0_21/jre/lib/amd64:/usr/java/jdk1.6.0_21/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Aug 2, 2011 4:28:46 AM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8983
Aug 2, 2011 4:28:46 AM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
Aug 2, 2011 4:28:46 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 448 ms
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.29
Aug 2, 2011 4:28:46 AM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor solr1.xml
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/home/tomcat/solrCloud1/'
Aug 2, 2011 4:28:46 AM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1
Aug 2, 2011 4:28:46 AM org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: /home/tomcat/solrCloud1/solr.xml
Aug 2, 2011 4:28:46 AM org.apache.solr.core.CoreContainer init
INFO: New CoreContainer 853527367
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/home/tomcat/solrCloud1/'
Aug 2, 2011 4:28:46 AM org.apache.solr.cloud.SolrZkServerProps getProperties
INFO: Reading configuration from: /home/tomcat/solrCloud1/zoo.cfg
Aug 2, 2011 4:28:46 AM org.apache.solr.core.CoreContainer initZooKeeper
INFO: Zookeeper client=zook1:2181/sc,zook2:2181/sc,zook3:2181/sc
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:host.name=ob1079.nydc1.outbrain.com
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.version=1.6.0_21
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.vendor=Sun Microsystems Inc.
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.home=/usr/java/jdk1.6.0_21/jre
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.class.path=/opt/tomcat6/bin/bootstrap.jar
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.library.path=/usr/java/jdk1.6.0_21/jre/lib/amd64/server:/usr/java/jdk1.6.0_21/jre/lib/amd64:/usr/java/jdk1.6.0_21/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.io.tmpdir=/opt/tomcat6/temp
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.compiler=NA
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.name=Linux
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.arch=amd64
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.version=2.6.18-194.8.1.el5
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:user.name=tomcat
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:user.home=/home/tomcat
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client
changing the root directory where solrCloud stores info inside zookeeper File system
Hi! I am using SolrCloud with a ZooKeeper ensemble of 3. I noticed that SolrCloud stores information directly under the root dir in the ZooKeeper file system: /config /live_nodes /collections In my setup ZooKeeper is also used by other modules, so I would like SolrCloud to store everything under /solrCloud/ or something similar. Is there a property for that, or do I need to custom-code it? Thanks
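ZooKeeper itself supports this via a chroot suffix on the connect string, which is what the RE: message above is experimenting with; whether the SolrCloud code of this vintage honors it is exactly what that thread is testing. A minimal sketch with the plain ZooKeeper Java client follows; the /solrcloud path and timeout are illustrative, and note the chroot is written once after the last host:port pair, not after every server.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ChrootSetup {
    public static void main(String[] args) throws Exception {
        // Create the chroot node first, connecting without the suffix.
        ZooKeeper root = new ZooKeeper("zook1:2181,zook2:2181,zook3:2181", 10000, null);
        if (root.exists("/solrcloud", false) == null) {
            root.create("/solrcloud", new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        root.close();

        // Clients connecting with the /solrcloud suffix see that node as
        // their root, so everything they write lands underneath it.
        ZooKeeper chrooted =
            new ZooKeeper("zook1:2181,zook2:2181,zook3:2181/solrcloud", 10000, null);
        chrooted.close();
    }
}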
Solr Request Logging
I am using the trunk version of Solr and I am getting a ton more logging information than I really care to see, and definitely more than 1.4, but I can't really see a way to change it. A little background: I am faceting on fields that have a very high number of distinct values, and also returning large numbers of documents in a sharded environment. For example: INFO: [core1] webapp=/solr path=/select params={facet=true&attr_lng_rng_low.revenue__terms=a lot of distinct values&moreParams...} Another example: INFO: [core1] webapp=/solr path=/select params={facet=false&facet.mincount=1&ids=a lot of document ids&moreParams...} In just a few minutes, I have racked up 10MB of log in my dev environment. Any ideas for a sane way of handling these messages? I imagine it's slowing down Solr as well. Thanks -Ben
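One knob worth trying, sketched here as an assumption about where the noise comes from rather than a confirmed fix: the per-request INFO lines above are emitted through Solr's core logger, so raising that logger's level via java.util.logging should silence them without touching the rest.

import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietSolrLogs {
    public static void configure() {
        // Raise the level on the logger that emits the per-request INFO lines.
        Logger.getLogger("org.apache.solr.core").setLevel(Level.WARNING);
    }
}

The same effect is available declaratively in a JUL logging.properties file with a line like org.apache.solr.core.level = WARNING.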
Localized alphabetical order
As someone who's new to Solr/Lucene, I'm having trouble finding information on sorting results in localized alphabetical order. I've ineffectively searched the wiki and the mail archives. I'm thinking for example about Hawai'ian, where mīka (with an i-macron) comes after mika (i without the macron) but before miki (also without the macron), or about Welsh, where the digraphs (ch, dd, etc.) are treated as single letters, or about Ojibwe, where the apostrophe ' is a letter which sorts between h and i. How do non-English languages typically handle this? -Ben
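A JVM-level illustration of what locale-aware ordering means here, offered as a sketch: java.text.Collator carries per-locale comparison rules, and a collation key built the same way is what you would ultimately want Solr to index for sorting. How completely a given JDK implements Welsh or Hawaiian rules varies, and ICU is the usual fallback when the built-in tables fall short.

import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class LocalizedSort {
    public static void main(String[] args) {
        // Welsh ("cy"): digraphs such as "ch" should sort as single letters.
        Collator welsh = Collator.getInstance(new Locale("cy"));
        String[] words = { "chwech", "cyw", "cadw" };
        Arrays.sort(words, welsh);
        // Prints the words in the locale's order, not raw code-point order.
        System.out.println(Arrays.toString(words));
    }
}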
Field Analyzers: which values are indexed?
Hi there, Just a quick question that the wiki page ( http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters) didn't seem to answer very well. Given an analyzer that has zero or more Char Filter Factories, one Tokenizer Factory, and zero or more Token Filter Factories, which value(s) are indexed? Is every value that is produced by each char filter, tokenizer, and filter indexed? Or is only the final value, after completing the whole chain, indexed? Cheers, Ben
Re: Field Analyzers: which values are indexed?
Thanks both for your replies. Erick, yep, I use the Analysis page extensively, but what I was directly looking for was whether all, or only the last line, of the values given by the analysis page were eventually indexed. I think we've concluded it's only the last line. Cheers, Ben On Wed, Apr 13, 2011 at 2:41 PM, Erick Erickson erickerick...@gmail.com wrote: CharFilterFactories are applied to the raw input before tokenization. Each token output from the tokenization is then sent through the rest of the chain. The Analysis page available from the Solr admin page is invaluable in answering in great detail what each part of an analysis chain does. TokenFilterFactories are applied to each token emitted from the tokenizer, and this includes the similar PatternReplaceFilterFactory. The difference is that the PatternReplaceCharFilterFactory is applied before tokenization to the entire input stream, and PatternReplaceFilterFactory is applied to each token emitted by the tokenizer. And to make it even more fun, you can do both! Best Erick On Wed, Apr 13, 2011 at 8:14 AM, Ben Davies ben.dav...@gmail.com wrote: Hi there, Just a quick question that the wiki page ( http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters) didn't seem to answer very well. Given an analyzer that has zero or more Char Filter Factories, one Tokenizer Factory, and zero or more Token Filter Factories, which value(s) are indexed? Is every value that is produced by each char filter, tokenizer, and filter indexed? Or is only the final value, after completing the whole chain, indexed? Cheers, Ben
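To make the ordering Erick describes concrete, here is a sketch of a schema.xml field type using the factories he contrasts; the type name and patterns are invented for illustration. The charFilter rewrites the raw input before tokenization, each filter then rewrites the tokens the tokenizer emits, and only the tokens that come out of the end of the chain are indexed.

<fieldType name="demo_text" class="solr.TextField">
  <analyzer>
    <!-- applied to the whole input stream, before tokenization -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="-" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- applied to each token emitted by the tokenizer -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^foo$" replacement="bar" replace="all"/>
  </analyzer>
</fieldType>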
Re: question on solr.ASCIIFoldingFilterFactory
I can't remember where I read it, but I think MappingCharFilterFactory is preferred. There is an example in the example schema: <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/> From this, I get: org.apache.solr.analysis.MappingCharFilterFactory {mapping=mapping-ISOLatin1Accent.txt} |text|despues| On Tue, Apr 5, 2011 at 5:06 PM, Nemani, Raj raj.nem...@turner.com wrote: All, I am using solr.ASCIIFoldingFilterFactory to perform accent-insensitive search. One of the words that got indexed as part of my indexing process is después. Having used the ASCIIFoldingFilterFactory, I expected that if I searched for the word despues, the document containing the word después would show up in the results, but that was not the case. Then I used analysis.jsp to analyze después and noticed that the ASCIIFoldingFilterFactory folded después to despue. If I repeat the above exercise for the word Imágenes, then analysis.jsp tells me that the ASCIIFoldingFilterFactory folded Imágenes to imagen. But I can search for Imagenes and get the correct results. I am not familiar with Spanish, but I found the above behavior confusing. Can anybody please explain the behavior described above? Thanks a million in advance Raj
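For reference, the mapping file named above is a plain text list of pre-tokenization character rewrites; a few lines in its format, shown here only to illustrate the syntax (the stock file's actual contents are longer):

# mapping-ISOLatin1Accent.txt style rules: "source" => "target"
"á" => "a"
"é" => "e"
"í" => "i"
"ó" => "o"
"ú" => "u"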
MoreLikeThis with document that has not been indexed
Hello, It is currently possible to use the MoreLikeThis handler to find documents similar to a given document in the index. Is there any way to feed the handler a new document in XML or JSON (as one would do for adding to the index) and have it find similar documents without indexing the target document? I understand that it is possible to do a MLT query using free text, but I want to utilize structured data. Thanks, Ben -- Ben Anhalt ben.anh...@gmail.com Mi parolas Esperante.
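One avenue worth checking, offered as a hedged pointer rather than a confirmed answer: the MoreLikeThisHandler can take the document text itself as a content stream, which avoids indexing it first, though as free text rather than a structured XML/JSON document. Remote streaming has to be enabled in solrconfig.xml for stream.body to work, and the handler path and fields below are illustrative:

http://localhost:8983/solr/mlt?mlt.fl=title,body&mlt.mintf=1&stream.body=text+of+the+new+document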
Cache size
Hi folks, Is there any way to know the size *in bytes* occupied by a cache (filter cache, doc cache ...)? I don't find such information within the stats page. Regards -- Mehdi BEN HAJ ABBES
Re: xpath processing
processor="FileListEntityProcessor" fileName=".*xml" recursive="true" Shouldn't this be fileName="*.xml"? Ben On Oct 22, 2010, at 10:52 PM, pghorp...@ucla.edu wrote:

<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource"/>
  <document>
    <entity name="f" rootEntity="false" dataSource="null"
            processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
            baseDir="C:\data\sample_records\mods\starr">
      <entity name="x" dataSource="myfilereader" processor="XPathEntityProcessor"
              url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
              transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
        <field column="id" template="${f.file}"/>
        <field column="collectionKey" template="starr"/>
        <field column="collectionName" template="starr"/>
        <field column="fileAbsolutePath" template="${f.fileAbsolutePath}"/>
        <field column="fileName" template="${f.file}"/>
        <field column="fileSize" template="${f.fileSize}"/>
        <field column="fileLastModified" template="${f.fileLastModified}"/>
        <field column="classification_keyword" xpath="/mods/classification"/>
        <field column="accessCondition_keyword" xpath="/mods/accessCondition"/>
        <field column="nameNamePart_s" xpath="/mods/name/namePart[@type = 'date']"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Quoting Ken Stanley doh...@gmail.com: Parinita, In its simplest form, what does your entity definition for DIH look like; also, what does one record from your xml look like? We need more information before we can really be of any help. :) - Ken It looked like something resembling white marble, which was probably what it was: something resembling white marble. -- Douglas Adams, The Hitchhiker's Guide to the Galaxy On Fri, Oct 22, 2010 at 8:00 PM, pghorp...@ucla.edu wrote: Quoting pghorp...@ucla.edu: Can someone help me please? I am trying to import MODS XML data into Solr using the xml/http datasource. This does not work with the XPathEntityProcessor of the data import handler: xpath="/mods/name/namePart[@type = 'date']" I actually have 143 records with the type attribute as 'date' for the element namePart. Thank you Parinita
Multiple indexes inside a single core
We are trying to convert a Lucene-based search solution to a Solr/Lucene-based solution. The problem we have is that we currently have our data split into many indexes, and Solr expects things to be in a single index unless you're sharding. In addition to this, our indexes wouldn't work well using the distributed search functionality in Solr, because the documents are not evenly or randomly distributed. We are currently using Lucene's MultiSearcher to search over subsets of these indexes. I know this has been brought up a number of times in previous posts, and the typical response is that the best thing to do is to convert everything into a single index. One of the major reasons for having the indexes split up the way we do is that different types of data need to be indexed at different intervals. You may need one index to be updated every 20 minutes and another to be updated only every week. If we move to a single index, then we will constantly be warming and replacing searchers for the entire dataset, which will essentially render the searcher caches useless. If we were able to have multiple indexes, they would each have a searcher, and updates would be isolated to a subset of the data. The other problem is that we will likely need to shard this large single index, and there isn't a clean way to shard randomly and evenly across all of the data. We would, however, like to shard a single data type. If we could use multiple indexes, we would likely also be sharding a small subset of them. Thanks in advance, Ben
Re: Multiple indexes inside a single core
Thanks Erick. The problem with multiple cores is that the documents are scored independently in each core. I would like to be able to search across both cores and have the scores 'normalized' in a way that's similar to what Lucene's MultiSearcher would do. As far as I understand, multiple cores would likely result in seriously skewed scores in my case, since the documents are not distributed evenly or randomly. I could have one core/index with 20 million docs and another with 200. I've poked around in the code and this feature doesn't seem to exist. I would be happy with finding a decent place to try to add it; I'm just not sure there is a clean place for it. Ben On Oct 20, 2010, at 8:36 PM, Erick Erickson erickerick...@gmail.com wrote: It seems to me that multiple cores are along the lines of what you need: a single instance of Solr that can search across multiple sub-indexes that do not necessarily share schemas and are independently maintainable. This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin HTH Erick On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com wrote: We are trying to convert a Lucene-based search solution to a Solr/Lucene-based solution. The problem we have is that we currently have our data split into many indexes, and Solr expects things to be in a single index unless you're sharding. In addition to this, our indexes wouldn't work well using the distributed search functionality in Solr, because the documents are not evenly or randomly distributed. We are currently using Lucene's MultiSearcher to search over subsets of these indexes. I know this has been brought up a number of times in previous posts, and the typical response is that the best thing to do is to convert everything into a single index. One of the major reasons for having the indexes split up the way we do is that different types of data need to be indexed at different intervals. You may need one index to be updated every 20 minutes and another to be updated only every week. If we move to a single index, then we will constantly be warming and replacing searchers for the entire dataset, which will essentially render the searcher caches useless. If we were able to have multiple indexes, they would each have a searcher, and updates would be isolated to a subset of the data. The other problem is that we will likely need to shard this large single index, and there isn't a clean way to shard randomly and evenly across all of the data. We would, however, like to shard a single data type. If we could use multiple indexes, we would likely also be sharding a small subset of them. Thanks in advance, Ben
Re: How to delete a SOLR document if that particular data doesn't exist in DB?
Now my question is: is there a way I can use preImportDeleteQuery to delete the documents from SOLR for which the data doesn't exist in the back-end DB? I don't have anything like a delete status in the DB; instead I need to get all the UIDs from the SOLR index, compare them with all the UIDs in the back end, and delete from SOLR the documents whose UIDs are not present in the DB. I've done something like this with raw Lucene, and I'm not sure how or if you could do it with Solr, as I'm relatively new to it. We stored a timestamp for when we started the import, and stored an update-timestamp field on every document added to the index. After the data import, we did a delete-by-query that matched all documents with a timestamp older than when we started, the assumption being that if we didn't update the timestamp during the load, then the record must have been deleted from the database. Hope this helps. Ben On Wed, Oct 20, 2010 at 8:05 PM, Erick Erickson erickerick...@gmail.com wrote: We are indexing multiple data by data types, hence can't delete the index and do a complete re-indexing each week; also we want to delete the orphan SOLR documents (for which the data is not present in the back-end DB) on a daily basis. Can you make delete-by-query work? Something like delete all Solr docs of a certain type and do a full re-index of just that type? I have no idea whether this is practical or not. But your solution also works. There's really no way Solr *can* know about deleted database records, especially since the uniqueKey field is completely arbitrarily defined. Best Erick On Wed, Oct 20, 2010 at 10:51 AM, bbarani bbar...@gmail.com wrote: Hi, I have a very common question but couldn't find any post related to it in this forum. I am currently initiating a full import each week, but data that has been deleted in the source is not updated in my index, as I am using clean=false. We are indexing multiple data by data types, hence we can't delete the index and do a complete re-indexing each week; also we want to delete the orphan SOLR documents (for which the data is not present in the back-end DB) on a daily basis. Now my question is: is there a way I can use preImportDeleteQuery to delete the documents from SOLR for which the data doesn't exist in the back-end DB? I don't have anything like a delete status in the DB; instead I need to get all the UIDs from the SOLR index, compare them with all the UIDs in the back end, and delete from SOLR the documents whose UIDs are not present in the DB. Any suggestions / ideas would be of great help. Note: Currently I have developed a simple program which will fetch the UIDs from the SOLR index, connect to the back-end DB to check for orphan UIDs, and then delete the documents from the SOLR index corresponding to the orphan UIDs. I just don't want to re-invent the wheel if this feature is already present in SOLR, as I would need to do more testing in terms of performance / scalability for my program. Thanks, Barani
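A minimal SolrJ sketch of the timestamp trick described above, assuming a last_indexed date field that the import stamps on every document it touches; the field name, URL, and date handling are illustrative, not from the thread.

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class PurgeOrphans {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr"); // illustrative

        // Record the start of the load in Solr's date format (UTC, ISO 8601).
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        String start = fmt.format(new Date());

        // ... run the full import; every document seen gets last_indexed=NOW ...

        // Anything still carrying an older stamp was not seen in the DB.
        server.deleteByQuery("last_indexed:[* TO " + start + "]");
        server.commit();
    }
}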
possible bug in zookeeper / solrCloud ?
Hi, I am using SolrCloud with an ensemble of 3 ZooKeeper instances. I am performing survivability tests: taking one of the ZooKeeper instances down, I would expect the client to use a different ZooKeeper server instance. But as you can see in the logs attached below, depending on which instance I choose to take down (in my case, the last one in the list of ZooKeeper servers), the client constantly insists on the same ZooKeeper server (Attempting connection to server zook3/192.168.252.78:2181) and does not switch to a different one. Does anyone have an idea about this? SolrCloud currently uses zookeeper-3.2.2.jar. Is this a known bug that was fixed in later versions (3.3.1)? Thanks in advance, Yatir Logs:

Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category info
INFO: Attempting connection to server zook3/192.168.252.78:2181
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Exception closing session 0x32b105244a20001 to sun.nio.ch.SelectionKeyImpl@3ca58cbf
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category info
INFO: Attempting connection to server zook3/192.168.252.78:2181
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Exception closing session 0x32b105244a2 to sun.nio.ch.SelectionKeyImpl@3960f81b
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
SolrCloud ZooKeeper-related exceptions
Hi, I am running a ZooKeeper ensemble of 3 instances and have set up a SolrCloud to work with it (2 masters, 2 slaves). On each master machine I have 2 shards (4 shards in total). On one of the masters I keep noticing ZooKeeper-related exceptions which I can't understand. One appears to be a timeout in ClientCnxn.java:906, and the other is java.lang.IllegalArgumentException: Path cannot be null (PathUtils.java:45). Here are my logs (I set the log level to FINE on the zookeeper package). Can anyone identify the issue?

FINE: Reading reply sessionid:0x12a97312613010b, packet:: clientPath:null serverPath:null finished:false header:: -8,101 replyHeader:: -8,-1,0 request:: 30064776552,v{'/collections},v{},v{'/collections/ENPwl/shards/ENPWL1,'/collections/ENPwl/shards/ENPWL4,'/collections/ENPwl/shards/ENPWL2,'/collections,'/collections/ENPwl/shards/ENPWL3,'/collections/ENPwlMaster/shards/ENPWLMaster_3,'/collections/ENPwlMaster/shards/ENPWLMaster_4,'/live_nodes,'/collections/ENPwlMaster/shards/ENPWLMaster_1,'/collections/ENPwlMaster/shards/ENPWLMaster_2} response:: null
Aug 25, 2010 5:18:19 AM org.apache.log4j.Category debug
FINE: Reading reply sessionid:0x12a97312613010b, packet:: clientPath:null serverPath:null finished:false header:: 540,8 replyHeader:: 540,-1,0 request:: '/collections,F response:: v{'ENPwl,'ENPwlMaster}
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader updateCloudState
INFO: Cloud state update for ZooKeeper already scheduled
Aug 25, 2010 5:18:19 AM org.apache.log4j.Category error
SEVERE: Error while calling watcher
java.lang.IllegalArgumentException: Path cannot be null
at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:45)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1196)
at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:200)
at org.apache.solr.common.cloud.ZkStateReader$5.process(ZkStateReader.java:315)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:425)
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader$4 process
INFO: Detected a shard change under ShardId:ENPWL3 in collection:ENPwl
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader updateCloudState
INFO: Cloud state update for ZooKeeper already scheduled
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader$4 process
INFO: Detected a shard change under ShardId:ENPWL4 in collection:ENPwl
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader updateCloudState
INFO: Cloud state update for ZooKeeper already scheduled
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader$4 process
INFO: Detected a shard change under ShardId:ENPWL1 in collection:ENPwl
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader updateCloudState
INFO: Cloud state update for ZooKeeper already scheduled
Aug 25, 2010 5:18:19 AM org.apache.solr.cloud.ZkController$2 process
INFO: Updating live nodes:org.apache.solr.common.cloud.SolrZkClient@55308275
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader updateCloudState
INFO: Updating live nodes from ZooKeeper...
Aug 25, 2010 5:18:19 AM org.apache.log4j.Category debug
FINE: Reading reply sessionid:0x12a97312613010b, packet:: clientPath:null serverPath:null finished:false header:: 541,8 replyHeader:: 541,-1,0 request:: '/live_nodes,F response:: v{'ob1078.nydc1.outbrain.com:8983_solr2,'ob1078.nydc1.outbrain.com:8983_solr1,'ob1061.nydc1.outbrain.com:8983_solr2,'ob1062.nydc1.outbrain.com:8983_solr1,'ob1062.nydc1.outbrain.com:8983_solr2,'ob1061.nydc1.outbrain.com:8983_solr1,'ob1077.nydc1.outbrain.com:8983_solr2,'ob1077.nydc1.outbrain.com:8983_solr1}
Aug 25, 2010 5:18:19 AM org.apache.log4j.Category error
SEVERE: Error while calling watcher
java.lang.IllegalArgumentException: Path cannot be null
at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:45)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1196)
at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:200)
at org.apache.solr.cloud.ZkController$2.process(ZkController.java:321)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:425)
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ConnectionManager process
INFO: Watcher org.apache.solr.common.cloud.ConnectionManager@339bb448 name:ZooKeeperConnection Watcher:zook1:2181,zook2:2181,zook3:2181 got event WatchedEvent: Server state change. New state: Disconnected path:null type:None
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader$4 process
INFO: Detected a shard change under ShardId:ENPWLMaster_1 in collection:ENPwlMaster
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader updateCloudState
INFO: Cloud state update for ZooKeeper already scheduled
Aug 25, 2010 5:18:19 AM
question: having multiple SolrCloud configurations on the same machine
Hi! I am using SolrCloud with Tomcat 5.5. In my setup every language has its own index and its own Solr filters, so it needs separate Solr configuration files. In the SolrCloud examples posted here: http://wiki.apache.org/solr/SolrCloud I noticed that bootstrap_confdir is given as a global -D parameter, but I need to be able to supply it per core. I tried doing this in solr.xml but failed:

<core name='coreES' instanceDir='coreES/'>
  <property name='dataDir' value='/Data/Solr/coreES'/>
  <property name='bootstrap_confdir' value='/home/tomcat/solr/coreES/conf'/>
</core>

All my cores are using the same ZooKeeper configuration according to -Dbootstrap_confdir=... Does anyone know how I can specify the bootstrap_confdir on a per-core basis? Thanks Yatir Ben Shlomo Outbrain Engineering yat...@outbrain.com tel: +972-73-223912 fax: +972-9-8350055 www.outbrain.com
question: solrCloud with multiple cores on each machine
Hi, I am using SolrCloud. Suppose I have a total of 4 machines dedicated to Solr. I want to have 2 machines as replicas (slaves) and 2 as masters, but I want to work with 8 logical cores rather than 2, i.e. each master (and each slave) will have 4 cores on it. The reason is that I can then optimize the cores one at a time, so the IO intensity at any given moment will be low and will not degrade online performance. Is there a way to configure my solr.xml so that when I am doing a distributed search (distrib=true) it will know to query all 8 cores? Thanks Yatir
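For the record, classic distributed search fans out to wherever the shards parameter points, so one sketch of a solution is simply to enumerate all eight cores; the hostnames and core names here are invented for illustration:

http://master1:8080/solr/core1/select?q=*:*&shards=master1:8080/solr/core1,master1:8080/solr/core2,master1:8080/solr/core3,master1:8080/solr/core4,master2:8080/solr/core5,master2:8080/solr/core6,master2:8080/solr/core7,master2:8080/solr/core8

The same list can be baked into a request handler's defaults in solrconfig.xml (a str element named shards), so that plain queries against one core fan out automatically.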
Re: How real-time are Solr/Lucene queries?
You may wish to look at Lucandra: http://github.com/tjake/Lucandra On 21 May 2010, at 06:12, Walter Underwood wrote: Solr is a very good engine, but it is not real-time. You can turn off the caches and reduce the delays, but it is fundamentally not real-time. I work at MarkLogic, and we have a real-time transactional search engine (and repository). If you are curious, contact me directly. I do like Solr for lots of applications -- I chose it when I was at Netflix. wunder On May 20, 2010, at 7:22 PM, Thomas J. Buhr wrote: Hello Solr, Solr looks like an excellent API, and it's nice to have a tutorial that makes it easy to discover the basics of what Solr does; I'm impressed. I can see plenty of potential uses of Solr/Lucene, and I'm interested now in just how real-time the queries made to an index can be. For example, in my application I have time-ordered data being processed by a paint method in real time. Each piece of data is identified and its associated renderer is invoked. The Java2D renderer would then look up any layout and style values it requires to render the current data it has received from the layout and style indexes. What I'm wondering is whether this lookup, which would be a Lucene search, will be fast enough. Would it be best to make Lucene queries for the relevant layout and style values required by the renderers ahead of rendering time, and have the query results placed into the most performant collection (map/array) so renderer lookup would be as fast as possible? Or can Lucene handle many individual lookup queries fast enough so that rendering is quick? Best regards from Canada, Thom
Re: How real-time are Solr/Lucene queries?
Further to the earlier note re Lucandra: I note that Cassandra, which Lucandra backs onto, is 'eventually consistent', so given your real-time requirements you may want to review this in the first instance, if Lucandra is of interest. On 21 May 2010, at 06:12, Walter Underwood wrote: Solr is a very good engine, but it is not real-time. You can turn off the caches and reduce the delays, but it is fundamentally not real-time. I work at MarkLogic, and we have a real-time transactional search engine (and repository). If you are curious, contact me directly. I do like Solr for lots of applications -- I chose it when I was at Netflix. wunder
Re: Custom sort
It could be that you should be providing an implementation of SortComparatorSource. I have missed the earlier part of this thread; I assume you're trying to implement some form of custom sort? B dontthinktwice wrote: Marc Sturlese wrote: I have been able to create my custom field. The problem is that I have loaded into the Solr core a couple of HashMap<id_doc, value_influence_sort> maps from a DB, with values that will influence the sort. My problem is that I don't know how to let my custom sort have access to these HashMaps. I am a bit confused now. I think it would be easy to reach my goal using: CustomSortComponent extends SearchComponent implements SolrCoreAware This way, I would load the HashMaps in the inform method and would create the custom sort using the HashMaps in the prepare method. Don't know how to do that with the CustomField (similar to the RandomField)... any advice? Marc, did you get this working somehow? I'm looking at doing something similar, and before I make a custom sort field (like RandomSortField) I would be delighted to know that I can give it access to the data structure it will need to calculate the sort...
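For what it's worth, here is a rough sketch of the SortComparatorSource route mentioned above, against the pre-Lucene-2.9 sort API. It assumes, purely for illustration, that the DB-loaded map has already been rekeyed from the id_doc values to Lucene's internal document numbers; a real implementation would do that per reader, e.g. via FieldCache on the uniqueKey field.

import java.io.IOException;
import java.util.Map;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.ScoreDocComparator;
import org.apache.lucene.search.SortComparatorSource;
import org.apache.lucene.search.SortField;

public class InfluenceComparatorSource implements SortComparatorSource {
    // Lucene docId -> influence value; assumed pre-built from the DB maps.
    private final Map<Integer, Float> influence;

    public InfluenceComparatorSource(Map<Integer, Float> influence) {
        this.influence = influence;
    }

    public ScoreDocComparator newComparator(IndexReader reader, String fieldname)
            throws IOException {
        return new ScoreDocComparator() {
            public int compare(ScoreDoc i, ScoreDoc j) {
                return valueOf(i.doc).compareTo(valueOf(j.doc));
            }
            public Comparable sortValue(ScoreDoc i) {
                return valueOf(i.doc);
            }
            public int sortType() {
                return SortField.FLOAT;
            }
            // Documents missing from the map sort with a neutral value.
            private Float valueOf(int doc) {
                Float f = influence.get(doc);
                return f == null ? Float.valueOf(0f) : f;
            }
        };
    }
}

Wiring this in would then go through a custom FieldType's getSortField, much as RandomSortField does, which is also where a SolrCoreAware component could hand over the maps it loaded in inform().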
Re: DocSlice andNotSize
DocSet isn't an object, it's an interface. The DocSlice class *implements* DocSet. What you're saying about set operations not working for DocSlice but working for DocSet then doesn't make any sense... can you clarify? The failure of these set operations to work as expected is confusing the hell out of me too! Thanks Ben Yonik Seeley wrote: On Thu, Jul 2, 2009 at 4:24 PM, Candide Kemmler cand...@palacehotel.org wrote: I have a simple question re the DocSlice class. I'm trying to use the (very handy) set operations on DocSlices and I'm rather confused by the way it behaves. I have 2 DocSlices: atlDocs, which, looking at the debugger, holds a docs array of ints of size 1; the second DocSlice is btlDocs, with a docs array of ints of size 67. I know that atlDocs is a subset of btlDocs, so doing btlDocs.andNotSize(atlDocs) should really return 66. But it's returning 10. The short answer is that all of the set operations were only designed for DocSets (as opposed to DocLists). Yes, perhaps DocList should not have extended DocSet... -Yonik http://www.lucidimagination.com
Re: Excluding characters from a wildcard query - More Info - Is this difficult, or am I being ignored because it's too obvious to merit an answer?
Ben wrote: The exception SOLR raises is: org.apache.lucene.queryParser.ParseException: Cannot parse 'vector:_*[^_]*_[^_]*_[^_]*': Encountered ] at line 1, column 12. Was expecting one of: TO ... RANGEIN_QUOTED ... RANGEIN_GOOP ... Ben wrote: Passing in a regular expression like [^_]*_[^_]* (e.g. matching anything with an underscore in the string) using some code like: ... parameters.add("fq", "vector:[^_]*_[^_]*"); ... seems to cause problems for SOLR, I assume because of the [ or ^ character. Can somebody please advise how to handle character exclusion in such searches? Any help or pointers are much appreciated! Thanks Ben
Re: Excluding characters from a wildcard query
Yes, I had done that... however, I'm beginning to see now that what I am doing is called a wildcard query, which goes via Lucene's query parser. Lucene's query parser doesn't support the regexp idea of character exclusion... i.e. I'm not trying to match [, I'm trying to express match as many characters as possible which are not underscores with [^_]*. Perhaps I'm going about my whole problem in an ineffective way, but I'm not sure how I can sensibly describe what I'm doing without it becoming a long document. The only other approach I can think of is to change what I'm indexing, but I'm not sure how to achieve that. I've tried explaining it once, and obviously failed, so I'll try again. I'm given a string containing many vectors (where each dimension is separated by an underscore, and each vector is separated by a comma), e.g. A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3 I want my facet query to tell me if, within one of the vectors in that string, there is a match for the dimensions I'm interested in. Of the four dimensions in this example, I may choose to fix an arbitrary number of them with values, and the rest with wildcards, e.g. I might look for a facet containing Ox_*_*_* so one of the vectors in the string must have its first dimension matching Ox and I don't care about the rest. ***Is there a way to break down this string on the commas so that I can apply a normal wildcard query and SOLR applies it to each individually?*** That would solve all my problems: e.g. the string is internally represented in Lucene/Solr as A1_B1_C1_D1 A2_B2_C2_D2 A3_B3_C3_D3 and it tries to match the wildcard query on each in turn? Thanks for your help, I'm deeply confused about this at the moment... Ben
Re: Excluding characters from a wildcard query
Is there a way in the Schema to specify that the comma should be used to split the values up? e.g. Can I specify my vector field as multivalued and also specify some sort of tokeniser to automatically split on commas? Ben Uwe Klosa wrote: You should split the strings at the comma yourself and store the values in a multivalued field? Then wildcard searches like A1_* are not a problem. I don't know so much about facets. But if they work on multivalued fields that should be then no problem at all. Uwe 2009/7/1 Ben b...@autonomic.net Yes, I had done that... however, I'm beginning to see now that what I am doing is called a wildcard query, which goes via Lucene's query parser. Lucene's query parser doesn't support the regexp idea of character exclusion... i.e. I'm not trying to match [, I'm trying to express "match as many characters as possible which are not underscores" with [^_]*. Perhaps I'm going about my whole problem in an ineffective way, but I'm not sure how I can sensibly describe what I'm doing without it becoming a long document. The only other approach I can think of is to change what I'm indexing, but I'm not sure how to achieve that. I've tried explaining it once, and obviously failed, so I'll try again. I'm given a string containing many vectors (where each dimension is separated by an underscore, and each vector is separated by a comma) e.g. A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3 I want my facet query to tell me if, within one of the vectors within that string, there is a match for the dimensions I'm interested in. Of the four dimensions in this example, I may choose to fix an arbitrary number of them with values, and the rest with wildcards, e.g. I might look for a facet containing Ox_*_*_* so one of the vectors in the string must have its first dimension matching Ox and I don't care about the rest. ***Is there a way to break down this string on the commas so that I can apply a normal wildcard query and SOLR applies it to each individually?*** That would solve all my problems: e.g. the string is internally represented in lucene/solr as A1_B1_C1_D1 A2_B2_C2_D2 A3_B3_C3_D3 where it tries to match the wildcard query on each in turn? Thanks for your help, I'm deeply confused about this at the moment... Ben
Re: Excluding characters from a wildcard query
I'm not quite sure I understand exactly what you mean. The string I'm processing could have many tens of thousands of values... I hope you aren't implying I'd need to split it into many tens of thousands of columns. If you're saying what I think you're saying, you're saying that I should leave whitespace between the individual parts of the string, pass the string into a multiValued field and have SOLR internally treat each word as an individual entity? Thanks for your help with this... Ben Uwe Klosa wrote: To get the desired effect I described you have to do the split before you send the document to solr. I'm not aware of an analyzer that can split one field value into several field values. The analyzers and tokenizers do create tokens from field values in many different ways. As I see it you have to do some preprocessing yourself. Uwe 2009/7/1 Ben b...@autonomic.net Is there a way in the Schema to specify that the comma should be used to split the values up? e.g. Can I specify my vector field as multivalued and also specify some sort of tokeniser to automatically split on commas? Ben Uwe Klosa wrote: You should split the strings at the comma yourself and store the values in a multivalued field? Then wildcard searches like A1_* are not a problem. I don't know so much about facets. But if they work on multivalued fields that should be then no problem at all. Uwe 2009/7/1 Ben b...@autonomic.net Yes, I had done that... however, I'm beginning to see now that what I am doing is called a wildcard query, which goes via Lucene's query parser. Lucene's query parser doesn't support the regexp idea of character exclusion... i.e. I'm not trying to match [, I'm trying to express "match as many characters as possible which are not underscores" with [^_]*. Perhaps I'm going about my whole problem in an ineffective way, but I'm not sure how I can sensibly describe what I'm doing without it becoming a long document. The only other approach I can think of is to change what I'm indexing, but I'm not sure how to achieve that. I've tried explaining it once, and obviously failed, so I'll try again. I'm given a string containing many vectors (where each dimension is separated by an underscore, and each vector is separated by a comma) e.g. A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3 I want my facet query to tell me if, within one of the vectors within that string, there is a match for the dimensions I'm interested in. Of the four dimensions in this example, I may choose to fix an arbitrary number of them with values, and the rest with wildcards, e.g. I might look for a facet containing Ox_*_*_* so one of the vectors in the string must have its first dimension matching Ox and I don't care about the rest. ***Is there a way to break down this string on the commas so that I can apply a normal wildcard query and SOLR applies it to each individually?*** That would solve all my problems: e.g. the string is internally represented in lucene/solr as A1_B1_C1_D1 A2_B2_C2_D2 A3_B3_C3_D3 where it tries to match the wildcard query on each in turn? Thanks for your help, I'm deeply confused about this at the moment... Ben
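For completeness: depending on your Solr version there may also be a schema-side option. If your build includes solr.PatternTokenizerFactory, a field type along these lines (a sketch, not something from this thread) would turn each comma-separated vector into its own indexed term, so wildcard queries get tried against each vector individually:

    <fieldtype name="vector_csv" class="solr.TextField">
      <analyzer>
        <!-- each comma-separated chunk becomes a separate term -->
        <tokenizer class="solr.PatternTokenizerFactory" pattern=","/>
      </analyzer>
    </fieldtype>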
Re: Excluding characters from a wildcard query
Sorry, my brain was switched off. I'm using SOLRJ, which means I'll need to call addMultipleFields(solrDoc, "vector", vectorvalue, 1.0f); for each value to be added to the multiValued field. Then, with luck, the simple wildcard query will be executed over each individual value when looking for matches, meaning the simple query syntax can be made adequate to do what's needed. Many thanks Uwe. B Uwe Klosa wrote: 2009/7/1 Ben b...@autonomic.net I'm not quite sure I understand exactly what you mean. The string I'm processing could have many tens of thousands of values... I hope you aren't implying I'd need to split it into many tens of thousands of columns. No, that is not what I meant. It will be one field (column) with tens of thousands of values. If you're saying what I think you're saying, you're saying that I should leave whitespace between the individual parts of the string, pass the string into a multiValued field and have SOLR internally treat each word as an individual entity? Thanks for your help with this... I said nothing about whitespace. I don't know how you update your solr documents. Are you using XML or Solrj? Uwe
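For reference, the standard SolrJ pattern (addMultipleFields above looks like Ben's own helper): calling addField repeatedly with the same field name populates a multiValued field. A sketch, with hypothetical field names:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class VectorIndexer {
        // rawString is e.g. "A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3"
        public static void index(SolrServer server, String id, String rawString) throws Exception {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            for (String vectorValue : rawString.split(",")) { // client-side split on commas
                doc.addField("vector", vectorValue);          // repeated adds fill a multiValued field
            }
            server.add(doc);
            server.commit();
        }
    }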
Building Solr index with Lucene
For performance reasons, we're attempting to build the index used with Solr directly in Lucene. It works fine for the most part, but I'm having an issue when it comes to stemming. I'm guessing this is due to a mismatch between how Lucene stems at index time and how Solr stems during its queries. Has anyone built their Solr index using Lucene, and how did you handle stemmed fields in Lucene so that Solr worked properly with them? Cheers, Ben
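The usual cause is exactly that mismatch: the Lucene-side IndexWriter has to run the same analysis chain as the Solr field type, or indexed terms and query terms stem differently. A hedged sketch using the Lucene 2.x-era API, assuming the Solr field stems with the Snowball/Porter English stemmer (as solr.EnglishPorterFilterFactory does); SnowballAnalyzer only approximates a full Solr analyzer chain, so compare its output against your schema's analyzers (paths and field names are hypothetical):

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class SolrCompatibleIndexer {
        public static void main(String[] args) throws Exception {
            // Must stem the same way Solr will stem queries, or searches miss.
            Analyzer analyzer = new SnowballAnalyzer("English");
            IndexWriter writer = new IndexWriter("/path/to/solr/data/index", analyzer, true);
            Document doc = new Document();
            doc.add(new Field("id", "1", Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("text", "running runs ran", Field.Store.YES, Field.Index.TOKENIZED));
            writer.addDocument(doc);
            writer.close();
        }
    }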
Excluding characters from a wildcard query
Passing in a regular expression like [^_]*_[^_]* (e.g. matching anything with an underscore in the string) using some code like: ... parameters.add("fq", "vector:[^_]*_[^_]*"); ... seems to cause problems for SOLR, I assume because of the [ or ^ character. Can somebody please advise how to handle character exclusion in such searches? Any help or pointers are much appreciated! Thanks Ben
Re: Excluding characters from a wildcard query - More Info
The exception SOLR raises is: org.apache.lucene.queryParser.ParseException: Cannot parse 'vector:_*[^_]*_[^_]*_[^_]*': Encountered ] at line 1, column 12. Was expecting one of: TO ... RANGEIN_QUOTED ... RANGEIN_GOOP ... Ben wrote: Passing in a regular expression like [^_]*_[^_]* (e.g. matching anything with an underscore in the string) using some code like: ... parameters.add("fq", "vector:[^_]*_[^_]*"); ... seems to cause problems for SOLR, I assume because of the [ or ^ character. Can somebody please advise how to handle character exclusion in such searches? Any help or pointers are much appreciated! Thanks Ben
Excluding Characters and SubStrings in a Faceted Wildcard Query
Hello, I've been using SOLR for a while now, but am stuck for information on two issues: 1) Is it possible to exclude characters in a SOLR facet wildcard query? e.g. [^,]* to match any character except a comma? 2) Can one set up the facet wildcard query to return the exact substrings it matched of the queried facet, rather than the whole string? I hope somebody can help :) Thanks, Ben
Re: Excluding Characters and SubStrings in a Faceted Wildcard Query
Hi Erik, I'm not sure exactly how much context you need here, so I'll try to keep it short and expand as needed. The column I am faceting contains a comma-delimited set of vectors. Each vector is made up of {Make,Year,Model} e.g. _ford_1996_focus,mercedes_1996_clk,ford_2000_focus I have a custom request handler where, if I want to find all the cars from 1996, I pass in a facet query for the Year (1996) which is transformed to a wildcard facet query: _*_1996_* In other words, it'll match any record whose vector column contains a string which somewhere has a car from 1996. Why not put the Make, Year and Model in separate columns and do a facet query over multiple columns?... because once we've selected 1996, we should (in the above example) then be offering ford and mercedes as further facet choices, and nothing more. If the parts were in their own columns, there would be no way to tie the Makes and Models to specific years, for example. At any rate, the wildcard search returns the entire match (_ford_1996_focus,mercedes_1996_clk,ford_2000_focus). I then have to do another RegExp over it to extract only the two parts (the first ford and mercedes) that were from 1996. This isn't using SOLR's cache very effectively. It would be excellent if SOLR could break up that comma-separated list into three different parts, and run the RegExp over each, returning only those which match. Is that what you're implying with Analysis? If that were the case, I'd not need to worry about character exclusion. Sorry if that's a bit fuzzy... it's hard trying to explain enough to be useful, but not so much that it turns into an essay!!! Thanks, Ben The solution I'm using is to form a vector Erik Hatcher wrote: Ben, Could you post an example of the type of data you're dealing with and how you want it handled? I suspect there is a way to accomplish what you want using an analyzed field, or by preprocessing the data you're indexing. Erik On Jun 29, 2009, at 9:29 AM, Ben wrote: Hello, I've been using SOLR for a while now, but am stuck for information on two issues: 1) Is it possible to exclude characters in a SOLR facet wildcard query? e.g. [^,]* to match any character except a comma? 2) Can one set up the facet wildcard query to return the exact substrings it matched of the queried facet, rather than the whole string? I hope somebody can help :) Thanks, Ben
Sending Mlt POST request
Hello, I wish to send an Mlt request to Solr and filter the result by a list of values for a specific field. The problem is that sometimes the list can include thousands of values, and it's impossible to send such a GET request. Sending this request as POST didn't work well... Is POST supported by mlt? If not, is it supposed to be added in one of the next versions? Or is there a different solution maybe? I will appreciate any help and advice, Thanks, Ohad.
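I can't speak for every version, but Solr request handlers generally accept the same parameters in a form-encoded POST body, which sidesteps GET URL length limits. A sketch, assuming a MoreLikeThisHandler registered at /mlt and hypothetical host, id, and field names (curl's --data-urlencode sends an application/x-www-form-urlencoded POST):

    curl http://localhost:8983/solr/mlt \
      --data-urlencode "q=id:12345" \
      --data-urlencode "mlt.fl=title,body" \
      --data-urlencode "fq=category:(1 OR 2 OR 3)"

If POST still misbehaves for the MLT handler in your version, the exact error would be worth posting to the list.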
dismax query not working with 1.4
Hello, I'm using the March 18th 1.4 nightly, and I can't get a dismax query to return results. The standard and partitioned query types return data fine. I'm using jetty, and the problem occurs with the default solrconfig.xml as well as the one I am using, which is the Drupal module, beta 6. The problem occurs in the admin interface for solr, though, not just in the end application. And...that's it? I don't know what else to say or offer other than dismax doesn't work, and I'm not sure where else to go to troubleshoot. Any ideas? Ben
Re: dismax query not working with 1.4
I do not have a qf set; this is the query generated by the admin interface: dismax: select?indent=on&version=2.2&q=test&start=0&rows=10&fl=*%2Cscore&qt=dismax&wt=standard&explainOther=&hl.fl= standard: select?indent=on&version=2.2&q=test&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl= dismax has no results, standard has 30. I don't see a requirement that qf be defined on http://wiki.apache.org/solr/DisMaxRequestHandler; am I missing something? The query responses are the same with both the application-specific and default solrconfig.xml's. The application definition for dismax is:

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
  </lst>
</requestHandler>

And the one from my nightly is:

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4</str>
    <str name="pf">text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9</str>
    <str name="bf">ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3</str>
    <str name="fl">id,name,price,score</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
    <!-- example highlighter config, enable per-query with hl=true -->
    <str name="hl.fl">text features name</str>
    <!-- for this field, we want no fragmenting, just highlighting -->
    <str name="f.name.hl.fragsize">0</str>
    <!-- instructs Solr to return the field itself if no query terms are found -->
    <str name="f.name.hl.alternateField">name</str>
    <str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
  </lst>
</requestHandler>

So there's no particular mention of any fields from schema.xml in dismax, but the standard works without that. Thanks for the responses, Ben On Thu, Mar 26, 2009 at 2:11 PM, Matt Mitchell goodie...@gmail.com wrote: Do you have qf set? Just last week I had a problem where no results were coming back, and it turned out that my qf param was empty. Matt On Thu, Mar 26, 2009 at 2:30 PM, Ben Lavender blaven...@gmail.com wrote: Hello, I'm using the March 18th 1.4 nightly, and I can't get a dismax query to return results. The standard and partitioned query types return data fine. I'm using jetty, and the problem occurs with the default solrconfig.xml as well as the one I am using, which is the Drupal module, beta 6. The problem occurs in the admin interface for solr, though, not just in the end application. And...that's it? I don't know what else to say or offer other than dismax doesn't work, and I'm not sure where else to go to troubleshoot. Any ideas? Ben
Re: dismax query not working with 1.4
Did the XML in that message come through okay? Gmail seems to be eating it on my end. Anyway, while the default config has those fields, it also fails with the application config, which has:

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
  </lst>
</requestHandler>

Since this is essentially the same as standard, I assumed it would work without any qf. I manually added a qf to the query with the application solrconfig and got a result. Off to debug the application side! Thank you very much for the help! Ben On Thu, Mar 26, 2009 at 3:08 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Standard searches your default field (specified in schema.xml). DisMax searches fields you specify in the DisMax config. Yours has: text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 But those are not your real fields. Change that to your real fields in qf, pf and other parts of the DisMax config and things should start working. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Ben Lavender blaven...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, March 26, 2009 4:02:58 PM Subject: Re: dismax query not working with 1.4 I do not have a qf set; this is the query generated by the admin interface: dismax: select?indent=on&version=2.2&q=test&start=0&rows=10&fl=*%2Cscore&qt=dismax&wt=standard&explainOther=&hl.fl= standard: select?indent=on&version=2.2&q=test&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl= dismax has no results, standard has 30. I don't see a requirement that qf be defined on http://wiki.apache.org/solr/DisMaxRequestHandler; am I missing something? The query responses are the same with both the application-specific and default solrconfig.xml's. The application definition for dismax is: dismax explicit And the one from my nightly is: dismax explicit 0.01 text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9 ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3 id,name,price,score 2<-1 5<-2 6<90% 100 *:* text features name 0 name regex So there's no particular mention of any fields from schema.xml in dismax, but the standard works without that. Thanks for the responses, Ben On Thu, Mar 26, 2009 at 2:11 PM, Matt Mitchell wrote: Do you have qf set? Just last week I had a problem where no results were coming back, and it turned out that my qf param was empty. Matt On Thu, Mar 26, 2009 at 2:30 PM, Ben Lavender wrote: Hello, I'm using the March 18th 1.4 nightly, and I can't get a dismax query to return results. The standard and partitioned query types return data fine. I'm using jetty, and the problem occurs with the default solrconfig.xml as well as the one I am using, which is the Drupal module, beta 6. The problem occurs in the admin interface for solr, though, not just in the end application. And...that's it? I don't know what else to say or offer other than dismax doesn't work, and I'm not sure where else to go to troubleshoot. Any ideas? Ben
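For anyone landing here from a search: the fix Otis describes is simply pointing qf (and pf) at fields that actually exist in your schema.xml. A minimal sketch, with hypothetical field names:

    <requestHandler name="dismax" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <str name="echoParams">explicit</str>
        <!-- qf must list real fields from schema.xml, or dismax matches nothing -->
        <str name="qf">title^2.0 body^1.0</str>
      </lst>
    </requestHandler>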
field range (min and max term)
Hi Solr users, Is there a method of retrieving a field range, i.e. the min and max values of that field's term enum? For example, I would like to know the first and last date entry of N documents. Regards, -Ben
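One approach that needs no custom code (a sketch; the field name date is a hypothetical stand-in, and the field must be indexed and sortable): issue two queries sorted on the field with rows=1, and read the single value back from each.

    select?q=*:*&sort=date+asc&rows=1&fl=date     (first, i.e. minimum, value)
    select?q=*:*&sort=date+desc&rows=1&fl=date    (last, i.e. maximum, value)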
RE: *Very* slow Commit after upgrading to solr 1.3
So, other than trial and error, do you have any guidance on how to configure the merge factor (and ramBufferSizeMB)? Any formula that supplies the optimal value? Thanks, Yatir -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Tuesday, October 07, 2008 1:10 PM To: solr-user@lucene.apache.org Subject: Re: *Very* slow Commit after upgrading to solr 1.3 On Tue, Oct 7, 2008 at 6:32 AM, Ben Shlomo, Yatir [EMAIL PROTECTED] wrote: The problem is solved, see below. Since the performance is so sensitive to configuration - do you have a tip on how to determine the optimal configuration for mergeFactor, ramBufferSizeMB and other properties? The issue might have been your high merge factor coupled with changes in how Lucene closes an index. To prevent possible corruption on a crash, Lucene now does an fsync on the index files before it writes the new segment descriptor that references those files. A high merge factor means more segments, hence more segment files to sync on a close. -Yonik My original problem occurred even on a fresh rebuild of the index with solr 1.3 To solve it I used the entire IndexWriter section settings from the solr 1.3 example file This had a dramatic impact: I indexed 20 GB of data (52M docs) The total indexing time was 13 hours The index size was 30 GB The total commit time was less than 2 minutes Tomcat Log for reference Oct 5, 2008 9:43:24 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true) Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher init INFO: Opening [EMAIL PROTECTED] main Oct 5, 2008 9:43:43 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for [EMAIL PROTECTED] main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for [EMAIL PROTECTED] main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for [EMAIL PROTECTED] main
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size= 0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati o=0.00,cumulative_inserts=0,cumulative_evictions=0} Oct 5, 2008 9:43:43 PM org.apache.solr.core.SolrCore registerSearcher INFO: [] Registered new searcher [EMAIL PROTECTED] main Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher close INFO: Closing [EMAIL PROTECTED] main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0, warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio= 0.00,cumulative_inserts=0,cumulative_evictions=0} queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr atio=0.00,cumulative_inserts=0,cumulative_evictions=0} documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size= 0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati o=0.00,cumulative_inserts=0,cumulative_evictions=0} Oct 5, 2008 9:43:43 PM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {commit=} 0 18406 Oct 5, 2008 9:43:43 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/dss1 path=/update params={} status=0 QTime=18406 Oct 5, 2008 9:43:43 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=true
RE: *Very* slow Commit after upgrading to solr 1.3
Thanks Yonik, The problem is solved, see below. Since the performance is so sensitive to configuration - do you have a tip on how to determine the optimal configuration for mergeFactor, ramBufferSizeMB and other properties ? My original problem occurred even on a fresh rebuild of the index with solr 1.3 To solve it I used the entire IndexWriter section settings from the solr 1.3 example file This had a dramatic impact: I indexed 20 GB of data (52M docs) The total indexing time was 13 hours The index size was 30 GB The total commit time was less than 2 minutes Tomcat Log for reference Oct 5, 2008 9:43:24 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true) Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher init INFO: Opening [EMAIL PROTECTED] main Oct 5, 2008 9:43:43 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0, warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio= 0.00,cumulative_inserts=0,cumulative_evictions=0} Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for [EMAIL PROTECTED] main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0, warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio= 0.00,cumulative_inserts=0,cumulative_evictions=0} Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr atio=0.00,cumulative_inserts=0,cumulative_evictions=0} Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for [EMAIL PROTECTED] main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr atio=0.00,cumulative_inserts=0,cumulative_evictions=0} Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size= 0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati o=0.00,cumulative_inserts=0,cumulative_evictions=0} Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for [EMAIL PROTECTED] main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size= 0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati o=0.00,cumulative_inserts=0,cumulative_evictions=0} Oct 5, 2008 9:43:43 PM org.apache.solr.core.SolrCore registerSearcher INFO: [] Registered new searcher [EMAIL PROTECTED] main Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher close INFO: Closing [EMAIL PROTECTED] main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0, warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio= 0.00,cumulative_inserts=0,cumulative_evictions=0} queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr atio=0.00,cumulative_inserts=0,cumulative_evictions=0} 
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size= 0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati o=0.00,cumulative_inserts=0,cumulative_evictions=0} Oct 5, 2008 9:43:43 PM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {commit=} 0 18406 Oct 5, 2008 9:43:43 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/dss1 path=/update params={} status=0 QTime=18406 Oct 5, 2008 9:43:43 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true) Oct 5, 2008 9:45:07 PM org.apache.solr.search.SolrIndexSearcher init INFO: Opening [EMAIL PROTECTED] main Oct 5, 2008 9:45:07 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Saturday, October 04, 2008 6:07 PM To: solr-user@lucene.apache.org Subject: Re: *Very* slow Commit after upgrading to solr 1.3 Ben, see also http://www.nabble.com/Commit-in-solr-1.3-can-take-up-to-5-minutes-td1980 2781.html#a19802781 What type of physical drive is this and what interface is used (SATA, etc)? What is the filesystem (NTFS)? Did you add to an existing index from an older version of Solr, or start from scratch? If you add a single document to the index and commit
*Very* slow Commit after upgrading to solr 1.3
Hi! I am running on Windows 64 bit... I have upgraded to solr 1.3 in order to use the distributed search. I haven't changed the solrConfig and the schema xml files during the upgrade. I am indexing ~350K documents (each one is about 0.5 KB in size). The indexing takes a reasonable amount of time (350 seconds). See tomcat log: INFO: {add=[8x-wbTscWftuu1sVWpdnGw==, VOu1eSv0obBl1xkj2jGjIA==, YkOm-nKPrTVVVyeCZM4-4A==, rvaq_TyYsqt3aBc0KKDVbQ==, 9NdzWXsErbF_5btyT1JUjw==, ...(398728 more)]} 0 349875 But when I commit it takes more than an hour! (5000 seconds! The optimize after the commit took 14 seconds.) INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true) p.s. it's not a machine problem, I moved to another machine and the same thing happened. I noticed something very strange during the time I wait for the commit: while the solr index is 210MB in size, in the windows task manager I noticed that the java process is making a HUGE amount of IO reads: it reads more than 350 GB! (which takes a lot of time). The process is constantly taking 25% of the cpu resources. All my autowarmCount settings in the Solrconfig file do not exceed 256... Any more ideas to check? Thanks. Here is part of my solrConfig file (C:\dss1\SolrHome\conf\solrconfig.xml):

<indexDefaults>
  <!-- Values here affect all index writers and act as a default unless overridden. -->
  <useCompoundFile>false</useCompoundFile>
  <mergeFactor>1000</mergeFactor>
  <maxBufferedDocs>1000</maxBufferedDocs>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
</indexDefaults>
<mainIndex>
  <!-- options specific to the main on-disk lucene index -->
  <useCompoundFile>false</useCompoundFile>
  <mergeFactor>1000</mergeFactor>
  <maxBufferedDocs>1000</maxBufferedDocs>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>
  <!-- If true, unlock any held write or commit locks on startup. This defeats the locking mechanism that allows multiple processes to safely access a lucene index, and should be used with care. -->
  <unlockOnStartup>true</unlockOnStartup>
</mainIndex>

Yatir Ben-shlomo | eBay, Inc. | Classification Track, Shopping.com (Israel) | w: +972-9-892-1373 | email: [EMAIL PROTECTED] |
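As reported in the replies above, the fix was to take the whole IndexWriter settings block from the Solr 1.3 example solrconfig.xml; the mergeFactor of 1000 above multiplies the number of segment files Lucene must fsync when closing the index. For reference, a sketch of those example settings (values as shipped in the 1.3 example; tune for your hardware):

    <indexDefaults>
      <useCompoundFile>false</useCompoundFile>
      <mergeFactor>10</mergeFactor>
      <ramBufferSizeMB>32</ramBufferSizeMB>
      <maxMergeDocs>2147483647</maxMergeDocs>
      <maxFieldLength>10000</maxFieldLength>
      <writeLockTimeout>1000</writeLockTimeout>
    </indexDefaults>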
help required: how to design a large scale solr system
Hi! I am already using solr 1.2 and happy with it. In a new project with a very tight deadline (10 development days from today) I need to set up a more ambitious system in terms of scale. Here is the spec:
* I need to index about 60,000,000 documents
* Each document has 11 textual fields to be indexed and stored, and 4 more fields to be stored only
* Most fields are short (2-14 characters), however 2 indexed fields can be up to 1KB and another stored field is up to 1KB
* On average every document is about 0.5 KB to be stored and 0.4KB to be indexed
* The SLA for data freshness is a full nightly re-index (I cannot obtain incremental update/delete lists of the modified documents)
* The SLA for query time is 5 seconds
* The number of expected queries is 2-3 queries per second
* The queries are simple: a combination of Boolean operations and name searches (no fancy fuzzy searches and Levenshtein distances, no faceting, etc)
* I have a 64 bit Dell 2950 4-cpu machine (2 dual cores) with RAID 10, 200 GB HD space, and 8GB RAM
* The documents are not given to me explicitly - I am given raw documents in RAM, one by one, from which I create my document in RAM, and then I can either http-post it to index it directly or append it to a tsv file for later indexing
* Each document has a unique ID
I have a few directions I am thinking about. The simple approach:
* Have one solr instance that will index the entire document set (from files). I am afraid this will take too much time.
Direction 1:
* Create TSV files from all the documents - this will take around 3-4 hours
* Have all the documents partitioned into several subsets (how many should I choose?)
* Have multiple solr instances on the same machine
* Let each solr instance concurrently index the appropriate subset
* At the end merge all the indices using the IndexMergeTool (how much time will it take?)
Direction 2:
* Like the previous, but instead of using the IndexMergeTool, use distributed search with shards (upgrading to solr 1.3)
Directions 3, 4:
* Like previous directions, only avoid using TSV files at all and directly index the documents from RAM
Questions:
* Which direction do you recommend in order to meet the SLAs in the fastest way?
* Since I have RAID on the machine, can I gain performance by using multiple solr instances on the same machine, or will only multiple machines help me?
* What's the minimal number of machines I should require (I might get more, weaker machines)?
* How many concurrent indexers are recommended?
* Do you agree that the bottleneck is the indexing time?
Any help is appreciated. Thanks in advance, yatir
RE: help required: how to design a large scale solr system
Thanks Mark! Do you have any comments regarding the performance difference between indexing TSV files as opposed to directly indexing each document via http post? -Original Message- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 24, 2008 2:12 PM To: solr-user@lucene.apache.org Subject: Re: help required: how to design a large scale solr system From my limited experience: I think you might have a bit of trouble getting 60 mil docs on a single machine. Cached queries will probably still be *very* fast, but non-cached queries are going to be very slow in many cases. Is that 5 seconds for all queries? You will never meet that on first-run queries with 60 mil docs on that machine. The light query load might make things workable... but you're near the limits of a single machine (4 cores or not) with 60 mil. You want to use a very good stopword list... common-term queries will be killer. The docs being so small will be your only possible savior if you go the one-machine route - that and cached hits. You don't have enough RAM to get as much of the filesystem into RAM as you'd like for 60 mil docs either. I think you might try two machines with 30, 3 with 20, or 4 with 15. The more you spread, even with slower machines, the faster you're likely to index, which, as you say, will take a long time for 60 mil docs (start today <g>). Multiple machines will help the indexing speed the most for sure - it's still going to take a long time. I don't think you will get much advantage using more than one solr install on a single machine - if you do, that should be addressed in the code, even with RAID. So I say, spread if you can. Faster indexing, faster search, easy to expand later. Distributed search is so easy with solr 1.3, you won't regret it. I think there is a bug to be addressed if you're needing this in a week though - in my experience, with distributed search, for every million docs on a machine beyond the first, you lose a doc in a search across all machines (i.e. 1 mil on machine 1, 1 million on machine 2, a *:* search will be missing 1 doc; 10 mil each on 3 machines, a *:* search will be missing 30). Not a big deal, but could be a concern for some with picky, "look at everything" customers. - Mark Ben Shlomo, Yatir wrote: Hi! I am already using solr 1.2 and happy with it. In a new project with a very tight deadline (10 development days from today) I need to set up a more ambitious system in terms of scale. Here is the spec: * I need to index about 60,000,000 documents * Each document has 11 textual fields to be indexed and stored, and 4 more fields to be stored only * Most fields are short (2-14 characters), however 2 indexed fields can be up to 1KB and another stored field is up to 1KB * On average every document is about 0.5 KB to be stored and 0.4KB to be indexed * The SLA for data freshness is a full nightly re-index (I cannot obtain incremental update/delete lists of the modified documents) * The SLA for query time is 5 seconds * the number of expected queries is 2-3 queries per second * the queries are simple: a combination of Boolean operations and name searches (no fancy fuzzy searches and Levenshtein distances, no faceting, etc) * I have a 64 bit Dell 2950 4-cpu machine (2 dual cores) with RAID 10, 200 GB HD space, and 8GB RAM memory * The documents are not given to me explicitly - I am given raw documents in RAM, one by one, from which I create my document in RAM,
and then I can either http-post it to index it directly or append it to a tsv file for later indexing * Each document has a unique ID I have a few directions I am thinking about. The simple approach: * Have one solr instance that will index the entire document set (from files). I am afraid this will take too much time. Direction 1: * Create TSV files from all the documents - this will take around 3-4 hours * Have all the documents partitioned into several subsets (how many should I choose?) * Have multiple solr instances on the same machine * Let each solr instance concurrently index the appropriate subset * At the end merge all the indices using the IndexMergeTool (how much time will it take?) Direction 2: * Like the previous, but instead of using the IndexMergeTool, use distributed search with shards (upgrading to solr 1.3) Directions 3, 4: * Like previous directions, only avoid using TSV files at all and directly index the documents from RAM Questions: * Which direction do you recommend in order to meet
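For Direction 2, the distributed search Mark recommends is driven entirely by the shards request parameter in Solr 1.3; each shard is an ordinary Solr instance holding a slice of the documents. A sketch with hypothetical hosts:

    http://host1:8983/solr/select?shards=host1:8983/solr,host2:8983/solr,host3:8983/solr&q=ipod&start=0&rows=10

The receiving instance fans the query out to every listed shard and merges the results, so the same URL works against any of the three machines.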
Changing Solr Query Syntax
Hi solr users, I need to change the query format for solr a little bit. How can I accomplish this? I don't want to modify the underlying lucene query specification, just the way I query the index through the GET HTTP method in solr. Thanks a lot for your help. Ben
Re: Changing Solr Query Syntax
Shalin, thanks a lot for answering that fast. Use Case: I'm migrating from a proprietary index server (XYZ) to Solr. All my applications and my customers' applications rely on the query specification of XYZ. It would be hard to modify all those apps to use the Solr Query Syntax (although that would be ideal, since the Solr query syntax is a lot superior to that of XYZ, it's impractical). On Tue, Mar 18, 2008 at 9:50 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Hi Ben, It would be nice if you can tell us your use-case so that we can be more helpful. Why does the normal query syntax not work well for you? What are you trying to accomplish? Maybe there is an easier way. On Tue, Mar 18, 2008 at 8:17 PM, Ben Sanchez [EMAIL PROTECTED] wrote: Hi solr users, I need to change the query format for solr a little bit. How can I accomplish this? I don't want to modify the underlying lucene query specification, just the way I query the index through the GET HTTP method in solr. Thanks a lot for your help. Ben -- Regards, Shalin Shekhar Mangar.
Re: Changing Solr Query Syntax
Hi Shalin, thanks a lot for answering that fast. Use Case: I'm migrating from a proprietary index server (XYZ) to Solr. All my applications and my customers' applications rely on the query specification of XYZ. It would be hard to modify all those apps to use the Solr Query Syntax (although that would be ideal, since the Solr query syntax is a lot superior to that of XYZ). Basically I need to replace : with =, + with /, and = with : in the query syntax. Thank you. On Tue, Mar 18, 2008 at 9:50 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Hi Ben, It would be nice if you can tell us your use-case so that we can be more helpful. Why does the normal query syntax not work well for you? What are you trying to accomplish? Maybe there is an easier way. On Tue, Mar 18, 2008 at 8:17 PM, Ben Sanchez [EMAIL PROTECTED] wrote: Hi solr users, I need to change the query format for solr a little bit. How can I accomplish this? I don't want to modify the underlying lucene query specification, just the way I query the index through the GET HTTP method in solr. Thanks a lot for your help. Ben -- Regards, Shalin Shekhar Mangar.
Re: Changing Solr Query Syntax
Shalin, Thanks a lot. I'll do that. On Tue, Mar 18, 2008 at 11:13 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Hi Ben, If I had to do this, I would start by adding a custom javax.servlet.Filter into Solr. It should work fine since all you're doing is replacing characters in the q parameter for requests coming into the /select handler. It's a bit hackish but that's exactly what you're trying to do :) Don't know if there's an alternate/easier way. On Tue, Mar 18, 2008 at 9:30 PM, Ben Sanchez [EMAIL PROTECTED] wrote: Hi Shalin, thanks a lot for answering that fast. Use Case: I'm migrating from a proprietary index server (XYZ) to Solr. All my applications and my customers' applications rely on the query specification of XYZ. It would be hard to modify all those apps to use the Solr Query Syntax (although that would be ideal, since the Solr query syntax is a lot superior to that of XYZ). Basically I need to replace : with =, + with /, and = with : in the query syntax. Thank you. On Tue, Mar 18, 2008 at 9:50 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Hi Ben, It would be nice if you can tell us your use-case so that we can be more helpful. Why does the normal query syntax not work well for you? What are you trying to accomplish? Maybe there is an easier way. On Tue, Mar 18, 2008 at 8:17 PM, Ben Sanchez [EMAIL PROTECTED] wrote: Hi solr users, I need to change the query format for solr a little bit. How can I accomplish this? I don't want to modify the underlying lucene query specification, just the way I query the index through the GET HTTP method in solr. Thanks a lot for your help. Ben -- Regards, Shalin Shekhar Mangar. -- Regards, Shalin Shekhar Mangar.
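If you go the Filter route, here is a rough Java sketch of what Shalin describes (class name and the exact character mapping are hypothetical; also note that, depending on the Solr version, the dispatcher may read parameters through getParameterMap or getQueryString rather than getParameter, in which case those need wrapping too):

    import java.io.IOException;
    import javax.servlet.*;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletRequestWrapper;

    public class QuerySyntaxFilter implements Filter {
        public void init(FilterConfig config) {}
        public void destroy() {}

        // XYZ -> Solr translation per Ben's description: swap '=' and ':',
        // and turn '/' into '+'. Verify the direction against real XYZ queries.
        static String translate(String q) {
            char[] c = q.toCharArray();
            for (int i = 0; i < c.length; i++) {
                if (c[i] == '=') c[i] = ':';
                else if (c[i] == ':') c[i] = '=';
                else if (c[i] == '/') c[i] = '+';
            }
            return new String(c);
        }

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest http = (HttpServletRequest) req;
            chain.doFilter(new HttpServletRequestWrapper(http) {
                public String getParameter(String name) {
                    String v = super.getParameter(name);
                    return ("q".equals(name) && v != null) ? translate(v) : v;
                }
                public String[] getParameterValues(String name) {
                    String[] vs = super.getParameterValues(name);
                    if (vs != null && "q".equals(name))
                        for (int i = 0; i < vs.length; i++) vs[i] = translate(vs[i]);
                    return vs;
                }
            }, res);
        }
    }

Map the filter to /select in web.xml so it runs before Solr's dispatch filter.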
solr web admin
why does the web admin append core=null to all the requests? e.g. admin/get-file.jsp?core=null&file=schema.xml
retrieve lucene doc id
how do I retrieve the lucene doc id in a query? -Ben
RE: lowercase text/strings to be used in list box
sorry - this should have been posted on the Lucene user list. ...the solution is to use the lucene PerFieldAnalyzerWrapper, add the field with the KeywordAnalyzer, then pass the PerFieldAnalyzerWrapper to the QueryParser. -Ben -Original Message- From: Ben Incani [mailto:[EMAIL PROTECTED] Sent: Friday, 19 October 2007 5:52 PM To: solr-user@lucene.apache.org Subject: lowercase text/strings to be used in list box I have a field which will only contain several values (that include spaces). I want to display a list box with all possible values by browsing the lucene terms. I have set up a field in the schema.xml file:

<fieldtype name="text_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

I also tried:

<fieldtype name="string_lc" class="solr.StrField">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

This allows me to browse all the values no problem, but when it comes to searching the documents I have to use the lucene org.apache.lucene.analysis.KeywordAnalyzer, when I would rather use the org.apache.lucene.analysis.standard.StandardAnalyzer and the power of the default query parser to perform a phrase query such as my_field:(the value) or my_field:"the value", which don't work? So is there a way to prevent tokenisation of a field using the StandardAnalyzer, without implementing your own TokenizerFactory? Regards Ben
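A rough sketch of that solution in Lucene 2.x-era code (field names are hypothetical); one caveat worth a comment: KeywordAnalyzer does not lowercase, while the indexed terms went through LowerCaseFilterFactory, so lowercase the value on the client side before parsing:

    import org.apache.lucene.analysis.KeywordAnalyzer;
    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    public class ListBoxQueryParser {
        public static Query parse(String userQuery) throws ParseException {
            PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
            // my_field holds whole values (spaces included), so don't tokenize it
            wrapper.addAnalyzer("my_field", new KeywordAnalyzer());
            QueryParser parser = new QueryParser("text", wrapper);
            return parser.parse(userQuery); // e.g. my_field:"the value" AND something
        }
    }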
lowercase text/strings to be used in list box
I have a field which will only contain several values (that include spaces). I want to display a list box with all possible values by browsing the lucene terms. I have set up a field in the schema.xml file:

<fieldtype name="text_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

I also tried:

<fieldtype name="string_lc" class="solr.StrField">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

This allows me to browse all the values no problem, but when it comes to searching the documents I have to use the lucene org.apache.lucene.analysis.KeywordAnalyzer, when I would rather use the org.apache.lucene.analysis.standard.StandardAnalyzer and the power of the default query parser to perform a phrase query such as my_field:(the value) or my_field:"the value", which don't work? So is there a way to prevent tokenisation of a field using the StandardAnalyzer, without implementing your own TokenizerFactory? Regards Ben
RE: solr not finding all results
Did you try to add a backslash to escape the - in Geckoplp4-M (Geckoplp4\-M)? -Original Message- From: Kevin Lewandowski [mailto:[EMAIL PROTECTED] Sent: Friday, October 12, 2007 9:40 PM To: solr-user@lucene.apache.org Subject: solr not finding all results I've found an odd situation where solr is not returning all of the documents that I think it should. A search for Geckoplp4-M returns 3 documents but I know that there are at least 100 documents with that string. Here is an example query for that phrase and the result set: http://localhost:9020/solr/select/?q=Geckoplp4-M&version=2.2&start=0&rows=10&indent=on&fl=comments,id

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="rows">10</str>
      <str name="start">0</str>
      <str name="indent">on</str>
      <str name="fl">comments,id</str>
      <str name="q">Geckoplp4-M</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="3" start="0">
    <doc>
      <str name="comments">Geckoplp4-M</str>
      <str name="id">m2816500</str>
    </doc>
    <doc>
      <str name="comments">toptrax recordings. Same tracks. Geckoplp4-M</str>
      <str name="id">m2816544</str>
    </doc>
    <doc>
      <str name="comments">Geckoplp4-M</str>
      <str name="id">m2815903</str>
    </doc>
  </result>
</response>

Now here's an example of a search for two documents that I know have that string, but were not returned in the previous search: http://localhost:9020/solr/select/?q=id%3Am2816615+OR+id%3Am2816611&version=2.2&start=0&rows=10&indent=on&fl=id,comments

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="rows">10</str>
      <str name="start">0</str>
      <str name="indent">on</str>
      <str name="fl">id,comments</str>
      <str name="q">id:m2816615 OR id:m2816611</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="comments">Geckoplp4-M</str>
      <str name="id">m2816611</str>
    </doc>
    <doc>
      <str name="comments">Geckoplp4-M</str>
      <str name="id">m2816615</str>
    </doc>
  </result>
</response>

Here is the definition for the comments field:

<field name="comments" type="text" indexed="true" stored="true"/>

And here is the definition for a text field:

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!--<filter class="solr.StopFilterFactory" ignoreCase="true"/>-->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!--<filter class="solr.StopFilterFactory" ignoreCase="true"/>-->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldtype>

Any ideas? Am I doing something wrong? thanks, Kevin
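Two query-side workarounds worth trying (sketches; %5C is a URL-encoded backslash and %22 a double quote): escape the hyphen so it is not parsed as the prohibit operator, or quote the term as a phrase.

    http://localhost:9020/solr/select/?q=comments:Geckoplp4%5C-M
    http://localhost:9020/solr/select/?q=comments:%22Geckoplp4-M%22

It is also worth checking whether every document was indexed with this exact analyzer configuration; the catenateWords=1 at index time versus catenateWords=0 at query time in WordDelimiterFilterFactory is another frequent suspect for partial matching like this.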
I can't delete, why?
Hi! I know I can delete multiple docs with the following:

<delete><query>mediaId:(6720 OR 6721 OR )</query></delete>

My question is: can I do something like this?

<delete><query>languageId:123 AND manufacturer:456</query></delete>

(It does not work for me, and I didn't forget to commit.) How can I do it? With a copy field?

<delete><query>languageIdmanufacturer:123456</query></delete>

Thanks yatir
problem with quering solr after indexing UTF-8 encoded CSV files
Hi! I have utf-8 encoded data inside a csv file (actually it's a tab separated file - attached). I can index it with no apparent errors. I did not forget to set this in my tomcat configuration:

<Server ...>
  <Service ...>
    <Connector ... URIEncoding="UTF-8"/>

When I query a document using the UTF-8 text I get zero matches:

http://localhost:8080/apache-solr-1.2.1-dev/select/?q=%D7%99%D7%AA%D7%99%D7%A8&version=2.2&start=0&rows=10&indent=on

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">יתיר</str>   (note: I can see the correct UTF-8 text in it - hebrew characters)
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

When I observe this text in the response by querying for *:* I notice that the text does not appear as desired: יתיר instead of יתיר. Do you have any ideas? Thanks… Here is the response:

http://localhost:8080/apache-solr-1.2.1-dev/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">*:*</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="country">1</str>
      <str name="desc">desc is a very good camera</str>
      <str name="dispname">display is יתיר ABC res123</str>
      <str name="form">1</str>
      <str name="lang">1</str>
      <str name="manu">ABC</str>
      <str name="model">res123</str>
      <str name="pn">C123</str>
      <str name="productid">123456</str>
      <str name="upc">72900010123</str>
    </doc>
  </result>
</response>

yatir
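One hedged guess (not an answer from this thread): with stream.file, Solr reads the local file itself, and unless the stream's content type says otherwise the bytes may be decoded with the platform default charset rather than UTF-8, which produces exactly this kind of double-encoded text in the stored field. Posting the file body with an explicit UTF-8 content type avoids relying on defaults; a sketch (%09 is the tab separator; host, path, and field names are hypothetical):

    curl 'http://localhost:8080/apache-solr-1.2.1-dev/update/csv?separator=%09&fieldnames=productid,manu,model&commit=true' \
      --data-binary @data.tsv -H 'Content-type: text/plain; charset=UTF-8'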
question: how to divide the indexing into separate domains
Hi! Say I have 300 csv files that I need to index. Each one holds millions of lines (each line is a few fields separated by commas). Each csv file represents a different domain of data (e.g. file1 is computers, file2 is flowers, etc). There is no indication of the domain ID in the data inside the csv file. When I search I would like to specify the id of a specific domain, and I want solr to search only in this domain, to save time and reduce the number of matches. So I need to specify, during indexing, the domain id of the csv file being indexed. How do I do it? Thanks p.s. I wish I could index like this: curl "http://localhost:8080/solr/update/csv?stream.file=test.csv&fieldnames=field1,field2&f.domain.value=98765" (where 98765 is the domain id for this specific csv file)
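Since the CSV loader needs the value to be present in the data, one workable pattern (a sketch; the field name domain and the values are hypothetical) is to append a domain column to every line of each file before posting, declare it in fieldnames, and then restrict searches with a filter query, which Solr also caches efficiently:

    # indexing: test.csv has had a trailing domain column (98765) appended to every line
    curl "http://localhost:8080/solr/update/csv?stream.file=test.csv&fieldnames=field1,field2,domain&commit=true"
    # querying: restrict matches to one domain
    curl "http://localhost:8080/solr/select?q=flowers&fq=domain:98765"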
separate log files
Hi Solr users, I'm running multiple instances of Solr, which are all using the same war file to load from. Below is an example of the servlet context file used for each application.

<Context path="/app1-solr" docBase="/var/usr/solr/solr-1.0.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/var/local/app1" override="true"/>
</Context>

Hence each application is using the same WEB-INF/classes/logging.properties file to configure logging. I would like each instance to log to separate log files such as: app1-solr.yyyy-mm-dd.log app2-solr.yyyy-mm-dd.log ... Is there an easy way to append the context path to org.apache.juli.FileHandler.prefix, e.g. org.apache.juli.FileHandler.prefix = ${catalina.context}-solr. Or would this require a code change? Regards -Ben
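Tomcat's JULI does read a per-web-application logging.properties from WEB-INF/classes, so one option (a sketch, assuming each context can be deployed from its own expanded directory instead of all sharing the packed war, since a shared war means a shared properties file) is to give each instance its own copy with a distinct prefix:

    # WEB-INF/classes/logging.properties for the app1 instance
    handlers = org.apache.juli.FileHandler
    org.apache.juli.FileHandler.level = INFO
    org.apache.juli.FileHandler.directory = ${catalina.base}/logs
    org.apache.juli.FileHandler.prefix = app1-solr.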