Re: Monitor the QTime.
On Sun, Feb 13, 2011 at 7:54 AM, Lance Norskog goks...@gmail.com wrote:
> If you're a unix shell scripting wiz, here are a few strategies. Tail the logfile and filter for the string 'QTime'. The number is the very last string in the line. So, strip the text between the timestamp and the number - sort by the timestamp first and the number second. Now grab the first qtime for each timestamp. I don't know a command for this. This gives you the longest query time for each second. [...]

Thanks for the idea. As this was useful enough for us, I have gone ahead and implemented it.

* Attached is a bash script that prints "timestamp query-string qtime" for each query in a list of Solr log files specified as command-line arguments. Use it as: ./qtime_solr.sh logfile1 logfile2 ... Without any arguments, the script uses /var/log/tomcat6/catalina.out as the log file.
* I have only minimally tested it, but would be glad to fix any bugs that people encounter.
* I have *not* implemented the latter part of the suggestion, i.e., sorting by timestamp and taking only the first query. IMHO, the needs here might differ, so it is best to have a second script do this, using the output from this one.
* In case attachments to this list are stripped, the script has also been uploaded to http://pastebin.com/YLqBHp19

Regards,
Gora

qtime_solr.sh
Description: Bourne shell script
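Lance's suggested pipeline can be sketched in a few lines of shell. This is a minimal sketch, not the attached script: the awk field positions assume a typical "timestamp ... QTime=N" log line, so they would need adjusting to your servlet container's actual log format.

```shell
# Extract "timestamp qtime" pairs from Solr log lines and keep only the
# largest QTime per second. The field positions ($1, $2, $NF) are
# assumptions about the log layout; tweak them for your container.
extract_qtimes() {
  grep 'QTime=' "$1" |
    awk '{ ts = $1 " " $2; qt = $NF; sub(/^QTime=/, "", qt); print ts, qt }' |
    sort -k1,1 -k2,2 -k3,3nr |
    awk '!seen[$1, $2]++'   # first row per second = largest QTime
}

# Demo on two fabricated log lines sharing one timestamp:
printf '%s\n' \
  '2011-02-14 13:13:28 INFO core.SolrCore - [core0] path=/select params={q=a} hits=1 status=0 QTime=7' \
  '2011-02-14 13:13:28 INFO core.SolrCore - [core0] path=/select params={q=b} hits=9 status=0 QTime=12' \
  > /tmp/qtime_demo.log
extract_qtimes /tmp/qtime_demo.log   # -> 2011-02-14 13:13:28 12
```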
Re: Faceting Query
On Friday 11 February 2011 11:34 PM, Gora Mohanty wrote:
> On Thu, Feb 10, 2011 at 12:21 PM, Isha Garg isha.g...@orkash.com wrote:
>> What is the facet.pivot field? Please explain with an example.
> Does http://wiki.apache.org/solr/SimpleFacetParameters#facet.pivot not help? Regards, Gora

No, it is not showing any pivot results in my case:

http://localhost:8984/solr/worldNews/select/?q=*%3A*&version=2.2&start=0&rows=0&indent=on&facet.pivot=category,country,KeyLocation&facet.pivot=country,category&facet=true&facet.field=category&wt=json

Output is:

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "facet": "true",
      "indent": "on",
      "start": "0",
      "q": "*:*",
      "facet.field": "category",
      "wt": "json",
      "facet.pivot": ["category,country,KeyLocation", "country,category"],
      "version": "2.2",
      "rows": "0"}},
  "response": {"numFound": 6775, "start": 0, "docs": []},
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {
      "category": [
        "Counterfeiting and Piracy", 2367,
        "Social Unrest", 2143,
        "Security Measures", 1064,
        "Fraud and Cheating", 356,
        "Naxelites", 266,
        "Terrorism", 243,
        "Sex Crime", 232,
        "Shiv Sena", 76,
        "Major Crime", 23,
        "Drug Running and Organized Crime", 5]},
    "facet_dates": {}}}
Deploying Solr CORES on OVH Cloud
Hi,

I'm a bit new to Solr. I'm trying to set up a bunch of servers (just for Solr) on the OVH cloud (http://www.ovh.co.uk/cloud/) and create new cores as needed on each server.

First question: What do you recommend, Ubuntu or Debian? I mean in terms of performance.

Second question: Jetty or Tomcat? Again in terms of performance and security.

Third question: I've followed the wiki, but I can't get the cores working - it seems impossible to create a core or access my cores. Does anyone have a working config to share?

Thanks a lot for your help.

Regards,
Re: Faceting Query
Likely because it is a Solr 4.0 feature and you are using Solr 1.4.1. That'd be my guess. (Solr 4.0 is the latest, greatest, as-yet-unreleased version of Solr - the numbering scheme changed to fit with that of Lucene.)

Upayavira

On Mon, 14 Feb 2011 16:05 +0530, Isha Garg isha.g...@orkash.com wrote:
> No, it is not showing any pivot results in my case [...]

---
Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
SolrCloud - Example C not working
Hi all,

I followed http://wiki.apache.org/solr/SolrCloud and everything worked fine till I tried "Example C". I start all 4 servers, but all of them keep looping through:

Feb 14, 2011 1:31:16 PM org.apache.log4j.Category info
INFO: Opening socket connection to server localhost/127.0.0.1:9983
Feb 14, 2011 1:31:16 PM org.apache.log4j.Category warn
WARNING: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)

(The same ConnectException repeats while it cycles through localhost ports 9983, 9900, and 8574, on both 127.0.0.1 and the IPv6 loopback 0:0:0:0:0:0:0:1.)

The problem seems to be that the ZK instances cannot connect to the different nodes and so do not come up at all. I am using revision 1070473 for the tests. Anybody have an idea?

salu2
--
Thorsten Scherler
thorsten.at.apache.org
codeBusters S.L. - web based systems consulting, training and solutions
http://www.codebusters.es/
Lily 0.3 is released
Hi all,

Lily is a data/content repository that integrates HBase with SOLR: flexible content storage and automatic index maintenance - at scale. It's available under the Apache license.

This release is the result of 3 months of hard work since Lily 0.2 last October. Our focus was stabilization, performance and robustness, providing a platform we can continue building upon. More than 50 tickets were resolved during this development sprint, and we're slowly readying ourselves for the 1.0 release.

Lily 0.3 brings many gradual improvements over Lily 0.2. It has a more solid implementation of the blob fields, automatic retry of operations that fail due to I/O exceptions (between Lily client and Lily server), and other miscellaneous improvements, all listed underneath. Everything Lily can be found at www.lilyproject.org. We're now also sharing details of our commercial software subscription service with select prospects - let us know if you're interested!

Here's a concise list of improvements since Lily 0.2:

- Repository
  - Performance / space improvements
    - Shorter column key encoding (field id's)
    - Reduction of the number of column families used
    - Avoid duplicate values in the table: make use of the sparseness of the table
    - Drop the use of HBase row locks, which do not survive region splits/moves
    - Use byte[] as keys in the RecordType FieldType cache
  - API
    - Added a new method createOrUpdate which creates or updates a record depending on whether it already exists. This method has the advantage over the create method that it can be retried in case of IO exceptions, i.e. it is idempotent, similar to PUT in HTTP/REST.
    - Allow updating versioned-mutable fields without specifying the record type.
    - Throw a RecordLockedException instead of a generic exception when a record is locked; this allows Lily clients to retry the operation in that case.
  - Clear historical data when deleting a record and remove any referenced blobs.
  - The link index stores record IDs and field IDs as bytes instead of strings.
  - The record ID string representation was changed to use a comma instead of a semicolon to separate variant properties, since the use of semicolons was problematic in the JAX-RS based REST interface implementation.
  - Upgrade to Apache HBase 0.90
- Blobs
  - Rework of the blobstore functionality
    - Blobs can only be accessed through the record they are used in, not directly by using their blob key. This is to allow for future record-level access control.
    - Introduce a Repository.getBlob() method, which returns a BlobAccess object providing access to both the blob metadata (Blob object) and the blob input stream. This avoids the need to read the record in case you need the blob metadata.
    - Uploaded blobs which are never used in a record are cleaned up.
    - The HDFS-stored blobs are stored in a hierarchical structure.
- RowLog improvements
  - Performance improvements - the RowLog processor uses a Zookeeper-based notification system instead of a Netty-based one.
  - Optimize queue scanning: avoid scanning over deleted rows in the table, fix too-frequent scanning, fix an endless scanning loop on startup in case of no repository activity.
  - The RowLog processor only processes messages of a minimal age (to avoid conflicts with direct processing of wal messages).
  - Extended RowLogConfigurationManager to add/update rowlog configuration information.
  - Avoid and remove stale messages in the queue.
  - Allow the rowlog to use either row-level locks (wal use case) or executionstate-level locks per subscription (mq use case) when processing messages.
  - Added a WAL processor which handles open WAL messages.
- REST interface
  - Adapted blob support to the new blobstore functionality. The Content-Length header is now set when downloading blobs. Multi-value or hierarchical blobs are now accessible.
  - Support updating versioned-mutable fields.
  - Fixed various smaller bugs reported by users.
- HBase index library
  - Allow adding/removing multiple entries in one call.
  - Performance
    - Fixed an important performance issue whereby row scanning always ran to the end of the index table.
    - Enable scan caching.
  - Added a performance testing tool.
- Indexer
  - Upgrade to Tika 0.8
  - Performance
    - Avoid FieldNotFoundException when evaluating field values
  - Made the SOLR request-writer and response-parser implementation configurable. This allows using the XML format instead of the javabin format.
- LilyClient
  - Automatically retry operations on IOExceptions; this allows [...]
Re: SolrCloud Feedback
Some more comments:

f) For consistency, the Java options should all be prefixed with solr.*, even if they are related to embedded ZK:
-Dsolr.hostPort=8900 -Dsolr.zkRun -Dsolr.zkHost=localhost:9900 -Dsolr.zkBootstrap_confdir=./solr/conf

g) I often share parts of my config between cores, e.g. a common schema.xml or synonyms.xml. In file-based mode I can thus use ../../common_conf/synonyms.xml or similar. I have not tried to bootstrap such a config into ZK, but I assume it will not work. ZK mode should support such a use case, either by supporting notations like ".." or by allowing an explicit ZK namespace: zk://configs/common-cfg/synonyms.xml

h) Support for dev / test / prod environments. In real life you want to develop in one environment, test in another and run production in a third. Thus, the ZK data structure should have a clear separation between logical feature configuration and physical deployment config. Perhaps a new level above /COLLECTIONS could be used to model this, e.g.

/ENV/PROD/COLLECTIONS/WEB/SHARDS/shardA/prod01.server.com:8080
/ENV/PROD/COLLECTIONS/WEB/SHARDS/shardB/prod02.server.com:8080
/ENV/PROD/COLLECTIONS/FILES/SHARDS/shardA/prod03.server.com:8080
/ENV/TEST/COLLECTIONS/WEB/SHARDS/shardA/test01.server.com:8080
/ENV/TEST/COLLECTIONS/WEB/SHARDS/shardB/test01.server.com:9090
/ENV/TEST/COLLECTIONS/FILES[@configName=TESTFILES]/SHARDS/shardA/test01.server.com:7070

When starting Solr we may specify an environment: -Dsolr.env=TEST (or configure a default). The main benefit is that we can maintain and store one single ZK config in our SCM, distribute the same configs to all servers, and, if you like, point all envs to the same ZK ensemble. In the future, we can use this for automatic install of a new node as well: by simply adding a ZK entry in the right place, the node can discover who it is from ZK.

i) Ideally, no config inside conf should contain host names. My DIH config will most likely include server names, which will be different between TEST and PROD. This could be solved as above, by letting the collection in TEST use another configName than PROD, but for some use cases it might be more elegant to swap out a hardcoded string with a ZK node in a generic way, such as jdbcString=my-hardcoded-string to jdbcString=${zk://ENV/PROD/jdbcstrA}

j) Question: Is ReplicationHandler ZK-aware yet?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 10. feb. 2011, at 16.10, Jan Høydahl wrote:

Hi,

I have so far just tested the examples and got an N by M cluster running. My feedback:

a) First of all, a major update of the SolrCloud wiki is needed, to clearly state what is in which version, what the current improvement plans are, and to get rid of outdated stuff. That said, I think there are many good ideas there.

b) The "collection" terminology is too easily confused with "core", and should probably be made more distinct. I just tried to configure two cores on the same Solr instance into the same collection, and that worked fine, both as distinct shards and as the same shard (replica). The wiki examples give the impression that collection1 in localhost:8983/solr/collection1/select?distrib=true is some magic collection identifier, but what it really does is run the query on the *core* named collection1, look up what collection that core is part of, and distribute the query to all shards in that collection.

c) ZK is not designed to store large files. While the files in conf are normally well below the 1M limit ZK imposes, we should perhaps consider using a lightweight distributed object or k/v store for holding /CONFIGS and let ZK store a reference only.

d) How are admins supposed to update configs in ZK? Install their favourite ZK editor?

e) We should perhaps not be so afraid to make ZK a requirement for Solr in v4. Ideally you should interact with a 1-node Solr in the same manner as you do with a 100-node Solr. An example is the Admin GUI, where the schema and solrconfig links assume a local file. This requires decent tool support to make ZK interaction intuitive, such as import and export commands.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 19. jan. 2011, at 21.07, Mark Miller wrote:

Hello Users,

About a little over a year ago, a few of us started working on what we called SolrCloud. This initial bit of work was really a combination of laying some groundwork - figuring out how to integrate ZooKeeper with Solr in a limited way, dealing with some infrastructure - and picking off some low-hanging search-side fruit. The next step is the indexing side, and we plan on starting to tackle that sometime soon. But first - could you help with some feedback? Some people are using our SolrCloud start - I have seen evidence of it ;) Some, even in production. I would love to have [...]
Solr seems to be using partial words to do sort by
I am trying what I think is a very simple sort with Solr, but my results are confusing. It appears that Solr is using any word in the field I want to sort on to do the sort. I am returning only the sorted field (just for this example) and asking that it be sorted desc. I am using Solr 1.4.1.

Here is my query:

http://localhost:8280/solr/catalogProductSearch/select/?facet=true&fl=title&sort=title%20desc&start=1&q=category_level1:%28%22Fiction*Historical%22%29&wt=xml&rows=15&version=2.2

I have attached my schema.xml and a text file with my results. I am assuming it has to do with my schema config, but I am stumped as to what it might be. Any thoughts?

Michael Hayes

schema.xml
Description: schema.xml

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="response" numFound="42" start="1">
    <doc><str name="title">Wild Irish, The</str></doc>
    <doc><str name="title">Widow's War, The</str></doc>
    <doc><str name="title">Fatal Waltz, A</str></doc>
    <doc><str name="title">Annette Vallon</str></doc>
    <doc><str name="title">Vagabond</str></doc>
    <doc><str name="title">Daughter Of Troy</str></doc>
    <doc><str name="title">To the Tower Born</str></doc>
    <doc><str name="title">Gallows Thief</str></doc>
    <doc><str name="title">Fool's Tale, The</str></doc>
    <doc><str name="title">Archer's Tale, The</str></doc>
    <doc><str name="title">Sword Song</str></doc>
    <doc><str name="title">Sundial in a Grave: 1610, A</str></doc>
    <doc><str name="title">Beneath a Silent Moon</str></doc>
    <doc><str name="title">Sharpe's Fury</str></doc>
    <doc><str name="title">Sharpe's Escape</str></doc>
  </result>
</response>
Re: Solr seems to be using partial words to do sort by
Hi,

Your schema says that title is type="string", so this should work. However, has title always been a string, or have you changed it from text without doing a complete re-index at some point?

A few hints:

* When using type="string", you will not get matches for each single word in your catch-all text field. Add another field of type "text" for searching, and a separate field for sorting.
* The separate sorting field (e.g. title_sort) could use type="alphaOnlySort" or some other fieldtype with lowercasing, to avoid case-sensitive sorting.
* If your index is large, consider putting a cap on the number of bytes used for title_sort by adding maxChars="20" to the copyField tag.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 14. feb. 2011, at 16.41, Michael Hayes wrote:
> I am trying what I think is a very simple sort with Solr but my results are confusing. It appears that Solr is using any word in the field I want to sort on to do the sort. [...]
Re: Guidance for event-driven indexing
Hi,

One option would be to keep the JMS listener as today, but move the downloading and transforming part to an UpdateRequestProcessor on each shard. The benefit is that you ship only a tiny SolrInputDocument over the wire, with a reference to the doc to download, and do the heavy lifting on the Solr side.

If each JMS topic/channel corresponds to a particular shard, you could move the whole thing to Solr. If so, a new JMSUpdateHandler could perhaps be the way to go?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 14. feb. 2011, at 16.53, Rich Cariens wrote:
> Hello, I've built a system that receives JMS events containing links to docs that I must download and index. Right now the JMS receiving, downloading, and transformation into SolrInputDocs happens in a separate JVM that then uses SolrJ javabin HTTP POSTs to distribute these docs across many index shards. For various reasons I won't go into here, I'd like to relocate/deploy most of my processing (JMS receiving, downloading, and transformation into SolrInputDocs) into the Solr webapps running on each distributed shard host. I might be wrong, but I don't think the request-driven idiom of the DataImportHandler is a good fit for me, as I'm not kicking off full or delta imports. If that's true, what's the correct way to hook my components into Solr's update facilities? Should I try to get a reference to a configured DirectUpdateHandler? I don't know if this information helps, but I'll put it out there anyway: I'm using Spring 3 components to receive JMS events, wired up via webapp context hooks. My plan would be to add all that to my Solr shard webapp. Best, Rich
Re: Solr seems to be using partial words to do sort by
Thanks for the help. You were right that it did change from text to string, and I think I forgot to restart the server to get the new schema loaded. But I did create a new title_sort field of type alphaOnlySort, and that worked.

Michael

On Feb 14, 2011, at 10:10 AM, Jan Høydahl wrote:
> Hi, Your schema says that title is type=string, so this should work. However, has title always been string, or have you changed it from text without doing a complete re-index at some point? [...]
Solr 1.4 requestHandler update Runtime disable/enable
Is it possible to disable or enable the update request handler at runtime? I have two servers and would like to turn off the update handler on the master, then replicate the master to the slave and switch the slave to master.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-1-4-requestHandler-update-Runtime-disable-enable-tp2493745p2493745.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Guidance for event-driven indexing
Thanks Jan,

I don't think I want to tie up a thread on two boxes waiting for an UpdateRequestProcessor to finish. I'd prefer to offload it all to the target shards. And a special JMSUpdateHandler feels like overkill.

I *think* I'm really just looking for a simple API that allows me to add a SolrInputDocument to the index in-process. Perhaps I just need to use the EmbeddedSolrServer in the SolrJ packages? I'm worried that this will break all the nice stuff one gets with the standard SOLR webapp (stats, admin, etc).

Best,
Rich

On Mon, Feb 14, 2011 at 11:18 AM, Jan Høydahl jan@cominvent.com wrote:
> Hi, One option would be to keep the JMS listener as today but move the downloading and transforming part to an UpdateRequestProcessor on each shard. [...]
Re: Solr Spellcheck on Large index size
I know this is old, but I caught it while looking for some help on spellcheck. I see that you are copying a set of fields to the field named 'text' and then copying 'text' to 'spell'. According to the documentation (http://wiki.apache.org/solr/SchemaXml#Copy_Fields), no copy feeds into another copy, so it may be that no data is actually being copied to your index, and there is nothing to build the spell checker from. I also noticed that the 'text' and 'spell' fields are not defined, but I wasn't sure if they were just omitted to save space. Anyway, good luck.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Spellcheck-on-Large-index-size-tp760416p2496116.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Which version of Solr?
I figured instead of trying to index content, I'd simply issue a query via SolrJ. This seems related to my problem below. I create a CommonsHttpSolrServer instance in the manner already described, and in a new method:

    @Override
    public List<String> getNodeIdsForProductId(final String productId, final String partnerId) {
        final List<String> nodes = new ArrayList<String>();
        final CommonsHttpSolrServer solrServer = (CommonsHttpSolrServer) getSolrServer(partnerId);
        final SolrQuery query = new SolrQuery();
        query.setQuery("productId:" + productId);
        query.addField("nodeId");
        try {
            final QueryResponse response = solrServer.query(query);
            final SolrDocumentList docs = response.getResults();
            log.info(String.format("getNodeIdsForProductId - got %d nodes for productId: %s",
                    docs.getNumFound(), productId));
            for (SolrDocument doc : docs) {
                log.info(doc);
            }
        } catch (SolrServerException ex) {
            final String msg = String.format("Unable to query Solr server %s, for query: %s",
                    solrServer.getBaseURL(), query);
            log.error(msg);
            throw new ServiceException(msg, ex);
        }
        return nodes;
    }

When issuing the query I get:

2011-02-14 13:13:28 INFO solr.SolrProductIndexService - getSolrServer - Solr url: http://localhost:8080/solr/partner-tmo
2011-02-14 13:13:28 INFO solr.SolrProductIndexService - getSolrServer - construct server for url: http://localhost:8080/solr/partner-tmo
2011-02-14 13:13:28 ERROR solr.SolrProductIndexService - Unable to query Solr server http://localhost:8080/solr/partner-tmo, for query: q=productId%3Aproduct4&fl=nodeId
...
Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.NoHttpResponseException: The server localhost failed to respond
	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:484)
...
Caused by: org.apache.commons.httpclient.NoHttpResponseException: The server localhost failed to respond
	at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1976)

If I run this through the proxy again, I can see the request being made as:

GET /solr/partner-tmo/select?q=productId%3Aproduct4&fl=nodeId&wt=xml&version=2.2 HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer] 1.0
Host: localhost:8080

And I get no response from Solr. If instead I use this URL in Firefox:

http://localhost:8080/solr/partner-tmo/select?q=productId%3Aproduct4&fl=nodeId&wt=xml&version=2.2

I get search results. What is it about SolrJ that is just not working out? What basic thing am I missing? Using Firefox here, or curl below, I can talk to Solr (running in Tomcat 6) just fine. But when going via SolrJ, I cannot update or query. All of this stuff is running on a single system. I guess I'll try a simpler app/unit test to see what happens... This is really a big problem for me. Any suggestions are greatly appreciated.

Thanks,
Jeff

On Feb 13, 2011, at 9:15 PM, Jeff Schmidt wrote:

Hello again:

Back to the javabin issue. On Feb 12, 2011, at 6:07 PM, Lance Norskog wrote:

>> But I'm unable to get SolrJ to work due to the 'javabin' version mismatch. I'm using the 1.4.1 version of SolrJ, but I always get an HTTP response code of 200, but the return entity is simply a null byte, which does not match the version number of 1 defined in Solr common.
> I've never seen this problem. At this point you are better off starting with 3.x instead of chasing this problem down.

I'm now using the latest branch_3x built Solr and SolrJ. In other places I've seen the message:

Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 0) or the data in not in 'javabin' format
	at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
	at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)

One poster was told to make sure the versions of Solr and SolrJ are compatible, and that the schema is valid. Unlike 1.4, 3.1 actually outputs the expected and received version numbers, which is helpful. You can see the invalid version of 0 indicated, which is the zero byte I receive in response. I have Solr running within Tomcat by following the wiki. I have the conf/Catalina/localhost/solr.xml file set as:

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/usr/local/ingenuity/isec/solr/apache-solr-3.1-SNAPSHOT.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" [...]
Re: Deploying Solr CORES on OVH Cloud
The first two questions are almost like religion, and I am not sure we want to start a debate. Core setup is fairly easy: add a solr.xml file and subdirectories, one per core (see the example/ directory). Make sure you use the right URL for the admin console.

On Mon, Feb 14, 2011 at 3:38 AM, Rosa (Anuncios) rosaemailanunc...@gmail.com wrote:
> Hi, I'm a bit new in Solr. I'm trying to set up a bunch of server (just for solr) on OVH cloud (http://www.ovh.co.uk/cloud/) and create new cores as needed on each server. [...] I've followed all the wiki but i can't get it working the CORES... Impossible to create CORE or access my cores? Does anyone have a working config to share?
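To make the multicore layout concrete, here is a sketch of a minimal solr home. The paths and core names are placeholders; the solr.xml format shown is the multicore format from the stock example, and you would copy conf/ (schema.xml, solrconfig.xml) from the example into each core's directory.

```shell
# Sketch of a minimal multicore solr home (placeholder paths and names).
SOLR_HOME=/tmp/solrhome
mkdir -p "$SOLR_HOME/core0/conf" "$SOLR_HOME/core1/conf"
cat > "$SOLR_HOME/solr.xml" <<'EOF'
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>
EOF
# Each core then answers at its own path, e.g. http://host:port/solr/core0/admin/
ls "$SOLR_HOME"
```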
slave out of sync
Hi, We're thinking of having a master-slave configuration where there are multiple slaves. Let's say that during replication, one of the slaves does not replicate properly. How would we detect that that slave is out of sync? Tri
rollback to other versions of index
Hi, Does Solr version each index build? We'd like to be able to roll back not just to the previous version, but maybe to a few versions before the current one. Thanks, Tri
Keeping Tokens Whole by Force?
Hi, I have a Tokenizer that ensures emails are kept whole, but I then run the token stream through a set of filters that inevitably breaks the email back into its components. Some of these filters are proprietary, and I can't modify them. It seems hacky to include an EmailRecoveryFilter at the end of my stream to take component tokens and reconstruct an email; frankly, that could result in incorrect reconstruction in some cases. Ideally, I'd have a way to mark a Token as KEEP_WHOLE such that downstream TokenFilters simply ignore it. Is there a precedent for this, or do I need to implement it myself? And if so, would you suggest I use the Token type, the Payload, or a wrapper around Token itself to store the flag? Thanks! Tavi
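To make the KEEP_WHOLE idea concrete, here is a toy sketch using plain Java stand-ins rather than the real Lucene Token/TokenFilter classes (all class and flag names here are hypothetical; Lucene's Token does carry an int flags field via setFlags/getFlags that could hold such a marker, though stock filters do not consult it):

```java
import java.util.ArrayList;
import java.util.List;

public class KeepWholeDemo {
    // Hypothetical flag bit reserved by convention between tokenizer and filters.
    static final int KEEP_WHOLE = 1;

    // Toy stand-in for a Lucene Token: just text plus a flags field.
    static class Tok {
        final String text;
        final int flags;
        Tok(String text, int flags) { this.text = text; this.flags = flags; }
    }

    // A downstream "filter" that splits on '@' unless the token is marked KEEP_WHOLE.
    static List<Tok> splitFilter(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            if ((t.flags & KEEP_WHOLE) != 0) {
                out.add(t); // marked whole: pass through untouched
            } else {
                for (String part : t.text.split("@")) out.add(new Tok(part, 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> in = new ArrayList<>();
        in.add(new Tok("user@example.com", KEEP_WHOLE)); // tokenizer marked the email
        in.add(new Tok("plain@fragment", 0));            // unmarked: gets split
        for (Tok t : splitFilter(in)) System.out.println(t.text);
    }
}
```

The catch, as noted above, is that filters you cannot modify will never check the flag, which is why a cooperative convention like this only works across filters you control.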
Re: boosting results by a query?
Found something that works great! In 3.1+ we can sort by a function query, so: sort=query({!lucene v='field:value'}) desc, score desc will put everything that matches 'field:value' first, then order the rest by score. Check: http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function

On Fri, Feb 11, 2011 at 4:31 PM, Ryan McKinley ryan...@gmail.com wrote:

I have an odd need, and want to make sure I am not reinventing a wheel... Similar to the QueryElevationComponent, I need to be able to move documents that match a given query to the top of a list. If there were no sort, then this could be implemented easily with BooleanQuery (I think), but with sort it gets more complicated. Seems like I need:

sortSpec.setSort(new Sort(new SortField[] {
    new SortField( /* something that only sorts results in the boost query */ ),
    new SortField( /* the regular sort */ )
}));

Is there an existing FieldComparator I should look at? Any other pointers/ideas? Thanks ryan
Re: Faceting Query
I am also working on this feature in Solr 4.0, and I have a doubt about the results I am getting. I will post the cases here; if anyone knows why it is so, please reply.

I ran a normal facet query with q=*:* and facet=on&facet.field=stock&facet.field=place&facet.field=quantity&facet.mincount=1. The results I got were:

facet_fields:
  stock: rice (10), bean (10), wheat (10), jowar (10)
  place: bangalore (10), Kolar (10)
  quality: standard (10), high (10)

Now when I run a facet.pivot query with the same q parameter (q=*:*) and the same data set, facet.pivot=stock,place,quality&facet.mincount=1, the result I get is like this:

  rice > bangalore > high
  bean > bangalore > standard
  jowar

The point is: why am I not getting the result hierarchy for wheat, when it shows up in the flat faceting above? Awaiting a reply. Regards, Rajani Maski

On Mon, Feb 14, 2011 at 4:18 PM, rajini maski rajinima...@gmail.com wrote:

This feature works in the Solr 4.0 release. You can follow this link to learn how it works: http://solr.pl/en/2010/10/25/hierarchical-faceting-pivot-facets-in-trunk/ Regards Rajani Maski

On Mon, Feb 14, 2011 at 4:05 PM, Isha Garg isha.g...@orkash.com wrote: On Friday 11 February 2011 11:34 PM, Gora Mohanty wrote: On Thu, Feb 10, 2011 at 12:21 PM, Isha Garg isha.g...@orkash.com wrote: What is the facet.pivot field? Please explain with an example. Does http://wiki.apache.org/solr/SimpleFacetParameters#facet.pivot not help?
Regards, Gora

No, it is not showing any pivot results in my case:

http://localhost:8984/solr/worldNews/select/?q=*%3A*&version=2.2&start=0&rows=0&indent=on&facet.pivot=category,country,KeyLocation&facet.pivot=country,category&facet=true&facet.field=category&wt=json

Output is:

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "facet": "true",
      "indent": "on",
      "start": "0",
      "q": "*:*",
      "facet.field": "category",
      "wt": "json",
      "facet.pivot": ["category,country,KeyLocation", "country,category"],
      "version": "2.2",
      "rows": "0"}},
  "response": {"numFound": 6775, "start": 0, "docs": []},
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {
      "category": [
        "Counterfeiting and Piracy", 2367,
        "Social Unrest", 2143,
        "Security Measures", 1064,
        "Fraud and Cheating", 356,
        "Naxelites", 266,
        "Terrorism", 243,
        "Sex Crime", 232,
        "Shiv Sena", 76,
        "Major Crime", 23,
        "Drug Running and Organized Crime", 5]},
    "facet_dates": {}}}
Re: Difference between Solr and Lucidworks distribution
Right. LWE binaries are distributed for free, and may be used for non-production purposes. For a production deployment, separate subscriptions are required. It is effectively the same as requiring payment for a production deployment license bundled with a support subscription.

On Sun, Feb 13, 2011 at 8:29 AM, Adam Estrada estrada.adam.gro...@gmail.com wrote:

I believe that the LucidWorks distro of Solr is free, and as you mentioned, they only appear to sell their services for it. I have used that version for several demos because it does seem to have all the bells and whistles already included, and it's super easy to set up. The only downside in my case is that they are still on the official release version 1.4.1, which has an older version of PDFBox that doesn't parse PDFs generated from newer Adobe software. Thanks Adobe ;-) It's easy enough to just rebuild Tika, PDFBox, FontBox, etc. and swap them out. If you want spatial support, you can use the plugin from the Spatial Solr project out of the Netherlands, which is designed to support 1.4.1 and, from what I can tell, seems to work pretty well. Anyway, when 4.0 is released, hopefully with the extended spatial support from projects like SIS and JTS, I hope to see the official distro version from Lucid change. Thanks for all the hard work the Lucid team has provided over the years! Adam

On Feb 12, 2011, at 10:55 PM, Andy wrote:

Now I'm confused. In http://www.lucidimagination.com/lwe/subscriptions-and-pricing, the price of LucidWorks Enterprise Software is stated as FREE. I thought the price for Production was for the support service, not for the software. But you seem to be saying that 'LucidWorks Enterprise' is separate software that isn't free. Did I misunderstand?
--- On Sat, 2/12/11, Lance Norskog goks...@gmail.com wrote: From: Lance Norskog goks...@gmail.com Subject: Re: Difference between Solr and Lucidworks distribution To: solr-user@lucene.apache.org, markus.jel...@openindex.io Date: Saturday, February 12, 2011, 8:10 PM There are two distributions. The company is Lucid Imagination. 'Lucidworks for Solr' is the certified distribution of Solr 1.4.1, with several enhancements. Markus refers to 'LucidWorks Enterprise', which is LWE. This is a separate app with tools and a REST API for managing a Solr instance. Lance Norskog On Fri, Feb 11, 2011 at 8:36 AM, Markus Jelsma markus.jel...@openindex.io wrote: It is not free for production environments. http://www.lucidimagination.com/lwe/subscriptions-and-pricing On Friday 11 February 2011 17:31:22 Greg Georges wrote: Hello all, I just started watching the webinars from Lucidworks, and they mention their distribution which has an installer, etc.. Is there any other differences? Is it a good idea to use this free distribution? Greg -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350 -- Lance Norskog goks...@gmail.com -- Lance Norskog goks...@gmail.com
Re: Which version of Solr?
II! I feel your pain!

On Mon, Feb 14, 2011 at 3:27 PM, Jeff Schmidt j...@535consulting.com wrote:

Wow, okay, it's Cassandra's fault... :) I created unit tests to use HttpClient and even HttpURLConnection; the former got the non-response from the server, and the latter got an unexpected end of file. But if I used curl or telnet, things would work. Anyway, I noticed (Mac OS X 10.6.6):

[imac:apache/cassandra/apache-cassandra-0.7.0] jas% netstat -an | grep 8080
tcp4    0    0  *.8080    *.*    LISTEN
tcp46   0    0  *.8080    *.*    LISTEN
[imac:apache/cassandra/apache-cassandra-0.7.0] jas%

After shutting down Tomcat, the tcp4 line would still show up. Only after also shutting down Cassandra were there no listeners on port 8080. Starting Tomcat and Cassandra in either order, neither failed to bind to 8080. Why my Java programs tried to talk to Cassandra, while telnet, Firefox, curl, etc. managed to hook up with Solr, I don't know. I moved Tomcat to port 8090 and things are good... Sigh. What a big waste of time. Cheers, Jeff

On Feb 14, 2011, at 2:29 PM, Jeff Schmidt wrote:

I figured instead of trying to index content, I'd simply issue a query via SolrJ. This seems related to my problem below.
I create a CommonsHttpSolrServer instance in the manner already described, and in a new method:

@Override
public List<String> getNodeIdsForProductId(final String productId, final String partnerId) {
    final List<String> nodes = new ArrayList<String>();
    final CommonsHttpSolrServer solrServer = (CommonsHttpSolrServer) getSolrServer(partnerId);
    final SolrQuery query = new SolrQuery();
    query.setQuery("productId:" + productId);
    query.addField("nodeId");
    try {
        final QueryResponse response = solrServer.query(query);
        final SolrDocumentList docs = response.getResults();
        log.info(String.format("getNodeIdsForProductId - got %d nodes for productId: %s",
                docs.getNumFound(), productId));
        for (SolrDocument doc : docs) {
            log.info(doc);
        }
    } catch (SolrServerException ex) {
        final String msg = String.format("Unable to query Solr server %s, for query: %s",
                solrServer.getBaseURL(), query);
        log.error(msg);
        throw new ServiceException(msg, ex);
    }
    return nodes;
}

When issuing the query I get:

2011-02-14 13:13:28 INFO solr.SolrProductIndexService - getSolrServer - Solr url: http://localhost:8080/solr/partner-tmo
2011-02-14 13:13:28 INFO solr.SolrProductIndexService - getSolrServer - construct server for url: http://localhost:8080/solr/partner-tmo
2011-02-14 13:13:28 ERROR solr.SolrProductIndexService - Unable to query Solr server http://localhost:8080/solr/partner-tmo, for query: q=productId%3Aproduct4&fl=nodeId
...
Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.NoHttpResponseException: The server localhost failed to respond
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:484)
...
Caused by: org.apache.commons.httpclient.NoHttpResponseException: The server localhost failed to respond
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1976)

If I run this through the proxy again, I can see the request being made as:

GET /solr/partner-tmo/select?q=productId%3Aproduct4&fl=nodeId&wt=xml&version=2.2 HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer] 1.0
Host: localhost:8080

And I get no response from Solr. If instead I use this URL in Firefox:

http://localhost:8080/solr/partner-tmo/select?q=productId%3Aproduct4&fl=nodeId&wt=xml&version=2.2

I get search results. What is it about SolrJ that is just not working out? What basic thing am I missing? Using Firefox here, or curl below, I can talk to Solr (running in Tomcat 6) just fine. But when going via SolrJ, I cannot update or query. All of this stuff is running on a single system. I guess I'll try a simpler app/unit test to see what happens... This is really a big problem for me. Any suggestions are greatly appreciated. Thanks, Jeff

On Feb 13, 2011, at 9:15 PM, Jeff Schmidt wrote:

Hello again: Back to the javabin issue:

On Feb 12, 2011, at 6:07 PM, Lance Norskog wrote:

--- But I'm unable to get SolrJ to work due to the 'javabin' version mismatch. I'm using the 1.4.1 version of SolrJ, but I always get an HTTP response code of 200, and the return entity is simply a null byte, which does not match the version number of 1 defined in Solr common. ---
Re: slave out of sync
We wrote a utility that looks at the index version on both slaves and complains if they are not at the same version... Bill On 2/14/11 5:19 PM, Tri Nguyen tringuye...@yahoo.com wrote: Hi, We're thinking of having a master-slave configuration where there are multiple slaves. Let's say during replication, one of the slaves does not replicate properly. How will we dectect that the 1 slave is out of sync? Tri
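For anyone wanting to roll their own version of such a utility, here is a minimal sketch assuming the Java replication handler is enabled on the slaves (the /replication?command=indexversion command is real; the hostnames and the exact response shape here are assumptions, so the demo runs the comparison on canned responses):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyncCheck {
    // Pull the indexversion out of a replication-handler XML response body.
    // Assumes the body contains <long name="indexversion">N</long>.
    static long indexVersion(String body) {
        Matcher m = Pattern.compile("<long name=\"indexversion\">(\\d+)</long>").matcher(body);
        if (!m.find()) throw new IllegalArgumentException("no indexversion in response");
        return Long.parseLong(m.group(1));
    }

    public static void main(String[] args) throws Exception {
        // Against live slaves (hypothetical hostnames) one would fetch each body, e.g.:
        //   new java.util.Scanner(new java.net.URL(
        //       "http://slave1:8983/solr/replication?command=indexversion").openStream())
        //       .useDelimiter("\\A").next();
        // Here we compare two canned responses instead:
        String a = "<response><long name=\"indexversion\">1297702429000</long></response>";
        String b = "<response><long name=\"indexversion\">1297702429000</long></response>";
        System.out.println(indexVersion(a) == indexVersion(b) ? "in sync" : "OUT OF SYNC");
    }
}
```

Run periodically from cron against every slave, any mismatch (or a version that lags the master's) flags the out-of-sync replica.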
Re: Which version of Solr?
Wow; I'm glad you figured it out -- sort of. FYI, in the future, don't hijack email threads to talk about a new subject. Start a new thread. ~ David p.s. yes, I'm working on the 2nd edition. - Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
carrot2 clustering component error
Help me out with this error: java.lang.NoClassDefFoundError: org/apache/solr/util/plugin/SolrCoreAware
Re: Errors when implementing VelocityResponseWriter
Looks like you're missing the Velocity JAR. It needs to be in a Solr-visible lib directory. With 1.4.1 you'll need to put it in solr-home/lib. In later versions, you can use the <lib> elements in solrconfig.xml to point to other directories. Erik

On Feb 14, 2011, at 10:41, McGibbney, Lewis John wrote:

Hello List, I am currently trying to implement the above in Solr 1.4.1. Having moved the velocity directory from $SOLR_DIST/contrib/velocity/src/main/solr/conf to my webapp /lib directory, then adding <queryResponseWriter name="blah" class="blah"> followed by the responseHandler specifics, I am shown the following terminal output. I also added <lib dir="./lib" /> in solrconfig. Can anyone suggest what I have not included in the config that is still required? Thanks Lewis

SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.response.VelocityResponseWriter'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
    at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
    at org.apache.solr.core.SolrCore.initWriters(SolrCore.java:1408)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:547)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
    at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:98)
    at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4382)
    at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5040)
    at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5035)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: org.apache.solr.response.VelocityResponseWriter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)
    ... 21 more
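For reference, the wiring Erik describes for later Solr versions might look something like the fragment below. The writer name and lib path are illustrative assumptions; only the class name comes from the stack trace above:

```xml
<!-- solrconfig.xml fragment: make extra jars visible, then register the writer -->
<lib dir="./lib" />
<queryResponseWriter name="velocity"
                     class="org.apache.solr.response.VelocityResponseWriter"/>
```

On 1.4.1 itself, the <lib> element is not available, so the Velocity JAR (and its dependencies) must sit in solr-home/lib instead, exactly as Erik says.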