Re: Parallel SQL - column not found intermittent error
I have seen this with very few indexed documents and multiple shards. In such a case, some shards may not have any documents, and when the query happens to hit such a shard, it does not find the fields it's looking for and turns this into "column not found". If you resubmit the query and hit a different shard (with docs), the query will succeed. On 6/14/2017 11:42 AM, Susheel Kumar wrote: > Yes, Joel. Kind of every other command runs into this issue. I just > executed the queries below and 3 of them failed while 1 succeeded. I just > have 6 documents ingested and no further indexing going on. Let me know > what else to look at regarding the state of the index. > > > ➜ solr-6.6.0 curl --data-urlencode 'stmt=SELECT sr_sv_userFirstName as > firstName, sr_sv_userLastName as lastName FROM collection1 ORDEr BY > dv_sv_userLastName LIMIT 15' > http://server17:8984/solr/collection1/sql\?aggregationMode\=facet > > > {"result-set":{"docs":[{"EXCEPTION":"Failed to execute sqlQuery 'SELECT > sr_sv_userFirstName as firstName, sr_sv_userLastName as lastName FROM > collection1 ORDEr BY dv_sv_userLastName LIMIT 15' against JDBC connection > 'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT > sr_sv_userFirstName as firstName, sr_sv_userLastName as lastName FROM > collection1 ORDEr BY dv_sv_userLastName LIMIT 15\": From line 1, column 9 > to line 1, column 27: Column 'sr_sv_userFirstName' not found in any > table","EOF":true,"RESPONSE_TIME":85}]}} > > > ➜ solr-6.6.0 curl --data-urlencode 'stmt=SELECT sr_sv_userFirstName as > firstName, sr_sv_userLastName as lastName FROM collection1 ORDEr BY > dv_sv_userLastName LIMIT 15' > http://server17:8984/solr/collection1/sql\?aggregationMode\=facet > > > {"result-set":{"docs":[{"firstName":"Thiago","lastName":"Diego"},{"firstName":"John","lastName":"Jagger"},{"firstName":"John","lastName":"Jagger"},{"firstName":"John","lastName":"Johny"},{"firstName":"Isabel","lastName":"Margret"},{"firstName":"Isabel","lastName":"Margret"},{"EOF":true,"RESPONSE_TIME":241}]}} > > > ➜ solr-6.6.0 curl --data-urlencode 'stmt=SELECT sr_sv_userFirstName as > firstName, sr_sv_userLastName as lastName FROM collection1 ORDEr BY > dv_sv_userLastName LIMIT 15' > http://server17:8984/solr/collection1/sql\?aggregationMode\=facet > > > > {"result-set":{"docs":[{"EXCEPTION":"Failed to execute sqlQuery 'SELECT > sr_sv_userFirstName as firstName, sr_sv_userLastName as lastName FROM > collection1 ORDEr BY dv_sv_userLastName LIMIT 15' against JDBC connection > 'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT > sr_sv_userFirstName as firstName, sr_sv_userLastName as lastName FROM > collection1 ORDEr BY dv_sv_userLastName LIMIT 15\": From line 1, column 9 > to line 1, column 27: Column 'sr_sv_userFirstName' not found in any > table","EOF":true,"RESPONSE_TIME":87}]}} > > On Wed, Jun 14, 2017 at 11:18 AM, Joel Bernstein wrote: > >> Are you able to reproduce the error, or is it just appearing in the logs? >> >> Do you know the state of the index when it's occurring? >> >> Joel Bernstein >> http://joelsolr.blogspot.com/ >> >> On Wed, Jun 14, 2017 at 11:09 AM, Susheel Kumar >> wrote: >> >>> I have set up Solr 6.6.0 on local (local ZK and Solr) and then on servers >> (3 >>> ZK and 2 machines, 2 shards), and in both environments I see this >> intermittent >>> error "column not found". The same query sometimes works and other times >>> fails. >>> >>> Is that a bug or am I missing something...
>>> >>> >>> Console >>> === >>> >>> -> solr-6.6.0 curl --data-urlencode 'stmt=SELECT dv_sv_userFirstName as >>> firstName, dv_sv_userLastName as lastName FROM collection1 ORDEr BY >>> dv_sv_userLastName LIMIT 15' >>> http://server17:8984/solr/collection1/sql\?aggregationMode\=facet >>> >>> {"result-set":{"docs":[{"EXCEPTION":"Failed to execute sqlQuery 'SELECT >>> dv_sv_userFirstName as firstName, dv_sv_userLastName as lastName FROM >>> collection1 ORDEr BY dv_sv_userLastName LIMIT 15' against JDBC connection >>> 'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT >>> dv_sv_userFirstName as firstName, dv_sv_userLastName as lastName FROM >>> collection1 ORDEr BY dv_sv_userLastName LIMIT 15\": From line 1, column 9 >>> to line 1, column 27: Column 'dv_sv_userFirstName' not found in any >>> table","EOF":true,"RESPONSE_TIME":78}]}} >>> >>> ➜ solr-6.6.0 curl --data-urlencode 'stmt=SELECT dv_sv_userFirstName as >>> firstName, dv_sv_userLastName as lastName FROM collection1 ORDEr BY >>> dv_sv_userLastName LIMIT 15' >>> http://server17:8984/solr/collection1/sql\?aggregationMode\=facet >>> >>> {"result-set":{"docs":[{"EXCEPTION":"Failed to execute sqlQuery 'SELECT >>> dv_sv_userFirstName as firstName, dv_sv_userLastName as lastName FROM >>> collection1 ORDEr BY dv_sv_userLastName LIMIT 15' against JDBC connection >>> 'jdbc:calcitesolr:'.\nError while executing SQL \"SELECT >>> dv_sv_userFirstName as firstName, dv_sv_userLastName as lastName FROM >>> collection1 ORDEr BY
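One way to confirm the empty-shard theory is to hit each core directly with distrib=false and compare document counts; a core that reports numFound of 0 for a match-all query is the likely culprit. The host and collection below are just the ones from this thread; run the same request against every replica core:

curl "http://server17:8984/solr/collection1/select?q=*:*&rows=0&distrib=false"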
Re: index multiple files into one index entity
No, the implementation was very specific to my needs. On 5/27/2013 8:28 AM, Alexandre Rafalovitch wrote: You did not open source it by any chance? :-) Regards, Alex.
Re: CoreAdmin STATUS performance
On 1/9/2013 10:38 AM, Shahar Davidson wrote: Hi All, I have a client app that uses SolrJ and needs to collect the names (and just the names) of all loaded cores. I have about 380 Solr cores on a single Solr server (net indices size is about 220GB). Running the STATUS action takes about 800ms - that seems a bit too long, given my requirements. So here are my questions: 1) Is there any way to get _only_ the core name of all cores? If you have access to the filesystem, you could just read solr.xml, where all cores are listed.
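As a side note, newer Solr versions accept an indexInfo=false parameter on the STATUS action, which skips the per-core index statistics and makes the call much cheaper when only the names are needed (worth verifying that your version supports it):

curl "http://localhost:8983/solr/admin/cores?action=STATUS&indexInfo=false"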
Re: SolrJ | Add a date field to ContentStreamUpdateRequest
On 12/30/2012 11:57 AM, uwe72 wrote: Hi there, how can I add a date field to a PDF document? Same way you add the ID field, using the literal parameter:

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(pdfFile, "application/octet-stream");
up.setParam("literal." + SolrConstants.ID, solrPDFId);

Regards Uwe
Re: SolrJ | Add a date field to ContentStreamUpdateRequest
On 12/30/2012 3:55 PM, uwe72 wrote: But I can just add String values. I want to add Date objects?! You represent the Date as a String, in the format Solr uses for dates: http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/schema/DateField.html
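For illustration, a minimal Java sketch producing that string (the literal.myDateField parameter name is hypothetical; Solr dates are always UTC):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDateString {
    public static String toSolrDate(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // Solr expects UTC, e.g. 1995-12-31T23:59:59Z
        return fmt.format(d);
    }

    public static void main(String[] args) {
        // e.g. up.setParam("literal.myDateField", toSolrDate(new Date()));
        System.out.println(toSolrDate(new Date()));
    }
}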
Re: solr4.0 problem zkHost with multiple hosts throws out of range exception
I'm pretty sure this problem has been there forever -- the parsing of zkHost is busted. I believe it's only been intended for example/demo purposes and therefore makes some assumptions about the value. I haven't looked at the current code, but this is my recollection from about a year ago. From: Pascal freqresp...@pensa.fr To: solr-user@lucene.apache.org Sent: Thursday, October 18, 2012 5:45 AM Subject: solr4.0 problem zkHost with multiple hosts throws out of range exception Hi there, I've set up a test Solr 4.0 cloud with some nodes; everything worked fine until I tried to put in more than 1 ZooKeeper instance. If I put only one server it's OK, e.g.: java -DzkHost=10.0.0.1:9983 -DzkRun -jar start.jar But if I put more than 1 server in the zkHost param, an exception is thrown immediately when parsing the zkHost parameter. Example: java -DzkHost=10.0.0.1:9983,10.0.0.2:9983 -DzkRun -jar start.jar

[...]
SEVERE: null:java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:83)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:63)
at org.apache.solr.cloud.SolrZkServerProps.setClientPort(SolrZkServer.java:315)
at org.apache.solr.cloud.SolrZkServerProps.getMySeverId(SolrZkServer.java:278)
at org.apache.solr.cloud.SolrZkServerProps.parseProperties(SolrZkServer.java:453)
at org.apache.solr.cloud.SolrZkServer.parseConfig(SolrZkServer.java:90)
at org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:208)
[...]

The "port out of range:-1" looks like the zkHost parameter isn't correctly split as soon as I add a comma to it. I tried to put hostnames instead of IPs with no luck. I tried to search in this forum and on the net but didn't find why. Any idea? Thanks, Pascal
Re: How to import a part of index from main Solr server(based on a query) to another Solr server and then do incremental import at intervals later(the updated index)?
You can merge indexes. You cannot split them. jefferyyuan yuanyun...@gmail.com wrote: Thanks for the reply, but I think SolrReplication may not help in this case, as we don't want to replicate all indexes to solr2, just a part of the index (the index of docs created by me). It seems SolrReplication doesn't support replicating a part of the index (based on a query) to the slave.
Re: Extract multiple streams into the same document
Answering my own question, for archive's sake: I worked this out by creating my own UpdateRequestProcessor. On 10/4/2012 2:35 PM, Yury Kats wrote: I'm sending streams of data to Solr, using ExtractingRequestHandler to be parsed/extracted by Tika and then indexed. While multiple streams can be passed with a single request to Solr, each stream ends up being indexed into a separate document. Or, if I pass the unique id parameter with the request (as the literal.id parameter), the very last stream ends up overwriting all other streams within the same request, since each one is being indexed into a new document with the same id. I'm looking for a way to have multiple streams indexed into the same document. I have a content field defined for extraction (using the fmap.content parameter) and the field is defined as multiValued in the schema. I would like all streams from the request to be indexed as different values of that multiValued content field in the same document. Any hints or ideas are appreciated. Thanks, Yury
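For reference, the standard way to hook such a processor into the extract handler is an update processor chain in solrconfig.xml. This is only a wiring sketch: the factory class name is hypothetical (the actual merging logic was specific to the author's needs), and on Solr 3.x the parameter is update.processor rather than update.chain:

<updateRequestProcessorChain name="merge-streams">
  <processor class="com.example.MergeStreamsProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="update.chain">merge-streams</str>
  </lst>
</requestHandler>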
Extract multiple streams into the same document
I'm sending streams of data to Solr, using ExtractingRequestHandler to be parsed/extracted by Tika and then indexed. While multiple streams can be passed with a single request to Solr, each stream ends up being indexed into a separate document. Or, if I pass the unique id parameter with the request (as the literal.id parameter), the very last stream ends up overwriting all other streams within the same request, since each one is being indexed into a new document with the same id. I'm looking for a way to have multiple streams indexed into the same document. I have a content field defined for extraction (using the fmap.content parameter) and the field is defined as multiValued in the schema. I would like all streams from the request to be indexed as different values of that multiValued content field in the same document. Any hints or ideas are appreciated. Thanks, Yury
Re: missing core name in path
On 8/16/2012 6:57 AM, Muzaffer Tolga Özses wrote: Also, below are the lines I got when starting it: SEVERE: org.apache.solr.common.SolrException: Schema Parsing Failed: multiple points ... Caused by: java.lang.NumberFormatException: multiple points at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1082) This looks like the version number at the top of the schema has more than one dot, eg 1.2.3. Solr parses version as a floating point number, so it must be 1.23 instead.
Re: Solr 4 Alpha SolrJ Indexing Issue
On 7/18/2012 7:11 PM, Briggs Thompson wrote: I have realized this is not specific to SolrJ but to my instance of Solr. Using curl to delete by query is not working either. Could be this: https://issues.apache.org/jira/browse/SOLR-3432
Re: Could I use Solr to index multiple applications?
On 7/17/2012 9:26 PM, Zhang, Lisheng wrote: Thanks very much for the quick help! Multicore sounds interesting. I roughly read the doc; so we need to put each core name into the Solr config XML? If we add another core and change the XML, do we need to restart Solr? You can add/create cores on the fly, without restarting. See http://wiki.apache.org/solr/CoreAdmin#CREATE
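For example, something along these lines (core name and directories are illustrative):

curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=core2&instanceDir=core2&config=solrconfig.xml&schema=schema.xml&dataDir=data"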
multiValued false-true
I have an indexed, not stored, not multiValued field in the schema. If I change this field to be multiValued, would I need to re-index everything, or would all existing documents (that were indexed while the field was not multiValued) still be queryable? Thanks, Yury
Re: Sort by date field = outofmemory?
This solves the problem by allocating memory up front, instead of at some point later when the JVM needs it. At that later point in time there may not be enough free memory left on the system to allocate. On 7/11/2012 11:04 AM, Michael Della Bitta wrote: There is a school of thought that suggests you should always set Xms and Xmx to the same thing if you expect your heap to hit Xms. This results in your process only needing to allocate the memory once, rather than in a series of little allocations as the heap expands. I can't explain how this fixed your problem, but just a datapoint that might suggest that doing what you did is not such a bad thing. Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Wed, Jul 11, 2012 at 4:05 AM, Bruno Mannina bmann...@free.fr wrote: Hi, some news this morning... I added the -Xms1024m option and now it works?! No OutOfMemory?! java -jar -Xms1024m -Xmx2048m start.jar On 7/11/2012 9:55 AM, Bruno Mannina wrote: Hi Yury, Thanks for your answer. OK to increasing the memory, but I have a problem with that: I have 8GB on my computer but the JVM accepts only 2GB max with the -Xmx option. Is that normal? Thanks, Bruno On 7/11/2012 3:42 AM, Yury Kats wrote: Sorting is a memory-intensive operation indeed. Not sure what you are asking, but it may very well be that your only option is to give the JVM more memory. On 7/10/2012 8:25 AM, Bruno Mannina wrote: Dear Solr Users, Each time I try to do a request with sort=pubdate+desc I get: GRAVE: java.lang.OutOfMemoryError: Java heap space I use Solr 3.6, I have around 80M docs and my request gets around 160 results. Actually for my test I use Jetty: java -jar -Xmx2g start.jar PS: If I write 3g I get an error; I have 8GB RAM. Thanks a lot for your help, Bruno
Re: query syntax to find ??? chars
On 7/11/2012 2:55 PM, Alexander Aristov wrote: content:?? doesn't work :) The ? is a single-character wildcard in the query syntax, so I would try escaping them: content:\?\?\?\?\?\?
Re: Sort by date field = outofmemory?
Sorting is a memory-intensive operation indeed. Not sure what you are asking, but it may very well be that your only option is to give the JVM more memory. On 7/10/2012 8:25 AM, Bruno Mannina wrote: Dear Solr Users, Each time I try to do a request with sort=pubdate+desc I get: GRAVE: java.lang.OutOfMemoryError: Java heap space I use Solr 3.6, I have around 80M docs and my request gets around 160 results. Actually for my test I use Jetty: java -jar -Xmx2g start.jar PS: If I write 3g I get an error; I have 8GB RAM. Thanks a lot for your help, Bruno
Re: get number of cores
On 6/25/2012 8:40 AM, Yuval Dotan wrote: Hi, Is there a *programmatic (Java)* way to connect to the Solr server (using SolrJ probably) and get the number of cores and core names? A STATUS admin request will give you all available cores, with their names. http://wiki.apache.org/solr/CoreAdmin#STATUS
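In SolrJ that boils down to a minimal sketch like this (the server URL is an assumption); passing null to getStatus requests the status of all cores, and the keys of the returned NamedList are the core names:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;

public class ListCores {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        CoreAdminResponse status = CoreAdminRequest.getStatus(null, server); // null = all cores
        for (int i = 0; i < status.getCoreStatus().size(); i++) {
            System.out.println(status.getCoreStatus().getName(i)); // core name
        }
    }
}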
Re: Solr v3.5.0 - numFound changes when paging through results on 8-shard cluster
On 6/19/2012 4:06 PM, Justin Babuscio wrote: Solr v3.5.0 8 Master Shards 2 Slaves Per Master Having confirmed that there are no active records being written, the numFound value is decreasing as we page through the results. For example, Page1 - numFound = 3683 Page2 - numFound = 3683 Page3 - numFound = 3683 Page4 - numFound = 2866 Page5 - numFound = 2419 Page5 - numFound = 1898 Page6 - numFound = 1898 ... PageN - numFound = 1898 It looks like it eventually settles on the real count. Is this a limitation when using a distributed cluster, or is numFound only intended to give an approximation, similar to how Google responds with total hits? numFound should return the real count for any given query. How are you specifying which shards/cores to use for each query? Does this change between queries?
Re: SolrCloud and split-brain
On 6/15/2012 12:49 PM, Otis Gospodnetic wrote: Hi, How exactly does SolrCloud handle split brain situations? Imagine a cluster of 10 nodes. Imagine 3 of them being connected to the network by some switch and imagine the out port of this switch dies. When that happens, these 3 nodes will be disconnected from the other 7 nodes and we'll have 2 clusters, one with 3 nodes and one with 7 nodes, and we'll have a split brain situation. Imagine we had 3 ZK nodes in the original 10-node cluster, 2 of which are connected to the dead switch and are thus aware only of the 3-node cluster now, and 1 ZK instance which is on a different switch and is thus aware only of the 7-node cluster. At this point how exactly does ZK make SolrCloud immune to split brain? A quorum of N/2+1 ZK nodes is required to operate (that's also the reason you need at least 3 to begin with). In your example, the side seeing 2 of the 3 ZK nodes retains quorum and keeps operating, while the side seeing only 1 ZK node loses quorum and stops.
Re: LockObtainFailedException after trying to create cores on second SolrCloud instance
On 6/14/2012 2:05 AM, Daniel Brügge wrote: Will check later to use different data dirs for the core on each instance. But because each Solr sits in its own OpenVZ instance (a virtual server, respectively), they should be totally separated. At least that's my understanding of virtualization. Depending on how your VMs are configured, their filesystems could be mapped to the same place on the host's filesystem. What you describe sounds like this is the case.
Re: copyField
On 5/18/2012 9:54 AM, Tolga wrote: Hi, I've put the line <copyField source="*" dest="text" stored="true" indexed="true"/> in my schema.xml and restarted Solr, crawled my website, and indexed (I've also committed but do I really have to commit?). But I still have to search with content:mykeyword at the admin interface. What do I have to do so that I can search only with mykeyword? Do you have the default field defined?
Re: copyField
On 5/18/2012 4:02 PM, Tolga wrote: Default field? I'm not sure but I think I do. Will have to look. http://wiki.apache.org/solr/SchemaXml#The_Default_Search_Field
Problem parsing queries with forward slashes and multiple fields
I'm running into a problem with queries that contain forward slashes and more than one field. For example, these queries work fine: fieldName:/a fieldName:/* But if I have two fields with similar syntax in the same query, it fails. For simplicity, I'm using the same field twice: fieldName:/a fieldName:/a results in: no field name specified in query and no defaultSearchField defined in schema.xml

SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no defaultSearchField defined in schema.xml
at org.apache.solr.search.SolrQueryParser.checkNullField(SolrQueryParser.java:106)
at org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:124)
at org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:1058)
at org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:358)
at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:257)
at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:212)
at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:170)
at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:118)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:74)
at org.apache.solr.search.QParser.getQuery(QParser.java:143)

fieldName:/* fieldName:/* results in: null

java.lang.NullPointerException
at org.apache.solr.schema.IndexSchema$DynamicReplacement.matches(IndexSchema.java:747)
at org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1026)
at org.apache.solr.schema.IndexSchema.getFieldType(IndexSchema.java:980)
at org.apache.solr.search.SolrQueryParser.getWildcardQuery(SolrQueryParser.java:172)
at org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:1039)
at org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:358)
at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:257)
at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:212)
at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:170)
at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:118)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:74)
at org.apache.solr.search.QParser.getQuery(QParser.java:143)

Any ideas as to what may be wrong and how I can make these work? I'm on a 4.0 snapshot from Nov 29, 2011.
Re: Problem parsing queries with forward slashes and multiple fields
On 2/22/2012 12:25 PM, Yury Kats wrote: I'm running into a problem with queries that contain forward slashes and more than one field. For example, these queries work fine: fieldName:/a fieldName:/* But if I have two fields with similar syntax in the same query, it fails. For simplicity, I'm using the same field twice: fieldName:/a fieldName:/a Looks like escaping forward slashes makes the query work, eg fieldName:\/a fieldName:\/a This is a bit puzzling as the forward slash is not part of the query language, is it?
Re: Problem parsing queries with forward slashes and multiple fields
On 2/22/2012 1:05 PM, Em wrote: Yury, are you sure your request has a proper url-encoding? Yes
Re: Problem parsing queries with forward slashes and multiple fields
On 2/22/2012 1:25 PM, Em wrote: That's strange. Could you provide a sample dataset? Data set does not matter. The query fails to parse, long before it gets to the data.
Re: Problem parsing queries with forward slashes and multiple fields
On 2/22/2012 1:24 PM, Yonik Seeley wrote: This is a bit puzzling as the forward slash is not part of the query language, is it? Regex queries were added that use forward slashes: https://issues.apache.org/jira/browse/LUCENE-2604 Oh, so / is a special character now? I don't think it is mentioned as such on any of the wiki pages, or in org.apache.solr.client.solrj.util.ClientUtils
Re: Problem parsing queries with forward slashes and multiple fields
On 2/22/2012 1:24 PM, Yonik Seeley wrote: Looks like escaping forward slashes makes the query work, eg fieldName:\/a fieldName:\/a This is a bit puzzling as the forward slash is not part of the query language, is it? Regex queries were added that use forward slashes: https://issues.apache.org/jira/browse/LUCENE-2604 Looks like regex matching happens across multiple fields though. Feels like a bug to me?
Re: no such core error with EmbeddedSolrServer
On 1/6/2012 9:57 AM, Phillip Rhodes wrote: On Fri, Jan 6, 2012 at 3:06 AM, Sven Maurmann s...@kippdata.de wrote: Hi, from your snippets the reason is not completely clear. There are a number of reasons for not starting up the server. For example in case of a faulty configuration of the core (solrconfig.xml, schema.xml) the core does not start and you get the reported error. Yeah, that I noticed... I had some such errors earlier, that I noticed when starting the Solr / Jetty standalone instance, but those have been resolved, and now I can launch Solr as a process, and use the SolrJ implementation that talks http to it - from my program - and everything works as expected. But still no joy with the EmbeddedSolrServer. :-( Have you tried passing core name (collection1) to the c'tor, instead of the empty string?
Re: no such core error with EmbeddedSolrServer
On 1/6/2012 10:19 AM, Phillip Rhodes wrote: 2012/1/6 Yury Kats yuryk...@yahoo.com: Have you tried passing core name (collection1) to the c'tor, instead of the empty string? Yep, but that gives the same error (with the core name appended) such as no such core: collection1 That probably means the home is not set properly, so it can't find solr.xml
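For what it's worth, a minimal embedded setup for that era of SolrJ looks roughly like this sketch (paths and core name are assumptions); the key point is that solr.solr.home must point at the directory containing solr.xml:

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedExample {
    public static void main(String[] args) throws Exception {
        // must be the directory that contains solr.xml
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer container = initializer.initialize();
        EmbeddedSolrServer server = new EmbeddedSolrServer(container, "collection1");
        System.out.println(server.ping().getStatus()); // 0 if the core is reachable
        container.shutdown();
    }
}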
Re: Replication not working
On 12/22/2011 4:39 AM, Dean Pullen wrote: Yeah, the drop-index-via-URL command doesn't help anyway - when rebuilding the index, the timestamp is obviously ahead of the master's (as the slave is being created now), so the replication will still not happen. If you delete the index and create the core anew, the index version will be 0 and replication will work.
Core overhead
Does anybody have an idea, or better yet, measured data, to see what the overhead of a core is, both in memory and speed? For example, what would be the difference between having 1 core with 100M documents versus having 10 cores with 10M documents?
Re: Core overhead
On 12/15/2011 1:07 PM, Robert Stewart wrote: I think overall memory usage would be close to the same. Is this really so? I suspect that the consumed memory is in direct proportion to the number of terms in the index. I also suspect that if I divided 1 core with N terms into 10 smaller cores, each smaller core would have much more than N/10 terms. Say I'm indexing English texts: it's likely that all smaller cores would have almost the same number of terms, close to the original N. Not so?
Re: Core overhead
On 12/15/2011 1:41 PM, Robert Petersen wrote: loading. Try it out, but make sure that the functionality you are actually looking for isn't sharding instead of multiple cores... Yes, but the way to achieve sharding is to have multiple cores. The question then becomes -- how many cores (shards)?
Re: Core overhead
On 12/15/2011 4:46 PM, Robert Petersen wrote: Sure that is possible, but doesn't that defeat the purpose of sharding? Why distribute across one machine? Just keep all in one index in that case is my thought there... To be able to scale w/o re-indexing. Also often referred to as micro-sharding.
Re: Virtual Memory very high
On 12/13/2011 6:16 AM, Dmitry Kan wrote: If you allow me to chime in, is there a way to check for which DirectoryFactory is in use, if ${solr.directoryFactory:solr.StandardDirectoryFactory} has been configured? I think you can get the currently used factory in a Luke response, if you hit your Solr server with a Luke request, eg http://localhost:8983/solr/admin/luke Dmitry 2011/12/12 Yury Kats yuryk...@yahoo.com On 12/11/2011 4:57 AM, Rohit wrote: What are the difference in the different DirectoryFactory? http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/MMapDirectory.html http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/NIOFSDirectory.html
Re: Virtual Memory very high
On 12/11/2011 4:57 AM, Rohit wrote: What are the difference in the different DirectoryFactory? http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/MMapDirectory.html http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/NIOFSDirectory.html
Re: Virtual Memory very high
On 12/9/2011 11:54 PM, Rohit wrote: Hi All, Don't know if this question is directly related to this forum. I am running Solr in Tomcat on a Linux server. The moment I start Tomcat, the virtual memory shown using the TOP command goes to its max of 31.1G and then remains there. Is this the right behaviour? Why is the virtual memory usage so high? I have 36GB of RAM on the server. To limit VIRT memory, change the DirectoryFactory in solrconfig.xml to use solr.NIOFSDirectoryFactory.
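A minimal sketch of that solrconfig.xml change; the property-based default mirrors the ${solr.directoryFactory:...} pattern mentioned earlier in this thread:

<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NIOFSDirectoryFactory}"/>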
Re: Delete by Query with limited number of rows
On 11/12/2011 4:08 PM, mikr00 wrote: Similar to a first in, first out list. The problem is: it's easy to check the limit, but how can I delete the oldest documents to go back below the limit? Can I do it with a delete by query request? In that case, I would probably have to limit the number of rows? But I can't seem to find a way to do that. Or would you see a different solution (maybe there is a way to configure the Solr core such that it automatically behaves as described)? You can certainly delete a set of documents using delete by query, but you need to somehow identify what documents you want to have deleted. For that, you'd need to have a field, such as a sequence number or a timestamp when the document was added. Alternatively, if you can control the uniqueKey field when adding documents, you can just cycle it between 1 and 1,000,000. When you reach 1,000,000, set the uniqueKey back to 1 and keep adding. The new document will automatically replace the old document with the key of 1.
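As an illustration of the timestamp approach, a rough SolrJ sketch; the indexed_at field name, server URL, and the 1,000,000 cap are assumptions, not from the thread:

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PruneOldDocs {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // find the timestamp of the 1,000,000th newest document
        SolrQuery q = new SolrQuery("*:*");
        q.addSortField("indexed_at", SolrQuery.ORDER.desc);
        q.setStart(999999);
        q.setRows(1);
        QueryResponse rsp = server.query(q);
        if (rsp.getResults().isEmpty()) return; // fewer than 1M docs, nothing to prune
        Date cutoff = (Date) rsp.getResults().get(0).getFieldValue("indexed_at");
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        // everything strictly older than the cutoff goes away
        server.deleteByQuery("indexed_at:{* TO " + fmt.format(cutoff) + "}");
        server.commit();
    }
}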
Re: Default value for dynamic fields
On 11/3/2011 12:59 PM, Milan Dobrota wrote: Is there any way to define the default value for the dynamic fields in SOLR? I use some dynamic fields of type float with _val_ and if they haven't been created at index time, the value defaults to 0. I would want this to be 1. Can that be changed? Does specifying default=1 not work?
Re: shard indexing
There's a defaultCoreName parameter in solr.xml that lets you specify which core should be used when none is specified in the URL. You can change that every time you create a new core. From: Vadim Kisselmann v.kisselm...@googlemail.com To: solr-user@lucene.apache.org Sent: Wednesday, November 2, 2011 6:16 AM Subject: Re: shard indexing Hello Jan, thanks for your quick response. It's quite difficult to explain: we want to create new shards on the fly every month and switch the default shard to the newest one. We always want to index to the newest shard with the same update query, like http://localhost:8983/solr/update (content stream). Is our idea possible to implement? Thanks in advance. Regards Vadim 2011/11/2 Jan Høydahl jan@cominvent.com Hi, The only difference is the core name in the URL, which should be easy enough to handle from your indexing client code. I don't really understand the reason behind your request. How would you control which core to index your document to if you did not specify it in the URL? You could name ONE of your cores as ., meaning it would be the default core living at /solr/update, perhaps that is what you're looking for? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 2. nov. 2011, at 10:00, Vadim Kisselmann wrote: Hello folks, I have a problem with shard indexing. With a single core I use this update command: http://localhost:8983/solr/update . Now I have 2 shards; we can call them core0 / core1: http://localhost:8983/solr/core0/update . Can I adjust anything to index in the same way as with a single core, without the core name? Thanks and regards Vadim
Re: Solr Replication: relative path in confFiles Element?
On 10/25/2011 11:24 AM, Mark Schoy wrote: Hi, is it possible to define a relative path in confFiles? For example: <str name="confFiles">../../x.xml</str> If yes, to which location will the file be copied on the slave? I don't think it's possible. Replication copies confFiles from the master core's confDir to the slave core's confDir.
Re: Merging Remote Solr Indexes?
On 10/19/2011 5:15 PM, Darren Govoni wrote: Hi Otis, Yeah, I saw the page, but it says it's for merging cores, which I presume must reside locally to the Solr instance doing the merging? What I'm interested in doing is merging across Solr instances running on different machines into a single Solr running on another machine (programmatically). Is it still possible or did I misread the wiki? Possible, but in a few steps. 1. Create new cores on another machine. 2. Replicate them from the different machines. 3. Merge on another machine. All 3 steps can be done programmatically.
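Roughly, with curl (hosts, core names and paths are hypothetical):

# 1. create a target core and a scratch core on the destination machine
curl "http://target:8983/solr/admin/cores?action=CREATE&name=merged&instanceDir=merged"
curl "http://target:8983/solr/admin/cores?action=CREATE&name=tmp1&instanceDir=tmp1"
# 2. pull a remote index into the scratch core via replication
curl "http://target:8983/solr/tmp1/replication?command=fetchindex&masterUrl=http://source1:8983/solr/core1/replication"
# 3. merge the local index directories into the target core (indexDir is repeatable), then commit
curl "http://target:8983/solr/admin/cores?action=mergeindexes&core=merged&indexDir=/path/to/tmp1/data/index"
curl "http://target:8983/solr/merged/update?commit=true"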
Re: Issue with Shard configuration in solrconfig.xml (Solr 3.1)
On 10/20/2011 11:33 AM, Rahul Warawdekar wrote: Hi, I am trying to evaluate distributed search for my project by splitting up our single index on 2 shards with Solr 3.1. When I query the first Solr server by passing the shards parameter, I get correct search results from both shards. ( http://server1:8080/solr/test/select/?shards=server1:8080/solr/test,server2:8080/solr/test&q=solr&start=0&rows=20 ) I want to avoid the use of this shards parameter in the http url and specify it in solrconfig.xml as follows.

<requestHandler name="my_custom_handler" class="solr.SearchHandler" default="true">
  <str name="shards">server1:8080/solr/test,server2:8080/solr/test</str>
  ..
</requestHandler>

Don't you need to wrap it in <lst name="defaults"> or <lst name="appends">? After adding the shards parameter in solrconfig.xml, I get search results only from the first shard and not from the second one. Am I missing any configuration? This means your 'shards' parameter is not being used, because it's not specified properly. Also, can the urls with the shard parameter be load balanced for a failover mechanism? See SolrCloud http://wiki.apache.org/solr/SolrCloud
Re: SolrJ + Post
On 10/14/2011 9:29 AM, Rohit wrote: I want to user POST instead of GET while using solrj, but I am unable to find a clear example for it. If anyone has implemented the same it would be nice to get some insight. To do what? Submit? Query? How do you use SolrJ now?
Re: SolrJ + Post
On 10/14/2011 12:11 PM, Rohit wrote: I want to query; right now I use it in the following way:

CommonsHttpSolrServer server = new CommonsHttpSolrServer("<URL HERE>");
SolrQuery sq = new SolrQuery();
sq.add("q", query);
QueryResponse qr = server.query(sq);

QueryResponse qr = server.query(sq, METHOD.POST);
Re: basic solr cloud questions
On 9/30/2011 12:26 PM, Pulkit Singhal wrote: SOLR-2355 is definitely a step in the right direction but something I would like to get clarified: Questions about SOLR-2355 are best asked in SOLR-2355 :) b) Does this basic implementation distribute across shards or across cores? From a brief look, it seems to assume shard=core. You list all cores in the config file under shards.
Re: SolrCloud: is there a programmatic way to create an ensemble
Nope On 9/29/2011 12:17 AM, Pulkit Singhal wrote: Did you find out about this? 2011/8/2 Yury Kats yuryk...@yahoo.com: I have multiple SolrCloud instances, each running its own Zookeeper (Solr launched with -DzkRun). I would like to create an ensemble out of them. I know about -DzkHost parameter, but can I achieve the same programmatically? Either with SolrJ or REST API? Thanks, Yury
Re: basic solr cloud questions
On 9/29/2011 7:22 AM, Darren Govoni wrote: That was kinda my point. The new cloud implementation is not about replication, nor should it be. But rather about horizontal scalability where nodes manage different parts of a unified index. It's about many things. You stated one, but there are other goals, one of them being tolerance to node outages. In a cloud, when one of your many nodes fails, you don't want to stop querying and indexing. For this to happen, you need to maintain redundant copies of the same pieces of the index, hence you need to replicate. One of the design goals of the new cloud implementation is for this to happen more or less automatically. True, but there is a big gap between goals and current state. Right now, there is distributed search, but not distributed indexing or auto-sharding, or auto-replication. So if you want to use SolrCloud now (as many of us do), you need to do a number of things yourself, even if they might be done by SolrCloud automatically in the future. To me that means one does not have to manually distribute documents or enforce replication as Yury suggests. Replication is different to me than what was being asked. And perhaps I misunderstood the original question. Yury's response introduced the term core where the original person was referring to nodes. For all I know, those are two different things in the new cloud design terminology (I believe they are). I guess understanding cores vs. nodes vs. shards is helpful. :) A shard is a slice of the index. The index is managed/stored in a core. Nodes are Solr instances, usually physical machines. Each node can host multiple shards, and each shard can consist of multiple cores. However, all cores within the same shard must have the same content. This is where the OP ran into the problem. The OP had 1 shard, consisting of two cores on two nodes. Since there is no distributed indexing yet, all documents were indexed into a single core. However, there is distributed search, therefore queries were sent randomly to different cores of the same shard. Since one core in the shard had documents and the other didn't, the query result was random. To solve this problem, the OP must make sure all cores within the same shard (be they on the same node or not) have the same content. This can currently be achieved by: a) setting up replication between cores: you index into one core and the other core replicates the content b) indexing into both cores Hope this clarifies.
Re: basic solr cloud questions
On 9/27/2011 5:16 PM, Darren Govoni wrote: On 09/27/2011 05:05 PM, Yury Kats wrote: You need to either submit the docs to both nodes, or have a replication setup between the two. Otherwise they are not in sync. I hope that's not the case. :/ My understanding (or hope maybe) is that the new Solr Cloud implementation will support auto-sharding and distributed indexing. This means that shards will receive different documents regardless of which node received the submitted document (spread evenly based on a hash-node assignment). Distributed queries will thus merge all the solr shard/node responses. All cores in the same shard must somehow have the same index. Only then can you continue servicing searches when individual cores fail. Auto-sharding and distributed indexing don't have anything to do with this. In the future, SolrCloud may be managing replication between cores in the same shard automatically. But right now it does not.
Re: two cores but have single result set in solr
On 9/24/2011 3:09 AM, hadi wrote: I do not know how to search both cores and not define shard parameter,could you show me some solutions for solve my issue? See this: http://wiki.apache.org/solr/DistributedSearch
Re: two cores but have single result set in solr
On 9/23/2011 6:00 PM, hadi wrote: I index my files with SolrJ and crawl my sites with Nutch 1.3. As you know, I have to overwrite the Solr schema with the Nutch schema in order to view the results in solr/browse; in this case I should define two cores, but I want a single result set, where the user can search both core indexes at the same time. Can you not use the 'shards' parameter and specify both cores there?
How to check if replication is running
Let's say I'm forcing a replication of a core using the fetchindex command. No new content is being added to the master. I can check whether replication has finished by periodically querying the master and slave for their indexversion and comparing the two. But what's the best way to check if replication is actually happening and hasn't been dropped if, for example, there was a network outage between the master and the slave, in which case I want to re-start replication. Thanks, Yury
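For reference, the version check described here is a pair of requests like these (host and core names are placeholders):

curl "http://master:8983/solr/core1/replication?command=indexversion"
curl "http://slave:8983/solr/core1/replication?command=indexversion"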
Re: How to check if replication is running
On 9/16/2011 4:58 PM, Brandon Fish wrote: Hi Yury, You could try checking out the details command of the replication handler: http://slave_host:port/solr/replication?command=details which has information such as isReplicating. How reliable is isReplicating? Is it updated on unexpected failures or only during normal operation? Eg, if both servers were powered down and then up, would it be false? You could also look at the script attached to [1] which shows a thorough check of a slave's replication status, which could be polled to trigger a restart if there is an error. [1] https://issues.apache.org/jira/browse/SOLR-1855 Thanks, that's very helpful. I see that it ultimately checks for a 2-hour threshold, which implies that other means of checking may not be 100% reliable. Is that so?
Re: Can index size increase when no updates/optimizes are happening?
On 9/14/2011 2:36 PM, Erick Erickson wrote: What is the machine used for? Was your user looking at a master? Slave? Something used for both? Stand-alone machine with multiple Solr cores. No replication. Measuring the size of all the files in the index? Or looking at memory? Disk space. The index files shouldn't be getting bigger unless there were indexing operations going on. That's what I thought. Is it at all possible that DIH was configured to run automatically (or any other indexing job for that matter) and your user didn't realize it? There's no DIH, but there is a custom app that submits docs for indexing via SolrJ. Supposedly, Solr logs were not showing any updates overnight, so the assumption was that no new docs were added. I'd write it off as a user error, but wanted to double-check with the community that no other internal Solr/Lucene task can change the index file size in the absence of submits.
Can index size increase when no updates/optimizes are happening?
One of my users observed that the index size (in bytes) increased over night. There was no indexing activity at that time, only querying was taking place. Running optimize brought the index size back down to what it was when indexing finished the day before. What could explain that?
Re: Parameter not working for master/slave
On 9/11/2011 11:24 PM, William Bell wrote: I am using 3.3 SOLR. I tried passing in -Denable.master=true and -Denable.slave=true on the Slave machine. Then I changed solrconfig.xml to reference each as per: http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node These are core parameters, you need to set them in solr.xml per core.
Re: Replication setup with SolrCloud/Zk
On 9/10/2011 3:54 PM, Pulkit Singhal wrote: Hi Yury, How do you manage to start the instances without any issues? The way I see it, no matter which instance is started first, the slave will complain about not being able to find its respective master, because that instance hasn't been started yet ... no? Yes, but it's not a big deal. The slave polls periodically, so next time around the master will be up.
Re: Solr Cloud - is replication really a feature on the trunk?
On 9/9/2011 10:52 AM, Pulkit Singhal wrote: Thank You Yury. After looking at your thread, there's something I must clarify: Is solr.xml not uploaded and held in ZooKeeper? Not as far as I understand. Cores are loaded/created by the local Solr server based on solr.xml and then registered with ZK, so that ZK knows what cores are out there and how they are organized in shards. Because you have a slightly different config between Node 1 & 2: http://lucene.472066.n3.nabble.com/Replication-setup-with-SolrCloud-Zk-td2952602.html I have two shards, each shard having a master and a slave core. Cores are located so that master and slave are on different nodes. This protects search (but not indexing) from node failure.
Re: SolrCloud and replica question
On 9/9/2011 4:48 PM, Jamie Johnson wrote: When doing writes do all writes need to be done to the primary shard or are writes that are done to the replica also pushed to all replicas of that shard? If you have replication setup between cores, all changes to the slave will be overwritten by replication. Therefore it makes sense to submit docs for indexing only to the master cores
Re: Solr Cloud - is replication really a feature on the trunk?
On 9/9/2011 6:54 PM, Pulkit Singhal wrote: Thanks Again. Another question: My solr.xml has:

<cores adminPath="/admin/cores" defaultCoreName="master1">
  <core name="master1" instanceDir="." shard="shard1" collection="myconf"/>
</cores>

And I omitted -Dcollection.configName=myconf from the startup command because I felt that specifying collection="myconf" should take care of that: cd /trunk/solr/example java -Dbootstrap_confdir=./solr/conf -Dslave=disabled -DzkRun -jar start.jar With this you are telling ZK to bootstrap a collection with the content of specific files, but you don't tell it what collection that should be. Hence you want the collection.configName parameter, and you want solr.xml to reference the same name in the 'collection' attribute for the cores, so that SolrCloud knows where to pull the configuration for that core from.
Re: Solr Cloud - is replication really a feature on the trunk?
On 9/7/2011 3:18 PM, Pulkit Singhal wrote: Hello, I'm working off the trunk and the following wiki link: http://wiki.apache.org/solr/SolrCloud The wiki link has a section that seeks to quickly familiarize a user with replication in SolrCloud - Example B: Simple two shard cluster with shard replicas But after going through it, I have to wonder if this is truly replication? Not really. Replication is not set up in the example. The example uses replicas as copies, to demonstrate high search availability. Because if it is truly replication then somewhere along the line, the following properties must have been set programmatically: replicateAfter, confFiles, masterUrl, pollInterval Can someone tell me: Where exactly in the code is this happening? Nowhere. If you want replication, you need to set all the properties you listed in solrconfig.xml. I've done it recently, see http://lucene.472066.n3.nabble.com/Replication-setup-with-SolrCloud-Zk-td2952602.html
Re: bug in termfreq? was Re: is it possible to do a sort without query?
On 8/8/2011 4:34 PM, Jason Toy wrote: Alexei, thank you, that does seem to work. My sort results seem to be totally wrong though; I'm not sure if it's because of my sort function or something else. My query consists of: sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100 And I get back 4571232 hits. That would be the total number of docs, I guess. Since your query is *:*, ie find everything. All the results don't have the phrase "indie music" anywhere in their data. You are only sorting on the termfreq of "indie music", you are not querying documents that contain it.
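In other words, to get only documents containing the phrase, it has to be in q itself; something like this (URL encoding elided, field name taken from the thread):

q=all_lists_text:"indie music"&sort=termfreq(all_lists_text,'indie music') desc&rows=100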
Re: Example Solr Config on EC2
On 8/8/2011 5:03 PM, Matt Shields wrote: I'm looking for some examples of how to setup Solr on EC2. The configuration I'm looking for would have multiple nodes for redundancy. I've tested in-house with a single master and slave with replication running in Tomcat on Windows Server 2003, but even if I have multiple slaves the single master is a single point of failure. Any suggestions or example configurations? This article describes various configurations: http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e410
Re: cores vs indices
On 8/8/2011 12:00 AM, Daniel Schobel wrote: Can someone provide me with a succinct defintion of what a solr core is? Is there a one-to-one relationship of cores to solr indices or can you have multiple indices per core? http://wiki.apache.org/solr/CoreAdmin There's one index per core.
SolrCloud: is there a programmatic way to create an ensemble
I have multiple SolrCloud instances, each running its own Zookeeper (Solr launched with -DzkRun). I would like to create an ensemble out of them. I know about -DzkHost parameter, but can I achieve the same programmatically? Either with SolrJ or REST API? Thanks, Yury
CoreAdminHandler: can I specify custom properties when creating cores?
When creating cores through solr.xml, I am able to specify custom properties, to be referenced in solrconfig.xml. For example:

<cores adminPath="/admin/cores" defaultCoreName="master">
  <core name="master" instanceDir="core1" shard="shard1" collection="myconf">
    <property name="enable.master" value="true"/>
  </core>
  <core name="slave" instanceDir="core2" shard="shard2" collection="myconf">
    <property name="enable.slave" value="true"/>
    <property name="masterHost" value="node2:8983"/>
  </core>
</cores>

This would create a master core and a slave core, participating in replication, both sharing the same solrconfig.xml for the replication setup. Is there a way to specify such properties when creating cores through a CoreAdminHandler request [1]? Thanks, Yury [1] http://wiki.apache.org/solr/CoreAdmin#CREATE
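For what it's worth, more recent Solr releases accept property.name=value parameters on the CREATE action, which map directly to the property elements above; whether a given version supports this is worth verifying:

curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=slave&instanceDir=core2&property.enable.slave=true&property.masterHost=node2:8983"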
Re: what is the need of setting autocommit in solrconfig.xml
On 5/27/2011 6:48 AM, Romi wrote: What is the benefit of setting autocommit in solrconfig.xml? I read somewhere that these settings control how often pending updates will be automatically pushed to the index. Does it mean that if the Solr server is running, it automatically starts the indexing process if it finds any updates in the database??? No, it means it automatically commits recently added documents to the index so that they become searchable.
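For reference, a typical autoCommit block in solrconfig.xml looks like this (the thresholds are illustrative):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs> <!-- commit once 10,000 docs are pending -->
    <maxTime>60000</maxTime> <!-- or after 60 seconds, whichever comes first -->
  </autoCommit>
</updateHandler>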
Re: problem in setting field attribute in schema.xml
On 5/25/2011 9:29 AM, Romi wrote: And in http://wiki.apache.org/solr/SchemaXml#Fields it is clearly mentioned that a non-indexed field is not searchable, so why am I getting search results? Why should stored=true matter if indexed=false? indexed controls whether you can find the document based on the content of this field. stored controls whether you will see the content of this field in the result.
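A small illustration with hypothetical field names:

<field name="title"   type="text" indexed="true"  stored="true"/>  <!-- searchable; content returned in results -->
<field name="body"    type="text" indexed="true"  stored="false"/> <!-- searchable; content not returned -->
<field name="summary" type="text" indexed="false" stored="true"/>  <!-- not searchable; content shown for matching docs -->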
Re: Storing, indexing and searching XML documents in Solr
On 5/18/2011 4:19 PM, Judioo wrote: Any help is greatly appreciated. Pointers to documentation that address my issues is even more helpful. I think this would be a good start: http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource
Replication setup with SolrCloud/Zk
Hi, I have two Solr nodes, each managing two cores -- a master core and a slave core. The slaves are set up to replicate from the other node's master. That is, node1.master -> node2.slave, node2.master -> node1.slave. The replication is configured in each core's solrconfig.xml, eg Master's solrconfig.xml on both nodes:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
  </lst>
</requestHandler>

node1.Slave's solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://node2:8983/solr/master/replication</str>
    <str name="pollInterval">01:00:00</str>
  </lst>
</requestHandler>

node2.Slave's solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://node1:8983/solr/master/replication</str>
    <str name="pollInterval">01:00:00</str>
  </lst>
</requestHandler>

This is all working great with regular Solr. I am now trying to move to SolrCloud/ZK and can't figure out how to keep my replication settings. SolrCloud/ZK seems to manage one configuration for all cores/nodes in the cluster, yet I need to keep 3 different solrconfig.xml files apart -- one for the masters and one for each of the slaves. The rest of the configuration (schema.xml etc) is identical for all cores and can be shared. I found a reference to master/slave setup with Zk in the wiki [1]. Has it been implemented or is this a proposal? If it is implemented, it's not quite clear to me how to set up the ReplicationHandler to have 2 different slave cores pull from two different masters. Any help/idea would be appreciated! Thanks, Yury [1] http://wiki.apache.org/solr/ZooKeeperIntegration#Master.2BAC8-Slave
Re: Replication setup with SolrCloud/Zk
On 5/17/2011 10:17 AM, Stefan Matheis wrote: Yury, perhaps Java params (like used for this sample: http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node) can help you? Ah, thanks! It does seem to work! Cluster's solrconfig.xml (shared between all Solr instances and cores via SolrCloud/ZK):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="pollInterval">00:01:00</str>
    <str name="masterUrl">http://${masterHost:xyz}/solr/master/replication</str>
  </lst>
</requestHandler>

Node 1 solr.xml:

<cores adminPath="/admin/cores" defaultCoreName="master">
  <core name="master" instanceDir="core1" shard="shard1" collection="myconf">
    <property name="enable.master" value="true"/>
  </core>
  <core name="slave" instanceDir="core2" shard="shard2" collection="myconf">
    <property name="enable.slave" value="true"/>
    <property name="masterHost" value="node2:8983"/>
  </core>
</cores>

Node 2 solr.xml:

<cores adminPath="/admin/cores" defaultCoreName="master">
  <core name="master" instanceDir="core1" shard="shard2" collection="myconf">
    <property name="enable.master" value="true"/>
  </core>
  <core name="slave" instanceDir="core2" shard="shard1" collection="myconf">
    <property name="enable.slave" value="true"/>
    <property name="masterHost" value="node1:8983"/>
  </core>
</cores>
Re: Specifying backup location in solrconfig.xml
I would create a replication slave, for which you can specify whatever location you want, even put it on a different machine. If run on the same machine, the slave can be another core in the same Solr instance. On 5/17/2011 2:20 PM, Dietrich wrote: I am using Solr replication to create a snapshot for backup purposes after each optimize:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="backupAfter">optimize</str>
    <str name="confFiles">schema.xml,mapping-ISOLatin1Accent.txt,protwords.txt,stopwords.txt,synonyms.txt,elevate.xml</str>
  </lst>
</requestHandler>

That works fine, but I need to create the snapshots somewhere outside the data directory. I tried specifying a location like this:

<str name="location">${solr.home}/backup/site</str>

or

<str name="location">/opt/solr/backup/site</str>

but Solr is complaining: SEVERE: java.io.IOException: Cannot run program "snapshooter" (in directory "solr/bin"): java.io.IOException: error=2, No such file or directory How can I specify the location for the backup in solrconfig.xml? Dietrich