Re: Which one is it cs or cz for Czech language?
Hi,

On Wed, Mar 18, 2015 at 9:28 AM, steve sc_shep...@hotmail.com wrote:
> FYI: http://www.w3schools.com/tags/ref_country_codes.asp
> CZECH REPUBLIC: CZ
> No entry for CS

Exactly, steve. CZ is the country code, but we are talking about language codes (which is "cs"), since those Solr types deal with languages, not with countries. Or were you trying to point out something else?

Thanks,
Eduard

P.S.: Here's the list of ISO 639-1 two-letter language codes for reference:
http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

From: md...@apache.org
Date: Tue, 17 Mar 2015 12:45:57 -0500
Subject: Re: Which one is it cs or cz for Czech language?
To: solr-user@lucene.apache.org

Probably a historical artifact. "cz" is the country code for the Czech Republic; "cs" is the language code for Czech. Once, "cs" was also the country code for Czechoslovakia, leading some folks to accidentally conflate the two.

On Tue, Mar 17, 2015 at 12:35 PM, Eduard Moraru enygma2...@gmail.com wrote:
> Hi,
> First of all, a bit of a disclaimer: I am not a Czech language speaker, at all.
> We are using Solr's dynamic fields in our project (XWiki), and we have recently noticed a problem [1] with the Czech language. Basically, our mapping says something like this:
>
> <dynamicField name="*_cz" type="text_cz" indexed="true" stored="true" multiValued="true" />
>
> ...but at runtime, we ask for the language code "cs" (which is the ISO language code for Czech [2]) and it obviously fails (due to the mapping).
> Now, we can easily fix this on our end by changing the mapping to name="*_cs", but what we are really wondering is why Lucene/Solr uses "cz" (country code) instead of "cs" (language code) in both its text_cz field type and its stopwords_cz.txt file. Is that a mistake on the Solr/Lucene side? Is it some kind of convention? Is it going to be fixed?
>
> Thanks,
> Eduard
>
> [1] http://jira.xwiki.org/browse/XWIKI-11897
> [2] http://en.wikipedia.org/wiki/Czech_language
index duplicate records from data source into 1 document
Hi,

If I have duplicate records in my source data (DB or delimited files), for simplicity's sake of the following nature:

Product Id   Business Type
----------   -------------
12345        Exporter
12345        Agent
12366        Manufacturer
12377        Exporter
12377        Distributor

There are other fields with multiple values as well. How do I index the duplicate records into 1 document? E.g. Product Id 12345 will be 1 document, 12366 as 1 document, and 12377 as 1 document.

-Derek
Re: Which one is it cs or cz for Czech language?
It does indeed appear that use of the _cz suffix is a mistake - those suffixes are supposed to be language codes. Sure, there generally tends to be a one-to-one relationship between language and country, but clearly that is not as absolute as a casual observer might misguidedly think. I think it's worth a Jira - text types should use language codes, not country codes.

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:35 PM, Eduard Moraru enygma2...@gmail.com wrote:
> Hi,
> First of all, a bit of a disclaimer: I am not a Czech language speaker, at all.
> We are using Solr's dynamic fields in our project (XWiki), and we have recently noticed a problem [1] with the Czech language. Basically, our mapping says something like this:
>
> <dynamicField name="*_cz" type="text_cz" indexed="true" stored="true" multiValued="true" />
>
> ...but at runtime, we ask for the language code "cs" (which is the ISO language code for Czech [2]) and it obviously fails (due to the mapping).
> Now, we can easily fix this on our end by changing the mapping to name="*_cs", but what we are really wondering is why Lucene/Solr uses "cz" (country code) instead of "cs" (language code) in both its text_cz field type and its stopwords_cz.txt file. Is that a mistake on the Solr/Lucene side? Is it some kind of convention? Is it going to be fixed?
>
> Thanks,
> Eduard
>
> [1] http://jira.xwiki.org/browse/XWIKI-11897
> [2] http://en.wikipedia.org/wiki/Czech_language
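The mismatch described in this thread is easy to reproduce: dynamic-field lookups match by suffix string, so a schema keyed on the country code "cz" can never match a request built from the language code "cs". A minimal sketch (the field base name and the code table are hypothetical, for illustration only):

```python
# Excerpt of ISO 639-1 language codes; note Czech is "cs", not "cz".
LANGUAGE_CODES = {"cs": "Czech", "en": "English", "de": "German"}

def dynamic_field_name(base, lang):
    """Build the field name an application would ask Solr for at runtime."""
    if lang not in LANGUAGE_CODES:
        raise ValueError(f"unknown language code: {lang}")
    return f"{base}_{lang}"

# The schema maps *_cz, but the runtime request is built from "cs",
# so the lookup misses the text_cz dynamic field:
name = dynamic_field_name("title", "cs")
```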
Re: Whole RAM consumed while Indexing.
Hi,

If I do very very fast indexing (softcommit = 300 and hardcommit = 3000) vs. slow indexing (softcommit = 60000 and hardcommit = 60000) as you both said, will fast indexing fail to index some data? Any suggestion on this?

On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote:
> Yes, and doing so is painful and takes lots of people and hardware resources to get there for large amounts of data and queries :) As Erick says, work backwards from 60s and first establish how high the commit interval can be to satisfy your use case.

On 16 Mar 2015 16:04, Erick Erickson erickerick...@gmail.com wrote:
> First start by lengthening your soft and hard commit intervals substantially. Start with 60000 and work backwards, I'd say. Ramkumar has tuned the heck out of his installation to get the commit intervals to be that short ;). I'm betting that you'll see your RAM usage go way down, but that's a guess until you test.
> Best, Erick

On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki nitinml...@gmail.com wrote:
> Hi Erick,
> You are right: **overlapping searchers warning messages** are coming in the logs, and the **numDocs numbers** are changing while documents are being added during indexing. Any help?

On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson erickerick...@gmail.com wrote:
> First, the soft commit interval is very short. Very, very, very, very short. 300ms is just short of insane unless it's a typo ;). Here's a long background:
> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> But the short form is that you're opening searchers every 300 ms. The hard commit is better, but every 3 seconds is still far too short IMO. I'd start with soft commits of 60000 and hard commits of 60000 (60 seconds), meaning that you're going to have to wait 1 minute for docs to show up unless you explicitly commit.
> You're throwing away all the caches configured in solrconfig.xml more than 3 times a second, executing autowarming, etc, etc, etc. Changing these to longer intervals might cure the problem, but if not then, as Hoss would say, details matter. I suspect you're also seeing overlapping searchers warning messages in your log, and it's _possible_ that what's happening is that you're just exceeding the max warming searchers and never opening a new searcher with the newly-indexed documents. But that's a total shot in the dark. How are you looking for docs (and not finding them)? Does the numDocs number in the solr admin screen change?
> Best, Erick

On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki nitinml...@gmail.com wrote:
> Hi Alexandre,
> *Hard Commit* is:
>
>   <autoCommit>
>     <maxTime>${solr.autoCommit.maxTime:3000}</maxTime>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>
> *Soft Commit* is:
>
>   <autoSoftCommit>
>     <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime>
>   </autoSoftCommit>
>
> And I am committing 2 documents each time. Is this a good config for committing? Or am I doing something wrong?

On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:
> What's your commit strategy? Explicit commits? Soft commits/hard commits (in solrconfig.xml)?
> Regards, Alex.
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 12 March 2015 at 23:19, Nitin Solanki nitinml...@gmail.com wrote:
> Hello,
> I have written a python script to index 2 documents at a time on Solr. I have 28 GB RAM with 8 CPUs. When I started indexing, 15 GB RAM was free. While indexing, all RAM is consumed but **not** a single document is indexed. Why so? And it throws *HTTPError: HTTP Error 503: Service Unavailable* in the python script. I think it is due to heavy load on Zookeeper, by which all nodes went down, but I am not sure about that. Any help please, or anything else that could be happening, and how to overcome this issue? Please assist me towards the right path. Thanks.

Warm Regards,
Nitin Solanki
Re: Add replica on shards
You can do the same simply with something like this:

http://localhost:8983/solr/admin/cores?action=CREATE&collection=wikingram&name=ANY_NAME_HERE&shard=shard1

The main part is shard=shard1: when you create a core with an existing shard (the core name doesn't matter; we use collection_shard1_replica2, but you can use whatever you want), this core becomes a replica and copies data from the leading shard.

--
View this message in context: http://lucene.472066.n3.nabble.com/Add-replica-on-shards-tp4193659p4193732.html
Sent from the Solr - User mailing list archive at Nabble.com.
schema.xml xsd file
Hello,

Where can I find the XSD file for the schema.xml file? Thanks in advance!

Best regards,

Pedro Figueiredo
Senior Engineer
pjlfigueir...@criticalsoftware.com
M. 934058150
Rua Engº Frederico Ulrich, nº 2650, 4470-605 Moreira da Maia, Portugal
T. +351 229 446 927 | F. +351 229 446 929
www.criticalsoftware.com
PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA
A CMMI® LEVEL 5 RATED COMPANY. CMMI® is registered in the USPTO by CMU.
Re: index duplicate records from data source into 1 document
I'd use SolrJ, pull the docs in productId order and combine records with the same product ID into a single doc. Here's a starter set for indexing from a DB with SolrJ. It has Tika processing in it as well, but you can pull that out pretty easily:

https://lucidworks.com/blog/indexing-with-solrj/

Best, Erick

On Wed, Mar 18, 2015 at 2:52 AM, Derek Poh d...@globalsources.com wrote:
> Hi
> If I have duplicate records in my source data (DB or delimited files), for simplicity's sake of the following nature:
>
> Product Id   Business Type
> ----------   -------------
> 12345        Exporter
> 12345        Agent
> 12366        Manufacturer
> 12377        Exporter
> 12377        Distributor
>
> There are other fields with multiple values as well. How do I index the duplicate records into 1 document? E.g. Product Id 12345 will be 1 document, 12366 as 1 document, and 12377 as 1 document.
> -Derek
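The merge step Erick describes (pull rows ordered by product ID, collapse runs with the same ID into one multi-valued document) can be sketched in Python as well; the field names here are hypothetical, and a real indexer would send the merged dicts to Solr over HTTP rather than collecting them in a list:

```python
from itertools import groupby
from operator import itemgetter

def merge_rows(rows):
    """Collapse source rows sharing a product id into one document
    with a multi-valued business_type field."""
    # groupby only merges adjacent rows, so sort by product id first
    # (the equivalent of ORDER BY product_id in the SQL that feeds this).
    rows = sorted(rows, key=itemgetter("product_id"))
    docs = []
    for pid, group in groupby(rows, key=itemgetter("product_id")):
        docs.append({
            "id": pid,
            "business_type": [r["business_type"] for r in group],
        })
    return docs

rows = [
    {"product_id": "12345", "business_type": "Exporter"},
    {"product_id": "12345", "business_type": "Agent"},
    {"product_id": "12366", "business_type": "Manufacturer"},
    {"product_id": "12377", "business_type": "Exporter"},
    {"product_id": "12377", "business_type": "Distributor"},
]
docs = merge_rows(rows)
# Three documents: one each for 12345, 12366, 12377
```

For this to work in Solr, business_type must of course be declared multiValued="true" in the schema.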
Re: Whole RAM consumed while Indexing.
Don't do it. Really, why do you want to do this? This seems like an XY problem; you haven't explained why you need to commit things so very quickly. I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml since they won't be useful at all.

Best, Erick

On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki nitinml...@gmail.com wrote:
> Hi,
> If I do very very fast indexing (softcommit = 300 and hardcommit = 3000) vs. slow indexing (softcommit = 60000 and hardcommit = 60000) as you both said, will fast indexing fail to index some data? Any suggestion on this?
Re: Add replica on shards
Thanks Norgorn. I did the same thing but in a different manner, like:

localhost:8983/solr/admin/cores?action=CREATE&name=wikingram_shard4_replica3&collection=wikingram&property.shard=shard4

On Wed, Mar 18, 2015 at 7:20 PM, Norgorn lsunnyd...@mail.ru wrote:
> You can do the same simply with something like this:
> http://localhost:8983/solr/admin/cores?action=CREATE&collection=wikingram&name=ANY_NAME_HERE&shard=shard1
> The main part is shard=shard1: when you create a core with an existing shard (the core name doesn't matter; we use collection_shard1_replica2, but you can use whatever you want), this core becomes a replica and copies data from the leading shard.
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Add-replica-on-shards-tp4193659p4193732.html
> Sent from the Solr - User mailing list archive at Nabble.com.
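Both variants in this thread are plain HTTP GETs against the CoreAdmin API, so they can be issued from any client. A sketch of building the request (host, collection, and core names are the placeholders used in the thread):

```python
from urllib.parse import urlencode

def create_replica_url(host, collection, core_name, shard):
    """Build a CoreAdmin CREATE request that adds a replica by
    creating a core attached to an existing shard."""
    params = {
        "action": "CREATE",
        "name": core_name,        # core name is arbitrary
        "collection": collection,
        "shard": shard,           # the existing shard this core joins
    }
    return f"http://{host}/solr/admin/cores?{urlencode(params)}"

url = create_replica_url("localhost:8983", "wikingram",
                         "wikingram_shard1_replica2", "shard1")
```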
Re: SolrCloud: data is present on one shard only
On 3/17/2015 3:54 AM, Aman Tandon wrote:
> I indexed the data in my SolrCloud architecture (2 shards present on 2 separate instances; on one instance I have the replicas of both shards, which are present on the other 2 instances). And when I am looking at the index via the admin interface, it is present on a single instance. Shouldn't the data be present on both shards?

The question here is not clear, at least to me. What exactly are you looking at in the admin UI, what are you seeing, and what are you expecting to see?

Thanks,
Shawn
Re: Unable to index rich-text documents in Solr Cloud
Shot in the dark, but is the PDF file significantly larger than the others? Perhaps you're simply exceeding the packet limits for the servlet container?

Best, Erick

On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:
> Hi everyone,
> I'm having some issues with indexing rich-text documents from Solr Cloud. When I try to index a pdf or word document, I get the following error:
>
> org.apache.solr.common.SolrException: Bad Request
> request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
>   at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
>
> I'm able to index .xml and .csv files in Solr Cloud with the same configuration. I have set up Solr Cloud using the default zookeeper in Solr 5.0.0, and I have 2 shards with the following details:
> Shard1: 192.168.2.2:8983
> Shard2: 192.168.2.2:8984
> Prior to this, I was already able to index rich-text documents without Solr Cloud, and I'm using the same solrconfig.xml and schema.xml, so my ExtractingRequestHandler is already defined. Are there other settings required in order to index rich-text documents in Solr Cloud?
> Regards, Edwin
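If the failure really is a size limit, note that in recent Solr versions the upload limits live in solrconfig.xml's requestDispatcher section rather than only in the servlet container. A hedged sketch of raising them (the values here are illustrative, not recommendations):

```xml
<requestDispatcher>
  <!-- limits are in kilobytes; raise them if large PDFs are rejected -->
  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="20480"
                  formdataUploadLimitInKB="2048"/>
</requestDispatcher>
```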
Re: Whole RAM consumed while Indexing.
On 3/18/2015 9:44 AM, Nitin Solanki wrote:
> I am just saying; I want to be sure about the commit-frequency difference: what happens if I do frequent commits or not? And the reason I am saying that I need to commit things so very quickly is that I have to index 28GB of data, which takes 7-8 hours (with frequent commits). As you said, if I do commits every 60 seconds, then each commit will be more expensive. If I don't encounter the **overlapping searchers warning messages**, then I feel it seems to be okay. Is it?

Even if the commit only handles a single document and it's a soft commit, it is an expensive operation in terms of CPU, and in a garbage-collected environment like Java, memory churn as well. A commit also invalidates the Solr caches, so if you have autowarming turned on, then you have the additional overhead of doing a bunch of queries to warm the new cache - on every single soft commit.

Doing commits as often as three times a second (you did say the interval was 300 milliseconds) is generally a bad idea. Increasing the interval to once a minute will take a huge amount of load off of your servers, so indexing will happen faster.

Thanks,
Shawn
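The advice in this thread translates to something like the following in solrconfig.xml (60-second intervals, as suggested; the right values depend on how long you can wait for new documents to become searchable):

```xml
<autoCommit>
  <!-- hard commit: flush to stable storage, but do not open a new searcher -->
  <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <!-- soft commit: controls when newly indexed documents become visible -->
  <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
</autoSoftCommit>
```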
Multiple words suggestion
Hello there,

Does Solr 4.x (or even 5) support *multiple word suggestions*? I mean, if my query is *tozota hilox* and I activate the spellcheck component, each word is treated separately: *toyota* is suggested for *tozota*, and *hilux* is suggested for *hilox*. But what I need is a complete suggestion for the whole query, i.e. *toyota hilux*, which would be suggested when the user's query is *tozota hilox*.

Please see below the *spellcheck component* from my *solrconfig.xml* (I changed only the *field*):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <!-- Multiple Spell Checkers can be declared and used by this component -->
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">recherche</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.5</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">2</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear in to be considered for correction -->
    <float name="maxQueryFrequency">0.01</float>
    <!-- uncomment this to require suggestions to occur in 1% of the documents
    <float name="thresholdTokenFrequency">.01</float>
    -->
  </lst>
  <!-- a spellchecker that can break or combine words. See /spell handler below for usage -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">recherche</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

--
Best regards,
Hakim Benoudjit
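For whole-query suggestions specifically, Solr's spellcheck component can collate the per-word corrections back into a single rewritten query via the spellcheck.collate parameters. A sketch of request-handler defaults that enable it (the handler name and the counts are illustrative):

```xml
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <!-- combine the best per-word corrections into one suggestion,
         e.g. "tozota hilox" -> "toyota hilux" -->
    <str name="spellcheck.collate">true</str>
    <int name="spellcheck.maxCollations">5</int>
    <!-- test candidate collations against the index so only
         collations that would return results are offered -->
    <int name="spellcheck.maxCollationTries">10</int>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```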
Re: SolrCloud: data is present on one shard only
Hi Shawn,

I apologize for my unclear mail. I have 120,000 documents, and I am indexing the data in my SolrCloud architecture having two shards. I had the expectation that some data would be present on both shards. But when I look at the data size via the admin interface, I can see that all the documents are present on only one shard and the other shard has zero documents. So I am confused and want to confirm: am I doing something wrong?

With Regards
Aman Tandon

On Wed, Mar 18, 2015 at 7:26 PM, Shawn Heisey apa...@elyograg.org wrote:
> On 3/17/2015 3:54 AM, Aman Tandon wrote:
>> I indexed the data in my SolrCloud architecture (2 shards present on 2 separate instances; on one instance I have the replicas of both shards, which are present on the other 2 instances). And when I am looking at the index via the admin interface, it is present on a single instance. Shouldn't the data be present on both shards?
> The question here is not clear, at least to me. What exactly are you looking at in the admin UI, what are you seeing, and what are you expecting to see?
> Thanks, Shawn
Re: Whole RAM consumed while Indexing.
Hi Erick,

I am just saying; I want to be sure about the commit-frequency difference: what happens if I do frequent commits or not? And the reason I am saying that I need to commit things so very quickly is that I have to index 28GB of data, which takes 7-8 hours (with frequent commits). As you said, if I do commits every 60 seconds, then each commit will be more expensive. If I don't encounter the **overlapping searchers warning messages**, then I feel it seems to be okay. Is it?

On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson erickerick...@gmail.com wrote:
> Don't do it. Really, why do you want to do this? This seems like an XY problem; you haven't explained why you need to commit things so very quickly. I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml since they won't be useful at all.
> Best, Erick

On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki nitinml...@gmail.com wrote:
> Hi,
> If I do very very fast indexing (softcommit = 300 and hardcommit = 3000) vs. slow indexing (softcommit = 60000 and hardcommit = 60000) as you both said, will fast indexing fail to index some data? Any suggestion on this?
Re: SolrCloud: data is present on one shard only
On 3/18/2015 9:47 AM, Aman Tandon wrote:
> I have 120,000 documents, and I am indexing the data in my SolrCloud architecture having two shards. I had the expectation that some data would be present on both shards. But when I look at the data size via the admin interface, I can see that all the documents are present on only one shard and the other shard has zero documents. So I am confused and want to confirm: am I doing something wrong?

In your admin UI, click on Cloud and then Tree. You should see a /collections entry in the list. Open that, and then click on the collection you are concerned about. In the right side of that window, there will be a bunch of fields with values. Below that will be a small snippet of JSON text, and one of the bits of info in that JSON will be a field called "router" ... what is router set to?

If it is "implicit", then your documents will not be automatically dispersed across your shards when you index. They will be indexed into the shard that received your indexing requests. You will need to create a new collection where the router is "compositeId".

Thanks,
Shawn
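Creating the replacement collection Shawn mentions is a Collections API call where router.name can be set explicitly. A sketch of building that request (the host, collection name, and counts here are placeholders):

```python
from urllib.parse import urlencode

def create_collection_url(host, name, num_shards, replication_factor):
    """Build a Collections API CREATE request using the compositeId
    router, so documents are hashed across shards automatically."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "router.name": "compositeId",
    }
    return f"http://{host}/solr/admin/collections?{urlencode(params)}"

url = create_collection_url("localhost:8983", "mycollection", 2, 2)
```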
Re: Solr returns incorrect results after sorting
Hi Raj,

The group.sort you are using defines multiple criteria. The first criterion is the big Solr function starting with max(). This means that inside each group the documents will be sorted by this criterion, and if the values are equal between two documents, the comparison falls back to the second criterion (inStock_boolean desc), and so on.

*Even though if i add price asc in the group.sort, but still the main sort does not consider that.*

The main sort does not have to consider what's in the group.sort. The group.sort defines the way the documents are sorted inside each group. So if you want to sort the documents inside each group in the same order as the main sort, you can remove the group.sort, or you can have a primary sort on pricecommon_double desc in your group.sort:

*group.sort=pricecommon_double desc, max(if(exists(query({!v='storeName_string:212'})),2,0),if(exists(query({!v='storeName_string:203'})),1,0)) desc, inStock_boolean desc, geodist() asc*

Cheers,
Jim

2015-03-18 7:28 GMT+01:00 kumarraj rajitpro2...@gmail.com:
> Hi Jim,
> Yes, you are right: that document has price 499.99. But I want to consider the first record in the group as part of the main sort. Even if I add price asc in the group.sort, the main sort still does not consider it:
>
> group.sort=max(if(exists(query({!v='storeName_string:212'})),2,0),if(exists(query({!v='storeName_string:203'})),1,0)) desc,inStock_boolean desc,geodist() asc,pricecommon_double asc&sort=pricecommon_double desc
>
> Is there any other workaround so that the sort is always based on the first record which is pulled up in each group?
> Regards, Raj
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solr-returns-incorrect-results-after-sorting-tp4193266p4193658.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Whole RAM consumed while Indexing.
When I kept my configuration to 300 for soft commit and 3000 for hard commit and indexed some amount of data, I got the data size of the whole index to be 6GB after completing the indexing. When I changed the configuration to 6 for soft commit and 6 for hard commit and indexed same data then I got the data size of the whole index to be 5GB after completing the indexing. But the number of documents in the both scenario were same. I am wondering how that can be possible? On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, I am just saying. I want to be sure on commits difference.. What if I do frequent commits or not? And why I am saying that I need to commit things so very quickly because I have to index 28GB of data which takes 7-8 hours(frequent commits). As you said, do commits after 6 seconds then it will be more expensive. If I don't encounter with **overlapping searchers warning messages** then I feel it seems to be okay. Is it? On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson erickerick...@gmail.com wrote: Don't do it. Really, why do you want to do this? This seems like an XY problem, you haven't explained why you need to commit things so very quickly. I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml since they won't be useful at all. Best, Erick On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki nitinml...@gmail.com wrote: Hi, If I do very very fast indexing(softcommit = 300 and hardcommit = 3000) v/s slow indexing (softcommit = 6 and hardcommit = 6) as you both said. Will fast indexing fail to index some data? Any suggestion on this ? On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. 
Aiyengar andyetitmo...@gmail.com wrote: Yes, and doing so is painful and takes lots of people and hardware resources to get there for large amounts of data and queries :) As Erick says, work backwards from 60s and first establish how high the commit interval can be to satisfy your use case.. On 16 Mar 2015 16:04, Erick Erickson erickerick...@gmail.com wrote: First start by lengthening your soft and hard commit intervals substantially. Start with 6 and work backwards I'd say. Ramkumar has tuned the heck out of his installation to get the commit intervals to be that short ;). I'm betting that you'll see your RAM usage go way down, but that' s a guess until you test. Best, Erick On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, You are saying correct. Something, **overlapping searchers warning messages** are coming in logs. **numDocs numbers** are changing when documents are adding at the time of indexing. Any help? On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson erickerick...@gmail.com wrote: First, the soft commit interval is very short. Very, very, very, very short. 300ms is just short of insane unless it's a typo ;). Here's a long background: https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ But the short form is that you're opening searchers every 300 ms. The hard commit is better, but every 3 seconds is still far too short IMO. I'd start with soft commits of 6 and hard commits of 6 (60 seconds), meaning that you're going to have to wait 1 minute for docs to show up unless you explicitly commit. You're throwing away all the caches configured in solrconfig.xml more than 3 times a second, executing autowarming, etc, etc, etc Changing these to longer intervals might cure the problem, but if not then, as Hoss would say, details matter. 
I suspect you're also seeing overlapping searchers warning messages in your log, and it's _possible_ that what's happening is that you're just exceeding the max warming searchers and never opening a new searcher with the newly-indexed documents. But that's a total shot in the dark. How are you looking for docs (and not finding them)? Does the numDocs number in the solr admin screen change? Best, Erick On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Alexandre, *Hard Commit* is: <autoCommit> <maxTime>${solr.autoCommit.maxTime:3000}</maxTime> <openSearcher>false</openSearcher> </autoCommit> *Soft Commit* is: <autoSoftCommit> <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime> </autoSoftCommit> And I am committing 2 documents each time. Is that a good config for committing? Or am I doing something wrong? On Fri, Mar 13, 2015 at 8:52 AM,
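Erick's recommendation above — 60-second soft and hard commit intervals, keeping openSearcher=false on the hard commit — would look like this as a solrconfig.xml fragment (a sketch of the stock config with the suggested starting values; tune from there):

```xml
<!-- Hard commit: flush the transaction log and fsync segments every 60s,
     but do NOT open a new searcher (relatively cheap, safe to run often) -->
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: open a new searcher (making docs visible, invalidating
     caches, and triggering autowarming) at most once every 60s -->
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
</autoSoftCommit>
```

With these intervals, newly indexed documents become searchable up to a minute after they arrive, unless a client issues an explicit commit.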
Re: SolrCloud: data is present on one shard only
Okay Shawn, thanks, I will try your suggestion and will update here. With Regards Aman Tandon On Wed, Mar 18, 2015 at 9:39 PM, Shawn Heisey apa...@elyograg.org wrote: On 3/18/2015 9:47 AM, Aman Tandon wrote: I have 120,000 documents, and I am indexing the data in my solrcloud architecture with two shards. I was under the impression that some data would be present on both shards. But when I look at the data size via the admin interface, I can see that all the documents are present on only one shard and the other shard has zero documents. So I am confused and want to confirm: am I doing something wrong? In your admin UI, click on Cloud and then Tree. You should see a /collections entry in the list. Open that, and then click on the collection you are concerned about. In the right side of that window, there will be a bunch of fields with values. Below that will be a small snippet of JSON text, and one of the bits of info in that JSON will be a field called router ... what is router set to? If it is implicit then your documents will not be automatically dispersed across your shards when you index. They will be indexed into the shard that received your indexing requests. You will need to create a new collection where the router is compositeId. Thanks, Shawn
Re: schema.xml xsd file
There isn't one. The question has been bandied back and forth several times, but the reaction is that an XSD would be more trouble than it's worth, especially as it would have to handle any customizations that anyone wanted to throw at, say, custom field types. Best, Erick On Wed, Mar 18, 2015 at 7:45 AM, Pedro Figueiredo pjlfigueir...@criticalsoftware.com wrote: Hello, Where can I find the xsd file for the schema.xml file? Thanks in advance! Best regards, *Pedro Figueiredo* Senior Engineer pjlfigueir...@criticalsoftware.com M. 934058150 [image: CRITICAL Software] Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. +351 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® LEVEL 5 RATED COMPANY http://cmmiinstitute.com/ CMMI® is registered in the USPTO by CMU http://www.cmu.edu/
Re: Whole RAM consumed while Indexing.
Probably merged somewhat differently with some terms indexes repeating between segments. Check the number of segments in data directory.And do search for *:* and make sure both do have the same document counts. Also, In all these discussions, you still haven't answered about how fast after indexing you want to _search_? Because, if you are not actually searching while committing, you could even index on a completely separate server (e.g. a faster one) and swap (or alias) index in afterwards. Unless, of course, I missed it, it's a lot of emails in a very short window of time. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 18 March 2015 at 12:09, Nitin Solanki nitinml...@gmail.com wrote: When I kept my configuration to 300 for soft commit and 3000 for hard commit and indexed some amount of data, I got the data size of the whole index to be 6GB after completing the indexing. When I changed the configuration to 6 for soft commit and 6 for hard commit and indexed same data then I got the data size of the whole index to be 5GB after completing the indexing. But the number of documents in the both scenario were same. I am wondering how that can be possible? On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, I am just saying. I want to be sure on commits difference.. What if I do frequent commits or not? And why I am saying that I need to commit things so very quickly because I have to index 28GB of data which takes 7-8 hours(frequent commits). As you said, do commits after 6 seconds then it will be more expensive. If I don't encounter with **overlapping searchers warning messages** then I feel it seems to be okay. Is it? On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson erickerick...@gmail.com wrote: Don't do it. Really, why do you want to do this? This seems like an XY problem, you haven't explained why you need to commit things so very quickly. 
I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml since they won't be useful at all. Best, Erick On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki nitinml...@gmail.com wrote: Hi, If I do very very fast indexing(softcommit = 300 and hardcommit = 3000) v/s slow indexing (softcommit = 6 and hardcommit = 6) as you both said. Will fast indexing fail to index some data? Any suggestion on this ? On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote: Yes, and doing so is painful and takes lots of people and hardware resources to get there for large amounts of data and queries :) As Erick says, work backwards from 60s and first establish how high the commit interval can be to satisfy your use case.. On 16 Mar 2015 16:04, Erick Erickson erickerick...@gmail.com wrote: First start by lengthening your soft and hard commit intervals substantially. Start with 6 and work backwards I'd say. Ramkumar has tuned the heck out of his installation to get the commit intervals to be that short ;). I'm betting that you'll see your RAM usage go way down, but that' s a guess until you test. Best, Erick On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, You are saying correct. Something, **overlapping searchers warning messages** are coming in logs. **numDocs numbers** are changing when documents are adding at the time of indexing. Any help? On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson erickerick...@gmail.com wrote: First, the soft commit interval is very short. Very, very, very, very short. 300ms is just short of insane unless it's a typo ;). Here's a long background: https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ But the short form is that you're opening searchers every 300 ms. The hard commit is better, but every 3 seconds is still far too short IMO. 
I'd start with soft commits of 6 and hard commits of 6 (60 seconds), meaning that you're going to have to wait 1 minute for docs to show up unless you explicitly commit. You're throwing away all the caches configured in solrconfig.xml more than 3 times a second, executing autowarming, etc, etc, etc Changing these to longer intervals might cure the problem, but if not then, as Hoss would say, details matter. I suspect you're also seeing overlapping searchers warning messages in your log, and it;s _possible_ that what's happening is that you're just exceeding the max warming searchers and never opening a new searcher with the newly-indexed documents. But that's a total shot in
Re: copy field from boolean to int
I already use this field elsewhere, so I don't want to change its type. I did implement an UpdateRequestProcessor to copy from a bool to an int. This works, but even better would be to fix Solr so that I can use DocValues with boolean. So, I am going to try to get that working as well. On Tue, Mar 17, 2015 at 10:25 PM, William Bell billnb...@gmail.com wrote: Can you reindex? Just use 1,0. On Tue, Mar 17, 2015 at 6:08 PM, Chris Hostetter hossman_luc...@fucit.org wrote: Can you open a jira to add docValues support for BoolField? ... i can't think of any good reason not to directly support that in Solr for BoolField ... seems like just an oversight that slipped through the cracks. For now, your best bet is probably to use an UpdateProcessor ... maybe 2 instances of RegexReplaceProcessorFactory to match true and false and replace them with 0 and 1? : Date: Tue, 17 Mar 2015 17:57:03 -0700 : From: Kevin Osborn kosb...@centraldesktop.com : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: copy field from boolean to int : : I was hoping to use DocValues, but one of my fields is a boolean, which is : not currently supported by DocValues. I can use a copyField to convert my : boolean to a string. Is there any way to use a copyField to convert from : a boolean to a tint? -Hoss http://www.lucidworks.com/ -- Bill Bell billnb...@gmail.com cell 720-256-8076
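Hoss's UpdateProcessor suggestion could be sketched in solrconfig.xml roughly as below. This is illustrative only: the chain name, the field names, and the use of CloneFieldUpdateProcessorFactory to perform the copy are assumptions, not from the thread; the two RegexReplaceProcessorFactory instances are what Hoss proposed.

```xml
<updateRequestProcessorChain name="bool-to-int">
  <!-- Hypothetical: clone the boolean field into a separate int-typed field -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">in_stock_b</str>
    <str name="dest">in_stock_i</str>
  </processor>
  <!-- Two RegexReplaceProcessorFactory instances, per Hoss: true -> 1, false -> 0 -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">in_stock_i</str>
    <str name="pattern">true</str>
    <str name="replacement">1</str>
  </processor>
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">in_stock_i</str>
    <str name="pattern">false</str>
    <str name="replacement">0</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain would then be selected per request (or per update handler) via the update.chain parameter.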
RE: schema.xml xsd file
:( ok, thank you. Pedro Figueiredo Senior Engineer -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 18 March 2015 15:28 To: solr-user@lucene.apache.org Subject: Re: schema.xml xsd file There isn't one. The question has ben bandied back and forth several times, but the reaction is that an XSD would be more trouble than it's worth, especially as it would have to handle any customizations that anyone wanted to throw at, say, custom field types. Best, Erick On Wed, Mar 18, 2015 at 7:45 AM, Pedro Figueiredo pjlfigueir...@criticalsoftware.com wrote: Hello, Where can I find the xsd file for the schema.xml file? Thanks in advanced! Best regards, *Pedro Figueiredo* Senior Engineer pjlfigueir...@criticalsoftware.com M. 934058150 [image: CRITICAL Software] Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal T. +351 229 446 927 | F. +351 229 446 929 www.criticalsoftware.com PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA A CMMI® LEVEL 5 RATED COMPANY http://cmmiinstitute.com/CMMI® is registered in the USPTO by CMU http://www.cmu.edu/
Re: Which one is it cs or cz for Czech language?
: Probably a historical artifact. Yeah, probably. Fixing the Solr example configs would be fairly trivial -- the names are just symbolic strings -- but currently they are all consistent with the Lucene package names, which would be a more complex change from a back-compat standpoint -- I've opened some linked issues; hopefully someone who is more of an expert on the naming conventions of these packages can chime in and we can clean this up... https://issues.apache.org/jira/browse/SOLR-7267 https://issues.apache.org/jira/browse/LUCENE-6366 -Hoss http://www.lucidworks.com/
Unable to index rich-text documents in Solr Cloud
Hi everyone, I'm having some issues with indexing rich-text documents from Solr Cloud. When I try to index a pdf or word document, I get the following error: org.apache.solr.common.SolrException: Bad Request request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) I'm able to index .xml and .csv files in Solr Cloud with the same configuration. I have set up Solr Cloud using the default zookeeper in Solr 5.0.0, and I have 2 shards with the following details: Shard1: 192.168.2.2:8983 Shard2: 192.168.2.2:8984 Prior to this, I was already able to index rich-text documents without Solr Cloud, and I'm using the same solrconfig.xml and schema.xml, so my ExtractingRequestHandler is already defined. Are there other settings required in order to index rich-text documents in Solr Cloud? Regards, Edwin
Re: Add replica on shards
Any help please... On Wed, Mar 18, 2015 at 12:02 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi, I have created 8 shards on a collection named **wikingram**. At that time I did not create any replicas. Now I want to add a replica on each shard. How can I do that? I ran this - **sudo curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=wikingram&shard=shard1&node=localhost:8983_solr"** but it is not working. It throws an error - <response> <lst name="responseHeader"> <int name="status">400</int> <int name="QTime">86</int> </lst> <str name="Operation ADDREPLICA caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not find collection : null</str> <lst name="exception"> <str name="msg">Could not find collection : null</str> <int name="rspCode">400</int> </lst> <lst name="error"> <str name="msg">Could not find collection : null</str> <int name="code">400</int> </lst> </response> Any help on this?
RE: Which one is it cs or cz for Czech language?
FYI: http://www.w3schools.com/tags/ref_country_codes.asp - CZECH REPUBLIC: CZ. No entry for CS. From: md...@apache.org Date: Tue, 17 Mar 2015 12:45:57 -0500 Subject: Re: Which one is it cs or cz for Czech language? To: solr-user@lucene.apache.org Probably a historical artifact. cz is the country code for the Czech Republic, cs is the language code for Czech. Once, cs was also the country code for Czechoslovakia, leading some folks to accidentally conflate the two. On Tue, Mar 17, 2015 at 12:35 PM, Eduard Moraru enygma2...@gmail.com wrote: Hi, First of all, a bit of a disclaimer: I am not a Czech language speaker, at all. We are using Solr's dynamic fields in our project (XWiki), and we have recently noticed a problem [1] with the Czech language. Basically, our mapping says something like this: <dynamicField name="*_cz" type="text_cz" indexed="true" stored="true" multiValued="true" /> ...but at runtime, we ask for the language code cs (which is the ISO language code for Czech [2]) and it obviously fails (due to the mapping). Now, we can easily fix this on our end by fixing the mapping to name="*_cs", but what we are really wondering now is why does Lucene/Solr use cz (country code) instead of cs (language code) in both its text_cz field and its stopwords_cz.txt file? Is that a mistake on the Solr/Lucene side? Is it some kind of convention? Is it going to be fixed? Thanks, Eduard -- [1] http://jira.xwiki.org/browse/XWIKI-11897 [2] http://en.wikipedia.org/wiki/Czech_language
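For reference, Eduard's fix on the XWiki side — keying the dynamic field off the ISO 639-1 language code rather than the country code — amounts to a one-line schema.xml change (a sketch; the stock field type keeps its text_cz name):

```xml
<!-- "cs" is the ISO 639-1 code for Czech; "cz" is the country code.
     Only the dynamic-field suffix changes; the analysis chain (text_cz) stays the same. -->
<dynamicField name="*_cs" type="text_cz" indexed="true" stored="true" multiValued="true"/>
```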
Re: schema.xml xsd file
On 3/18/2015 8:45 AM, Pedro Figueiredo wrote: Where can I find the xsd file for the schema.xml file? As Erick said, current XSD files do not exist. There are some (now probably outdated) XSD files in a patch on this issue: https://issues.apache.org/jira/browse/SOLR-1758 Thanks, Shawn
RE: Distributed IDF performance
Anshum, Jack - do any of you have a cluster at hand to get some real results on this? After testing the actual functionality for quite some time while the final patch was in development, we have not had the chance to work on performance tests. We are still on Solr 4.10 and have to port lots of Lucene stuff to 5. I would sure like to see some numbers from either of you :) Markus -Original message- From:Anshum Gupta ans...@anshumgupta.net Sent: Friday 13th March 2015 23:33 To: solr-user@lucene.apache.org Subject: Re: Distributed IDF performance np! I forgot to mention that I didn't notice any considerable performance hit in my tests. The QTimes were barely off by 5%. On Fri, Mar 13, 2015 at 3:13 PM, Jack Krupansky jack.krupan...@gmail.com wrote: Oops... I said StatsInfo and that should have been StatsCache (<statsCache .../>). -- Jack Krupansky On Fri, Mar 13, 2015 at 6:04 PM, Anshum Gupta ans...@anshumgupta.net wrote: There's no rough formula or performance data that I know of at this point. About the guidance, if you want to use global stats, my obvious choice would be to use the LRUStatsCache. Before committing, I did run some tests on my macbook but as I said back then, they shouldn't be totally taken at face value. The tests didn't involve any network and were just about 20mn docs and synthetic queries. On Fri, Mar 13, 2015 at 2:08 PM, Jack Krupansky jack.krupan...@gmail.com wrote: Does anybody have any actual performance data or even a rough formula for calculating the overhead for using the new Solr 5.0 Distributed IDF ( SOLR-1632 https://issues.apache.org/jira/browse/SOLR-1632)? And any guidance as far as which StatsInfo plugin is best to use? Are many people now using Distributed IDF as their default? I'm not currently using this, but the existing doc and Jira is too minimal to offer guidance as requested above. Mostly I'm just curious. Thanks. -- Jack Krupansky -- Anshum Gupta -- Anshum Gupta
Re: High memory usage while querying with sort using cursor
Thanks Chris, that makes a lot of sense. On Wed, Mar 18, 2015 at 3:16 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : A simple query on the collection: ../select?q=*:* works perfectly fine. : : But as soon as i add sorting, it crashes the nodes with OOM: : .../select?q=*:*&sort=unique_id asc&rows=0. if you don't have docValues=true on your unique_id field, then sorting requires it to build up a large in-memory data structure (formerly known as FieldCache, now just an on-the-fly DocValues structure) With explicit docValues constructed at index time, a lot of that data can just live in the operating system's filesystem cache, and lucene only has to load a small portion of it into the heap. -Hoss http://www.lucidworks.com/
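Hoss's fix as a schema.xml fragment (a sketch — type="string" is an assumption; use whatever type unique_id actually has, and note that adding docValues requires reindexing):

```xml
<!-- With docValues="true" the sort structure is built at index time and
     memory-mapped from disk, instead of being un-inverted into the Java heap -->
<field name="unique_id" type="string" indexed="true" stored="true" docValues="true"/>
```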
Re: CloudSolrServer : Could not find collection : gettingstarted
Does the Solr admin UI cloud view show the gettingstarted collection? The graph view might help. It _sounds_ like somehow you didn't actually create the collection. What steps did you follow to create the collection in SolrCloud? It's possible you have the wrong ZK root somehow, I suppose. Best, Erick On Wed, Mar 18, 2015 at 12:32 PM, Adnan Yaqoob itsad...@gmail.com wrote: I'm getting the following exception while trying to upload a document to SolrCloud using CloudSolrServer. Exception in thread "main" org.apache.solr.common.SolrException: *Could not find collection :* gettingstarted at org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:162) at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:305) at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102) at Test.addDocumentSolrCloud(Test.java:265) at Test.main(Test.java:284) I can query through the Solr admin and am able to upload documents using HttpSolrServer (single instance - non-cloud mode), but not with CloudSolrServer. I've also verified that the collection exists on zookeeper using the zkCli command. Following is the code snippet: CloudSolrServer server = new CloudSolrServer("localhost:2181"); server.setDefaultCollection("gettingstarted"); SolrInputDocument doc = new SolrInputDocument(); doc.addField("id", id); doc.addField("name", name); server.add(doc); server.commit(); Not sure what I'm missing. My Zookeeper is running externally with two solr nodes on the same mac -- Regards, *Adnan Yaqoob*
Re: Unable to index rich-text documents in Solr Cloud
Hi Erick, No, the PDF file is a testing file which only contains 1 sentence. I've managed to get it to work by removing startup="lazy" in the ExtractingRequestHandler and adding the following lines: <str name="uprefix">ignored_</str> <str name="captureAttr">true</str> <str name="fmap.a">links</str> <str name="fmap.div">ignored_</str> Does the presence of startup="lazy" affect the function of the ExtractingRequestHandler, or is it one of the str name values? Regards, Edwin On 18 March 2015 at 23:19, Erick Erickson erickerick...@gmail.com wrote: Shot in the dark, but is the PDF file significantly larger than the others? Perhaps you're simply exceeding the packet limits for the servlet container? Best, Erick On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi everyone, I'm having some issues with indexing rich-text documents from the Solr Cloud. When I tried to index a pdf or word document, I get the following error: org.apache.solr.common.SolrException: Bad Request request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) I'm able to index .xml and .csv files in Solr Cloud with the same configuration. I have setup Solr Cloud using the default zookeeper in Solr 5.0.0, and I have 2 shards with the following details: Shard1: 192.168.2.2:8983 Shard2: 192.168.2.2:8984 Prior to this, I'm already able to index rich-text documents without the Solr Cloud, and I'm using the same solrconfig.xml and schema.xml, so my ExtractRequestHandler is already defined. Is there other settings required in order to index rich-text documents in Solr Cloud? Regards, Edwin
Re: Unable to index rich-text documents in Solr Cloud
These are the logs that I got from solr.log. I can't seem to figure out what's wrong. Does anyone know? ERROR - 2015-03-18 15:06:51.019; org.apache.solr.update.StreamingSolrClients$1; error org.apache.solr.common.SolrException: Bad Request request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.23.72%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) INFO - 2015-03-18 15:06:51.019; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update/extract params={literal.id=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf&resource.name=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf} {add=[C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf]} 0 1252 INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit. 
INFO - 2015-03-18 15:06:51.029; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={waitSearcher=truedistrib.from= http://192.168.2.2:8983/solr/logmill/update.distrib=FROMLEADERopenSearcher=truecommit=truewt=javabinexpungeDeletes=falsecommit_end_point=trueversion=2softCommit=false} {commit=} 0 10 INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={commit=true} {commit=} 0 10 Regards, Edwin On 19 March 2015 at 10:56, Damien Kamerman dami...@gmail.com wrote: I suggest you check your solr logs for more info as to the cause. On 19 March 2015 at 12:58, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi Erick, No, the PDF file is a testing file which only contains 1 sentence. I've managed to get it to work by removing startup=lazy in the ExtractingRequestHandler and added the following lines: str name=uprefixignored_/str str name=captureAttrtrue/str str name=fmap.alinks/str str name=fmap.divignored_/str Does the presence of startup=lazy affect the function of ExtractingRequestHandler , or is it one of the str name values? Regards, Edwin On 18 March 2015 at 23:19, Erick Erickson erickerick...@gmail.com wrote: Shot in the dark, but is the PDF file significantly larger than the others? Perhaps your simply exceeding the packet limits for the servlet container? Best, Erick On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi everyone, I'm having some issues with indexing rich-text documents from the Solr Cloud. 
When I tried to index a pdf or word document, I get the following error: org.apache.solr.common.SolrException: Bad Request request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADERdistrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2Fwt=javabinversion=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) I'm able to index .xml and .csv files in Solr Cloud with the same configuration. I have setup Solr Cloud using the default zookeeper in Solr 5.0.0, and I have 2 shards with the following details: Shard1: 192.168.2.2:8983 Shard2: 192.168.2.2:8984 Prior to this, I'm already able to index rich-text documents without the Solr Cloud, and I'm using the same solrconfig.xml and schema.xml, so my ExtractRequestHandler is already defined. Is there other settings required in order to index rich-text documents in Solr Cloud? Regards, Edwin -- Damien Kamerman
Re: not able to import Data through DIH solr 4.2.1
Alex, thanks for replying. My solrconfig: <lib dir="../../../example/lib/" regex="mysql-connector-java-.*\.jar" /> <lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" /> <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str name="config">data-config-new.xml</str> </lst> </requestHandler> On Thu, Mar 19, 2015 at 10:26 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Could not load driver: com.mysql.jdbc.Driver Looks like a custom driver. Is the driver name correct? Is the library declared in solrconfig.xml? Is the library path correct (use an absolute path if in doubt)? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 19 March 2015 at 00:35, abhishek tiwari test.mi...@gmail.com wrote: Please provide the basic steps to resolve the issue. Getting the following error: Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: com.mysql.jdbc.Driver Processing Document # 1
Re: Unable to index rich-text documents in Solr Cloud
I suggest you check your solr logs for more info as to the cause. On 19 March 2015 at 12:58, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi Erick, No, the PDF file is a testing file which only contains 1 sentence. I've managed to get it to work by removing startup=lazy in the ExtractingRequestHandler and added the following lines: str name=uprefixignored_/str str name=captureAttrtrue/str str name=fmap.alinks/str str name=fmap.divignored_/str Does the presence of startup=lazy affect the function of ExtractingRequestHandler , or is it one of the str name values? Regards, Edwin On 18 March 2015 at 23:19, Erick Erickson erickerick...@gmail.com wrote: Shot in the dark, but is the PDF file significantly larger than the others? Perhaps your simply exceeding the packet limits for the servlet container? Best, Erick On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi everyone, I'm having some issues with indexing rich-text documents from the Solr Cloud. When I tried to index a pdf or word document, I get the following error: org.apache.solr.common.SolrException: Bad Request request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADERdistrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2Fwt=javabinversion=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) I'm able to index .xml and .csv files in Solr Cloud with the same configuration. 
I have setup Solr Cloud using the default zookeeper in Solr 5.0.0, and I have 2 shards with the following details: Shard1: 192.168.2.2:8983 Shard2: 192.168.2.2:8984 Prior to this, I'm already able to index rich-text documents without the Solr Cloud, and I'm using the same solrconfig.xml and schema.xml, so my ExtractRequestHandler is already defined. Is there other settings required in order to index rich-text documents in Solr Cloud? Regards, Edwin -- Damien Kamerman
not able to import Data through DIH solr 4.2.1
Please provide the basic steps to resolve this issue. I am getting the following error: Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: com.mysql.jdbc.Driver Processing Document # 1
Re: Unable to index rich-text documents in Solr Cloud
In the URL http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.23.72%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2 , the port number 8984 may be an HTTPS port. The HTTP port should be 8983. Hope this helps. -- Best Regards, Charlee Chitsuk === Application Security Product Group *Summit Computer Co., Ltd.* http://www.summitthai.com/ E-Mail: char...@summitthai.com Tel: +66-2-238-0895 to 9 ext. 164 Fax: +66-2-236-7392 === *@ Your Success is Our Pride* -- 2015-03-19 11:49 GMT+07:00 Damien Kamerman dami...@gmail.com: It sounds like https://issues.apache.org/jira/browse/SOLR-5551 Have you checked the solr.log for all nodes? On 19 March 2015 at 14:43, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: These are the logs that I got from solr.log. I can't seem to figure out what's wrong with them. Does anyone know?

ERROR - 2015-03-18 15:06:51.019; org.apache.solr.update.StreamingSolrClients$1; error
org.apache.solr.common.SolrException: Bad Request
request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.23.72%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
INFO - 2015-03-18 15:06:51.019; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update/extract params={literal.id=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf&resource.name=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf} {add=[C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf]} 0 1252
INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
INFO - 2015-03-18 15:06:51.029; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={waitSearcher=true&distrib.from=http://192.168.2.2:8983/solr/logmill/&update.distrib=FROMLEADER&openSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false} {commit=} 0 10
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={commit=true} {commit=} 0 10

Regards, Edwin On 19 March 2015 at 10:56, Damien Kamerman dami...@gmail.com wrote: I suggest you check your solr logs for more info as to the cause. On 19 March 2015 at 12:58, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi Erick, No, the PDF file is a test file which only contains 1 sentence. I've managed to get it to work by removing startup="lazy" in the ExtractingRequestHandler and adding the following lines: <str name="uprefix">ignored_</str> <str name="captureAttr">true</str> <str name="fmap.a">links</str> <str name="fmap.div">ignored_</str> Does the presence of startup="lazy" affect the function of ExtractingRequestHandler, or is it one of the str name values? Regards, Edwin On 18 March 2015 at 23:19, Erick Erickson erickerick...@gmail.com wrote: Shot in the dark, but is the PDF file significantly larger than the others? Perhaps you're simply exceeding the packet limits for the servlet container? Best, Erick On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi everyone, I'm having some issues with indexing rich-text documents from the Solr Cloud. When I tried to index a pdf or word document, I get the following error:

org.apache.solr.common.SolrException: Bad Request
request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241) ...
Re: Unable to index rich-text documents in Solr Cloud
On 3/18/2015 1:22 AM, Zheng Lin Edwin Yeo wrote: I'm having some issues with indexing rich-text documents from the Solr Cloud. When I tried to index a pdf or word document, I get the following error: org.apache.solr.common.SolrException: Bad Request request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2 This request appears to be one of the requests that SolrCloud makes between its different nodes, but it is using the /update handler. I assume that when you sent the request, you sent it to the /update/extract handler because it's a rich-text document? The /update handler can't handle rich-text documents; it is only for documents in json, xml, csv, javabin, etc. that are formatted in specific ways. One thing I'm wondering is whether the Extracting handler requires a shards.qt parameter, also set to /update/extract, to work right with SolrCloud. I have never used that handler myself, so I've got no idea what is required to make it work right. Thanks, Shawn
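[Editor's sketch of the distinction discussed here; the collection name, document id, and file name below are assumed for illustration, not taken from the poster's setup. Rich-text files are posted to the extracting handler as a multipart upload, while /update only accepts the structured formats listed above.]

```shell
# Hypothetical example: send a PDF to the extracting handler, not /update.
# Quoting the whole URL keeps '&' from being interpreted by the shell.
URL='http://localhost:8983/solr/logmill/update/extract?literal.id=doc1&commit=true'
echo "$URL"
# curl "$URL" -F "myfile=@solr-word.pdf"   # multipart upload, run against a live node
```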
Re: not able to import Data through DIH solr 4.2.1
<lib dir="/home/shopclues/solr-4.2.1/example/lib/" regex="mysql-connector-java-5.1.22-bin.jar" />
<lib dir="/home/shopclues/solr-4.2.1/dist/" regex="solr-dataimporthandler-.*\.jar" />

but it is still not working. On Thu, Mar 19, 2015 at 10:41 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Try an absolute path to the jar directory. Hard to tell whether the relative path is correct without knowing exactly how you are running it. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 19 March 2015 at 01:00, abhishek tiwari test.mi...@gmail.com wrote: Alex, thanks for replying. My solrconfig:

<lib dir="../../../example/lib/" regex="mysql-connector-java-.*\.jar" />
<lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-new.xml</str>
  </lst>
</requestHandler>

On Thu, Mar 19, 2015 at 10:26 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Could not load driver: com.mysql.jdbc.Driver Looks like a custom driver. Is the driver name correct? Is the library declared in solrconfig.xml? Is the library path correct (use an absolute path if in doubt)? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 19 March 2015 at 00:35, abhishek tiwari test.mi...@gmail.com wrote: Please provide the basic steps to resolve the issue. Getting the following error: Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: com.mysql.jdbc.Driver Processing Document # 1
Re: Unable to index rich-text documents in Solr Cloud
These are the logs that I got from solr.log. I can't seem to figure out what's wrong with them. Does anyone know?

ERROR - 2015-03-18 15:06:51.019; org.apache.solr.update.StreamingSolrClients$1; error
org.apache.solr.common.SolrException: Bad Request
request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.23.72%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
INFO - 2015-03-18 15:06:51.019; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update/extract params={literal.id=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf&resource.name=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf} {add=[C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf]} 0 1252
INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
INFO - 2015-03-18 15:06:51.029; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={waitSearcher=true&distrib.from=http://192.168.2.2:8983/solr/logmill/&update.distrib=FROMLEADER&openSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false} {commit=} 0 10
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={commit=true} {commit=} 0 10

Regards, Edwin On 19 March 2015 at 10:56, Damien Kamerman dami...@gmail.com wrote: I suggest you check your solr logs for more info as to the cause. On 19 March 2015 at 12:58, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi Erick, No, the PDF file is a test file which only contains 1 sentence. I've managed to get it to work by removing startup="lazy" in the ExtractingRequestHandler and adding the following lines: <str name="uprefix">ignored_</str> <str name="captureAttr">true</str> <str name="fmap.a">links</str> <str name="fmap.div">ignored_</str> Does the presence of startup="lazy" affect the function of ExtractingRequestHandler, or is it one of the str name values? Regards, Edwin On 18 March 2015 at 23:19, Erick Erickson erickerick...@gmail.com wrote: Shot in the dark, but is the PDF file significantly larger than the others? Perhaps you're simply exceeding the packet limits for the servlet container? Best, Erick On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi everyone, I'm having some issues with indexing rich-text documents from the Solr Cloud. When I tried to index a pdf or word document, I get the following error:

org.apache.solr.common.SolrException: Bad Request
request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

I'm able to index .xml and .csv files in Solr Cloud with the same configuration. I have set up Solr Cloud using the default zookeeper in Solr 5.0.0, and I have 2 shards with the following details: Shard1: 192.168.2.2:8983 Shard2: 192.168.2.2:8984 Prior to this, I was already able to index rich-text documents without Solr Cloud, and I'm using the same solrconfig.xml and schema.xml, so my ExtractingRequestHandler is already defined. Are there other settings required in order to index rich-text documents in Solr Cloud? Regards, Edwin -- Damien Kamerman
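[Editor's note: pulling together what Edwin describes above (a non-lazy handler plus the extra str entries), the handler definition would look roughly like the sketch below. The class name is taken from the stock Solr example solrconfig.xml; treat this as a reconstruction, not his exact file.]

```xml
<!-- Sketch: extracting handler without startup="lazy", with the mappings
     Edwin added. uprefix sends unknown Tika fields to ignored_*;
     fmap.* remaps extracted fields to schema fields. -->
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>
```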
Re: Unable to index rich-text documents in Solr Cloud
It sounds like https://issues.apache.org/jira/browse/SOLR-5551 Have you checked the solr.log for all nodes? On 19 March 2015 at 14:43, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: These are the logs that I got from solr.log. I can't seem to figure out what's wrong with them. Does anyone know?

ERROR - 2015-03-18 15:06:51.019; org.apache.solr.update.StreamingSolrClients$1; error
org.apache.solr.common.SolrException: Bad Request
request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.23.72%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
INFO - 2015-03-18 15:06:51.019; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update/extract params={literal.id=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf&resource.name=C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf} {add=[C:\Users\edwin\solr-5.0.0\example\exampledocs\solr-word.pdf]} 0 1252
INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO - 2015-03-18 15:06:51.029; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
INFO - 2015-03-18 15:06:51.029; org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={waitSearcher=true&distrib.from=http://192.168.2.2:8983/solr/logmill/&update.distrib=FROMLEADER&openSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false} {commit=} 0 10
INFO - 2015-03-18 15:06:51.039; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={commit=true} {commit=} 0 10

Regards, Edwin On 19 March 2015 at 10:56, Damien Kamerman dami...@gmail.com wrote: I suggest you check your solr logs for more info as to the cause. On 19 March 2015 at 12:58, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi Erick, No, the PDF file is a test file which only contains 1 sentence. I've managed to get it to work by removing startup="lazy" in the ExtractingRequestHandler and adding the following lines: <str name="uprefix">ignored_</str> <str name="captureAttr">true</str> <str name="fmap.a">links</str> <str name="fmap.div">ignored_</str> Does the presence of startup="lazy" affect the function of ExtractingRequestHandler, or is it one of the str name values? Regards, Edwin On 18 March 2015 at 23:19, Erick Erickson erickerick...@gmail.com wrote: Shot in the dark, but is the PDF file significantly larger than the others? Perhaps you're simply exceeding the packet limits for the servlet container? Best, Erick On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi everyone, I'm having some issues with indexing rich-text documents from the Solr Cloud. When I tried to index a pdf or word document, I get the following error:

org.apache.solr.common.SolrException: Bad Request
request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.2.2%3A8983%2Fsolr%2Flogmill%2F&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:241)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

I'm able to index .xml and .csv files in Solr Cloud with the same configuration. I have set up Solr Cloud using the default zookeeper in Solr 5.0.0, and I have 2 shards with the following details: Shard1: 192.168.2.2:8983 Shard2: 192.168.2.2:8984 Prior to this, I was already able to index rich-text documents without Solr Cloud, and I'm using the same solrconfig.xml and schema.xml, so my ExtractingRequestHandler is already defined. Are there other settings required in order to index rich-text documents in Solr Cloud?
Re: not able to import Data through DIH solr 4.2.1
On 3/18/2015 11:00 PM, abhishek tiwari wrote: my solrconfig: <lib dir="../../../example/lib/" regex="mysql-connector-java-.*\.jar" /> <lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" /> The way that I always recommend dealing with extra jars: In your solr home, create a lib directory. Copy all the extra jars that you need into this directory, including the DIH jar and your JDBC driver jar. Remove all <lib> config elements from solrconfig.xml. In Solr 4.2, you will also need to make sure that your solr.xml has a sharedLib attribute on the <solr> tag, set to "lib". On 4.3 and later, this step is not required ... it will actually cause the jars to NOT work. See the comment on this issue dated 12/Nov/13: https://issues.apache.org/jira/browse/SOLR-4852?focusedCommentId=13820197&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13820197 Thanks, Shawn
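[Editor's sketch of the Solr 4.2 step Shawn mentions. The element names follow the legacy solr.xml format; the core name and instanceDir are placeholders, not taken from the poster's setup.]

```xml
<!-- Legacy-style solr.xml (Solr 4.2 sketch): sharedLib points at a
     "lib" directory under the solr home, where the DIH and JDBC jars live. -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```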
Re: not able to import Data through DIH solr 4.2.1
Could not load driver: com.mysql.jdbc.Driver Looks like a custom driver. Is the driver name correct? Is the library declared in solrconfig.xml? Is the library path correct (use an absolute path if in doubt)? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 19 March 2015 at 00:35, abhishek tiwari test.mi...@gmail.com wrote: Please provide the basic steps to resolve the issue. Getting the following error: Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: com.mysql.jdbc.Driver Processing Document # 1
Re: not able to import Data through DIH solr 4.2.1
Try an absolute path to the jar directory. Hard to tell whether the relative path is correct without knowing exactly how you are running it. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 19 March 2015 at 01:00, abhishek tiwari test.mi...@gmail.com wrote: Alex, thanks for replying. My solrconfig:

<lib dir="../../../example/lib/" regex="mysql-connector-java-.*\.jar" />
<lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-new.xml</str>
  </lst>
</requestHandler>

On Thu, Mar 19, 2015 at 10:26 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Could not load driver: com.mysql.jdbc.Driver Looks like a custom driver. Is the driver name correct? Is the library declared in solrconfig.xml? Is the library path correct (use an absolute path if in doubt)? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 19 March 2015 at 00:35, abhishek tiwari test.mi...@gmail.com wrote: Please provide the basic steps to resolve the issue. Getting the following error: Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: com.mysql.jdbc.Driver Processing Document # 1
Can we use CloudSolrServer for searching data?
I am using SolrCloud with a ZooKeeper setup, but when I try to make a query using the following code snippet, I get an exception.

Code:
CloudSolrServer server = new CloudSolrServer("localhost:2181");
server.setDefaultCollection("gettingstarted");
server.connect();
SolrQuery query = new SolrQuery();
query.setQuery(q);
QueryResponse rsp;
rsp = server.query(query);

Exception:
Exception in thread "main" org.apache.solr.common.SolrException: Collection not found: gettingstarted
at org.apache.solr.client.solrj.impl.CloudSolrServer.getCollectionList(CloudSolrServer.java:679)
at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:562)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at Test.testSelectQueryCloudServer(Test.java:243)
at Test.main(Test.java:357)

I've verified the collection exists on ZooKeeper using zkCli, and I can query using the Solr admin. Sent from Windows Mail
CloudSolrServer : Could not find collection : gettingstarted
I'm getting the following exception while trying to upload a document to SolrCloud using CloudSolrServer.

Exception in thread "main" org.apache.solr.common.SolrException: *Could not find collection :* gettingstarted
at org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:162)
at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:305)
at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
at Test.addDocumentSolrCloud(Test.java:265)
at Test.main(Test.java:284)

I can query through the Solr admin and am able to upload a document using HttpSolrServer (single instance - non-cloud mode), but not with CloudSolrServer. I've also verified the collection exists on ZooKeeper using the zkCli command. Following is the code snippet:

CloudSolrServer server = new CloudSolrServer("localhost:2181");
server.setDefaultCollection("gettingstarted");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", id);
doc.addField("name", name);
server.add(doc);
server.commit();

Not sure what I'm missing. My ZooKeeper is running externally with two Solr nodes on the same Mac. -- Regards, *Adnan Yaqoob*
Re: High memory usage while querying with sort using cursor
: A simple query on the collection: ../select?q=*:* works perfectly fine. : : But as soon as I add sorting, it crashes the nodes with OOM: : .../select?q=*:*&sort=unique_id asc&rows=0. If you don't have docValues=true on your unique_id field, then sorting requires building a large in-memory data structure (formerly known as FieldCache, now just an on-the-fly DocValues structure). With explicit docValues constructed at index time, a lot of that data can just live in the operating system's filesystem cache, and Lucene only has to load a small portion of it into the heap. -Hoss http://www.lucidworks.com/
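[Editor's sketch of the schema.xml change Hoss describes. The field name comes from the thread; the type and other attributes are assumed, and enabling docValues on an existing field requires reindexing.]

```xml
<!-- Sketch: docValues on the sort field keeps the sort data in
     index-time structures (served via the OS filesystem cache) instead
     of a large heap-resident FieldCache-style structure. -->
<field name="unique_id" type="string" indexed="true" stored="true" docValues="true" />
```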
Re: Whole RAM consumed while Indexing.
bq: As you said, do commits after 60000 seconds

No, No, No. I'm NOT saying 60000 seconds! That time is in _milliseconds_, as Shawn said. So setting it to 60000 is every minute. From solrconfig.xml, conveniently located immediately above the autoCommit tag: maxTime - Maximum amount of time in ms that is allowed to pass since a document was added before automatically triggering a new commit. Also, a lot of the answers about soft and hard commits are here, as I pointed out before; did you read it? https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Best, Erick On Wed, Mar 18, 2015 at 9:44 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Probably merged somewhat differently, with some term indexes repeating between segments. Check the number of segments in the data directory. And do a search for *:* and make sure both do have the same document counts. Also, in all these discussions, you still haven't answered how soon after indexing you want to _search_. Because, if you are not actually searching while committing, you could even index on a completely separate server (e.g. a faster one) and swap (or alias) the index in afterwards. Unless, of course, I missed it; it's a lot of emails in a very short window of time. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 18 March 2015 at 12:09, Nitin Solanki nitinml...@gmail.com wrote: When I kept my configuration at 300 for soft commit and 3000 for hard commit and indexed some amount of data, the size of the whole index was 6GB after indexing completed. When I changed the configuration to 60000 for soft commit and 60000 for hard commit and indexed the same data, the size of the whole index was 5GB after indexing completed. But the number of documents in both scenarios was the same. I am wondering how that can be possible? On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, I am just saying. I want to be sure about the difference commits make, whether I do frequent commits or not. And the reason I am saying that I need to commit things so very quickly is that I have to index 28GB of data, which takes 7-8 hours (with frequent commits). As you said, if I do commits after 60000 then it will be more expensive. If I don't encounter the **overlapping searchers warning messages**, then I feel it seems to be okay. Is it? On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson erickerick...@gmail.com wrote: Don't do it. Really, why do you want to do this? This seems like an XY problem; you haven't explained why you need to commit things so very quickly. I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml since they won't be useful at all. Best, Erick On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki nitinml...@gmail.com wrote: Hi, If I do very very fast indexing (softcommit = 300 and hardcommit = 3000) v/s slow indexing (softcommit = 60000 and hardcommit = 60000) as you both said, will fast indexing fail to index some data? Any suggestion on this? On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote: Yes, and doing so is painful and takes lots of people and hardware resources to get there for large amounts of data and queries :) As Erick says, work backwards from 60s and first establish how high the commit interval can be to satisfy your use case. On 16 Mar 2015 16:04, Erick Erickson erickerick...@gmail.com wrote: First start by lengthening your soft and hard commit intervals substantially. Start with 60000 and work backwards, I'd say. Ramkumar has tuned the heck out of his installation to get the commit intervals to be that short ;). I'm betting that you'll see your RAM usage go way down, but that's a guess until you test. Best, Erick On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki nitinml...@gmail.com wrote: Hi Erick, You are saying correct. **Overlapping searchers warning messages** are coming in the logs. **numDocs numbers** are changing while documents are being added during indexing. Any help? On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson erickerick...@gmail.com wrote: First, the soft commit interval is very short. Very, very, very, very short. 300ms is just short of insane unless it's a typo ;). Here's a long background: https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ But the short form is that you're opening searchers every 300 ms. The hard commit is better, but every 3 seconds is still far too short IMO. I'd start with soft commits of 60000 and hard commits of 60000 (60
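[Editor's sketch of the one-minute (60000 ms) intervals Erick recommends, as they would appear in solrconfig.xml. openSearcher=false on the hard commit is the usual pairing described in the linked article; adjust values to your own latency needs.]

```xml
<autoCommit>
  <maxTime>60000</maxTime>       <!-- 60 seconds, in milliseconds -->
  <openSearcher>false</openSearcher>  <!-- flush to disk without reopening searchers -->
</autoCommit>
<autoSoftCommit>
  <maxTime>60000</maxTime>       <!-- controls when new documents become searchable -->
</autoSoftCommit>
```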
High memory usage while querying with sort using cursor
Hi all, My setup is as follows: *Collection* size: 32GB, 2 shards, replication factor: 2 (~16GB on each replica). Number of rows: 250 million. 4 *Solr* nodes: RAM: 30GB each. Heap size: 8GB. Version: 4.9.1. Besides the collection in question, the nodes have some other collections present. The total size of all collections on each node is 30GB (which is the same as the amount of RAM on them). A simple query on the collection: ../select?q=*:* works perfectly fine. But as soon as I add sorting, it crashes the nodes with OOM: .../select?q=*:*&sort=unique_id asc&rows=0. I have tried disabling the filter cache and the query-result cache, but that did not help either. Any ideas/suggestions? Thanks, Vaibhav
Re: SolrCloud: data is present on one shard only
please help.. With Regards Aman Tandon On Tue, Mar 17, 2015 at 3:24 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, I indexed the data in my SolrCloud architecture (2 shards on 2 separate instances, with replicas of both shards on another 2 instances). And when I look at the index via the admin interface, it is present on a single instance only. Shouldn't the data be present on both shards? Am I doing something wrong? With Regards Aman Tandon
Re: Solr returns incorrect results after sorting
Hi Jim, Yes, you are right: that document has price 499.99. But I want the first record in the group to be considered as part of the main sort. Even if I add price asc in the group.sort, the main sort still does not consider it. group.sort=max(if(exists(query({!v='storeName_string:212'})),2,0),if(exists(query({!v='storeName_string:203'})),1,0)) desc,inStock_boolean desc,geodist() asc,pricecommon_double asc&sort=pricecommon_double desc Is there any other workaround so that the sort is always based on the first record which is pulled up in each group? Regards, Raj -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-returns-incorrect-results-after-sorting-tp4193266p4193658.html Sent from the Solr - User mailing list archive at Nabble.com.
Add replica on shards
Hi, I have created 8 shards in a collection named ***wikingram**. At that time, I did not create any replicas. Now I want to add a replica on each shard. How can I do that? I tried this - ** sudo curl http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=wikingram&shard=shard1&node=localhost:8983_solr ** but it is not working. It throws this error -

<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">86</int>
</lst>
<str name="Operation ADDREPLICA caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not find collection : null</str>
<lst name="exception">
<str name="msg">Could not find collection : null</str>
<int name="rspCode">400</int>
</lst>
<lst name="error">
<str name="msg">Could not find collection : null</str>
<int name="code">400</int>
</lst>
</response>

Any help on this?
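[Editor's note: one possible cause of "Could not find collection : null" here, assuming the curl URL was not quoted: the shell treats each '&' as a background operator, so Solr only receives action=ADDREPLICA and no collection parameter. A sketch, with host and collection taken from the post above:]

```shell
# Quote the whole URL so '&' stays part of the query string instead of
# being interpreted as the shell's background operator.
URL='http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=wikingram&shard=shard1&node=localhost:8983_solr'
echo "$URL"
# curl "$URL"   # run this against a live cluster
```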