Lost in Solr's new core architecture
Hi, I am currently using Solr 4.2 with Tomcat. Right now I am stuck because I don't know how to upgrade to Solr 4.7: I am familiar with the core architecture of Solr 4.2, in which we defined every core's name as well as its instanceDir, but not with that of Solr 4.7. Any help will be appreciated, thanks. With Regards Aman Tandon
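For reference, Solr 4.4 and later replace the explicit per-core listing in solr.xml with "core discovery": on startup, Solr walks the directory tree under solr.home and treats every directory containing a core.properties file as a core. A minimal sketch of the new layout (directory and core names here are invented for illustration):

```
solr/                      <- solr.home
  solr.xml                 <- no per-core entries needed any more
  collection1/
    core.properties        <- marks this directory as a core's instanceDir
    conf/
    data/
```

core.properties may even be empty, in which case the core takes its name from the directory; otherwise keys such as name=collection1 override the defaults.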
Re: deleting large amount data from solr cloud
Vinay please share your experience after trying this solution. On Sat, Apr 12, 2014 at 4:12 AM, Vinay Pothnis wrote: > The query is something like this: > > > *curl -H 'Content-Type: text/xml' --data 'param1:(val1 OR > val2) AND -param2:(val3 OR val4) AND date_param:[138395520 TO > 138516480]' > 'http://host:port/solr/coll-name1/update?commit=true'* > > Trying to restrict the number of documents deleted via the date parameter. > > Had not tried the "distrib=false" option. I could give that a try. Thanks > for the link! I will check on the cache sizes and autowarm values. Will try > and disable the caches when I am deleting and give that a try. > > Thanks Erick and Shawn for your inputs! > > -Vinay > > > > On 11 April 2014 15:28, Shawn Heisey wrote: > > > On 4/10/2014 7:25 PM, Vinay Pothnis wrote: > > > >> When we tried to delete the data through a query - say 1 day/month's > worth > >> of data. But after deleting just 1 month's worth of data, the master > node > >> is going out of memory - heap space. > >> > >> Wondering is there any way to incrementally delete the data without > >> affecting the cluster adversely. > >> > > > > I'm curious about the actual query being used here. Can you share it, or > > a redacted version of it? Perhaps there might be a clue there? > > > > Is this a fully distributed delete request? One thing you might try, > > assuming Solr even supports it, is sending the same delete request > directly > > to each shard core with distrib=false. > > > > Here's a very incomplete list about how you can reduce Solr heap > > requirements: > > > > http://wiki.apache.org/solr/SolrPerformanceProblems# > > Reducing_heap_requirements > > > > Thanks, > > Shawn > > > > > -- With Regards Aman Tandon
Re: [ANN] Solr learning resources on safariflow.com (w/subscription or free trial)
Looks nice. Would love to see the author-side usage/statistics too. To know which chapters of my book were most useful/recommended. Regards, Alex On 11/04/2014 8:45 pm, "Michael Sokolov" wrote: > I just wanted to let people know about some recent Solr books and videos > that are now available at safariflow.com. You can sign up for a free > trial and get instant access, buy a subscription, or you may already be a > subscriber. I don't normally send out announcements like this, but because > we just got an influx of new material, I thought people might be interested. > > Solr in Action (March 2014) > http://www.safariflow.com/library/view/Solr-in-Action/9781617291029/ > > Einführung in Apache Solr (March 2014) > http://www.safariflow.com/library/view/Einf%25C3%25BChrung-in-Apache-Solr/ > 9783955614249/ > > Apache Solr High Performance (March 2014) > http://www.safariflow.com/library/view/Apache-Solr-High- > Performance/9781782164821/ > > Getting Started with Apache Solr Search Server (June 2013 video course): > http://www.safariflow.com/library/view/Getting-started- > with-Apache-Solr-Search-Server-%255BVideo%255D/9781782160847/ > > > > In addition these are some other Solr and Lucene titles we have had for a > little while: > > > > http://www.safariflow.com/library/view/Instant-Apache- > Solr-for-Indexing-Data-How-to/9781782164845/ > > http://www.safariflow.com/library/view/Apache-Solr-3- > Enterprise-Search-Server/9781849516068/ > > http://www.safariflow.com/library/view/Lucene-in-Action% > 252C-Second-Edition/9781933988177/ > > >
Re: deleting large amount data from solr cloud
The query is something like this: *curl -H 'Content-Type: text/xml' --data 'param1:(val1 OR val2) AND -param2:(val3 OR val4) AND date_param:[138395520 TO 138516480]' 'http://host:port/solr/coll-name1/update?commit=true'* Trying to restrict the number of documents deleted via the date parameter. Had not tried the "distrib=false" option. I could give that a try. Thanks for the link! I will check on the cache sizes and autowarm values. Will try and disable the caches when I am deleting and give that a try. Thanks Erick and Shawn for your inputs! -Vinay On 11 April 2014 15:28, Shawn Heisey wrote: > On 4/10/2014 7:25 PM, Vinay Pothnis wrote: > >> When we tried to delete the data through a query - say 1 day/month's worth >> of data. But after deleting just 1 month's worth of data, the master node >> is going out of memory - heap space. >> >> Wondering is there any way to incrementally delete the data without >> affecting the cluster adversely. >> > > I'm curious about the actual query being used here. Can you share it, or > a redacted version of it? Perhaps there might be a clue there? > > Is this a fully distributed delete request? One thing you might try, > assuming Solr even supports it, is sending the same delete request directly > to each shard core with distrib=false. > > Here's a very incomplete list about how you can reduce Solr heap > requirements: > > http://wiki.apache.org/solr/SolrPerformanceProblems# > Reducing_heap_requirements > > Thanks, > Shawn > >
Re: deleting large amount data from solr cloud
On 4/10/2014 7:25 PM, Vinay Pothnis wrote: When we tried to delete the data through a query - say 1 day/month's worth of data. But after deleting just 1 month's worth of data, the master node is going out of memory - heap space. Wondering is there any way to incrementally delete the data without affecting the cluster adversely. I'm curious about the actual query being used here. Can you share it, or a redacted version of it? Perhaps there might be a clue there? Is this a fully distributed delete request? One thing you might try, assuming Solr even supports it, is sending the same delete request directly to each shard core with distrib=false. Here's a very incomplete list about how you can reduce Solr heap requirements: http://wiki.apache.org/solr/SolrPerformanceProblems#Reducing_heap_requirements Thanks, Shawn
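Shawn's distrib=false idea would look something like the following, sent once per shard (the host and per-shard core name are assumptions — check each node's core admin page for the real names):

```
curl -H 'Content-Type: text/xml' \
  'http://shard1host:8983/solr/coll-name1_shard1_replica1/update?commit=true&distrib=false' \
  --data '<delete><query>date_param:[138395520 TO 138516480]</query></delete>'
```

With distrib=false the receiving core applies the delete locally instead of forwarding it across the cloud.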
Re: deleting large amount data from solr cloud
Tried to increase the memory to 24G but that wasn't enough either. Agreed that the index has now grown too much and that we should have monitored this and taken action much earlier. The search operations seem to run ok with 16G - mainly because the bulk of the data that we are trying to delete is not getting searched. So, now - basically in salvage mode. Does the number of documents deleted at a time have any impact? If I 'trickle delete' - say 50K documents at a time - would that make a difference? When I delete, does Solr try to bring the whole index into memory? Trying to understand what happens under the hood. Thanks Vinay On 11 April 2014 13:53, Erick Erickson wrote: > Using 16G for a 360G index is probably pushing things. A lot. I'm > actually a bit surprised that the problem only occurs when you delete > docs > > The simplest thing would be to increase the JVM memory. You should be > looking at your index to see how big it is, be sure to subtract out > the *.fdt and *.fdx files, those are used for verbatim copies of the > raw data and don't really count towards the memory requirements. > > I suspect you're just not giving enough memory to your JVM and this is > just the first OOM you've hit. Look on the Solr admin page and see how > much is being reported, if it's near the limit of your 16G that's the > "smoking gun"... > > Best, > Erick > > On Fri, Apr 11, 2014 at 7:45 AM, Vinay Pothnis wrote: > > Sorry - yes, I meant to say leader. > > Each JVM has 16G of memory. > > > > > > On 10 April 2014 20:54, Erick Erickson wrote: > > > >> First, there is no "master" node, just leaders and replicas. But that's > a > >> nit. > >> > >> No real clue why you would be going out of memory. Deleting a > >> document, even by query should just mark the docs as deleted, a pretty > >> low-cost operation. > >> > >> how much memory are you giving the JVM?
> >> > >> Best, > >> Erick > >> > >> On Thu, Apr 10, 2014 at 6:25 PM, Vinay Pothnis > wrote: > >> > [solr version 4.3.1] > >> > > >> > Hello, > >> > > >> > I have a solr cloud (4 nodes - 2 shards) with a fairly large amount > >> > documents (~360G of index per shard). Now, a major portion of the > data is > >> > not required and I need to delete those documents. I would need to > delete > >> > around 75% of the data. > >> > > >> > One of the solutions could be to drop the index completely re-index. > But > >> > this is not an option at the moment. > >> > > >> > When we tried to delete the data through a query - say 1 day/month's > >> worth > >> > of data. But after deleting just 1 month's worth of data, the master > node > >> > is going out of memory - heap space. > >> > > >> > Wondering is there any way to incrementally delete the data without > >> > affecting the cluster adversely. > >> > > >> > Thank! > >> > Vinay > >> >
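One way to "trickle delete" as described above is to slice the date range into small windows and issue one delete per window, committing as you go. A sketch (the epoch-second bounds, window size, and collection URL are all assumptions):

```shell
# Delete in small day-sized slices instead of one huge range, so each
# request touches a bounded number of documents.
start=1383955200   # hypothetical first day to delete (epoch seconds)
step=86400         # one day, in seconds
for i in 0 1 2; do
  lo=$((start + i * step))
  hi=$((lo + step))
  payload="<delete><query>date_param:[$lo TO $hi]</query></delete>"
  echo "$payload"
  # curl -H 'Content-Type: text/xml' --data "$payload" \
  #   'http://host:port/solr/coll-name1/update?commit=true'
  # (pause between batches so the cluster can keep up)
done
```

Committing between slices keeps the pending-delete state small instead of accumulating it all in one request.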
Re: Solr Admin core status - Index is not "Current"
Thanks, Shawn. On Fri, Apr 11, 2014 at 11:11 AM, Shawn Heisey wrote: > On 4/10/2014 2:50 PM, Chris W wrote: > >> Hi there >> >>I am using solrcloud (4.3). I am trying to get the status of a core >> from >> solr using (localhost:8000/solr/admin/cores?action=STATUS&core=) >> and >> i get the following output >> >> 100 >> 102 >> 2 >> 20527 >> 20 >> *false* >> >> >> What does current mean? A few of the cores are optimized (with segment >> count 1) and show current = "true" and rest show current as false. >> >> If i have to make the core as current, what should i do? Is it a big alarm >> if the value is false? >> > > This basically means that Lucene has detected an index state where > something has made changes to the index, but those changes are not yet > visible. To make them visible and return this status to 'true', do a > commit or soft commit with openSearcher enabled. > > http://lucene.apache.org/core/4_7_0/core/org/apache/lucene/ > index/DirectoryReader.html#isCurrent%28%29 > > Thanks, > Shawn > > -- Best -- C
Strange double-logging with log4j
This is lucene_solr_4_7_2_r1586229, downloaded from the release manager's staging area. I configured the following in my log4j.properties file: log4j.rootLogger=WARN, file log4j.category.org.apache.solr.core.SolrCore=INFO, file Now EVERYTHING that SolrCore logs (which is all at INFO) is being logged twice. Should I have done this differently, or is there a bug? I am using a container setup that is almost exactly like the example. The slf4j jars have been upgraded to 1.7.6 and jetty's jars have been upgraded to 8.1.14. Thanks, Shawn
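What Shawn describes is standard log4j additivity: a child logger forwards its events up to the root logger, so naming the same "file" appender on both log4j.rootLogger and the SolrCore category delivers each event to that appender twice. The usual fix is to turn additivity off for the child:

```
log4j.rootLogger=WARN, file
log4j.category.org.apache.solr.core.SolrCore=INFO, file
# Stop SolrCore events from also propagating to the root logger's
# copy of the "file" appender:
log4j.additivity.org.apache.solr.core.SolrCore=false
```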
Re: Solr doesn't load index at startup: out of memory
My assumption is that you've been adding documents and just have finally run out of space. Is that true? Best, Erick On Fri, Apr 11, 2014 at 9:31 AM, Rafał Kuć wrote: > Hello! > > Do you have warming queries defined? > > -- > Regards, > Rafał Kuć > Performance Monitoring * Log Analytics * Search Analytics > Solr & Elasticsearch Support * http://sematext.com/ > > >> Hi, >> my Solr (v. 4.5) after months of work suddenly stopped indexing: it responded >> to queries but didn't index any new data. Here is the error message: >> ERROR - 2014-04-11 15:52:30.317; org.apache.solr.common.SolrException; >> java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot >> commit >> at >> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2788) > >> So, I restarted Solr using more RAM (from 4GB up to 8GB) but now Solr can't >> load the cores. Here is the error message: >> ERROR - 2014-04-11 16:32:50.509; >> org.apache.solr.core.CoreContainer; Unable >> to create core: posts >> org.apache.solr.common.SolrException: Error Instantiating Update Handler, >> solr.DirectUpdateHandler2 failed to instantiate >> org.apache.solr.update.UpdateHandler >> ... >> Caused by: java.lang.OutOfMemoryError: Java heap space > >> Can anyone help me? > > > >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Solr-dosn-t-load-index-at-startup-out-of-memory-tp4130665.html >> Sent from the Solr - User mailing list archive at Nabble.com. >
Re: deleting large amount data from solr cloud
Using 16G for a 360G index is probably pushing things. A lot. I'm actually a bit surprised that the problem only occurs when you delete docs The simplest thing would be to increase the JVM memory. You should be looking at your index to see how big it is, be sure to subtract out the *.fdt and *.fdx files, those are used for verbatim copies of the raw data and don't really count towards the memory requirements. I suspect you're just not giving enough memory to your JVM and this is just the first OOM you've hit. Look on the Solr admin page and see how much is being reported, if it's near the limit of your 16G that's the "smoking gun"... Best, Erick On Fri, Apr 11, 2014 at 7:45 AM, Vinay Pothnis wrote: > Sorry - yes, I meant to say leader. > Each JVM has 16G of memory. > > > On 10 April 2014 20:54, Erick Erickson wrote: > >> First, there is no "master" node, just leaders and replicas. But that's a >> nit. >> >> No real clue why you would be going out of memory. Deleting a >> document, even by query should just mark the docs as deleted, a pretty >> low-cost operation. >> >> how much memory are you giving the JVM? >> >> Best, >> Erick >> >> On Thu, Apr 10, 2014 at 6:25 PM, Vinay Pothnis wrote: >> > [solr version 4.3.1] >> > >> > Hello, >> > >> > I have a solr cloud (4 nodes - 2 shards) with a fairly large amount >> > documents (~360G of index per shard). Now, a major portion of the data is >> > not required and I need to delete those documents. I would need to delete >> > around 75% of the data. >> > >> > One of the solutions could be to drop the index completely re-index. But >> > this is not an option at the moment. >> > >> > When we tried to delete the data through a query - say 1 day/month's >> worth >> > of data. But after deleting just 1 month's worth of data, the master node >> > is going out of memory - heap space. >> > >> > Wondering is there any way to incrementally delete the data without >> > affecting the cluster adversely. >> > >> > Thank! >> > Vinay >>
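Erick's "subtract out the *.fdt and *.fdx files" check can be done from the shell; a sketch (the index path is a placeholder):

```
# Sum index file sizes excluding stored-field files (*.fdt/*.fdx hold
# verbatim copies of documents and don't need to fit in heap).
find /path/to/solr/data/index -type f ! -name '*.fdt' ! -name '*.fdx' -print0 \
  | du -ch --files0-from=- | tail -n 1
```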
Re: High CPU usage after import
Are you storing the data? That is, the raw binary of the MP3? B/c when stored="true", Solr will try to compress the data; perhaps that's what's driving the CPU utilization? Easy test: set stored="false" for everything. FWIW, Erick On Fri, Apr 11, 2014 at 5:23 AM, Александр Вандышев wrote: > I realized what the problem was. One of the Solr threads freezes when > importing > MP3 files. When there are many such files Solr loads all processors. Is > there a > way to free the thread? > > Re: High CPU usage after import That could mean that the code is hung > somehow. > Or, maybe Solr is just > working on the commit. Unless you have an explicit commit, the automatic > commit will occur some time after the extract request. How much data are we > talking about? > > What does the Solr log say? Compare that to the case where CPU usage does > settle down. > > -- Jack Krupansky > > -Original Message- > From: Александр Вандышев > Sent: Thursday, April 3, 2014 3:24 AM > To: Solr User > Subject: High CPU usage after import > > Thanks for the answer. I meant that the CPU is not freed after the end of the > import. Tomcat or Solr continues to use it at the max level. > > On Tue, Apr 1, 2014 at 20:09:24, Jack Krupansky (j...@basetechnology.com) > wrote: > > Some document types can consume significant CPU resources, such as large PDF > files. > > -- Jack Krupansky > > -Original Message- > From: Александр Вандышев > Sent: Tuesday, April 1, 2014 9:28 AM > To: Solr User > Subject: High CPU usage after import > > I use the update/extract handler for indexing a large number of files. If > during indexing the CPU load was not at maximum, at the end of the import the load decreases. If the CPU load was at maximum, the load remains high. Who can help me?
Re: Solr Admin core status - Index is not "Current"
On 4/10/2014 2:50 PM, Chris W wrote: Hi there I am using solrcloud (4.3). I am trying to get the status of a core from solr using (localhost:8000/solr/admin/cores?action=STATUS&core=) and i get the following output 100 102 2 20527 20 *false* What does current mean? A few of the cores are optimized (with segment count 1) and show current = "true" and rest show current as false. If i have to make the core as current, what should i do? Is it a big alarm if the value is false? This basically means that Lucene has detected an index state where something has made changes to the index, but those changes are not yet visible. To make them visible and return this status to 'true', do a commit or soft commit with openSearcher enabled. http://lucene.apache.org/core/4_7_0/core/org/apache/lucene/index/DirectoryReader.html#isCurrent%28%29 Thanks, Shawn
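The commit Shawn describes can be issued over HTTP; roughly (host, port, and core name follow the question and are examples):

```
curl 'http://localhost:8000/solr/corename/update?commit=true&openSearcher=true'
```

openSearcher=true is the default for a hard commit; it is spelled out here because opening the new searcher is the part that flips "current" back to true.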
Re: Solr Admin core status - Index is not "Current"
Any help on this is much appreciated. I cannot find any documentation around this and would be good to understand what this means Thanks On Thu, Apr 10, 2014 at 1:50 PM, Chris W wrote: > Hi there > > I am using solrcloud (4.3). I am trying to get the status of a core from > solr using (localhost:8000/solr/admin/cores?action=STATUS&core=) and > i get the following output > > 100 > 102 > 2 > 20527 > 20 > *false* > > What does current mean? A few of the cores are optimized (with segment > count 1) and show current = "true" and rest show current as false. > > If i have to make the core as current, what should i do? Is it a big alarm > if the value is false? > > -- > Best > -- > C > -- Best -- C
RE: Relevance/Rank
Hi, thanks Aman/Erick. I moved part of the query under q=*:* and there is a difference in the score and the order. It seems to work for me now. I will use this and move forward. Thanks Ravi -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Friday, April 11, 2014 12:02 AM To: solr-user@lucene.apache.org Subject: Re: Relevance/Rank It's fine, Erick. I am guessing that maybe* &fq=(SKU:204-161)... *this SKU with that value is present in all results, which is why Name products are not getting boosted. Ravi: check your results without filtering - do all the results include *SKU:204-161*? I guess this may help. On Fri, Apr 11, 2014 at 9:22 AM, Erick Erickson wrote: > Aman: > > Oops, looked at the wrong part of the query, didn't see the bq clause. > You're right of course. Sorry for the misdirection. > > Erick > -- With Regards Aman Tandon
Re: Solr doesn't load index at startup: out of memory
Hello! Do you have warming queries defined? -- Regards, Rafał Kuć Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ > Hi, > my Solr (v. 4.5) after months of work suddenly stopped indexing: it responded > to queries but didn't index any new data. Here is the error message: > ERROR - 2014-04-11 15:52:30.317; org.apache.solr.common.SolrException; > java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot > commit > at > org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2788) > So, I restarted Solr using more RAM (from 4GB up to 8GB) but now Solr can't > load the cores. Here is the error message: > ERROR - 2014-04-11 16:32:50.509; > org.apache.solr.core.CoreContainer; Unable > to create core: posts > org.apache.solr.common.SolrException: Error Instantiating Update Handler, > solr.DirectUpdateHandler2 failed to instantiate > org.apache.solr.update.UpdateHandler > ... > Caused by: java.lang.OutOfMemoryError: Java heap space > Can anyone help me? > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-dosn-t-load-index-at-startup-out-of-memory-tp4130665.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search a list of words and returned order
Generally, the documents containing more of the terms should score higher and be returned first, but "relevancy" for some terms can skew that ordering, to some degree. What specific use cases are failing for you? You can always add an additional optional subquery which is the AND of all terms and has a significant boost: q=see spot run (+see +spot +run)^10 -- Jack Krupansky -Original Message- From: Croci Francesco Luigi (ID SWS) Sent: Friday, April 11, 2014 9:47 AM To: 'solr-user@lucene.apache.org' Subject: Search a list of words and returned order When I search for a list of words, per default Solr uses the OR operator. In my case I index (pdfs) files. How/what can I do so that when I search the index for a list of words, I get the list of documents ordered first by the ones that have all the words in them? Thank you Francesco
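Jack's boosted query, URL-encoded for sending over HTTP (host, port, and collection name are placeholders):

```
curl 'http://host:port/solr/collection/select' \
  --data-urlencode 'q=see spot run (+see +spot +run)^10'
```

The optional boosted subquery raises the score of documents matching all terms without excluding partial matches.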
Solr doesn't load index at startup: out of memory
Hi, my Solr (v. 4.5) after months of work suddenly stopped indexing: it responded to queries but didn't index any new data. Here is the error message: ERROR - 2014-04-11 15:52:30.317; org.apache.solr.common.SolrException; java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2788) So, I restarted Solr using more RAM (from 4GB up to 8GB) but now Solr can't load the cores. Here is the error message: ERROR - 2014-04-11 16:32:50.509; org.apache.solr.core.CoreContainer; Unable to create core: posts org.apache.solr.common.SolrException: Error Instantiating Update Handler, solr.DirectUpdateHandler2 failed to instantiate org.apache.solr.update.UpdateHandler ... Caused by: java.lang.OutOfMemoryError: Java heap space Can anyone help me? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-dosn-t-load-index-at-startup-out-of-memory-tp4130665.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Class not found ICUFoldingFilter (SOLR-4852)
On 4/11/2014 3:44 AM, ronak kirit wrote: > I am facing the same issue discussed at SOLR-4852. I am getting below error: > > Caused by: java.lang.NoClassDefFoundError: Could not initialize class > org.apache.lucene.analysis.icu.ICUFoldingFilter > at > org.apache.lucene.analysis.icu.ICUFoldingFilterFactory.create(ICUFoldingFilterFactory.java:50) > at > org.apache.solr.analysis.TokenizerChain.createComponents(TokenizerChain.java:67) > > > I am using solr-4.3.1. As discussed at SOLR-4852, I had all the jars at > (SOLR_HOME)/lib and there is no reference to lib via any of solrconfig.xml > or schema.xml. I filed SOLR-4852. Resource loading seems to be a black art with Solr! The only jars you need for the ICU analysis components are lucene-analyzers-icu-4.3.1.jar and icu4j-49.1.jar, possibly with different version numbers in the names. Are you defining solr.solr.home explicitly? I'm just wondering if maybe ${solr.solr.home}/lib isn't where you think it is, or whether maybe there's another copy of the jars somewhere on the classpath. The log should show which jars are loaded ... do you see either of the above jars loaded more than once? If you do, that seems to be the trigger for the problem. All but one copy needs to be removed. If the jars exist in the extracted WAR (the WEB-INF location you mentioned), everything seems to work, but the problem with this is that when you replace the .war file, your changes to the extracted war will either be outdated or possibly will get removed. It is good practice to entirely remove the extracted .war contents when upgrading Solr. Thanks, Shawn
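One quick way to test Shawn's duplicate-jar theory is to search the whole install for copies of the two jars (the install path is a placeholder):

```
# More than one hit per jar -- e.g. one under ${solr.solr.home}/lib and
# one inside the extracted WAR's WEB-INF/lib -- matches the failure
# pattern described above; remove all but one copy.
find /path/to/solr -name 'lucene-analyzers-icu-*.jar' -o -name 'icu4j-*.jar'
```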
Re: deleting large amount data from solr cloud
Sorry - yes, I meant to say leader. Each JVM has 16G of memory. On 10 April 2014 20:54, Erick Erickson wrote: > First, there is no "master" node, just leaders and replicas. But that's a > nit. > > No real clue why you would be going out of memory. Deleting a > document, even by query should just mark the docs as deleted, a pretty > low-cost operation. > > how much memory are you giving the JVM? > > Best, > Erick > > On Thu, Apr 10, 2014 at 6:25 PM, Vinay Pothnis wrote: > > [solr version 4.3.1] > > > > Hello, > > > > I have a solr cloud (4 nodes - 2 shards) with a fairly large amount > > documents (~360G of index per shard). Now, a major portion of the data is > > not required and I need to delete those documents. I would need to delete > > around 75% of the data. > > > > One of the solutions could be to drop the index completely re-index. But > > this is not an option at the moment. > > > > When we tried to delete the data through a query - say 1 day/month's > worth > > of data. But after deleting just 1 month's worth of data, the master node > > is going out of memory - heap space. > > > > Wondering is there any way to incrementally delete the data without > > affecting the cluster adversely. > > > > Thank! > > Vinay >
RE: Were changes made to facetting on multivalued fields recently?
Thanks to both of you. I finally found the issue and you were right (again) ;) The problem was not coming from the full indexing code containing the SQL replace statement but from another process whose job is to maintain our index up to date. This process had no idea that commas were to be replaced by spaces for some fields (and it should not know about this either). I changed the Tokenizer used for the field to the following and everything is fine now. Thanks for your help > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: April-10-14 1:54 PM > To: solr-user@lucene.apache.org > Subject: Re: Were changes made to facetting on multivalued fields recently? > > bq: The SQL query contains a Replace statement that does this > > Well, I suspect that's where the issue is. The facet values being reported > include: > 134826 > which indicates that the incoming text to Solr still has the commas. > Solr is seeing the commas and all. > > You can cure this by using PatternReplaceCharFilterFactory and doing the > substitution at index time if you want to. > > That doesn't clarify why the behavior has changed though, but my > supposition is that it has nothing to do with Solr, and something about your > SQL statement is different. > > Best, > Erick > > On Thu, Apr 10, 2014 at 9:33 AM, Jean-Sebastien Vachon sebastien.vac...@wantedanalytics.com> wrote: > > The SQL query contains a Replace statement that does this > > > >> -Original Message- > >> From: Shawn Heisey [mailto:s...@elyograg.org] > >> Sent: April-10-14 11:30 AM > >> To: solr-user@lucene.apache.org > >> Subject: Re: Were changes made to facetting on multivalued fields > recently? > >> > >> On 4/10/2014 9:14 AM, Jean-Sebastien Vachon wrote: > >> > Here are the field definitions for both our old and new index... as > >> > you can > >> see that are identical. We've been using this chain and field type > >> starting with Solr 1.4 and never had any problem.
> >> As for the documents, both indexes are using the same data source. They could be > >> slightly out of sync from time to time but we tend to index them on a > >> daily basis. Both indexes are also using the same code (indexing through SolrJ) to index their content. > >> > > >> > The source is a column in MySQL that contains entries such as "4,1" > >> > that get stored in a multivalued field after replacing commas by > >> > spaces > >> > > >> > OLD (4.6.1): > >> > >> positionIncrementGap="100"> > >> > >> > stored="true" required="false" multiValued="true" /> > >> > >> Just so you know, there's nothing here that would require the field > >> to be multivalued. WhitespaceTokenizerFactory does not create > >> multiple field values, it creates multiple terms. If you are > >> actually inserting multiple values for the field in SolrJ, then you would need a multivalued field. > >> > >> What is replacing the commas with spaces? I don't see anything here > >> that would do that. It sounds like that part of your indexing is not working. > >> > >> Thanks, > >> Shawn > >> > >> - > >> No virus found in this message. Checked by AVG - www.avg.fr > >> Version: 2014.0.4355 / Virus database: 3882/7323 - Release date: 09/04/2014
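Erick's PatternReplaceCharFilterFactory suggestion, applied at index time, would look roughly like this (the field type name is invented and this may not match the configuration the poster ended up with):

```xml
<!-- Sketch only: replace commas with spaces before tokenizing, so
     "4,1" is indexed as the two terms "4" and "1". -->
<fieldType name="text_comma_split" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="," replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```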
Search a list of words and returned order
When I search for a list of words, by default Solr uses the OR operator. In my case I index (PDF) files. How/what can I do so that when I search the index for a list of words, I get the list of documents ordered first by the ones that have all the words in them? Thank you Francesco
[ANN] Solr learning resources on safariflow.com (w/subscription or free trial)
I just wanted to let people know about some recent Solr books and videos that are now available at safariflow.com. You can sign up for a free trial and get instant access, buy a subscription, or you may already be a subscriber. I don't normally send out announcements like this, but because we just got an influx of new material, I thought people might be interested. Solr in Action (March 2014) http://www.safariflow.com/library/view/Solr-in-Action/9781617291029/ Einführung in Apache Solr (March 2014) http://www.safariflow.com/library/view/Einf%25C3%25BChrung-in-Apache-Solr/9783955614249/ Apache Solr High Performance (March 2014) http://www.safariflow.com/library/view/Apache-Solr-High-Performance/9781782164821/ Getting Started with Apache Solr Search Server (June 2013 video course): http://www.safariflow.com/library/view/Getting-started-with-Apache-Solr-Search-Server-%255BVideo%255D/9781782160847/ In addition these are some other Solr and Lucene titles we have had for a little while: http://www.safariflow.com/library/view/Instant-Apache-Solr-for-Indexing-Data-How-to/9781782164845/ http://www.safariflow.com/library/view/Apache-Solr-3-Enterprise-Search-Server/9781849516068/ http://www.safariflow.com/library/view/Lucene-in-Action%252C-Second-Edition/9781933988177/
Re: High CPU usage after import
I realized what the problem was. One of the Solr threads freezes when importing MP3 files. When there are many such files Solr loads all processors. Is there a way to free the thread? Re: High CPU usage after import That could mean that the code is hung somehow. Or, maybe Solr is just working on the commit. Unless you have an explicit commit, the automatic commit will occur some time after the extract request. How much data are we talking about? What does the Solr log say? Compare that to the case where CPU usage does settle down. -- Jack Krupansky -Original Message- From: Александр Вандышев Sent: Thursday, April 3, 2014 3:24 AM To: Solr User Subject: High CPU usage after import Thanks for the answer. I meant that the CPU is not freed after the end of the import. Tomcat or Solr continues to use it at the max level. On Tue, Apr 1, 2014 at 20:09:24, Jack Krupansky (j...@basetechnology.com) wrote: Some document types can consume significant CPU resources, such as large PDF files. -- Jack Krupansky -Original Message- From: Александр Вандышев Sent: Tuesday, April 1, 2014 9:28 AM To: Solr User Subject: High CPU usage after import I use the update/extract handler for indexing a large number of files. If during indexing the CPU load was not at maximum, at the end of the import the load decreases. If the CPU load was at maximum, the load remains high. Who can help me?
Re: DataImportHandler - Automatic scheduling of delta imports in Solr in windows 7
Yes that is all fine with me. Only thing that worries me is what needs to be coded in the batch file. I will just try a sample batch file and get back with queries if any. Thank you -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-Automatic-scheduling-of-delta-imports-in-Solr-in-windows-7-tp4130565p4130635.html Sent from the Solr - User mailing list archive at Nabble.com.
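For the batch file itself, something along these lines is typical (the core name, port, and DIH handler path are assumptions); point a Windows Task Scheduler job at it to run the delta import on a schedule:

```
@echo off
rem Trigger a DataImportHandler delta import. Requires curl on the PATH
rem (or swap in PowerShell's Invoke-WebRequest).
curl "http://localhost:8983/solr/mycore/dataimport?command=delta-import&clean=false&commit=true"
```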
Re: Fails to index if unique field has special characters
Well, this is somewhat of a problem if you have URLs as uniqueKey that contain exclamation marks. Isn't it an idea to allow those to be escaped and thus ignored by CompositeIdRouter? On Friday, April 11, 2014 11:43:31 AM Cool Techi wrote: > Thanks, that was helpful. > Regards, Rohit > > > Date: Thu, 10 Apr 2014 08:44:36 -0700 > > From: iori...@yahoo.com > > Subject: Re: Fails to index if unique field has special characters > > To: solr-user@lucene.apache.org > > > > Hi Ayush, > > > > I think this > > > > ""IBM!12345". The exclamation mark ('!') is critical here, as it > > distinguishes the prefix used to determine which shard to direct the > > document to." > > > > https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud > > > > > > On Thursday, April 10, 2014 2:35 PM, Cool Techi > > wrote: Hi, > > We are migrating from Solr 4.6 standalone to Solr 4.7 cloud version; while > > reindexing the documents we are getting the following error. This happens > > when the unique key has special characters; this was not noticed > > in version 4.6 standalone mode, so we are not sure if this is a version > > problem or a cloud issue.
Example of the unique key is given below, > > http://www.mynews.in/Blog/smrity!!**)))!miami_dolphins_vs_dallas_cowboys_ > > live_stream_on_line_nfl_football_free_video_broadcast_B142707.html > > Exception Stack Trace > > ERROR - 2014-04-10 10:51:44.361; org.apache.solr.common.SolrException; > > java.lang.ArrayIndexOutOfBoundsException: 2 at > > org.apache.solr.common.cloud.CompositeIdRouter$KeyParser.getHash(Composit > > eIdRouter.java:296) at > > org.apache.solr.common.cloud.CompositeIdRouter.sliceHash(CompositeIdRoute > > r.java:58) at > > org.apache.solr.common.cloud.HashBasedRouter.getTargetSlice(HashBasedRout > > er.java:33) at > > org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest( > > DistributedUpdateProcessor.java:218) at > > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(Di > > stributedUpdateProcessor.java:550) at > > org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateP > > rocessorFactory.java:100) at > > org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247 > > ) at > > org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174) > > at> > > > >org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.j > >ava:92) at > >org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(Content > >StreamHandlerBase.java:74) at > >org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBas > >e.java:135) at > >org.apache.solr.core.SolrCore.execute(SolrCore.java:1916) at > >org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java > >:780) at > >org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.jav > >a:427) at > >org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.jav > >a:217) at > >org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandl > >er.java:1419) at > >org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) > > at > 
>org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:1 > >37) at > >org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557 > >) at org.eclipse.jetty.server.session.SessionHandle> > > Thanks,Ayush > >
RE: Fails to index if unique field has special characters
Thanks, that was helpful. Regards,Rohit > Date: Thu, 10 Apr 2014 08:44:36 -0700 > From: iori...@yahoo.com > Subject: Re: Fails to index if unique field has special characters > To: solr-user@lucene.apache.org > > Hi Ayush, > > I thinks this > > ""IBM!12345". The exclamation mark ('!') is critical here, as it > distinguishes the prefix used to determine which shard to direct the document > to." > > https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud > > > > > On Thursday, April 10, 2014 2:35 PM, Cool Techi > wrote: > Hi, > We are migrating from Solr 4.6 standalone to Solr 4.7 cloud version, while > reindexing the document we are getting the following error. This is happening > when the unique key has special character, this was not noticed in version > 4.6 standalone mode, so we are not sure if this is a version problem or a > cloud issue. Example of the unique key is given below, > http://www.mynews.in/Blog/smrity!!**)))!miami_dolphins_vs_dallas_cowboys_live_stream_on_line_nfl_football_free_video_broadcast_B142707.html > Exception Stack Trace > ERROR - 2014-04-10 10:51:44.361; org.apache.solr.common.SolrException; > java.lang.ArrayIndexOutOfBoundsException: 2 at > org.apache.solr.common.cloud.CompositeIdRouter$KeyParser.getHash(CompositeIdRouter.java:296) >at > org.apache.solr.common.cloud.CompositeIdRouter.sliceHash(CompositeIdRouter.java:58) >at > org.apache.solr.common.cloud.HashBasedRouter.getTargetSlice(HashBasedRouter.java:33) >at > org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:218) >at > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:550) >at > org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) >at > org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247) >at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174) > at > > 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) >at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) >at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916) at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780) >at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427) >at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217) >at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) >at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) >at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) > at org.eclipse.jetty.server.session.SessionHandle > > Thanks,Ayush
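Until such escaping exists, one workaround (purely a sketch, not an official recommendation) is to keep '!' out of the uniqueKey entirely: store the raw URL in its own stored field and use a hash of it as the id, so CompositeIdRouter never sees the exclamation marks:

```shell
# Sketch: derive a '!'-free uniqueKey by hashing the URL.
# The URL itself would be kept in a separate stored field for display.
# The URL below is a made-up example, not one from this thread.
url='http://www.example.com/Blog/smrity_example_page.html'
id=$(printf '%s' "$url" | md5sum | cut -d' ' -f1)
echo "$id"   # 32 hex characters, safe for composite-id routing
```

The trade-off is that you can no longer look documents up directly by URL without the extra stored field.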
SOLR problem with full-import and shards
Hi, I built an Apache SOLR cloud (version 4.7.0) with 3 shards. I chose the implicit routing mechanism when creating the new collection (one shard per month; a field with date format MM is used as the shardId). I configured the DataImportHandler with a database as the data source. Finally I ran a full-import (data from 3 months is present in the database) on the shard leader of the first month's shard. Although I received a success message on the web page, only the data on the first shard was indexed (data from the first month, which is ok); the data from the other two months, which should have gone to the other two shards, was not indexed anywhere. I checked the logs and spotted hundreds of errors: WARN - 2014-04-11 10:55:33.921; org.apache.solr.update.processor.DistributedUpdateProcessor; Error sending update org.apache.solr.common.SolrException: Bad Request request: http:// :/solr/trans_implicit/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F://%3A%2Fsolr%2Ftrans_implicit%2F&wt=javabin&version=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:240) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Can anyone help? I would very much appreciate any suggestions. Regards, Wojtek Jaworski
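As a side note, with the implicit router an update request can also name its target shard explicitly via the _route_ parameter, which is handy for checking whether a single shard accepts the data. A sketch — the host, collection, and shard names are assumptions (only the collection name appears in the log above), and the command is printed rather than executed:

```shell
# Sketch: address one named shard of an implicitly routed collection.
# Host/port and the shard name are hypothetical.
HOST="localhost:8983"
COLL="trans_implicit"
SHARD="201401"   # assumed shardId for the first month
URL="http://${HOST}/solr/${COLL}/update?_route_=${SHARD}&commit=true"
echo "curl -H 'Content-Type: application/xml' --data-binary @docs.xml '$URL'"
```

If each shard accepts documents this way but the full-import still misroutes, that points at the router.field values on the documents rather than at the shards themselves.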
highlighting displays too much
i am using solr 4.3.1 and want to highlight complete sentences if possible, or at least not cut up words. If it finds something, the whole field is displayed instead of only 180 chars. the field is: solrconfig setting for highlighting:
  hl: true
  hl.fl: plain_text title description
  hl.snippets: 5
  hl.fragsize: 180
  hl.fragmenter: regex
  hl.regex.slop: 0.2
  hl.regex.pattern: \w[^\.!\?]{20,160}
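When debugging settings like these, it can help to express them as per-request parameters instead of handler defaults, then toggle them one at a time. A sketch — the host, core, and query are assumptions:

```shell
# Sketch: the highlighting defaults expressed as per-request overrides, so
# each parameter can be varied independently. Host/core/query are hypothetical.
Q='q=plain_text:cable'
HL='hl=true&hl.fl=plain_text&hl.snippets=5&hl.fragsize=180&hl.fragmenter=regex'
URL="http://localhost:8983/solr/collection1/select?${Q}&${HL}"
echo "curl -s '$URL'"
```

Dropping the regex fragmenter from such a request, for example, shows quickly whether it is the fragmenter or the fragsize that is being ignored.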
Class not found ICUFoldingFilter (SOLR-4852)
Hello, I am facing the same issue discussed in SOLR-4852. I am getting the below error: Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.analysis.icu.ICUFoldingFilter at org.apache.lucene.analysis.icu.ICUFoldingFilterFactory.create(ICUFoldingFilterFactory.java:50) at org.apache.solr.analysis.TokenizerChain.createComponents(TokenizerChain.java:67) I am using solr-4.3.1. As discussed in SOLR-4852, I had all the jars in (SOLR_HOME)/lib and there is no reference to lib via either solrconfig.xml or schema.xml. I have also tried setting "sharedLib=foo", but that didn't work either. However, if I remove all of the files below: icu4j-49.1.jar lucene-analyzers-morfologik-4.3.1.jar lucene-analyzers-stempel-4.3.1.jar solr-analysis-extras-4.3.1.jar lucene-analyzers-icu-4.3.1.jar lucene-analyzers-smartcn-4.3.1.jar lucene-analyzers-uima-4.3.1.jar from $(solrhome)/lib and move them to solr-webapp/webapp/WEB-INF/lib, things work fine. Any guess? Any help? Thanks, Ronak
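One more thing that may be worth trying (an untested sketch; the directory path is an assumption): load the jars explicitly from solrconfig.xml with <lib> directives instead of relying on (SOLR_HOME)/lib classloading, which sometimes sidesteps such classloader issues:

```xml
<!-- Hypothetical path: point dir at wherever the analysis-extras jars live -->
<lib dir="/path/to/solr/lib" regex="lucene-analyzers-icu-.*\.jar" />
<lib dir="/path/to/solr/lib" regex="icu4j-.*\.jar" />
<lib dir="/path/to/solr/lib" regex="solr-analysis-extras-.*\.jar" />
```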
Re: Solr relevancy tuning
Hello Doug, I have just watched the Quepid demonstration video, and I strongly agree with your introduction: it is very hard to involve marketing/business people in repeated testing sessions, and spreadsheets or other kinds of files are not the right tool to use. Currently I'm quite alone in my tuning task, and having a visual approach could be beneficial for me; you are giving me many good inputs! I see that kelvin (my scripted tool) and Quepid follow the same path. In Quepid someone quickly watches the results and applies colours to them; in kelvin you enter one or more queries (network cable, ethernet cable) and state that the result must contain ethernet in the title, or must come from a list of product categories. I also do diffs of results, before and after changes, to check what is going on, but I have to do that in a very unix-scripted way. Have you considered placing a counter of total red/bad results in Quepid? I use this index to get a quick overview of the impact of changes across all queries. Actually I repeat tests in production from time to time, and if I see the "kelvin temperature" rising (the number of errors going up) I know I have to check what's going on, because new products may be having a bad impact on the index. I also keep counters of products with low-quality images/no images at all or too-short listings; sometimes these are useful to understand better what will happen if you change some bq/fq in the application. I see also that after changes in Quepid someone has to check "gray" results and assign them a colour; in kelvin's case, sometimes the conditions can do a bit of magic (new product names still contain SM-G900F) but sometimes can introduce false errors (the new product name contains only Galaxy 5 and not the product code SM-G900F). So some checks are needed, but with Quepid everybody can do the check, whereas with kelvin you have to change some lines of a script, and not everybody is able/willing to do that. 
The idea of a static index is a good suggestion, I will try to have it in the next round of search engine improvement. Thank you Doug! 2014-04-09 17:48 GMT+02:00 Doug Turnbull < dturnb...@opensourceconnections.com>: > Hey Giovanni, nice to meet you. > > I'm the person that did the Test Driven Relevancy talk. We've got a product > Quepid (http://quepid.com) that lets you gather good/bad results for > queries and do a sort of test driven development against search relevancy. > Sounds similar to your existing scripted approach. Have you considered > keeping a static catalog for testing purposes? We had a project with a lot > of updates and date-dependent relevancy. This lets you create some test > scenarios against a static data set. However, one downside is you can't > recreate problems in production in your test setup exactly-- you have to > find a similar issue that reflects what you're seeing. > > Cheers, > -Doug > > > On Wed, Apr 9, 2014 at 10:42 AM, Giovanni Bricconi < > giovanni.bricc...@banzai.it> wrote: > > > Thank you for the links. > > > > The book is really useful, I will definitively have to spend some time > > reformatting the logs to to access number of result founds, session id > and > > much more. > > > > I'm also quite happy that my test cases produces similar results to the > > precision reports shown at the beginning of the book. 
> > > > Giovanni > > > > > > 2014-04-09 12:59 GMT+02:00 Ahmet Arslan : > > > > > Hi Giovanni, > > > > > > Here are some relevant pointers : > > > > > > > > > > > > http://www.lucenerevolution.org/2013/Test-Driven-Relevancy-How-to-Work-with-Content-Experts-to-Optimize-and-Maintain-Search-Relevancy > > > > > > > > > http://rosenfeldmedia.com/books/search-analytics/ > > > > > > http://www.sematext.com/search-analytics/index.html > > > > > > > > > Ahmet > > > > > > > > > On Wednesday, April 9, 2014 12:17 PM, Giovanni Bricconi < > > > giovanni.bricc...@banzai.it> wrote: > > > It is about one year I'm working on an e-commerce site, and > > unfortunately I > > > have no "information retrieval" background, so probably I am missing > some > > > important practices about relevance tuning and search engines. > > > During this period I had to fix many "bugs" about bad search results, > > which > > > I have solved sometimes tuning edismax weights, sometimes creating ad > hoc > > > query filters or query boosting; but I am still not able to figure out > > what > > > should be the correct process to improve search results relevance. > > > > > > These are the practices I am following, I would really appreciate any > > > comments about them and any hints about what practices you follow in > your > > > projects: > > > > > > - In order to have a measure of search quality I have written many test > > > cases such as "if the user searches for <> the search > > > result should display at least four <> products with the words > > > <> and <> in the title". I have written a tool that > > read > > > such tests from json files and applies them to my appli
Re: Shared Stored Field
Erick Erickson wrote > So you're saying that you have B_1 - B_8 in one doc, B_9 - B_16 in > another doc etc? Well yes, that could work, but this would mean we get a lot of unique dynamic fields, basically equal to the number of documents in our system, and I am not sure that is a good practice. -- View this message in context: http://lucene.472066.n3.nabble.com/Shared-Stored-Field-tp4130351p4130589.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1
Thanks Shawn. Perhaps the comment on the luceneMatchVersion in the example schema.xml could be changed to reflect/clarify this? That comment made me think that the parameter affects the index side of things too (aka the index format version). That is, I would appreciate seeing there the things you just mentioned regarding emulated behaviour, so we could draw a line between the index format (low-level, not controllable by a user) and the analysis chain etc. (solr config level, user controllable). I have tried specifying the postingsFormat on a per-field-type basis. For postingsFormat="Lucene40" I get: org.apache.solr.client.solrj.SolrServerException: java.lang.UnsupportedOperationException: this codec can only be used for reading at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:155) ... 12 more Caused by: java.lang.UnsupportedOperationException: this codec can only be used for reading at org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:246) at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:131) at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:336) at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85) at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116) at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53) at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81) at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:465) at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:506) at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:616) at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2864) at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3022) at 
org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2989) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:578) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95) at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1457) at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1434) at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157) at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916) at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150) ... 12 more But that is just a side note. I have added a comment to the cwiki regarding the possible values for postingsFormat parameter (currently values marked as n/a): https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties Dmitry On Fri, Apr 11, 2014 at 10:42 AM, Shawn Heisey wrote: > On 4/11/2014 12:42 AM, Dmitry Kan wrote: > > Thanks! So solr 4.7 does not seem to respect the luceneMatchVersion on > the > > binary (index) level. Or perhaps, I misunderstand the meaning of the > > luceneMatchVersion. > > luceneMatchVersion does not dictate the index format. It is a way to > signal things like analysis components that they should emulate behavior > (sometimes buggy) found in an earlier version. Not all analysis > components will operate differently when this config is used. 
There is > probably not a central repository of how the version affects Solr/Lucene > behavior. > > > I wonder whether there is any possibility of defining the version of the > > codec in solr config/schema. > > I don't think Solr exposes any way to define an entire codec. You can > change things individually, like the postings format and docValues > format on a field, but there's no way (that I know of) to define an > entire codec. The overall index format is not something you can specify. > > I think it could be possible to come up with some XML syntax for > describing a complete codec and then write code to parse it and build > the codec ... but because my understanding of how all the Lucene pieces > fit together is relatively low, there may be some really good reason > that Solr doesn't offer this functionality. > > Thanks, > Shawn > > -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan
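For reference, the per-field-type syntax being discussed looks roughly like this (a sketch; the type name is made up, and in Solr 4.x it needs the schema-aware codec factory enabled in solrconfig.xml):

```xml
<!-- solrconfig.xml: let the schema pick postings/docValues formats per field -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml: hypothetical type. "Memory" is a write-capable postings
     format in 4.x, unlike "Lucene40", which the stack trace above shows is
     read-only. -->
<fieldType name="string_memory" class="solr.StrField" postingsFormat="Memory"/>
```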
Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1
On 4/11/2014 12:42 AM, Dmitry Kan wrote: > Thanks! So solr 4.7 does not seem to respect the luceneMatchVersion on the > binary (index) level. Or perhaps, I misunderstand the meaning of the > luceneMatchVersion. luceneMatchVersion does not dictate the index format. It is a way to signal things like analysis components that they should emulate behavior (sometimes buggy) found in an earlier version. Not all analysis components will operate differently when this config is used. There is probably not a central repository of how the version affects Solr/Lucene behavior. > I wonder whether there is any possibility of defining the version of the > codec in solr config/schema. I don't think Solr exposes any way to define an entire codec. You can change things individually, like the postings format and docValues format on a field, but there's no way (that I know of) to define an entire codec. The overall index format is not something you can specify. I think it could be possible to come up with some XML syntax for describing a complete codec and then write code to parse it and build the codec ... but because my understanding of how all the Lucene pieces fit together is relatively low, there may be some really good reason that Solr doesn't offer this functionality. Thanks, Shawn
Re: Pushing content to Solr from Nutch
Hi Xavier; I think that it is better to ask this question on the Nutch user list. Thanks; Furkan KAMACI 2014-04-11 7:52 GMT+03:00 Jack Krupansky : > Does your Solr schema match the data output by nutch? It's up to you to > create a Solr schema that matches the output of nutch - read up on the > nutch doc for that info. Solr doesn't define that info, nutch does. > > -- Jack Krupansky > > From: Xavier Morera > Sent: Thursday, April 10, 2014 12:58 PM > To: solr-user@lucene.apache.org > Subject: Pushing content to Solr from Nutch > > Hi, > > I have followed several Nutch tutorials - including the main one > http://wiki.apache.org/nutch/NutchTutorial - to crawl sites (which works, > I can see in the console as the pages get crawled and the directories built > with the data) but for the life of me I can't get anything posted to Solr. > The Solr console doesn't even squint, therefore Nutch is not sending > anything. > > This is the command that I send over that crawls and in theory should also > post to Solr: > bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr 2 > > But I found that I could also use this one when the data is already crawled: > bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb > crawl/segments/* > > But no luck. > > This is the only thing that caught my attention; I read that adding the > property below would make it work, but it doesn't: > No IndexWriters activated - check your configuration > > This is the property: > <property> > <name>plugin.includes</name> > <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value> > </property> > > Any idea? Apache Nutch 1.8 running Java 1.6 via Cygwin on Windows. > > -- > > Xavier Morera > email: xav...@familiamorera.com > > CR: +(506) 8849 8866 > US: +1 (305) 600 4919 > skype: xmorera >
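A hedged guess on the "No IndexWriters activated" warning: since Nutch 1.7 the index back-ends are plugins themselves, so the plugin.includes value quoted above may simply be missing the Solr indexer plugin. A sketch of the property with indexer-solr added (the rest of the value is kept from the message above):

```xml
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|indexer-solr|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```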