RE: yet another optimize question
Petersen, Robert [robert.peter...@mail.rakuten.com] wrote:

We actually have hundreds of facet-able fields, but most are specialized and are only faceted upon if the user has drilled into the particular category to which they are applicable, so they are only indexed for products in those categories. I guess it is the facets that eat up so much of our memory.

As Andre mentions, the problem is that the fc facet method maintains a list of values (or pointers to values, if we're talking text) for each document in the whole index. Faceting on a field that has only a single value in a single document in the whole index still allocates memory linear in the total number of documents. You are in the same situation as John Nielsen in the thread "Solr using a ridiculous amount of memory": http://lucene.472066.n3.nabble.com/Solr-using-a-ridiculous-amount-of-memory-tt4050840.html#none

You could try to change the way you index the facet information to get around this waste, but it is quite a lot of work: http://sbdevel.wordpress.com/2013/04/16/you-are-faceting-itwrong/

It was suggested that if I use facet method = enum for those particular specialized facets then my memory usage would go down.

If the number of unique values in the individual facets is low, this could work. If nothing else, it is very easy to try.

- Toke Eskildsen
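For anyone wanting to try this quickly: Solr supports a per-field override, so a single specialized facet can be switched to enum while everything else stays on fc. A minimal sketch, with color_facet standing in as a hypothetical field name:

/select?q=*:*&facet=true&facet.field=color_facet&f.color_facet.facet.method=enum

The enum method walks the unique terms and uses the filterCache, so its memory footprint scales with the number of unique values rather than with the total number of documents.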
Re: UnInverted multi-valued field
Hello,

well ... we have 5 multi-valued facet fields, so you had to wait sometimes up to one minute. The old searcher blocks during this time.

@Toke Eskildsen: the example I posted was a very small update; usually there are more terms. We are using Solr 3.6. I don't know if it will be faster with 4.x.

These are the configurations of our caches:

<filterCache class="solr.FastLRUCache" size="30" initialSize="30" autowarmCount="5"/>
<queryResultCache class="solr.LRUCache" size="10" initialSize="10" autowarmCount="5"/>
<documentCache class="solr.LRUCache" size="5" initialSize="5" autowarmCount="1"/>

We have 5 million documents in our index.

@Roman: Do you think our autowarmCount should be larger?

Greetings, Jochen

Roman Chyla schrieb: On Wed, Jun 19, 2013 at 5:30 AM, Jochen Lienhard lienh...@ub.uni-freiburg.de wrote:

Hi @all. We have the problem that after an update the index takes too much time to warm up. We have some multi-valued facet fields, and during startup Solr logs messages like:

INFO: UnInverted multi-valued field {field=mt_facet,memSize=18753256,tindexSize=54,time=170,phase1=156,nTerms=17,bigTerms=3,termInstances=903276,uses=0}

In solrconfig we use the facet.method 'fc'. We know that start-up with the method 'enum' is faster, but then the searches are very slow. How do you handle this problem? Or have you any idea for optimizing the warm-up? Or what do you do after an update?

You probably know, but just in case... you may use autowarming; the searcher will populate the cache, and only after the warmup queries have finished will it be exposed to the world. The old searcher continues to handle requests in the meantime.

roman

-- Dr. rer. nat. Jochen Lienhard Dezernat EDV Albert-Ludwigs-Universität Freiburg Universitätsbibliothek Rempartstr. 10-16 | Postfach 1629 79098 Freiburg | 79016 Freiburg Telefon: +49 761 203-3908 E-Mail: lienh...@ub.uni-freiburg.de Internet: www.ub.uni-freiburg.de
DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor
Hi,

I searched for a solution for quite some time but did not manage to find any real hints on how to fix this. I'm using Solr 4.3.0 (1477023 - simonw - 2013-04-29 15:10:12) running in a Tomcat 6 container. My data import setup is basically the following data-config.xml:

<entity name="article" dataSource="ds1"
        query="SELECT * FROM article"
        deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date &gt; '${dih.last_index_time}'"
        deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}"
        pk="myownid">
  <field column="myownid" name="id"/>
  <entity name="supplier" dataSource="ds2"
          query="SELECT * FROM supplier WHERE status=1"
          processor="CachedSqlEntityProcessor"
          cacheKey="SUPPLIER_ID"
          cacheLookup="article.ARTICLE_SUPPLIER_ID"/>
  <entity name="attributes" dataSource="ds1"
          query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes"
          cacheKey="ARTICLE_ID"
          cacheLookup="article.myownid"
          processor="CachedSqlEntityProcessor"/>
</entity>

OK, now for the problem: at first I tried everything without the cache, but the full-import took a very long time, because the attributes query is pretty slow compared to the rest. As a result I got a processing speed of around 150 documents/s. When switching everything to the CachedSqlEntityProcessor, the full import processed at a speed of 4000 documents/s, so the full import is running quite fine.

Now I wanted to use the delta import. When running it I was expecting the ramp-up time to be about the same as in the full import, since the whole supplier and attributes tables need to be loaded into the cache in the first step. But looking into the log file, the weird thing is that Solr seems to refresh the cache for every single document that is processed. So currently my delta-import is a lot slower than the full-import. I even tried to add the deltaImportQuery parameter to the entity, but it doesn't change the behavior at all (of course I know it is not supposed to change anything in the setup I run).

The following solutions would be possible in my opinion:

1. Is there any way to tell the config to ignore the cache when running a delta import? That would help already, because we are talking about a maximum of 500 documents changed in 15 minutes compared to over 5 million documents in total.
2. Get Solr to not refresh the cache for every document.

Best Regards
Constantin Wolber
solr performance problem from 4.3.0 with sorting
Hi,

We updated from version 4.2.1 to 4.3.0 and we have a performance problem with sorting. A query that returns 1 hit has a query time of more than 100ms (it can be more than 1s), against less than 10ms for the same query without the sort parameter.

query with sorting option: q=level_4_id:531044&sort=level_4_id+asc
response:
<int name="QTime">1</int>
<int name="QTime">106</int>

query without sorting option: q=level_4_id:531024
<int name="QTime">1</int>
<result name="response" numFound="1" start="0">

The field level_4_id is unique and defined as a long. In version 4.2.1 the performance was identical with and without sorting. Version 4.3.1 shows the same behavior as 4.3.0.

Thanks,
Ariel
Re: UnInverted multi-valued field
Hello,

Am 20.06.2013 09:34, schrieb Jochen Lienhard:

Hello, well ... we have 5 multi-valued facet fields, so you had to wait sometimes up to one minute. The old searcher blocks during this time.

May be related to the already fixed SOLR-4589 issue? Generally there is no blocking by the old searcher. It just feels like blocking because the system is so busy with your tons of autowarming that the old searcher has no chance to answer queries.

@Toke Eskildsen: the example I posted was a very small update, usually there are more terms. We are using Solr 3.6. I don't know if it will be faster with 4.x.

DocValues were introduced to Solr with version 4.2. It is always good to use a more recent version because of improvements, bug fixes, new features, ...

These are the configurations of our cache: [...] We have 5 million documents in our index.

So how have you calculated these values? It looks like they were just set by chance. autowarmCount depends on what your system is serving and how it is configured. On my system with 46 million docs I have autowarmCount=0 for all caches and do static warming instead. Why static warming? Because I have a static system. As an example, you have the queryResultCache set to 10, which means that IF you calculate with 100 qps (which is a lot) it will cache the last 1000 seconds (if all queries are unique), which is 16.6 minutes. Is that what you want and what your system should serve? And with half of it (5) a new searcher should be warmed?

Regards, Bernd

--
Bernd Fehling | Bielefeld University Library
Dipl.-Inform. (FH) | LibTec - Library Technology and Knowledge Management
Universitätsstr. 25
33615 Bielefeld
Tel. +49 521 106-4060
bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
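A sketch of the static warming Bernd describes, as a newSearcher listener in solrconfig.xml; the facet field name mt_facet is taken from the log message earlier in the thread, and the query itself is illustrative:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">mt_facet</str>
    </lst>
  </arr>
</listener>

With autowarmCount=0 on the caches, one query like this pays the uninversion cost once per new searcher instead of replaying a large number of cached entries.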
Re: Informal poll on running Solr 4 on Java 7 with G1GC
Am 20.06.2013 00:18, schrieb Timothy Potter:

I'm sure there's some site to do this but wanted to get a feel for who's running Solr 4 on Java 7 with G1 gc enabled? Cheers, Tim

Currently using Solr 4.2.1 in production with Oracle Java(TM) SE Runtime Environment (build 1.7.0_07-b10), using G1GC without any options, on Linux 2.6.32.23-0.3-xen SMP x86_64 GNU/Linux. It performs better than CMS (with several tuning options): very little sawtooth, smaller and faster GCs. 1 master / 3 slave system, 128.97 GB index, 46.3 million docs.

Bernd
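For reference, enabling G1 "without any options" comes down to a single JVM flag on the servlet container; a minimal sketch, where the heap sizes are placeholders rather than recommendations:

JAVA_OPTS="$JAVA_OPTS -Xms4g -Xmx4g -XX:+UseG1GC"

Adding -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log on top makes it possible to compare collectors from the logs.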
Re: yet another optimize question
Take a look at using DocValues for facets that are problematic. It not only moves the memory off-heap, but stores values in a much more optimal manner.

-- Jack Krupansky

-----Original Message----- From: Toke Eskildsen Sent: Thursday, June 20, 2013 3:26 AM To: solr-user@lucene.apache.org Subject: RE: yet another optimize question [...]
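As a sketch of what this looks like in practice (Solr 4.2 or later, field name hypothetical): the facet field is declared with docValues in schema.xml, after which faceting reads the DocValues structures instead of un-inverting the field onto the heap. Existing documents must be re-indexed for the attribute to take effect.

<field name="category_facet" type="string" indexed="true" stored="false" docValues="true" multiValued="true"/>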
Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor
It is possible to create two separate root entities, one for full-import and another for delta-import. For the delta-import you can skip the cache that way.

On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber constantin.wol...@medicalcolumbus.de wrote: [...]

--
- Noble Paul
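A sketch of that layout, assuming the entity request parameter is used to pick which root entity runs (entity names are illustrative): the full-import root keeps the cached children, the delta root gets plain SqlEntityProcessor children.

<document>
  <entity name="articleFull" dataSource="ds1" pk="myownid"
          query="SELECT * FROM article">
    <!-- supplier and attributes children with CachedSqlEntityProcessor, as in the original config -->
  </entity>
  <entity name="articleDelta" dataSource="ds1" pk="myownid"
          query="SELECT * FROM article"
          deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date &gt; '${dih.last_index_time}'"
          deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}">
    <!-- same children using plain SqlEntityProcessor, no cache -->
  </entity>
</document>

Then run /dataimport?command=full-import&entity=articleFull and /dataimport?command=delta-import&entity=articleDelta.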
Re: solr performance problem from 4.3.0 with sorting
Ariel,

I just went up against a similar issue when upgrading from 3.6.1 to 4.3.0. In my case, my solrconfig.xml for 4.3.0 (which was based on my 3.6.1 file) did not provide a newSearcher or firstSearcher warming query. After adding a query to each listener, my query speeds increased drastically. Check your config file, and if you aren't defining a warming query (make sure it sorts on the field in question), do so.

Shane

On Thu, Jun 20, 2013 at 3:45 AM, Ariel Zerbib ariel.zer...@gmail.com wrote: [...]
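To make Shane's suggestion concrete, a warming query for this case only has to sort on the affected field so the per-field sort structures are built before the new searcher goes live; a sketch for solrconfig.xml:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="sort">level_4_id asc</str>
    </lst>
  </arr>
</listener>

The same entry under event="firstSearcher" covers the first searcher after startup.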
Steps for creating a custom query parser and search component
Hello list followers, I need to write a custom Solr query parser and a search component. The requirements for the component are that the raw query that may need to be split into separate Solr queries is in a proprietary format encoded in JSON, and the output is also going to be in a similar proprietary JSON format. I would like some advice on how to get started. Which base classes should I start to work with? I have been looking at the plugin classes and my initial thoughts are along the lines of following workflow: 1. Subclass (QParser?) and write a new parser method that knows how to deal with the input format. 2. Subclass (SolrQueryRequestBase?) or use LocalSolrQueryRequest like in the TestHarness.makeRequest() and use it to execute the required queries. 3. Compile the aggregate results as specified in the query. 4. Use some existing component (?) for returning the results to the user. 5. Put these components in steps 1-4 together into (?) so that it can be added to solrconfig.xml as a custom query parser accessible at http://solr/core/customparser Is my approach reasonable, or am I overlooking some canonical way of achieving what I need to do? What and where do I need to look into to replace the question marks in my plan with knowledge? :) -- Juha
Getting the String which matched in the document as response
Hi,

Is it possible to get the exact matched string in the index in the select response of Solr? For example: if the search query is "Hello World" and the query parser default is OR, Solr would return all documents which matched both "Hello World", only "Hello", or only "World". Now I want to know which of the returned documents matched both "Hello World" and which of them matched only "Hello" or "World". Is it possible to get this info?

Thanks,
Prathik
Re: Informal poll on running Solr 4 on Java 7 with G1GC
We used to use G1, but recently went back to CMS. G1 gave us too long stop-the-world events. CMS uses more resources for the same work, but it is more predictable and we get better worst-case performance out of it.

Med venlig hilsen / Best regards

John Nielsen
Programmer

MCB A/S
Enghaven 15
DK-7500 Holstebro
Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk

On 20-06-2013 00:18, Timothy Potter wrote: I'm sure there's some site to do this but wanted to get a feel for who's running Solr 4 on Java 7 with G1 gc enabled? Cheers, Tim
Re: Getting the String which matched in the document as response
Take a look at the explain section when you add the debugQuery=true parameter. You can additionally set debug.explain.structured=true to get the scoring explanation in XML if parsing the text is a problem for you.

-- Jack Krupansky

-----Original Message----- From: Prathik Puthran Sent: Thursday, June 20, 2013 9:55 AM To: solr-user@lucene.apache.org Subject: Getting the String which matched in the document as response [...]
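A concrete request along those lines, assuming the indexed field is called text, would be:

/select?q=text:(Hello World)&fl=id,score&debugQuery=true&debug.explain.structured=true

The explain section then lists, per returned document, which of the two terms actually contributed to the score.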
Re: Steps for creating a custom query parser and search component
First, my standard admonition: DON'T DO IT!!! Try harder to use the features Solr provides before trying to shoehorn even more code into Solr. And... think again about whether this code needs to be inside of Solr, as opposed to simply doing multiple requests in a clean, RESTful application layer that is completely under your own control.

Those disclaimers out of the way... Start by studying any of the existing query parser plugins - AND its unit tests. Ditto with search components. Keep studying until you have specific questions.

-- Jack Krupansky

-----Original Message----- From: Juha Haaga Sent: Thursday, June 20, 2013 9:32 AM To: solr-user@lucene.apache.org Subject: Steps for creating a custom query parser and search component [...]
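If a plugin still turns out to be the right answer, the wiring is small: a QParserPlugin subclass registered in solrconfig.xml and selected with defType (the class name here is hypothetical):

<queryParser name="myjson" class="com.example.MyJsonQParserPlugin"/>

/select?defType=myjson&q=...

The custom response format would be a separate piece, registered the same way with a <queryResponseWriter> element and selected via the wt parameter.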
RE: Steps for creating a custom query parser and search component
Hi Juha,

If it's just a matter of format, have you considered adding another layer in front of Solr: a class that takes in your queries in the proprietary format and converts them to what Solr needs? Similarly, if you need your results in a particular format, just convert them again. I would imagine that'd be a lot simpler than subclassing Solr classes.

Swati

-----Original Message----- From: Juha Haaga [mailto:juha.ha...@codenomicon.com] Sent: Thursday, June 20, 2013 9:33 AM To: solr-user@lucene.apache.org Subject: Steps for creating a custom query parser and search component [...]
AW: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor
Hi,

and thanks for the answer. But I'm a little bit confused about what you are suggesting. I have not really used the rootEntity attribute before. But from what I read in the documentation, as far as I can tell that would result in two documents (maybe with the same id, which would probably result in only one document being stored), one for each root entity. It would be great if you could just sketch the setup with the entities I provided, because currently I have no idea how to do it.

Regards
Constantin

-----Original Message----- From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.p...@gmail.com] Sent: Thursday, June 20, 2013 15:42 To: solr-user@lucene.apache.org Subject: Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor [...]
AW: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor
Hi,

I may have been a little too fast with my response. After reading a bit more, I imagine you meant running the full-import with the entity request parameter set to the root entity for the full import, and running the delta-import with the entity parameter set to the delta entity. Is that correct?

Regards
Constantin

-----Original Message----- From: Constantin Wolber [mailto:constantin.wol...@medicalcolumbus.de] Sent: Thursday, June 20, 2013 16:42 To: solr-user@lucene.apache.org Subject: AW: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor [...]
Re: update solr.xml dynamically to add new cores
Hi,

I wouldn't edit solr.xml directly, for two reasons. One being that an already running Solr installation won't pick up changes to that file, and might actually overwrite the changes that you make to it. And two, it's going away in a future release of Solr. Instead, I'd make the package that installs the Solr webapp and brings it up as you described, and have your independent index packages use either the CoreAdmin API or the Collections API to create the indexes, depending on whether you're using SolrCloud or not:

http://wiki.apache.org/solr/CoreAdmin
https://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API

Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906
appinions inc. “The Science of Influence Marketing”
18 East 41st Street New York, NY 10017
t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions
w: appinions.com http://www.appinions.com/

On Wed, Jun 19, 2013 at 8:27 PM, smanad sma...@gmail.com wrote:

Hi, is there a way to edit solr.xml as part of a Debian package installation to add new cores? In my use case, there are 4 Solr indexes and they are managed/configured by different teams. The way I am thinking the packages will work is as described below:

1. There will be a solr-base Debian package which comes with a Solr installation with Tomcat setup (I am planning to use Solr 4.3).
2. There will be individual index Debian packages like solr-index1, solr-index2, which will be dependent on solr-base. Each package's DEBIAN postinst script will have logic to edit solr.xml to add a new index like index1, index2, etc.

Does this sound good? Or is there a better/different way to do this? Any pointers will be much appreciated.

Thanks, -M
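A postinst script along those lines would call the CoreAdmin API instead of touching solr.xml; a sketch, where host, port and paths are assumptions about the local install:

curl "http://localhost:8080/solr/admin/cores?action=CREATE&name=index1&instanceDir=/var/lib/solr/index1&config=solrconfig.xml&schema=schema.xml"

Each index package can ship its own instanceDir with a conf/ directory and run one such call when it is installed.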
Re: SolrCloud - Score calculation
Thanks for your response. So in the case of SolrCloud, Solr/ZooKeeper takes care of managing the indexing and searching. In that case I assume most of the shards will be of roughly equal size (I am just going to push the data to a leader). I assume IDF won't be a big issue then, since the shard sizes are almost equal... Am I right?
Re: SolrCloud - Score calculation
Even if shards are exactly the same size, the distribution of terms may not be equal in each shard. But, yes, if shard size and term distribution are equal, then IDF should be comparable across shards, sort of.

-- Jack Krupansky

-----Original Message----- From: Learner Sent: Thursday, June 20, 2013 11:05 AM To: solr-user@lucene.apache.org Subject: Re: SolrCloud - Score calculation [...]
Get all values from a field
Hello,

I'm looking to retrieve all distinct values of a specific field. My documents have fields like: id, name, cat, ref, model, brand. I wish to be able to retrieve all distinct cat values. How can I do that with Solr? I'm totally blocked. Please help.

Regards
David
Re: Get all values from a field
David -

This is effectively faceting. If you want to see all cat values across all documents, do

/select?q=*:*&rows=0&facet=on&facet.field=cat

and you'll get what you're looking for.

Erik

On Jun 20, 2013, at 11:35, It-forum wrote: [...]
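One caveat: facet.field only returns the top facet.limit values, 100 by default, so to really get every distinct value set the limit to unlimited:

/select?q=*:*&rows=0&facet=on&facet.field=cat&facet.limit=-1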
Re: Informal poll on running Solr 4 on Java 7 with G1GC
On 6/20/2013 8:02 AM, John Nielsen wrote: We used to use G1, but recently went back to CMS. G1 gave us too long stop-the-world events. CMS uses more ressources for the same work, but it is more predictable and we get better worst-case performance out of it. This is exactly the behavior I saw. When you take a look at the overall stats and the memory graph over time, G1 looks way better. Unfortunately GC with any collector does sometimes get bad, and when that happens, un-tuned G1 is a little worse than un-tuned CMS. Perhaps if G1 were tuned, it would be really good, but I haven't been able to find any information on how to tune G1. jHiccup or gclogviewer can give you really good insight into how your GC is doing in both average and worst-case scenarios. jHiccup is a wrapper for your program and gclogviewer draws graphs from GC logs. I'm not sure whether gclogviewer works with G1 logs or not, but I know that jHiccup will work with G1. http://www.azulsystems.com/downloads/jHiccup http://code.google.com/p/gclogviewer/downloads/list http://code.google.com/p/gclogviewer/source/checkout http://code.google.com/p/gclogviewer/issues/detail?id=7 Thanks, Shawn
Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor
Yes, that's right.

On Thu, Jun 20, 2013 at 8:16 PM, Constantin Wolber constantin.wol...@medicalcolumbus.de wrote: [...]

--
- Noble Paul
Need help on Solr
Hello,

I am trying to index a PDF file on Solr. I am currently running Solr on Apache Tomcat 6. When I try to index it I get the error below. Please help; I was not able to rectify this error with the help of the internet.

ERROR - 2013-06-20 20:43:41.549; org.apache.solr.core.CoreContainer; Unable to create core: collection1
org.apache.solr.common.SolrException: [schema.xml] Duplicate field definition for 'id' [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]] and [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]]
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:176)
    at org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62)
    at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36)
    at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
    at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
    at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
ERROR - 2013-06-20 20:43:41.551; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Unable to create core: collection1
    at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
    at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
    at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: [schema.xml] Duplicate field definition for 'id' [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]] and [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]]
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:176)
    at org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62)
    at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36)
    at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
    ... 10 more
INFO - 2013-06-20 20:43:41.553; org.apache.solr.servlet.SolrDispatchFilter; user.dir=C:\Program Files\Apache Software Foundation\Tomcat 6.0
INFO - 2013-06-20 20:43:41.553; org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init() done
ERROR - 2013-06-20 20:43:41.820; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: [schema.xml] Duplicate field definition for 'id' [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]] and [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]]
    at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1212)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
Re: Need help on Solr
org.apache.solr.common.SolrException: [schema.xml] Duplicate field definition for 'id'

You might have defined an additional id field in the schema file. The out-of-the-box schema file already contains an id field.

-- Shreejay

On Thursday, June 20, 2013 at 9:16, Abhishek Bansal wrote: [...]
Re: Informal poll on running Solr 4 on Java 7 with G1GC
Awesome info, thanks Shawn! I'll post back my results with G1 after we've had some time to analyze it in production.

On Thu, Jun 20, 2013 at 11:01 AM, Shawn Heisey s...@elyograg.org wrote: [...]
solr rpm
I am wondering why there is no official Solr RPM. I wish Solr released an RPM like Sphinx does: http://sphinxsearch.com/downloads/release/
Re: Need help on Solr
Yeah I know, out of the box there is one id field. I removed it from schema.xml. I have also added the code below to automatically generate an ID:

<field name="id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="true"/>

with regards,
Abhishek Bansal

On 20 June 2013 21:49, Shreejay shreej...@gmail.com wrote: [...]
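One thing to double-check: in Solr 4.x the usual way to auto-generate IDs is an update processor rather than a field default, so if default="NEW" does not take effect, a sketch like the following in solrconfig.xml (chain name illustrative) is worth trying, with the id field typed as string or uuid:

<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The chain is attached to a handler by adding <str name="update.chain">uuid</str> to its defaults.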
Re: Need help on Solr
As I am running Solr on Windows + Tomcat, I am using the command below to index the PDF. I hope this command is not faulty; please check:

java -jar -Durl="http://localhost:8080/solr-4.3.0/update/extract?literal.id=1&commit=true" post.jar sample.pdf

with regards,
Abhishek Bansal

On 20 June 2013 21:56, Abhishek Bansal abhishek.bansa...@gmail.com wrote: [...]
Re: solr rpm
On 6/20/2013 9:30 AM, adamc wrote: I am wondering why there is no official Solr RPM. I wish Solr released an RPM like Sphinx does: http://sphinxsearch.com/downloads/release/

I agree with you that Solr should be much easier to get running than it is. There are some roadblocks, though.

Solr isn't an executable program. It's a webapp, and it requires a Java servlet container to run. The Solr *example* has a slimmed-down install of Jetty in it that you can run, but it's just that - an example. It is *not* a trivial thing to create an installable package of Solr.

Packages are available for Debian and Ubuntu because those distributions have maintainers who have done the required work to split our package into smaller pieces. They've done a very different job than you would see if we were to make an installer, because they are integrating Lucene as a separate dependency for both Solr and for other search packages.

Thanks, Shawn
RE: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor
Instead of specifying CachedSqlEntityProcessor, you can specify SqlEntityProcessor with cacheImpl="SortedMapBackedCache". If you parameterize this so that it resolves to SortedMapBackedCache for full imports but is blank for deltas, I think it will cache only on the full import.

Another option is to parameterize the child queries with a WHERE clause, so that if it is creating a new cache with every row, the cache will only contain the data needed for that child row.

A third option is to do your delta imports as described here: http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport My experience is that this generally performs better than using the delta-import feature anyhow. The trick is in handling deletes, which will require its own entity and the $deleteDocById command. See http://wiki.apache.org/solr/DataImportHandler#Special_Commands

But these are all workarounds. This sounds like a bug or some subtle configuration problem. I looked through the JIRA issues and did not see anything like this reported yet, but if you're pretty sure you are doing everything correctly you may want to open a bug ticket. Be sure to flag it as contrib - DataImportHandler.

James Dyer Ingram Content Group (615) 213-4311

-Original Message- From: Constantin Wolber [mailto:constantin.wol...@medicalcolumbus.de] Sent: Thursday, June 20, 2013 3:21 AM To: solr-user@lucene.apache.org Subject: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor

Hi,

I searched for a solution for quite some time but did not manage to find any real hints on how to fix this. I'm using Solr 4.3.0 (1477023 - simonw - 2013-04-29 15:10:12) running in a Tomcat 6 container. My data import setup is basically the following:

data-config.xml:

    <entity name="article" dataSource="ds1"
            query="SELECT * FROM article"
            deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date &gt; '${dih.last_index_time}'"
            deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}"
            pk="myownid">
      <field column="myownid" name="id"/>
      <entity name="supplier" dataSource="ds2"
              query="SELECT * FROM supplier WHERE status=1"
              processor="CachedSqlEntityProcessor"
              cacheKey="SUPPLIER_ID"
              cacheLookup="article.ARTICLE_SUPPLIER_ID"/>
      <entity name="attributes" dataSource="ds1"
              query="SELECT ARTICLE_ID, 'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes"
              cacheKey="ARTICLE_ID"
              cacheLookup="article.myownid"
              processor="CachedSqlEntityProcessor"/>
    </entity>

OK, now for the problem: at first I tried everything without the cache, but the full import took a very long time because the attributes query is pretty slow compared to the rest. As a result I got a processing speed of around 150 documents/s. After switching everything to the CachedSqlEntityProcessor, the full import processed at around 4000 documents/s, so the full import is running quite fine.

Now I wanted to use the delta import. When running it I expected the ramp-up time to be about the same as for a full import, since the whole supplier and attributes tables have to be loaded into the cache in the first step. But looking into the log file, the weird thing is that Solr seems to rebuild the cache for every single document that is processed. So currently my delta import is a lot slower than the full import. I even tried adding the deltaImportQuery parameter to the entity, but it doesn't change the behavior at all (of course I know it is not supposed to change anything in the setup I run).

The following solutions would be possible in my opinion:

1. Is there any way to tell the config to ignore the cache when running a delta import? That would already help, because we are talking about a maximum of 500 documents changed in 15 minutes compared to over 5 million documents in total.

2. Get Solr to not rebuild the cache for every document.

Best Regards, Constantin Wolber
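A minimal sketch of James's first suggestion above, assuming DIH's request-parameter substitution via ${dataimporter.request.*}; the supplierCache parameter name is made up for illustration:

    <entity name="supplier" dataSource="ds2"
            query="SELECT * FROM supplier WHERE status=1"
            processor="SqlEntityProcessor"
            cacheImpl="${dataimporter.request.supplierCache}"
            cacheKey="SUPPLIER_ID"
            cacheLookup="article.ARTICLE_SUPPLIER_ID"/>

A full import would then be invoked with command=full-import&supplierCache=SortedMapBackedCache, while a delta import would omit the parameter so that no cache is built.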
Re: solr rpm
Thanks Shawn for explaining so fully.
RE: solr rpm
There is an RPM build framework for building a Jetty-powered Solr RPM here, if you are interested: https://github.com/boogieshafer/jetty-solr-rpm

It's currently set up for Solr 4.3.0 + the built-in Jetty example + a Jetty start script and configs + JMX + logging via the logback framework. Edit the build script and spec file to suit your needs.

From: Shawn Heisey Sent: Thursday, June 20, 2013 09:48 To: solr-user@lucene.apache.org Subject: Re: solr rpm

On 6/20/2013 9:30 AM, adamc wrote: I am wondering why there is no official Solr RPM. [...]

I agree with you that Solr should be much easier to get running than it is. There are some roadblocks, though. [...]

Thanks, Shawn
Re: update solr.xml dynamically to add new cores
Thanks Michael, both reasons make sense. Currently I am not planning on using SolrCloud, so as you suggested I can use the CoreAdmin API: http://wiki.apache.org/solr/CoreAdmin

While doing that, did you mean running a curl command similar to this as part of the 'postinst' script, or running it manually on the host after the index package is installed? (I would love to do it as part of package installation.)

http://localhost:8983/solr/admin/cores?action=CREATE&name=coreX&instanceDir=path_to_instance_directory&config=config_file_name.xml&schema=schema_file_name.xml&dataDir=data

Also, there will be two cases here: if I am installing a new index package, CREATE will work; however, if I am updating a package with some tweaks to configs and schema, then I need to check STATUS to see if the core is already available and, if so, use RELOAD instead of CREATE. Does this make sense?

Michael Della Bitta-2 wrote: Hi, I wouldn't edit solr.xml directly, for two reasons. One being that an already running Solr installation won't update with changes to that file, and might actually overwrite the changes that you make to it. And two, it's going away in a future release of Solr.

Instead, I'd make the package that installs the Solr webapp and brings it up as you described, and have your independent index packages use either the CoreAdmin API or the Collections API to create the indexes, depending on whether you're using SolrCloud or not: http://wiki.apache.org/solr/CoreAdmin https://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API

Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. "The Science of Influence Marketing" 18 East 41st Street New York, NY 10017 t: @appinions <https://twitter.com/Appinions> | g+: plus.google.com/appinions w: appinions.com <http://www.appinions.com/>

On Wed, Jun 19, 2013 at 8:27 PM, smanad wrote: Hi, Is there a way to edit solr.xml as part of Debian package installation to add new cores? In my use case, there are 4 Solr indexes and they are managed/configured by different teams. The way I am thinking the packages will work is described below:

1. There will be a solr-base Debian package which comes with the Solr installation and Tomcat setup (I am planning to use Solr 4.3).
2. There will be individual index Debian packages like solr-index1, solr-index2, which will be dependent on solr-base. Each package's DEBIAN postinst script will have logic to edit solr.xml to add a new index like index1, index2, etc.

Does this sound good? Or is there a better/different way to do this? Any pointers will be much appreciated. Thanks, -M
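A sketch of how the postinst could branch on core existence, assuming a stock CoreAdmin setup; the core name, instanceDir, and the grep against the STATUS response are illustrative only:

    #!/bin/sh
    # If the core already exists, RELOAD it to pick up updated config/schema;
    # otherwise CREATE it.
    SOLR=http://localhost:8983/solr
    CORE=index1                      # hypothetical core name

    if curl -s "$SOLR/admin/cores?action=STATUS&core=$CORE" \
         | grep -q "<str name=\"name\">$CORE</str>"; then
      curl -s "$SOLR/admin/cores?action=RELOAD&core=$CORE"
    else
      curl -s "$SOLR/admin/cores?action=CREATE&name=$CORE&instanceDir=/path/to/$CORE"
    fi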
Solr, ICUTokenizer with Latin-break-only-on-whitespace
(to solr-user, CC'ing the author I'm responding to)

I found the solr-user listserv contribution at: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201305.mbox/%3c51965e70.6070...@elyograg.org%3E

It explains a way to supply custom rule files to the ICUTokenizer, in this case to tell it to only break on whitespace for Latin character substrings.

I am trying to use the technique explained there in Solr 4.3, but either it's not working, or it's not doing what I'd expect. I want, for instance, "C++ Language" to be tokenized into "C++", "Language". But the ICUTokenizer, even with rulefiles=Latn:Latin-break-only-on-whitespace.rbbi and the rbbi file from the Solr 4.3 source [1], is still stripping the punctuation and tokenizing that into "C", "Language".

Can anyone give me any guidance or hints? I don't entirely understand the semantics of the rbbi file to try debugging there. Is something not working, or does the rbbi file just not express the semantics I want? Thanks for any tips.

[1] http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_3_0/lucene/analysis/icu/src/test/org/apache/lucene/analysis/icu/segmentation/Latin-break-only-on-whitespace.rbbi?revision=1479557&view=markup
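For reference, a sketch of how the rule file is wired into a field type, assuming the ICU analysis contrib jars are on the classpath and the .rbbi file sits in the core's conf directory (the field type name is made up):

    <fieldType name="text_icu_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.ICUTokenizerFactory"
                   rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

With that rule file in effect, "C++ Language" should come out as the tokens "c++" and "language" rather than "c" and "language".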
Re: Partial update using solr 4.3 with csv input
Thanks for confirming. So if my input is a CSV file, I will need a script that reads the delta changes one by one, converts each to JSON, and then uses the 'update' handler with that piece of JSON data. Makes sense?

Jack Krupansky-2 wrote: Correct, no atomic update for CSV format. There just isn't any place to put the atomic update options in such a simple text format.

-- Jack Krupansky

-Original Message- From: smanad Sent: Wednesday, June 19, 2013 8:30 PM To: solr-user@.apache Subject: Partial update using solr 4.3 with csv input

I was going through this link http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/ and one of the comments is about support for CSV. Since the comment is almost a year old, just wondering if it is still true that partial updates are possible only with XML and JSON input?

Thanks, -M
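A minimal sketch of that CSV-to-JSON step, assuming a delta row carrying an id plus one changed column, and that atomic updates are enabled (updateLog configured and the non-key fields stored); the field names are made up:

    # hypothetical CSV delta row:  id,price
    #                              doc42,19.99
    curl 'http://localhost:8983/solr/update?commit=true' \
      -H 'Content-Type: application/json' \
      -d '[{"id": "doc42", "price": {"set": 19.99}}]'

The {"set": ...} wrapper is what turns a plain add into an atomic update of just that field.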
Re: Need help on Solr
On Jun 20, 2013, at 18:26, Abhishek Bansal abhishek.bansa...@gmail.com wrote:

Yeah I know, out of the box there is one id field. I removed it from schema.xml. I have also added the code below to automatically generate an ID.

<field name="id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="true"/>

Is that a valid configuration for an id field (assuming that the field id is also defined as uniqueKey)?
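For what it's worth, a sketch of the uuid setup on the Solr 4.x line, where generating the value in an update processor chain is the documented route (whether default="NEW" on a UUIDField still works there is exactly the doubt raised above); the chain name is illustrative:

In schema.xml:

    <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
    <field name="id" type="uuid" indexed="true" stored="true" multiValued="false"/>
    <uniqueKey>id</uniqueKey>

In solrconfig.xml, generate the id whenever a document arrives without one:

    <updateRequestProcessorChain name="uuid" default="true">
      <processor class="solr.UUIDUpdateProcessorFactory">
        <str name="fieldName">id</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>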
Re: Solr, ICUTokenizer with Latin-break-only-on-whitespace
On 6/20/2013 1:26 PM, Jonathan Rochkind wrote: I want, for instance, "C++ Language" to be tokenized into "C++", "Language". But the ICUTokenizer, even with rulefiles=Latn:Latin-break-only-on-whitespace.rbbi and the rbbi file from the Solr 4.3 source, is still stripping the punctuation and tokenizing that into "C", "Language".

This screenshot is using branch_4x, downloaded and compiled a couple of hours ago, with the rbbi file you mentioned copied to the conf directory: https://dl.dropboxusercontent.com/u/97770508/icutokenizer-whitespace-only.png

It shows that the ++ is maintained by the ICU tokenizer. It also illustrates a UI bug that I will have to show to steffkes, where the ++ is lost from the input field after analysis.

Thanks, Shawn
RE: Informal poll on running Solr 4 on Java 7 with G1GC
I've been trying it out on Solr 3.6.1 with a 32GB heap, and G1GC seems to be more prone to OOMEs than CMS. I have been running it on one slave box in our farm while the rest of the slaves are still on CMS, and three times now it has gone OOM on me, whereas the rest of our slaves kept chugging along with no errors. I even went from no other tuning params to the ones suggested on Shawn's wiki page, and that didn't help either - still got some OOMs. I'm giving it a 'fail' pretty soon here.

-XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -XX:+OptimizeStringConcat -XX:+UseFastAccessorMethods -XX:+UseG1GC -XX:+UseStringCache -XX:-UseSplitVerifier -XX:MaxGCPauseMillis=50

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Thanks, Robi

-Original Message- From: Timothy Potter [mailto:thelabd...@gmail.com] Sent: Thursday, June 20, 2013 9:22 AM To: solr-user@lucene.apache.org Subject: Re: Informal poll on running Solr 4 on Java 7 with G1GC

Awesome info, thanks Shawn! I'll post back my results with G1 after we've had some time to analyze it in production.

On Thu, Jun 20, 2013 at 11:01 AM, Shawn Heisey s...@elyograg.org wrote: On 6/20/2013 8:02 AM, John Nielsen wrote: We used to use G1, but recently went back to CMS. G1 gave us too long stop-the-world events. CMS uses more resources for the same work, but it is more predictable and we get better worst-case performance out of it.

This is exactly the behavior I saw. When you take a look at the overall stats and the memory graph over time, G1 looks way better. Unfortunately GC with any collector does sometimes get bad, and when that happens, un-tuned G1 is a little worse than un-tuned CMS. Perhaps if G1 were tuned it would be really good, but I haven't been able to find any information on how to tune G1.

jHiccup or gclogviewer can give you really good insight into how your GC is doing in both average and worst-case scenarios. jHiccup is a wrapper for your program, and gclogviewer draws graphs from GC logs. I'm not sure whether gclogviewer works with G1 logs or not, but I know that jHiccup will work with G1.

http://www.azulsystems.com/downloads/jHiccup http://code.google.com/p/gclogviewer/downloads/list http://code.google.com/p/gclogviewer/source/checkout http://code.google.com/p/gclogviewer/issues/detail?id=7

Thanks, Shawn
Re: solr rpm
On Thu, Jun 20, 2013 at 12:48 PM, Shawn Heisey s...@elyograg.org wrote: They've done a very different job than you would see if we were to make an installer, because they are integrating Lucene as a separate dependency for both Solr and for other search packages. Is that the only thing that's different? Does this one thing make a lot of impact? Or is there a bunch of others? I wonder if one could write a 'reasonable deployment guide'. Would that then make it easier for package creators to do their job (Solr in Puppet, Solr in Chef, Solr in RPM, Solr on Windows, etc)? Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: solr rpm
On 6/20/2013 1:44 PM, Alexandre Rafalovitch wrote: On Thu, Jun 20, 2013 at 12:48 PM, Shawn Heisey s...@elyograg.org wrote: They've done a very different job than you would see if we were to make an installer, because they are integrating Lucene as a separate dependency for both Solr and for other search packages. Is that the only thing that's different? Does this one thing make a lot of impact? They've split it into solr-common, solr-jetty, solr-tomcat, and at least one other package that's just lucene, no Solr. I wonder if one could write a 'reasonable deployment guide'. Would that then make it easier for package creators to do their job (Solr in Puppet, Solr in Chef, Solr in RPM, Solr on Windows, etc)? I would really like to do this, and I've filed some issues in JIRA about it. For me, it's just an issue of available time to work on it. I'm willing to do the work, and IMHO I'm reasonably capable. If you want to put some time into it, I'm willing to look it over and make sure it gets included and/or mentioned in what users see when they download. Thanks, Shawn
Re: Solr, ICUTokenizer with Latin-break-only-on-whitespace
Thank you... I started out writing an email with screenshots proving that it wasn't working for me in 4.3.0... and of course, having to confirm every single detail in order to say so, I realized the mistake was on my part - I was not testing what I thought I was testing.

It does indeed appear to be working now. Thanks! And thanks for this feature.

On 6/20/2013 3:40 PM, Shawn Heisey wrote: This screenshot is using branch_4x downloaded and compiled a couple of hours ago, with the rbbi file you mentioned copied to the conf directory: https://dl.dropboxusercontent.com/u/97770508/icutokenizer-whitespace-only.png [...] Thanks, Shawn
Re: Partial update using solr 4.3 with csv input
I'd have to see the whole scenario... What's an example of the original input, and then some examples of the kind of updates?

Generally, CSV is most useful simply for bulk importing (or exporting) data. It wasn't really designed for incremental update of existing documents.

-- Jack Krupansky

-Original Message- From: smanad Sent: Thursday, June 20, 2013 3:28 PM To: solr-user@lucene.apache.org Subject: Re: Partial update using solr 4.3 with csv input

Thanks for confirming. So if my input is a CSV file, I will need a script that reads the delta changes one by one, converts each to JSON, and then uses the 'update' handler with that piece of JSON data. Makes sense? [...]
Multivalued facet with 0 unexpected results
Hi all, we are getting some facet values with 0 counts when faceting on a multivalued field with a *:* query. I think this is really strange: since we are using MatchAllDocsQuery, there is no way we should get a count of 0 for any value. The values showing 0 counts were present in the index before the reindex we made. So far we have fixed it by sending a commit and an optimize. We still need to return facets with 0 counts for current values, so mincount is not an option.

Solr version: 3.6.0
Grouping: false
Post-grouping faceting: false
Filter queries: 0

Is this a known bug or intended behaviour? Does it only happen with UnInvertedField?

Many thanks :)

-- Un saludo, Samuel García.
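For concreteness, the shape of the request involved (the field name here is hypothetical); with facet.mincount=0 the stale values surface even under q=*:*, and raising it to 1 is ruled out because zero-count facets are still wanted for current values on narrower queries:

    http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=category&facet.mincount=0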
Re: Multivalued facet with 0 unexpected results
Just to clarify: the commit and optimize that fixed this problem were sent by us manually. The index process sends its own commit, which makes the new facet values searchable, but that commit does not seem to purge the previous values held by the UnInvertedField.

On Thu, Jun 20, 2013 at 11:42 PM, Samuel García Martínez samuelgmarti...@gmail.com wrote: Hi all, we are getting some facet values with 0 counts when faceting on a multivalued field with a *:* query. [...]

-- Un saludo, Samuel García.
Re: Partial update using solr 4.3 with csv input
Note that even though partial updates sound like what you should do (because only part of your data has changed), unless you are dealing with lots of data, just re-adding everything (if possible) can be plenty fast. So before you write complex code to construct partial updates from your CSV files, benchmark to see if it's really a problem. For example, we always used to fully import a DB (~800K records) because it took only around 5 minutes - there was no need to write a delta system.

On Fri, Jun 21, 2013 at 12:58 AM, smanad sma...@gmail.com wrote: Thanks for confirming. So if my input is a CSV file, I will need a script that reads the delta changes one by one, converts each to JSON, and then uses the 'update' handler with that piece of JSON data. Makes sense? [...]

-- Regards, Shalin Shekhar Mangar.
Re: Multivalued facet with 0 unexpected results
It sounds suspiciously similar to https://issues.apache.org/jira/browse/SOLR-3793, which was fixed in Solr 4.0. You should upgrade to a more recent Solr version (4.3.1 is the latest) and see if it's still a problem for you.

On Fri, Jun 21, 2013 at 3:19 AM, Samuel García Martínez samuelgmarti...@gmail.com wrote: Just to clarify: the commit and optimize that fixed this problem were sent by us manually. [...]

-- Regards, Shalin Shekhar Mangar.
SolrCloud replication issues
Hello, first: I am pretty much a Solr newcomer, so don't necessarily assume basic Solr knowledge. My problem is that in my setup SolrCloud seems to create way too much network traffic for replication. I hope I'm just missing some proper config options.

Here's the setup first:

* I am running a five-node SolrCloud cluster on top of an external 5-node ZooKeeper cluster; according to the logs and clusterstate.json all nodes find each other and are happy
* Solr version is now 4.3.1, but the problem also existed on 4.1.0 (I thought an upgrade might solve the issue because of https://issues.apache.org/jira/browse/SOLR-4471)
* there is only one shard
* solr.xml and solrconfig.xml are out of the box, except for the enabled soft commit:

    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>

* our index is minimal at the moment (dev and testing stage): 20-30 MB, about 30k small docs

The issue is that when I run smallish load tests against our app, which posts ca 1-2 docs/sec to Solr, the SolrCloud leader creates outgoing network traffic of 20-30 MByte/sec and the non-leaders receive 4-8 MByte/sec each. The non-leaders' logs are full of entries like:

INFO - 2013-06-21 01:08:58.624; org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that we recover
INFO - 2013-06-21 01:08:58.640; org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that we recover
INFO - 2013-06-21 01:08:58.643; org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that we recover

So my assumption is that I am making config errors and the cloud leader tries to push the index to all non-leaders over and over again. But I couldn't really find much doco on how to properly configure SolrCloud replication online. Any hints and help much appreciated. I can provide more info or data, just let me know what you need.

Thanks in advance, Sven
Re: SolrCloud replication issues
This doesn't seem right. A leader will ask a replica to recover only when an update request could not be forwarded to it. Can you check your leader logs to see why updates are not being sent through to replicas?

On Fri, Jun 21, 2013 at 7:03 AM, Sven Stark sven.st...@m-square.com.au wrote: Hello, first: I am pretty much a Solr newcomer, so don't necessarily assume basic Solr knowledge. My problem is that in my setup SolrCloud seems to create way too much network traffic for replication. [...]

-- Regards, Shalin Shekhar Mangar.
Re: Sharding and Replication clarification
On Wed, Jun 19, 2013 at 11:12 PM, Asif talla...@gmail.com wrote: Hi, I had questions on the implementation of the sharding and replication features of Solr/SolrCloud. 1. I noticed that when sharding is enabled for a collection, individual requests are sent to each node serving as a shard.

Yes, search requests are distributed to a member of each logical shard. If you know the shard that you want to search, you can specify a shards=shard1,shard2 parameter to limit searches to those shards only.

2. Replication too follows the above strategy of sending individual documents to the nodes serving as replicas.

Yes, full documents are replicated in SolrCloud for normal updates, instead of the index fragments that used to be replicated in non-cloud replication. The old replication method is still used for recovery.

I am working with a system that requires a massive number of writes - I have noticed that due to the above, the cloud eventually starts to fail (even though I am using an ensemble). I do understand the reason behind individual updates - but why not batch them up, or give an option to batch N updates in either of the above cases? I did come across a presentation that talked about batching 10 updates for replication at least, but I do not think this is the case. - Asif

If you send a batch update request then the requests to replicas will be batched. I think the default is that 10 documents per replica node are batched together. But if you send one document at a time then the replication will also happen one document at a time.

-- Regards, Shalin Shekhar Mangar.
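To illustrate the difference, a sketch of sending one batched update request instead of one request per document (the endpoint and field names assume a stock collection1 setup):

    # one HTTP request carrying three documents; the leader can then
    # forward them to replicas in batches as well
    curl 'http://localhost:8983/solr/collection1/update?commit=true' \
      -H 'Content-Type: application/json' \
      -d '[{"id":"doc1","name":"a"},
           {"id":"doc2","name":"b"},
           {"id":"doc3","name":"c"}]'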
Re: SolrCloud replication issues
Thanks for the super quick reply. The logs are pretty big, but one thing comes up over and over again.

Leader side (the same error repeats):

ERROR - 2013-06-21 01:44:24.014; org.apache.solr.common.SolrException; shard update error StdNode: http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok status:500, message:Internal Server Error

Non-leader side:

757682 [RecoveryThread] ERROR org.apache.solr.update.PeerSync – PeerSync: core=collection1 url=http://xxx:xxx:xx:xx:8983/solr Error applying updates from [Ljava.lang.String;@1be0799a ,update=[1, 1438251416655233024, SolrInputDocument[type=topic, fullId=9ce54310-d89a-11e2-b89d-22000af02b44, account=account1, site=mySite, topic=topic5, id=account1mySitetopic5, totalCount=195, approvedCount=195, declinedCount=0, flaggedCount=0, createdOn=2013-06-19T04:42:14.329Z, updatedOn=2013-06-19T04:42:14.386Z, _version_=1438251416655233024]]
java.lang.UnsupportedOperationException
    at org.apache.lucene.queries.function.FunctionValues.longVal(FunctionValues.java:46)
    at org.apache.solr.update.VersionInfo.getVersionFromIndex(VersionInfo.java:201)
    at org.apache.solr.update.UpdateLog.lookupVersion(UpdateLog.java:718)
    at org.apache.solr.update.VersionInfo.lookupVersion(VersionInfo.java:184)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:635)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:398)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:487)
    at org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:335)
    at org.apache.solr.update.PeerSync.sync(PeerSync.java:265)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:366)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)

Unfortunately I don't see what kind of unsupported operation this could be referring to.

Many thanks, Sven

On Fri, Jun 21, 2013 at 11:44 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: This doesn't seem right. A leader will ask a replica to recover only when an update request could not be forwarded to it. Can you check your leader logs to see why updates are not being sent through to replicas? [...]
Re: SolrCloud replication issues
Actually this looks very much like http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201304.mbox/%3ccacbkj07ob4kjxwe_ogzfuqg5qg99qwpovbzkdota8bihcis...@mail.gmail.com%3E

Sven

On Fri, Jun 21, 2013 at 11:54 AM, Sven Stark sven.st...@m-square.com.au wrote: Thanks for the super quick reply. The logs are pretty big, but one thing comes up over and over again. [...]
Re: SolrCloud replication issues
Okay, so from the same thread: have you made sure the _version_ field is a long in the schema?

<field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>

On Fri, Jun 21, 2013 at 7:44 AM, Sven Stark sven.st...@m-square.com.au wrote: Actually this looks very much like http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201304.mbox/%3ccacbkj07ob4kjxwe_ogzfuqg5qg99qwpovbzkdota8bihcis...@mail.gmail.com%3E [...]

-- Regards, Shalin Shekhar Mangar.
Re: Informal poll on running Solr 4 on Java 7 with G1GC
It would be good to see some CMS configs too... Can you send your java params? On Wed, Jun 19, 2013 at 8:46 PM, Shawn Heisey s...@elyograg.org wrote: On 6/19/2013 4:18 PM, Timothy Potter wrote: I'm sure there's some site to do this but wanted to get a feel for who's running Solr 4 on Java 7 with G1 gc enabled? I have tried it, but found that G1 didn't give me any better GC pause characteristics than CMS without tuning, and may have actually been worse. Now I use CMS with several tuning options. Thanks, Shawn -- Bill Bell billnb...@gmail.com cell 720-256-8076
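For anyone wanting a concrete starting point, a sketch of commonly used CMS parameters; these are not Shawn's exact settings (those are on the wiki page cited earlier in the thread), and the heap sizes here are placeholders:

    -Xms4g -Xmx4g
    -XX:+UseConcMarkSweepGC
    -XX:+UseParNewGC
    -XX:+CMSParallelRemarkEnabled
    -XX:CMSInitiatingOccupancyFraction=70
    -XX:+UseCMSInitiatingOccupancyOnly
    -XX:NewRatio=3
    -XX:+HeapDumpOnOutOfMemoryError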
Re: SolrCloud replication issues
I think you're onto it. Our schema.xml had it as

<field name="_version_" type="string" indexed="true" stored="true" multiValued="false"/>

I'll change it and test. That will probably not happen before Monday, though.

Many thanks already, Sven

On Fri, Jun 21, 2013 at 2:18 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Okay, so from the same thread: have you made sure the _version_ field is a long in the schema? <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/> [...]
Queuing for Solr Updates?
Is there a simpler way to kick off a DIH update while one is already running?

Scenario:
1. An update is running via DIH.
2. We need to kick off another update, but cannot since DIH is already running. So the program inserts into a table (ID=55).
3. Since DIH is still running the old update, we cannot fire an update to DIH. We want it to run right away.

Is there a simple way to queue up an update if DIH is still running? We wrote the query to catch all pending rows, so we only need to run it again if there is 1 or more pending updates. Thoughts?

-- Bill Bell billnb...@gmail.com cell 720-256-8076
Varnish
Who is using Varnish in front of Solr? Anyone have any configs that work with Solr's cache-control headers?

-- Bill Bell billnb...@gmail.com cell 720-256-8076
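For context, the headers Varnish would key on are controlled by the httpCaching section of solrconfig.xml; a sketch (the max-age value is arbitrary, and note the stock config ships with never304="true", which disables the cache headers entirely):

    <requestDispatcher handleSelect="false">
      <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
        <cacheControl>max-age=30, public</cacheControl>
      </httpCaching>
    </requestDispatcher>

With this enabled, Solr emits Cache-Control, Last-Modified, and ETag headers that a default Varnish setup can honor without custom VCL.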
Re: Queuing for Solr Updates?
On 21 June 2013 11:12, William Bell billnb...@gmail.com wrote: Is there a simpler way to kick off a DIH update while one is already running? [...] Is there a simple way to queue up an update if DIH is still running? We wrote the query to catch all pending rows, so we only need to run it again if there is 1 or more pending updates. [...]

Your best bet might be to write an external daemon that monitors the DIH import status as well as the update queue, and launches a new DIH import as required.

Regards, Gora
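A minimal sketch of such a watcher, assuming DIH is registered at /dataimport on the core and using a hypothetical pending_updates helper that checks your queue table:

    #!/bin/sh
    # Poll DIH; when it is idle and work is queued, kick off a new import.
    DIH=http://localhost:8983/solr/dataimport

    while true; do
      if curl -s "$DIH?command=status" | grep -q '<str name="status">idle</str>' \
         && [ "$(pending_updates)" -gt 0 ]; then   # pending_updates: hypothetical queue check
        curl -s "$DIH?command=full-import&clean=false"
      fi
      sleep 30
    done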