Re: How to find the routing algorithm used?
At the admin GUI click the Cloud link, then the Tree link. A page will open; choose clusterstate.json from the list. Scroll down to the end and you will see something like: router:compositeId

2013/5/16 santoash santo...@me.com

I'm trying to find out which routing algorithm (implicit/compositeId) is being used in my cluster. We are running Solr 4.1. I was expecting to see it in my clusterstate (based on a previous thread that someone else posted) but I don't see it there. Could someone please help? Thanks! Santoash
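For reference, the relevant part of clusterstate.json typically looks like the sketch below (the collection name is hypothetical, and the exact layout varies across 4.x releases; in some versions the router value is nested as an object). In 4.1 the key may be absent entirely, which is consistent with what is reported later in this thread.

    "collection1": {
      "shards": { ... },
      "router": "compositeId"
    }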
Compatible collections SOLR4 / SOLRCloud?
Hi there, I am trying to figure out what Solr means by "compatible collections", in order to be able to run the following query ("Query all shards of multiple compatible collections, explicitly specified"):

    http://localhost:8983/solr/collection1/select?collection=collection1_NY,collection1_NJ,collection1_CT

Does this mean that the schema.xml must be exactly the same between those collections, or just partially the same (sharing the fields used to satisfy the query)? cheers, /Marcin
Re: error while switching from log4j back to slf4j with solr 4.3
OK, solved. I now have run-jetty-run working with log4j. I just copied the log4j libs from example/lib/ext to webapp/WEB-INF/classes and set -Dlog4j.configuration in the run-jetty-run VM arguments. Thanks, Bernd

On 15.05.2013 16:31, Shawn Heisey wrote:

On 5/15/2013 12:52 AM, Bernd Fehling wrote: Since I can't get Solr 4.3 with run-jetty-run up and running under Eclipse for debugging, I tried to switch back to slf4j and followed the steps at http://wiki.apache.org/solr/SolrLogging. Unfortunately Eclipse bothers me with an error:

    The import org.apache.log4j.AppenderSkeleton cannot be resolved
    EventAppender.java  /solr/core/src/java/org/apache/solr/logging/log4j  line 19  Java Problem

log4j-over-slf4j-1.6.6.jar has no AppenderSkeleton class as log4j-1.2.16.jar does.

Can you please send a listing of the directory where you have your slf4j jars and the full exception stacktrace from your log? Please use a paste website, such as pastie.org. Thanks, Shawn
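For reference, the VM argument mentioned above takes a URL to the log4j properties file; the path below is hypothetical:

    -Dlog4j.configuration=file:///path/to/log4j.properties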
Re: indexing unrelated tables in single core
I am not able to index the fields from the database; indexing is failing. data-config.xml:

    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/test" user="user" password="dfsdf"/>
    <document>
      <entity name="catalogsearch_query" query="select query_id,query_text from catalogsearch_query where num_results != 0">
        <field column="query_id" name="query_id"/>
        <field column="query_text" name="user_query"/>
      </entity>
    </document>

It's showing all documents failed and 0 indexed.

On Wed, May 15, 2013 at 8:31 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

1. Create a schema that accommodates both types of fields, either using optional fields or dynamic fields.
2. Create some sort of differentiator key (e.g. schema), separate from id (which needs to be globally unique, so possibly schema+id).
3. Use that schema field in filter queries (fq) to look only at a subset of items.
4. (Optionally) define separate search request handlers that force that schema parameter (using appends or invariants instead of defaults).

That should get you most of the way there. Regards, Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, May 15, 2013 at 7:07 AM, Rohan Thakur rohan.i...@gmail.com wrote:

Hi all, I want to index 2 separate, unrelated tables from a database into a single Solr core and search each kind of document separately. How can I do it? Please help. Thanks in advance. Regards, Rohan
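A minimal sketch of what Alex's steps 1-3 could look like; the field and value names here (doctype, product) are illustrative, not from the thread:

    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="doctype" type="string" indexed="true" stored="true"/>

and at query time, restrict a search to one table's documents with a filter query:

    http://localhost:8983/solr/collection1/select?q=foo&fq=doctype:product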
Re: indexing unrelated tables in single core
It's saying in the logs "missing required field: title", which is nowhere in the database...

On Thu, May 16, 2013 at 3:08 PM, Rohan Thakur rohan.i...@gmail.com wrote:

I am not able to index the fields from the database; indexing is failing. data-config.xml:

    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/test" user="user" password="dfsdf"/>
    <document>
      <entity name="catalogsearch_query" query="select query_id,query_text from catalogsearch_query where num_results != 0">
        <field column="query_id" name="query_id"/>
        <field column="query_text" name="user_query"/>
      </entity>
    </document>

It's showing all documents failed and 0 indexed.

On Wed, May 15, 2013 at 8:31 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

1. Create a schema that accommodates both types of fields, either using optional fields or dynamic fields.
2. Create some sort of differentiator key (e.g. schema), separate from id (which needs to be globally unique, so possibly schema+id).
3. Use that schema field in filter queries (fq) to look only at a subset of items.
4. (Optionally) define separate search request handlers that force that schema parameter (using appends or invariants instead of defaults).

That should get you most of the way there. Regards, Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, May 15, 2013 at 7:07 AM, Rohan Thakur rohan.i...@gmail.com wrote:

Hi all, I want to index 2 separate, unrelated tables from a database into a single Solr core and search each kind of document separately. How can I do it? Please help. Thanks in advance. Regards, Rohan
Re: indexing unrelated tables in single core
True, it's complaining that your Solr schema has a required field 'title' and your query and data import config aren't providing it.

On May 16, 2013 5:51 AM, Rohan Thakur rohan.i...@gmail.com wrote:

It's saying in the logs "missing required field: title", which is nowhere in the database...

On Thu, May 16, 2013 at 3:08 PM, Rohan Thakur rohan.i...@gmail.com wrote:

I am not able to index the fields from the database; indexing is failing. data-config.xml:

    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/test" user="user" password="dfsdf"/>
    <document>
      <entity name="catalogsearch_query" query="select query_id,query_text from catalogsearch_query where num_results != 0">
        <field column="query_id" name="query_id"/>
        <field column="query_text" name="user_query"/>
      </entity>
    </document>

It's showing all documents failed and 0 indexed.

On Wed, May 15, 2013 at 8:31 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

1. Create a schema that accommodates both types of fields, either using optional fields or dynamic fields.
2. Create some sort of differentiator key (e.g. schema), separate from id (which needs to be globally unique, so possibly schema+id).
3. Use that schema field in filter queries (fq) to look only at a subset of items.
4. (Optionally) define separate search request handlers that force that schema parameter (using appends or invariants instead of defaults).

That should get you most of the way there. Regards, Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, May 15, 2013 at 7:07 AM, Rohan Thakur rohan.i...@gmail.com wrote:

Hi all, I want to index 2 separate, unrelated tables from a database into a single Solr core and search each kind of document separately. How can I do it? Please help. Thanks in advance. Regards, Rohan
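For illustration, a sketch of the schema change this implies (based on the schema posted later in this thread, not an official fix): if title must not be present on every kind of document, the required flag can be dropped:

    <field name="title" type="text_en_splitting" indexed="true" stored="true" required="false"/>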
Lucene-Solr indexing document via Post method
Hi guys, I'm trying to run Solr with Apache Tomcat. Is it possible to index documents via the POST method using Lucene/Solr? What is the correct way to index documents in Solr? thanks
loading dataimport configs
I am using the data importer that feeds off of MySQL. When adding new DataImportHandler requestHandlers to solrconfig.xml, I can upload my changes with the following command:

    ./zkcli.sh -zkhost 10.0.1.107:2181 -cmd upconfig -confdir configs -confname collection1

Good: I can see the changed files in ZooKeeper. I can get them and see that the contents changed as well.

Bad: When I call

    curl http://localhost:8983/solr/collection1/data-point-1?command=full-import

or browse to http://solr-ip/solr/#/collection1/dataimport/data-point-1, Solr complains that the config for data-point-1 (for example) cannot be found.

Any ideas what I might be doing wrong?
-- CTO Zenlok株式会社
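For context, a DIH handler of that name would typically be declared in solrconfig.xml like the sketch below; the handler name follows the URL in the message, while the config file name is hypothetical:

    <requestHandler name="/data-point-1" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-point-1-config.xml</str>
      </lst>
    </requestHandler>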
Question about Edismax - Solr 4.0
-- *Edismax and Filter Queries with Commas and Spaces* --

Dear Experts, this appears to be a bug; please suggest if I'm wrong. If I search with the following filter query:

1) fq=title:(, 10)
- I get no results.
- The debug output does NOT show the section containing parsed_filter_queries.

If I carry out a search with the filter query:

2) fq=title:(,10) (no space between , and 10)
- I get results, and the debug output shows the parsed filter queries section as:

    <arr name="filter_queries">
      <str>(titles:(,10))</str>
      <str>(collection:assets)</str>
    </arr>

As you can see above, I'm also passing in other filter queries (collection:assets) which appear correctly, but they do not appear in case 1 above. I can't make this part of the query parameter as that needs to be searched against multiple fields. Can someone suggest a fix in this case please? I'm using Solr 4.0. Many Thanks, Sandeep
Re: indexing unrelated tables in single core
Hi, I found the problem: it is with the unique key defined in schema.xml. If I define it to be query_id, then while indexing it says the mandatory key query_id is missing from the root entity (in data-config.xml) that indexes the products from the database, which has product_id as its unique key. And when in the schema I set product_id as the unique key, it says the mandatory key product_id is missing from the root entity that indexes the user queries from the other table in the database, which has user_id as its unique key. How can I fix this? I want to index both tables, which are basically unrelated, that is, they do not have any *common* fields. Thanks, Rohan

On Thu, May 16, 2013 at 3:24 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote:

True, it's complaining that your Solr schema has a required field 'title' and your query and data import config aren't providing it.

On May 16, 2013 5:51 AM, Rohan Thakur rohan.i...@gmail.com wrote:

It's saying in the logs "missing required field: title", which is nowhere in the database...

On Thu, May 16, 2013 at 3:08 PM, Rohan Thakur rohan.i...@gmail.com wrote:

I am not able to index the fields from the database; indexing is failing. data-config.xml:

    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/test" user="user" password="dfsdf"/>
    <document>
      <entity name="catalogsearch_query" query="select query_id,query_text from catalogsearch_query where num_results != 0">
        <field column="query_id" name="query_id"/>
        <field column="query_text" name="user_query"/>
      </entity>
    </document>

It's showing all documents failed and 0 indexed.

On Wed, May 15, 2013 at 8:31 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

1. Create a schema that accommodates both types of fields, either using optional fields or dynamic fields.
2. Create some sort of differentiator key (e.g. schema), separate from id (which needs to be globally unique, so possibly schema+id).
3. Use that schema field in filter queries (fq) to look only at a subset of items.
4. (Optionally) define separate search request handlers that force that schema parameter (using appends or invariants instead of defaults).

That should get you most of the way there. Regards, Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, May 15, 2013 at 7:07 AM, Rohan Thakur rohan.i...@gmail.com wrote:

Hi all, I want to index 2 separate, unrelated tables from a database into a single Solr core and search each kind of document separately. How can I do it? Please help. Thanks in advance. Regards, Rohan
Solr 4.3.0: Shard instances using incorrect data directory on machine boot
Hi all, I hope you can advise a solution to our incorrect data directory issue. We have 2 physical servers using Solr 4.3.0, each with 24 separate Tomcat instances (RedHat 6.4, Java 1.7.0_10-b18, Tomcat 7.0.34) with a Solr shard in each. This configuration means that each shard has its own data directory declared. (Server OS, Tomcat and Solr, including shards, are created via automated builds.) That is, for example:

- tomcat instance: /var/local/tomcat/solrshard3/, port 8985
- corresponding solr instance: /usr/local/solrshard3/, with /usr/local/solrshard3/collection1/conf/solrconfig.xml
- corresponding solr data directory: /var/local/solrshard3/collection1/data/

We process ~1.5 billion documents, which is why we use 48 shards (24 leaders, 24 replicas). These physical servers are rebooted regularly to fsck their drives. When rebooted, we always see several (~10-20) shards failing to start (the UI cloud view shows them as 'Down' or 'Recovering', though they never recover without intervention), and there is no pattern to which shards fail to start - we haven't recorded any that always or never fail. On inspection, the UI dashboard for these failed shards displays, for example:

- Host: Server1
- Instance: /usr/local/solrshard3/collection1
- Data: /var/local/solrshard6/collection1/data
- Index: /var/local/solrshard6/collection1/data/index

To fix such failed shards, I manually restart the shard leader and replicas, which fixes the issue. However, of course, I would like to know a permanent cure for this, not a remedy. We use a separate ZooKeeper service, spread across 3 virtual machines within our private network of ~200 servers (physical and virtual). Network traffic is constant but relatively light across 1GB bandwidth. Any advice or suggestions greatly appreciated. Gil

Gil Hoggarth
Web Archiving Engineer
The British Library, Boston Spa, West Yorkshire, LS23 7BQ
Re: indexing unrelated tables in single core
I mean to say that I want to index 2 tables, that is, using 2 root entities in data-config.xml: one is the product table and the other is the user search table. These have no foreign key between them, and I want to index both of them as documents in my Solr index. What should I do? It is taking either one of them and rejecting the other table's documents when I take the primary key of one table as the unique key in the Solr schema, and vice versa. How do I solve this?

On Thu, May 16, 2013 at 4:24 PM, Rohan Thakur rohan.i...@gmail.com wrote:

Hi, I found the problem: it is with the unique key defined in schema.xml. If I define it to be query_id, then while indexing it says the mandatory key query_id is missing from the root entity (in data-config.xml) that indexes the products from the database, which has product_id as its unique key. And when in the schema I set product_id as the unique key, it says the mandatory key product_id is missing from the root entity that indexes the user queries from the other table in the database, which has user_id as its unique key. How can I fix this? I want to index both tables, which are basically unrelated, that is, they do not have any *common* fields. Thanks, Rohan

On Thu, May 16, 2013 at 3:24 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote:

True, it's complaining that your Solr schema has a required field 'title' and your query and data import config aren't providing it.

On May 16, 2013 5:51 AM, Rohan Thakur rohan.i...@gmail.com wrote:

It's saying in the logs "missing required field: title", which is nowhere in the database...

On Thu, May 16, 2013 at 3:08 PM, Rohan Thakur rohan.i...@gmail.com wrote:

I am not able to index the fields from the database; indexing is failing. data-config.xml:

    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/test" user="user" password="dfsdf"/>
    <document>
      <entity name="catalogsearch_query" query="select query_id,query_text from catalogsearch_query where num_results != 0">
        <field column="query_id" name="query_id"/>
        <field column="query_text" name="user_query"/>
      </entity>
    </document>

It's showing all documents failed and 0 indexed.

On Wed, May 15, 2013 at 8:31 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

1. Create a schema that accommodates both types of fields, either using optional fields or dynamic fields.
2. Create some sort of differentiator key (e.g. schema), separate from id (which needs to be globally unique, so possibly schema+id).
3. Use that schema field in filter queries (fq) to look only at a subset of items.
4. (Optionally) define separate search request handlers that force that schema parameter (using appends or invariants instead of defaults).

That should get you most of the way there. Regards, Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, May 15, 2013 at 7:07 AM, Rohan Thakur rohan.i...@gmail.com wrote:

Hi all, I want to index 2 separate, unrelated tables from a database into a single Solr core and search each kind of document separately. How can I do it? Please help. Thanks in advance. Regards, Rohan
Re: indexing unrelated tables in single core
On 16 May 2013 16:24, Rohan Thakur rohan.i...@gmail.com wrote:

Hi, I found the problem: it is with the unique key defined in schema.xml. If I define it to be query_id, then while indexing it says the mandatory key query_id is missing from the root entity (in data-config.xml) that indexes the products from the database, which has product_id as its unique key. And when in the schema I set product_id as the unique key, it says the mandatory key product_id is missing from the root entity that indexes the user queries from the other table in the database, which has user_id as its unique key. How can I fix this? I want to index both tables, which are basically unrelated, that is, they do not have any *common* fields.
[...]

Fix it in the SELECT statement:

    SELECT product_id AS id, ...

for one entity, and

    SELECT query_id AS id, ...

in the other, and use id as the uniqueKey for Solr.

Regards, Gora
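Put together, a minimal data-config.xml along those lines might look like the sketch below. This is an illustration, not from the thread: the entity names are made up and the column lists would need adjusting to the actual tables described above.

    <document>
      <entity name="products"
              query="SELECT product_id AS id, name AS title FROM products">
        <field column="id" name="id"/>
        <field column="title" name="title"/>
      </entity>
      <entity name="queries"
              query="SELECT query_id AS id, query_text AS user_query FROM catalogsearch_query WHERE num_results != 0">
        <field column="id" name="id"/>
        <field column="user_query" name="user_query"/>
      </entity>
    </document>

with <uniqueKey>id</uniqueKey> in schema.xml. Both entities are root entities under the same <document>, so each row of each table becomes its own Solr document.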
Re: indexing unrelated tables in single core
Hi Mohanty, I appreciate it but didn't get that; can you please elaborate? My data-config is like:

    <entity name="catalogsearch_query"
            query="select query_id,query_text from catalogsearch_query where num_results != 0">
      <field column="query_id" name="value_id"/>
      <field column="query_text" name="user_query"/>
    </entity>
    <entity name="catalog_product_entity_varchar"
            query="select value_id,value,entity_id,attribute_id from catalog_product_entity_varchar where attribute_id=60">
      <field column="value_id" name="value_id"/>
      <field column="value" name="title"/>
      <field column="entity_id" name="product_id"/>
      <field column="attribute_id" name="attribute"/>
    </entity>

My schema is like:

    <fields>
      <field name="keyfeatures" type="text_en_splitting" indexed="true" stored="true" required="false"/>
      <field name="value_id" type="plong" indexed="true" stored="false"/>
      <field name="product_id" type="plong" indexed="true" stored="true"/>
      <field name="features" type="text_en_splitting_tight" indexed="true" stored="false" required="false" multiValued="true"/>
      <!-- <field name="f_product_id" type="plong" indexed="true" stored="true"/>
           <field name="f_value_id" type="plong" indexed="true" stored="true"/> -->
      <field name="attribute" type="plong" indexed="false" stored="false"/>
      <field name="title" type="text_en_splitting" indexed="true" stored="true" required="true"/>
      <field name="image" type="text_en_splitting_tight" indexed="false" stored="false"/>
      <field name="url" type="text_en_splitting_tight" indexed="false" stored="false"/>
      <field name="brand" type="text_en" indexed="true" stored="true"/>
      <field name="procat" type="text_en" indexed="true" stored="true"/>
      <field name="rootcat" type="text_en" indexed="true" stored="true"/>
      <field name="color" type="text_en" indexed="true" stored="true"/>
      <field name="sku" type="text_en_splitting_tight" indexed="true" stored="true"/>
      <field name="spell" type="tSpell" indexed="true" stored="true"/>
      <field name="query_id" type="plong" indexed="true" stored="true"/>
      <field name="user_query" type="text_en_splitting_tight" indexed="true" stored="true" required="false"/>
    </fields>
    <uniqueKey>value_id</uniqueKey>
    <!-- <field name="solr_value" type="text" indexed="true" stored="true"/> -->
    <!-- field for the QueryParser to use when an explicit fieldname is absent.
         DEPRECATED: specify df in your request handler instead. -->
    <defaultSearchField>title</defaultSearchField>

Thanks, regards, Rohan

On Thu, May 16, 2013 at 5:11 PM, Gora Mohanty g...@mimirtech.com wrote:

On 16 May 2013 16:24, Rohan Thakur rohan.i...@gmail.com wrote:

Hi, I found the problem: it is with the unique key defined in schema.xml. If I define it to be query_id, then while indexing it says the mandatory key query_id is missing from the root entity (in data-config.xml) that indexes the products from the database, which has product_id as its unique key. And when in the schema I set product_id as the unique key, it says the mandatory key product_id is missing from the root entity that indexes the user queries from the other table in the database, which has user_id as its unique key. How can I fix this? I want to index both tables, which are basically unrelated, that is, they do not have any *common* fields.
[...]

Fix it in the SELECT statement: SELECT product_id AS id, ... for one entity, and SELECT query_id AS id, ... in the other, and use id as the uniqueKey for Solr. Regards, Gora
Re: Can we search some mandatory words and some optional words in SOLR
Thanks Hoss, I modified accordingly. One more thing I observed: if I give the search key as one of the below,

1. +Java +mysql +php +(TCL Perl Selenium) -ethernet -switching -routing
2. +(TCL Perl Selenium) -ethernet -switching -routing
3. +(TCL Perl Selenium)

it works as expected. For example, if the key is +(TCL Perl Selenium), then it searches documents having at least one keyword out of TCL, Perl, Selenium. Best Regards, Kamal

On Wed, May 15, 2013 at 10:58 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: +Java +mysql +php TCL Perl Selenium -ethernet -switching -routing

that's missing one of the stated requirements...

: 2. At least one keyword out of *TCL Perl Selenium* should be present

...should be...

+Java +mysql +php +(TCL Perl Selenium) -ethernet -switching -routing

-Hoss
Concurrent connections
Is there a limitation on the number of concurrent connections to a Solr host? We have some scripts running simultaneously to fill Solr, and when we start up too many we get this error:

    exception 'SolrClientException' with message 'Unsuccessful update request. Response Code 0. (null)' in solr_queue_processor.php:467
    Stack trace:
    #0 solr_queue_processor.php(467): SolrClient->addDocument(Object(SolrInputDocument))
    #1 {main}

Thx
Is payload the right solution for my problem?
Hi, I recently read about payloads in the Apache Solr 4 Cookbook and would like to know if this is the right solution for my problem, or if other methods are more suitable. Generally, I need to perform full-text search in a field (including highlighting) where I need metadata per token in the search result, but I do not need to search in that metadata. I have documents containing data (not natural language), where each data entry carries multiple metadata values. An example, with a sentence in an XML-like structure, could be:

    <meta attr1="val11" attr2="val2" attr3="val3">This</meta>
    <meta attr1="val13" attr2="val7" attr3="val3">is</meta>
    <meta attr1="val16" attr2="val22" attr3="val3">one</meta>
    <meta attr1="val14" attr2="val2" attr3="val3">sentence.</meta>

Additionally, there are some fields per document that I need for faceting etc. (id, category, timestamp etc.). When searching, I want to search only in "This is one sentence."; a search for attr1 or val3 should give no results. However, when searching for "one", in the search response I need to know attr1=val16, attr2=val22 and attr3=val3. My first intuition when creating the schema was to create a multiValued field "content" containing each word in the document, and then add attr1, attr2 and attr3 as payload to each word/token. Is this the right way to use payloads? Or is there a better solution for such a task? I imagine this to be a common use case: searching in a cleaned version of the data and returning the original one. Could anyone please provide suggestions on how to tackle such a task? The book and the Solr wiki pages did not lead me to anything that I could immediately identify as a solution to my problem. If the proposed solution depends on the data: each document might have 3-8 additional attributes, and there might be between 100-1 tokens per document. Regards
Re: Hierarchical Faceting
Hi, thanks Upayavira. But I am still not getting results well. I have been following http://wiki.apache.org/solr/HierarchicalFaceting. I have hierarchical data for facets. Some documents also have multiple hierarchies, like:

    Doc#1: London UK 51.5
    Doc#2: UK 54.0
    Doc#3: Indiana United States 40.0, London UK 51.5
    Doc#4: United States 39.7, Washington United States 38.8

What would be the optimal schema for indexing this data so that I get the following results from Solr queries:

1) I want to retrieve hierarchical data counts by a facet pivot query, e.g. facet.pivot=country,state
2) I want the lat values with respect to every document in the query output, e.g. Doc#3: 40.0, 51.5; Doc#2: 54.0
3) I can run direct search queries like country:"United States", state:Washington

I think through this I am able to express my requirement along with the data. Please tell me how I can index the data and retrieve it through queries. I checked out the solution which you provided about PathHierarchyTokenizerFactory, but along with the hierarchy I have to store data with names like state, district, lat, lon etc., so that I can also run direct queries on those fields. Thanks, Varsha

On 05/15/2013 10:32 PM, Upayavira wrote:

Can't you use the PathHierarchyTokenizerFactory mentioned on that page? I think it is called descendent-path in the default schema. Won't that get you what you want? UK/London/Covent Garden becomes:

    UK
    UK/London
    UK/London/Covent Garden

and India/Maharastra/Pune/Dapodi becomes:

    India
    India/Maharastra
    India/Maharastra/Pune
    India/Maharastra/Pune/Dapodi

These fields can be multivalued. Upayavira

On Wed, May 15, 2013, at 12:29 PM, varsha.yadav wrote:

Hi, I went through that, but I want to index multiple locations in a single document, and a single location has multiple features/attributes like country, state, district etc. I want to index this and get hierarchical facet results from a facet pivot query. One more thing: my documents vary and may have one, two, three... any number of locations.

On 05/15/2013 03:55 PM, Upayavira wrote: http://wiki.apache.org/solr/HierarchicalFaceting

On Wed, May 15, 2013, at 09:44 AM, varsha.yadav wrote:

Hi Everyone, I am working on hierarchical faceting. I am indexing document locations with their state and district. I would like to find counts for every country, with state counts and district counts.
I found facet pivot working well to give me counts if I use single-valued fields like:

    <doc>
      <str name="country">india</str>
      <str name="state">maharashtra</str>
    </doc>
    <doc>
      <str name="country">india</str>
      <str name="state">gujrat</str>
    </doc>
    <doc>
      <str name="country">india</str>
      <str name="district">Faridabad</str>
      <str name="state">Haryana</str>
    </doc>
    <doc>
      <str name="country">china</str>
      <str name="district">foshan</str>
      <str name="state">guangdong</str>
    </doc>

I found results, and that is fine:

    <arr name="country,state,district,event">
      <lst>
        <str name="field">country</str>
        <str name="value">india</str>
        <int name="count">1</int>
        <arr name="pivot">
          <lst>
            <str name="field">state</str>
            <str name="value">maharashtra</str>
            <int name="count">1</int>
            <arr name="pivot"/>
          </lst>
          <lst>
            <str name="field">state</str>
            <str name="value">Haryana</str>
            <int name="count">1</int>
            <arr name="pivot">
              <lst>
                <str name="field">district</str>
                <str name="value">Faridabad</str>
                <int name="count">1</int>
              </lst>
            </arr>
          </lst>
        </arr>
      </lst>
      <lst>
        <str name="field">country</str>
        <str name="value">china</str>
        <int name="count">1</int>
        <arr name="pivot">
          <lst>
            <str name="field">state</str>
          </lst>
        </arr>
      </lst>
    </arr>

But if my documents have multiple locations like:

    <doc>
      <arr name="location">
        <str>japan|JAPAN|null|</str>
        <str>brisbane|Australia|Queensland</str>
        <str>afghanistan|AFGHANISTAN|null</str>
      </arr>
    </doc>
    <doc>
      <arr name="location">
        <str>afghanistan|AFGHANISTAN|null</str>
      </arr>
    </doc>
    <doc>
      <arr name="location">
        <str>brisbane|Australia|Queensland</str>
      </arr>
    </doc>

can anyone tell me how I should put data in the Solr index to get hierarchical data? Thanks, Varsha

-- Thanks & Regards, Varsha
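For reference, the descendent-path type Upayavira mentions is defined in the stock 4.x example schema roughly as below (a sketch from memory; verify against your own schema.xml). It splits a path into all of its prefixes at index time, while a query term is kept whole:

    <fieldType name="descendent_path" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>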
Re: Solr 4.3.0: Shard instances using incorrect data directory on machine boot
What actual error do you see in Solr? Is there an exception, and if so, can you post it? As I understand it, dataDir is set from the solrconfig.xml file, so either your instances are picking up the wrong file, or you have some override which is incorrect. Where do you set solr.data.dir: in the environment when you start Solr, or in solrconfig?

On 16 May 2013 12:23, Hoggarth, Gil gil.hogga...@bl.uk wrote:

Hi all, I hope you can advise a solution to our incorrect data directory issue. We have 2 physical servers using Solr 4.3.0, each with 24 separate Tomcat instances (RedHat 6.4, Java 1.7.0_10-b18, Tomcat 7.0.34) with a Solr shard in each. This configuration means that each shard has its own data directory declared. (Server OS, Tomcat and Solr, including shards, are created via automated builds.) That is, for example:

- tomcat instance: /var/local/tomcat/solrshard3/, port 8985
- corresponding solr instance: /usr/local/solrshard3/, with /usr/local/solrshard3/collection1/conf/solrconfig.xml
- corresponding solr data directory: /var/local/solrshard3/collection1/data/

We process ~1.5 billion documents, which is why we use 48 shards (24 leaders, 24 replicas). These physical servers are rebooted regularly to fsck their drives. When rebooted, we always see several (~10-20) shards failing to start (the UI cloud view shows them as 'Down' or 'Recovering', though they never recover without intervention), and there is no pattern to which shards fail to start - we haven't recorded any that always or never fail. On inspection, the UI dashboard for these failed shards displays, for example:

- Host: Server1
- Instance: /usr/local/solrshard3/collection1
- Data: /var/local/solrshard6/collection1/data
- Index: /var/local/solrshard6/collection1/data/index

To fix such failed shards, I manually restart the shard leader and replicas, which fixes the issue. However, of course, I would like to know a permanent cure for this, not a remedy. We use a separate ZooKeeper service, spread across 3 virtual machines within our private network of ~200 servers (physical and virtual). Network traffic is constant but relatively light across 1GB bandwidth. Any advice or suggestions greatly appreciated. Gil

Gil Hoggarth
Web Archiving Engineer
The British Library, Boston Spa, West Yorkshire, LS23 7BQ
Re: Concurrent connections
Hi, this is controlled by the servlet container, so any errors should be in its logs. The same sort of question was asked just a few days ago...

Otis
Solr & ElasticSearch Support
http://sematext.com/

On May 16, 2013 8:00 AM, Arkadi Colson ark...@smartbit.be wrote:

Is there a limitation on the number of concurrent connections to a Solr host? We have some scripts running simultaneously to fill Solr, and when we start up too many we get this error:

    exception 'SolrClientException' with message 'Unsuccessful update request. Response Code 0. (null)' in solr_queue_processor.php:467
    Stack trace:
    #0 solr_queue_processor.php(467): SolrClient->addDocument(Object(SolrInputDocument))
    #1 {main}

Thx
Adding a field in schema , storing it and use it to search
Hi All, I need help in adding a new field and making use of it during search. As of today I just search some keywords, and whatever documents (actually these are resumes of individuals) are retrieved from the Solr search I take as input; I then search in MySQL for experience, salary etc., and show the selected resumes as the search result. Say, while searching in Solr, I want to achieve something like below:

1. Search keywords in the resumes of those users whose experience is greater than 5 years.

My understanding is:

1. I need to define a new field in the schema.
2. During indexing, add this parameter.
3. During search, have a condition like experience >= 5 years.

When I add the field, should I add it as a normal field, as shown below:

    <field name="experience" type="integer" indexed="true" stored="true"/>

or as a dynamic field, as shown below:

    <dynamicField name="exp_*" type="double" indexed="true" stored="true" multiValued="false"/>

And during search, how should the condition look? Best regards, Kamal
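For what it's worth, a sketch of both pieces, assuming a single-valued numeric field named experience ("int" is the type name in the stock 4.x example schema; the query terms are illustrative):

    <field name="experience" type="int" indexed="true" stored="true"/>

and at query time a range filter (unencoded: fq=experience:[5 TO *]):

    http://localhost:8983/solr/select?q=java+mysql&fq=experience:[5+TO+*]

experience:[5 TO *] matches values of 5 and above; for strictly greater than 5, an exclusive lower bound, experience:{5 TO *], also works.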
Re: Concurrent connections
Thx! I found the topic. Any idea what this is?

    SEVERE: The web application [/solr] created a ThreadLocal with key of type [org.apache.xmlbeans.impl.store.CharUtil$1] (value [org.apache.xmlbeans.impl.store.CharUtil$1@2af27db1]) and a value of type [java.lang.ref.SoftReference] (value [java.lang.ref.SoftReference@759c8d]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

Kind regards
Arkadi Colson
Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen
T +32 11 64 08 80 • F +32 11 64 08 81

On 05/16/2013 02:37 PM, Otis Gospodnetic wrote:

Hi, this is controlled by the servlet container, so any errors should be in its logs. The same sort of question was asked just a few days ago...

Otis
Solr & ElasticSearch Support
http://sematext.com/

On May 16, 2013 8:00 AM, Arkadi Colson ark...@smartbit.be wrote:

Is there a limitation on the number of concurrent connections to a Solr host? We have some scripts running simultaneously to fill Solr, and when we start up too many we get this error:

    exception 'SolrClientException' with message 'Unsuccessful update request. Response Code 0. (null)' in solr_queue_processor.php:467
    Stack trace:
    #0 solr_queue_processor.php(467): SolrClient->addDocument(Object(SolrInputDocument))
    #1 {main}

Thx
Re: How to find the routing algorithm used?
I tried looking for it there, but I don't see the word "router" in my clusterstate. I'm trying to figure out the router info since I have duplicate documents in my cluster (documents with the same id). In the worst case I was expecting to see something like router:implicit, but I don't see anything. Any ideas? Thanks! -scr

On May 16, 2013, at 12:31 AM, Furkan KAMACI furkankam...@gmail.com wrote:

At the admin GUI click the Cloud link, then the Tree link. A page will open; choose clusterstate.json from the list. Scroll down to the end and you will see something like: router:compositeId

2013/5/16 santoash santo...@me.com

I'm trying to find out which routing algorithm (implicit/compositeId) is being used in my cluster. We are running Solr 4.1. I was expecting to see it in my clusterstate (based on a previous thread that someone else posted) but I don't see it there. Could someone please help? Thanks! Santoash
Re: Question about Edismax - Solr 4.0
You haven't indicated any problem here! What is the symptom that you actually think is a problem? There is no comma operator in any of the Solr query parsers. A comma is just another character that may or may not be included or discarded depending on the specific field type and analyzer. For example, a whitespace analyzer will keep commas, but the standard analyzer or the word delimiter filter will discard them. If title were a string type, all punctuation would be preserved, including commas and spaces (but spaces would need to be escaped or the term text enclosed in parentheses). Let us know what your symptom is though, first. I mean, the filter query looks perfectly reasonable from an abstract perspective.

-- Jack Krupansky

-Original Message- From: Sandeep Mestry Sent: Thursday, May 16, 2013 6:51 AM To: solr-user@lucene.apache.org Subject: Question about Edismax - Solr 4.0

-- *Edismax and Filter Queries with Commas and Spaces* --

Dear Experts, this appears to be a bug; please suggest if I'm wrong. If I search with the following filter query:

1) fq=title:(, 10)
- I get no results.
- The debug output does NOT show the section containing parsed_filter_queries.

If I carry out a search with the filter query:

2) fq=title:(,10) (no space between , and 10)
- I get results, and the debug output shows the parsed filter queries section as:

    <arr name="filter_queries">
      <str>(titles:(,10))</str>
      <str>(collection:assets)</str>
    </arr>

As you can see above, I'm also passing in other filter queries (collection:assets) which appear correctly, but they do not appear in case 1 above. I can't make this part of the query parameter as that needs to be searched against multiple fields. Can someone suggest a fix in this case please? I'm using Solr 4.0. Many Thanks, Sandeep
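As an aside, two hedged illustrations of keeping such a value as a single term (sketches, not from the thread; both assume title is a string-typed field): a quoted phrase, or the term query parser, which treats the rest of the parameter as one raw value:

    fq=title:", 10"
    fq={!term f=title}, 10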
Re: Can we search some mandatory words and some optional words in SOLR
Hi Hoss, I was wondering about these two keys. Though they look similar, the result sets differ.

In the 1st case I give the key as: +c +c++ +sip +( *tcl* perl shell script) -manual testing -ss7
In the 2nd case I give the key as: +c +c++ +sip +(*tcl* perl shell script) -manual testing -ss7

Please note that before *tcl*, the space is not present in the 2nd case. In the 1st case I get more results, and in the 2nd case I get only 3 results. In the first case, I see at least one result that does not have a single optional key (that is, one document that contains neither tcl nor perl nor shell script). Is this a known issue? Please help. Or, if I am doing something wrong in key preparation, please let me know. Thanks, Kamal

On Wed, May 15, 2013 at 10:58 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: +Java +mysql +php TCL Perl Selenium -ethernet -switching -routing

that's missing one of the stated requirements...

: 2. At least one keyword out of *TCL Perl Selenium* should be present

...should be...

+Java +mysql +php +(TCL Perl Selenium) -ethernet -switching -routing

-Hoss
RE: Strange fuzzy behavior in 4.2.1
In answer to your first questions: any changes we've been making have been followed by a reindex. The data being indexed generally looks something like this (<space> indicating an actual space character):

    TIM<space>,<space>JULIO
    JULIE<space>,<space>JIM

So based on what we see from looking at top terms in the field and the analysis tool, at index time these records are being broken up such that TIM , JULIO can be found with tim or julio. Just to make sure I'm not misunderstanding something about Solr/Lucene: when a record is indexed, the index analysis chain result (tim , julio) is what is written to disk, correct? As far as I understand it, it's the query analysis chain that has the issue with most filters not being applied during wildcard and fuzzy queries.

Finally, some clarification, as I've realized my original email might not have made this point well. I can have a particular record with a primary key of X and a name value of LEWIS , JULIA, and be able to find that exact record with bulia~1 but not aulia~1; or GUERRERO , JULIAN , JULIAN can be found with julan~1 but not julia~1. It's not that records go missing when searched for with fuzzy, but rather that the fuzzy terms which will find them seem, to my eyes, inconsistent.

Regards, Ryan Wilson rpwils...@gmail.com
Re: Transaction Logs Leaking FileDescriptors
See https://issues.apache.org/jira/browse/SOLR-3939

Do you see these log messages from this in your logs?

    log.info("I may be the new leader - try and sync");

How reproducible is this bug for you? It would be great to know if the patch in the issue fixes things.

-Yonik
http://lucidworks.com

On Wed, May 15, 2013 at 6:04 PM, Steven Bower sbo...@alcyon.net wrote:

They are visible to ls...

On Wed, May 15, 2013 at 5:49 PM, Yonik Seeley yo...@lucidworks.com wrote:

On Wed, May 15, 2013 at 5:20 PM, Steven Bower sbo...@alcyon.net wrote: when the TransactionLog objects are dereferenced their RandomAccessFile object is not closed..

Have the files been deleted (unlinked from the directory), or are they still visible via ls?

-Yonik
http://lucidworks.com
Re: Strange fuzzy behavior in 4.2.1
Maybe you are running into the same problem I posted about on another message thread: the hard-coded maxExpansions limit of 50. In other words, once Lucene finds 50 terms that do match, it won't find the additional matches. And that is not necessarily the top 50, but the first 50 in the index. See if you can reproduce the problem with a small data set of no more than a couple dozen documents.

-- Jack Krupansky

-Original Message- From: Ryan Wilson Sent: Thursday, May 16, 2013 9:28 AM To: solr-user@lucene.apache.org Subject: RE: Strange fuzzy behavior in 4.2.1

In answer to your first questions: any changes we've been making have been followed by a reindex. The data being indexed generally looks something like this (<space> indicating an actual space character):

    TIM<space>,<space>JULIO
    JULIE<space>,<space>JIM

So based on what we see from looking at top terms in the field and the analysis tool, at index time these records are being broken up such that TIM , JULIO can be found with tim or julio. Just to make sure I'm not misunderstanding something about Solr/Lucene: when a record is indexed, the index analysis chain result (tim , julio) is what is written to disk, correct? As far as I understand it, it's the query analysis chain that has the issue with most filters not being applied during wildcard and fuzzy queries.

Finally, some clarification, as I've realized my original email might not have made this point well. I can have a particular record with a primary key of X and a name value of LEWIS , JULIA, and be able to find that exact record with bulia~1 but not aulia~1; or GUERRERO , JULIAN , JULIAN can be found with julan~1 but not julia~1. It's not that records go missing when searched for with fuzzy, but rather that the fuzzy terms which will find them seem, to my eyes, inconsistent.

Regards, Ryan Wilson rpwils...@gmail.com
Re: indexing unrelated tables in single core
Hi Mohanty, I tried what you suggested: using id as the common field, changing the SQL queries to alias to id, and using id as the uniqueKey. It is working, but now it only keeps the ids that are not the same in both tables and discards the ids that are the same in both. That is not correct, since product_id and query_id have no relation as such; each represents a separate thing in its own table. Regards, Rohan

On Thu, May 16, 2013 at 5:11 PM, Gora Mohanty g...@mimirtech.com wrote:

On 16 May 2013 16:24, Rohan Thakur rohan.i...@gmail.com wrote:

Hi, I found the problem: it is with the unique key defined in schema.xml. If I define it to be query_id, then while indexing it says the mandatory key query_id is missing from the root entity (in data-config.xml) that indexes the products from the database, which has product_id as its unique key. And when in the schema I set product_id as the unique key, it says the mandatory key product_id is missing from the root entity that indexes the user queries from the other table in the database, which has user_id as its unique key. How can I fix this? I want to index both tables, which are basically unrelated, that is, they do not have any *common* fields.
[...]

Fix it in the SELECT statement: SELECT product_id AS id, ... for one entity, and SELECT query_id AS id, ... in the other, and use id as the uniqueKey for Solr. Regards, Gora
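For what it's worth, a common way to avoid such id collisions (a sketch, not an answer given in this thread) is to prefix the id with a table marker in the SELECT, so values from the two tables can never clash. The prefixes and column lists below are illustrative and would need adjusting to the actual tables:

    SELECT CONCAT('prod-', product_id) AS id, name AS title
      FROM products;

    SELECT CONCAT('query-', query_id) AS id, query_text AS user_query
      FROM catalogsearch_query WHERE num_results != 0;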
Multi-select faceting with OR operand
Hi all! Please tell me if it is possible to create multi-select facets (http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters) but with OR as the operand. I would like to accomplish something like this:

    === Document Type ===
    [ ] Word  (42)
    [x] PDF   (96)
    [x] Excel (11)
    [ ] HTML  (63)

According to the example, the query would look like this:

    q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&fq={!tag=dt}doctype:Excel&facet=on&facet.field={!ex=dt}doctype

But I would like to get documents which have doctype:pdf OR doctype:Excel as results. How do I specify in that query that I want OR instead of AND as the operand? Is this possible? Regards, Alex
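For reference, one commonly suggested form (a sketch, not an answer from this thread): since multiple fq parameters are always intersected, the OR must go inside a single tagged filter:

    q=mainquery&fq=status:public&fq={!tag=dt}doctype:(pdf OR Excel)&facet=on&facet.field={!ex=dt}doctype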
RE: Solr 4.3.0: Shard instances using incorrect data directory on machine boot
Thanks for your reply, Daniel. The dataDir is set in each solrconfig.xml; each one has been checked to ensure it points to its corresponding location. The error we see is that on machine reboot not all of the shards start successfully, and if the failure was on a leader, the replicas can't take its place (presumably because the leader's incorrect data directory is inconsistent with their own).

More detail I can add is that catalina.out for the failed shards reports:

    May 15, 2013 5:56:02 PM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
    SEVERE: The web application [/solr] created a ThreadLocal with key of type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value [org.apache.solr.schema.DateField$ThreadLocalDateFormat@524e13f6]) and a value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

This doesn't (to me) relate to the problem, but that doesn't necessarily mean it isn't related. Plus, it's the only SEVERE reported, and it's only reported in the failed shards' catalina.out logs.

Checking the ZooKeeper logs, we're seeing:

    2013-05-16 13:25:46,839 [myid:1] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@762] - Connection broken for id 3, my id = 1, error =
    java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:747)
    2013-05-16 13:25:46,841 [myid:1] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker
    2013-05-16 13:25:46,842 [myid:1] - WARN [SendWorker:3:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue
    java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)
    2013-05-16 13:25:46,843 [myid:1] - WARN [SendWorker:3:QuorumCnxManager$SendWorker@688] - Send worker leaving thread

This is, I think, a separate issue, in that it happens immediately after I restart a ZooKeeper node. (I.e., I see this in a log, restart that ZooKeeper, and immediately see a similar issue in one of the other two ZooKeeper logs.)

-Original Message- From: Daniel Collins [mailto:danwcoll...@gmail.com] Sent: 16 May 2013 13:28 To: solr-user@lucene.apache.org Subject: Re: Solr 4.3.0: Shard instances using incorrect data directory on machine boot

What actual error do you see in Solr? Is there an exception, and if so, can you post it? As I understand it, dataDir is set from the solrconfig.xml file, so either your instances are picking up the wrong file, or you have some override which is incorrect. Where do you set solr.data.dir: in the environment when you start Solr, or in solrconfig?
On 16 May 2013 12:23, Hoggarth, Gil gil.hogga...@bl.uk wrote:

Hi all, I hope you can advise a solution to our incorrect data directory issue. We have 2 physical servers using Solr 4.3.0, each with 24 separate Tomcat instances (RedHat 6.4, Java 1.7.0_10-b18, Tomcat 7.0.34) with a Solr shard in each. This configuration means that each shard has its own data directory declared. (Server OS, Tomcat and Solr, including shards, are created via automated builds.) That is, for example:

- tomcat instance: /var/local/tomcat/solrshard3/, port 8985
- corresponding solr instance: /usr/local/solrshard3/, with /usr/local/solrshard3/collection1/conf/solrconfig.xml
- corresponding solr data directory: /var/local/solrshard3/collection1/data/

We process ~1.5 billion documents, which is why we use 48 shards (24 leaders, 24 replicas). These physical servers are rebooted regularly to fsck their drives. When rebooted, we always see several (~10-20) shards failing to start (the UI cloud view shows them as 'Down' or 'Recovering', though they never recover without intervention), and there is no pattern to which shards fail to start - we haven't recorded any that always or never fail. On inspection, the UI dashboard for these failed shards displays, for example:

- Host: Server1
- Instance: /usr/local/solrshard3/collection1
- Data: /var/local/solrshard6/collection1/data
- Index
Re: Transaction Logs Leaking FileDescriptors
Looking at the timestamps on the tlog files, they seem to have all been created around the same time (04:55). Starting around that time I see the exception below (there were 1628 of them). In fact it is getting tons of these (200k+), but most of the time inside regular commits...

    2013-15-05 04:55:06.634 ERROR UpdateLog [recoveryExecutor-6-thread-7922] - java.lang.ArrayIndexOutOfBoundsException: 2603
        at org.apache.lucene.codecs.lucene40.BitVector.get(BitVector.java:146)
        at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc(Lucene41PostingsReader.java:492)
        at org.apache.lucene.index.BufferedDeletesStream.applyTermDeletes(BufferedDeletesStream.java:407)
        at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:273)
        at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2973)
        at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2964)
        at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2704)
        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2839)
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2819)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536)
        at org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1339)
        at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1163)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

On Thu, May 16, 2013 at 9:35 AM, Yonik Seeley yo...@lucidworks.com wrote:

See https://issues.apache.org/jira/browse/SOLR-3939

Do you see these log messages from this in your logs?

    log.info("I may be the new leader - try and sync");

How reproducible is this bug for you? It would be great to know if the patch in the issue fixes things.

-Yonik
http://lucidworks.com

On Wed, May 15, 2013 at 6:04 PM, Steven Bower sbo...@alcyon.net wrote:

They are visible to ls...

On Wed, May 15, 2013 at 5:49 PM, Yonik Seeley yo...@lucidworks.com wrote:

On Wed, May 15, 2013 at 5:20 PM, Steven Bower sbo...@alcyon.net wrote: when the TransactionLog objects are dereferenced their RandomAccessFile object is not closed..

Have the files been deleted (unlinked from the directory), or are they still visible via ls?

-Yonik
http://lucidworks.com
Re: Oracle Timestamp in SOLR
Hallo,

: I have a field with the type TIMESTAMP(6) in an oracle view.
...
: What is the best way to import it?
...
: This way works but I do not know if this is the best practise:
...
: TO_CHAR(LAST_ACTION_TIMESTAMP, 'YYYY-MM-DD HH24:MI:SS') as
: LAT

instead of having your DB convert to a string, and then forcing DIH to parse that string, try asking your DB to cast to something that JDBC will respect as a Date object when DIH fetches the results. I don't know much about Oracle, but perhaps something like...

    SELECT ... CAST(LAST_ACTION_TIMESTAMP AS DATE) AS LAT

This removes the time part of the timestamp in Solr, although it is shown in PL/SQL Developer (a tool for Oracle). The only way I found on the net is to write my own converter :-( Thanks in advance for any other hints. Ciao, Peter Schütt
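An alternative sketch (not from the thread, and hedged accordingly): DIH ships a DateFormatTransformer that can parse the TO_CHAR string instead, keeping the time part. The entity attributes besides the transformer and dateTimeFormat are placeholders:

    <entity name="myview" transformer="DateFormatTransformer" query="...">
      <field column="LAT" dateTimeFormat="yyyy-MM-dd HH:mm:ss"/>
    </entity>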
Explicite update or delete of a dataset
Hallo, how can I update or delete a single dataset by a given ID? Thanks for any hint. Ciao Peter Schütt
Re: Transaction Logs Leaking FileDescriptors
Created https://issues.apache.org/jira/browse/SOLR-4831 to capture this issue.

On Thu, May 16, 2013 at 10:10 AM, Steven Bower sbo...@alcyon.net wrote:

Looking at the timestamps on the tlog files, they seem to have all been created around the same time (04:55). Starting around that time I see the exception below (there were 1628 of them). In fact it is getting tons of these (200k+), but most of the time inside regular commits...

    2013-15-05 04:55:06.634 ERROR UpdateLog [recoveryExecutor-6-thread-7922] - java.lang.ArrayIndexOutOfBoundsException: 2603
        at org.apache.lucene.codecs.lucene40.BitVector.get(BitVector.java:146)
        at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.nextDoc(Lucene41PostingsReader.java:492)
        at org.apache.lucene.index.BufferedDeletesStream.applyTermDeletes(BufferedDeletesStream.java:407)
        at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:273)
        at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2973)
        at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2964)
        at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2704)
        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2839)
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2819)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536)
        at org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1339)
        at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1163)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

On Thu, May 16, 2013 at 9:35 AM, Yonik Seeley yo...@lucidworks.com wrote:

See https://issues.apache.org/jira/browse/SOLR-3939

Do you see these log messages from this in your logs?

    log.info("I may be the new leader - try and sync");

How reproducible is this bug for you? It would be great to know if the patch in the issue fixes things.

-Yonik
http://lucidworks.com

On Wed, May 15, 2013 at 6:04 PM, Steven Bower sbo...@alcyon.net wrote:

They are visible to ls...

On Wed, May 15, 2013 at 5:49 PM, Yonik Seeley yo...@lucidworks.com wrote:

On Wed, May 15, 2013 at 5:20 PM, Steven Bower sbo...@alcyon.net wrote: when the TransactionLog objects are dereferenced their RandomAccessFile object is not closed..

Have the files been deleted (unlinked from the directory), or are they still visible via ls?

-Yonik
http://lucidworks.com
Re: Lucene-Solr indexing document via Post method
Have you completed the Solr tutorial yet? If so, please ask a more specific question so we can understand what your problem is.

http://lucene.apache.org/solr/tutorial.html

-- Jack Krupansky

-Original Message- From: Rider Carrion Cleger Sent: Thursday, May 16, 2013 6:43 AM To: solr-user@lucene.apache.org Subject: Lucene-Solr indexing document via Post method

Hi guys, I'm trying to run Solr with Apache Tomcat. Is it possible to index documents via the POST method using Lucene/Solr? What is the correct way to index documents in Solr? thanks
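For illustration, two common ways the tutorial indexes documents over HTTP against the stock example (the document fields in the curl command are hypothetical):

    cd example/exampledocs
    java -jar post.jar *.xml

or a raw POST with curl:

    curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-type: application/xml' \
      -d '<add><doc><field name="id">doc-1</field><field name="name">test doc</field></doc></add>'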
Re: Explicite update or delete of a dataset
Update is the same as add in Solr. To delete:

    curl http://localhost:8983/solr/update?commit=true \
      -H 'Content-type:application/json' \
      -d '{"delete": {"id":"doc-0001"}}'

-- Jack Krupansky

-Original Message- From: Peter Schütt Sent: Thursday, May 16, 2013 10:27 AM To: solr-user@lucene.apache.org Subject: Explicite update or delete of a dataset

Hallo, how can I update or delete a single dataset by a given ID? Thanks for any hint. Ciao Peter Schütt
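For completeness, a sketch of the corresponding add/update in the same JSON style (field values here are hypothetical): sending a document whose id already exists overwrites, i.e. updates, it:

    curl http://localhost:8983/solr/update?commit=true \
      -H 'Content-type:application/json' \
      -d '[{"id":"doc-0001","title":"replacement title"}]'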
What is 503 Status For Admin Ping
I have made some small changes to the example folder of Solr 4.2.1. When I start it up with just: java -jar start.jar, I get this status:

INFO: [collection1] webapp=/solr path=/admin/ping params={action=status&_=1368715926563&wt=json} status=503 QTime=0

When I click ping (just one time) at the admin page I get this:

May 16, 2013 5:52:06 PM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/admin/ping params={action=status&_=1368715926563&wt=json} status=503 QTime=0
May 16, 2013 5:52:06 PM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/admin/file/ params={file=admin-extra.html&_=1368715926560} status=0 QTime=0
May 16, 2013 5:52:08 PM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/admin/ping params={ts=1368715928213&_=1368715928214&wt=json} hits=0 status=0 QTime=1
May 16, 2013 5:52:08 PM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/admin/ping params={ts=1368715928213&_=1368715928214&wt=json} status=0 QTime=3

What is that status 503 (if it is HTTP 503, why is it listed as INFO)?
RE: Solr 4.3.0: Shard instances using incorrect data directory on machine boot
The dataDir is set in each solrconfig.xml; each one has been checked to ensure it points to its corresponding location. The error we see is that on machine reboot not all of the shards start successfully, and if the failed one was a leader the replicas can't take its place (presumably because the leader's incorrect data directory is inconsistent with their own). Although you can set the dataDir in solrconfig.xml, I would strongly recommend that you don't. If you are using the old-style solr.xml (which has <cores> and <core> tags) then set the dataDir in each <core> tag in solr.xml. This gets read and set before the core is created, so there's less chance of it getting scrambled. The solrconfig is read as part of core creation. If you are using the new-style solr.xml (new with 4.3.0) then you'll need absolute dataDir paths, and they need to go in each core.properties file. Due to a bug, relative paths won't work as expected. I need to see if I can make sure the fix makes it into 4.3.1. If moving dataDir out of solrconfig.xml fixes it, then we probably have a bug. Your Zookeeper problems might be helped by increasing zkClientTimeout. Thanks, Shawn
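For reference, a minimal sketch of the old-style approach Shawn describes, with dataDir set per <core> in solr.xml (core names and paths here are hypothetical):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="shard1" instanceDir="shard1" dataDir="/var/solr/data/shard1" />
    <core name="shard2" instanceDir="shard2" dataDir="/var/solr/data/shard2" />
  </cores>
</solr>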
Re: What is 503 Status For Admin Ping
Probably one or more shards of your collection are not available at ping time, and the server returns the 503 code. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Thursday, May 16, 2013 at 3:58 PM, Furkan KAMACI wrote: I have made some small changes to the example folder of Solr 4.2.1. When I start it up with just: java -jar start.jar, I get this status: INFO: [collection1] webapp=/solr path=/admin/ping params={action=status&_=1368715926563&wt=json} status=503 QTime=0 When I click ping (just one time) at the admin page I get this: May 16, 2013 5:52:06 PM org.apache.solr.core.SolrCore execute INFO: [collection1] webapp=/solr path=/admin/ping params={action=status&_=1368715926563&wt=json} status=503 QTime=0 May 16, 2013 5:52:06 PM org.apache.solr.core.SolrCore execute INFO: [collection1] webapp=/solr path=/admin/file/ params={file=admin-extra.html&_=1368715926560} status=0 QTime=0 May 16, 2013 5:52:08 PM org.apache.solr.core.SolrCore execute INFO: [collection1] webapp=/solr path=/admin/ping params={ts=1368715928213&_=1368715928214&wt=json} hits=0 status=0 QTime=1 May 16, 2013 5:52:08 PM org.apache.solr.core.SolrCore execute INFO: [collection1] webapp=/solr path=/admin/ping params={ts=1368715928213&_=1368715928214&wt=json} status=0 QTime=3 What is that status 503 (if it is HTTP 503, why is it listed as INFO)?
Re: What is 503 Status For Admin Ping
It is a single node, started standalone. I have just started a Solr instance without SolrCloud. 2013/5/16 Yago Riveiro yago.rive...@gmail.com Probably one or more shards of your collection are not available at ping time, and the server returns the 503 code. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Thursday, May 16, 2013 at 3:58 PM, Furkan KAMACI wrote: I have made some small changes to the example folder of Solr 4.2.1. When I start it up with just: java -jar start.jar, I get this status: INFO: [collection1] webapp=/solr path=/admin/ping params={action=status&_=1368715926563&wt=json} status=503 QTime=0 When I click ping (just one time) at the admin page I get this: May 16, 2013 5:52:06 PM org.apache.solr.core.SolrCore execute INFO: [collection1] webapp=/solr path=/admin/ping params={action=status&_=1368715926563&wt=json} status=503 QTime=0 May 16, 2013 5:52:06 PM org.apache.solr.core.SolrCore execute INFO: [collection1] webapp=/solr path=/admin/file/ params={file=admin-extra.html&_=1368715926560} status=0 QTime=0 May 16, 2013 5:52:08 PM org.apache.solr.core.SolrCore execute INFO: [collection1] webapp=/solr path=/admin/ping params={ts=1368715928213&_=1368715928214&wt=json} hits=0 status=0 QTime=1 May 16, 2013 5:52:08 PM org.apache.solr.core.SolrCore execute INFO: [collection1] webapp=/solr path=/admin/ping params={ts=1368715928213&_=1368715928214&wt=json} status=0 QTime=3 What is that status 503 (if it is HTTP 503, why is it listed as INFO)?
Re: error while switching from log4j back to slf4j with solr 4.3
On 5/16/2013 3:24 AM, Bernd Fehling wrote: OK, solved. I have now run-jetty-run with log4j running. Just copied log4j libs from example/lib/ext to webapp/WEB-INF/classes and set -Dlog4j.configuration in run-jetty-run VM classpath. The location where you copied those files is in the extracted .war file, and may get automatically wiped out at some point in the future, especially by an upgrade. It would be better to copy them to the external lib directory for your container. For jetty, that's lib/ext ... it is likely to be different for other containers. Thanks, Shawn
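As a rough sketch of the move Shawn suggests, for a standalone jetty install (paths here are hypothetical and depend on your layout; the jars come from the Solr example's lib/ext):

cp example/lib/ext/*.jar /opt/jetty/lib/ext/
java -Dlog4j.configuration=file:/opt/jetty/resources/log4j.properties -jar start.jar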
Zookeeper Ensemble Startup Parameters For SolrCloud?
I know that there have been many conversations about SolrCloud startup tips, i.e. which type of garbage collector to use etc. Also I know that there is no exact answer for this question. However I think that folks have some tips about this question. How do you start up your external Zookeeper, with which parameters, and any tips for it?
How to adjust maxDocs and maxTime for autoCommit?
I will start my pre-production step soon. How can I adjust maxDocs and maxTime for autoCommit? What do you suggest for adjusting those parameters?
Re: Zookeeper Ensemble Startup Parameters For SolrCloud?
On 5/16/2013 9:25 AM, Furkan KAMACI wrote: I know that there have been many conversations about SolrCloud startup tips, i.e. which type of garbage collector to use etc. Also I know that there is no exact answer for this question. However I think that folks have some tips about this question. How do you start up your external Zookeeper, with which parameters, and any tips for it? An external zookeeper is just that - external, not part of Solr. I followed the zookeeper docs, and used the normal zookeeper port, 2181: http://zookeeper.apache.org/doc/r3.4.5/ Thanks, Shawn
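For reference, a minimal three-node ensemble zoo.cfg along the lines of those docs (hostnames and dataDir are hypothetical; each server also needs a myid file in its dataDir containing its server number):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888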
Re: Strange fuzzy behavior in 4.2.1
This might explain why our dev database of 400,000 records doesn't seem to suffer from this. When we started seeing this in our test environment of 300,000,000 records, we thought we just weren't finding records in dev that were having the problem. One thing that this does not explain is that we have located a few terms that find nothing but the original term, despite having possible matches one edit away. For example, albert will not find anything but albert, despite there being alberta, albart, etc. I am reading into the maxExpansions variable and how it functions as I am writing this, so I might be missing the connection. I note that you say this is a hardcoded behavior. Would I be safe in assuming that I will need to build a custom solr.war to make changes to this setting? I want to see if sliding this number up/down will let me confirm that it is indeed maxExpansions that is the problem. Finally, if it is maxExpansions that is the problem, is there any solution beyond the aforementioned custom war? -Ryan Wilson On Thu, May 16, 2013 at 8:40 AM, Jack Krupansky j...@basetechnology.com wrote: Maybe you are running into the same problem I posted on another message thread about the hard-coded maxExpansions limit of 50. In other words, once Lucene finds 50 terms that do match, it won't find the additional matches. And that is not necessarily the top 50, but the first 50 in the index. See if you can reproduce the problem with a small data set of no more than a couple dozen documents. -- Jack Krupansky -Original Message- From: Ryan Wilson Sent: Thursday, May 16, 2013 9:28 AM To: solr-user@lucene.apache.org Subject: RE: Strange fuzzy behavior in 4.2.1 In answering your first questions, any changes we’ve been making have been followed by a reindex. The data that is being indexed generally looks something like this (space indicating an actual space): TIM space , space JULIO JULIE space , space JIM So based off what we see from looking at top terms in the field and the analysis tool, at index time these records are being broken up such that TIM , JULIO can be found with tim or Julio. Just to make sure I’m not misunderstanding something about Solr/Lucene, when a record is indexed, the index analysis chain result (tim , julio) is what is written to disk, correct? So far as I understand it, it’s the query analysis chain that has the issue with most filters not being applied during wildcard and fuzzy queries. Finally, some clarification, as I’ve realized my original email might not have made this point well. I can have a particular record with a primary key of X and a name value of LEWIS , JULIA and be able to find that exact record with bulia~1 but not aulia~1, or GUERRERO , JULIAN , JULIAN can be found with julan~1 but not julia~1. It’s not that records go missing when searched for with fuzzy, but rather the fuzzy terms that will find them seem, to my eyes, inconsistent. Regards, Ryan Wilson rpwils...@gmail.com
Re: Apache solr error
On 5/16/2013 6:30 AM, Nilesh Gaikwad wrote: Number of documents in index: 0 Number of pending deletions: 0 The search index is generated by running cron #12. *0%* of the site content has been sent to the server. There are 3587 items left to send. But as far as I can see, there should be around 4 documents in the index, and it is not showing the correct count. Do you see any errors in your Solr server logs? Do you see anything in the logs for your application that gets called from cron? Thanks, Shawn
RE: Solr 4.3.0: Shard instances using incorrect data directory on machine boot
Thanks for your response Shawn, very much appreciated. Gil -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: 16 May 2013 15:59 To: solr-user@lucene.apache.org Subject: RE: Solr 4.3.0: Shard instances using incorrect data directory on machine boot The dataDir is set in each solrconfig.xml; each one has been checked to ensure it points to its corresponding location. The error we see is that on machine reboot not all of the shards start successfully, and if the failed one was a leader the replicas can't take its place (presumably because the leader's incorrect data directory is inconsistent with their own). Although you can set the dataDir in solrconfig.xml, I would strongly recommend that you don't. If you are using the old-style solr.xml (which has <cores> and <core> tags) then set the dataDir in each <core> tag in solr.xml. This gets read and set before the core is created, so there's less chance of it getting scrambled. The solrconfig is read as part of core creation. If you are using the new-style solr.xml (new with 4.3.0) then you'll need absolute dataDir paths, and they need to go in each core.properties file. Due to a bug, relative paths won't work as expected. I need to see if I can make sure the fix makes it into 4.3.1. If moving dataDir out of solrconfig.xml fixes it, then we probably have a bug. Your Zookeeper problems might be helped by increasing zkClientTimeout. Thanks, Shawn
Re: How to adjust maxDocs and maxTime for autoCommit?
On 5/16/2013 9:36 AM, Furkan KAMACI wrote: I will start my pre-production step soon. How can I adjust maxDocs and maxTime for autoCommit? What do you suggest for adjusting those parameters? Change the numbers for those settings in your solrconfig.xml. Look at the example solrconfig.xml, which has this section commented out. A minute with Google would also have answered this question, using only the things you asked about: http://lmgtfy.com/?q=solr+autocommit+maxtime+maxdocs The third hit is a Solr wiki article and contains an example update handler with the settings you need. Thanks, Shawn
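For reference, a sketch of the relevant solrconfig.xml section (the numbers here are illustrative rather than recommendations; openSearcher=false keeps the automatic hard commits from opening a new searcher):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>25000</maxDocs>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>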
Re: What is 503 Status For Admin Ping
On 5/16/2013 9:18 AM, Furkan KAMACI wrote: It is a single node, started standalone. I have just started a Solr instance without SolrCloud. 2013/5/16 Yago Riveiro yago.rive...@gmail.com Probably one or more shards of your collection are not available at ping time, and the server returns the 503 code. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Thursday, May 16, 2013 at 3:58 PM, Furkan KAMACI wrote: I have made some small changes to the example folder of Solr 4.2.1. When I start it up with just: java -jar start.jar, I get this status: INFO: [collection1] webapp=/solr path=/admin/ping params={action=status&_=1368715926563&wt=json} status=503 QTime=0 When a ping request failed, older Solr versions logged a huge java stacktrace and an error, and most of the time that information was not very helpful. Can you share your ping handler definition? I would guess that the query in your ping handler is failing, or that you have a healthcheckFile configured and it doesn't exist, so you would need to enable it. http://server:port/solr/corename/admin/ping?action=enable Here's how one of my ping handlers is set up:

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="qt">/lbcheck</str>
    <str name="q">*:*</str>
    <str name="df">Body</str>
  </lst>
  <lst name="defaults">
    <str name="echoParams">all</str>
  </lst>
  <str name="healthcheckFile">server-enabled.txt</str>
</requestHandler>

When the ping handler is called, it sends a query for all docs to a search handler named /lbcheck (load balancer check), with a default field of Body. The healthcheckFile is relative to dataDir. The enable action creates this file, and the disable action deletes the file. Thanks, Shawn
Speed up import of Hierarchical Data
I am using the DataImportHandler to query a SQL Server and populate Solr. Unfortunately, SQL does not have an understanding of hierarchical relationships, and hence I use table joins. The following is an outline of my table structure:

PROD_TABLE
- SKU (Primary Key)
- Title (varchar)
- Descr (varchar)

CAT_TABLE
- SKU (Foreign Key)
- CategoryLevel (int i.e. 1, 2, 3 …)
- CategoryName (varchar)

I specify the SQL query in the db-data-config.xml file – a snippet of which looks like:

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost\" />
  <document>
    <entity name="Product" query="SELECT SKU, Title, Descr FROM PROD_TABLE">
      <field column="SKU" name="SKU" />
      <field column="Title" name="Title" />
      <field column="Descr" name="Descr" />
      <entity name="Cat1" query="SELECT CategoryName from CAT_TABLE where SKU='${Product.SKU}' AND CategoryLevel=1">
        <field column="CategoryName" name="Category1" />
      </entity>
      <entity name="Cat2" query="SELECT CategoryName from CAT_TABLE where SKU='${Product.SKU}' AND CategoryLevel=2">
        <field column="CategoryName" name="Category2" />
      </entity>
      <entity name="Cat3" query="SELECT CategoryName from CAT_TABLE where SKU='${Product.SKU}' AND CategoryLevel=3">
        <field column="CategoryName" name="Category3" />
      </entity>
    </entity>
  </document>
</dataConfig>

It seems like the DataImportHandler sends out three or four queries for each product. This results in a very slow import. Is there any way to speed this up? I would not mind an intermediate step of first extracting from SQL and then putting it into Solr. Thank you for all your help. O. O.
Re: How to adjust maxDocs and maxTime for autoCommit?
Unless you have a specific reason to change the settings (any settings), the general recommendation is to leave them as is. That's not to say that these are the best settings or optimal for all situations, but simply that they are all considered to be reasonable. If you or anybody else has good reason to believe that any of the solrconfig settings for any feature are unreasonable, please file a Jira with suggested improvements. -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Thursday, May 16, 2013 11:36 AM To: solr-user@lucene.apache.org Subject: How to adjust maxDocs and maxTime for autoCommit? I will start my pre-production step soon. How can I adjust maxDocs and maxTime for autoCommit? What do you suggest for adjusting those parameters?
Re: Speed up import of Hierarchical Data
That sounds like a perfect match for http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor :) On Thursday, May 16, 2013 at 6:01 PM, O. Olson wrote: I am using the DataImportHandler to query a SQL Server and populate Solr. Unfortunately, SQL does not have an understanding of hierarchical relationships, and hence I use table joins. The following is an outline of my table structure: PROD_TABLE - SKU (Primary Key) - Title (varchar) - Descr (varchar) CAT_TABLE - SKU (Foreign Key) - CategoryLevel (int i.e. 1, 2, 3 …) - CategoryName (varchar) I specify the SQL query in the db-data-config.xml file – a snippet of which looks like:

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost\" />
  <document>
    <entity name="Product" query="SELECT SKU, Title, Descr FROM PROD_TABLE">
      <field column="SKU" name="SKU" />
      <field column="Title" name="Title" />
      <field column="Descr" name="Descr" />
      <entity name="Cat1" query="SELECT CategoryName from CAT_TABLE where SKU='${Product.SKU}' AND CategoryLevel=1">
        <field column="CategoryName" name="Category1" />
      </entity>
      <entity name="Cat2" query="SELECT CategoryName from CAT_TABLE where SKU='${Product.SKU}' AND CategoryLevel=2">
        <field column="CategoryName" name="Category2" />
      </entity>
      <entity name="Cat3" query="SELECT CategoryName from CAT_TABLE where SKU='${Product.SKU}' AND CategoryLevel=3">
        <field column="CategoryName" name="Category3" />
      </entity>
    </entity>
  </document>
</dataConfig>

It seems like the DataImportHandler sends out three or four queries for each product. This results in a very slow import. Is there any way to speed this up? I would not mind an intermediate step of first extracting from SQL and then putting it into Solr. Thank you for all your help. O. O.
Re: Question about Edismax - Solr 4.0
Thanks Jack for your reply. The problem is, I'm finding results for fq=title:(,10) but not for fq=title:(, 10) - apologies if that was not clear from my first mail. I have already mentioned the debug analysis in my previous mail. Additionally, the title field is defined as below:

<fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I have set the catenate options to 1 for all types. I can understand ',' getting ignored when it is on its own (title:(, 10)), but: - Why is Solr not searching for 10 in that case, just like it did when the query was (title:(,10))? - And why did the other filter queries (collection:assets) not show up in the debug section? Thanks, Sandeep On 16 May 2013 13:57, Jack Krupansky j...@basetechnology.com wrote: You haven't indicated any problem here! What is the symptom that you actually think is a problem? There is no comma operator in any of the Solr query parsers. Comma is just another character that may or may not be included or discarded depending on the specific field type and analyzer. For example, a whitespace analyzer will keep commas, but the standard analyzer or the word delimiter filter will discard them. If title were a string type, all punctuation would be preserved, including commas and spaces (but spaces would need to be escaped or the term text enclosed in parentheses.) Let us know what your symptom is though, first. I mean, the filter query looks perfectly reasonable from an abstract perspective. -- Jack Krupansky -Original Message- From: Sandeep Mestry Sent: Thursday, May 16, 2013 6:51 AM To: solr-user@lucene.apache.org Subject: Question about Edismax - Solr 4.0 -- *Edismax and Filter Queries with Commas and spaces* -- Dear Experts, This appears to be a bug, please suggest if I'm wrong. If I search with the following filter query, 1) fq=title:(, 10) - I get no results. - The debug output does NOT show the section containing parsed_filter_queries. If I carry out a search with the filter query, 2) fq=title:(,10) - (no space between , and 10) - I get results and the debug output shows the parsed filter queries section as:

<arr name="filter_queries">
  <str>(titles:(,10))</str>
  <str>(collection:assets)</str>
</arr>

As you can see above, I'm also passing in other filter queries (collection:assets) which appear correctly, but they do not appear in case 1 above. I can't make this part of the query parameter as that needs to be searched against multiple fields. Can someone suggest a fix in this case please. I'm using Solr 4.0. Many Thanks, Sandeep
SOLR test framework- ERROR: SolrIndexSearcher opens=1 closes=0
I am using SOLR 4.3.0 and have created multiple custom components. I am getting the below error when I run tests (using the SOLR 4.3 test framework) against one of the custom components. All the tests pass, but I still get the below error once the test completes. Can someone help me resolve this error?

java.lang.AssertionError: ERROR: SolrIndexSearcher opens=1 closes=0
    at __randomizedtesting.SeedInfo.seed([C2DCAC50C9ACBACE]:0)
    at org.junit.Assert.fail(Assert.java:93)
    at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:252)
    at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:101)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:700)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
    at java.lang.Thread.run(Thread.java:680)
Re: SOLR test framework- ERROR: SolrIndexSearcher opens=1 closes=0
On 5/16/2013 10:46 AM, bbarani wrote: I am using SOLR 4.3.0, I have created multiple custom components. I am getting the below error when I run tests (using SOLR 4.3 test framework) against one of the custom componentAll the tests pass but I still get the below error once test gets completed. Can someone help me resolve this error? java.lang.AssertionError: ERROR: SolrIndexSearcher opens=1 closes=0 It looks like you opened a searcher object as part of your test but then didn't close it. If you didn't do this in the test itself, perhaps it's happening in your custom component. I'm a little fuzzy on test writing, though. Thanks. Shawn
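If the component obtains the searcher itself, a hedged sketch of the reference-counting pattern Solr expects (searchers from SolrCore.getSearcher() are released with decref(), not closed directly; the variable names here are illustrative):

// import org.apache.solr.core.SolrCore;
// import org.apache.solr.search.SolrIndexSearcher;
// import org.apache.solr.util.RefCounted;

RefCounted<SolrIndexSearcher> ref = core.getSearcher();
try {
  SolrIndexSearcher searcher = ref.get();
  // ... use the searcher ...
} finally {
  ref.decref();  // without this, the test framework's open/close counts won't balance
}

Inside a SearchComponent it is usually simpler to use req.getSearcher(), whose lifecycle the request itself manages.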
Re: Oracle Timestamp in SOLR
: SELECT ... CAST(LAST_ACTION_TIMESTAMP AS DATE) AS LAT : : This removes the time part of the timestamp in SOLR, although it is shown : in PL/SQL-Developer (Tool for Oracle). Hmmm... that makes no sense to me based on 10 seconds of googling... http://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#i1847 "The DATE datatype stores the year (including the century), the month, the day, the hours, the minutes, and the seconds" ...but i'll take your word for it. : The only way I found on the net is to write my own converter :-( There must be *some* way to either tweak your SQL or tweak your JDBC connection properties such that Oracle's JDBC driver will give you a legitimate java.sql.Date or java.sql.Timestamp instead of its own internal class (that doesn't extend java.util.Date) ... otherwise it's just total freaking anarchy. -Hoss
Re: Question about Edismax - Solr 4.0
Could you show us the full query URL? Spaces must be encoded in URL query parameters. Also show the actual field XML - you omitted that. Try the same query as a main query, using both defType=edismax and defType=lucene. Note that the filter query is parsed using the Lucene query parser, not edismax, independent of the defType parameter. But you don't have any edismax features in your fq anyway. You can stick {!edismax} in front of the query to force edismax to be used for the fq, although it really shouldn't change anything. Also, catenate is fine for indexing, but will mess up your queries at query time, so set those options to 0 in the query analyzer. Also, make sure you have autoGeneratePhraseQueries=true on the field type, but that's not the issue here. -- Jack Krupansky -Original Message- From: Sandeep Mestry Sent: Thursday, May 16, 2013 12:42 PM To: solr-user@lucene.apache.org Subject: Re: Question about Edismax - Solr 4.0 Thanks Jack for your reply. The problem is, I'm finding results for fq=title:(,10) but not for fq=title:(, 10) - apologies if that was not clear from my first mail. I have already mentioned the debug analysis in my previous mail. Additionally, the title field is defined as below:

<fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I have set the catenate options to 1 for all types. I can understand ',' getting ignored when it is on its own (title:(, 10)), but: - Why is Solr not searching for 10 in that case, just like it did when the query was (title:(,10))? - And why did the other filter queries (collection:assets) not show up in the debug section? Thanks, Sandeep On 16 May 2013 13:57, Jack Krupansky j...@basetechnology.com wrote: You haven't indicated any problem here! What is the symptom that you actually think is a problem? There is no comma operator in any of the Solr query parsers. Comma is just another character that may or may not be included or discarded depending on the specific field type and analyzer. For example, a whitespace analyzer will keep commas, but the standard analyzer or the word delimiter filter will discard them. If title were a string type, all punctuation would be preserved, including commas and spaces (but spaces would need to be escaped or the term text enclosed in parentheses.) Let us know what your symptom is though, first. I mean, the filter query looks perfectly reasonable from an abstract perspective. -- Jack Krupansky -Original Message- From: Sandeep Mestry Sent: Thursday, May 16, 2013 6:51 AM To: solr-user@lucene.apache.org Subject: Question about Edismax - Solr 4.0 -- *Edismax and Filter Queries with Commas and spaces* -- Dear Experts, This appears to be a bug, please suggest if I'm wrong. If I search with the following filter query, 1) fq=title:(, 10) - I get no results. - The debug output does NOT show the section containing parsed_filter_queries. If I carry out a search with the filter query, 2) fq=title:(,10) - (no space between , and 10) - I get results and the debug output shows the parsed filter queries section as:

<arr name="filter_queries">
  <str>(titles:(,10))</str>
  <str>(collection:assets)</str>
</arr>

As you can see above, I'm also passing in other filter queries (collection:assets) which appear correctly, but they do not appear in case 1 above. I can't make this part of the query parameter as that needs to be searched against multiple fields. Can someone suggest a fix in this case please. I'm using Solr 4.0. Many Thanks, Sandeep
RE: Speed up import of Hierarchical Data
See https://issues.apache.org/jira/browse/SOLR-2943 . You can set up 2 DIH handlers. The first would query the CAT_TABLE and save it to a disk-backed cache, using DIHCacheWriter. You then would replace your 3 child entities in the 2nd DIH handler to use DIHCacheProcessor to read back the cached data. This is a little complicated to do, but it would let you cache the data just once and, because it is disk-backed, it will scale to whatever size the CAT_TABLE is. (For some details, see this thread: http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tt4015514.html) A simpler method is simply to specify cacheImpl="SortedMapBackedCache" on the 3 child entities. (This is the same as using CachedSqlEntityProcessor.) It would generate 3 in-memory caches, each with the same data. If CAT_TABLE is small, this would be adequate. In between would be to create a disk-backed cache impl (or use the ones at SOLR-2613 or SOLR-2948) and specify it on cacheImpl. It would still create 3 identical caches, but they would be disk-backed and could scale beyond what in-memory can handle. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: O. Olson [mailto:olson_...@yahoo.it] Sent: Thursday, May 16, 2013 11:01 AM To: solr-user@lucene.apache.org Subject: Speed up import of Hierarchical Data I am using the DataImportHandler to query a SQL Server and populate Solr. Unfortunately, SQL does not have an understanding of hierarchical relationships, and hence I use table joins. The following is an outline of my table structure: PROD_TABLE - SKU (Primary Key) - Title (varchar) - Descr (varchar) CAT_TABLE - SKU (Foreign Key) - CategoryLevel (int i.e. 1, 2, 3 …) - CategoryName (varchar) I specify the SQL query in the db-data-config.xml file – a snippet of which looks like:

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost\" />
  <document>
    <entity name="Product" query="SELECT SKU, Title, Descr FROM PROD_TABLE">
      <field column="SKU" name="SKU" />
      <field column="Title" name="Title" />
      <field column="Descr" name="Descr" />
      <entity name="Cat1" query="SELECT CategoryName from CAT_TABLE where SKU='${Product.SKU}' AND CategoryLevel=1">
        <field column="CategoryName" name="Category1" />
      </entity>
      <entity name="Cat2" query="SELECT CategoryName from CAT_TABLE where SKU='${Product.SKU}' AND CategoryLevel=2">
        <field column="CategoryName" name="Category2" />
      </entity>
      <entity name="Cat3" query="SELECT CategoryName from CAT_TABLE where SKU='${Product.SKU}' AND CategoryLevel=3">
        <field column="CategoryName" name="Category3" />
      </entity>
    </entity>
  </document>
</dataConfig>

It seems like the DataImportHandler sends out three or four queries for each product. This results in a very slow import. Is there any way to speed this up? I would not mind an intermediate step of first extracting from SQL and then putting it into Solr. Thank you for all your help. O. O.
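As an illustration of the simpler method, a sketch of one child entity rewritten to use an in-memory cache (the cacheKey/cacheLookup attributes follow the pattern from SOLR-2382; note the query now pulls the whole level in one pass and is joined from the cache rather than re-queried per product):

<entity name="Cat1"
        query="SELECT SKU, CategoryName FROM CAT_TABLE WHERE CategoryLevel=1"
        cacheImpl="SortedMapBackedCache"
        cacheKey="SKU"
        cacheLookup="Product.SKU">
  <field column="CategoryName" name="Category1" />
</entity>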
Re: Strange fuzzy behavior in 4.2.1
Go ahead and file a Jira and hopefully that will attract some committer attention that might shed some more light. Beyond that, sure, you can build Solr yourself and change the query parser code to put a larger number in for maxExpansions. You might also try developing a test case, say 100 small test documents with similar values, and see if the 50 limit seems to account for the behavior that you see with that test dataset. -- Jack Krupansky -Original Message- From: Ryan Wilson Sent: Thursday, May 16, 2013 11:37 AM To: solr-user@lucene.apache.org Subject: Re: Strange fuzzy behavior in 4.2.1 This might explain why our dev database of 400,000 records doesn't seem to suffer from this. When we started seeing this in our test environment of 300,000,000 records, we thought we just weren't finding records in dev that were having the problem. One thing that this does not explain is that we have located a few terms that find nothing but the original term, despite having possible matches one edit away. For example, albert will not find anything but albert, despite there being alberta, albart, etc. I am reading into the maxExpansions variable and how it functions as I am writing this, so I might be missing the connection. I note that you say this is a hardcoded behavior. Would I be safe in assuming that I will need to build a custom solr.war to make changes to this setting? I want to see if sliding this number up/down will let me confirm that it is indeed maxExpansions that is the problem. Finally, if it is maxExpansions that is the problem, is there any solution beyond the aforementioned custom war? -Ryan Wilson On Thu, May 16, 2013 at 8:40 AM, Jack Krupansky j...@basetechnology.com wrote: Maybe you are running into the same problem I posted on another message thread about the hard-coded maxExpansions limit of 50. In other words, once Lucene finds 50 terms that do match, it won't find the additional matches. And that is not necessarily the top 50, but the first 50 in the index. See if you can reproduce the problem with a small data set of no more than a couple dozen documents. -- Jack Krupansky -Original Message- From: Ryan Wilson Sent: Thursday, May 16, 2013 9:28 AM To: solr-user@lucene.apache.org Subject: RE: Strange fuzzy behavior in 4.2.1 In answering your first questions, any changes we’ve been making have been followed by a reindex. The data that is being indexed generally looks something like this (space indicating an actual space): TIM space , space JULIO JULIE space , space JIM So based off what we see from looking at top terms in the field and the analysis tool, at index time these records are being broken up such that TIM , JULIO can be found with tim or Julio. Just to make sure I’m not misunderstanding something about Solr/Lucene, when a record is indexed, the index analysis chain result (tim , julio) is what is written to disk, correct? So far as I understand it, it’s the query analysis chain that has the issue with most filters not being applied during wildcard and fuzzy queries. Finally, some clarification, as I’ve realized my original email might not have made this point well. I can have a particular record with a primary key of X and a name value of LEWIS , JULIA and be able to find that exact record with bulia~1 but not aulia~1, or GUERRERO , JULIAN , JULIAN can be found with julan~1 but not julia~1. It’s not that records go missing when searched for with fuzzy, but rather the fuzzy terms that will find them seem, to my eyes, inconsistent. Regards, Ryan Wilson rpwils...@gmail.com
Re: Deleting an entry from a collection when the key has : in it
You need to escape colons in queries, using either a backslash or enclosing the full query term in quotes. In your case, you have backslashes as well in your query, which the query parser will interpret as an escape! So, you need to escape those backslashes as well: D\:\\somedir\\somefile.pdf or "D:\\somedir\\somefile.pdf" -- Jack Krupansky -Original Message- From: Daniel Baughman Sent: Thursday, May 16, 2013 11:33 AM To: solr-user@lucene.apache.org Subject: Deleting an entry from a collection when the key has : in it Hi All, I seem to be really struggling to delete an entry from a search repository that has a : in the key. The key is the path to the file, i.e. D:\somedir\somefile.pdf. I want to use a query to delete it and I just can't seem to make it go away. I've been trying stuff like this: http://localhost:8983/solr/docrepo/update/?stream.body=%3Cdelete%3E%3Cquery%3Ekey%3AD\:\\Webdocs\\sw4\\docRepo\\documents\\Hiring%20Manager\\Disciplinary\\asdfasdf\.docx%3C%2Fquery%3E%3C%2Fdelete%3E&version=2.2&start=0&rows=10&indent=on It doesn't throw an error but it doesn't delete the document either. Does anyone have any suggestions? Thanks, Dan
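For comparison, a sketch of the same delete with the escaping applied, sent as a POST body rather than stream.body (the core name and path come from the example above; this assumes key is a string-typed field, so the quoted term matches exactly, and inside quotes the colon no longer needs escaping while each literal backslash still does):

curl 'http://localhost:8983/solr/docrepo/update?commit=true' \
  -H 'Content-Type: text/xml' \
  -d '<delete><query>key:"D:\\Webdocs\\sw4\\docRepo\\documents\\Hiring Manager\\Disciplinary\\asdfasdf.docx"</query></delete>'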
Re: Facets referenced by key
: I would then like to refer to these 'pseudo' fields later in the request : string. I thought this would be how I'd do it: : : f.my_facet_key.facet.prefix=a_given_prefix ... that syntax was proposed in SOLR-1351 and a patch was made available, but it was never committed (it only supported a subset of faceting, needed more tests, and had unclear behavior about how the defaults were picked if you combined f.key.facet.foo + f.field.facet.foo + facet.foo) : I thought this would work, however it doesn't appear to. What does work is : if I define the prefix and mincount in the local params: : : facet.field={!ex=dt key=my_facet_key facet.prefix=a_given_prefix}the_facet_field Correct, SOLR-4717 added support to Solr 4.3 for specifying all of the facet options as local params, so that syntax works. Given the way the use of Solr and local params have evolved over the years, it was considered a more natural and logical way to specify facet options on a per-field or per-key basis. : Is this expected? I'm also using sunspot and they construct the queries : with keys as in my first example, i.e. facet.field={!ex=dt : key=my_facet_key}the_facet_field&f.my_facet_key.facet.prefix=a_given_prefix I can't comment on that ... i'm not sure why sunspot would assume that behavior would work (unless someone looked at SOLR-1351 once upon a time and assumed that would definitely be official at some point) -Hoss
RE: Deleting an entry from a collection when the key has : in it
Thanks for the idea. http://localhost:8983/solr/docrepo/update/?stream.body=%3Cdelete%3E%3Cquery%3Ekey%3AD\:\\Webdocs\\sw4\\docRepo\\documents\\Hiring%20Manager\\Disciplinary\\asdfasdf\.docx%3C%2Fquery%3E%3C%2Fdelete%3E I do have :'s and \'s escaped, I believe. If in my schema I have the key field set to indexed=false, then is that maybe the issue? I'm going to try to set that to true, rebuild the repository, and see if that does it. -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Thursday, May 16, 2013 11:20 AM To: solr-user@lucene.apache.org Subject: Re: Deleting an entry from a collection when the key has : in it You need to escape colons in queries, using either a backslash or enclosing the full query term in quotes. In your case, you have backslashes as well in your query, which the query parser will interpret as an escape! So, you need to escape those backslashes as well: D\:\\somedir\\somefile.pdf or "D:\\somedir\\somefile.pdf" -- Jack Krupansky -Original Message- From: Daniel Baughman Sent: Thursday, May 16, 2013 11:33 AM To: solr-user@lucene.apache.org Subject: Deleting an entry from a collection when the key has : in it Hi All, I seem to be really struggling to delete an entry from a search repository that has a : in the key. The key is the path to the file, i.e. D:\somedir\somefile.pdf. I want to use a query to delete it and I just can't seem to make it go away. I've been trying stuff like this: http://localhost:8983/solr/docrepo/update/?stream.body=%3Cdelete%3E%3Cquery%3Ekey%3AD\:\\Webdocs\\sw4\\docRepo\\documents\\Hiring%20Manager\\Disciplinary\\asdfasdf\.docx%3C%2Fquery%3E%3C%2Fdelete%3E&version=2.2&start=0&rows=10&indent=on It doesn't throw an error but it doesn't delete the document either. Does anyone have any suggestions? Thanks, Dan
Migrating from 4.2.1 to 4.3.0
Greetings, I just started with Solr a couple weeks ago, with version 4.2.1. I installed the following setup:

- ZooKeeper: 3-instance ensemble
- Solr: on Tomcat, 4 instances
- WebOrder_Collection: instances 1 and 2, 1 shard, 1 master, 1 replica
- other_collectionA: instances 3 and 4, 1 shard, 1 master, 1 replica
- other_CollectionB: instances 3 and 4, 1 shard, 1 master, 1 replica

With version 4.2.1 everything works fine, but I do have a problem if I query instance 3 for something in the WebOrder_Collection. I found that this is a bug in 4.2.1. I must query instances 1 or 2 to get results from WebOrder_Collection. Now that I have upgraded to 4.3.0 I have the following problem: my replicas will not recover. The recovery will retry, and retry, ... forever. Details: if I look at the Zoo, I see that:

- node_name: 10.0.2.15:8180_solr in solr 4.2.1, 10.0.2.15:8180_ in solr 4.3.0
- base_url: http://10.0.2.15:8180/solr in solr 4.2.1, http://10.0.2.15:8180 in solr 4.3.0

My solr logs show this:

8869687 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy – Error while trying to recover. core=WebOrder_Collection:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://10.0.2.15:8180 returned non ok status:404, message:Not Found
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:202)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:346)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)

I have not been able to find more info than that. The Solr cloud diagram shows instance 1 as active and leader, instance 2 as recovering. My solrconfig.xml files are identical, except for the LUCENE_42 or LUCENE_43 tag. Any idea? I hope that it is a configuration issue on my part... Thank you for any help, Nic.
Re: Migrating from 4.2.1 to 4.3.0
Your solr webapp context appears to be "" rather than "solr". There was a JIRA issue in 4.3 that may have affected this, but I only saw it from a distance, so just a guess. What does it say in solr.xml for the context (an attribute on <cores>)? - Mark On May 16, 2013, at 2:02 PM, M. Flatterie nicflatte...@yahoo.com wrote: Greetings, I just started with Solr a couple weeks ago, with version 4.2.1. I installed the following setup: - ZooKeeper: 3-instance ensemble - Solr: on Tomcat, 4 instances - WebOrder_Collection: instances 1 and 2, 1 shard, 1 master, 1 replica - other_collectionA: instances 3 and 4, 1 shard, 1 master, 1 replica - other_CollectionB: instances 3 and 4, 1 shard, 1 master, 1 replica With version 4.2.1 everything works fine, but I do have a problem if I query instance 3 for something in the WebOrder_Collection. I found that this is a bug in 4.2.1. I must query instances 1 or 2 to get results from WebOrder_Collection. Now that I have upgraded to 4.3.0 I have the following problem: my replicas will not recover. The recovery will retry, and retry, ... forever. Details: if I look at the Zoo, I see that: - node_name: 10.0.2.15:8180_solr in solr 4.2.1, 10.0.2.15:8180_ in solr 4.3.0 - base_url: http://10.0.2.15:8180/solr in solr 4.2.1, http://10.0.2.15:8180 in solr 4.3.0 My solr logs show this: 8869687 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy – Error while trying to recover. core=WebOrder_Collection:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://10.0.2.15:8180 returned non ok status:404, message:Not Found at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:202) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:346) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223) I have not been able to find more info than that. The Solr cloud diagram shows instance 1 as active and leader, instance 2 as recovering. My solrconfig.xml files are identical, except for the LUCENE_42 or LUCENE_43 tag. Any idea? I hope that it is a configuration issue on my part... Thank you for any help, Nic.
having trouble storing large text blob fields - returns binary address in search results
hello environment: solr 3.5 Can someone help me with the correct configuration for some large text blob fields? We have two fields in Informix tables that are of type text. When we do a search, the results for these fields come back looking like this:

<str name="attributes">[B@17c232ee</str>

I have tried setting them up as clob fields - but this is not working (see details below). I have also tried treating them as plain string fields (removing the references to clob in the DIH) - but this does not work either.

DIH configuration:

<entity transformer="TemplateTransformer,ClobTransformer" name="core1-parts" query="select summ.*, 1 as item_type, 1 as part_cnt, '' as brand, ...">
  <field column="attr_val" name="attributes" clob="true" />
  <field column="rsr_val" name="restrictions" clob="true" />

Schema.xml:

<field name="attributes" type="string" indexed="false" stored="true"/>
<field name="restrictions" type="string" indexed="false" stored="true"/>

thx mark
Re: Migrating from 4.2.1 to 4.3.0
Afternoon, my solr.xml file is the following (and has not changed from 4.2.1 to 4.3.0):

<Context path="/solr" docBase="/home/tcatadm1/apache-tomcat-7.0.39/webapps/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/home/solradm1" override="true"/>
</Context>

From: Mark Miller markrmil...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, May 16, 2013 2:28:52 PM Subject: Re: Migrating from 4.2.1 to 4.3.0 Your solr webapp context appears to be "" rather than "solr". There was a JIRA issue in 4.3 that may have affected this, but I only saw it from a distance, so just a guess. What does it say in solr.xml for the context (an attribute on <cores>)? - Mark On May 16, 2013, at 2:02 PM, M. Flatterie nicflatte...@yahoo.com wrote: Greetings, I just started with Solr a couple weeks ago, with version 4.2.1. I installed the following setup: - ZooKeeper: 3-instance ensemble - Solr: on Tomcat, 4 instances - WebOrder_Collection: instances 1 and 2, 1 shard, 1 master, 1 replica - other_collectionA: instances 3 and 4, 1 shard, 1 master, 1 replica - other_CollectionB: instances 3 and 4, 1 shard, 1 master, 1 replica With version 4.2.1 everything works fine, but I do have a problem if I query instance 3 for something in the WebOrder_Collection. I found that this is a bug in 4.2.1. I must query instances 1 or 2 to get results from WebOrder_Collection. Now that I have upgraded to 4.3.0 I have the following problem: my replicas will not recover. The recovery will retry, and retry, ... forever. Details: if I look at the Zoo, I see that: - node_name: 10.0.2.15:8180_solr in solr 4.2.1, 10.0.2.15:8180_ in solr 4.3.0 - base_url: http://10.0.2.15:8180/solr in solr 4.2.1, http://10.0.2.15:8180 in solr 4.3.0 My solr logs show this: 8869687 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy – Error while trying to recover. core=WebOrder_Collection:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://10.0.2.15:8180 returned non ok status:404, message:Not Found at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:202) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:346) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223) I have not been able to find more info than that. The Solr cloud diagram shows instance 1 as active and leader, instance 2 as recovering. My solrconfig.xml files are identical, except for the LUCENE_42 or LUCENE_43 tag. Any idea? I hope that it is a configuration issue on my part... Thank you for any help, Nic.
Re: Facets referenced by key
Thanks for the excellent clarification. I'll ask the sunspot guys about the localparams issue. I have a patch that would fix it. Thanks, Brendan On May 16, 2013, at 1:42 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I would then like to refer to these 'pseudo' fields later in the request : string. I thought this would be how I'd do it: : : f.my_facet_key.facet.prefix=a_given_prefix ... that syntax was proposed in SOLR-1351 and a patch was made available, but it was never committed (it only supported a subset of faceting, needed more tests, and had unclear behavior about how the defaults were picked if you combined f.key.facet.foo + f.field.facet.foo + facet.foo) : I thought this would work, however it doesn't appear to. What does work is : if I define the prefix and mincount in the local params: : : facet.field={!ex=dt key=my_facet_key facet.prefix=a_given_prefix}the_facet_field Correct, SOLR-4717 added support to Solr 4.3 for specifying all of the facet options as local params, so that syntax works. Given the way the use of Solr and local params have evolved over the years, it was considered a more natural and logical way to specify facet options on a per-field or per-key basis. : Is this expected? I'm also using sunspot and they construct the queries : with keys as in my first example, i.e. facet.field={!ex=dt : key=my_facet_key}the_facet_field&f.my_facet_key.facet.prefix=a_given_prefix I can't comment on that ... i'm not sure why sunspot would assume that behavior would work (unless someone looked at SOLR-1351 once upon a time and assumed that would definitely be official at some point) -Hoss
Re: Oracle Timestamp in SOLR
On 5/16/2013 11:00 AM, Chris Hostetter wrote: There must be *some* way to either tweak your SQL or tweak your JDBC connection properties such that Oracle's JDBC driver will give you a legitimate java.sql.Date or java.sql.Timestamp instead of its own internal class (that doesn't extend java.util.Date) ... otherwise it's just total freaking anarchy. Looks like you can use the V8Compatible connection property or upgrade the Oracle JDBC driver. Upgrading the driver is probably the best option. http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-faq-090281.html#08_01 Thanks, Shawn
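If the connection-property route is needed, a hedged sketch of passing it through DIH (this assumes JdbcDataSource hands extra attributes to the driver as connection properties; the URL and credentials are hypothetical, and the property only applies to the older Oracle drivers the FAQ describes):

<dataSource driver="oracle.jdbc.OracleDriver"
            url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
            user="scott" password="tiger"
            oracle.jdbc.V8Compatible="true" />

The same setting can also be made JVM-wide with -Doracle.jdbc.V8Compatible=true.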
Aggregate word counts over a subset of documents
Is there a way to get aggregate word counts over a subset of documents? For example, given the following data:

{"id": "1", "category": "cat1", "includes": "The green car."},
{"id": "2", "category": "cat1", "includes": "The red car."},
{"id": "3", "category": "cat2", "includes": "The black car."}

I'd like to be able to get total term frequency counts per category, e.g.:

<category name="cat1">
  <lst name="the">2</lst>
  <lst name="car">2</lst>
  <lst name="green">1</lst>
  <lst name="red">1</lst>
</category>
<category name="cat2">
  <lst name="the">1</lst>
  <lst name="car">1</lst>
  <lst name="black">1</lst>
</category>

I was initially hoping to do this within Solr and I tried using the TermFrequencyComponent. This gives term frequencies for individual documents and term frequencies for the entire index, but doesn't seem to help with subsets. For example, TermFrequencyComponent would tell me that car occurs 3 times over all documents in the index and 1 time in document 1, but not that it occurs 2 times over cat1 documents and 1 time over cat2 documents. Is there a good way to use Solr/Lucene to gather aggregate results like this? I've been focusing on just using Solr with XML files but I could certainly write Java code if necessary. Thanks, David
Re: Migrating from 4.2.1 to 4.3.0
On 5/16/2013 12:37 PM, M. Flatterie wrote: Afternoon, my solr.xml file is the following (and has not changed from 4.2.1 to 4.3.0):

<Context path="/solr" docBase="/home/tcatadm1/apache-tomcat-7.0.39/webapps/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/home/solradm1" override="true"/>
</Context>

That is not the solr.xml Mark is referring to. This solr.xml configures Tomcat to load Solr. You will have /home/solradm1/solr.xml as well; that is the one we are concerned with. Thanks, Shawn
Re: Migrating from 4.2.1 to 4.3.0
Oops, sorry about that; since it was referring to a context I thought it was the Tomcat one. Here is the /home/solradm1/solr.xml file (comments removed!):

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="WebOrder_Collection" host="${host:}" hostPort="8180" hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
    <core name="WebOrder_Collection" instanceDir="WebOrder_Collection">
      <property name="solr.data.dir" value="/home/solradm1/WebOrder_Collection/data" />
      <property name="solr.ulog.dir" value="/home/solradm1/WebOrder_Collection/ulog" />
    </core>
  </cores>
</solr>

Note: I configure solr.data.dir and solr.ulog.dir so I can run two instances on the same system and separate the data and ulog directories between the instances. Nic. From: Shawn Heisey s...@elyograg.org To: solr-user@lucene.apache.org Sent: Thursday, May 16, 2013 3:29:41 PM Subject: Re: Migrating from 4.2.1 to 4.3.0 On 5/16/2013 12:37 PM, M. Flatterie wrote: Afternoon, my solr.xml file is the following (and has not changed from 4.2.1 to 4.3.0): <Context path="/solr" docBase="/home/tcatadm1/apache-tomcat-7.0.39/webapps/solr.war" debug="0" crossContext="true"> <Environment name="solr/home" type="java.lang.String" value="/home/solradm1" override="true"/> </Context> That is not the solr.xml Mark is referring to. This solr.xml configures Tomcat to load Solr. You will have /home/solradm1/solr.xml as well; that is the one we are concerned with. Thanks, Shawn
Re: Aggregate word counts over a subset of documents
David, A Pivot Facet could possibly provide these results by the following syntax: facet.pivot=category,includes We would presume that includes is a tokenized field and thus a set of facet values would be rendered from the terms resolving from that tokenization. This would be nested in each category…and, of course, the entire set of documents considered for these facets is constrained by the current query. I think this maps to your requirement. Jason On May 16, 2013, at 12:29 PM, David Larochelle dlaroche...@cyber.law.harvard.edu wrote: Is there a way to get aggregate word counts over a subset of documents? For example, given the following data: {"id": "1", "category": "cat1", "includes": "The green car."}, {"id": "2", "category": "cat1", "includes": "The red car."}, {"id": "3", "category": "cat2", "includes": "The black car."} I'd like to be able to get total term frequency counts per category, e.g.: <category name="cat1"> <lst name="the">2</lst> <lst name="car">2</lst> <lst name="green">1</lst> <lst name="red">1</lst> </category> <category name="cat2"> <lst name="the">1</lst> <lst name="car">1</lst> <lst name="black">1</lst> </category> I was initially hoping to do this within Solr and I tried using the TermFrequencyComponent. This gives term frequencies for individual documents and term frequencies for the entire index, but doesn't seem to help with subsets. For example, TermFrequencyComponent would tell me that car occurs 3 times over all documents in the index and 1 time in document 1, but not that it occurs 2 times over cat1 documents and 1 time over cat2 documents. Is there a good way to use Solr/Lucene to gather aggregate results like this? I've been focusing on just using Solr with XML files but I could certainly write Java code if necessary. Thanks, David
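A sketch of a request using that syntax (host and core name are illustrative; rows=0 just suppresses the document list, and facet.limit=-1 removes the default cap on facet values):

http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.pivot=category,includes&facet.limit=-1

One caveat worth hedging: facet counts are document counts, so a term occurring twice in one document is counted once; this approximates rather than exactly reproduces per-occurrence term frequencies.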
wiki versus downloads versus archives
http://wiki.apache.org/solr/Solr3.1 claims that Solr3.1 is available in a place where it is not, and I can't find a link on the front page to the archive for old releases.
Re: Migrating from 4.2.1 to 4.3.0
On 5/16/2013 1:40 PM, M. Flatterie wrote: Oops, sorry about that; since it was referring to a context I thought it was the Tomcat one. Here is the /home/solradm1/solr.xml file (comments removed!):

  <?xml version="1.0" encoding="UTF-8" ?>
  <solr persistent="true">
    <cores adminPath="/admin/cores" defaultCoreName="WebOrder_Collection" host="${host:}" hostPort="8180" hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
      <core name="WebOrder_Collection" instanceDir="WebOrder_Collection">
        <property name="solr.data.dir" value="/home/solradm1/WebOrder_Collection/data" />
        <property name="solr.ulog.dir" value="/home/solradm1/WebOrder_Collection/ulog" />
      </core>
    </cores>
  </solr>

The hostContext attribute needs changing. It should be this instead: hostContext="${hostContext:/solr}" Looks like the previous version wasn't taking this attribute from your config, but the new version is. This is probably a bug that was fixed in 4.3. Thanks, Shawn
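Applied to the file above, only the cores element changes; a sketch of the corrected opening tag, with everything else left as-is:

  <cores adminPath="/admin/cores" defaultCoreName="WebOrder_Collection" host="${host:}" hostPort="8180" hostContext="${hostContext:/solr}" zkClientTimeout="${zkClientTimeout:15000}">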
Re: wiki versus downloads versus archives
On 5/16/2013 2:21 PM, Benson Margulies wrote: http://wiki.apache.org/solr/Solr3.1 claims that Solr3.1 is available in a place where it is not, and I can't find a link on the front page to the archive for old releases. Download links fixed on the wiki pages for 3.1 and 3.2. Thanks, Shawn
Re: wiki versus downloads versus archives
Thanks. On Thu, May 16, 2013 at 4:28 PM, Shawn Heisey s...@elyograg.org wrote: On 5/16/2013 2:21 PM, Benson Margulies wrote: http://wiki.apache.org/solr/Solr3.1 claims that Solr3.1 is available in a place where it is not, and I can't find a link on the front page to the archive for old releases. Download links fixed on the wiki pages for 3.1 and 3.2. Thanks, Shawn
Re: Zookeeper Ensemble Startup Parameters For SolrCloud?
Hi Shawn; You have some tips about JVM parameters for starting a Solr node. What do you do specially for Solr when you start a Zookeeper ensemble, i.e. heap size? 2013/5/16 Shawn Heisey s...@elyograg.org On 5/16/2013 9:25 AM, Furkan KAMACI wrote: I know that there have been many conversations about SolrCloud startup tips, i.e. which type of garbage collector to use etc. Also I know that there is no exact answer for this question. However I think that folks have some tips about this question. How do you start up your external Zookeeper, with which parameters, and any tips for it? An external zookeeper is just that - external, not part of Solr. I followed the zookeeper docs, and used the normal zookeeper port, 2181: http://zookeeper.apache.org/doc/r3.4.5/ Thanks, Shawn
SurroundQParser does not analyze the query text
Hi, I'm trying to use the Surround Query Parser for two reasons, which are not covered by proximity slops: 1. find documents with two words within a given distance, *unordered* 2. given two lists of words, find documents with (at least) one word from list A and (at least) one word from list B, within a given distance. The surround query parser looks great, but it has one big drawback: it does not analyze the query text. This is documented in the [weak :(] wiki page. Can this issue be solved somehow, or is it a bigger constraint? Should I open a JIRA issue for this? Any work-around?
Re: Zookeeper Ensemble Startup Parameters For SolrCloud?
On 5/16/2013 2:34 PM, Furkan KAMACI wrote: You have some tips about JVM parameters for starting a Solr node. What do you do specially for Solr when you start a Zookeeper ensemble, i.e. heap size? I haven't given it any JVM options. The ZK process on my primary server has a 5GB virtual memory size and is using 131MB of system memory. If you're not going to be creating a large number of collections or replicas and you're not using super-large config files, you could probably limit the max heap to a pretty small number and be OK. Thanks, Shawn
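One common way to cap the ZooKeeper heap is a conf/java.env file, which zkServer.sh picks up via zkEnv.sh. A minimal sketch; the sizes are illustrative starting points, not recommendations from this thread:

  # conf/java.env - sourced by zkEnv.sh when the ensemble starts
  export JVMFLAGS="-Xms256m -Xmx512m"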
Re: Migrating from 4.2.1 to 4.3.0
Great, it works, I am back on track! Thank you!!! Nic From: Shawn Heisey s...@elyograg.org To: solr-user@lucene.apache.org Sent: Thursday, May 16, 2013 4:25:09 PM Subject: Re: Migrating from 4.2.1 to 4.3.0 On 5/16/2013 1:40 PM, M. Flatterie wrote: Oops, sorry about that; since it was referring to a context I thought it was the Tomcat one. Here is the /home/solradm1/solr.xml file (comments removed!):

  <?xml version="1.0" encoding="UTF-8" ?>
  <solr persistent="true">
    <cores adminPath="/admin/cores" defaultCoreName="WebOrder_Collection" host="${host:}" hostPort="8180" hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
      <core name="WebOrder_Collection" instanceDir="WebOrder_Collection">
        <property name="solr.data.dir" value="/home/solradm1/WebOrder_Collection/data" />
        <property name="solr.ulog.dir" value="/home/solradm1/WebOrder_Collection/ulog" />
      </core>
    </cores>
  </solr>

The hostContext attribute needs changing. It should be this instead: hostContext="${hostContext:/solr}" Looks like the previous version wasn't taking this attribute from your config, but the new version is. This is probably a bug that was fixed in 4.3. Thanks, Shawn
SOLR Junit test - How to resolve error - 'thread leaked from SUITE scope'?
I am using SOLR 4.3.0...I am currently getting the below error when running tests for custom SOLR components. The tests pass without any issues but I am getting the below error after the tests are done. Can someone let me know how to resolve this issue?

  thread leaked from SUITE scope at com.solr.activemq.TestWriter:
  [junit] 1) Thread[id=19, name=ActiveMQ Scheduler, state=WAITING, group=TGRP-TestWriter]
  [junit]    at java.lang.Object.wait(Native Method)
  [junit]    at java.lang.Object.wait(Object.java:503)
  [junit]    at java.util.TimerThread.mainLoop(Timer.java:526)
  [junit]    at java.util.TimerThread.run(Timer.java:505)
  [junit] com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at com.solr.activemq.TestWriter:
  [junit] 1) Thread[id=19, name=ActiveMQ Scheduler, state=WAITING, group=TGRP-TestWriter]
  [junit]    at java.lang.Object.wait(Native Method)
  [junit]    at java.lang.Object.wait(Object.java:503)
  [junit]    at java.util.TimerThread.mainLoop(Timer.java:526)
  [junit]    at java.util.TimerThread.run(Timer.java:505)
  [junit]    at __randomizedtesting.SeedInfo.seed([64E0A7A0D98E09EE]:0)
Re: Zookeeper Ensemble Startup Parameters For SolrCloud?
Hi Shawn; I will have a total of 18 Solr nodes in my current pre-prototype environment over one collection, and I don't have large config files. I know that the best and only recommended practice for estimating the heap size my system needs is to run load tests, and I will. I asked this question because of a passage at the Zookeeper wiki: "You should take special care to set your Java max heap size correctly. In particular, you should not create a situation in which ZooKeeper swaps to disk. The disk is death to ZooKeeper. Everything is ordered, so if processing one request swaps the disk, all other queued requests will probably do the same. DON'T SWAP. Be conservative in your estimates: if you have 4G of RAM, do not set the Java max heap size to 6G or even 4G. For example, it is more likely you would use a 3G heap for a 4G machine, as the operating system and the cache also need memory." This may be a more Zookeeper-related question, but one more question too: is there any guidance such as not running Zookeeper on a virtual machine because of performance issues? 2013/5/16 Shawn Heisey s...@elyograg.org On 5/16/2013 2:34 PM, Furkan KAMACI wrote: You have some tips about JVM parameters for starting a Solr node. What do you do specially for Solr when you start a Zookeeper ensemble, i.e. heap size? I haven't given it any JVM options. The ZK process on my primary server has a 5GB virtual memory size and is using 131MB of system memory. If you're not going to be creating a large number of collections or replicas and you're not using super-large config files, you could probably limit the max heap to a pretty small number and be OK. Thanks, Shawn
Re: SOLR Junit test - How to resolve error - 'thread leaked from SUITE scope'?
On 5/16/2013 3:05 PM, bbarani wrote: I am using SOLR 4.3.0...I am currently getting the below error when running tests for custom SOLR components. The tests pass without any issues but I am getting the below error after the tests are done. Can someone let me know how to resolve this issue? thread leaked from SUITE scope at com.solr.activemq.TestWriter: [junit] 1) Thread[id=19, name=ActiveMQ Scheduler, state=WAITING, group=TGRP-TestWriter] It looks like your code incorporates ActiveMQ. That software apparently starts a scheduler thread, and you aren't shutting that down. I'm guessing that part of ActiveMQ initialization is creating some kind of scheduler object, and that you will need to call a close() or shutdown() method on that object as you wrap things up. If that doesn't help, you'll need to consult support resources for ActiveMQ. Thanks, Shawn
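A sketch of that idea in JUnit, assuming the suite holds an ActiveMQ JMS connection in a static field (the field and its setup are hypothetical, not from this thread; the point is that closing the connection releases the client's background threads):

  import javax.jms.Connection;
  import org.junit.AfterClass;

  public class TestWriter {
      // created in test setup (hypothetical)
      private static Connection connection;

      @AfterClass
      public static void shutdownActiveMq() throws Exception {
          if (connection != null) {
              connection.close(); // releases sessions and the client's scheduler threads
              connection = null;
          }
      }
  }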
Re: Speed up import of Hierarchical Data
Thank you Stefan. I am new to Solr and I would need to read up more on CachedSqlEntityProcessor. Do you have any clue where to begin? There do not seem to be any tutorials online. The link you provided seems to have a very short and unclear explanation. After “Example 1” you have “The usage is exactly same as the other one.” What does “other one” refer to? I did not understand the description completely. This description seems to say that if the query is the same as a prior query it would be fetched from the cache. In my case each of the Category queries is unique because they have a unique SKU and Category Level. Would CachedSqlEntityProcessor then help me? Thank you, O. O. Stefan Matheis-2 wrote: That sounds like a perfect match for http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor :)
Re: Aggregate word counts over a subset of documents
Jason, Thanks so much for your suggestion. This seems to do what I need. -- David On Thu, May 16, 2013 at 3:59 PM, Jason Hellman jhell...@innoventsolutions.com wrote: David, A Pivot Facet could possibly provide these results by the following syntax: facet.pivot=category,includes We would presume that includes is a tokenized field and thus a set of facet values would be rendered from the terms resulting from that tokenization. This would be nested in each category…and, of course, the entire set of documents considered for these facets is constrained by the current query. I think this maps to your requirement. Jason On May 16, 2013, at 12:29 PM, David Larochelle dlaroche...@cyber.law.harvard.edu wrote: Is there a way to get aggregate word counts over a subset of documents? For example given the following data:

  { "id": 1, "category": "cat1", "includes": "The green car." },
  { "id": 2, "category": "cat1", "includes": "The red car." },
  { "id": 3, "category": "cat2", "includes": "The black car." }

I'd like to be able to get total term frequency counts per category, e.g.:

  <category name="cat1">
    <lst name="the">2</lst>
    <lst name="car">2</lst>
    <lst name="green">1</lst>
    <lst name="red">1</lst>
  </category>
  <category name="cat2">
    <lst name="the">1</lst>
    <lst name="car">1</lst>
    <lst name="black">1</lst>
  </category>

I was initially hoping to do this within Solr and I tried using the TermFrequencyComponent. This gives term frequencies for individual documents and term frequencies for the entire index but doesn't seem to help with subsets. For example, TermFrequencyComponent would tell me that car occurs 3 times over all documents in the index and 1 time in document 1 but not that it occurs 2 times over cat1 documents and 1 time over cat2 documents. Is there a good way to use Solr/Lucene to gather aggregate results like this? I've been focusing on just using Solr with XML files but I could certainly write Java code if necessary. Thanks, David
Re: Question about Edismax - Solr 4.0
Hi Jack, Thanks for your response again and for helping me out to get through this. The URL is definitely encoded for spaces and it looks like below. As I mentioned in my previous mail, I can't add it to the query parameter as that searches on multiple fields. The title field is defined as below:

  <field name="title" type="text_wc" indexed="true" stored="false" multiValued="true"/>

  q=countryside&rows=20&qt=assdismax&fq=%28title%3A%28,10%29%29&fq=collection:assets

  <requestHandler name="assdismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="qf">title^10 description^5 annotations^3 notes^2 categories</str>
      <str name="pf">title</str>
      <int name="ps">0</int>
      <str name="q.alt">*:*</str>
      <str name="fl">*,score</str>
      <str name="mm">100%</str>
      <str name="q.op">AND</str>
      <str name="sort">score desc</str>
      <str name="facet">true</str>
      <str name="facet.limit">-1</str>
      <str name="facet.mincount">1</str>
      <str name="facet.field">uniq_subtype_id</str>
      <str name="facet.field">component_type</str>
      <str name="facet.field">genre_type</str>
    </lst>
    <lst name="appends">
      <str name="fq">collection:assets</str>
    </lst>
  </requestHandler>

The term 'countryside' needs to be searched against multiple fields including titles, descriptions, annotations, categories and notes, but the UI also has a feature to limit results by providing a title field. I can see that the filter queries are always parsed by LuceneQueryParser; however, I'd expect it to generate the parsed_filter_queries debug output in every situation. I have tried it as the main query with both edismax and lucene defType and it gives me correct output and correct results. But there is some problem when this is used as a filter query, as the parser is not able to parse a comma with a space. Thanks again Jack, please let me know in case you need more inputs from my side. Best Regards, Sandeep On 16 May 2013 18:03, Jack Krupansky j...@basetechnology.com wrote: Could you show us the full query URL - spaces must be encoded in URL query parameters. Also show the actual field XML - you omitted that. Try the same query as a main query, using both defType=edismax and defType=lucene. Note that the filter query is parsed using the Lucene query parser, not edismax, independent of the defType parameter. But you don't have any edismax features in your fq anyway. But you can stick {!edismax} in front of the query to force edismax to be used for the fq, although it really shouldn't change anything. Also, catenate is fine for indexing, but will mess up your queries at query time, so set them to 0 in the query analyzer. Also, make sure you have autoGeneratePhraseQueries="true" on the field type, but that's not the issue here. -- Jack Krupansky -----Original Message----- From: Sandeep Mestry Sent: Thursday, May 16, 2013 12:42 PM To: solr-user@lucene.apache.org Subject: Re: Question about Edismax - Solr 4.0 Thanks Jack for your reply.. The problem is, I'm finding results for fq=title:(,10) but not for fq=title:(, 10) - apologies if that was not clear from my first mail. I have already mentioned the debug analysis in my previous mail.
Additionally, the title field is defined as below:

  <fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1" />
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1" />
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

I have set the catenate options to 1 for all types. I can understand ',' getting ignored when it is on its own (title:(, 10)), but - Why is Solr not searching for 10 in that case, just like it did when the query was (title:(,10))? - And why did the other filter queries (collection:assets) not show up in the debug section? Thanks, Sandeep On 16 May 2013 13:57, Jack Krupansky j...@basetechnology.com wrote: You haven't indicated any problem here! What is the symptom that you actually think is a problem? There is no comma operator in any of the Solr query parsers. Comma is just another character that may or may not be included or discarded depending on the specific field type and analyzer. For example, a white space analyzer will keep commas, but the standard analyzer or the word delimiter filter will discard them. If title were a string type, all punctuation would be
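Following Jack's suggestion above to disable catenation at query time, only the query analyzer would change; a sketch, with the index analyzer left exactly as it is:

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>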
RE: Speed up import of Hierarchical Data
Thank you James. Are there any examples of SortedMapBackedCache? I am new to Solr and I do not find many tutorials in this regard. I just modified the examples and they worked for me. What is a good way to learn these basics? O. O. Dyer, James-2 wrote: See https://issues.apache.org/jira/browse/SOLR-2943 . You can set up 2 DIH handlers. The first would query the CAT_TABLE and save it to a disk-backed cache, using DIHCacheWriter. You then would replace your 3 child entities in the 2nd DIH handler to use DIHCacheProcessor to read back the cached data. This is a little complicated to do, but it would let you just cache the data once and because it is disk-backed, will scale to whatever size the CAT_TABLE is. (For some details, see this thread: http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tt4015514.html) A simpler method is simply to specify cacheImpl=SortedMapBackedCache on the 3 child entities. (This is the same as using CachedSqlEntityProcessor.) It would generate 3 in-memory caches, each with the same data. If CAT_TABLE is small, this would be adequate. In between this would be to create a disk-backed cache Impl (or use the ones at SOLR-2613 or SOLR-2948) and specify it on cacheImpl. It would still create 3 identical caches, but they would be disk-backed and could scale beyond what in-memory can handle. James Dyer Ingram Content Group (615) 213-4311
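As a sketch of the simpler method James describes, a child entity in data-config.xml using an in-memory cache might look like this (table, column, and entity names here are illustrative, not taken from this thread):

  <entity name="item" query="SELECT SKU, NAME FROM ITEM">
    <entity name="category"
            processor="SqlEntityProcessor"
            cacheImpl="SortedMapBackedCache"
            cacheKey="SKU"
            cacheLookup="item.SKU"
            query="SELECT SKU, CATEGORY_NAME FROM CAT_TABLE"/>
  </entity>

With cacheImpl set, the child query runs once, its rows are cached by SKU, and each parent row is joined from the cache instead of issuing a new SQL query per row.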
RE: Deleting an entry from a collection when the key has : in it
: If in my schema, I have the key field set to indexed=false, then is that : maybe the issue? I'm going to try to set that to true and rebuild the : repository and see if that does it. if a field is indexed=false you can not query on it. if you can not query on a field, then you can not delete by a query against that field. -Hoss
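Once the key field is indexed, note that a delete-by-query must escape the colon in the value so it isn't parsed as a field separator, while delete-by-id matches the uniqueKey value directly with no query parsing. A sketch, with a hypothetical key value abc:123:

  <delete><query>key:abc\:123</query></delete>
  <delete><id>abc:123</id></delete>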
Re: SurroundQParser does not analyze the query text
The issue can certainly be solved. But to me, it's actually a bit of a feature by design for the Lucene-level surround query parser to not do analysis, as it seems to have been meant for advanced query writers to piece together sophisticated SpanQuery-based pattern matching kinds of things utilizing their knowledge of how text was analyzed and indexed. But for sure it could be modified to do analysis, probably using the multiterm analyzer feature that is in there elsewhere now. I looked into this when I did the basic work of integrating the surround query parser, and determined it was a lot of work because it'd need changes in the Lucene-level code to leverage analysis, and then glue at the Solr level to be field-type aware and savvy. By all means, open a JIRA and contribute! Workaround? Client-side calls can be made to analyze text, and the client side could build up a query expression based on term-by-term (or phrase) analysis results. Maybe that means a prohibitive number of requests to Solr to build up a query in a way that leverages Solr's field type analysis settings, but it is a technologically possible technique maybe worth considering. Erik On May 16, 2013, at 16:38, Isaac Hebsh wrote: Hi, I'm trying to use the Surround Query Parser for two reasons, which are not covered by proximity slops: 1. find documents with two words within a given distance, *unordered* 2. given two lists of words, find documents with (at least) one word from list A and (at least) one word from list B, within a given distance. The surround query parser looks great, but it has one big drawback: it does not analyze the query text. This is documented in the [weak :(] wiki page. Can this issue be solved somehow, or is it a bigger constraint? Should I open a JIRA issue for this? Any work-around?
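For reference, surround queries covering Isaac's two cases might look like this (a sketch: N is the unordered proximity operator and W the ordered one, and because the parser skips analysis, the terms must be typed in their indexed form, e.g. already lowercased):

  q={!surround}3N(apple, orange)
  q={!surround}5N(OR(red, green), OR(car, truck))

The first finds apple and orange within three positions of each other in either order; the second finds at least one of red/green near at least one of car/truck within five.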
Re: SOLR test framework- ERROR: SolrIndexSearcher opens=1 closes=0
Thanks a lot for your response. I figured out that I was not closing the LocalSolrQueryRequest after handling the response. The error got resolved after closing the request object.
Solr httpCaching for distinct handlers
Hi everybody, I would like to have distinct httpCaching configurations for distinct handlers, i.e. if a request comes in for select, send a cache-control header of 1 minute; and if a request comes in for mlt, then send a cache-control header of 5 minutes. Is there a way to do that in my solrconfig.xml? Thanks!
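For context, the httpCaching settings live under requestDispatcher in solrconfig.xml, and in the stock configuration they apply to the whole core rather than per handler; a sketch of that core-wide form (the max-age value is illustrative):

  <requestDispatcher>
    <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
      <cacheControl>max-age=60, public</cacheControl>
    </httpCaching>
  </requestDispatcher>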
Null identity service When Running Solr 4.2.1 with log4j
I have Solr 4.2.1 and want to use log4j. I have followed the wiki. Here are my jar versions:

  java -jar start.jar --version
  Active Options: [default, *]
  Version Information on 15 entries in the classpath.
  Note: order presented here is how they would appear on the classpath.
        changes to the OPTIONS=[option,option,...] command line option will be reflected here.
   0: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-xml-8.1.8.v20121106.jar
   1: 3.0.0.v201112011016 | ${jetty.home}/lib/servlet-api-3.0.jar
   2: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-http-8.1.8.v20121106.jar
   3: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-continuation-8.1.8.v20121106.jar
   4: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-server-8.1.8.v20121106.jar
   5: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-security-8.1.8.v20121106.jar
   6: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-servlet-8.1.8.v20121106.jar
   7: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-webapp-8.1.8.v20121106.jar
   8: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-deploy-8.1.8.v20121106.jar
   9: 1.7.5 | ${jetty.home}/lib/ext/jcl-over-slf4j-1.7.5.jar
  10: 1.2.17 | ${jetty.home}/lib/ext/log4j-1.2.17.jar
  11: 1.7.5 | ${jetty.home}/lib/ext/slf4j-api-1.7.5.jar
  12: 1.7.5 | ${jetty.home}/lib/ext/slf4j-log4j12-1.7.5.jar
  13: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-util-8.1.8.v20121106.jar
  14: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-io-8.1.8.v20121106.jar

I created a log4j.properties under the etc folder and this is inside it:

  # Logging level
  log4j.rootLogger=WARN, file

  #- size rotation with log cleanup.
  log4j.appender.file=org.apache.log4j.RollingFileAppender
  log4j.appender.file.MaxFileSize=4MB
  log4j.appender.file.MaxBackupIndex=9

  #- File to log to and log format
  log4j.appender.file.File=logs/solr.log
  log4j.appender.file.layout=org.apache.log4j.PatternLayout
  log4j.appender.file.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m\n

When I run start.jar I get this:

  java -Dlog4j.debug -Dlog4j.configuration=file:home/kk/Desktop/preprop/etc/log4j.properties -jar start.jar
  log4j: Using URL [file:home/kk/Desktop/preprop/etc/log4j.properties] for automatic log4j configuration.
  log4j: Reading configuration from URL file:home/kk/Desktop/preprop/etc/log4j.properties
  log4j: Parsing for [root] with value=[WARN, file].
  log4j: Level token is [WARN].
  log4j: Category root set to WARN
  log4j: Parsing appender named file.
  log4j: Parsing layout options for file.
  log4j: Setting property [conversionPattern] to [%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m ].
  log4j: End of parsing for file.
  log4j: Setting property [maxBackupIndex] to [9].
  log4j: Setting property [file] to [logs/solr.log].
  log4j: Setting property [maxFileSize] to [4MB].
  log4j: setFile called: logs/solr.log, true
  log4j: setFile ended
  log4j: Parsed file options.
  log4j: Finished configuring.
  Null identity service, trying login service: null
  Finding identity service: null

What am I missing?
Re: Null identity service When Running Solr 4.2.1 with log4j
When I check under the logs folder I see that there is a file called solr.log and it has this line:

  WARN - 2013-05-17 02:16:47.688; org.apache.solr.core.CoreContainer; Log watching is not yet implemented for log4j

2013/5/17 Furkan KAMACI furkankam...@gmail.com I have Solr 4.2.1 and want to use log4j. I have followed the wiki. Here are my jar versions:

  java -jar start.jar --version
  Active Options: [default, *]
  Version Information on 15 entries in the classpath.
  Note: order presented here is how they would appear on the classpath.
        changes to the OPTIONS=[option,option,...] command line option will be reflected here.
   0: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-xml-8.1.8.v20121106.jar
   1: 3.0.0.v201112011016 | ${jetty.home}/lib/servlet-api-3.0.jar
   2: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-http-8.1.8.v20121106.jar
   3: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-continuation-8.1.8.v20121106.jar
   4: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-server-8.1.8.v20121106.jar
   5: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-security-8.1.8.v20121106.jar
   6: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-servlet-8.1.8.v20121106.jar
   7: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-webapp-8.1.8.v20121106.jar
   8: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-deploy-8.1.8.v20121106.jar
   9: 1.7.5 | ${jetty.home}/lib/ext/jcl-over-slf4j-1.7.5.jar
  10: 1.2.17 | ${jetty.home}/lib/ext/log4j-1.2.17.jar
  11: 1.7.5 | ${jetty.home}/lib/ext/slf4j-api-1.7.5.jar
  12: 1.7.5 | ${jetty.home}/lib/ext/slf4j-log4j12-1.7.5.jar
  13: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-util-8.1.8.v20121106.jar
  14: 8.1.8.v20121106 | ${jetty.home}/lib/jetty-io-8.1.8.v20121106.jar

I created a log4j.properties under the etc folder and this is inside it:

  # Logging level
  log4j.rootLogger=WARN, file

  #- size rotation with log cleanup.
  log4j.appender.file=org.apache.log4j.RollingFileAppender
  log4j.appender.file.MaxFileSize=4MB
  log4j.appender.file.MaxBackupIndex=9

  #- File to log to and log format
  log4j.appender.file.File=logs/solr.log
  log4j.appender.file.layout=org.apache.log4j.PatternLayout
  log4j.appender.file.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m\n

When I run start.jar I get this:

  java -Dlog4j.debug -Dlog4j.configuration=file:home/kk/Desktop/preprop/etc/log4j.properties -jar start.jar
  log4j: Using URL [file:home/kk/Desktop/preprop/etc/log4j.properties] for automatic log4j configuration.
  log4j: Reading configuration from URL file:home/kk/Desktop/preprop/etc/log4j.properties
  log4j: Parsing for [root] with value=[WARN, file].
  log4j: Level token is [WARN].
  log4j: Category root set to WARN
  log4j: Parsing appender named file.
  log4j: Parsing layout options for file.
  log4j: Setting property [conversionPattern] to [%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m ].
  log4j: End of parsing for file.
  log4j: Setting property [maxBackupIndex] to [9].
  log4j: Setting property [file] to [logs/solr.log].
  log4j: Setting property [maxFileSize] to [4MB].
  log4j: setFile called: logs/solr.log, true
  log4j: setFile ended
  log4j: Parsed file options.
  log4j: Finished configuring.
  Null identity service, trying login service: null
  Finding identity service: null

What am I missing?
Controlling which node(s) hold(s) a collection
Hi, Is it possible to control on which node(s) a collection should be placed? I've looked at http://wiki.apache.org/solr/SolrCloud and http://wiki.apache.org/solr/CoreAdmin and have searched the ML archives, but couldn't find any mention of that. Use case: * Want to use SolrCloud for large indices that I want to shard and replicate * Have a number of smaller indices that need to live in the same cluster, but that I don't want to shard - queries are fast when executed against the whole index on a single server, and they use join and pivot faceting, neither of which works with sharded indices I have 30+ such non-shardable indices of varying sizes and I want to make sure they are distributed over all cluster nodes nice and evenly. I'm assuming there is no better way than to manually control placement of my 1-shard collections (if that's even doable), but if there is a better way, I'm all eyeballs! Thanks, Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html
Re: Null identity service When Running Solr 4.2.1 with log4j
On 5/16/2013 5:22 PM, Furkan KAMACI wrote: Null identity service, trying login service: null Finding identity service: null What am I missing? That's a message from jetty that has nothing to do with Solr. https://bugs.eclipse.org/bugs/show_bug.cgi?id=396295 You'll probably need to upgrade your jetty version to get rid of it, but it's harmless. When I check under the logs folder I see that there is a file called solr.log and it has this line: WARN - 2013-05-17 02:16:47.688; org.apache.solr.core.CoreContainer; Log watching is not yet implemented for log4j This is normal for 4.2.1 - it means that you can't view the log in the admin UI, because the UI doesn't support log4j. You'll find that with your logging level set to WARN, Solr logs next to nothing - that message about the log watching may be the only message you see. Thanks, Shawn
Re: Null identity service When Running Solr 4.2.1 with log4j
Thanks Shawn. I was just wondering how other people have used log4j with 4.2.1, since there is a paragraph on "Using log4j with Solr from source, 4.2.1 or earlier" at the wiki. 2013/5/17 Shawn Heisey s...@elyograg.org On 5/16/2013 5:22 PM, Furkan KAMACI wrote: Null identity service, trying login service: null Finding identity service: null What am I missing? That's a message from jetty that has nothing to do with Solr. https://bugs.eclipse.org/bugs/show_bug.cgi?id=396295 You'll probably need to upgrade your jetty version to get rid of it, but it's harmless. When I check under the logs folder I see that there is a file called solr.log and it has this line: WARN - 2013-05-17 02:16:47.688; org.apache.solr.core.CoreContainer; Log watching is not yet implemented for log4j This is normal for 4.2.1 - it means that you can't view the log in the admin UI, because the UI doesn't support log4j. You'll find that with your logging level set to WARN, Solr logs next to nothing - that message about the log watching may be the only message you see. Thanks, Shawn
Re: Controlling which node(s) hold(s) a collection
You can control it simply with the CoreAdmin API - the core is created at the location of whatever URL you use…simply fire the creates at whatever nodes you want the collection to live on. The collections API also optionally takes a list of node names to use. - Mark On May 16, 2013, at 7:34 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Is it possible to control on which node(s) a collection should be placed? I've looked at http://wiki.apache.org/solr/SolrCloud and http://wiki.apache.org/solr/CoreAdmin and have searched the ML archives, but couldn't find any mention of that. Use case: * Want to use SolrCloud for large indices that I want to shard and replicate * Have a number of smaller indices that need to live in the same cluster, but that I don't want to shard - queries are fast when executed against the whole index on a single server, and they use join and pivot faceting, neither of which works with sharded indices I have 30+ such non-shardable indices of varying sizes and I want to make sure they are distributed over all cluster nodes nice and evenly. I'm assuming there is no better way than to manually control placement of my 1-shard collections (if that's even doable), but if there is a better way, I'm all eyeballs! Thanks, Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html
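As a sketch of the CoreAdmin route, sending a create to a specific node pins that replica to it (host, core, and collection names here are illustrative):

  http://node1:8983/solr/admin/cores?action=CREATE&name=smallindex1_shard1_replica1&collection=smallindex1&shard=shard1

Repeating the call against different hosts places each 1-shard collection exactly where you want it.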
Re: Null identity service When Running Solr 4.2.1 with log4j
Ok, I have used 4.3.0's jetty and lib folder (of course plus log4j.properties) and it works with 4.2.1 now. 2013/5/17 Furkan KAMACI furkankam...@gmail.com Thanks Shawn. I was just wondering how other people have used log4j with 4.2.1, since there is a paragraph on "Using log4j with Solr from source, 4.2.1 or earlier" at the wiki. 2013/5/17 Shawn Heisey s...@elyograg.org On 5/16/2013 5:22 PM, Furkan KAMACI wrote: Null identity service, trying login service: null Finding identity service: null What am I missing? That's a message from jetty that has nothing to do with Solr. https://bugs.eclipse.org/bugs/show_bug.cgi?id=396295 You'll probably need to upgrade your jetty version to get rid of it, but it's harmless. When I check under the logs folder I see that there is a file called solr.log and it has this line: WARN - 2013-05-17 02:16:47.688; org.apache.solr.core.CoreContainer; Log watching is not yet implemented for log4j This is normal for 4.2.1 - it means that you can't view the log in the admin UI, because the UI doesn't support log4j. You'll find that with your logging level set to WARN, Solr logs next to nothing - that message about the log watching may be the only message you see. Thanks, Shawn