Taxonomy in SOLR
Hi, I am trying out Solr and I have one question. In the schema I set up, there are 10 fields that always carry the same data (hierarchical taxonomies), but with 4 million documents the disk space and indexing time must be large. I need these fields for auto-complete. Is there another way to do this type of operation? Damien
Re: Taxonomy in SOLR
Hi Damien, can you provide a schema sample plus example data? Since your information is really general, I think no one can give you situation-specific advice. Regards
Re: Getting started with writing parser
my solrconfig.xml: http://pastebin.com/XDg0L4di
my schema.xml: http://pastebin.com/3Vqvr3C0
my try.xml: http://pastebin.com/YWsB37ZW

- DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very curious.
RE: DIH serialize
Hi Dennis, thank you for your answer, but I didn't understand why you say it doesn't need serialization. I'm with option C, but the main question is: how do I put the result of many fields (SELECT * FROM ...) into one field? thanks, Rich

-----Original Message-----
From: Dennis Gearon [mailto:gear...@sbcglobal.net]
Sent: Monday, January 24, 2011 02:07
To: solr-user@lucene.apache.org
Subject: Re: DIH serialize

Depends on your process chain to the eventual viewer/consumer of the data. The questions to ask are:

A/ Is the data IN Solr going to be viewed or processed in its original form?
  --set stored="true"
  --no serialization needed.

B/ If it's going to be analyzed and searched for separate from any other field, the analyzing will put it into an unreadable form. If you need to see it, then
  --set indexed="true" and stored="true"
  --no serialization needed.

C/ If it's NOT going to be viewed AS IS, and it's not going to be searched for AS IS (i.e. other columns will be how the data is found), and you have another, serializable format:
  --set indexed="false" and stored="true"
  --serialize AS PER THE INTENDED APPLICATION; not sure that Solr can do that at all.

D/ If it's NOT going to be viewed AS IS, BUT it IS going to be searched for AS IS (this column will be how the data is found), and you have another, serializable format:
  --you need to put it into TWO columns:
  --A SERIALIZED FIELD: set indexed="false" and stored="true"
  --AN UNSERIALIZED FIELD: set indexed="true" and stored="false"
  --serialize AS PER THE INTENDED APPLICATION; not sure that Solr can do that at all.

Hope that helps!
Dennis Gearon

Signature Warning: It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die.

----- Original Message -----
From: Papp Richard ccode...@gmail.com
To: solr-user@lucene.apache.org
Sent: Sun, January 23, 2011 2:02:05 PM
Subject: DIH serialize

Hi all, I wasted the last few hours trying to serialize some column values (from MySQL) into a Solr column, but I just can't find such a function. I'll use the value in PHP - I don't know if it is possible to serialize in PHP style at all. This is what I tried, and it works to a certain degree:

in schema.xml:
  <field name="main_timetable" type="text" indexed="false" stored="true" multiValued="true"/>

in the DIH xml:
  <dataConfig>
    <script><![CDATA[
      function my_serialize(row) {
        row.put('main_timetable', row.toString());
        return row;
      }
    ]]></script>
    ...
    <entity name="main_timetable"
            query="SELECT * FROM shop_time_table stt WHERE stt.shop_id = '${shop.id}'"
            transformer="script:my_serialize">
    ...

Can I use Java directly in script (<script language="Java">)? How could I achieve this? Or any other idea? I need these values together (from a row) and I need them in PHP to handle the result easily.

thanks, Rich
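A sketch of one way to answer Rich's question: build the combined value by hand in the ScriptTransformer instead of relying on row.toString(), and emit JSON so PHP can read it back with json_decode(). The column and field names below are illustrative, not from the original post:

  <dataConfig>
    <script><![CDATA[
      /* called once per row; "row" is a java.util.Map of column -> value */
      function serialize_timetable(row) {
        var json = '{"day":"'   + row.get('day')        + '",' +
                    '"open":"'  + row.get('open_time')  + '",' +
                    '"close":"' + row.get('close_time') + '"}';
        row.put('main_timetable', json);  /* lands in the multiValued stored field */
        return row;
      }
    ]]></script>
    <document>
      <entity name="shop" query="SELECT id FROM shop">
        <entity name="timetable"
                query="SELECT day, open_time, close_time FROM shop_time_table WHERE shop_id = '${shop.id}'"
                transformer="script:serialize_timetable"/>
      </entity>
    </document>
  </dataConfig>

On the PHP side, json_decode() on each value of main_timetable then yields the row back as an array, which avoids PHP's native serialize() format entirely.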
Re: Taxonomy in SOLR
My schema:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<!-- Document -->
<field name="lead" type="string" indexed="true" stored="true"/>
<field name="title" type="string" indexed="true" stored="true" required="true"/>
<field name="text" type="string" indexed="true" stored="true" required="true"/>
<!-- taxo -->
<dynamicField name="*_taxon_label" type="string" indexed="true" stored="true"/>
<dynamicField name="*_taxon_type" type="string" indexed="true" stored="true"/>
<dynamicField name="*_taxon_hierarchy" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="type" type="string" indexed="true" stored="true" required="true"/>

Le 24/01/2011 09:56, Em a écrit :
> Hi Damien, can you provide a schema sample plus example data? Since your information is really general, I think no one can give you situation-specific advice. Regards
Re: Taxonomy in SOLR
Hi Damien, why are you storing the taxonomies? When it comes to faceting, only the indexed values matter. If there is a meaningful difference between the indexed and the stored value, I would prefer to use an RDBMS or something like that to reduce redundancy. Does this help? Regards
Indexing spatial columns
Hi, I'm a bit of a Solr beginner. I have installed Solr 4.0 and I'm trying to index some spatial data stored in a SQL Server instance. I'm using the DataImportHandler; here is my data-config.xml:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost\sqlserver08;databaseName=Spatial"
              user="sa" password="sqlserver08"/>
  <document>
    <entity name="poi" query="select OBJECTID,CATEGORY,NAME,POINT_X,POINT_Y from NZ_POI">
      <field column="OBJECTID" name="id"/>
      <field column="CATEGORY" name="category"/>
      <field column="NAME" name="name"/>
      <field column="POINT_X" name="lat"/>
      <field column="POINT_Y" name="lon"/>
    </entity>
  </document>
</dataConfig>

In my schema file I have the following definitions:

<field name="category" type="string" indexed="true" stored="true"/>
<field name="name" type="string" indexed="true" stored="true"/>
<field name="lat" type="tdouble" indexed="true" stored="true"/>
<field name="lon" type="tdouble" indexed="true" stored="true"/>
<copyField source="category" dest="text"/>
<copyField source="name" dest="text"/>

I have completed a data import with no errors in the log as far as I can tell. However, when I inspect the schema I do not see the column names lat/lon. When sending the query:

http://localhost:8080/Solr/select/?q=Camp AND _val_:recip(dist(2, lon, lat, 44.794, -93.2696), 1, 1, 0)^100

I get an "undefined column" error. Does anybody have any ideas about whether the above is the correct procedure for indexing spatial data? Cheers S
Re: Taxonomy in SOLR
Yes, I am not obliged to store the taxonomies. My taxonomies are of this type:

english_taxon_label = Berlin
english_taxon_type = location
english_taxon_hierarchy = 0/world 1/world/europe 2/world/europe/germany 3/world/europe/germany/berlin

I need *_taxon_hierarchy for faceting and the label for auto-complete. With an RDBMS I have 100 entries max for one taxonomy, but with Solr and 4 million documents the redundancy is huge, no? And I have 10 different taxonomies per document. Damien

Le 24/01/2011 10:30, Em a écrit :
> Hi Damien, why are you storing the taxonomies? When it comes to faceting, only the indexed values matter. [...]
Re: Taxonomy in SOLR
100 entries per taxon? Well, with Solr you get 100 taxon-entries * 4 million docs * 10 taxons. If your indexed taxon-versions look okay, you could leave out the DB overhead and do everything in Solr.
Re: Delta Import occasionally missing records.
Thank you for your response. In what way is 'timestamp' not perfect? I've looked into the SolrEntityProcessor and added a timestamp field to our index. However, I'm struggling to work out a query to get the max value of the timestamp field, and does the SolrEntityProcessor entity appear before the root entity or does it wrap around the root entity?

On 22 January 2011 07:24, Lance Norskog-2 [via Lucene] wrote:

> The timestamp thing is not perfect. You can instead do a search against Solr and find the latest timestamp in the index. SOLR-1499 allows you to search against Solr in the DataImportHandler.
>
> On Fri, Jan 21, 2011 at 2:27 AM, btucker wrote:
>> Hello. We've just started using Solr to provide search functionality for our application, with the DataImportHandler performing a delta-import every minute, fired by crontab. This works great; however, it does occasionally miss records that are added to the database while the delta-import is running. Our data-config.xml has the following queries in its root entity:
>>
>> query="SELECT id, date_published, date_created, publish_flag FROM Item WHERE id > 0 AND record_type_id=0 ORDER BY id DESC"
>> preImportDeleteQuery="SELECT item_id AS id FROM gnpd_production.item_deletions"
>> deletedPkQuery="SELECT item_id AS id FROM gnpd_production.item_deletions WHERE deletion_date >= SUBDATE('${dataimporter.last_index_time}', INTERVAL 5 MINUTE)"
>> deltaImportQuery="SELECT id, date_published, date_created, publish_flag FROM Item WHERE id > 0 AND record_type_id=0 AND id=${dataimporter.delta.id} ORDER BY id DESC"
>> deltaQuery="SELECT id, date_published, date_created, publish_flag FROM Item WHERE id > 0 AND record_type_id=0 AND sys_time_stamp >= SUBDATE('${dataimporter.last_index_time}', INTERVAL 1 MINUTE) ORDER BY id DESC"
>>
>> I think the problem I'm having comes from the way Solr stores the last_index_time in conf/dataimport.properties. As stated on the wiki: "When delta-import command is executed, it reads the start time stored in conf/dataimport.properties. It uses that timestamp to run delta queries and after completion, updates the timestamp in conf/dataimport.properties." To me this seems to indicate that any records with a timestamp between when the dataimport starts and ends will be missed, as last_index_time is set to when the import completes. This doesn't seem quite right to me; I would have expected last_index_time to refer to when the dataimport was last STARTED, so that there were no gaps in the timestamps covered. I changed the deltaQuery of our config to include the SUBDATE by INTERVAL 1 MINUTE statement to alleviate this problem, but it only covers cases where the delta-import takes less than a minute. Any ideas as to how this can be overcome, other than increasing the INTERVAL to something larger?
>>
>> Regards, Barry Tucker
-- Lance Norskog
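For the approach Lance suggests, the index itself can answer "what is the newest committed timestamp?". A sketch, assuming a timestamp field added to the schema with a NOW default (field name, host and port are illustrative):

in schema.xml:
  <field name="timestamp" type="date" indexed="true" stored="true" default="NOW"/>

then ask Solr for the single newest document:
  http://localhost:8983/solr/select?q=*:*&sort=timestamp+desc&rows=1&fl=timestamp

Using the value returned there (minus a small safety margin) as the lower bound of the deltaQuery ties the delta to what was actually committed, rather than to when the previous import finished, which closes the start/finish gap described above.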
please help Problem with dataImportHandler
This is the error that I'm getting.. no idea of what it is..

/apache-solr-1.4.1/example/exampledocs# java -jar post.jar sample.txt
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file sample.txt
SimplePostTool: FATAL: Solr returned an error: Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change <abortOnConfigurationError>false</abortOnConfigurationError> in null.
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
  at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:190)
  at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:101)
  at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:113)
  at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:508)
  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:588)
  at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
  at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
  at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
  at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
  at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
  at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
  at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
  at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at org.mortbay.jetty.han... (truncated)
root@karunya-desktop:/home/karunya/apache-solr-1.4.1/example/exampledocs#

- DINESHKUMAR . M
Re: please help Problem with dataImportHandler
This may be a dumb question, but is the XML encoded in UTF-8?

On Mon, Jan 24, 2011 at 7:08 AM, Dinesh wrote:
> This is the error that I'm getting.. no idea of what it is..
> SimplePostTool: FATAL: Solr returned an error: Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. [stack trace snipped; see the original message above]

--
Ezequiel. http://www.ironicnet.com
Re: please help Problem with dataImportHandler
Actually it's a log file. I separately created a handler for that... it's not XML.

- DINESHKUMAR . M
Re: Taxonomy in SOLR
Thanks Em. How can I calculate the indexing time, update time and disk space used by one taxonomy?

Le 24/01/2011 10:58, Em a écrit :
> 100 entries per taxon? Well, with Solr you get 100 taxon-entries * 4 million docs * 10 taxons. If your indexed taxon-versions look okay, you could leave out the DB overhead and do everything in Solr.
How data is replicating from Master to Slave?
Hi, I'm currently facing an issue with Solr (specifically with slave replication) and, after having spent quite a bit of time reading online, I find myself having to ask for some enlightenment. To be more factual, here is the context that led me to this question. If the website administrator edits an existing category name, then I need to re-index all the documents with the newly edited category. Suppose the category is linked with more than 10 million records; I need to re-index all 10 million documents in Solr. In the case of MySQL, the master server writes updates to its binary log files and maintains an index of those files; these binary log files serve as a record of updates to be sent to the slave servers. My doubt is: in Solr, how is the data replicated from master to slave? I'd like to know the internal process of data replication. Is that huge amount of data (10 million records) copied from master to slave? This is my first work with Solr, so I'm not sure how to tackle this issue. Regds dhanesh s.r
fieldType textgen. tokens > 2
Hello. My field "sender" with fieldType "textgen" cannot find any documents which are more than 2 tokens long.

q=sender:name1 name2 name3 => 0 documents found. WHY???

That is my field (original from the default schema.xml):

<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

---
System: One server, 12 GB RAM, 2 Solr instances, 7 cores, 1 core with 31 million documents, other cores 100,000.
- Solr1 for search requests - commit every minute - 4GB Xmx
- Solr2 for update requests - delta every 2 minutes - 4GB Xmx
Re: fieldType textgen. tokens > 2
It is not the fieldType but your query that is giving you trouble. You only specify the field name for the value name1, so Solr will use the default field for the values name2 and name3. You also omitted an operator, so Solr will use the default operator instead. See your schema.xml for the values of these defaults, and use debugQuery=true to, well, debug queries.

On Monday 24 January 2011 11:48:07 stockii wrote:
> Hello. My field "sender" with fieldType "textgen" cannot find any documents which are more than 2 tokens long. [fieldType definition snipped]

-- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
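A quick way to see this, assuming the stock example schema where defaultSearchField is "text" and the default operator is OR: with debugQuery=true, a query like q=sender:name1 name2 name3 parses to roughly

  <str name="parsedquery">sender:name1 text:name2 text:name3</str>

so only name1 is actually searched in the sender field; name2 and name3 fall back to the default field, which is usually why the match count drops to zero.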
Re: How data is replicating from Master to Slave?
It's all explained on the wiki: http://wiki.apache.org/solr/SolrReplication#How_does_the_slave_replicate.3F

On Monday 24 January 2011 11:25:45 dhanesh wrote:
> Hi, I'm currently facing an issue with Solr (specifically with slave replication)... My doubt is: in Solr, how is the data replicated from master to slave? [rest of the question snipped; see above]

-- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
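For a sense of what that page describes: a 1.4-style Java replication slave is configured with just a ReplicationHandler that polls the master. A minimal sketch (master URL and poll interval are illustrative):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

On each poll the slave compares its index version with the master's and downloads only the new or changed segment files. So re-indexing 10 million documents on the master does not automatically mean 10 million documents are copied; only the changed segments travel over the wire, although an optimize can rewrite most of them and force a near-full copy.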
Re: fieldType textgen. tokens > 2
That is my query: q=sender:name1+name2+name3. Exactly, the request is: q=sender:(name1+name2+name3+OR+sender_2:name1+name2+name3). So Solr is using another field for name2 and name3? Debugging cannot help me, or I don't understand the debugging... When I search only for name1 + name2 the search is okay, but with name3 it is not... In my test environment I used the same fieldType and it works fine...
Migrating from 1.4.0 to 1.4.1 solr
Hi, I want to migrate from 1.4.0 to 1.4.1. I tried keeping the same conf for the cores as in 1.4.0, added the relevant core names in solr.xml and restarted Solr, but the old cores don't show up in the browser at localhost:8983. There were a few cores in examples/multicore/ in the solr 1.4.1 source I downloaded; these cores, when included in solr.xml, do show up in the browser. Please let me know the reason. Is there anything I need to do for the core migration? I don't have any data in these cores. Also, if there were data, is there a nice way of migrating from 1.4.0 to 1.4.1 (one that does not involve reindexing)? Regards, Prasad
Re: Migrating from 1.4.0 to 1.4.1 solr
We can't guess what's wrong with the cores, but you need to reindex anyway: http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/CHANGES.txt

On Monday 24 January 2011 12:06:10 Prasad Joshi wrote:
> Hi, I want to migrate from 1.4.0 to 1.4.1. [...]

-- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Solr with Unknown Lucene Index?
Having found some code that searches a Lucene index, the only analyzer referenced is Lucene.Net.Analysis.Standard.StandardAnalyzer. How can I map this in Solr? The example schema doesn't seem to mention it, and specifying 'text' or 'string' for every field doesn't seem to help. Thanks, Lee

On 22/01/2011 21:50, Erick Erickson wrote:
> Sorry, I was out of town for a while. Luke just reads stuff; it doesn't try to interpret any schema. Solr makes certain assumptions about what *should* be in the index based on the schema. So getting Solr to just use a Lucene index really involves knowing that Lucene used, say, a StandardAnalyzer followed by a LowerCaseFilter followed by ... for some field. And there's no way I know of to find that information out from a raw Lucene index. If you don't get things to match, your results will... er... vary. But perhaps you can guess well enough to make it work, although upgrading will be a problem. I really think your effort would be best spent finding the original indexing or querying code if at all possible, seeing how that code defined the analysis chain for each field, and using that as a basis for creating a close-enough schema. Best, Erick
>
> On Thu, Jan 20, 2011 at 3:59 AM, Lee Goddard (lee...@gmail.com) wrote:
>> Thanks, Erick. I think my question comes down to: how does Luke know how to read the indexes? I will try the Luke mailing list. Cheers, Lee
>>
>> On 19/01/2011 17:49, Erick Erickson wrote:
>>> I don't really think this is possible/reasonable. There's nothing fixed about a Lucene index; you could index a field in different documents with any number of analysis chains. The tricky part here will be, as you've discovered, finding a way to match the Solr schema closely enough to get your desired results. Are you sure there's no way to re-index the data? Or find the original code that indexed it? Best, Erick
>>>
>>> On Wed, Jan 19, 2011 at 3:22 AM, Lee Goddard (lee...@gmail.com) wrote:
>>>> I have to use some Lucene indexes, and Solr looks like the perfect solution. However, all I know about the Lucene indexes is what Luke tells me, and simply setting the schema to represent all fields as text does not seem to be working -- though as this is my first Solr, I am not sure if that is due to some other issue. Is there some way to ascertain how the Solr schema should describe the Lucene fields? Many thanks in anticipation, Lee
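If the old code really did use only StandardAnalyzer, its chain (StandardTokenizer, StandardFilter, LowerCaseFilter, StopFilter) can be approximated in Solr with a fieldType like the sketch below. Note that StandardAnalyzer ships with Lucene's built-in English stopword list, so stopwords.txt would have to reproduce that list for the match to be exact:

  <fieldType name="text_std" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    </analyzer>
  </fieldType>

Pointing every full-text field at a type like this is more likely to line up with the existing index than the 'text' or 'string' types from the example schema.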
Re: Taxonomy in SOLR
Hi Damien, ahm, the formula I wrote was no definitive guide, just some numbers I combined to visualize the amount of data; perhaps not even a complete formula. Well, when you use your taxonomy as indexed-only, you do not double the used disk space when you index two equal documents. Lucene, and therefore Solr, works with an inverted index: every document is mapped against its indexed terms. So your index size will depend on the number of unique taxonomy terms and the pointers from the documents to those terms. That's it. Usually the disk space used by an index is much smaller than the size of the original data. I hope what I tried to explain was easy to understand. Regards
Re: please help Problem with dataImportHandler
And what do the logs say about it?

On Mon, Jan 24, 2011 at 7:15 AM, Dinesh wrote:
> Actually it's a log file. I separately created a handler for that... it's not XML.

--
Ezequiel. http://www.ironicnet.com
Re: please help Problem with dataImportHandler
It's a DHCP log.. I want to index it.

- DINESHKUMAR . M
Re: Taxonomy in SOLR
Le 24/01/2011 13:10, Em a écrit :
> Hi Damien, ahm, the formula I wrote was no definitive guide, just some numbers I combined to visualize the amount of data; perhaps not even a complete formula. Well, when you use your taxonomy as indexed-only, you do not double the used disk space when you index two equal documents.

So, five documents, or 4 million, with the same taxonomy use the same disk space as one?

> Lucene, and therefore Solr, works with an inverted index: every document is mapped against its indexed terms. So your index size will depend on the number of unique taxonomy terms and the pointers from the documents to those terms. That's it. Usually the disk space used by an index is much smaller than the size of the original data. I hope what I tried to explain was easy to understand.

Thanks, it's very helpful! Where can I find more explanation of the internal structure of the Lucene index? Damien
Re: please help Problem with dataImportHandler
I mean, when you run the DIH, what's the output of the Solr log? Probably there is more info about what's happening...

On Mon, Jan 24, 2011 at 10:28 AM, Dinesh wrote:
> It's a DHCP log.. I want to index it.

--
Ezequiel. http://www.ironicnet.com
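If the custom handler route proves painful, DIH itself can read a log file line by line. A sketch using LineEntityProcessor plus RegexTransformer (both part of DIH); the path, field names and regexes are illustrative and assume a typical dhcpd syslog line:

  <dataConfig>
    <dataSource type="FileDataSource"/>
    <document>
      <entity name="dhcplog"
              processor="LineEntityProcessor"
              url="/var/log/dhcpd.log"
              rootEntity="true"
              transformer="RegexTransformer">
        <!-- LineEntityProcessor puts each line into the "rawLine" column -->
        <field column="ip"  sourceColName="rawLine" regex="DHCPACK on (\S+)"/>
        <field column="mac" sourceColName="rawLine" regex="to (\S+) via"/>
      </entity>
    </document>
  </dataConfig>

Each matched group becomes a field value, so one log line becomes one indexed document.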
Re: Indexing spatial columns
Hi MapButcher, there are a couple of things going on here.

1. The spatial functionality is confusing between versions of Solr. I wish someone would update the Solr Spatial Search wiki page.
2. You will want to use the jTDS driver here instead of the one from Microsoft: http://jtds.sourceforge.net/ It works a little better.
3. For Solr 4.0 you will basically have to concatenate the lat/long fields into a single column, which in the example schema is called "store".
4. I don't know if individual columns actually exist for latitude and longitude in 4.0, but in 1.4.x I know the lat/long fields HAD to be called lat and lng and had to be of tdouble type, which I see below.
5. Or revert back to Solr 1.4.x and try using their plugin: http://www.jteam.nl/news/spatialsolr.html
6. Try your queries in the Solr admin tool first before trying to integrate them into your code.

Overall, I have had great success with Solr spatial in just doing a simple radius search. I am using the core 4.0 functionality and am having no problems. I will eventually get into distance and bounding-box queries, so whatever you figure out and share would be great! Good luck, Adam

On Jan 24, 2011, at 4:46 AM, mapbutcher wrote:
> Hi, I'm a bit of a Solr beginner. I have installed Solr 4.0 and I'm trying to index some spatial data stored in a SQL Server instance. I'm using the DataImportHandler. [data-config.xml and schema snipped; see the original message above]
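Point 3 in concrete form, assuming the trunk/4.0 example schema's LatLonType setup (names are illustrative; note LatLonType expects "lat,lon" order, so POINT_Y, the latitude, goes first):

in schema.xml:
  <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
  <field name="store" type="location" indexed="true" stored="true"/>
  <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

in data-config.xml, concatenating via the TemplateTransformer:
  <entity name="poi" transformer="TemplateTransformer"
          query="select OBJECTID,CATEGORY,NAME,POINT_X,POINT_Y from NZ_POI">
    <field column="store" template="${poi.POINT_Y},${poi.POINT_X}"/>
  </entity>

After a re-import, a radius filter can then look like fq={!geofilt sfield=store pt=44.794,-93.2696 d=10} alongside the q=Camp keyword query.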
Re: Taxonomy in SOLR
Just for illustration, this is your original data:

doc1: hello world
doc2: hello daniem
doc3: hello pal

Now, Lucene produces something like this from the input:

hello:  id_doc1, id_doc2, id_doc3
daniem: id_doc2
pal:    id_doc3

Well, it's more complex, but enough for illustration. As you can see, the representation of a document is completely different. A document costs only a few bytes for a Lucene-internal id per word. If words occur more than once per document AND you do not store termVectors, Lucene just adds the number of occurrences per word per doc to its index:

hello:  id_doc1[1], id_doc2[1], id_doc3[1]
daniem: id_doc2[1]
pal:    id_doc3[1]

Imagine what happens with longer texts where stopwords or important words occur more than once. I would suggest starting with the Lucene wiki if you want to learn more about Lucene. Regards, Em
Re: fieldType textgen. tokens > 2
You need to get more familiar with debugging; spending the time on it is well worth the effort. But assuming the '+' in your pasted query are really URL-encoded spaces, your syntax is really confused:

sender:(name1 name2 name3 OR sender_2:name1 name2 name3)

It *looks* like you intend something like:

sender:(name1 name2 name3) OR sender_2:(name1 name2 name3)

Note the added parentheses. Best, Erick

On Mon, Jan 24, 2011 at 6:04 AM, stockii wrote:
> That is my query: q=sender:name1+name2+name3. Exactly, the request is: q=sender:(name1+name2+name3+OR+sender_2:name1+name2+name3). [...]
Re: fieldType textgen. tokens > 2
I got this query from the mailing list. But I found the problem: wrong query. I don't know why I constructed my query like that... =( But thanks for your help =)
Re: one last question on dynamic fields
Yes, you can =) Prefix and suffix, both work fine.

On Sun, Jan 23, 2011 at 9:54 PM, Geert-Jan Brits wrote:
> Yep, you can. Although I'm not sure you can use a wildcard-prefix (perhaps you can, I'm just not sure); I always use wildcard-suffixes. Cheers, Geert-Jan
>
> 2011/1/23 Dennis Gearon:
>> Is it possible to use ONE definition of a dynamic field type for inserting multiple dynamic fields of that type with different names? Or do I need a separate dynamic field definition for each eventual field? Can I do this in schema.xml:
>>
>> <field name="ALL_OTHER_STANDARD_FIELDS" type="OTHER_TYPES" indexed="SOME_TIMES" stored="USUALLY"/>
>> <dynamicField name="*_i" type="int" indexed="true" stored="true"/>
>>
>> and then do, for insert:
>>
>> <add>
>>   <doc>
>>     <field name="ALL_OTHER_STANDARD_FIELDS">all their values</field>
>>     <field name="customA_i">9802490824908</field>
>>     <field name="customB_i">9809084</field>
>>     <field name="customC_i">09845970011</field>
>>     <field name="customD_i">09874523459870</field>
>>   </doc>
>> </add>
>>
>> Dennis Gearon
Re: Taxonomy in SOLR
First, the redundancy is certainly there, but that's what Solr does: handle large amounts of data. 4 million documents is actually a pretty small corpus by Solr standards, so you may well be able to do exactly what you propose with acceptable performance/size. I'd advise just trying it with, say, 200,000 docs. Why 200K? Because index growth is non-linear, with the first bunch of documents taking up more space than the second. So index 100K, examine your indexes, then index 100K more. Now use the delta to extrapolate to 4M. You don't need to store the taxonomy in each doc for auto-complete; you can get your auto-completion from a different index. Or you can index your taxonomies in a special document in Solr and query the (unique) field in that document for autocomplete. For faceting, you do need the taxonomies. But remember that the nature of the inverted index is that unique terms are only stored once, and the document ID of each document the term appears in is recorded. So if you have 3/europe/germany/berlin stored in 1M documents, your index space is really string length + overhead + space for 1M ids. Best, Erick

On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine wrote:
> Yes, I am not obliged to store the taxonomies. [...]
Re: searching based on grouping result
Hi, thanks for the response. I didn't explain myself well; I am using field collapsing and things are working as that page describes. I think my problem is that, as well as field collapsing works, Solr is still just returning a list of documents. There don't seem to be any operations I can do on collapsed groups as a whole; they are more of a display thing that can't be referenced in the query. Same thing with facets? Am I right in this? Thanks again, Steve

On Jan 22, 2011, at 12:53 AM, Otis Gospodnetic wrote:
> Steve, does http://wiki.apache.org/solr/FieldCollapsing do what you need? Otis
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> ----- Original Message -----
> From: Steve Fuchs st...@aps.org
> To: solr-user@lucene.apache.org
> Sent: Fri, January 21, 2011 3:05:32 PM
> Subject: searching based on grouping result
>
> Hello all, my index documents represent a set of papers, each with an author id and the id of the referee that reviewed the paper. I also end up with a field in each document that tells me whether the referee still has the paper but has not graded it; this can be a boolean. In my final result I want to collapse the result by referee number and omit any referee that has this boolean true; it doesn't matter how many documents they have with the field set to false. Is there a way to set my query to honor the results of the grouping (or of a facet?), as in q: -referee_number.open_flag:* ? Thanks in advance. Steve
Re: Multicore Reload Theoretical Question
Em, that's correct. You can use 'lsof' to see file handles still in use. See http://0xfe.blogspot.com/2006/03/troubleshooting-unix-systems-with-lsof.html, Recipe #11. -Alexander

On Sun, Jan 23, 2011 at 1:52 AM, Em wrote:
> Hi Alexander, thank you for your response. You said that the old index files were still in use. That means Linux does not *really* delete them until Solr frees its locks on them, which happens while reloading? Thank you for sharing your experiences! Kind regards, Em
>
> Alexander Kanarsky wrote:
>> Em, yes, you can replace the index (get the new one into a separate folder like index.new and then rename it to the index folder) outside Solr, then just do the HTTP call to reload the core. Note that the old index files may still be in use (they continue to serve queries while reloading), even if the old index folder is deleted - that is on Linux filesystems, not sure about NTFS. That means the space on disk will be freed only when the old files are no longer referenced by the Solr searcher. -Alexander
>>
>> On Sat, Jan 22, 2011 at 1:51 PM, Em wrote:
>>> Hi Erick, thanks for your response. Yes, it's really not that easy. However, the target is to avoid any kind of master-slave setup. The most recent idea I got is to create a new core with a data-dir pointing to an already existing directory with a fully optimized index. Regards, Em
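A sketch of the swap-and-reload sequence described above (paths and core name are illustrative):

  # stage the new index next to the live one, then swap the directories
  mv /var/solr/core0/data/index /var/solr/core0/data/index.old
  mv /var/solr/core0/data/index.new /var/solr/core0/data/index

  # ask the CoreAdmin handler to reload the core
  curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0'

  # lsof shows which deleted index files are still held open by the old searcher
  lsof +D /var/solr/core0/data | grep -i deleted

Disk space from index.old is reclaimed only once no searcher holds the old files open, which matches the Linux behavior Alexander describes.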
Faceting Question
I am attempting to do facets on products similar to how Hayneedle does it on their online stores (they do NOT use Solr). See: http://www.clockstyle.com/wall-clocks/antiqued/1359+1429+4294885075.cfm

Simple example: my left nav might contain categories and 2 attributes, brand and capacity:

Categories
- Cat1 (23) selected
- Cat2 (16)
- Cat3 (5)
Brand
- Brand1 (18)
- Brand2 (10)
- Brand3 (0)
Capacity
- Capacity1 (14)
- Capacity2 (9)

Each category or attribute value is represented with a checkbox and can be selected or deselected. The initial entry into this page has one category selected. Other categories can be selected, which might change the number of products related to each attribute value. The number of products in each category never changes. I should also be able to select one or more attributes. Logically this would look something like: (Cat1 OR Cat2) AND (Value1 OR Value2) AND (Value4). Behind the scenes I have each category and attribute value represented by a tag, which is just a numeric value. So I search on the tags field only and then facet on the category, brand and capacity fields, which are stored separately. My current Solr query ends up looking something like:

fq={!tag=tag1}tags:( |1003| |1007|) AND tags:( |10015|)&version=2.2&start=0&rows=10&indent=on&facet=on&facet.field={!ex=tag1}category&facet.field=capacity&facet.field=brand

This shows 2 categories selected (1003 and 1007) and one attribute value (10015). This partially works: the categories work fine. The problem is, if I select, say, a brand attribute (the 10015 tag in the above example), it does filter to the selected categories AND the selected attribute, BUT I'm not able to broaden the search by selecting another attribute value. I want the display of products to be filtered to what I select, but I want to be able to broaden the filter without having to back up. I feel like I'm close but still missing something. Is there a way to specify 2 tags that should be excluded from facet fields? I hope this example makes sense. Any help greatly appreciated.
Re: Taxonomy in SOLR
Thanks Em and Erick for your answers. Now I better understand the functioning of Solr. Damien

Le 24/01/2011 16:23, Erick Erickson a écrit :
> First, the redundancy is certainly there, but that's what Solr does: handle large amounts of data. [...]
Re: Taxonomy in SOLR
Hi Erick, in some use cases I really think that your suggestion of unique documents for meta-information is a good approach to solve some issues. However, there is a hurdle for me and maybe you can help me clear it: what is the best way to get such meta-data? I see three possible approaches:

1st: get it in another request
2nd: get it with a RequestHandler
3rd: get it with a SearchComponent

I think the 2nd and 3rd are the cleanest ways. But to make a decision between them I run into two problems. RequestHandler: should I extend the StandardRequestHandler to do what I need? If so, I could just query my index for the needed information and add it to the request before I pass it on to the SearchComponents. SearchComponent: the problem with a SearchComponent is the distributed case and how to test it. However, if this is the cleanest way to go, one should go it. What would you do if you wanted to add some meta-information to your request that was not given by the user? Regards, Em

Erick Erickson wrote:
> First, the redundancy is certainly there, but that's what Solr does: handle large amounts of data. [...]
Re: Faceting Question
> fq={!tag=tag1}tags:( |1003| |1007|) AND tags:( |10015|)&version=2.2&start=0&rows=10&indent=on&facet=on&facet.field={!ex=tag1}category&facet.field=capacity&facet.field=brand

I'm just guessing here, but perhaps {!tag=tag1} is only picking up the 'tags:( |1003| |1007|)' part. If so, {!ex=tag1} would only exclude 'tags:( |1003| |1007|)', but it wouldn't exclude 'tags:( |10015|)'. I believe this would 100% explain what you're seeing. Assuming my guess is correct, you could try a couple of things (none of which I'm absolutely certain will work, but you could try them out easily):

1. Put the fq in quotes: fq="{!tag=tag1}tags:( |1003| |1007|) AND tags:( |10015|)" -- this might instruct {!tag=tag1} to tag the whole fq filter.
2. Make multiple fq's and exclude them all (not sure if you can exclude multiple tags): fq={!tag=tag1}tags:( |1003| |1007|)&fq={!tag=tag2}tags:( |10015|)&facet.field={!ex=tag1,tag2}category...

hth, Geert-Jan

2011/1/24 beaviebugeater:
> I am attempting to do facets on products similar to how Hayneedle does it on their online stores. [full question quoted above]
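Option 2 spelled out as a full parameter set, using the multi-tag exclusion syntax (the tag names are arbitrary labels):

  fq={!tag=cats}tags:( |1003| |1007|)
  &fq={!tag=attrs}tags:( |10015|)
  &facet=on
  &facet.field={!ex=cats}category
  &facet.field={!ex=attrs}brand
  &facet.field={!ex=attrs}capacity

Each facet excludes the filter built from its own checkbox group, so the brand facet still shows counts for unselected brands even while the 10015 filter is applied; that is exactly the "broaden without backing up" behavior. A comma-separated list such as {!ex=cats,attrs} is also accepted when one facet should ignore several filters at once.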
Re: Taxonomy in SOLR
I wasn't thinking about this for adding information to the *request*. Rather, in this case the autocomplete uses an Ajax call that just uses the TermsComponent to get the autocomplete data and display it. This is just textual, so adding it to the request is client-side magic. If you want your app to have access to the meta-data for other purposes, you'd just query and cache it from the app. You could use that to build up the links you embed in the page for new queries if you chose, no custom handlers necessary. Otherwise, I guess you'd create a custom request handler; that seems like a reasonable place. Best, Erick

On Mon, Jan 24, 2011 at 11:03 AM, Em mailformailingli...@yahoo.de wrote: Hi Erick, in some use cases I really think that your suggestion of unique documents for meta-information is a good approach to solving some issues. However, there is a hurdle for me and maybe you can help me clear it: what is the best way to get such meta-data? I see three possible approaches: 1st: get it in another request; 2nd: get it with a RequestHandler; 3rd: get it with a SearchComponent. I think the 2nd and 3rd are the cleanest ways, but when deciding between them I run into two problems. RequestHandler: should I extend the StandardRequestHandler to do what I need? If so, I could just query my index for the needed information and add it to the request before I pass it on to the SearchComponents. SearchComponent: the problem with the SearchComponent is the distributed case and how to test it. However, if this is the cleanest way to go, one should take it. What would you do if you wanted to add some meta-information to your request that was not given by the user? Regards, Em

Erick Erickson wrote: First, the redundancy is certainly there, but that's what Solr does: handle large amounts of data. 4 million documents is actually a pretty small corpus by Solr standards, so you may well be able to do exactly what you propose with acceptable performance/size. I'd advise just trying it with, say, 200,000 docs. Why 200K? Because index growth is non-linear, with the first bunch of documents taking up more space than the second. So index 100K, examine your indexes, and index 100K more. Now use the delta to extrapolate to 4M. You don't need to store the taxonomy in each doc for auto-complete; you can get your auto-completion from a different index. Or you can index your taxonomies in a special document in Solr and query the (unique) field in that document for autocomplete. For faceting, you do need taxonomies. But remember that the nature of the inverted index is that unique terms are only stored once, and the document ID for each document that that term appears in is recorded. So if you have 3/europe/germany/berlin stored in 1M documents, your index space is really string length + overhead + space for 1M ids. Best, Erick

On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine dfonta...@rosebud.fr wrote: Yes, I am not obliged to store taxonomies. My taxonomies are of this type: english_taxon_label = Berlin, english_taxon_type = location, english_taxon_hierarchy = 0/world 1/world/europe 2/world/europe/germany 3/world/europe/germany/berlin. I need *_taxon_hierarchy for faceting and the label for auto-complete. With an RDBMS I have 100 entries max for one taxonomy, but with Solr and 4 million documents the redundancy is huge, no? And I have 10 different taxonomies per document. Damien

Le 24/01/2011 10:30, Em a écrit : Hi Damien, why are you storing the taxonomies? When it comes to faceting, it only depends on indexed values. If there is a meaningful difference between the indexed and the stored value, I would prefer to use an RDBMS or something like that to reduce redundancy. Does this help? Regards -- View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2320666.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH serialize
Hi Rich, I'm a bit confused after reading your post... what exactly are you trying to achieve? Serializing (like http://php.net/serialize) your complete row into one field? You don't want to search in them, just store and deliver them in your results? Does that make sense? Sounds a bit strange :) Regards, Stefan

On Mon, Jan 24, 2011 at 10:03 AM, Papp Richard ccode...@gmail.com wrote: Hi Dennis, thank you for your answer, but I didn't understand why you say it doesn't need serialization. I'm with option C, but the main question is how to put the result of many fields (SELECT * FROM ...) into one field. thanks, Rich

-Original Message- From: Dennis Gearon [mailto:gear...@sbcglobal.net] Sent: Monday, January 24, 2011 02:07 To: solr-user@lucene.apache.org Subject: Re: DIH serialize

Depends on your process chain to the eventual viewer/consumer of the data. The questions to ask are: A/ Is the data IN Solr going to be viewed or processed in its original form? --set stored='true' --no serialization needed. B/ If it's going to be analyzed and searched for separately from any other field, the analyzing will put it into an unreadable form. If you need to see it, then --set indexed=true and stored=true --no serialization needed. C/ If it's NOT going to be viewed AS IS, and it's not going to be searched for AS IS (i.e. other columns will be how the data is found), and you have another, serializable format: --set indexed=false and stored=true --serialize AS PER THE INTENDED APPLICATION; not sure that Solr can do that at all. D/ If it's NOT going to be viewed AS IS, BUT it IS going to be searched for AS IS (this column will be how the data is found), and you have another, serializable format: --you need to put it into TWO columns --A SERIALIZED FIELD --set indexed=false and stored=true --AN UNSERIALIZED FIELD --set indexed=true and stored=true --serialize AS PER THE INTENDED APPLICATION; not sure that Solr can do that at all. Hope that helps! Dennis Gearon

Signature Warning: It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die.

- Original Message From: Papp Richard ccode...@gmail.com To: solr-user@lucene.apache.org Sent: Sun, January 23, 2011 2:02:05 PM Subject: DIH serialize

Hi all, I wasted the last few hours trying to serialize some column values (from MySQL) into a Solr column, but I just can't find such a function. I'll use the value in PHP - I don't know if it is possible to serialize in PHP style at all. This is what I tried, and it works up to a point:

in schema.xml:

<field name="main_timetable" type="text" indexed="false" stored="true" multiValued="true"/>

in the DIH xml:

<dataConfig>
  <script><![CDATA[
    function my_serialize(row) {
      row.put('main_timetable', row.toString());
      return row;
    }
  ]]></script>
  ...
  <entity name="main_timetable" transformer="script:my_serialize"
          query="SELECT * FROM shop_time_table stt WHERE stt.shop_id = '${shop.id}'">
  ...

Can I use Java directly in script (<script language="Java">)? How could I achieve this? Or any other idea? I need these values together (from a row) and I need them in PHP to handle the result easily. thanks, Rich

__ Information from ESET NOD32 Antivirus, version of virus signature database 5740 (20101228) __ The message was checked by ESET NOD32 Antivirus. http://www.eset.com
Re: searching based on grouping result
Steve, what exactly do you expect? You can work on the group itself with http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters in a limited way, but of course it's just a normal Solr result, grouped by some values, nothing really special. "Can't be referenced in the query" - what do you want to do there? Regards, Stefan

On Mon, Jan 24, 2011 at 4:27 PM, Steve Fuchs st...@aps.org wrote: Hi, thanks for the response. I didn't explain myself well: I am using the field collapsing and things are working as that page describes. I think my problem is that, as well as field collapsing works, Solr is still just returning a list of documents. There don't seem to be any operations I can do on collapsed groups as a whole. They are more of a display thing that can't be referenced in the query. Same thing with facets? Am I right in this? Thanks again, Steve

On Jan 22, 2011, at 12:53 AM, Otis Gospodnetic wrote: Steve, does http://wiki.apache.org/solr/FieldCollapsing do what you need? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: Steve Fuchs st...@aps.org To: solr-user@lucene.apache.org Sent: Fri, January 21, 2011 3:05:32 PM Subject: searching based on grouping result

Hello All, my index documents represent a set of papers, each with an author ID and the ID of the referee that reviewed the paper. I also end up with a field in each document that tells me whether the referee still has the paper but has not graded it. This can be a boolean. In my final result I want to collapse the result by referee number and omit any referee that has this boolean true; it doesn't matter how many documents they have with the field set to false. Is there a way to set my query to honor the results of the grouping (or of a facet?), as in q: -referee_number.open_flag:* ? Thanks in advance, Steve
RE: help integrating katta with solr
Hi Otis, I was implementing Katta because I discovered it before SolrCloud. Before replying to your email, I took some time to go through the examples on the SolrCloud wiki. The examples worked without any issue for me, and I now have a better understanding of what SolrCloud is offering. My experience with it so far is good. It seems to me that SolrCloud and Katta both offer failover using ZooKeeper, load balancing, and easier shard deployment and shard searching. These are all important issues for my company and me, as we have many sharded indexes. We are always looking for ways to simplify and shorten the time it takes to index, deploy, maintain, and troubleshoot those sharded collections. A major difference I see between the two is that Katta relies on Hadoop HDFS for storage, whereas SolrCloud has no such dependence. I still would like to integrate Katta into Solr, if for no other reason than to complete a task that I set out to do. Also, it would be nice to explore its differences from SolrCloud, giving us a choice in which solution to implement. So, I am still looking for some assistance integrating Katta with Solr. :-) Thanks, Jerry

-Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Saturday, January 22, 2011 12:52 AM To: solr-user@lucene.apache.org Subject: Re: help integrating katta with solr

Hi Jerry, Sorry, not a direct answer, but why Katta? Why not SolrCloud (i.e. trunk) instead? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: Jerry Mindek jerry.min...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, January 21, 2011 4:37:12 PM Subject: help integrating katta with solr

Hello, I have been trying to integrate Katta with Solr, sadly without success. I am using the information from JIRA issue 1395 as a guide; however, this information seems out of date and incomplete. So far, I have attempted to integrate Katta with both Solr trunk and branch-1.4. I am unable to get the patches applied completely, and am totally unable to compile Solr once the patches are applied. Could someone provide some tips or an up-to-date guide on how to do this? Thanks, Jerry Mindek
Re: searching based on grouping result
Thanks. What I'd really like to do is exclude an entire group if a certain field is set to true in any of the documents that make up that group. I can't do it at index time because some of my users have certain documents hidden from them, so they shouldn't see the flag as set, while others would. I can do it in post-processing, but that will mess up sorting and pagination. Thanks again, Steve

On Jan 24, 2011, at 11:39 AM, Stefan Matheis wrote: Steve, what exactly do you expect? You can work on the group itself with http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters in a limited way, but of course it's just a normal Solr result, grouped by some values, nothing really special. "Can't be referenced in the query" - what do you want to do there? Regards, Stefan

On Mon, Jan 24, 2011 at 4:27 PM, Steve Fuchs st...@aps.org wrote: Hi, thanks for the response. I didn't explain myself well: I am using the field collapsing and things are working as that page describes. I think my problem is that, as well as field collapsing works, Solr is still just returning a list of documents. There don't seem to be any operations I can do on collapsed groups as a whole. They are more of a display thing that can't be referenced in the query. Same thing with facets? Am I right in this? Thanks again, Steve

On Jan 22, 2011, at 12:53 AM, Otis Gospodnetic wrote: Steve, does http://wiki.apache.org/solr/FieldCollapsing do what you need? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: Steve Fuchs st...@aps.org To: solr-user@lucene.apache.org Sent: Fri, January 21, 2011 3:05:32 PM Subject: searching based on grouping result

Hello All, my index documents represent a set of papers, each with an author ID and the ID of the referee that reviewed the paper. I also end up with a field in each document that tells me whether the referee still has the paper but has not graded it. This can be a boolean. In my final result I want to collapse the result by referee number and omit any referee that has this boolean true; it doesn't matter how many documents they have with the field set to false. Is there a way to set my query to honor the results of the grouping (or of a facet?), as in q: -referee_number.open_flag:* ? Thanks in advance, Steve
Weird behaviour with phrase queries
Hi, I have a problem with phrase queries: from time to time I do not get any results, whereas I know something should be returned. The search is run against a field of type text whose definition is available at the following URL: http://pastebin.com/Ncem7M8z This field is defined with the following configuration:

<field name="meta_text" type="text" indexed="true" stored="true" multiValued="true" termVectors="true"/>

I use the following request handler:

<requestHandler name="custom" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">meta_text</str>
    <str name="pf">meta_text</str>
    <str name="bf"></str>
    <str name="mm">1&lt;1 2&lt;-1 5&lt;-2 7&lt;60%</str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>

Depending on the kind of phrase query I use, I get either exactly what I am looking for or nothing. The index's contents are all French, so I thought about a possible problem with accents, but I have phrase queries containing é and è chars, like académie or ingénieur, that work. As you will see, the filter chain of the text type uses the SnowballPorterFilterFactory for the English language; I plan to fix that by using the correct language for the index (French) and the following protwords: http://bit.ly/i8JeX6 But apart from this mistake with the stemmer, did I do something (else) wrong? Did I overlook something? What could explain why I do not always get results for my phrase queries? Thanks in advance for your feedback. Best Regards, -- Jérôme
Re: Multicore Relaod Theoretical Question
Thanks Alexander, what a valuable resource :). - Em -- View this message in context: http://lucene.472066.n3.nabble.com/Multicore-Relaod-Theoretical-Question-tp2293999p2321335.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Taxonomy in SOLR
Thank you for the advice, Erick! I will take a look at extending the StandardRequestHandler for such use cases.

Erick Erickson wrote: I wasn't thinking about this for adding information to the *request*. Rather, in this case the autocomplete uses an Ajax call that just uses the TermsComponent to get the autocomplete data and display it. This is just textual, so adding it to the request is client-side magic. If you want your app to have access to the meta-data for other purposes, you'd just query and cache it from the app. You could use that to build up the links you embed in the page for new queries if you chose, no custom handlers necessary. Otherwise, I guess you'd create a custom request handler; that seems like a reasonable place. Best, Erick

On Mon, Jan 24, 2011 at 11:03 AM, Em mailformailingli...@yahoo.de wrote: Hi Erick, in some use cases I really think that your suggestion of unique documents for meta-information is a good approach to solving some issues. However, there is a hurdle for me and maybe you can help me clear it: what is the best way to get such meta-data? I see three possible approaches: 1st: get it in another request; 2nd: get it with a RequestHandler; 3rd: get it with a SearchComponent. I think the 2nd and 3rd are the cleanest ways, but when deciding between them I run into two problems. RequestHandler: should I extend the StandardRequestHandler to do what I need? If so, I could just query my index for the needed information and add it to the request before I pass it on to the SearchComponents. SearchComponent: the problem with the SearchComponent is the distributed case and how to test it. However, if this is the cleanest way to go, one should take it. What would you do if you wanted to add some meta-information to your request that was not given by the user? Regards, Em

Erick Erickson wrote: First, the redundancy is certainly there, but that's what Solr does: handle large amounts of data. 4 million documents is actually a pretty small corpus by Solr standards, so you may well be able to do exactly what you propose with acceptable performance/size. I'd advise just trying it with, say, 200,000 docs. Why 200K? Because index growth is non-linear, with the first bunch of documents taking up more space than the second. So index 100K, examine your indexes, and index 100K more. Now use the delta to extrapolate to 4M. You don't need to store the taxonomy in each doc for auto-complete; you can get your auto-completion from a different index. Or you can index your taxonomies in a special document in Solr and query the (unique) field in that document for autocomplete. For faceting, you do need taxonomies. But remember that the nature of the inverted index is that unique terms are only stored once, and the document ID for each document that that term appears in is recorded. So if you have 3/europe/germany/berlin stored in 1M documents, your index space is really string length + overhead + space for 1M ids. Best, Erick

On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine dfonta...@rosebud.fr wrote: Yes, I am not obliged to store taxonomies. My taxonomies are of this type: english_taxon_label = Berlin, english_taxon_type = location, english_taxon_hierarchy = 0/world 1/world/europe 2/world/europe/germany 3/world/europe/germany/berlin. I need *_taxon_hierarchy for faceting and the label for auto-complete. With an RDBMS I have 100 entries max for one taxonomy, but with Solr and 4 million documents the redundancy is huge, no? And I have 10 different taxonomies per document. Damien

Le 24/01/2011 10:30, Em a écrit : Hi Damien, why are you storing the taxonomies? When it comes to faceting, it only depends on indexed values. If there is a meaningful difference between the indexed and the stored value, I would prefer to use an RDBMS or something like that to reduce redundancy. Does this help? Regards -- View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2320666.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2321340.html Sent from the Solr - User mailing list archive at Nabble.com.
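As an aside, the usual way to drill into depth-prefixed hierarchy fields like Damien's is facet.prefix. A sketch, assuming the tokens are indexed exactly as shown in the thread and the field is named english_taxon_hierarchy (host and port are placeholders):

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=english_taxon_hierarchy&facet.prefix=2/world/europe/

This returns counts only for the children of world/europe (e.g. 2/world/europe/germany); incrementing the depth prefix as the user clicks walks down the tree.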
Re: Weird behaviour with phrase queries
Hi Jerome, does your fieldtype contain a stopword filter? Probably this could be the root of all evil :-). Could you provide us with the fieldtype definition and the explain output of an example query? Did you check analysis.jsp to have a look at the produced tokens? Regards, Em

Jerome Renard wrote: Hi, I have a problem with phrase queries: from time to time I do not get any results, whereas I know something should be returned. The search is run against a field of type text whose definition is available at the following URL: http://pastebin.com/Ncem7M8z This field is defined with the following configuration:

<field name="meta_text" type="text" indexed="true" stored="true" multiValued="true" termVectors="true"/>

I use the following request handler:

<requestHandler name="custom" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">meta_text</str>
    <str name="pf">meta_text</str>
    <str name="bf"></str>
    <str name="mm">1&lt;1 2&lt;-1 5&lt;-2 7&lt;60%</str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>

Depending on the kind of phrase query I use, I get either exactly what I am looking for or nothing. The index's contents are all French, so I thought about a possible problem with accents, but I have phrase queries containing é and è chars, like académie or ingénieur, that work. As you will see, the filter chain of the text type uses the SnowballPorterFilterFactory for the English language; I plan to fix that by using the correct language for the index (French) and the following protwords: http://bit.ly/i8JeX6 But apart from this mistake with the stemmer, did I do something (else) wrong? Did I overlook something? What could explain why I do not always get results for my phrase queries? Thanks in advance for your feedback. Best Regards, -- Jérôme -- View this message in context: http://lucene.472066.n3.nabble.com/Weird-behaviour-with-phrase-queries-tp2321241p2321362.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Taxonomy in SOLR
There aren't any great general-purpose, out-of-the-box ways to handle hierarchical data in Solr. Solr isn't an RDBMS. There may be some particular advice on how to set up a particular Solr index to answer particular questions with regard to hierarchical data. I saw a great point made recently comparing RDBMSes to NoSQL stores, which applies to Solr too even though Solr is NOT a NoSQL store. In an RDBMS, you set up your schema thinking only about your _data_, modelling your data as flexibly as possible. Then, once you've done that, you can ask pretty much any well-specified question you want of your data and get a correct and reasonably performant answer. In Solr, on the other hand, we set up our schemas to answer particular questions. You have to first figure out what kinds of questions you will want to ask Solr, what kinds of queries you'll want to make, and then you can figure out how to structure your data to ask those questions. Some questions are actually very hard to set up Solr to answer -- in general, Solr is about setting up your data so that whatever question you have can be reduced to asking "is token X in field Y". This can be especially tricky in cases where you want to use a single Solr index to answer multiple questions, where the questions are such that you really need to set up your data _differently_ to get Solr to optimally answer each one. Solr is not a general-purpose store like an RDBMS, where you can set up your schema once in terms of your data and use it to answer nearly any conceivable well-specified question after that. Instead, Solr does things that an RDBMS can't do quickly or can't do at all. But you lose some things too.

On 1/24/2011 3:03 AM, Damien Fontaine wrote: Hi, I am trying Solr and I have one question. In the schema that I set up, there are 10 fields with always the same data (hierarchical taxonomies), but with 4 million documents the disk space and indexing time must be big. I need this field for auto-complete. Is there another way to do this type of operation? Damien
Re: Weird behaviour with phrase queries
Try submitting your query from the admin page with debugQuery=on and see if that helps. The output is pretty dense, so feel free to cut-paste the results for help. Your stemmers have English as the language, which could also be interesting. As Em says, the analysis page may help here, but I'd start by taking out WordDelimiterFilterFactory, SnowballPorterFilterFactory and StopFilterFactory and build back up if you really need them. Although, again, the analysis page that's accessible from the admin page may help greatly (check debug in both index and query). Oh, and you MUST re-index after changing your schema to have a true test. Best, Erick

On Mon, Jan 24, 2011 at 12:31 PM, Jerome Renard jerome.ren...@gmail.com wrote: Hi, I have a problem with phrase queries: from time to time I do not get any results, whereas I know something should be returned. The search is run against a field of type text whose definition is available at the following URL: http://pastebin.com/Ncem7M8z This field is defined with the following configuration:

<field name="meta_text" type="text" indexed="true" stored="true" multiValued="true" termVectors="true"/>

I use the following request handler:

<requestHandler name="custom" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">meta_text</str>
    <str name="pf">meta_text</str>
    <str name="bf"></str>
    <str name="mm">1&lt;1 2&lt;-1 5&lt;-2 7&lt;60%</str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>

Depending on the kind of phrase query I use, I get either exactly what I am looking for or nothing. The index's contents are all French, so I thought about a possible problem with accents, but I have phrase queries containing é and è chars, like académie or ingénieur, that work. As you will see, the filter chain of the text type uses the SnowballPorterFilterFactory for the English language; I plan to fix that by using the correct language for the index (French) and the following protwords: http://bit.ly/i8JeX6 But apart from this mistake with the stemmer, did I do something (else) wrong? Did I overlook something? What could explain why I do not always get results for my phrase queries? Thanks in advance for your feedback. Best Regards, -- Jérôme
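For concreteness, the kind of request Erick means, against the custom handler from the earlier post (host, port and the query itself are illustrative; URL-encode as needed):

http://localhost:8983/solr/select?qt=custom&q="académie charpentier"&debugQuery=on

The parsedquery and explain sections of the response show exactly which analyzed terms were searched, which usually pinpoints index/query analysis mismatches.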
MySQL + DIH + SpatialSearch
I had difficulties getting this to work, so hopefully this will help others having the same issue. My environment: Solr 3.1, MySQL 5.0.77.

Schema:

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<field name="latlng" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

DIH data-config:

<dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://xxx.xxx.xxx.xxx/db1" user="user" password="secret" readOnly="true" batchSize="-1"/>
<entity name="practice" pk="id" query="select id, name, concat_ws(',', lat, lng) as latlng from practice">
</entity>

I kept getting build errors similar to this:

org.apache.solr.common.SolrException: org.apache.lucene.spatial.tier.InvalidGeoException: incompatible dimension (2) and values ([B@2964a05d). Only 0 values specified
at org.apache.solr.schema.PointType.createFields(PointType.java:77)
at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:199)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:291)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:625)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:265)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:184)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:335)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:374)
Caused by: org.apache.lucene.spatial.tier.InvalidGeoException: incompatible dimension (2) and values ([B@2964a05d). Only 0 values specified
at org.apache.lucene.spatial.DistanceUtils.parsePoint(DistanceUtils.java:376)
at org.apache.solr.schema.PointType.createFields(PointType.java:75)

This would happen regardless of whether I used PointType, LatLonType, or GeoHashField. So I thought maybe I should pay attention to what the error says: incompatible dimension (2) and values ([B@2964a05d). Only 0 values specified. Looking at the code revealed that it's trying to parse [B@2964a05d (a Java byte array) into a spatial field. So my DIH was getting bad values; apparently there's a bug in MySQL 5.0 (http://bugs.mysql.com/bug.php?id=12030) where concat changes the character set to binary. To solve this, you can either upgrade to MySQL 5.5 (according to the bug page it was fixed in 5.5, but I haven't tested it), or typecast before you concat:

<entity name="practice" pk="id" query="select id, name, concat_ws(',', cast(lat as char), cast(lng as char)) as latlng from practice">
</entity>

Eric
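Once the import succeeds, a quick sanity check that the field is usable (a sketch: the point and distance are arbitrary, and {!geofilt} is the Solr 3.1 spatial filter; URL-encode the spaces in practice):

http://localhost:8983/solr/select?q=*:*&fq={!geofilt sfield=latlng pt=45.0,-93.0 d=10}

This should filter to practices within 10 km of the given point, and it fails loudly if latlng was not indexed as a valid lat,lon pair.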
Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene
We have two slaves replicating off one master every 2 minutes, both using the CMS + ParNew garbage collector, specifically -server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing, but periodically they both get into a GC storm and just keel over. Looking through the GC logs, the amount of memory reclaimed in each GC run gets less and less until we get a concurrent mode failure, and then Solr effectively dies. Is it possible there's a memory leak? I note that later versions of Lucene have fixed a few leaks. Our current versions are relatively old: Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42, Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55, so I'm wondering if upgrading to a later version of Lucene might help (of course it might not, but I'm trying to investigate all options at this point). If so, what's the best way to go about this? Can I just grab the Lucene jars and drop them somewhere (or unpack and then repack the Solr war file)? Or should I use a nightly Solr 1.4? Or am I barking up completely the wrong tree? I'm trawling through heap logs and GC logs at the moment trying to see what other tuning I can do, but any other hints, tips, tricks or cluebats gratefully received. Even if it's just "Yeah, we had that problem and we added more slaves and periodically restarted them". thanks, Simon
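For anyone wanting to capture the data Simon describes, these are the standard HotSpot GC-logging flags (not Solr-specific; the log path is a placeholder): -verbose:gc -Xloggc:/var/log/solr/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution. The tenuring output in particular shows whether objects are being promoted into the old generation faster than CMS can reclaim it, which is the usual shape of the concurrent-mode-failure storms described above.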
Re: Faceting Question
Hmm, thanks for the response. I'll play around with it and see if that helps. -- View this message in context: http://lucene.472066.n3.nabble.com/Faceting-Question-tp2320542p2321887.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: DIH serialize
Hi Stefan, yes, this is exactly what I intend - I don't want to search in this field, just quickly return the result in a serialized form (the search criteria are on other fields). If I could serialize the data exactly as PHP's serialize() does, I would be maximally satisfied, but any other form in which I can easily compact the data into one field would also do. Can anyone help me? I guess the script is quite a good way, but I don't know which function to use there to compact the data so it is easily usable in PHP. Or any other method? thanks, Rich

-Original Message- From: Stefan Matheis [mailto:matheis.ste...@googlemail.com] Sent: Monday, January 24, 2011 18:23 To: solr-user@lucene.apache.org Subject: Re: DIH serialize

Hi Rich, I'm a bit confused after reading your post... what exactly are you trying to achieve? Serializing (like http://php.net/serialize) your complete row into one field? You don't want to search in them, just store and deliver them in your results? Does that make sense? Sounds a bit strange :) Regards, Stefan

On Mon, Jan 24, 2011 at 10:03 AM, Papp Richard ccode...@gmail.com wrote: Hi Dennis, thank you for your answer, but I didn't understand why you say it doesn't need serialization. I'm with option C, but the main question is how to put the result of many fields (SELECT * FROM ...) into one field. thanks, Rich

-Original Message- From: Dennis Gearon [mailto:gear...@sbcglobal.net] Sent: Monday, January 24, 2011 02:07 To: solr-user@lucene.apache.org Subject: Re: DIH serialize

Depends on your process chain to the eventual viewer/consumer of the data. The questions to ask are: A/ Is the data IN Solr going to be viewed or processed in its original form? --set stored='true' --no serialization needed. B/ If it's going to be analyzed and searched for separately from any other field, the analyzing will put it into an unreadable form. If you need to see it, then --set indexed=true and stored=true --no serialization needed. C/ If it's NOT going to be viewed AS IS, and it's not going to be searched for AS IS (i.e. other columns will be how the data is found), and you have another, serializable format: --set indexed=false and stored=true --serialize AS PER THE INTENDED APPLICATION; not sure that Solr can do that at all. D/ If it's NOT going to be viewed AS IS, BUT it IS going to be searched for AS IS (this column will be how the data is found), and you have another, serializable format: --you need to put it into TWO columns --A SERIALIZED FIELD --set indexed=false and stored=true --AN UNSERIALIZED FIELD --set indexed=true and stored=true --serialize AS PER THE INTENDED APPLICATION; not sure that Solr can do that at all. Hope that helps! Dennis Gearon

Signature Warning: It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die.

- Original Message From: Papp Richard ccode...@gmail.com To: solr-user@lucene.apache.org Sent: Sun, January 23, 2011 2:02:05 PM Subject: DIH serialize

Hi all, I wasted the last few hours trying to serialize some column values (from MySQL) into a Solr column, but I just can't find such a function. I'll use the value in PHP - I don't know if it is possible to serialize in PHP style at all. This is what I tried, and it works up to a point:

in schema.xml:

<field name="main_timetable" type="text" indexed="false" stored="true" multiValued="true"/>

in the DIH xml:

<dataConfig>
  <script><![CDATA[
    function my_serialize(row) {
      row.put('main_timetable', row.toString());
      return row;
    }
  ]]></script>
  ...
  <entity name="main_timetable" transformer="script:my_serialize"
          query="SELECT * FROM shop_time_table stt WHERE stt.shop_id = '${shop.id}'">
  ...

Can I use Java directly in script (<script language="Java">)? How could I achieve this? Or any other idea? I need these values together (from a row) and I need them in PHP to handle the result easily. thanks, Rich

__ Information from ESET NOD32 Antivirus, version of virus signature database 5740 (20101228) __ The message was checked by ESET NOD32 Antivirus. http://www.eset.com
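Since producing PHP's native serialize() format from DIH is awkward, one workable compromise is to pack the row as JSON in the script transformer and json_decode() it on the PHP side. A rough, untested sketch along the lines of Rich's original data-config (field, entity and table names are his; the escaping is deliberately naive and only handles backslashes and double quotes):

<dataConfig>
  <script><![CDATA[
    // builds a flat JSON object string from the row's key/value pairs
    function pack_row(row) {
      var keys = row.keySet().toArray();
      var parts = [];
      for (var i = 0; i < keys.length; i++) {
        var k = keys[i];
        var v = row.get(k);
        if (v != null) {
          // coerce the Java value to a string and escape \ and "
          var s = ('' + v).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
          parts.push('"' + k + '":"' + s + '"');
        }
      }
      row.put('main_timetable', '{' + parts.join(',') + '}');
      return row;
    }
  ]]></script>
  ...
  <entity name="main_timetable" transformer="script:pack_row"
          query="SELECT * FROM shop_time_table stt WHERE stt.shop_id = '${shop.id}'">
  </entity>
</dataConfig>

On the PHP side, json_decode($doc->main_timetable, true) then rebuilds the row as an associative array. As for the other question: DIH runs <script> through the JVM's JSR-223 scripting support (JavaScript/Rhino by default), so plain Java is not usable there; the usual route for Java is a custom Transformer class referenced via the transformer attribute.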
Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene
Hi Simon, I have no experience with a distributed environment. However, what you are talking about reminds me of another post on the mailing list. Could it be that your slaves have not finished replicating before the next replication process starts? If so, there you've got your OOM :). Just a thought; perhaps it helps. Regards, Em -- View this message in context: http://lucene.472066.n3.nabble.com/Possible-Memory-Leaks-Upgrading-to-a-Later-Version-of-Solr-or-Lucene-tp2321777p2321959.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Getting started with writing parser
On Mon, Jan 24, 2011 at 2:28 PM, Dinesh mdineshkuma...@karunya.edu.in wrote: my solrconfig.xml http://pastebin.com/XDg0L4di my schema.xml http://pastebin.com/3Vqvr3C0 my try.xml http://pastebin.com/YWsB37ZW [...] OK, thanks for the above. You also need to: * Give us a sample of your log files (for crying out loud, this has got to be the fifth time that I have asked you for this). * Tell us what happens when you run with the above configuration. From a cursory look at try.xml, you have not really understood how it works, or how to configure it for your needs. Regards, Gora
Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene
On Mon, Jan 24, 2011 at 08:00:53PM +0100, Markus Jelsma said: Are you using 3rd-party plugins? No third-party plugins - this is actually pretty much stock Tomcat 6 + Solr from Ubuntu. The only difference is that we've adapted the directory layout to fit in with our house style.
Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene
On Mon, Jan 24, 2011 at 10:55:59AM -0800, Em said: Could it be possible that your slaves not finished their replicating until the new replication-process starts? If so, there you got the OOM :). This was one of my thoughts as well - we're currently running a slave which has no queries in it just to see if that exhibits similar behaviour. My reasoning against it is that we're not seeing any PERFORMANCE WARNING: Overlapping onDeckSearchers=x in the logs which is something I'd expect to see. 2 minutes doesn't seem like an unreasonable period of time either - the docs at http://wiki.apache.org/solr/SolrReplication suggest 20 seconds.
Re: Highlighting with/without Term Vectors
Just to add one thing, in case it makes a difference: the maximum document size on which highlighting needs to be done is a few hundred KB (in the file system); in the index it's compressed, so it should be much smaller. Total documents are more than 100 million.

On Tue, Jan 25, 2011 at 12:42 AM, Salman Akram salman.ak...@northbaysolutions.net wrote: Hi, does anyone have any benchmarks for how much highlighting speeds up with term vectors (compared to without them)? E.g. if highlighting on 20 documents takes 1 sec with term vectors, any idea how long it will take without them? I need to know since the index used for highlighting has a TVF file of around 450GB (approx 65% of the total index size), so I am trying to see whether decreasing the index size by dropping the TVF would help performance more (less RAM, should be good for I/O too I guess), or whether keeping it is still better. I know the best way is to try it out, but indexing takes a very long time, so I am trying to see whether it's even worthwhile. -- Regards, Salman Akram
Re: please help Problem with dataImportHandler
: this is the error that i'm getting.. no idea of what is it.. Did you follow the instructions in the error message and look at your solr log file to see what the severe errors in solr configuration might be? : SimplePostTool: FATAL: Solr returned an error: : Severe_errors_in_solr_configuration__Check_your_log_files_for_more_detailed_information_on_what_may_be_wrong ... -Hoss
Re: No system property or default value specified for...
: I'm trying to dynamically add a core to a multi core system using the : following command: : : http://localhost:8983/solr/admin/cores?action=CREATE&name=items&instanceDir=items&config=data-config.xml&schema=schema.xml&dataDir=data&persist=true : : the data-config.xml looks like this: : : dataConfig

I think you are using the config param incorrectly -- it should be the solrconfig.xml file you want to use (assuming you don't want the one found in the conf directory of your instanceDir). That's the reason you are getting errors about needing to specify system props or default values for all those variables: if that file were a solrconfig.xml file, they would have to be specified before the SolrCore could be initialized -- but for a DIH data config that's not necessary. -Hoss
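In other words, something along these lines should work (a sketch; the core name and paths follow the original post): http://localhost:8983/solr/admin/cores?action=CREATE&name=items&instanceDir=items&config=solrconfig.xml&schema=schema.xml&dataDir=data&persist=true -- the config parameter names the solrconfig.xml to load, and the DIH data config is referenced from inside that solrconfig.xml (in the DIH request handler's defaults), not passed to the CoreAdmin handler.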
Re: searching based on grouping result
: Subject: searching based on grouping result : In-Reply-To: 913367.31366...@web121705.mail.ne1.yahoo.com : References: 913367.31366...@web121705.mail.ne1.yahoo.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. -Hoss
Re: Weird behaviour with phrase queries
Hmmm, I don't see any screen shots. Several things:

1> If your stopword file has comments, I'm not sure what the effect would be.

2> Something's not right here, or I'm being fooled again. Your withresults XML has this line:

<str name="parsedquery">+DisjunctionMaxQuery((meta_text:"ecol d ingenieur")~0.01) ()</str>

and your noresults has this line:

<str name="parsedquery">+DisjunctionMaxQuery((meta_text:"academi charpenti")~0.01) DisjunctionMaxQuery((meta_text:"academi charpenti"~100)~0.01)</str>

The empty () in the first one often means you're NOT going to your configured dismax parser in solrconfig.xml. Yet that doesn't square with your custom qt, so I'm puzzled. Could we see your raw query string on the way in? It's almost as if you defined qt in one and defType in the other, which are not equivalent.

3> It may take 12 hours to index, but you could experiment with a smaller subset. You say you know that the noresults one should return documents; what proof do you have? If there's a single document that you know should match, just index it and a few others and you should be able to make many runs until you get to the bottom of this... And obviously your stemming is happening on the query; are you sure it's happening at index time too? Best, Erick

On Mon, Jan 24, 2011 at 1:51 PM, Jerome Renard jerome.ren...@gmail.com wrote: Hi Em, Erick, thanks for your feedback. Em: yes. Here is the stopwords.txt I use: http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt

On Mon, Jan 24, 2011 at 6:58 PM, Erick Erickson erickerick...@gmail.com wrote: Try submitting your query from the admin page with debugQuery=on and see if that helps. The output is pretty dense, so feel free to cut-paste the results for help. Your stemmers have English as the language, which could also be interesting.

Yes, I noticed that this will be fixed.

As Em says, the analysis page may help here, but I'd start by taking out WordDelimiterFilterFactory, SnowballPorterFilterFactory and StopFilterFactory and build back up if you really need them. Although, again, the analysis page that's accessible from the admin page may help greatly (check debug in both index and query).

You will find attached two XML files, one with no results (noresult.xml.gz) and one with a lot of results (withresults.xml.gz). You will also find attached two screenshots showing there is a highlighted section in the Index analyzer section when analysing text.

Oh, and you MUST re-index after changing your schema to have a true test.

Yes, the problem is that reindexing takes around 12 hours, which makes it really hard to test :/ Thanks in advance for your feedback. Best Regards, -- Jérôme
Re: Solr with Unknown Lucene Index?
: Having found some code that searches a Lucene index, the only analyzers : referenced are Lucene.Net.Analysis.Standard.StandardAnalyzer. : : How can I map this in Solr? The example schema doesn't seem to mention this, : and specifying 'text' or 'string' for every field doesn't seem to help.

1) That analyzer seems to be a Lucene.Net analyzer, so the Java equivalent would be org.apache.lucene.analysis.standard.StandardAnalyzer.

2) The example schema.xml demonstrates how to use an existing Analyzer implementation...

<!-- One can also specify an existing Analyzer class that has a default constructor via the class attribute on the analyzer element
<fieldType name="text_greek" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
</fieldType>
-->

3) I'm getting the sense from your comments that you aren't very familiar with Lucene/Solr in general. An important thing to understand is that just because the code that created the index only ever uses StandardAnalyzer doesn't mean it will make sense to use that analyzer on every field when attempting to search that index from Solr -- some fields may have been indexed without using any analysis, some may be numeric fields with special encoding, some may be compressed, etc. Trying to reverse-engineer what the schema should look like to open an arbitrary index requires a lot of understanding of how that index was built -- it's easy to just dump the terms found in an index without knowing anything about where those terms came from (that's what Luke does), but that doesn't help you recognize things like "this list of X words were treated as stop words and don't appear in the index, so my query analyzer needs to be configured with those same X words". In short: you can easily make Solr *read* the index (just like Luke), but that won't necessarily help you *use* the index in a meaningful way. -Hoss
Re: Specifying an AnalyzerFactory in the schema
: I notice that in the schema, it is only possible to specify an Analyzer class, : but not a Factory class as for the other elements (Tokenizer, Filter, etc.). : This limits the use of this feature, as it is impossible to specify parameters : for the Analyzer. : I have looked at the IndexSchema implementation, and I think this requires a : simple fix. Do I open an issue about it?

Support for constructing Analyzers directly is very crude, and primarily exists to make it easy for people with old indexes and analyzers to keep working. Moving forward, Lucene/Solr eventually won't ship concrete Analyzer implementations at all (at least, that's the last consensus I remember), so enhancing support for loading Analyzers (or AnalyzerFactories) doesn't make much sense. Practically speaking, if you have an existing Analyzer that you want to use in Solr, instead of writing an AnalyzerFactory for it you could just write a TokenizerFactory that wraps it -- functionally that would let you achieve everything an AnalyzerFactory would, except that Solr would already handle letting the schema.xml specify the positionIncrementGap (which you could happily ignore if you wanted). -Hoss
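To make that concrete, here is a rough, untested sketch of such a wrapper against the Lucene 2.9/Solr 1.4-era API (the class name is made up, StandardAnalyzer is just an example choice, and stream reuse via reset(Reader) is not handled):

import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;
import org.apache.solr.analysis.BaseTokenizerFactory;

public class StandardAnalyzerTokenizerFactory extends BaseTokenizerFactory {

  // the wrapped Analyzer; parameters from schema.xml would arrive via init(Map)
  private final Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);

  public Tokenizer create(Reader input) {
    // run the Analyzer's full chain over the input ...
    final TokenStream wrapped = analyzer.tokenStream("", input);
    // ... and expose it through a Tokenizer facade; sharing the wrapped
    // stream's AttributeSource makes its term/offset attributes visible here
    return new Tokenizer(wrapped, input) {
      @Override
      public boolean incrementToken() throws IOException {
        return wrapped.incrementToken();
      }

      @Override
      public void end() throws IOException {
        wrapped.end();
      }

      @Override
      public void close() throws IOException {
        wrapped.close();
        super.close();
      }
    };
  }
}

In schema.xml it would then be referenced like any other tokenizer, e.g. <tokenizer class="com.example.StandardAnalyzerTokenizerFactory"/> (class and package names hypothetical).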
Solr set up issues with Magento
Hello Team: I am in the process of setting up Solr 1.4 with Magento Enterprise Edition 1.9. When I try to index the products I get the following error message:

Jan 24, 2011 3:30:14 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
Jan 24, 2011 3:30:14 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'in_stock'
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:289)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:244)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:550)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:380)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:288)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Jan 24, 2011 3:30:14 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={wt=json} status=400 QTime=0
Jan 24, 2011 3:30:14 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Jan 24, 2011 3:30:14 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
Jan 24, 2011 3:30:14 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {rollback=} 0 16
Jan 24, 2011 3:30:14 PM org.apache.solr.core.SolrCore execute

I am new to both Magento and Solr. I could have done something stupid during installation. I really look forward to your help. Thank you, Sandhya -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-set-up-issues-with-Magento-tp2323858p2323858.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr set up issues with Magento
Hi, you haven't defined the field in Solr's schema.xml configuration, so it needs to be added first. Perhaps following the tutorial would be a good idea: http://lucene.apache.org/solr/tutorial.html Cheers.

Hello Team: I am in the process of setting up Solr 1.4 with Magento Enterprise Edition 1.9. When I try to index the products I get the following error message: SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'in_stock' at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:289) [...] I am new to both Magento and Solr. I could have done something stupid during installation. I really look forward to your help. Thank you, Sandhya
Re: Stemming for Finnish language
: I tried the following in my schema.xml, but I got: : org.apache.solr.common.SolrException: Error loading class : 'solr.FinnishLightStemFilterFactory'

FinnishLightStemFilterFactory is a class that exists in SVN on the 3x and trunk branches, but does not exist in the Solr 1.4.1 release (it was added later). If you are trying to use Solr 1.4.1, this won't work; if you are getting this error using a 3x or trunk development version, please elaborate on how you are installing/running Solr. -Hoss
synonyms file, and example cases
Hello, I have been looking at the example Solr synonyms file and I did not understand some of the notation:

aaa => bbb
bbb => 1 2
ccc => 1,2
a\=>a => b\=>b
a\,a => b\,b
fooaaa,baraaa,bazaaa

The first one says search for bbb when the query is aaa, am I correct? The second one finds 1 2 when the query is bbb. The third one finds 1 or 2 when the query is ccc. The fourth and fifth ones I have not understood. The last one, I assume, is a group: a bidirectional mapping between fooaaa, baraaa and bazaaa. I am especially interested in this last one: if I do aaa,bbb, will it find aaa and bbb when either aaa or bbb is queried? Am I correct in those assumptions? Best regards, C.B.
Re: How call I make one request for all cores and get response classified by cores
: I have a group of subindex, each of which is a core in my solr now. I want : to make one query for some of them, how can I do that? And classify response : doc by index, using facet search?

Some background: multi-core is when you have multiple Solr cores in one Solr instance; each core can have different configs. Distributed search is when you execute a search on a core and specify in the query a list of other cores on other Solr instances to treat as shards, aggregating the results from all of them; each shard must have an identical schema. That said: you can do a distributed search across a bunch of shards that are all on the same Solr instance. If you index a constant value in each one identifying which sub-index it comes from, you should have what you're looking for. -Hoss
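A sketch of what that looks like in practice (core names and the subindex field are invented for illustration; every core listed in shards must share the same schema):

http://localhost:8983/solr/core0/select?q=foo&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&facet=true&facet.field=subindex

The facet counts on the constant subindex field then tell you how many hits came from each sub-index.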
Re: Adding weightage to the facets count
: prod1 has a tag called “Light Weight” with weightage 20, : prod2 has a tag called “Light Weight” with weightage 100, : : If I get the facet for “Light Weight”, I will get Light Weight (2), : but here I need to take the weightage into account, so the result would be : Light Weight (120) : : How can we achieve this? Any ideas are really helpful.

It's not really possible with Solr out of the box. Faceting is fast and efficient in Solr because it's all done using set intersections (and most of the sets can be kept in RAM very compactly and reused). For what you are describing you'd need to not only associate a weighted payload with every TermPosition, but also factor that weight in when doing the faceting, which means efficient set operations are now out the window.

If you know java it would probably be possible to write a custom SolrPlugin (a SearchComponent) to do this type of faceting in special cases (assuming you indexed in a particular way), but i'm not sure off the top of my head how well it would scale -- the basic algorithm i'm thinking of is (after indexing each facet term with a weight payload) to iterate over the DocSet of all matching documents in parallel with an iteration over TermPositions, skipping ahead to only the docs that match the query, and recording the sum of the payloads for each term. Hmmm... except TermPositions iterates over term, doc, freq, position tuples, so you would have to iterate over every term, and for every term then loop over all matching docs ... like i said, not sure how efficient it would wind up being.

You might be happier all around if you just do some sampling -- store the tag+weight pairs so that they can be retrieved with each doc, and then when you get your top facet constraints back, look at the first page of results and figure out what the sum weight is for each of those constraints based solely on the page#1 results. i've had happy users using a similar approach in the past. -Hoss
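A rough sketch of that per-term payload-summing idea, against the Lucene 2.9/3.x APIs Solr used at the time (untested illustration, not a working plugin; the field and method names are invented):

  import java.io.IOException;
  import org.apache.lucene.analysis.payloads.PayloadHelper;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermPositions;
  import org.apache.solr.search.DocSet;

  // Sum the weight payloads of one facet term over the docs matching the query.
  float sumPayloads(IndexReader reader, DocSet matches, String field, String tagValue)
      throws IOException {
    float sum = 0f;
    TermPositions tp = reader.termPositions(new Term(field, tagValue));
    try {
      while (tp.next()) {                        // each doc containing the term
        if (!matches.exists(tp.doc())) continue; // skip docs outside the query's DocSet
        tp.nextPosition();                       // advance to the term's position
        if (tp.isPayloadAvailable()) {
          byte[] payload = tp.getPayload(new byte[4], 0);
          sum += PayloadHelper.decodeFloat(payload); // weight encoded at index time
        }
      }
    } finally {
      tp.close();
    }
    return sum;
  }

Repeating this for every candidate facet term is exactly the per-term loop whose scaling Hoss is unsure about.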
Re: Getting started with writing parser
http://pastebin.com/CkxrEh6h this is my sample log - DINESHKUMAR . M I am neither especially clever nor especially gifted. I am only very, very curious. -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-started-with-writing-parser-tp2278092p2326646.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: please help Problem with dataImportHandler
http://pastebin.com/tjCs5dHm this is the log produced by the solr server - DINESHKUMAR . M I am neither especially clever nor especially gifted. I am only very, very curious. -- View this message in context: http://lucene.472066.n3.nabble.com/please-help-Problem-with-dataImportHandler-tp2318585p2326659.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr suggester and spell checker
Hi, I am using the default example in the latest stable build (apache-solr-4.0-2011-01-23_11-24-01). I read the wiki on http://wiki.apache.org/solr/Suggester and my expectation is that suggester would correct terms in addition to completing terms. The handler for suggest is configured with spellcheck as true:

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    ...
  </lst>
  ...
</requestHandler>

However, the query http://localhost:8983/solr/suggest?q=belkn%20enc returns

<str name='collation'>belkn encoded</str>

(belkn is not corrected to belkin). The spellchecker component corrects belkn to belkin though:

http://localhost:8983/solr/spell?q=belkn%20encoded&spellcheck=true&spellcheck.collate=true&spellcheck.build=true

<str name='collation'>belkin encoded</str>

Would really appreciate any input on how suggester can correct as well as complete terms in the input. Thanks Madhu -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-suggester-and-spell-checker-tp2326907p2326907.html Sent from the Solr - User mailing list archive at Nabble.com.
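For context, the example suggest component on that wiki page is built on a prefix Lookup structure, which would explain completion-without-correction; it looks roughly like this (reproduced from memory of the wiki, so treat the details as approximate):

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">name</str>
  </lst>
</searchComponent>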
Re: Getting started with writing parser
i don't even know whether the regex expression that i'm using for my log is correct or not.. i'm very much worried, i couldn't proceed in my project and already 1/3rd of the timing is over.. please help.. this is just the first stage.. after this i have to set up all the logs to be redirected to SYSLOG and from there i'll send them to the SOLR server.. then i have to analyse all the data that i obtained from DNS, DHCP, WIFI, SWITCHES.. and i have to prepare a user-based report on his actions.. please help me cause the days i have keep reducing.. my project leader is questioning me a lot.. pls.. - DINESHKUMAR . M I am neither especially clever nor especially gifted. I am only very, very curious. -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-started-with-writing-parser-tp2278092p2326917.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud Questions for MultiCore Setup
Hi, just wanted to push this topic again. Thank you! Em wrote: By the way: although I am asking for SolrCloud explicitly again, I will take your advice and try distributed search first to understand the concept better. Regards Em wrote: Hi Lance, thanks for your explanation. As far as I know, in distributed search I have to tell Solr what other shards it has to query. So, if I want to query a specific core, present in all my shards, I could tell Solr this by using the shards-param plus the specified core on each shard. Using SolrCloud's distrib=true feature (it sets all the known shards automatically?), a collection should consist only of one type of core-schema, correct? How does SolrCloud know that shard_x and shard_y are replicas of each other (I took a look at the possibility to specify alternative shards if one is not available)? If it does not know that they are replicas of each other, should I use the syntax of specifying alternative shards for failover, for performance reasons, because querying 2 identical and available cores seems to be wasted capacity, no? Thank you! -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Questions-for-MultiCore-Setup-tp2309443p2327089.html Sent from the Solr - User mailing list archive at Nabble.com.
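For concreteness, the shards-param-plus-core approach being described would be spelled out on the request roughly like this (hosts and the core name are invented for illustration):

  http://host1:8983/solr/core_products/select?q=*:*&shards=host1:8983/solr/core_products,host2:8983/solr/core_products

whereas SolrCloud's distrib=true replaces that explicit list with shard information pulled from ZooKeeper.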
Re: old index files not deleted on slave
Interestingly, that worked. I deleted the slave index and restarted. After the first replication I shut down the server, deleted the lock file and started it again. It seems to be behaving itself now, even though a lock file seems to be recreated. Thanks a lot for the help. This still seems like a bug though? I don't have any writers open on the slaves; in fact one slave is only doing replication right now (no reads) to try to isolate the problem. On Sat, Jan 22, 2011 at 7:34 PM, Alexander Kanarsky kanarsky2...@gmail.com wrote: I see the file -rw-rw-r-- 1 feeddo feeddo 0 Dec 15 01:19 lucene-cdaa80c0fefe1a7dfc7aab89298c614c-write.lock was created on Dec 15. At the end of the replication, as far as I remember, the SnapPuller tries to open the writer to ensure the old files are deleted, and in your case it cannot obtain a lock on the index folder on Dec 16, 17, 18. Can you reproduce the problem if you delete the lock file, restart the slave and try replication again? Do you have any other Writer(s) open for this folder outside of this core? -Alexander On Sat, Jan 22, 2011 at 3:52 PM, feedly team feedly...@gmail.com wrote: The file system checked out; I also tried creating a slave on a different machine and could reproduce the issue. I logged SOLR-2329. On Sat, Dec 18, 2010 at 8:01 PM, Lance Norskog goks...@gmail.com wrote: This could be a quirk of the native locking feature. What's the file system? Can you fsck it? If this error keeps happening, please file this. It should not happen. Add the text above and also your solrconfigs if you can. One thing you could try is to change from the native locking policy to the simple locking policy - but only on the child. On Sat, Dec 18, 2010 at 4:44 PM, feedly team feedly...@gmail.com wrote: I have set up index replication (triggered on optimize). The problem I am having is the old index files are not being deleted on the slave. After each replication, I can see the old files still hanging around as well as the files that have just been pulled. This causes the data directory size to increase by the index size every replication until the disk fills up.
Checking the logs, I see the following error:

SEVERE: SnapPull failed org.apache.solr.common.SolrException: Index fetch failed :
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:265)
        at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/solrhome/data/index/lucene-cdaa80c0fefe1a7dfc7aab89298c614c-write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1065)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:954)
        at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:192)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:99)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
        at org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:376)
        at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:471)
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:319)
        ... 11 more

lsof reveals that the file is still opened from the java process. I am running 4.0 rev 993367 with patch SOLR-1316. Otherwise, the setup is pretty vanilla. The OS is linux, the indexes are on local directories, write permissions look ok, nothing unusual in the config (default deletion policy, etc.). Contents of the index data dir:

master:
-rw-rw-r-- 1 feeddo feeddo 191 Dec 14 01:06 _1lg.fnm
-rw-rw-r-- 1 feeddo feeddo 26M Dec 14 01:07 _1lg.fdx
-rw-rw-r-- 1 feeddo feeddo 1.9G Dec 14 01:07 _1lg.fdt
-rw-rw-r-- 1 feeddo feeddo 474M Dec 14 01:12 _1lg.tis
-rw-rw-r-- 1 feeddo feeddo 15M Dec 14 01:12
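(For reference, the locking policy Lance suggests switching is set in solrconfig.xml; a sketch only, since the exact placement under indexDefaults/mainIndex varies by version:

  <mainIndex>
    ...
    <lockType>simple</lockType>
  </mainIndex>
)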
Re: Weird behaviour with phrase queries
Erick, On Mon, Jan 24, 2011 at 9:57 PM, Erick Erickson erickerick...@gmail.com wrote:

Hmmm, I don't see any screen shots. Several things:

1) If your stopword file has comments, I'm not sure what the effect would be.

Ha, I thought comments were supported in stopwords.txt

2) Something's not right here, or I'm being fooled again. Your withresults xml has this line:

<str name="parsedquery">+DisjunctionMaxQuery((meta_text:"ecol d ingenieur")~0.01) ()</str>

and your noresults has this line:

<str name="parsedquery">+DisjunctionMaxQuery((meta_text:"academi charpenti")~0.01) DisjunctionMaxQuery((meta_text:"academi charpenti"~100)~0.01)</str>

The empty () in the first one often means you're NOT going to your configured dismax parser in solrconfig.xml. Yet that doesn't square with your custom qt, so I'm puzzled. Could we see your raw query string on the way in? It's almost as if you defined qt in one and defType in the other, which are not equivalent.

You are right, I fixed this problem (my bad).

3) It may take 12 hours to index, but you could experiment with a smaller subset. You say you know that the noresults one should return documents; what proof do you have? If there's a single document that you know should match this, just index it and a few others and you should be able to make many runs until you get to the bottom of this...

I could, but I always thought I had to fully re-index after updating schema.xml. If I update only a few documents, will that take the changes into account without breaking the rest?

And obviously your stemming is happening on the query, are you sure it's happening at index time too?

Since you did not get the screenshots, you will find attached the full output of the analysis for a phrase that works and for another that does not. Thanks for your support. Best Regards, -- Jérôme

Attachments: analysis-noresults.html.gz, analysis-withresults.html.gz (GNU Zip compressed data)
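For anyone hitting the same qt/defType confusion Erick points out: qt selects an entire request handler defined in solrconfig.xml, with all of its defaults, while defType only swaps the query parser inside whichever handler is actually used. Roughly, with an invented handler name for illustration:

  /select?q=academie+charpentier&qt=dismax_fr      (uses the "dismax_fr" requestHandler and its configured defaults)
  /select?q=academie+charpentier&defType=dismax    (default handler, dismax parser, none of dismax_fr's defaults)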